Message ID | 20240824215130.2134153-2-max@kutsevol.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [1/2] netpoll: Make netpoll_send_udp return status instead of void | expand |
On Sat, 24 Aug 2024 14:50:24 -0700 Maksym Kutsevol wrote: > Enhance observability of netconsole. UDP sends can fail. Start tracking at nit: "UDP sends" sounds a bit too much like it's using sockets maybe "packet sends"? > least two failure possibilities: ENOMEM and NET_XMIT_DROP for every target. > Stats are exposed via an additional attribute in CONFIGFS. Please provide a reference to another configfs user in the kernel which exposes stats. To help reviewers validate it's a legit use case. > +#ifdef CONFIG_NETCONSOLE_DYNAMIC > +struct netconsole_target_stats { > + size_t xmit_drop_count; > + size_t enomem_count; > +}; > +#endif Don't hide types under ifdefs In fact I'm not sure if hiding stats if DYNAMIC isn't enabled makes sense. They don't take up much space. > +static ssize_t stats_show(struct config_item *item, char *buf) > +{ > + struct netconsole_target *nt = to_target(item); > + > + return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n", > + nt->stats.xmit_drop_count, nt->stats.enomem_count); does configfs require value per file like sysfs or this is okay? > /** > * send_ext_msg_udp - send extended log message to target > * @nt: target to send message to > @@ -1063,7 +1102,9 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg, > "%s", userdata); > > msg_ready = buf; > - netpoll_send_udp(&nt->np, msg_ready, msg_len); > + count_udp_send_stats(nt, netpoll_send_udp(&nt->np, > + msg_ready, > + msg_len)); Please add a wrapper which calls netpoll_send_udp() and counts the stats. This sort of nested function calls are unlikely to meet kernel coding requirements.
Hey Jakub, thank you for your time looking into this. PS. Sorry for the html message, noob mistake. On Mon, Aug 26, 2024 at 5:35 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Sat, 24 Aug 2024 14:50:24 -0700 Maksym Kutsevol wrote: > > Enhance observability of netconsole. UDP sends can fail. Start tracking at > > nit: "UDP sends" sounds a bit too much like it's using sockets > maybe "packet sends"? Sure, it makes sense, I will update it. > > > least two failure possibilities: ENOMEM and NET_XMIT_DROP for every target. > > Stats are exposed via an additional attribute in CONFIGFS. > > Please provide a reference to another configfs user in the kernel which > exposes stats. To help reviewers validate it's a legit use case. > doc/Documentation/block/stat.txt describes what stats block devices expose. The idea is the same there - a single read gives a coherent snapshot of stats. Did you mean that the commit message needs updating or info provided here is enough? > > +#ifdef CONFIG_NETCONSOLE_DYNAMIC > > +struct netconsole_target_stats { > > + size_t xmit_drop_count; > > + size_t enomem_count; > > +}; > > +#endif > > Don't hide types under ifdefs > In fact I'm not sure if hiding stats if DYNAMIC isn't enabled makes > sense. They don't take up much space. > I'll remove the ifdef. Without _DYNAMIC it will not create the sysfs config group, so there will be no place to expose the stats from, hence the attachment to dynamic config. The reason to hide the type comes from the same idea. It's not about saving space, but about the fact that it can't exist without it the way it's currently implemented. There's no way to expose those metrics if _DYNAMIC is not enabled. > > +static ssize_t stats_show(struct config_item *item, char *buf) > > +{ > > + struct netconsole_target *nt = to_target(item); > > + > > + return > > + nt->stats.xmit_drop_count, nt->stats.enomem_count); > > does configfs require value per file like sysfs or this is okay? Docs say (Documentation/filesystems/sysfs.txt): Attributes should be ASCII text files, preferably with only one value per file. It is noted that it may not be efficient to contain only one value per file, so it is socially acceptable to express an array of values of the same type. Given those are of the same type, I thought it's ok. To make it less "fancy" maybe move to just values separated by whitespace + a block in Documentation/networking/netconsole.rst describing the format? E.g. sysfs_emit(buf, "%lu %lu\n", .....) ? I really don't want to have multiple files for it. What do you think? > > > > /** > > * send_ext_msg_udp - send extended log message to target > > * @nt: target to send message to > > @@ -1063,7 +1102,9 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg, > > "%s", userdata); > > > > msg_ready = buf; > > - netpoll_send_udp(&nt->np, msg_ready, msg_len); > > + count_udp_send_stats(nt, netpoll_send_udp(&nt->np, > > + msg_ready, > > + msg_len)); > > Please add a wrapper which calls netpoll_send_udp() and counts the > stats. This sort of nested function calls are unlikely to meet kernel > coding requirements. I see, will do. Noob question - I couldn't find any written guidance regarding this, if you have it handy - I'd appreciate a link to some guidance regarding passing the result of a function to a classifier vs wrapper function. > -- > pw-bot: cr
Hi Maksym, kernel test robot noticed the following build warnings: [auto build test WARNING on 8af174ea863c72f25ce31cee3baad8a301c0cf0f] url: https://github.com/intel-lab-lkp/linux/commits/Maksym-Kutsevol/netcons-Add-udp-send-fail-statistics-to-netconsole/20240826-163850 base: 8af174ea863c72f25ce31cee3baad8a301c0cf0f patch link: https://lore.kernel.org/r/20240824215130.2134153-2-max%40kutsevol.com patch subject: [PATCH 2/2] netcons: Add udp send fail statistics to netconsole config: hexagon-allmodconfig (https://download.01.org/0day-ci/archive/20240827/202408271419.3JJwwxE7-lkp@intel.com/config) compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 08e5a1de8227512d4774a534b91cb2353cef6284) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240827/202408271419.3JJwwxE7-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202408271419.3JJwwxE7-lkp@intel.com/ All warnings (new ones prefixed by >>): In file included from drivers/net/netconsole.c:27: In file included from include/linux/mm.h:2228: include/linux/vmstat.h:517:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 517 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_" | ~~~~~~~~~~~ ^ ~~~ In file included from drivers/net/netconsole.c:35: In file included from include/linux/netpoll.h:11: In file included from include/linux/netdevice.h:38: In file included from include/net/net_namespace.h:43: In file included from include/linux/skbuff.h:17: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:12: In file included from include/linux/hardirq.h:11: In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:14: In file included from arch/hexagon/include/asm/io.h:328: include/asm-generic/io.h:548:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 548 | val = __raw_readb(PCI_IOBASE + addr); | ~~~~~~~~~~ ^ include/asm-generic/io.h:561:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 561 | val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr)); | ~~~~~~~~~~ ^ include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu' 37 | #define __le16_to_cpu(x) ((__force __u16)(__le16)(x)) | ^ In file included from drivers/net/netconsole.c:35: In file included from include/linux/netpoll.h:11: In file included from include/linux/netdevice.h:38: In file included from include/net/net_namespace.h:43: In file included from include/linux/skbuff.h:17: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:12: In file included from include/linux/hardirq.h:11: In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:14: In file included from arch/hexagon/include/asm/io.h:328: include/asm-generic/io.h:574:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 574 | val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr)); | ~~~~~~~~~~ ^ include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu' 35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x)) | ^ In file included from drivers/net/netconsole.c:35: In file included from include/linux/netpoll.h:11: In file included from include/linux/netdevice.h:38: In file included from include/net/net_namespace.h:43: In file included from include/linux/skbuff.h:17: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:12: In file included from include/linux/hardirq.h:11: In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:14: In file included from arch/hexagon/include/asm/io.h:328: include/asm-generic/io.h:585:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 585 | __raw_writeb(value, PCI_IOBASE + addr); | ~~~~~~~~~~ ^ include/asm-generic/io.h:595:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 595 | __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr); | ~~~~~~~~~~ ^ include/asm-generic/io.h:605:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 605 | __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr); | ~~~~~~~~~~ ^ >> drivers/net/netconsole.c:341:3: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat] 340 | return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n", | ~~~ | %zu 341 | nt->stats.xmit_drop_count, nt->stats.enomem_count); | ^~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/netconsole.c:341:30: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat] 340 | return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n", | ~~~ | %zu 341 | nt->stats.xmit_drop_count, nt->stats.enomem_count); | ^~~~~~~~~~~~~~~~~~~~~~ 9 warnings generated. vim +341 drivers/net/netconsole.c 335 336 static ssize_t stats_show(struct config_item *item, char *buf) 337 { 338 struct netconsole_target *nt = to_target(item); 339 340 return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n", > 341 nt->stats.xmit_drop_count, nt->stats.enomem_count); 342 } 343
Hi Maksym, kernel test robot noticed the following build warnings: [auto build test WARNING on 8af174ea863c72f25ce31cee3baad8a301c0cf0f] url: https://github.com/intel-lab-lkp/linux/commits/Maksym-Kutsevol/netcons-Add-udp-send-fail-statistics-to-netconsole/20240826-163850 base: 8af174ea863c72f25ce31cee3baad8a301c0cf0f patch link: https://lore.kernel.org/r/20240824215130.2134153-2-max%40kutsevol.com patch subject: [PATCH 2/2] netcons: Add udp send fail statistics to netconsole config: arc-allyesconfig (https://download.01.org/0day-ci/archive/20240827/202408271711.RhzKTDRD-lkp@intel.com/config) compiler: arceb-elf-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240827/202408271711.RhzKTDRD-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202408271711.RhzKTDRD-lkp@intel.com/ All warnings (new ones prefixed by >>): drivers/net/netconsole.c: In function 'stats_show': >> drivers/net/netconsole.c:340:46: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Wformat=] 340 | return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n", | ~~^ | | | long unsigned int | %u 341 | nt->stats.xmit_drop_count, nt->stats.enomem_count); | ~~~~~~~~~~~~~~~~~~~~~~~~~ | | | size_t {aka unsigned int} drivers/net/netconsole.c:340:58: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' {aka 'unsigned int'} [-Wformat=] 340 | return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n", | ~~^ | | | long unsigned int | %u 341 | nt->stats.xmit_drop_count, nt->stats.enomem_count); | ~~~~~~~~~~~~~~~~~~~~~~ | | | size_t {aka unsigned int} vim +340 drivers/net/netconsole.c 335 336 static ssize_t stats_show(struct config_item *item, char *buf) 337 { 338 struct netconsole_target *nt = to_target(item); 339 > 340 return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n", 341 nt->stats.xmit_drop_count, nt->stats.enomem_count); 342 } 343
On Sat, Aug 24, 2024 at 02:50:24PM -0700, Maksym Kutsevol wrote: > Enhance observability of netconsole. UDP sends can fail. Start tracking at > least two failure possibilities: ENOMEM and NET_XMIT_DROP for every target. > Stats are exposed via an additional attribute in CONFIGFS. > > The exposed statistics allows easier debugging of cases when netconsole > messages were not seen by receivers, eliminating the guesswork if the > sender thinks that messages in question were sent out. > > Stats are not reset on enable/disable/change remote ip/etc, they > belong to the netcons target itself. > > Signed-off-by: Maksym Kutsevol <max@kutsevol.com> Would you mind adding a "Reported-by" me in this case? https://lore.kernel.org/all/ZsWoUzyK5du9Ffl+@gmail.com/ > diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c > index 9c09293b5258..45c07ec7842d 100644 > --- a/drivers/net/netconsole.c > +++ b/drivers/net/netconsole.c > @@ -82,6 +82,13 @@ static DEFINE_SPINLOCK(target_list_lock); > */ > static struct console netconsole_ext; > > +#ifdef CONFIG_NETCONSOLE_DYNAMIC > +struct netconsole_target_stats { > + size_t xmit_drop_count; > + size_t enomem_count; I am looking at other drivers, and they use a specific type for these counters, u64_stats_sync. if it is possible to use this format, then you can leverage the `__u64_stats_update` helpers, and not worry about locking/overflow!? > @@ -1015,6 +1035,25 @@ static struct notifier_block netconsole_netdev_notifier = { > .notifier_call = netconsole_netdev_event, > }; > > +/** > + * count_udp_send_stats - Classify netpoll_send_udp result and count errors. > + * @nt: target that was sent to > + * @result: result of netpoll_send_udp > + * > + * Takes the result of netpoll_send_udp and classifies the type of error that > + * occurred. Increments statistics in nt->stats accordingly. > + */ > +static void count_udp_send_stats(struct netconsole_target *nt, int result) > +{ > +#ifdef CONFIG_NETCONSOLE_DYNAMIC > + if (result == NET_XMIT_DROP) { > + nt->stats.xmit_drop_count++; If you change the type, you can use the `u64_stats_inc` helper here. > @@ -1126,7 +1167,11 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg, > this_offset += this_chunk; > } > > - netpoll_send_udp(&nt->np, buf, this_header + this_offset); > + count_udp_send_stats(nt, > + netpoll_send_udp(&nt->np, > + buf, > + this_header + this_offset) > + ); as Jakub said, this is not a format that is common in the Linux kenrel.
On Mon, 26 Aug 2024 19:55:36 -0400 Maksym Kutsevol wrote: > > > +static ssize_t stats_show(struct config_item *item, char *buf) > > > +{ > > > + struct netconsole_target *nt = to_target(item); > > > + > > > + return > > > + nt->stats.xmit_drop_count, nt->stats.enomem_count); > > > > does configfs require value per file like sysfs or this is okay? > > Docs say (Documentation/filesystems/sysfs.txt): > > Attributes should be ASCII text files, preferably with only one value > per file. It is noted that it may not be efficient to contain only one > value per file, so it is socially acceptable to express an array of > values of the same type. Right, but this is for sysfs, main question is whether configfs has the same expectations. > Given those are of the same type, I thought it's ok. To make it less > "fancy" maybe move to > just values separated by whitespace + a block in > Documentation/networking/netconsole.rst describing the format? > E.g. sysfs_emit(buf, "%lu %lu\n", .....) ? I really don't want to have > multiple files for it. > What do you think? Stats as an array are quite hard to read / understand
Hey Jakub, thanks for looking into this. PS. A couple more email send mistakes and I'll go install mutt, sorry for the noise :) On Tue, Aug 27, 2024 at 9:59 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Mon, 26 Aug 2024 19:55:36 -0400 Maksym Kutsevol wrote: > > > > +static ssize_t stats_show(struct config_item *item, char *buf) > > > > +{ > > > > + struct netconsole_target *nt = to_target(item); > > > > + > > > > + return > > > > + nt->stats.xmit_drop_count, nt->stats.enomem_count); > > > > > > does configfs require value per file like sysfs or this is okay? > > > > Docs say (Documentation/filesystems/sysfs.txt): > > > > Attributes should be ASCII text files, preferably with only one value > > per file. It is noted that it may not be efficient to contain only one > > value per file, so it is socially acceptable to express an array of > > values of the same type. > > Right, but this is for sysfs, main question is whether configfs has > the same expectations. Eh, my bad, thank you :) Docs on configfs (Documentation/filesystems/configfs.rst) say approximately the same, quote: * Normal attributes, which similar to sysfs attributes, are small ASCII text files, with a maximum size of one page (PAGE_SIZE, 4096 on i386). Preferably only one value per file should be used, and the same caveats from sysfs apply. Configfs expects write(2) to store the entire buffer at once. When writing to normal configfs attributes, userspace processes should first read the entire file, modify the portions they wish to change, and then write the entire buffer back. so based on sysfs+configfs docs it looks ok to do so. What do you think? Regarding the overall idea of exposing stats via configfs I found this: https://github.com/torvalds/linux/blob/master/drivers/target/iscsi/iscsi_target_stat.c#L82-L87 as an example of another place doing it, which exposes the number of active sessions. > > Given those are of the same type, I thought it's ok. To make it less > > "fancy" maybe move to > > just values separated by whitespace + a block in > > Documentation/networking/netconsole.rst describing the format? > > E.g. sysfs_emit(buf, "%lu %lu\n", .....) ? I really don't want to have > > multiple files for it. > > What do you think? > > Stats as an array are quite hard to read / understand I agree with that. I couldn't find examples of multiple values exported as stats from configfs. Only from sysfs, e.g. https://www.kernel.org/doc/Documentation/block/stat.txt, which describes a whitespace separated file with stats. I want to lean on the opinion of someone more experienced in kernel dev on how to proceed here. - as is - whitespace separated like blockdev stats - multiple files and stop talking about it? :)
Hey Breno, thanks for looking at it again :) On Tue, Aug 27, 2024 at 8:18 AM Breno Leitao <leitao@debian.org> wrote: > > On Sat, Aug 24, 2024 at 02:50:24PM -0700, Maksym Kutsevol wrote: > > Enhance observability of netconsole. UDP sends can fail. Start tracking at > > least two failure possibilities: ENOMEM and NET_XMIT_DROP for every target. > > Stats are exposed via an additional attribute in CONFIGFS. > > > > The exposed statistics allows easier debugging of cases when netconsole > > messages were not seen by receivers, eliminating the guesswork if the > > sender thinks that messages in question were sent out. > > > > Stats are not reset on enable/disable/change remote ip/etc, they > > belong to the netcons target itself. > > > > Signed-off-by: Maksym Kutsevol <max@kutsevol.com> > > Would you mind adding a "Reported-by" me in this case? > > https://lore.kernel.org/all/ZsWoUzyK5du9Ffl+@gmail.com/ Absolutely! > > diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c > > index 9c09293b5258..45c07ec7842d 100644 > > --- a/drivers/net/netconsole.c > > +++ b/drivers/net/netconsole.c > > @@ -82,6 +82,13 @@ static DEFINE_SPINLOCK(target_list_lock); > > */ > > static struct console netconsole_ext; > > > > +#ifdef CONFIG_NETCONSOLE_DYNAMIC > > +struct netconsole_target_stats { > > + size_t xmit_drop_count; > > + size_t enomem_count; > > I am looking at other drivers, and they use a specific type for these > counters, u64_stats_sync. > if it is possible to use this format, then you can leverage the > `__u64_stats_update` helpers, and not worry about locking/overflow!? > Do you think that these counters really need more than an int? Switching them to unsigned int instead might be better? I'd argue that at the point when an external system collection interval is not short enough to see the unsigned counter going to a lesser value (counters are unsigned, they go to 0 at UINT_MAX+1). I need advice/pointer on locking - I'm looking at it and it looks to me as if there's no locking needed when updating a member of nt there. Tell me if I'm wrong. > > @@ -1015,6 +1035,25 @@ static struct notifier_block netconsole_netdev_notifier = { > > .notifier_call = netconsole_netdev_event, > > }; > > > > +/** > > + * count_udp_send_stats - Classify netpoll_send_udp result and count errors. > > + * @nt: target that was sent to > > + * @result: result of netpoll_send_udp > > + * > > + * Takes the result of netpoll_send_udp and classifies the type of error that > > + * occurred. Increments statistics in nt->stats accordingly. > > + */ > > +static void count_udp_send_stats(struct netconsole_target *nt, int result) > > +{ > > +#ifdef CONFIG_NETCONSOLE_DYNAMIC > > + if (result == NET_XMIT_DROP) { > > + nt->stats.xmit_drop_count++; > > If you change the type, you can use the `u64_stats_inc` helper here. ok > > @@ -1126,7 +1167,11 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg, > > this_offset += this_chunk; > > } > > > > - netpoll_send_udp(&nt->np, buf, this_header + this_offset); > > + count_udp_send_stats(nt, > > + netpoll_send_udp(&nt->np, > > + buf, > > + this_header + this_offset) > > + ); > > as Jakub said, this is not a format that is common in the Linux kenrel. ok
Hello Maksym, On Wed, Aug 28, 2024 at 11:03:09AM -0400, Maksym Kutsevol wrote: > > Stats as an array are quite hard to read / understand > I agree with that. > I couldn't find examples of multiple values exported as stats from > configfs. Only from sysfs, > e.g. https://www.kernel.org/doc/Documentation/block/stat.txt, which > describes a whitespace > separated file with stats. > > I want to lean on the opinion of someone more experienced in kernel > dev on how to proceed here. > - as is > - whitespace separated like blockdev stats > - multiple files and stop talking about it? :) Do we really need both stats/numbers here? Would it be easier if we just have a "dropped_packet" counter that count packets that netconsole dropped for one reason or another? I don't see statistics for lack of memory in regular netdev interfaces. We probably don't need it for netconsole.
Hey Breno, thanks for looking at it. On Wed, Aug 28, 2024 at 11:19 AM Breno Leitao <leitao@debian.org> wrote: > > Hello Maksym, > > On Wed, Aug 28, 2024 at 10:26:20AM -0400, Maksym Kutsevol wrote: > > > > diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c > > > > index 9c09293b5258..45c07ec7842d 100644 > > > > --- a/drivers/net/netconsole.c > > > > +++ b/drivers/net/netconsole.c > > > > @@ -82,6 +82,13 @@ static DEFINE_SPINLOCK(target_list_lock); > > > > */ > > > > static struct console netconsole_ext; > > > > > > > > +#ifdef CONFIG_NETCONSOLE_DYNAMIC > > > > +struct netconsole_target_stats { > > > > + size_t xmit_drop_count; > > > > + size_t enomem_count; > > > > > > I am looking at other drivers, and they use a specific type for these > > > counters, u64_stats_sync. > > > if it is possible to use this format, then you can leverage the > > > `__u64_stats_update` helpers, and not worry about locking/overflow!? > > > > > Do you think that these counters really need more than an int? > > An int can overflow and become negative, so, you will see negative > values, right? It's an unsigned int (maybe not an int on every platform, but still unsigned), it will become 0, not negative. E.g. https://www.programiz.com/online-compiler/3qbkayylX5Cmf > > Switching them to unsigned int instead might be better? > > Why not `u64_stats_sync` ? > Only because it's smaller and does the job. Unless locking is needed. > > I'd argue that at the point when an external system collection > > interval is not short enough > > to see the unsigned counter going to a lesser value (counters are > > unsigned, they go to 0 at UINT_MAX+1). > > I need advice/pointer on locking - I'm looking at it and it looks to > > me as if there's no locking needed when > > updating a member of nt there. Tell me if I'm wrong. > > well, they are updated while holding `target_list_lock` right? But I > would not rely on it. Then I'll update to use them instead. > If you just convert the values to u64_stats_sync, you get the > synchronization for free, basically doing the following: > > u64_stats_update_begin() > u64_stats_inc() > u64_stats_update_end() > > Thanks for working on it, > --breno Absolutely, my pleasure :)
Hey Breno, On Wed, Aug 28, 2024 at 11:25 AM Breno Leitao <leitao@debian.org> wrote: > > Hello Maksym, > > On Wed, Aug 28, 2024 at 11:03:09AM -0400, Maksym Kutsevol wrote: > > > > Stats as an array are quite hard to read / understand > > > I agree with that. > > I couldn't find examples of multiple values exported as stats from > > configfs. Only from sysfs, > > e.g. https://www.kernel.org/doc/Documentation/block/stat.txt, which > > describes a whitespace > > separated file with stats. > > > > I want to lean on the opinion of someone more experienced in kernel > > dev on how to proceed here. > > - as is > > - whitespace separated like blockdev stats > > - multiple files and stop talking about it? :) > > Do we really need both stats/numbers here? Would it be easier if we just have > a "dropped_packet" counter that count packets that netconsole dropped > for one reason or another? > > I don't see statistics for lack of memory in regular netdev interfaces. > We probably don't need it for netconsole. I like this idea! I'll send a V2 with only one attribute. Thanks, Breno!
diff --git a/Documentation/networking/netconsole.rst b/Documentation/networking/netconsole.rst index d55c2a22ec7a..733d4a93878e 100644 --- a/Documentation/networking/netconsole.rst +++ b/Documentation/networking/netconsole.rst @@ -135,6 +135,7 @@ The interface exposes these parameters of a netconsole target to userspace: remote_ip Remote agent's IP address (read-write) local_mac Local interface's MAC address (read-only) remote_mac Remote agent's MAC address (read-write) + stats Send error stats (read-only) ============== ================================= ============ The "enabled" attribute is also used to control whether the parameters of diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index 9c09293b5258..45c07ec7842d 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -82,6 +82,13 @@ static DEFINE_SPINLOCK(target_list_lock); */ static struct console netconsole_ext; +#ifdef CONFIG_NETCONSOLE_DYNAMIC +struct netconsole_target_stats { + size_t xmit_drop_count; + size_t enomem_count; +}; +#endif + /** * struct netconsole_target - Represents a configured netconsole target. * @list: Links this target into the target_list. @@ -89,6 +96,7 @@ static struct console netconsole_ext; * @userdata_group: Links to the userdata configfs hierarchy * @userdata_complete: Cached, formatted string of append * @userdata_length: String length of userdata_complete + * @stats: UDP send stats for the target. Used for debugging. * @enabled: On / off knob to enable / disable target. * Visible from userspace (read-write). * We maintain a strict 1:1 correspondence between this and @@ -115,6 +123,7 @@ struct netconsole_target { struct config_group userdata_group; char userdata_complete[MAX_USERDATA_ENTRY_LENGTH * MAX_USERDATA_ITEMS]; size_t userdata_length; + struct netconsole_target_stats stats; #endif bool enabled; bool extended; @@ -227,6 +236,7 @@ static struct netconsole_target *alloc_and_init(void) * | remote_ip * | local_mac * | remote_mac + * | stats * | userdata/ * | <key>/ * | value @@ -323,6 +333,14 @@ static ssize_t remote_mac_show(struct config_item *item, char *buf) return sysfs_emit(buf, "%pM\n", to_target(item)->np.remote_mac); } +static ssize_t stats_show(struct config_item *item, char *buf) +{ + struct netconsole_target *nt = to_target(item); + + return sysfs_emit(buf, "xmit_drop: %lu enomem: %lu\n", + nt->stats.xmit_drop_count, nt->stats.enomem_count); +} + /* * This one is special -- targets created through the configfs interface * are not enabled (and the corresponding netpoll activated) by default. @@ -795,6 +813,7 @@ CONFIGFS_ATTR(, remote_ip); CONFIGFS_ATTR_RO(, local_mac); CONFIGFS_ATTR(, remote_mac); CONFIGFS_ATTR(, release); +CONFIGFS_ATTR_RO(, stats); static struct configfs_attribute *netconsole_target_attrs[] = { &attr_enabled, @@ -807,6 +826,7 @@ static struct configfs_attribute *netconsole_target_attrs[] = { &attr_remote_ip, &attr_local_mac, &attr_remote_mac, + &attr_stats, NULL, }; @@ -1015,6 +1035,25 @@ static struct notifier_block netconsole_netdev_notifier = { .notifier_call = netconsole_netdev_event, }; +/** + * count_udp_send_stats - Classify netpoll_send_udp result and count errors. + * @nt: target that was sent to + * @result: result of netpoll_send_udp + * + * Takes the result of netpoll_send_udp and classifies the type of error that + * occurred. Increments statistics in nt->stats accordingly. + */ +static void count_udp_send_stats(struct netconsole_target *nt, int result) +{ +#ifdef CONFIG_NETCONSOLE_DYNAMIC + if (result == NET_XMIT_DROP) { + nt->stats.xmit_drop_count++; + } else if (result == -ENOMEM) { + nt->stats.enomem_count++; + }; +#endif +} + /** * send_ext_msg_udp - send extended log message to target * @nt: target to send message to @@ -1063,7 +1102,9 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg, "%s", userdata); msg_ready = buf; - netpoll_send_udp(&nt->np, msg_ready, msg_len); + count_udp_send_stats(nt, netpoll_send_udp(&nt->np, + msg_ready, + msg_len)); return; } @@ -1126,7 +1167,11 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg, this_offset += this_chunk; } - netpoll_send_udp(&nt->np, buf, this_header + this_offset); + count_udp_send_stats(nt, + netpoll_send_udp(&nt->np, + buf, + this_header + this_offset) + ); offset += this_offset; } } @@ -1172,7 +1217,10 @@ static void write_msg(struct console *con, const char *msg, unsigned int len) tmp = msg; for (left = len; left;) { frag = min(left, MAX_PRINT_CHUNK); - netpoll_send_udp(&nt->np, tmp, frag); + int send_result = netpoll_send_udp(&nt->np, + tmp, + frag); + count_udp_send_stats(nt, send_result); tmp += frag; left -= frag; }
Enhance observability of netconsole. UDP sends can fail. Start tracking at least two failure possibilities: ENOMEM and NET_XMIT_DROP for every target. Stats are exposed via an additional attribute in CONFIGFS. The exposed statistics allows easier debugging of cases when netconsole messages were not seen by receivers, eliminating the guesswork if the sender thinks that messages in question were sent out. Stats are not reset on enable/disable/change remote ip/etc, they belong to the netcons target itself. Signed-off-by: Maksym Kutsevol <max@kutsevol.com> --- Documentation/networking/netconsole.rst | 1 + drivers/net/netconsole.c | 54 +++++++++++++++++++++++-- 2 files changed, 52 insertions(+), 3 deletions(-)