Message ID | 20240819103616.2260006-3-leitao@debian.org |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | netconsole: Populate dynamic entry even if netpoll fails |
On Mon, 19 Aug 2024 03:36:12 -0700 Breno Leitao wrote:
> netpoll_setup() can fail in several ways, some of which print an error
> message, while others simply return without any message. For example,
> __netpoll_setup() returns in a few places without printing anything.
>
> To address this issue, modify the code to print an error message on
> netconsole if the target is not enabled. This will help us identify and
> troubleshoot netconsole issues related to netpoll setup failures
> more easily.

Only if memory allocation fails, it seems, and memory allocation
failures with GFP_KERNEL will be quite noisy.

BTW I looked thru 4 random implementations of ndo_netpoll_setup
and they look almost identical :S Perhaps they can be refactored?
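As a rough illustration of the refactoring idea, here is a userspace C model of what a shared helper might look like. It assumes the common shape of these callbacks (allocate a struct netpoll, call __netpoll_setup() on the lower device, free it on failure) and also puts a precise message at each failure site. `netpoll_setup_lower()`, `model_netpoll_setup()` and `struct lower_dev` are hypothetical names; `calloc()` and `fprintf()` stand in for kzalloc() and pr_err(). This is a sketch, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kernel structures, for illustration only. */
struct netpoll {
	const char *dev_name;
};

struct lower_dev {
	const char *name;
	int fail_setup; /* nonzero: make the modelled setup step fail */
};

/* Models __netpoll_setup() on the lower device; fails on demand. */
static int model_netpoll_setup(struct netpoll *np, struct lower_dev *dev)
{
	if (dev->fail_setup)
		return -ENODEV;
	np->dev_name = dev->name;
	return 0;
}

/*
 * Hypothetical shared helper that near-identical ndo_netpoll_setup
 * callbacks could call. Each failure site logs its own precise reason.
 */
int netpoll_setup_lower(struct lower_dev *dev, struct netpoll **out)
{
	struct netpoll *np;
	int err;

	assert(dev != NULL && out != NULL);

	np = calloc(1, sizeof(*np)); /* models kzalloc(..., GFP_KERNEL) */
	if (!np) {
		fprintf(stderr, "%s: failed to allocate netpoll\n", dev->name);
		return -ENOMEM;
	}

	err = model_netpoll_setup(np, dev);
	if (err) {
		fprintf(stderr, "%s: netpoll setup failed: %d\n", dev->name, err);
		free(np);
		return err;
	}

	*out = np; /* caller owns np on success */
	return 0;
}
```

On the failure path the helper frees its own allocation and leaves `*out` untouched, so each caller only has to store the pointer on success.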
On Tue, Aug 20, 2024 at 04:24:09PM -0700, Jakub Kicinski wrote:
> On Mon, 19 Aug 2024 03:36:12 -0700 Breno Leitao wrote:
> > netpoll_setup() can fail in several ways, some of which print an error
> > message, while others simply return without any message. For example,
> > __netpoll_setup() returns in a few places without printing anything.
> >
> > To address this issue, modify the code to print an error message on
> > netconsole if the target is not enabled. This will help us identify and
> > troubleshoot netconsole issues related to netpoll setup failures
> > more easily.
>
> Only if memory allocation fails, it seems, and memory allocation
> failures with GFP_KERNEL will be quite noisy.

Or anything that fails in ->ndo_netpoll_setup() and doesn't print
anything else. Do you think this is useless?

> BTW I looked thru 4 random implementations of ndo_netpoll_setup
> and they look almost identical :S Perhaps they can be refactored?

Correct. This should be refactored.

In fact, since you opened this topic, there are a few other things that
come to mind:

1) Possibly reduce the refill_skbs() work in the critical path (the UDP
send path) by moving it to a workqueue?

When sending a message, netpoll tries to fill the whole skb pool, and then
tries to allocate a new skb before sending the packet.

netconsole needs to write a message, which calls netpoll_send_udp():

send_ext_msg_udp() {
	netpoll_send_udp() {
		refill_skbs() {
			while (skb_pool.qlen < MAX_SKBS) {
				skb = alloc_skb(MAX_SKB_SIZE, GFP_ATOMIC);
			}
		}
		skb = alloc_skb(len, GFP_ATOMIC);
		if (!skb)
			skb = skb_dequeue(&skb_pool);
	}
}

Would it be better if the hot path just got one of the skbs from the
pool, and refilled it in a workqueue? If the skb_pool is empty, then
alloc_skb(len, GFP_ATOMIC)?

2) Report statistics back from netpoll_send_udp(). The netpoll_send_skb()
return values are being discarded, so it is hard to know whether the
packet was transmitted or got something such as NET_XMIT_DROP,
NETDEV_TX_BUSY, or NETDEV_TX_OK.

It is unclear where this should be reported to. Maybe a configfs entry?
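A userspace C model of the pool-first send path proposed in point 1 might look like the following. It is a sketch under stated assumptions: `struct buf` stands in for an skb, `malloc()` stands in for alloc_skb(), `POOL_MAX` stands in for MAX_SKBS (its value here is illustrative), and setting `refill_pending` stands in for queueing a refill work item (for example with schedule_work()). The names `hot_path_get()`, `refill_worker()` and `pool_drain()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_MAX 32 /* models MAX_SKBS; value chosen for illustration */

struct buf {
	size_t len;
};

struct buf_pool {
	struct buf *slots[POOL_MAX];
	int qlen;
	bool refill_pending; /* models a queued refill work item */
};

/* Send path: take a pooled buffer; allocate only if the pool is empty. */
struct buf *hot_path_get(struct buf_pool *pool, size_t len)
{
	struct buf *b;

	if (pool->qlen > 0)
		b = pool->slots[--pool->qlen];
	else
		b = malloc(sizeof(*b)); /* models alloc_skb(len, GFP_ATOMIC) */

	if (b)
		b->len = len;

	/* Defer topping up the pool instead of looping here. */
	if (pool->qlen < POOL_MAX)
		pool->refill_pending = true;
	return b;
}

/* Deferred worker: refill the pool outside the send path. */
void refill_worker(struct buf_pool *pool)
{
	while (pool->qlen < POOL_MAX) {
		struct buf *b = malloc(sizeof(*b));

		if (!b)
			break; /* the next send will request another refill */
		pool->slots[pool->qlen++] = b;
	}
	pool->refill_pending = false;
	assert(pool->qlen <= POOL_MAX);
}

/* Free everything still pooled (teardown). */
void pool_drain(struct buf_pool *pool)
{
	while (pool->qlen > 0)
		free(pool->slots[--pool->qlen]);
}
```

In this model the send path performs at most one allocation, and only when the pool is empty; the loop that tops the pool up runs only in the deferred worker.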
On Wed, 21 Aug 2024 01:41:55 -0700 Breno Leitao wrote:
> On Tue, Aug 20, 2024 at 04:24:09PM -0700, Jakub Kicinski wrote:
> > On Mon, 19 Aug 2024 03:36:12 -0700 Breno Leitao wrote:
> > > netpoll_setup() can fail in several ways, some of which print an error
> > > message, while others simply return without any message. For example,
> > > __netpoll_setup() returns in a few places without printing anything.
> > >
> > > To address this issue, modify the code to print an error message on
> > > netconsole if the target is not enabled. This will help us identify and
> > > troubleshoot netconsole issues related to netpoll setup failures
> > > more easily.
> >
> > Only if memory allocation fails, it seems, and memory allocation
> > failures with GFP_KERNEL will be quite noisy.
>
> Or anything that fails in ->ndo_netpoll_setup() and doesn't print
> anything else.

Which also only fails because of memory allocation AFAICT.

> Do you think this is useless?

I think it's better to push more precise messages into the fail sites.

> > BTW I looked thru 4 random implementations of ndo_netpoll_setup
> > and they look almost identical :S Perhaps they can be refactored?
>
> Correct. This should be refactored.
>
> In fact, since you opened this topic, there are a few other things that
> come to mind:
>
> 1) Possibly reduce the refill_skbs() work in the critical path (the UDP
> send path) by moving it to a workqueue?
>
> When sending a message, netpoll tries to fill the whole skb pool, and then
> tries to allocate a new skb before sending the packet.
>
> netconsole needs to write a message, which calls netpoll_send_udp():
>
> send_ext_msg_udp() {
> 	netpoll_send_udp() {
> 		refill_skbs() {
> 			while (skb_pool.qlen < MAX_SKBS) {
> 				skb = alloc_skb(MAX_SKB_SIZE, GFP_ATOMIC);
> 			}
> 		}
> 		skb = alloc_skb(len, GFP_ATOMIC);
> 		if (!skb)
> 			skb = skb_dequeue(&skb_pool);
> 	}
> }
>
> Would it be better if the hot path just got one of the skbs from the
> pool, and refilled it in a workqueue? If the skb_pool is empty, then
> alloc_skb(len, GFP_ATOMIC)?

Yeah, that seems a bit odd. If you can't find anything in the history
that would explain this design - refactoring SG.

> 2) Report statistics back from netpoll_send_udp(). The netpoll_send_skb()
> return values are being discarded, so it is hard to know whether the
> packet was transmitted or got something such as NET_XMIT_DROP,
> NETDEV_TX_BUSY, or NETDEV_TX_OK.
>
> It is unclear where this should be reported to. Maybe a configfs entry?

Also sounds good. We don't use configfs much in networking so IDK if
it's okay to use it for stats. But no other obviously better place
comes to mind for me.
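For point 2, a userspace C model of keeping per-target counters instead of discarding the send result could look like this. The enum values are hypothetical stand-ins for the outcomes named in the thread (NETDEV_TX_OK, NET_XMIT_DROP, NETDEV_TX_BUSY), not the kernel's constants, and `stats_show()` only models what a read-only configfs attribute might print. `record_xmit()`, `stats_reset()` and the output format are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the transmit outcomes discussed in the thread. */
enum xmit_result {
	XMIT_OK,   /* models NETDEV_TX_OK */
	XMIT_DROP, /* models NET_XMIT_DROP */
	XMIT_BUSY, /* models NETDEV_TX_BUSY */
	XMIT_NR,   /* number of tracked outcomes */
};

struct target_stats {
	unsigned long count[XMIT_NR];
};

/* Zero all counters, e.g. when a target is created. */
void stats_reset(struct target_stats *st)
{
	memset(st, 0, sizeof(*st));
}

/* Record the send result instead of discarding it. */
void record_xmit(struct target_stats *st, enum xmit_result r)
{
	assert((unsigned int)r < XMIT_NR);
	st->count[r]++;
}

/* Models a read-only attribute rendering the counters as text. */
int stats_show(const struct target_stats *st, char *buf, size_t size)
{
	return snprintf(buf, size, "tx_ok %lu\ndropped %lu\nbusy %lu\n",
			st->count[XMIT_OK], st->count[XMIT_DROP],
			st->count[XMIT_BUSY]);
}
```

Indexing the counters by outcome keeps the work added to the send path down to a single increment; formatting happens only when someone reads the attribute.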
Hello Jakub,

On Wed, Aug 21, 2024 at 03:54:04PM -0700, Jakub Kicinski wrote:
> On Wed, 21 Aug 2024 01:41:55 -0700 Breno Leitao wrote:
> > Do you think this is useless?
>
> I think it's better to push more precise messages into the fail sites.

Makes sense. I will remove it, and add the failure messages once we
refactor the ndo_netpoll_setup() callbacks.

> > Would it be better if the hot path just got one of the skbs from the
> > pool, and refilled it in a workqueue? If the skb_pool is empty, then
> > alloc_skb(len, GFP_ATOMIC)?
>
> Yeah, that seems a bit odd. If you can't find anything in the history
> that would explain this design - refactoring SG.

Thanks. I will add it to my todo list.

> > 2) Report statistics back from netpoll_send_udp(). The netpoll_send_skb()
> > return values are being discarded, so it is hard to know whether the
> > packet was transmitted or got something such as NET_XMIT_DROP,
> > NETDEV_TX_BUSY, or NETDEV_TX_OK.
> >
> > It is unclear where this should be reported to. Maybe a configfs entry?
>
> Also sounds good. We don't use configfs much in networking so IDK if
> it's okay to use it for stats. But no other obviously better place
> comes to mind for me.

Exactly, configfs seems a bit weird, but at the same time, I don't have a
better idea. Let me send a patch for this one, and we can continue the
discussion there.

Thanks
--breno
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 72384c1ecc5c..9b5f605fe87a 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -414,8 +414,10 @@ static ssize_t enabled_store(struct config_item *item,
 		netpoll_print_options(&nt->np);
 
 		ret = netpoll_setup(&nt->np);
-		if (ret)
+		if (ret) {
+			pr_err("Not enabling netconsole. Netpoll setup failed\n");
 			goto out_unlock;
+		}
 
 		nt->enabled = true;
 		pr_info("network logging started\n");
netpoll_setup() can fail in several ways, some of which print an error
message, while others simply return without any message. For example,
__netpoll_setup() returns in a few places without printing anything.

To address this issue, modify netconsole to print an error message when
the target cannot be enabled because netpoll_setup() failed. This will
help us identify and troubleshoot netconsole issues related to netpoll
setup failures more easily.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/net/netconsole.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)