[net-next,v2,2/3] netconsole: pr_err() when netpoll_setup fails

Message ID 20240819103616.2260006-3-leitao@debian.org (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series netconsole: Populate dynamic entry even if netpoll fails

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net-next
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 16 this patch: 16
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 5 of 5 maintainers
netdev/build_clang              success  Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 16 this patch: 16
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 11 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-08-20--09-00 (tests: 712)

Commit Message

Breno Leitao Aug. 19, 2024, 10:36 a.m. UTC
netpoll_setup() can fail in several ways, some of which print an error
message, while others simply return without any message. For example,
__netpoll_setup() returns in a few places without printing anything.

To address this issue, modify netconsole to print an error message when
netpoll_setup() fails and the target is therefore not enabled. This will
make it easier to identify and troubleshoot netconsole issues related to
netpoll setup failures.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/net/netconsole.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Jakub Kicinski Aug. 20, 2024, 11:24 p.m. UTC | #1
On Mon, 19 Aug 2024 03:36:12 -0700 Breno Leitao wrote:
> netpoll_setup() can fail in several ways, some of which print an error
> message, while others simply return without any message. For example,
> __netpoll_setup() returns in a few places without printing anything.
> 
> To address this issue, modify netconsole to print an error message when
> netpoll_setup() fails and the target is therefore not enabled. This will
> make it easier to identify and troubleshoot netconsole issues related to
> netpoll setup failures.

Only if memory allocation fails, it seems, and memory allocation
failures with GFP_KERNEL will be quite noisy.

BTW I looked thru 4 random implementations of ndo_netpoll_setup
and they look almost identical :S Perhaps they can be refactored?
Breno Leitao Aug. 21, 2024, 8:41 a.m. UTC | #2
On Tue, Aug 20, 2024 at 04:24:09PM -0700, Jakub Kicinski wrote:
> On Mon, 19 Aug 2024 03:36:12 -0700 Breno Leitao wrote:
> > netpoll_setup() can fail in several ways, some of which print an error
> > message, while others simply return without any message. For example,
> > __netpoll_setup() returns in a few places without printing anything.
> > 
> > To address this issue, modify netconsole to print an error message when
> > netpoll_setup() fails and the target is therefore not enabled. This will
> > make it easier to identify and troubleshoot netconsole issues related to
> > netpoll setup failures.
> 
> Only if memory allocation fails, it seems, and memory allocation
> failures with GFP_KERNEL will be quite noisy.

Or anything that fails in ->ndo_netpoll_setup() and doesn't print
anything else.

Do you think this is useless?

> BTW I looked thru 4 random implementations of ndo_netpoll_setup
> and they look almost identical :S Perhaps they can be refactored?

Correct. This should be refactored.

In fact, since you opened this topic, there are a few other things that
come to mind:

1) Possibly reduce the refill_skbs() work in the critical path (the UDP
send path) by moving it to a workqueue?

When sending a message, netpoll first tries to refill the whole skb pool,
and then tries to allocate a new skb before sending the packet.

netconsole needs to write a message, which calls netpoll_send_udp():

	send_ext_msg_udp() {
		netpoll_send_udp() {
			refill_skbs() {
				while (skb_pool.qlen < MAX_SKBS) {
					skb = alloc_skb(MAX_SKB_SIZE, GFP_ATOMIC);
				}
			}
			skb = alloc_skb(len, GFP_ATOMIC);
			if (!skb)
				skb = skb_dequeue(&skb_pool);
		}
	}
		
Would it be better if the hot path just took one of the skbs from the
pool, and the pool was refilled from a workqueue? Only if the skb pool is
empty would it fall back to alloc_skb(len, GFP_ATOMIC).
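
To make the idea concrete, here is a minimal userspace sketch of a
"pool first, refill later" scheme. It is not kernel code and not a proposed
patch: struct buf, struct buf_pool, pool_get() and pool_refill() are
hypothetical names, malloc() stands in for alloc_skb(), the refill_pending
flag stands in for scheduling a work item, and the MAX_SKBS value is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SKBS 32	/* pool capacity; value assumed for illustration */

/* Hypothetical stand-in for an skb. */
struct buf {
	struct buf *next;
	size_t len;
};

/* Hypothetical stand-in for skb_pool plus a "refill work scheduled" flag. */
struct buf_pool {
	struct buf *head;
	unsigned int qlen;
	bool refill_pending;
};

/* Hot path: take a buffer from the pool; allocate only if the pool is empty. */
static struct buf *pool_get(struct buf_pool *p, size_t len)
{
	struct buf *b = p->head;

	if (b) {
		p->head = b->next;
		p->qlen--;
	} else {
		b = malloc(sizeof(*b));	/* fallback allocation */
	}
	if (b) {
		b->next = NULL;
		b->len = len;
	}
	/* Ask for a refill instead of refilling here. */
	if (p->qlen < MAX_SKBS)
		p->refill_pending = true;
	return b;
}

/* Deferred work: top the pool back up, outside the send path. */
static unsigned int pool_refill(struct buf_pool *p)
{
	unsigned int added = 0;

	while (p->qlen < MAX_SKBS) {
		struct buf *b = malloc(sizeof(*b));

		if (!b)
			break;
		b->len = 0;
		b->next = p->head;
		p->head = b;
		p->qlen++;
		added++;
	}
	p->refill_pending = false;
	assert(p->qlen <= MAX_SKBS);
	return added;
}
```

In this sketch the send path never loops over allocations; it does at most
one allocation, and only when the pool has run dry.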


2) Report statistics back from netpoll_send_udp(). The return values of
netpoll_send_skb() are being discarded, so it is hard to know whether a
packet was transmitted (NETDEV_TX_OK) or hit something such as
NET_XMIT_DROP or NETDEV_TX_BUSY.

It is unclear where this should be reported to. Maybe a configfs entry?
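
As a rough illustration of the bookkeeping, the sketch below tallies
transmit results into per-target counters and formats them the way a
read-only attribute might. It is plain userspace C, not a proposed
interface: struct tx_stats, tx_stats_account() and tx_stats_show() are
hypothetical names, and the TX_OK, XMIT_DROP and TX_BUSY constants are local
copies of the values NETDEV_TX_OK, NET_XMIT_DROP and NETDEV_TX_BUSY have in
include/linux/netdevice.h as far as I recall, so please check them against
the tree.

```c
#include <assert.h>
#include <stdio.h>

/* Local copies of the kernel return codes (values assumed, see above). */
enum {
	TX_OK     = 0x00,	/* NETDEV_TX_OK */
	XMIT_DROP = 0x01,	/* NET_XMIT_DROP */
	TX_BUSY   = 0x10,	/* NETDEV_TX_BUSY */
};

/* Hypothetical per-target counters. */
struct tx_stats {
	unsigned long ok;
	unsigned long dropped;
	unsigned long busy;
	unsigned long other;	/* negative errnos and anything unexpected */
};

/* Account one send result instead of discarding it. */
static void tx_stats_account(struct tx_stats *s, int ret)
{
	assert(s);
	switch (ret) {
	case TX_OK:
		s->ok++;
		break;
	case XMIT_DROP:
		s->dropped++;
		break;
	case TX_BUSY:
		s->busy++;
		break;
	default:
		s->other++;
		break;
	}
}

/* Format the counters as text, one "name value" pair per line. */
static int tx_stats_show(const struct tx_stats *s, char *buf, size_t len)
{
	return snprintf(buf, len, "ok %lu\ndropped %lu\nbusy %lu\nother %lu\n",
			s->ok, s->dropped, s->busy, s->other);
}
```

Where such counters would live (configfs or elsewhere) is exactly the open
question; the sketch only shows that the accounting itself is cheap.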
Jakub Kicinski Aug. 21, 2024, 10:54 p.m. UTC | #3
On Wed, 21 Aug 2024 01:41:55 -0700 Breno Leitao wrote:
> On Tue, Aug 20, 2024 at 04:24:09PM -0700, Jakub Kicinski wrote:
> > On Mon, 19 Aug 2024 03:36:12 -0700 Breno Leitao wrote:  
> > > netpoll_setup() can fail in several ways, some of which print an error
> > > message, while others simply return without any message. For example,
> > > __netpoll_setup() returns in a few places without printing anything.
> > > 
> > > To address this issue, modify netconsole to print an error message when
> > > netpoll_setup() fails and the target is therefore not enabled. This will
> > > make it easier to identify and troubleshoot netconsole issues related to
> > > netpoll setup failures.
> > 
> > Only if memory allocation fails, it seems, and memory allocation
> > failures with GFP_KERNEL will be quite noisy.  
> 
> Or anything that fails in ->ndo_netpoll_setup() and doesn't print
> anything else.

Which also only fails because of memory allocation AFAICT.

> Do you think this is useless?

I think it's better to push more precise messages into the failure sites.

> > BTW I looked thru 4 random implementations of ndo_netpoll_setup
> > and they look almost identical :S Perhaps they can be refactored?  
> 
> Correct. This should be refactored.
> 
> In fact, since you opened this topic, there are a few other things that
> come to mind:
> 
> 1) Possibly reduce the refill_skbs() work in the critical path (the UDP
> send path) by moving it to a workqueue?
> 
> When sending a message, netpoll first tries to refill the whole skb pool,
> and then tries to allocate a new skb before sending the packet.
> 
> netconsole needs to write a message, which calls netpoll_send_udp():
> 
> 	send_ext_msg_udp() {
> 		netpoll_send_udp() {
> 			refill_skbs() {
> 				while (skb_pool.qlen < MAX_SKBS) {
> 					skb = alloc_skb(MAX_SKB_SIZE, GFP_ATOMIC);
> 				}
> 			}
> 			skb = alloc_skb(len, GFP_ATOMIC);
> 			if (!skb)
> 				skb = skb_dequeue(&skb_pool);
> 		}
> 	}
> 		
> Would it be better if the hot path just took one of the skbs from the
> pool, and the pool was refilled from a workqueue? Only if the skb pool is
> empty would it fall back to alloc_skb(len, GFP_ATOMIC).

Yeah, that seems a bit odd. If you can't find anything in the history
that would explain this design, refactoring sounds good.

> 2) Report statistics back from netpoll_send_udp(). The return values of
> netpoll_send_skb() are being discarded, so it is hard to know whether a
> packet was transmitted (NETDEV_TX_OK) or hit something such as
> NET_XMIT_DROP or NETDEV_TX_BUSY.
> 
> It is unclear where this should be reported to. Maybe a configfs entry?

Also sounds good. We don't use configfs much in networking so IDK if
it's okay to use it for stats. But no other obviously better place
comes to mind for me.
Breno Leitao Aug. 22, 2024, 10:01 a.m. UTC | #4
Hello Jakub,

On Wed, Aug 21, 2024 at 03:54:04PM -0700, Jakub Kicinski wrote:
> On Wed, 21 Aug 2024 01:41:55 -0700 Breno Leitao wrote:

> > Do you think this is useless?
> 
> I think it's better to push more precise messages into the failure sites.

Makes sense. I will remove it, and add the failure messages once we
refactor the ndo_netpoll_setup() callbacks.

> > Would it be better if the hot path just took one of the skbs from the
> > pool, and the pool was refilled from a workqueue? Only if the skb pool is
> > empty would it fall back to alloc_skb(len, GFP_ATOMIC).
> 
> Yeah, that seems a bit odd. If you can't find anything in the history
> that would explain this design, refactoring sounds good.

Thanks. I will add it to my todo list.

> > 2) Report statistics back from netpoll_send_udp(). The return values of
> > netpoll_send_skb() are being discarded, so it is hard to know whether a
> > packet was transmitted (NETDEV_TX_OK) or hit something such as
> > NET_XMIT_DROP or NETDEV_TX_BUSY.
> > 
> > It is unclear where this should be reported to. Maybe a configfs entry?
> 
> Also sounds good. We don't use configfs much in networking so IDK if
> it's okay to use it for stats. But no other obviously better place
> comes to mind for me.

Exactly, configfs seems a bit weird but, at the same time, I don't have
a better idea. Let me send a patch for this one, and we can continue the
discussion over there.

Thanks
--breno
Patch

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 72384c1ecc5c..9b5f605fe87a 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -414,8 +414,10 @@  static ssize_t enabled_store(struct config_item *item,
 		netpoll_print_options(&nt->np);
 
 		ret = netpoll_setup(&nt->np);
-		if (ret)
+		if (ret) {
+			pr_err("Not enabling netconsole. Netpoll setup failed\n");
 			goto out_unlock;
+		}
 
 		nt->enabled = true;
 		pr_info("network logging started\n");