diff mbox series

[net-next] net: netconsole: Populate dynamic entry even if netpoll fails

Message ID 20240809161935.3129104-1-leitao@debian.org (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net-next] net: netconsole: Populate dynamic entry even if netpoll fails | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 29 this patch: 29
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 29 this patch: 29
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 29 this patch: 29
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 25 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-08-10--18-00 (tests: 707)

Commit Message

Breno Leitao Aug. 9, 2024, 4:19 p.m. UTC
Currently, netconsole discards targets that fail during initialization,
causing two issues:

1) Inconsistency between target list and configfs entries
  * user pass cmdline0, cmdline1. If cmdline0 fails, then cmdline1
    becomes cmdline0 in configfs.

2) Inability to manage failed targets from userspace
  * If user pass a target that fails with netpoll (interface not loaded at
    netcons initialization time, such as interface is a module), then
    the target will not exist in the configfs, so, user cannot re-enable
    or modify it from userspace.

Failed targets are now added to the target list and configfs, but
remain disabled until manually enabled or reconfigured. This change does
not change the behaviour if CONFIG_NETCONSOLE_DYNAMIC is not set.

CC: Aijay Adams <aijay@meta.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/net/netconsole.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Comments

Paolo Abeni Aug. 13, 2024, 11:55 a.m. UTC | #1
On 8/9/24 18:19, Breno Leitao wrote:> @@ -1304,8 +1308,6 @@ static int 
__init init_netconsole(void)
>   		while ((target_config = strsep(&input, ";"))) {
>   			nt = alloc_param_target(target_config, count);
>   			if (IS_ERR(nt)) {
> -				if (IS_ENABLED(CONFIG_NETCONSOLE_DYNAMIC))
> -					continue;
>   				err = PTR_ERR(nt);
>   				goto fail;
>   			}

AFAICS the above introduces a behavior change: if 
CONFIG_NETCONSOLE_DYNAMIC is enabled, and the options parsing fails for 
any targets in the command line, all the targets will be removed.

I think the old behavior is preferable - just skip the targets with 
wrong options.

Side note: I think the error paths in __netpoll_setup() assume the 
struct netpoll will be freed in case of error, e.g. the device refcount 
is released but np->dev is not cleared, I fear we could hit a reference 
underflow on <setup error>, <disable>

Paolo
Breno Leitao Aug. 13, 2024, 3:57 p.m. UTC | #2
Hello Paolo,

On Tue, Aug 13, 2024 at 01:55:27PM +0200, Paolo Abeni wrote:
> On 8/9/24 18:19, Breno Leitao wrote:> @@ -1304,8 +1308,6 @@ static int
> __init init_netconsole(void)
> >   		while ((target_config = strsep(&input, ";"))) {
> >   			nt = alloc_param_target(target_config, count);
> >   			if (IS_ERR(nt)) {
> > -				if (IS_ENABLED(CONFIG_NETCONSOLE_DYNAMIC))
> > -					continue;
> >   				err = PTR_ERR(nt);
> >   				goto fail;
> >   			}

First of all, thanks for the in-depth review and suggestion.


> AFAICS the above introduces a behavior change: if CONFIG_NETCONSOLE_DYNAMIC
> is enabled, and the options parsing fails for any targets in the command
> line, all the targets will be removed.
> 
> I think the old behavior is preferable - just skip the targets with wrong
> options.

Thinking about it again, and I think I agree with you here. I will
update.

> Side note: I think the error paths in __netpoll_setup() assume the struct
> netpoll will be freed in case of error, e.g. the device refcount is released
> but np->dev is not cleared, I fear we could hit a reference underflow on
> <setup error>, <disable>

That is a good catch, and I was even able to simulate it, by forcing two
errors:

Something as:

--- a/net/core/netpoll.c
	+++ b/net/core/netpoll.c
	@@ -637,7 +637,8 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
		}

		if (!ndev->npinfo) {
	-               npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
	+               npinfo = NULL;

	diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
	index 41a61fa88c32..2c6190808e75 100644
	--- a/drivers/net/netconsole.c
	+++ b/drivers/net/netconsole.c
	@@ -1327,12 +1330,17 @@ static int __init init_netconsole(void)
		}

		err = dynamic_netconsole_init();
	+       err = 1;
		if (err)
			goto undonotifier;

	
This caused the following issue:
	
	[   19.530831] netconsole: network logging stopped on interface eth0 as it unregistered
	[   19.531205] ref_tracker: reference already released.
	[   19.531426] ref_tracker: allocated in:
	[   19.531505]  netpoll_setup+0xfd/0x7f0
	[   19.531505]  init_netconsole+0x300/0x960
	....
	[   19.534532] ------------[ cut here ]------------
	[   19.534784] WARNING: CPU: 5 PID: 1 at lib/ref_tracker.c:255 ref_tracker_free+0x4e5/0x740
	[   19.535103] Modules linked in:
	....
	[   19.542116]  ? ref_tracker_free+0x4e5/0x740
	[   19.542286]  ? refcount_inc+0x40/0x40
	[   19.542571]  ? do_netpoll_cleanup+0x4e/0xb0
	[   19.542752]  ? netconsole_process_cleanups_core+0xcd/0x260
	[   19.542961]  ? netconsole_netdev_event+0x3ab/0x3e0
	[   19.543199]  ? unregister_netdevice_notifier+0x27c/0x2f0
	[   19.543456]  ? init_netconsole+0xe4/0x960
	[   19.543615]  ? do_one_initcall+0x1a8/0x5d0
	[   19.543764]  ? do_initcall_level+0x133/0x1e0
	[   19.543963]  ? do_initcalls+0x43/0x80
	....

That said, now, the list contains enabled and disabled targets. All the
disable targets have netpoll disabled, thus, we don't handle network
operations in the disabled targets.

This is my new proposal, based on your feedback, how does it look like?

Thanks for the in-depth review,
--breno

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 30b6aac08411..60325383ab6d 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -1008,6 +1008,8 @@ static int netconsole_netdev_event(struct notifier_block *this,
 	mutex_lock(&target_cleanup_list_lock);
 	spin_lock_irqsave(&target_list_lock, flags);
 	list_for_each_entry_safe(nt, tmp, &target_list, list) {
+		if (!nt->enabled)
+			continue;
 		netconsole_target_get(nt);
 		if (nt->np.dev == dev) {
 			switch (event) {
@@ -1258,11 +1260,15 @@ static struct netconsole_target *alloc_param_target(char *target_config,
 		goto fail;
 
 	err = netpoll_setup(&nt->np);
-	if (err)
+	if (!err)
+		nt->enabled = true;
+	else if (!IS_ENABLED(CONFIG_NETCONSOLE_DYNAMIC))
+		/* only fail if dynamic reconfiguration is set,
+		 * otherwise, keep the target in the list, but disabled.
+		 */
 		goto fail;
 
 	populate_configfs_item(nt, cmdline_count);
-	nt->enabled = true;
 
 	return nt;
 
@@ -1274,7 +1280,8 @@ static struct netconsole_target *alloc_param_target(char *target_config,
 /* Cleanup netpoll for given target (from boot/module param) and free it */
 static void free_param_target(struct netconsole_target *nt)
 {
-	netpoll_cleanup(&nt->np);
+	if (nt->enabled)
+		netpoll_cleanup(&nt->np);
 	kfree(nt);
 }
Paolo Abeni Aug. 14, 2024, 10:06 a.m. UTC | #3
On 8/13/24 17:57, Breno Leitao wrote:
> On Tue, Aug 13, 2024 at 01:55:27PM +0200, Paolo Abeni wrote:
>> On 8/9/24 18:19, Breno Leitao wrote:> @@ -1304,8 +1308,6 @@ static int
>> __init init_netconsole(void)
>>>    		while ((target_config = strsep(&input, ";"))) {
>>>    			nt = alloc_param_target(target_config, count);
>>>    			if (IS_ERR(nt)) {
>>> -				if (IS_ENABLED(CONFIG_NETCONSOLE_DYNAMIC))
>>> -					continue;
>>>    				err = PTR_ERR(nt);
>>>    				goto fail;
>>>    			}
> 
> First of all, thanks for the in-depth review and suggestion.
> 
> 
>> AFAICS the above introduces a behavior change: if CONFIG_NETCONSOLE_DYNAMIC
>> is enabled, and the options parsing fails for any targets in the command
>> line, all the targets will be removed.
>>
>> I think the old behavior is preferable - just skip the targets with wrong
>> options.
> 
> Thinking about it again, and I think I agree with you here. I will
> update.
> 
>> Side note: I think the error paths in __netpoll_setup() assume the struct
>> netpoll will be freed in case of error, e.g. the device refcount is released
>> but np->dev is not cleared, I fear we could hit a reference underflow on
>> <setup error>, <disable>
> 
> That is a good catch, and I was even able to simulate it, by forcing two
> errors:
> 
> Something as:
> 
> --- a/net/core/netpoll.c
> 	+++ b/net/core/netpoll.c
> 	@@ -637,7 +637,8 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
> 		}
> 
> 		if (!ndev->npinfo) {
> 	-               npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
> 	+               npinfo = NULL;
> 
> 	diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
> 	index 41a61fa88c32..2c6190808e75 100644
> 	--- a/drivers/net/netconsole.c
> 	+++ b/drivers/net/netconsole.c
> 	@@ -1327,12 +1330,17 @@ static int __init init_netconsole(void)
> 		}
> 
> 		err = dynamic_netconsole_init();
> 	+       err = 1;
> 		if (err)
> 			goto undonotifier;
> 
> 	
> This caused the following issue:
> 	
> 	[   19.530831] netconsole: network logging stopped on interface eth0 as it unregistered
> 	[   19.531205] ref_tracker: reference already released.
> 	[   19.531426] ref_tracker: allocated in:
> 	[   19.531505]  netpoll_setup+0xfd/0x7f0
> 	[   19.531505]  init_netconsole+0x300/0x960
> 	....
> 	[   19.534532] ------------[ cut here ]------------
> 	[   19.534784] WARNING: CPU: 5 PID: 1 at lib/ref_tracker.c:255 ref_tracker_free+0x4e5/0x740
> 	[   19.535103] Modules linked in:
> 	....
> 	[   19.542116]  ? ref_tracker_free+0x4e5/0x740
> 	[   19.542286]  ? refcount_inc+0x40/0x40
> 	[   19.542571]  ? do_netpoll_cleanup+0x4e/0xb0
> 	[   19.542752]  ? netconsole_process_cleanups_core+0xcd/0x260
> 	[   19.542961]  ? netconsole_netdev_event+0x3ab/0x3e0
> 	[   19.543199]  ? unregister_netdevice_notifier+0x27c/0x2f0
> 	[   19.543456]  ? init_netconsole+0xe4/0x960
> 	[   19.543615]  ? do_one_initcall+0x1a8/0x5d0
> 	[   19.543764]  ? do_initcall_level+0x133/0x1e0
> 	[   19.543963]  ? do_initcalls+0x43/0x80
> 	....
> 
> That said, now, the list contains enabled and disabled targets. All the
> disable targets have netpoll disabled, thus, we don't handle network
> operations in the disabled targets.
> 
> This is my new proposal, based on your feedback, how does it look like?
> 
> Thanks for the in-depth review,
> --breno
> 
> diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
> index 30b6aac08411..60325383ab6d 100644
> --- a/drivers/net/netconsole.c
> +++ b/drivers/net/netconsole.c
> @@ -1008,6 +1008,8 @@ static int netconsole_netdev_event(struct notifier_block *this,
>   	mutex_lock(&target_cleanup_list_lock);
>   	spin_lock_irqsave(&target_list_lock, flags);
>   	list_for_each_entry_safe(nt, tmp, &target_list, list) {
> +		if (!nt->enabled)
> +			continue;
>   		netconsole_target_get(nt);
>   		if (nt->np.dev == dev) {
>   			switch (event) {
> @@ -1258,11 +1260,15 @@ static struct netconsole_target *alloc_param_target(char *target_config,
>   		goto fail;
>   
>   	err = netpoll_setup(&nt->np);
> -	if (err)
> +	if (!err)
> +		nt->enabled = true;
> +	else if (!IS_ENABLED(CONFIG_NETCONSOLE_DYNAMIC))
> +		/* only fail if dynamic reconfiguration is set,
> +		 * otherwise, keep the target in the list, but disabled.
> +		 */
>   		goto fail;
>   
>   	populate_configfs_item(nt, cmdline_count);
> -	nt->enabled = true;
>   
>   	return nt;
>   
> @@ -1274,7 +1280,8 @@ static struct netconsole_target *alloc_param_target(char *target_config,
>   /* Cleanup netpoll for given target (from boot/module param) and free it */
>   static void free_param_target(struct netconsole_target *nt)
>   {
> -	netpoll_cleanup(&nt->np);
> +	if (nt->enabled)
> +		netpoll_cleanup(&nt->np);
>   	kfree(nt);
>   }

I fear the late cleanup could still be dangerous - what if multiple, 
consecutive, enabled_store() on the same target fails?

I *think* it would be safer always zeroing np->dev in the error path of 
netpoll_setup().

It could be a separate patch for bisectability.

Side note: I additionally think that in the same error path we should 
conditionally clear np->local_ip.ip, if the previous code initialized 
such field, or we could get weird results if e.g.
- a target uses eth0 with local_ip == 0
- enabled_store() of such target fails e.g. due ndo_netpoll_setup() failure
- address on eth0 changes for some reason
- anoter enabled_store() is issued on the same target.

At this point the netpoll target should be wrongly using the old address.

Thanks,

Paolo
Breno Leitao Aug. 15, 2024, 1:46 p.m. UTC | #4
On Wed, Aug 14, 2024 at 12:06:48PM +0200, Paolo Abeni wrote:
> I fear the late cleanup could still be dangerous - what if multiple,
> consecutive, enabled_store() on the same target fails?
> 
> I *think* it would be safer always zeroing np->dev in the error path of
> netpoll_setup().
> 
> It could be a separate patch for bisectability.
> 
> Side note: I additionally think that in the same error path we should
> conditionally clear np->local_ip.ip, if the previous code initialized such
> field, or we could get weird results if e.g.
> - a target uses eth0 with local_ip == 0
> - enabled_store() of such target fails e.g. due ndo_netpoll_setup() failure
> - address on eth0 changes for some reason
> - anoter enabled_store() is issued on the same target.
> 
> At this point the netpoll target should be wrongly using the old address.

Agree with you. I think we always want to keep struct netpoll objects
either initialized or unitialized, not keeping them half-baked.

How about the following patch:

    netpoll: Ensure clean state on setup failures
    
    Modify netpoll_setup() and __netpoll_setup() to ensure that the netpoll
    structure (np) is left in a clean state if setup fails for any reason.
    This prevents carrying over misconfigured fields in case of partial
    setup success.
    
    Key changes:
    - np->dev is now set only after successful setup, ensuring it's always
      NULL if netpoll is not configured or if netpoll_setup() fails.
    - np->local_ip is zeroed if netpoll setup doesn't complete successfully.
    - Added DEBUG_NET_WARN_ON_ONCE() checks to catch unexpected states.
    
    These changes improve the reliability of netpoll configuration, since it
    assures that the structure is fully initialized or totally unset.
    
    Suggested-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Breno Leitao <leitao@debian.org>

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index a58ea724790c..348d76a51c20 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -626,12 +626,10 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
 	const struct net_device_ops *ops;
 	int err;
 
-	np->dev = ndev;
-	strscpy(np->dev_name, ndev->name, IFNAMSIZ);
-
+	DEBUG_NET_WARN_ON_ONCE(np->dev);
 	if (ndev->priv_flags & IFF_DISABLE_NETPOLL) {
 		np_err(np, "%s doesn't support polling, aborting\n",
-		       np->dev_name);
+		       ndev->name);
 		err = -ENOTSUPP;
 		goto out;
 	}
@@ -649,7 +647,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
 
 		refcount_set(&npinfo->refcnt, 1);
 
-		ops = np->dev->netdev_ops;
+		ops = ndev->netdev_ops;
 		if (ops->ndo_netpoll_setup) {
 			err = ops->ndo_netpoll_setup(ndev, npinfo);
 			if (err)
@@ -660,6 +658,8 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
 		refcount_inc(&npinfo->refcnt);
 	}
 
+	np->dev = ndev;
+	strscpy(np->dev_name, ndev->name, IFNAMSIZ);
 	npinfo->netpoll = np;
 
 	/* last thing to do is link it to the net device structure */
@@ -681,6 +681,7 @@ int netpoll_setup(struct netpoll *np)
 	int err;
 
 	rtnl_lock();
+	DEBUG_NET_WARN_ON_ONCE(np->dev);
 	if (np->dev_name[0]) {
 		struct net *net = current->nsproxy->net_ns;
 		ndev = __dev_get_by_name(net, np->dev_name);
@@ -782,11 +783,14 @@ int netpoll_setup(struct netpoll *np)
 
 	err = __netpoll_setup(np, ndev);
 	if (err)
-		goto put;
+		goto clear_ip;
 	rtnl_unlock();
 	return 0;
 
+clear_ip:
+	memset(&np->local_ip, 0, sizeof(np->local_ip));
 put:
+	DEBUG_NET_WARN_ON_ONCE(np->dev);
 	netdev_put(ndev, &np->dev_tracker);
 unlock:
 	rtnl_unlock();
Paolo Abeni Aug. 15, 2024, 4:07 p.m. UTC | #5
On 8/15/24 15:46, Breno Leitao wrote:
> On Wed, Aug 14, 2024 at 12:06:48PM +0200, Paolo Abeni wrote:
>> I fear the late cleanup could still be dangerous - what if multiple,
>> consecutive, enabled_store() on the same target fails?
>>
>> I *think* it would be safer always zeroing np->dev in the error path of
>> netpoll_setup().
>>
>> It could be a separate patch for bisectability.
>>
>> Side note: I additionally think that in the same error path we should
>> conditionally clear np->local_ip.ip, if the previous code initialized such
>> field, or we could get weird results if e.g.
>> - a target uses eth0 with local_ip == 0
>> - enabled_store() of such target fails e.g. due ndo_netpoll_setup() failure
>> - address on eth0 changes for some reason
>> - anoter enabled_store() is issued on the same target.
>>
>> At this point the netpoll target should be wrongly using the old address.
> 
> Agree with you. I think we always want to keep struct netpoll objects
> either initialized or unitialized, not keeping them half-baked.
> 
> How about the following patch:

Overall LGTM, a couple of minor comments below.

>      netpoll: Ensure clean state on setup failures
>      
>      Modify netpoll_setup() and __netpoll_setup() to ensure that the netpoll
>      structure (np) is left in a clean state if setup fails for any reason.
>      This prevents carrying over misconfigured fields in case of partial
>      setup success.
>      
>      Key changes:
>      - np->dev is now set only after successful setup, ensuring it's always
>        NULL if netpoll is not configured or if netpoll_setup() fails.
>      - np->local_ip is zeroed if netpoll setup doesn't complete successfully.
>      - Added DEBUG_NET_WARN_ON_ONCE() checks to catch unexpected states.
>      
>      These changes improve the reliability of netpoll configuration, since it
>      assures that the structure is fully initialized or totally unset.
>      
>      Suggested-by: Paolo Abeni <pabeni@redhat.com>
>      Signed-off-by: Breno Leitao <leitao@debian.org>
> 
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index a58ea724790c..348d76a51c20 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -626,12 +626,10 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
>   	const struct net_device_ops *ops;
>   	int err;
>   
> -	np->dev = ndev;
> -	strscpy(np->dev_name, ndev->name, IFNAMSIZ);
> -
> +	DEBUG_NET_WARN_ON_ONCE(np->dev);
>   	if (ndev->priv_flags & IFF_DISABLE_NETPOLL) {
>   		np_err(np, "%s doesn't support polling, aborting\n",
> -		       np->dev_name);
> +		       ndev->name);
>   		err = -ENOTSUPP;
>   		goto out;
>   	}
> @@ -649,7 +647,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
>   
>   		refcount_set(&npinfo->refcnt, 1);
>   
> -		ops = np->dev->netdev_ops;
> +		ops = ndev->netdev_ops;
>   		if (ops->ndo_netpoll_setup) {
>   			err = ops->ndo_netpoll_setup(ndev, npinfo);
>   			if (err)
> @@ -660,6 +658,8 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
>   		refcount_inc(&npinfo->refcnt);
>   	}
>   
> +	np->dev = ndev;
> +	strscpy(np->dev_name, ndev->name, IFNAMSIZ);
>   	npinfo->netpoll = np;
>   
>   	/* last thing to do is link it to the net device structure */
> @@ -681,6 +681,7 @@ int netpoll_setup(struct netpoll *np)
>   	int err;
>   
>   	rtnl_lock();
> +	DEBUG_NET_WARN_ON_ONCE(np->dev);

This looks redundant

>   	if (np->dev_name[0]) {
>   		struct net *net = current->nsproxy->net_ns;
>   		ndev = __dev_get_by_name(net, np->dev_name);
> @@ -782,11 +783,14 @@ int netpoll_setup(struct netpoll *np)
>   
>   	err = __netpoll_setup(np, ndev);
>   	if (err)
> -		goto put;
> +		goto clear_ip;
>   	rtnl_unlock();
>   	return 0;
>   
> +clear_ip:
> +	memset(&np->local_ip, 0, sizeof(np->local_ip));

I think it would be better to clear the local_ip only if np->local_ip 
was set/initialized/filled by netpoll_setup() otherwise the sysfs 
contents could suddenly/unexpetedly change on failure.

Thanks,

Paolo
Breno Leitao Aug. 15, 2024, 5:16 p.m. UTC | #6
On Thu, Aug 15, 2024 at 06:07:20PM +0200, Paolo Abeni wrote:
> On 8/15/24 15:46, Breno Leitao wrote:
> > On Wed, Aug 14, 2024 at 12:06:48PM +0200, Paolo Abeni wrote:
> > > I fear the late cleanup could still be dangerous - what if multiple,
> > > consecutive, enabled_store() on the same target fails?
> > > 
> > > I *think* it would be safer always zeroing np->dev in the error path of
> > > netpoll_setup().
> > > 
> > > It could be a separate patch for bisectability.
> > > 
> > > Side note: I additionally think that in the same error path we should
> > > conditionally clear np->local_ip.ip, if the previous code initialized such
> > > field, or we could get weird results if e.g.
> > > - a target uses eth0 with local_ip == 0
> > > - enabled_store() of such target fails e.g. due ndo_netpoll_setup() failure
> > > - address on eth0 changes for some reason
> > > - anoter enabled_store() is issued on the same target.
> > > 
> > > At this point the netpoll target should be wrongly using the old address.
> > 
> > Agree with you. I think we always want to keep struct netpoll objects
> > either initialized or unitialized, not keeping them half-baked.
> > 
> > How about the following patch:
> 
> Overall LGTM, a couple of minor comments below.
> 
> >      netpoll: Ensure clean state on setup failures
> >      Modify netpoll_setup() and __netpoll_setup() to ensure that the netpoll
> >      structure (np) is left in a clean state if setup fails for any reason.
> >      This prevents carrying over misconfigured fields in case of partial
> >      setup success.
> >      Key changes:
> >      - np->dev is now set only after successful setup, ensuring it's always
> >        NULL if netpoll is not configured or if netpoll_setup() fails.
> >      - np->local_ip is zeroed if netpoll setup doesn't complete successfully.
> >      - Added DEBUG_NET_WARN_ON_ONCE() checks to catch unexpected states.
> >      These changes improve the reliability of netpoll configuration, since it
> >      assures that the structure is fully initialized or totally unset.
> >      Suggested-by: Paolo Abeni <pabeni@redhat.com>
> >      Signed-off-by: Breno Leitao <leitao@debian.org>
> > 
> > diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> > index a58ea724790c..348d76a51c20 100644
> > --- a/net/core/netpoll.c
> > +++ b/net/core/netpoll.c
> > @@ -626,12 +626,10 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
> >   	const struct net_device_ops *ops;
> >   	int err;
> > -	np->dev = ndev;
> > -	strscpy(np->dev_name, ndev->name, IFNAMSIZ);
> > -
> > +	DEBUG_NET_WARN_ON_ONCE(np->dev);
> >   	if (ndev->priv_flags & IFF_DISABLE_NETPOLL) {
> >   		np_err(np, "%s doesn't support polling, aborting\n",
> > -		       np->dev_name);
> > +		       ndev->name);
> >   		err = -ENOTSUPP;
> >   		goto out;
> >   	}
> > @@ -649,7 +647,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
> >   		refcount_set(&npinfo->refcnt, 1);
> > -		ops = np->dev->netdev_ops;
> > +		ops = ndev->netdev_ops;
> >   		if (ops->ndo_netpoll_setup) {
> >   			err = ops->ndo_netpoll_setup(ndev, npinfo);
> >   			if (err)
> > @@ -660,6 +658,8 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
> >   		refcount_inc(&npinfo->refcnt);
> >   	}
> > +	np->dev = ndev;
> > +	strscpy(np->dev_name, ndev->name, IFNAMSIZ);
> >   	npinfo->netpoll = np;
> >   	/* last thing to do is link it to the net device structure */
> > @@ -681,6 +681,7 @@ int netpoll_setup(struct netpoll *np)
> >   	int err;
> >   	rtnl_lock();
> > +	DEBUG_NET_WARN_ON_ONCE(np->dev);
> 
> This looks redundant
> 
> >   	if (np->dev_name[0]) {
> >   		struct net *net = current->nsproxy->net_ns;
> >   		ndev = __dev_get_by_name(net, np->dev_name);
> > @@ -782,11 +783,14 @@ int netpoll_setup(struct netpoll *np)
> >   	err = __netpoll_setup(np, ndev);
> >   	if (err)
> > -		goto put;
> > +		goto clear_ip;
> >   	rtnl_unlock();
> >   	return 0;
> > +clear_ip:
> > +	memset(&np->local_ip, 0, sizeof(np->local_ip));
> 
> I think it would be better to clear the local_ip only if np->local_ip was
> set/initialized/filled by netpoll_setup() otherwise the sysfs contents could
> suddenly/unexpetedly change on failure.

Makes sense. I was not able to come up with any other solution other than
tracking the overwrite and checking it later, which is admittedly not
beautiful, but might do the job. Let me know if you think about
something more elegant.

	 int netpoll_setup(struct netpoll *np)
	 {
		struct net_device *ndev = NULL;
	+       bool ip_overwritten = false;
		struct in_device *in_dev;
		int err;

	@@ -740,6 +741,7 @@ int netpoll_setup(struct netpoll *np)
					goto put;
				}

	+                       ip_overwritten = true;
				np->local_ip.ip = ifa->ifa_local;
				np_info(np, "local IP %pI4\n", &np->local_ip.ip);
			} else {
	@@ -757,6 +759,7 @@ int netpoll_setup(struct netpoll *np)
						    !!(ipv6_addr_type(&np->remote_ip.in6) & IPV6_ADDR_LINKLOCAL))
							continue;
						np->local_ip.in6 = ifp->addr;
	+                                       ip_overwritten = true;
						err = 0;
						break;
					}
	@@ -787,6 +790,9 @@ int netpoll_setup(struct netpoll *np)
		return 0;

	 put:
	+       DEBUG_NET_WARN_ON_ONCE(np->dev);
	+       if (ip_overwritten)
	+               memset(&np->local_ip, 0, sizeof(np->local_ip));
		netdev_put(ndev, &np->dev_tracker);


I really appreciate your time helping me here!
--breno
diff mbox series

Patch

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 43c29b15adbf..41a61fa88c32 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -1258,11 +1258,15 @@  static struct netconsole_target *alloc_param_target(char *target_config,
 		goto fail;
 
 	err = netpoll_setup(&nt->np);
-	if (err)
+	if (!err)
+		nt->enabled = true;
+	else if (!IS_ENABLED(CONFIG_NETCONSOLE_DYNAMIC))
+		/* only fail if dynamic reconfiguration is set,
+		 * otherwise, keep the target in the list, but disabled.
+		 */
 		goto fail;
 
 	populate_configfs_item(nt, cmdline_count);
-	nt->enabled = true;
 
 	return nt;
 
@@ -1304,8 +1308,6 @@  static int __init init_netconsole(void)
 		while ((target_config = strsep(&input, ";"))) {
 			nt = alloc_param_target(target_config, count);
 			if (IS_ERR(nt)) {
-				if (IS_ENABLED(CONFIG_NETCONSOLE_DYNAMIC))
-					continue;
 				err = PTR_ERR(nt);
 				goto fail;
 			}