diff mbox series

[PATCHv3,net-next] bonding: 3ad: send ifinfo notify when mux state changed

Message ID 20240626075156.2565966-1-liuhangbin@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [PATCHv3,net-next] bonding: 3ad: send ifinfo notify when mux state changed | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 842 this patch: 842
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 849 this patch: 849
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 849 this patch: 849
netdev/checkpatch warning WARNING: line length of 84 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 22 this patch: 22
netdev/source_inline success Was 0 now: 0
netdev/contest fail net-next-2024-06-26--15-00 (tests: 664)

Commit Message

Hangbin Liu June 26, 2024, 7:51 a.m. UTC
Currently, administrators need to retrieve LACP mux state changes from
the kernel DEBUG log using netdev_dbg and slave_dbg macros. To simplify
this process, let's send the ifinfo notification whenever the mux state
changes. This will enable users to directly access and monitor this
information using the ip monitor command.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
v3: forgot to use GFP_ATOMIC. (Nikolay Aleksandrov)
    export symbol for rtmsg_ifinfo. It's weird that my build succeed with
    tools/testing/selftests/drivers/net/bonding/config without export
    the symbol, but build failed with tools/testing/selftests/net/config.
v2: don't use call_netdevice_notifiers as it will case sleeping in atomic
    context (Nikolay Aleksandrov)

After this patch, we can see the following info with `ip -d monitor link`

7: veth1@if6: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP group default
    link/ether 02:0a:04:c2:d6:21 brd ff:ff:ff:ff:ff:ff link-netns b promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
    veth
    bond_slave state BACKUP mii_status UP ... ad_aggregator_id 1 ad_actor_oper_port_state 143 ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync,expired> ad_partner_oper_port_state 55 ad_partner_oper_port_state_str <active,short_timeout,aggregating,collecting,distributing> ...
7: veth1@if6: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP group default
    link/ether 02:0a:04:c2:d6:21 brd ff:ff:ff:ff:ff:ff link-netns b promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
    veth
    bond_slave state ACTIVE mii_status UP ... ad_aggregator_id 1 ad_actor_oper_port_state 79 ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync,defaulted> ad_partner_oper_port_state 1 ad_partner_oper_port_state_str <active> ...
7: veth1@if6: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP group default
    link/ether 02:0a:04:c2:d6:21 brd ff:ff:ff:ff:ff:ff link-netns b promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
    veth
    bond_slave state ACTIVE mii_status UP ... ad_aggregator_id 1 ad_actor_oper_port_state 63 ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync,collecting,distributing> ad_partner_oper_port_state 63 ad_partner_oper_port_state_str <active,short_timeout,aggregating,in_sync,collecting,distributing> ...
---
 drivers/net/bonding/bond_3ad.c | 3 +++
 net/core/rtnetlink.c           | 1 +
 2 files changed, 4 insertions(+)

Comments

Nikolay Aleksandrov June 26, 2024, 8:22 a.m. UTC | #1
On 26/06/2024 10:51, Hangbin Liu wrote:
> Currently, administrators need to retrieve LACP mux state changes from
> the kernel DEBUG log using netdev_dbg and slave_dbg macros. To simplify
> this process, let's send the ifinfo notification whenever the mux state
> changes. This will enable users to directly access and monitor this
> information using the ip monitor command.
> 
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
> v3: forgot to use GFP_ATOMIC. (Nikolay Aleksandrov)
>     export symbol for rtmsg_ifinfo. It's weird that my build succeed with
>     tools/testing/selftests/drivers/net/bonding/config without export
>     the symbol, but build failed with tools/testing/selftests/net/config.
> v2: don't use call_netdevice_notifiers as it will case sleeping in atomic
>     context (Nikolay Aleksandrov)
> 
> After this patch, we can see the following info with `ip -d monitor link`
> 
> 7: veth1@if6: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP group default
>     link/ether 02:0a:04:c2:d6:21 brd ff:ff:ff:ff:ff:ff link-netns b promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
>     veth
>     bond_slave state BACKUP mii_status UP ... ad_aggregator_id 1 ad_actor_oper_port_state 143 ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync,expired> ad_partner_oper_port_state 55 ad_partner_oper_port_state_str <active,short_timeout,aggregating,collecting,distributing> ...
> 7: veth1@if6: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP group default
>     link/ether 02:0a:04:c2:d6:21 brd ff:ff:ff:ff:ff:ff link-netns b promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
>     veth
>     bond_slave state ACTIVE mii_status UP ... ad_aggregator_id 1 ad_actor_oper_port_state 79 ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync,defaulted> ad_partner_oper_port_state 1 ad_partner_oper_port_state_str <active> ...
> 7: veth1@if6: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP group default
>     link/ether 02:0a:04:c2:d6:21 brd ff:ff:ff:ff:ff:ff link-netns b promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
>     veth
>     bond_slave state ACTIVE mii_status UP ... ad_aggregator_id 1 ad_actor_oper_port_state 63 ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync,collecting,distributing> ad_partner_oper_port_state 63 ad_partner_oper_port_state_str <active,short_timeout,aggregating,in_sync,collecting,distributing> ...
> ---
>  drivers/net/bonding/bond_3ad.c | 3 +++
>  net/core/rtnetlink.c           | 1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index c6807e473ab7..b57c5670b31a 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -11,6 +11,7 @@
>  #include <linux/etherdevice.h>
>  #include <linux/if_bonding.h>
>  #include <linux/pkt_sched.h>
> +#include <linux/rtnetlink.h>
>  #include <net/net_namespace.h>
>  #include <net/bonding.h>
>  #include <net/bond_3ad.h>
> @@ -1185,6 +1186,8 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
>  		default:
>  			break;
>  		}
> +
> +		rtmsg_ifinfo(RTM_NEWLINK, port->slave->dev, 0, GFP_ATOMIC, 0, NULL);
>  	}
>  }
>  
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index eabfc8290f5e..4507bb8d5264 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -4116,6 +4116,7 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
>  	rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
>  			   NULL, 0, portid, nlh);
>  }
> +EXPORT_SYMBOL(rtmsg_ifinfo);
>  
>  void rtmsg_ifinfo_newnet(int type, struct net_device *dev, unsigned int change,
>  			 gfp_t flags, int *new_nsid, int new_ifindex)

Looks good to me, thanks!
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Jay Vosburgh June 26, 2024, 3:30 p.m. UTC | #2
Hangbin Liu <liuhangbin@gmail.com> wrote:

>Currently, administrators need to retrieve LACP mux state changes from
>the kernel DEBUG log using netdev_dbg and slave_dbg macros. To simplify
>this process, let's send the ifinfo notification whenever the mux state
>changes. This will enable users to directly access and monitor this
>information using the ip monitor command.
>
>Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
>---
>v3: forgot to use GFP_ATOMIC. (Nikolay Aleksandrov)
>    export symbol for rtmsg_ifinfo. It's weird that my build succeed with
>    tools/testing/selftests/drivers/net/bonding/config without export
>    the symbol, but build failed with tools/testing/selftests/net/config.

	I would hazard to guess that bonding/config works without export
because it has

CONFIG_BONDING=y

	which builds bonding into the main image (not as a module),
which wouldn't need the EXPORT_SYMBOL.

	I think the change is fine, the only question is whether it's
better to have a wrapper for rtmsg_ifinfo() in net/core/dev.c (where all
current callers are).  I don't see a particular need, but others might
want some consistency.

Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>

	-J


>v2: don't use call_netdevice_notifiers as it will case sleeping in atomic
>    context (Nikolay Aleksandrov)
>
>After this patch, we can see the following info with `ip -d monitor link`
>
>7: veth1@if6: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP group default
>    link/ether 02:0a:04:c2:d6:21 brd ff:ff:ff:ff:ff:ff link-netns b promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
>    veth
>    bond_slave state BACKUP mii_status UP ... ad_aggregator_id 1 ad_actor_oper_port_state 143 ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync,expired> ad_partner_oper_port_state 55 ad_partner_oper_port_state_str <active,short_timeout,aggregating,collecting,distributing> ...
>7: veth1@if6: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP group default
>    link/ether 02:0a:04:c2:d6:21 brd ff:ff:ff:ff:ff:ff link-netns b promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
>    veth
>    bond_slave state ACTIVE mii_status UP ... ad_aggregator_id 1 ad_actor_oper_port_state 79 ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync,defaulted> ad_partner_oper_port_state 1 ad_partner_oper_port_state_str <active> ...
>7: veth1@if6: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP group default
>    link/ether 02:0a:04:c2:d6:21 brd ff:ff:ff:ff:ff:ff link-netns b promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
>    veth
>    bond_slave state ACTIVE mii_status UP ... ad_aggregator_id 1 ad_actor_oper_port_state 63 ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync,collecting,distributing> ad_partner_oper_port_state 63 ad_partner_oper_port_state_str <active,short_timeout,aggregating,in_sync,collecting,distributing> ...
>---
> drivers/net/bonding/bond_3ad.c | 3 +++
> net/core/rtnetlink.c           | 1 +
> 2 files changed, 4 insertions(+)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index c6807e473ab7..b57c5670b31a 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -11,6 +11,7 @@
> #include <linux/etherdevice.h>
> #include <linux/if_bonding.h>
> #include <linux/pkt_sched.h>
>+#include <linux/rtnetlink.h>
> #include <net/net_namespace.h>
> #include <net/bonding.h>
> #include <net/bond_3ad.h>
>@@ -1185,6 +1186,8 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
> 		default:
> 			break;
> 		}
>+
>+		rtmsg_ifinfo(RTM_NEWLINK, port->slave->dev, 0, GFP_ATOMIC, 0, NULL);
> 	}
> }
> 
>diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>index eabfc8290f5e..4507bb8d5264 100644
>--- a/net/core/rtnetlink.c
>+++ b/net/core/rtnetlink.c
>@@ -4116,6 +4116,7 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
> 	rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
> 			   NULL, 0, portid, nlh);
> }
>+EXPORT_SYMBOL(rtmsg_ifinfo);
> 
> void rtmsg_ifinfo_newnet(int type, struct net_device *dev, unsigned int change,
> 			 gfp_t flags, int *new_nsid, int new_ifindex)
>-- 
>2.45.0
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
Jakub Kicinski June 26, 2024, 9:53 p.m. UTC | #3
On Wed, 26 Jun 2024 15:51:56 +0800 Hangbin Liu wrote:
> Currently, administrators need to retrieve LACP mux state changes from
> the kernel DEBUG log using netdev_dbg and slave_dbg macros. To simplify
> this process, let's send the ifinfo notification whenever the mux state
> changes. This will enable users to directly access and monitor this
> information using the ip monitor command.

Hits:

RTNL: assertion failed at net/core/rtnetlink.c (1823)

On two selftests. Please run the selftests on a debug kernel..
Jay Vosburgh June 27, 2024, 12:06 a.m. UTC | #4
Jakub Kicinski <kuba@kernel.org> wrote:

>On Wed, 26 Jun 2024 15:51:56 +0800 Hangbin Liu wrote:
>> Currently, administrators need to retrieve LACP mux state changes from
>> the kernel DEBUG log using netdev_dbg and slave_dbg macros. To simplify
>> this process, let's send the ifinfo notification whenever the mux state
>> changes. This will enable users to directly access and monitor this
>> information using the ip monitor command.
>
>Hits:
>
>RTNL: assertion failed at net/core/rtnetlink.c (1823)
>
>On two selftests. Please run the selftests on a debug kernel..

	Oh, I forgot about needing RTNL.

	We cannot simply acquire RTNL in ad_mux_machine(), as the
bond->mode_lock is already held, and the lock ordering must be RTNL
first, then mode_lock, lest we deadlock.

	Hangbin, I'd suggest you look at how bond_netdev_notify_work()
complies with the lock ordering (basically, doing the actual work out of
line in a workqueue event), or how the "should_notify" flag is used in
bond_3ad_state_machine_handler().  The first is more complicated, but
won't skip events; the second may miss intermediate state transitions if
it cannot acquire RTNL and has to delay the notification.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
Hangbin Liu June 27, 2024, 8:26 a.m. UTC | #5
On Wed, Jun 26, 2024 at 05:06:00PM -0700, Jay Vosburgh wrote:
> >Hits:
> >
> >RTNL: assertion failed at net/core/rtnetlink.c (1823)

Thanks for this hits...

> >
> >On two selftests. Please run the selftests on a debug kernel..

OK, I will try run my tests on debug kernel in future.

> 
> 	Oh, I forgot about needing RTNL.
> 
> 	We cannot simply acquire RTNL in ad_mux_machine(), as the
> bond->mode_lock is already held, and the lock ordering must be RTNL
> first, then mode_lock, lest we deadlock.
> 
> 	Hangbin, I'd suggest you look at how bond_netdev_notify_work()
> complies with the lock ordering (basically, doing the actual work out of
> line in a workqueue event), or how the "should_notify" flag is used in
> bond_3ad_state_machine_handler().  The first is more complicated, but
> won't skip events; the second may miss intermediate state transitions if
> it cannot acquire RTNL and has to delay the notification.

I think the administer doesn't want to loose the state change info. So how
about something like:

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index c6807e473ab7..046f11c5c47a 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1185,6 +1185,8 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
 		default:
 			break;
 		}
+
+		port->slave->should_notify_lacp = 1;
 	}
 }
 
@@ -2527,6 +2529,9 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
 		bond_slave_state_notify(bond);
 		rtnl_unlock();
 	}
+
+	/* Notify the mux state changes */
+	bond_slave_link_notify(bond);
 	queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
 }
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 3c3fcce4acd4..db8f2fb613df 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1748,11 +1748,19 @@ static void bond_netdev_notify_work(struct work_struct *_work)
 					   notify_work.work);
 
 	if (rtnl_trylock()) {
-		struct netdev_bonding_info binfo;
+		if (slave->should_notify_link) {
+			struct netdev_bonding_info binfo;
+			bond_fill_ifslave(slave, &binfo.slave);
+			bond_fill_ifbond(slave->bond, &binfo.master);
+			netdev_bonding_info_change(slave->dev, &binfo);
+			slave->should_notify_link = 0;
+		}
+
+		if (slave->should_notify_lacp) {
+			rtmsg_ifinfo(RTM_NEWLINK, slave->dev, 0, GFP_KERNEL, 0, NULL);
+			slave->should_notify_lacp = 0;
+		}
 
-		bond_fill_ifslave(slave, &binfo.slave);
-		bond_fill_ifbond(slave->bond, &binfo.master);
-		netdev_bonding_info_change(slave->dev, &binfo);
 		rtnl_unlock();
 	} else {
 		queue_delayed_work(slave->bond->wq, &slave->notify_work, 1);
diff --git a/include/net/bonding.h b/include/net/bonding.h
index b61fb1aa3a56..38d37ea2382c 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -170,7 +170,8 @@ struct slave {
 	       inactive:1, /* indicates inactive slave */
 	       rx_disabled:1, /* indicates whether slave's Rx is disabled */
 	       should_notify:1, /* indicates whether the state changed */
-	       should_notify_link:1; /* indicates whether the link changed */
+	       should_notify_link:1, /* indicates whether the link changed */
+	       should_notify_lacp:1; /* indicates whether the lacp state changed */
 	u8     duplex;
 	u32    original_mtu;
 	u32    link_failure_count;
@@ -641,11 +642,10 @@ static inline void bond_slave_link_notify(struct bonding *bond)
 	struct slave *tmp;
 
 	bond_for_each_slave(bond, tmp, iter) {
-		if (tmp->should_notify_link) {
+		if (tmp->should_notify_link || tmp->should_notify_lacp)
 			bond_queue_slave_event(tmp);
+		if (tmp->should_notify_link)
 			bond_lower_state_changed(tmp);
-			tmp->should_notify_link = 0;
-		}
 	}
 }
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index eabfc8290f5e..4507bb8d5264 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4116,6 +4116,7 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
 	rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
 			   NULL, 0, portid, nlh);
 }
+EXPORT_SYMBOL(rtmsg_ifinfo);
 
 void rtmsg_ifinfo_newnet(int type, struct net_device *dev, unsigned int change,
 			 gfp_t flags, int *new_nsid, int new_ifindex)
Nikolay Aleksandrov June 27, 2024, 8:29 a.m. UTC | #6
On 27/06/2024 11:26, Hangbin Liu wrote:
> On Wed, Jun 26, 2024 at 05:06:00PM -0700, Jay Vosburgh wrote:
>>> Hits:
>>>
>>> RTNL: assertion failed at net/core/rtnetlink.c (1823)
> 
> Thanks for this hits...
> 
>>>
>>> On two selftests. Please run the selftests on a debug kernel..
> 
> OK, I will try run my tests on debug kernel in future.
> 
>>
>> 	Oh, I forgot about needing RTNL.
>>

+1 & facepalm, completely forgot it was running without rtnl

>> 	We cannot simply acquire RTNL in ad_mux_machine(), as the
>> bond->mode_lock is already held, and the lock ordering must be RTNL
>> first, then mode_lock, lest we deadlock.
>>
>> 	Hangbin, I'd suggest you look at how bond_netdev_notify_work()
>> complies with the lock ordering (basically, doing the actual work out of
>> line in a workqueue event), or how the "should_notify" flag is used in
>> bond_3ad_state_machine_handler().  The first is more complicated, but
>> won't skip events; the second may miss intermediate state transitions if
>> it cannot acquire RTNL and has to delay the notification.
> 
> I think the administer doesn't want to loose the state change info. So how
> about something like:
> 

You can (and will) miss events with the below code. It is kind of best effort,
but if the notification is not run before the next state change, you will
lose the intermediate changes.

> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index c6807e473ab7..046f11c5c47a 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -1185,6 +1185,8 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
>  		default:
>  			break;
>  		}
> +
> +		port->slave->should_notify_lacp = 1;
>  	}
>  }
>  
> @@ -2527,6 +2529,9 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
>  		bond_slave_state_notify(bond);
>  		rtnl_unlock();
>  	}
> +
> +	/* Notify the mux state changes */
> +	bond_slave_link_notify(bond);
>  	queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
>  }
>  
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 3c3fcce4acd4..db8f2fb613df 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1748,11 +1748,19 @@ static void bond_netdev_notify_work(struct work_struct *_work)
>  					   notify_work.work);
>  
>  	if (rtnl_trylock()) {
> -		struct netdev_bonding_info binfo;
> +		if (slave->should_notify_link) {
> +			struct netdev_bonding_info binfo;
> +			bond_fill_ifslave(slave, &binfo.slave);
> +			bond_fill_ifbond(slave->bond, &binfo.master);
> +			netdev_bonding_info_change(slave->dev, &binfo);
> +			slave->should_notify_link = 0;
> +		}
> +
> +		if (slave->should_notify_lacp) {
> +			rtmsg_ifinfo(RTM_NEWLINK, slave->dev, 0, GFP_KERNEL, 0, NULL);
> +			slave->should_notify_lacp = 0;
> +		}
>  
> -		bond_fill_ifslave(slave, &binfo.slave);
> -		bond_fill_ifbond(slave->bond, &binfo.master);
> -		netdev_bonding_info_change(slave->dev, &binfo);
>  		rtnl_unlock();
>  	} else {
>  		queue_delayed_work(slave->bond->wq, &slave->notify_work, 1);
> diff --git a/include/net/bonding.h b/include/net/bonding.h
> index b61fb1aa3a56..38d37ea2382c 100644
> --- a/include/net/bonding.h
> +++ b/include/net/bonding.h
> @@ -170,7 +170,8 @@ struct slave {
>  	       inactive:1, /* indicates inactive slave */
>  	       rx_disabled:1, /* indicates whether slave's Rx is disabled */
>  	       should_notify:1, /* indicates whether the state changed */
> -	       should_notify_link:1; /* indicates whether the link changed */
> +	       should_notify_link:1, /* indicates whether the link changed */
> +	       should_notify_lacp:1; /* indicates whether the lacp state changed */
>  	u8     duplex;
>  	u32    original_mtu;
>  	u32    link_failure_count;
> @@ -641,11 +642,10 @@ static inline void bond_slave_link_notify(struct bonding *bond)
>  	struct slave *tmp;
>  
>  	bond_for_each_slave(bond, tmp, iter) {
> -		if (tmp->should_notify_link) {
> +		if (tmp->should_notify_link || tmp->should_notify_lacp)
>  			bond_queue_slave_event(tmp);
> +		if (tmp->should_notify_link)
>  			bond_lower_state_changed(tmp);
> -			tmp->should_notify_link = 0;
> -		}
>  	}
>  }
>  
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index eabfc8290f5e..4507bb8d5264 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -4116,6 +4116,7 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
>  	rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
>  			   NULL, 0, portid, nlh);
>  }
> +EXPORT_SYMBOL(rtmsg_ifinfo);
>  
>  void rtmsg_ifinfo_newnet(int type, struct net_device *dev, unsigned int change,
>  			 gfp_t flags, int *new_nsid, int new_ifindex)
Hangbin Liu June 27, 2024, 10:05 a.m. UTC | #7
On Thu, Jun 27, 2024 at 11:29:21AM +0300, Nikolay Aleksandrov wrote:
> On 27/06/2024 11:26, Hangbin Liu wrote:
> > On Wed, Jun 26, 2024 at 05:06:00PM -0700, Jay Vosburgh wrote:
> >>> Hits:
> >>>
> >>> RTNL: assertion failed at net/core/rtnetlink.c (1823)
> > 
> > Thanks for this hits...
> > 
> >>>
> >>> On two selftests. Please run the selftests on a debug kernel..
> > 
> > OK, I will try run my tests on debug kernel in future.
> > 
> >>
> >> 	Oh, I forgot about needing RTNL.
> >>
> 
> +1 & facepalm, completely forgot it was running without rtnl
> 
> >> 	We cannot simply acquire RTNL in ad_mux_machine(), as the
> >> bond->mode_lock is already held, and the lock ordering must be RTNL
> >> first, then mode_lock, lest we deadlock.
> >>
> >> 	Hangbin, I'd suggest you look at how bond_netdev_notify_work()
> >> complies with the lock ordering (basically, doing the actual work out of
> >> line in a workqueue event), or how the "should_notify" flag is used in
> >> bond_3ad_state_machine_handler().  The first is more complicated, but
> >> won't skip events; the second may miss intermediate state transitions if
> >> it cannot acquire RTNL and has to delay the notification.
> > 
> > I think the administer doesn't want to loose the state change info. So how
> > about something like:
> > 
> 
> You can (and will) miss events with the below code. It is kind of best effort,
> but if the notification is not run before the next state change, you will
> lose the intermediate changes.

Yes, but at least the admin could get the latest state. With the following
code the admin may not get the latest update if lock rtnl failed.

        if (should_notify_rtnl && rtnl_trylock()) {
                bond_slave_lacp_notify(bond);
                rtnl_unlock();
	}

Thanks
Hangbin
Nikolay Aleksandrov June 27, 2024, 10:33 a.m. UTC | #8
On 27/06/2024 13:05, Hangbin Liu wrote:
> On Thu, Jun 27, 2024 at 11:29:21AM +0300, Nikolay Aleksandrov wrote:
>> On 27/06/2024 11:26, Hangbin Liu wrote:
>>> On Wed, Jun 26, 2024 at 05:06:00PM -0700, Jay Vosburgh wrote:
>>>>> Hits:
>>>>>
>>>>> RTNL: assertion failed at net/core/rtnetlink.c (1823)
>>>
>>> Thanks for this hits...
>>>
>>>>>
>>>>> On two selftests. Please run the selftests on a debug kernel..
>>>
>>> OK, I will try run my tests on debug kernel in future.
>>>
>>>>
>>>> 	Oh, I forgot about needing RTNL.
>>>>
>>
>> +1 & facepalm, completely forgot it was running without rtnl
>>
>>>> 	We cannot simply acquire RTNL in ad_mux_machine(), as the
>>>> bond->mode_lock is already held, and the lock ordering must be RTNL
>>>> first, then mode_lock, lest we deadlock.
>>>>
>>>> 	Hangbin, I'd suggest you look at how bond_netdev_notify_work()
>>>> complies with the lock ordering (basically, doing the actual work out of
>>>> line in a workqueue event), or how the "should_notify" flag is used in
>>>> bond_3ad_state_machine_handler().  The first is more complicated, but
>>>> won't skip events; the second may miss intermediate state transitions if
>>>> it cannot acquire RTNL and has to delay the notification.
>>>
>>> I think the administer doesn't want to loose the state change info. So how
>>> about something like:
>>>
>>
>> You can (and will) miss events with the below code. It is kind of best effort,
>> but if the notification is not run before the next state change, you will
>> lose the intermediate changes.
> 
> Yes, but at least the admin could get the latest state. With the following
> code the admin may not get the latest update if lock rtnl failed.
> 
>         if (should_notify_rtnl && rtnl_trylock()) {
>                 bond_slave_lacp_notify(bond);
>                 rtnl_unlock();
> 	}
> 
> Thanks
> Hangbin

Well, you mentioned administrators want to see the state changes, please
better clarify the exact end goal. Note that technically may even not be
the last state as the state change itself happens in parallel (different
locks) and any update could be delayed depending on rtnl availability
and workqueue re-scheduling. But sure, they will get some update at some point. :)

It all depends on what are the requirements.

An uglier but lockless alternative would be to poll the slave's sysfs oper state,
that doesn't require any locks and would be up-to-date.

Cheers,
 Nik
Hangbin Liu June 27, 2024, 1:17 p.m. UTC | #9
On Thu, Jun 27, 2024 at 01:33:10PM +0300, Nikolay Aleksandrov wrote:
> > Yes, but at least the admin could get the latest state. With the following
> > code the admin may not get the latest update if lock rtnl failed.
> > 
> >         if (should_notify_rtnl && rtnl_trylock()) {
> >                 bond_slave_lacp_notify(bond);
> >                 rtnl_unlock();
> > 	}
> > 
> Well, you mentioned administrators want to see the state changes, please
> better clarify the exact end goal. Note that technically may even not be
> the last state as the state change itself happens in parallel (different
> locks) and any update could be delayed depending on rtnl availability
> and workqueue re-scheduling. But sure, they will get some update at some point. :)

Ah.. Yes, that's a sad fact :(
> 
> It all depends on what are the requirements.
> 
> An uglier but lockless alternative would be to poll the slave's sysfs oper state,
> that doesn't require any locks and would be up-to-date.

Hmm, that's a workaround, but the admin need to poll the state frequently as
they don't know when the state will change.

Hi Jay, are you OK to add this sysfs in bonding?

Thanks
Hangbin
Nikolay Aleksandrov June 27, 2024, 2:12 p.m. UTC | #10
On 27/06/2024 16:17, Hangbin Liu wrote:
> On Thu, Jun 27, 2024 at 01:33:10PM +0300, Nikolay Aleksandrov wrote:
>>> Yes, but at least the admin could get the latest state. With the following
>>> code the admin may not get the latest update if lock rtnl failed.
>>>
>>>         if (should_notify_rtnl && rtnl_trylock()) {
>>>                 bond_slave_lacp_notify(bond);
>>>                 rtnl_unlock();
>>> 	}
>>>
>> Well, you mentioned administrators want to see the state changes, please
>> better clarify the exact end goal. Note that technically may even not be
>> the last state as the state change itself happens in parallel (different
>> locks) and any update could be delayed depending on rtnl availability
>> and workqueue re-scheduling. But sure, they will get some update at some point. :)
> 
> Ah.. Yes, that's a sad fact :(
>>
>> It all depends on what are the requirements.
>>
>> An uglier but lockless alternative would be to poll the slave's sysfs oper state,
>> that doesn't require any locks and would be up-to-date.
> 
> Hmm, that's a workaround, but the admin need to poll the state frequently as
> they don't know when the state will change.
> 

Oh wait, that wasn't what I was proposing, I was thinking about the port's oper state
which is already available via a sysfs attribute. Generally sysfs is getting deprecated.

> Hi Jay, are you OK to add this sysfs in bonding?
> 
> Thanks
> Hangbin
Jay Vosburgh June 27, 2024, 2:24 p.m. UTC | #11
Hangbin Liu <liuhangbin@gmail.com> wrote:

>On Thu, Jun 27, 2024 at 01:33:10PM +0300, Nikolay Aleksandrov wrote:
>> > Yes, but at least the admin could get the latest state. With the following
>> > code the admin may not get the latest update if lock rtnl failed.
>> > 
>> >         if (should_notify_rtnl && rtnl_trylock()) {
>> >                 bond_slave_lacp_notify(bond);
>> >                 rtnl_unlock();
>> > 	}
>> > 
>> Well, you mentioned administrators want to see the state changes, please
>> better clarify the exact end goal. Note that technically may even not be
>> the last state as the state change itself happens in parallel (different
>> locks) and any update could be delayed depending on rtnl availability
>> and workqueue re-scheduling. But sure, they will get some update at some point. :)
>
>Ah.. Yes, that's a sad fact :(

	There are basically two paths that will change the LACP state
that's passed up via netlink (the aggregator ID, and actor and partner
oper port states): bond_3ad_state_machine_handler(), or incoming
LACPDUs, which call into ad_rx_machine().  Administrative changes to the
bond will do it too, like adding or removing interfaces, but those
originate in user space and aren't happening asynchronously.

	If you want (almost) absolute reliability in communicating every
state change for the state machine and LACPDU processing, I think you'd
have to (a) create an object with the changed state, (b) queue it
somewhere, then (c) call a workqueue event to process that queue out of
line.

>> It all depends on what are the requirements.
>> 
>> An uglier but lockless alternative would be to poll the slave's sysfs oper state,
>> that doesn't require any locks and would be up-to-date.
>
>Hmm, that's a workaround, but the admin need to poll the state frequently as
>they don't know when the state will change.
>
>Hi Jay, are you OK to add this sysfs in bonding?

	I think what Nik is proposing is for your userspace to poll the
/sys/class/net/${DEV}/operstate.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
Hangbin Liu June 28, 2024, 3:10 a.m. UTC | #12
On Thu, Jun 27, 2024 at 07:24:10AM -0700, Jay Vosburgh wrote:
> Hangbin Liu <liuhangbin@gmail.com> wrote:
> >Ah.. Yes, that's a sad fact :(
> 
> 	There are basically two paths that will change the LACP state
> that's passed up via netlink (the aggregator ID, and actor and partner
> oper port states): bond_3ad_state_machine_handler(), or incoming
> LACPDUs, which call into ad_rx_machine().  Administrative changes to the

Ah, thanks, I didn't notice this. I will also enable lacp notify
in ad_rx_machine().

> bond will do it too, like adding or removing interfaces, but those
> originate in user space and aren't happening asynchronously.
> 
> 	If you want (almost) absolute reliability in communicating every
> state change for the state machine and LACPDU processing, I think you'd
> have to (a) create an object with the changed state, (b) queue it
> somewhere, then (c) call a workqueue event to process that queue out of
> line.

Hmm... This looks too complex. If we store all the states. A frequent flashing
may consume the memory. If we made a limit for the queue, we may still loosing
some state changes.

I'm not sure which way is better.

> 
> >> It all depends on what are the requirements.
> >> 
> >> An uglier but lockless alternative would be to poll the slave's sysfs oper state,
> >> that doesn't require any locks and would be up-to-date.
> >
> >Hmm, that's a workaround, but the admin need to poll the state frequently as
> >they don't know when the state will change.
> >
> >Hi Jay, are you OK to add this sysfs in bonding?
> 
> 	I think what Nik is proposing is for your userspace to poll the
> /sys/class/net/${DEV}/operstate.

OK. There are 2 scenarios I got.

1) the local user want to get the local/partner state and make sure not
send pkts before they are in DISTRIBUTING state to avoid pkts drop, Or vice
versa. Only checking link operstate or up/down status is not enough.

2) the admin want to get the switch/partner status via LACP status incase
the switch is crashed.

Do you have any suggestion for the implementation?

Thanks
Hangbin
Nikolay Aleksandrov June 28, 2024, 7:04 a.m. UTC | #13
On 28/06/2024 06:10, Hangbin Liu wrote:
> On Thu, Jun 27, 2024 at 07:24:10AM -0700, Jay Vosburgh wrote:
>> Hangbin Liu <liuhangbin@gmail.com> wrote:
>>> Ah.. Yes, that's a sad fact :(
>>
>> 	There are basically two paths that will change the LACP state
>> that's passed up via netlink (the aggregator ID, and actor and partner
>> oper port states): bond_3ad_state_machine_handler(), or incoming
>> LACPDUs, which call into ad_rx_machine().  Administrative changes to the
> 
> Ah, thanks, I didn't notice this. I will also enable lacp notify
> in ad_rx_machine().
> 
>> bond will do it too, like adding or removing interfaces, but those
>> originate in user space and aren't happening asynchronously.
>>
>> 	If you want (almost) absolute reliability in communicating every
>> state change for the state machine and LACPDU processing, I think you'd
>> have to (a) create an object with the changed state, (b) queue it
>> somewhere, then (c) call a workqueue event to process that queue out of
>> line.
> 
> Hmm... This looks too complex. If we store all the states. A frequent flashing
> may consume the memory. If we made a limit for the queue, we may still loosing
> some state changes.
> 
> I'm not sure which way is better.
> 
>>
>>>> It all depends on what are the requirements.
>>>>
>>>> An uglier but lockless alternative would be to poll the slave's sysfs oper state,
>>>> that doesn't require any locks and would be up-to-date.
>>>
>>> Hmm, that's a workaround, but the admin need to poll the state frequently as
>>> they don't know when the state will change.
>>>
>>> Hi Jay, are you OK to add this sysfs in bonding?
>>
>> 	I think what Nik is proposing is for your userspace to poll the
>> /sys/class/net/${DEV}/operstate.
> 

Actually I was talking about:
 /sys/class/net/<bond port>/bonding_slave/ad_actor_oper_port_state
 /sys/class/net/<bond port>/bonding_slave/ad_partner_oper_port_state
etc

Wouldn't these work for you?

> OK. There are 2 scenarios I got.
> 
> 1) the local user want to get the local/partner state and make sure not
> send pkts before they are in DISTRIBUTING state to avoid pkts drop, Or vice
> versa. Only checking link operstate or up/down status is not enough.
> 
> 2) the admin want to get the switch/partner status via LACP status incase
> the switch is crashed.
> 
> Do you have any suggestion for the implementation?
> 
> Thanks
> Hangbin
Nikolay Aleksandrov June 28, 2024, 7:22 a.m. UTC | #14
On 28/06/2024 10:04, Nikolay Aleksandrov wrote:
> On 28/06/2024 06:10, Hangbin Liu wrote:
>> On Thu, Jun 27, 2024 at 07:24:10AM -0700, Jay Vosburgh wrote:
>>> Hangbin Liu <liuhangbin@gmail.com> wrote:
>>>> Ah.. Yes, that's a sad fact :(
>>>
>>> 	There are basically two paths that will change the LACP state
>>> that's passed up via netlink (the aggregator ID, and actor and partner
>>> oper port states): bond_3ad_state_machine_handler(), or incoming
>>> LACPDUs, which call into ad_rx_machine().  Administrative changes to the
>>
>> Ah, thanks, I didn't notice this. I will also enable lacp notify
>> in ad_rx_machine().
>>
>>> bond will do it too, like adding or removing interfaces, but those
>>> originate in user space and aren't happening asynchronously.
>>>
>>> 	If you want (almost) absolute reliability in communicating every
>>> state change for the state machine and LACPDU processing, I think you'd
>>> have to (a) create an object with the changed state, (b) queue it
>>> somewhere, then (c) call a workqueue event to process that queue out of
>>> line.
>>
>> Hmm... This looks too complex. If we store all the states. A frequent flashing
>> may consume the memory. If we made a limit for the queue, we may still loosing
>> some state changes.
>>
>> I'm not sure which way is better.
>>
>>>
>>>>> It all depends on what are the requirements.
>>>>>
>>>>> An uglier but lockless alternative would be to poll the slave's sysfs oper state,
>>>>> that doesn't require any locks and would be up-to-date.
>>>>
>>>> Hmm, that's a workaround, but the admin need to poll the state frequently as
>>>> they don't know when the state will change.
>>>>
>>>> Hi Jay, are you OK to add this sysfs in bonding?
>>>
>>> 	I think what Nik is proposing is for your userspace to poll the
>>> /sys/class/net/${DEV}/operstate.
>>
> 
> Actually I was talking about:
>  /sys/class/net/<bond port>/bonding_slave/ad_actor_oper_port_state
>  /sys/class/net/<bond port>/bonding_slave/ad_partner_oper_port_state
> etc
> 
> Wouldn't these work for you?
> 

But it gets much more complicated, I guess it will be easier to read the
proc bond file with all the LACP information. That is under RCU only as
well.

>> OK. There are 2 scenarios I got.
>>
>> 1) the local user want to get the local/partner state and make sure not
>> send pkts before they are in DISTRIBUTING state to avoid pkts drop, Or vice
>> versa. Only checking link operstate or up/down status is not enough.
>>
>> 2) the admin want to get the switch/partner status via LACP status incase
>> the switch is crashed.
>>
>> Do you have any suggestion for the implementation?
>>
>> Thanks
>> Hangbin
>
Hangbin Liu June 28, 2024, 9:55 a.m. UTC | #15
Hi Nikolay,
On Fri, Jun 28, 2024 at 10:22:25AM +0300, Nikolay Aleksandrov wrote:
> > Actually I was talking about:
> >  /sys/class/net/<bond port>/bonding_slave/ad_actor_oper_port_state
> >  /sys/class/net/<bond port>/bonding_slave/ad_partner_oper_port_state
> > etc
> > 
> > Wouldn't these work for you?
> > 
> 
> But it gets much more complicated, I guess it will be easier to read the
> proc bond file with all the LACP information. That is under RCU only as
> well.

Good question. The monitor application want a more elegant/general way
to deal with the LACP state and do other network reconfigurations.
Here is the requirement I got from customer.

1) As a server administrator, I want ip monitor to show state change events
   related to LACP bonds so that I can react quickly to network reconfigurations.
2) As a network monitoring application developer, I want my application to be
   notified about LACP bond operational state changes without having to
   poll /proc/net/bonding/<bond> and parse its output so that it can trigger
   predefined failover remediation policies.
3) As a server administrator, I want my LACP bond monitoring application to
   receive a Netlink-based notification whenever the number of member
   interfaces is reduced so that the operations support system can provision
   a member interface replacement.

What I understand is the user/admin need to know the latest stable state so
they can do some other network configuration based on the status. Losing
a middle state notification during fast changes is acceptable.

> Well, you mentioned administrators want to see the state changes, please
> better clarify the exact end goal. Note that technically may even not be
> the last state as the state change itself happens in parallel (different
> locks) and any update could be delayed depending on rtnl availability
> and workqueue re-scheduling. But sure, they will get some update at some point. :)

Would you please help explain why we may not get the latest state? From what
I understand:

1) State A -> B, queue notify
       rtnl_trylock, fail, queue again
2) State B -> C, queue notify
      rtnl_trylock, success, post current state C
3) State C -> D, queue notify
      rtnl_trylock, fail, queue again
4) State D -> A, queue notify
      rtnl_trylock, fail, queue again
      rtnl_trylock, fail, queue again
      rtnl_trylock, success, post current state A

So how could the step 3) state send but step 4) state not send?

BTW, in my code, I should set the should_notify_lacp = 0 first before sending
ifinfo message. So that even the should_notify_lacp = 1 in ad_mux_machine()
is over written here, it still send the latest status.

> +
> +             if (slave->should_notify_lacp) {
> +                     slave->should_notify_lacp = 0;
> +                     rtmsg_ifinfo(RTM_NEWLINK, slave->dev, 0, GFP_KERNEL, 0, NULL);
> +             }

The side effect is that we may send 2 same latest lacp status(the
should_notify_lacp is over written to 1 and queue again), which should
be OK.

Thanks
Hangbin
Jay Vosburgh June 28, 2024, 11:36 p.m. UTC | #16
Hangbin Liu <liuhangbin@gmail.com> wrote:

>Hi Nikolay,
>On Fri, Jun 28, 2024 at 10:22:25AM +0300, Nikolay Aleksandrov wrote:
>> > Actually I was talking about:
>> >  /sys/class/net/<bond port>/bonding_slave/ad_actor_oper_port_state
>> >  /sys/class/net/<bond port>/bonding_slave/ad_partner_oper_port_state
>> > etc
>> > 
>> > Wouldn't these work for you?
>> > 
>> 
>> But it gets much more complicated, I guess it will be easier to read the
>> proc bond file with all the LACP information. That is under RCU only as
>> well.
>
>Good question. The monitor application want a more elegant/general way
>to deal with the LACP state and do other network reconfigurations.
>Here is the requirement I got from customer.
>
>1) As a server administrator, I want ip monitor to show state change events
>   related to LACP bonds so that I can react quickly to network reconfigurations.
>2) As a network monitoring application developer, I want my application to be
>   notified about LACP bond operational state changes without having to
>   poll /proc/net/bonding/<bond> and parse its output so that it can trigger
>   predefined failover remediation policies.
>3) As a server administrator, I want my LACP bond monitoring application to
>   receive a Netlink-based notification whenever the number of member
>   interfaces is reduced so that the operations support system can provision
>   a member interface replacement.

	What notifications are they not getting that they want?  Or is
it that events happen but lack some information they want?

	Currently, the end of bond_3ad_state_machine_handler() will send
notifications via bond_slave_state_notify() if the state of any member
of the bond has changed (via the struct slave.should_notify field).

	The notifications here come from bond_lower_state_changed(),
which propagates link up/down and tx_enabled (active-ness), but not any
LACP specifics.  These are sent as NETDEV_CHANGELOWERSTATE events, which
rtnetlink_event() handles, so they should be visible to ip monitor.

>What I understand is the user/admin need to know the latest stable state so
>they can do some other network configuration based on the status. Losing
>a middle state notification during fast changes is acceptable.

	This is a much simpler problem to solve.

>> Well, you mentioned administrators want to see the state changes, please
>> better clarify the exact end goal. Note that technically may even not be
>> the last state as the state change itself happens in parallel (different
>> locks) and any update could be delayed depending on rtnl availability
>> and workqueue re-scheduling. But sure, they will get some update at some point. :)
>
>Would you please help explain why we may not get the latest state? From what
>I understand:
>
>1) State A -> B, queue notify
>       rtnl_trylock, fail, queue again
>2) State B -> C, queue notify
>      rtnl_trylock, success, post current state C
>3) State C -> D, queue notify
>      rtnl_trylock, fail, queue again
>4) State D -> A, queue notify
>      rtnl_trylock, fail, queue again
>      rtnl_trylock, fail, queue again
>      rtnl_trylock, success, post current state A
>
>So how could the step 3) state send but step 4) state not send?

	I'm going to speculate here that the scenario envisioned would
be something like CPU A is in the midst of generating an event at the
end of the state machine, but CPU B could be processing a LACPDU
simultaneously-ish.  The CPU A event is sent, but the CPU B event is
delayed due to RTNL contention.  In this scenario, the last event seen
in user space is CPU A, but the actual state has moved on to that set by
CPU B, whose notification will be received eventually.

>BTW, in my code, I should set the should_notify_lacp = 0 first before sending
>ifinfo message. So that even the should_notify_lacp = 1 in ad_mux_machine()
>is over written here, it still send the latest status.
>
>> +
>> +             if (slave->should_notify_lacp) {
>> +                     slave->should_notify_lacp = 0;
>> +                     rtmsg_ifinfo(RTM_NEWLINK, slave->dev, 0, GFP_KERNEL, 0, NULL);
>> +             }
>
>The side effect is that we may send 2 same latest lacp status(the
>should_notify_lacp is over written to 1 and queue again), which should
>be OK.

	Looking at the current notifications in bonding, I wonder if it
would be sufficient to add the desired information to what
bond_lower_state_changed() sends, rather than trying to shoehorn in
another rtnl_trylock() gizmo.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
diff mbox series

Patch

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index c6807e473ab7..b57c5670b31a 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -11,6 +11,7 @@ 
 #include <linux/etherdevice.h>
 #include <linux/if_bonding.h>
 #include <linux/pkt_sched.h>
+#include <linux/rtnetlink.h>
 #include <net/net_namespace.h>
 #include <net/bonding.h>
 #include <net/bond_3ad.h>
@@ -1185,6 +1186,8 @@  static void ad_mux_machine(struct port *port, bool *update_slave_arr)
 		default:
 			break;
 		}
+
+		rtmsg_ifinfo(RTM_NEWLINK, port->slave->dev, 0, GFP_ATOMIC, 0, NULL);
 	}
 }
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index eabfc8290f5e..4507bb8d5264 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4116,6 +4116,7 @@  void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
 	rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
 			   NULL, 0, portid, nlh);
 }
+EXPORT_SYMBOL(rtmsg_ifinfo);
 
 void rtmsg_ifinfo_newnet(int type, struct net_device *dev, unsigned int change,
 			 gfp_t flags, int *new_nsid, int new_ifindex)