diff mbox series

[net-next,v2] Do not invoke addrconf_verify_rtnl unnecessarily

Message ID 20241205171248.3958156-1-gnaaman@drivenets.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net-next,v2] Do not invoke addrconf_verify_rtnl unnecessarily | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 304 this patch: 304
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 70 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline fail Was 0 now: 1
netdev/contest success net-next-2024-12-06--18-00 (tests: 764)

Commit Message

Gilad Naaman Dec. 5, 2024, 5:12 p.m. UTC
Do not invoke costly `addrconf_verify_rtnl` if the added address
wouldn't need it, or affect the delayed_work timer.

For new/modified addresses, call "verify" only if added address has an
expiration, or if any temporary address was created.

This is done to account for a case where the new expiration time might
be sooner than the current delayed_work's expiration.

For deleted addresses, avoid calling verify at all:

If the address being deleted is not perishable, and thus does not affect
the delayed_work expiration, there is not point in going over the entire
table.

If the address IS perishable, but is not the soonest-to-be-expired
address, calling "verify" would not change the expiration, and would be
a very expensive nop.

If the address IS perishable, and IS the soonest-to-be-expired address,
calling or not-calling "verify" for a single address deletion is
equivalent in cost.

But calling "verify" immediately will result in a performance hit when
deleting many addresses.

Signed-off-by: Gilad Naaman <gnaaman@drivenets.com>
---
 net/ipv6/addrconf.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

Comments

Paolo Abeni Dec. 10, 2024, 10:04 a.m. UTC | #1
On 12/5/24 18:12, Gilad Naaman wrote:
> Do not invoke costly `addrconf_verify_rtnl` if the added address
> wouldn't need it, or affect the delayed_work timer.
> 
> For new/modified addresses, call "verify" only if added address has an
> expiration, or if any temporary address was created.
> 
> This is done to account for a case where the new expiration time might
> be sooner than the current delayed_work's expiration.
> 
> For deleted addresses, avoid calling verify at all:
> 
> If the address being deleted is not perishable, and thus does not affect
> the delayed_work expiration, there is not point in going over the entire
> table.
> 
> If the address IS perishable, but is not the soonest-to-be-expired
> address, calling "verify" would not change the expiration, and would be
> a very expensive nop.
> 
> If the address IS perishable, and IS the soonest-to-be-expired address,
> calling or not-calling "verify" for a single address deletion is
> equivalent in cost.

This last statement is not obvious to me, could you please expand the
reasoning?

> 
> But calling "verify" immediately will result in a performance hit when
> deleting many addresses.

Since this is about (control plane) performances, please include the
relevant test details (or even better, please add a small/fast self-test
covering the use-case).

> 
> Signed-off-by: Gilad Naaman <gnaaman@drivenets.com>
> ---
>  net/ipv6/addrconf.c | 34 +++++++++++++++++++++++-----------
>  1 file changed, 23 insertions(+), 11 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index c489a1e6aec9..893502787554 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -310,6 +310,16 @@ static inline bool addrconf_link_ready(const struct net_device *dev)
>  	return netif_oper_up(dev) && !qdisc_tx_is_noop(dev);
>  }
>  
> +static inline bool addrconf_perishable(int ifa_flags, u32 prefered_lft)

Please, do not use 'inline' in c files.

> +{
> +	/* When setting preferred_lft to a value not zero or
> +	 * infinity, while valid_lft is infinity
> +	 * IFA_F_PERMANENT has a non-infinity life time.
> +	 */
> +	return !((ifa_flags & IFA_F_PERMANENT) &&
> +		(prefered_lft == INFINITY_LIFE_TIME));

Minor nit: the intentation is wrong should be:
		
	return !((ifa_flags & IFA_F_PERMANENT) &&
		 (prefered_lft == INFINITY_LIFE_TIME));
> +}
> +
>  static void addrconf_del_rs_timer(struct inet6_dev *idev)
>  {
>  	if (del_timer(&idev->rs_timer))
> @@ -3090,8 +3100,7 @@ static int inet6_addr_add(struct net *net, int ifindex,
>  		 */
>  		if (!(ifp->flags & (IFA_F_OPTIMISTIC | IFA_F_NODAD)))
>  			ipv6_ifa_notify(0, ifp);
> -		/*
> -		 * Note that section 3.1 of RFC 4429 indicates
> +		/* Note that section 3.1 of RFC 4429 indicates
>  		 * that the Optimistic flag should not be set for
>  		 * manually configured addresses
>  		 */
> @@ -3100,7 +3109,14 @@ static int inet6_addr_add(struct net *net, int ifindex,
>  			manage_tempaddrs(idev, ifp, cfg->valid_lft,
>  					 cfg->preferred_lft, true, jiffies);
>  		in6_ifa_put(ifp);
> -		addrconf_verify_rtnl(net);
> +
> +		/* Verify only if it's possible that adding this address
> +		 * may modify the worker expiration time.
> +		 */
> +		if ((cfg->ifa_flags & IFA_F_MANAGETEMPADDR) ||
> +		    addrconf_perishable(cfg->ifa_flags, cfg->preferred_lft))
> +			addrconf_verify_rtnl(net);
> +
>  		return 0;
>  	} else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
>  		ipv6_mc_config(net->ipv6.mc_autojoin_sk, false,
> @@ -3148,7 +3164,6 @@ static int inet6_addr_del(struct net *net, int ifindex, u32 ifa_flags,
>  			    (ifp->flags & IFA_F_MANAGETEMPADDR))
>  				delete_tempaddrs(idev, ifp);
>  
> -			addrconf_verify_rtnl(net);

With an additional 'addrconf_perishable' check here protecting the (here
removed) addrconf_verify_rtnl(), the patch will be IMHO much less prone
to unintended side-effects.

Thanks,

Paolo
Paolo Abeni Dec. 10, 2024, 10:50 a.m. UTC | #2
On 12/5/24 18:12, Gilad Naaman wrote:
> @@ -3090,8 +3100,7 @@ static int inet6_addr_add(struct net *net, int ifindex,
>  		 */
>  		if (!(ifp->flags & (IFA_F_OPTIMISTIC | IFA_F_NODAD)))
>  			ipv6_ifa_notify(0, ifp);
> -		/*
> -		 * Note that section 3.1 of RFC 4429 indicates
> +		/* Note that section 3.1 of RFC 4429 indicates
>  		 * that the Optimistic flag should not be set for
>  		 * manually configured addresses
>  		 */

I almost forgot: the above looks like purely cosmetic/unrelated.
Additionally, the 'net' and 'net-next' tree are now accepting the
comment format you update above, see
82b8000c28b56b014ce52a1f1581bef4af148681.

Please drop this chunk.

Thanks,

Paolo
Gilad Naaman Dec. 13, 2024, 3:13 p.m. UTC | #3
> > If the address IS perishable, and IS the soonest-to-be-expired address,
> > calling or not-calling "verify" for a single address deletion is
> > equivalent in cost.
> 
> This last statement is not obvious to me, could you please expand the
> reasoning?

Sorry, it does seem a bit vague when I am re-reading it now.

What I meant is that calling addrconf_verify_rtnl when no upkeep needs to be
done has some cost K (in seconds) which is roughly a function of the
total amount of addresses.

Let's say you've configured some addresses, 4 of which are perishable:

	   |                
	T0-+- <---We are here
	   |                
	   |                
	T1-+- A <---Timer      
	   |                
	   |                
	   |                
	T2-+- B              
	   |                
	   |                
	   |                
	T3-+- C              
	   |                
	   |                
	   |                
	T4-+- D              
	   |                
	   |                
	   |                
	   v                

The timer is scheduled to expire in T1, because this is when address A
perishes.

If you delete a non-perishable address, running addrconf_verify_rtnl is
redundant, since it won't change the fact that the timer expires in T1.

If you delete A specifically, which is the cause of scheduling the timer
to T1, you have 2 options:

 1. Pay K now, in T0, to reschedule the timer to T2
 2. Pay nothing now, let the timer expire, pay the K in T1, and then reschedule

If we're talking about a deleting A, it seems equivalent in cost and result.
Either way, exactly one K is paid, and the time eventually gets rescheduled to T2.

If we're talking about deleting an arbitrary address, using option 2 is
better, since you do not lose functionaility, but you might be saving some
Ks. (If you deleted B, the timer won't be rescheduled anyway)

If we're talking about deleting multiple/many address in a short time,
option 2 is greatly preferable, since paying K for each address can get
costly as the hash-table grows.

> > 
> > But calling "verify" immediately will result in a performance hit when
> > deleting many addresses.
> 
> Since this is about (control plane) performances, please include the
> relevant test details (or even better, please add a small/fast self-test
> covering the use-case).

Is it common to add scale-test to selftests?
From my limited experience these tend to fail in automation for no good reason,
though I feel I may have misunderstood the text in parens.
I've added a link to the benchmark below.

Regarding the original test case:

We're developing a core-router and trying to scale-up to around 12K VLANs.
Considering GUA+LLA this means 24K address in this table.
In practice it's a bit more than that, due to other interfaces in the same
namespace.

This makes addrconf_verify_rtnl very very very expensive for us.
When initially setting the system up after boot, or when applying big
configuration changes,
adding addresses quickly slows down, as each added address has to pay
for its predecesors. (all of our addresses are static)

On the reverse, when the VLANs' parent link goes down, the VLANs
go down with it, causing us to pay for a lot of addrconf_verify_rtnl calls,
during which rtnl_lock is held for a single long stretch of time.

I've ran some perf on an upatched kernel to demonstrate it:

	https://github.com/gnaaman-dn/perf-addrconf-verify-rtnl

Turned out to be 13% of time when creating static addresses, and 18%
when flushing them.

(In our original bug the VLANs were deleted, it is just easier to perf
one iproute command if it's a flush)


> > @@ -3148,7 +3164,6 @@ static int inet6_addr_del(struct net *net, int ifindex, u32 ifa_flags,
> >  			    (ifp->flags & IFA_F_MANAGETEMPADDR))
> >  				delete_tempaddrs(idev, ifp);
> >  
> > -			addrconf_verify_rtnl(net);
> 
> With an additional 'addrconf_perishable' check here protecting the (here
> removed) addrconf_verify_rtnl(), the patch will be IMHO much less prone
> to unintended side-effects.

I hope my explanation will be convincing that that is not needed.
(or just coherent enough to point out a mistake in my understanding)

If not, I will send V3 with this condition added, as in practice most
of our addresses are static.

Thank you for review,
Gilad
Paolo Abeni Dec. 16, 2024, 7:52 a.m. UTC | #4
On 12/13/24 16:13, Gilad Naaman wrote:
>>> If the address IS perishable, and IS the soonest-to-be-expired address,
>>> calling or not-calling "verify" for a single address deletion is
>>> equivalent in cost.
>>
>> This last statement is not obvious to me, could you please expand the
>> reasoning?
> 
> Sorry, it does seem a bit vague when I am re-reading it now.
> 
> What I meant is that calling addrconf_verify_rtnl when no upkeep needs to be
> done has some cost K (in seconds) which is roughly a function of the
> total amount of addresses.
> 
> Let's say you've configured some addresses, 4 of which are perishable:
> 
> 	   |                
> 	T0-+- <---We are here
> 	   |                
> 	   |                
> 	T1-+- A <---Timer      
> 	   |                
> 	   |                
> 	   |                
> 	T2-+- B              
> 	   |                
> 	   |                
> 	   |                
> 	T3-+- C              
> 	   |                
> 	   |                
> 	   |                
> 	T4-+- D              
> 	   |                
> 	   |                
> 	   |                
> 	   v                
> 
> The timer is scheduled to expire in T1, because this is when address A
> perishes.
> 
> If you delete a non-perishable address, running addrconf_verify_rtnl is
> redundant, since it won't change the fact that the timer expires in T1.
> 
> If you delete A specifically, which is the cause of scheduling the timer
> to T1, you have 2 options:
> 
>  1. Pay K now, in T0, to reschedule the timer to T2
>  2. Pay nothing now, let the timer expire, pay the K in T1, and then reschedule

It looks like in this specific case option 1 will be cheaper, as it will
avoid an unneeded timer expiration.

> If we're talking about a deleting A, it seems equivalent in cost and result.
> Either way, exactly one K is paid, and the time eventually gets rescheduled to T2.
> 
> If we're talking about deleting an arbitrary address, using option 2 is
> better, since you do not lose functionaility, but you might be saving some
> Ks. (If you deleted B, the timer won't be rescheduled anyway)
> 
> If we're talking about deleting multiple/many address in a short time,
> option 2 is greatly preferable, since paying K for each address can get
> costly as the hash-table grows.

Makes sense to me.

>>> But calling "verify" immediately will result in a performance hit when
>>> deleting many addresses.
>>
>> Since this is about (control plane) performances, please include the
>> relevant test details (or even better, please add a small/fast self-test
>> covering the use-case).
> 
> Is it common to add scale-test to selftests?

AFAIK, not common at all. Note that the argument "so self-test for this
kind of thing" is actually a very good argument to add a self-tests.

> From my limited experience these tend to fail in automation for no good reason,
> though I feel I may have misunderstood the text in parens.
> I've added a link to the benchmark below.
> 
> Regarding the original test case:
> 
> We're developing a core-router and trying to scale-up to around 12K VLANs.
> Considering GUA+LLA this means 24K address in this table.
> In practice it's a bit more than that, due to other interfaces in the same
> namespace.
> 
> This makes addrconf_verify_rtnl very very very expensive for us.
> When initially setting the system up after boot, or when applying big
> configuration changes,
> adding addresses quickly slows down, as each added address has to pay
> for its predecesors. (all of our addresses are static)
> 
> On the reverse, when the VLANs' parent link goes down, the VLANs
> go down with it, causing us to pay for a lot of addrconf_verify_rtnl calls,
> during which rtnl_lock is held for a single long stretch of time.
> 
> I've ran some perf on an upatched kernel to demonstrate it:
> 
> 	https://github.com/gnaaman-dn/perf-addrconf-verify-rtnl
> 
> Turned out to be 13% of time when creating static addresses, and 18%
> when flushing them.
> 
> (In our original bug the VLANs were deleted, it is just easier to perf
> one iproute command if it's a flush)

Nice, so you already have the test infra ready :)

>>> @@ -3148,7 +3164,6 @@ static int inet6_addr_del(struct net *net, int ifindex, u32 ifa_flags,
>>>  			    (ifp->flags & IFA_F_MANAGETEMPADDR))
>>>  				delete_tempaddrs(idev, ifp);
>>>  
>>> -			addrconf_verify_rtnl(net);
>>
>> With an additional 'addrconf_perishable' check here protecting the (here
>> removed) addrconf_verify_rtnl(), the patch will be IMHO much less prone
>> to unintended side-effects.
> 
> I hope my explanation will be convincing that that is not needed.
> (or just coherent enough to point out a mistake in my understanding)
> 
> If not, I will send V3 with this condition added, as in practice most
> of our addresses are static.

From my PoV, the main trouble is that this kind of change has the
potential of breaking things in subtle way. Your explanation above makes
sense to me, but I have the gut feeling I'm missing something.

A more verbose description will help for future memory.

Making the patch more straight-forward will give IMHO additional assurances.

A self-test could help tracking/detecting side-effects introduced here.

I think at least one of the above is needed, the more the better ;) Note
that the more verbose description could come in a form of a Link tag
pointing to this conversation.

Thanks,

Paolo
Gilad Naaman Dec. 18, 2024, 6:57 a.m. UTC | #5
> >>> But calling "verify" immediately will result in a performance hit when
> >>> deleting many addresses.
> >>
> >> Since this is about (control plane) performances, please include the
> >> relevant test details (or even better, please add a small/fast self-test
> >> covering the use-case).
> > 
> > Is it common to add scale-test to selftests?
> 
> AFAIK, not common at all. Note that the argument "so self-test for this
> kind of thing" is actually a very good argument to add a self-tests.

Sorry, I didn't mean to present it as an argument.
I wanted a clarification if you inteded me to write a self-test to check
functionality, or performance.

And if the latter, what are we going to set as the pass-criteria for the test,
seeing that the test may run on variety of hardwares/VMs.

No objection to writing it, of course.

> > (In our original bug the VLANs were deleted, it is just easier to perf
> > one iproute command if it's a flush)
> 
> Nice, so you already have the test infra ready :)
Wouldn't call it a test infra ^^', I have a script that calls `ip` in a loop.

Thanks,
Gilad
diff mbox series

Patch

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index c489a1e6aec9..893502787554 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -310,6 +310,16 @@  static inline bool addrconf_link_ready(const struct net_device *dev)
 	return netif_oper_up(dev) && !qdisc_tx_is_noop(dev);
 }
 
+static inline bool addrconf_perishable(int ifa_flags, u32 prefered_lft)
+{
+	/* When setting preferred_lft to a value not zero or
+	 * infinity, while valid_lft is infinity
+	 * IFA_F_PERMANENT has a non-infinity life time.
+	 */
+	return !((ifa_flags & IFA_F_PERMANENT) &&
+		(prefered_lft == INFINITY_LIFE_TIME));
+}
+
 static void addrconf_del_rs_timer(struct inet6_dev *idev)
 {
 	if (del_timer(&idev->rs_timer))
@@ -3090,8 +3100,7 @@  static int inet6_addr_add(struct net *net, int ifindex,
 		 */
 		if (!(ifp->flags & (IFA_F_OPTIMISTIC | IFA_F_NODAD)))
 			ipv6_ifa_notify(0, ifp);
-		/*
-		 * Note that section 3.1 of RFC 4429 indicates
+		/* Note that section 3.1 of RFC 4429 indicates
 		 * that the Optimistic flag should not be set for
 		 * manually configured addresses
 		 */
@@ -3100,7 +3109,14 @@  static int inet6_addr_add(struct net *net, int ifindex,
 			manage_tempaddrs(idev, ifp, cfg->valid_lft,
 					 cfg->preferred_lft, true, jiffies);
 		in6_ifa_put(ifp);
-		addrconf_verify_rtnl(net);
+
+		/* Verify only if it's possible that adding this address
+		 * may modify the worker expiration time.
+		 */
+		if ((cfg->ifa_flags & IFA_F_MANAGETEMPADDR) ||
+		    addrconf_perishable(cfg->ifa_flags, cfg->preferred_lft))
+			addrconf_verify_rtnl(net);
+
 		return 0;
 	} else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
 		ipv6_mc_config(net->ipv6.mc_autojoin_sk, false,
@@ -3148,7 +3164,6 @@  static int inet6_addr_del(struct net *net, int ifindex, u32 ifa_flags,
 			    (ifp->flags & IFA_F_MANAGETEMPADDR))
 				delete_tempaddrs(idev, ifp);
 
-			addrconf_verify_rtnl(net);
 			if (ipv6_addr_is_multicast(pfx)) {
 				ipv6_mc_config(net->ipv6.mc_autojoin_sk,
 					       false, pfx, dev->ifindex);
@@ -4645,12 +4660,7 @@  static void addrconf_verify_rtnl(struct net *net)
 		hlist_for_each_entry_rcu_bh(ifp, &net->ipv6.inet6_addr_lst[i], addr_lst) {
 			unsigned long age;
 
-			/* When setting preferred_lft to a value not zero or
-			 * infinity, while valid_lft is infinity
-			 * IFA_F_PERMANENT has a non-infinity life time.
-			 */
-			if ((ifp->flags & IFA_F_PERMANENT) &&
-			    (ifp->prefered_lft == INFINITY_LIFE_TIME))
+			if (!addrconf_perishable(ifp->flags, ifp->prefered_lft))
 				continue;
 
 			spin_lock(&ifp->lock);
@@ -4979,7 +4989,9 @@  static int inet6_addr_modify(struct net *net, struct inet6_ifaddr *ifp,
 					 jiffies);
 	}
 
-	addrconf_verify_rtnl(net);
+	if (was_managetempaddr ||
+	    addrconf_perishable(cfg->ifa_flags, cfg->preferred_lft))
+		addrconf_verify_rtnl(net);
 
 	return 0;
 }