Message ID | 20241205171248.3958156-1-gnaaman@drivenets.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next,v2] Do not invoke addrconf_verify_rtnl unnecessarily | expand |
On 12/5/24 18:12, Gilad Naaman wrote: > Do not invoke costly `addrconf_verify_rtnl` if the added address > wouldn't need it, or affect the delayed_work timer. > > For new/modified addresses, call "verify" only if added address has an > expiration, or if any temporary address was created. > > This is done to account for a case where the new expiration time might > be sooner than the current delayed_work's expiration. > > For deleted addresses, avoid calling verify at all: > > If the address being deleted is not perishable, and thus does not affect > the delayed_work expiration, there is not point in going over the entire > table. > > If the address IS perishable, but is not the soonest-to-be-expired > address, calling "verify" would not change the expiration, and would be > a very expensive nop. > > If the address IS perishable, and IS the soonest-to-be-expired address, > calling or not-calling "verify" for a single address deletion is > equivalent in cost. This last statement is not obvious to me, could you please expand the reasoning? > > But calling "verify" immediately will result in a performance hit when > deleting many addresses. Since this is about (control plane) performances, please include the relevant test details (or even better, please add a small/fast self-test covering the use-case). > > Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> > --- > net/ipv6/addrconf.c | 34 +++++++++++++++++++++++----------- > 1 file changed, 23 insertions(+), 11 deletions(-) > > diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c > index c489a1e6aec9..893502787554 100644 > --- a/net/ipv6/addrconf.c > +++ b/net/ipv6/addrconf.c > @@ -310,6 +310,16 @@ static inline bool addrconf_link_ready(const struct net_device *dev) > return netif_oper_up(dev) && !qdisc_tx_is_noop(dev); > } > > +static inline bool addrconf_perishable(int ifa_flags, u32 prefered_lft) Please, do not use 'inline' in c files. > +{ > + /* When setting preferred_lft to a value not zero or > + * infinity, while valid_lft is infinity > + * IFA_F_PERMANENT has a non-infinity life time. > + */ > + return !((ifa_flags & IFA_F_PERMANENT) && > + (prefered_lft == INFINITY_LIFE_TIME)); Minor nit: the intentation is wrong should be: return !((ifa_flags & IFA_F_PERMANENT) && (prefered_lft == INFINITY_LIFE_TIME)); > +} > + > static void addrconf_del_rs_timer(struct inet6_dev *idev) > { > if (del_timer(&idev->rs_timer)) > @@ -3090,8 +3100,7 @@ static int inet6_addr_add(struct net *net, int ifindex, > */ > if (!(ifp->flags & (IFA_F_OPTIMISTIC | IFA_F_NODAD))) > ipv6_ifa_notify(0, ifp); > - /* > - * Note that section 3.1 of RFC 4429 indicates > + /* Note that section 3.1 of RFC 4429 indicates > * that the Optimistic flag should not be set for > * manually configured addresses > */ > @@ -3100,7 +3109,14 @@ static int inet6_addr_add(struct net *net, int ifindex, > manage_tempaddrs(idev, ifp, cfg->valid_lft, > cfg->preferred_lft, true, jiffies); > in6_ifa_put(ifp); > - addrconf_verify_rtnl(net); > + > + /* Verify only if it's possible that adding this address > + * may modify the worker expiration time. > + */ > + if ((cfg->ifa_flags & IFA_F_MANAGETEMPADDR) || > + addrconf_perishable(cfg->ifa_flags, cfg->preferred_lft)) > + addrconf_verify_rtnl(net); > + > return 0; > } else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) { > ipv6_mc_config(net->ipv6.mc_autojoin_sk, false, > @@ -3148,7 +3164,6 @@ static int inet6_addr_del(struct net *net, int ifindex, u32 ifa_flags, > (ifp->flags & IFA_F_MANAGETEMPADDR)) > delete_tempaddrs(idev, ifp); > > - addrconf_verify_rtnl(net); With an additional 'addrconf_perishable' check here protecting the (here removed) addrconf_verify_rtnl(), the patch will be IMHO much less prone to unintended side-effects. Thanks, Paolo
On 12/5/24 18:12, Gilad Naaman wrote: > @@ -3090,8 +3100,7 @@ static int inet6_addr_add(struct net *net, int ifindex, > */ > if (!(ifp->flags & (IFA_F_OPTIMISTIC | IFA_F_NODAD))) > ipv6_ifa_notify(0, ifp); > - /* > - * Note that section 3.1 of RFC 4429 indicates > + /* Note that section 3.1 of RFC 4429 indicates > * that the Optimistic flag should not be set for > * manually configured addresses > */ I almost forgot: the above looks like purely cosmetic/unrelated. Additionally, the 'net' and 'net-next' tree are now accepting the comment format you update above, see 82b8000c28b56b014ce52a1f1581bef4af148681. Please drop this chunk. Thanks, Paolo
> > If the address IS perishable, and IS the soonest-to-be-expired address, > > calling or not-calling "verify" for a single address deletion is > > equivalent in cost. > > This last statement is not obvious to me, could you please expand the > reasoning? Sorry, it does seem a bit vague when I am re-reading it now. What I meant is that calling addrconf_verify_rtnl when no upkeep needs to be done has some cost K (in seconds) which is roughly a function of the total amount of addresses. Let's say you've configured some addresses, 4 of which are perishable: | T0-+- <---We are here | | T1-+- A <---Timer | | | T2-+- B | | | T3-+- C | | | T4-+- D | | | v The timer is scheduled to expire in T1, because this is when address A perishes. If you delete a non-perishable address, running addrconf_verify_rtnl is redundant, since it won't change the fact that the timer expires in T1. If you delete A specifically, which is the cause of scheduling the timer to T1, you have 2 options: 1. Pay K now, in T0, to reschedule the timer to T2 2. Pay nothing now, let the timer expire, pay the K in T1, and then reschedule If we're talking about a deleting A, it seems equivalent in cost and result. Either way, exactly one K is paid, and the time eventually gets rescheduled to T2. If we're talking about deleting an arbitrary address, using option 2 is better, since you do not lose functionaility, but you might be saving some Ks. (If you deleted B, the timer won't be rescheduled anyway) If we're talking about deleting multiple/many address in a short time, option 2 is greatly preferable, since paying K for each address can get costly as the hash-table grows. > > > > But calling "verify" immediately will result in a performance hit when > > deleting many addresses. > > Since this is about (control plane) performances, please include the > relevant test details (or even better, please add a small/fast self-test > covering the use-case). Is it common to add scale-test to selftests? From my limited experience these tend to fail in automation for no good reason, though I feel I may have misunderstood the text in parens. I've added a link to the benchmark below. Regarding the original test case: We're developing a core-router and trying to scale-up to around 12K VLANs. Considering GUA+LLA this means 24K address in this table. In practice it's a bit more than that, due to other interfaces in the same namespace. This makes addrconf_verify_rtnl very very very expensive for us. When initially setting the system up after boot, or when applying big configuration changes, adding addresses quickly slows down, as each added address has to pay for its predecesors. (all of our addresses are static) On the reverse, when the VLANs' parent link goes down, the VLANs go down with it, causing us to pay for a lot of addrconf_verify_rtnl calls, during which rtnl_lock is held for a single long stretch of time. I've ran some perf on an upatched kernel to demonstrate it: https://github.com/gnaaman-dn/perf-addrconf-verify-rtnl Turned out to be 13% of time when creating static addresses, and 18% when flushing them. (In our original bug the VLANs were deleted, it is just easier to perf one iproute command if it's a flush) > > @@ -3148,7 +3164,6 @@ static int inet6_addr_del(struct net *net, int ifindex, u32 ifa_flags, > > (ifp->flags & IFA_F_MANAGETEMPADDR)) > > delete_tempaddrs(idev, ifp); > > > > - addrconf_verify_rtnl(net); > > With an additional 'addrconf_perishable' check here protecting the (here > removed) addrconf_verify_rtnl(), the patch will be IMHO much less prone > to unintended side-effects. I hope my explanation will be convincing that that is not needed. (or just coherent enough to point out a mistake in my understanding) If not, I will send V3 with this condition added, as in practice most of our addresses are static. Thank you for review, Gilad
On 12/13/24 16:13, Gilad Naaman wrote: >>> If the address IS perishable, and IS the soonest-to-be-expired address, >>> calling or not-calling "verify" for a single address deletion is >>> equivalent in cost. >> >> This last statement is not obvious to me, could you please expand the >> reasoning? > > Sorry, it does seem a bit vague when I am re-reading it now. > > What I meant is that calling addrconf_verify_rtnl when no upkeep needs to be > done has some cost K (in seconds) which is roughly a function of the > total amount of addresses. > > Let's say you've configured some addresses, 4 of which are perishable: > > | > T0-+- <---We are here > | > | > T1-+- A <---Timer > | > | > | > T2-+- B > | > | > | > T3-+- C > | > | > | > T4-+- D > | > | > | > v > > The timer is scheduled to expire in T1, because this is when address A > perishes. > > If you delete a non-perishable address, running addrconf_verify_rtnl is > redundant, since it won't change the fact that the timer expires in T1. > > If you delete A specifically, which is the cause of scheduling the timer > to T1, you have 2 options: > > 1. Pay K now, in T0, to reschedule the timer to T2 > 2. Pay nothing now, let the timer expire, pay the K in T1, and then reschedule It looks like in this specific case option 1 will be cheaper, as it will avoid an unneeded timer expiration. > If we're talking about a deleting A, it seems equivalent in cost and result. > Either way, exactly one K is paid, and the time eventually gets rescheduled to T2. > > If we're talking about deleting an arbitrary address, using option 2 is > better, since you do not lose functionaility, but you might be saving some > Ks. (If you deleted B, the timer won't be rescheduled anyway) > > If we're talking about deleting multiple/many address in a short time, > option 2 is greatly preferable, since paying K for each address can get > costly as the hash-table grows. Makes sense to me. >>> But calling "verify" immediately will result in a performance hit when >>> deleting many addresses. >> >> Since this is about (control plane) performances, please include the >> relevant test details (or even better, please add a small/fast self-test >> covering the use-case). > > Is it common to add scale-test to selftests? AFAIK, not common at all. Note that the argument "so self-test for this kind of thing" is actually a very good argument to add a self-tests. > From my limited experience these tend to fail in automation for no good reason, > though I feel I may have misunderstood the text in parens. > I've added a link to the benchmark below. > > Regarding the original test case: > > We're developing a core-router and trying to scale-up to around 12K VLANs. > Considering GUA+LLA this means 24K address in this table. > In practice it's a bit more than that, due to other interfaces in the same > namespace. > > This makes addrconf_verify_rtnl very very very expensive for us. > When initially setting the system up after boot, or when applying big > configuration changes, > adding addresses quickly slows down, as each added address has to pay > for its predecesors. (all of our addresses are static) > > On the reverse, when the VLANs' parent link goes down, the VLANs > go down with it, causing us to pay for a lot of addrconf_verify_rtnl calls, > during which rtnl_lock is held for a single long stretch of time. > > I've ran some perf on an upatched kernel to demonstrate it: > > https://github.com/gnaaman-dn/perf-addrconf-verify-rtnl > > Turned out to be 13% of time when creating static addresses, and 18% > when flushing them. > > (In our original bug the VLANs were deleted, it is just easier to perf > one iproute command if it's a flush) Nice, so you already have the test infra ready :) >>> @@ -3148,7 +3164,6 @@ static int inet6_addr_del(struct net *net, int ifindex, u32 ifa_flags, >>> (ifp->flags & IFA_F_MANAGETEMPADDR)) >>> delete_tempaddrs(idev, ifp); >>> >>> - addrconf_verify_rtnl(net); >> >> With an additional 'addrconf_perishable' check here protecting the (here >> removed) addrconf_verify_rtnl(), the patch will be IMHO much less prone >> to unintended side-effects. > > I hope my explanation will be convincing that that is not needed. > (or just coherent enough to point out a mistake in my understanding) > > If not, I will send V3 with this condition added, as in practice most > of our addresses are static. From my PoV, the main trouble is that this kind of change has the potential of breaking things in subtle way. Your explanation above makes sense to me, but I have the gut feeling I'm missing something. A more verbose description will help for future memory. Making the patch more straight-forward will give IMHO additional assurances. A self-test could help tracking/detecting side-effects introduced here. I think at least one of the above is needed, the more the better ;) Note that the more verbose description could come in a form of a Link tag pointing to this conversation. Thanks, Paolo
> >>> But calling "verify" immediately will result in a performance hit when > >>> deleting many addresses. > >> > >> Since this is about (control plane) performances, please include the > >> relevant test details (or even better, please add a small/fast self-test > >> covering the use-case). > > > > Is it common to add scale-test to selftests? > > AFAIK, not common at all. Note that the argument "so self-test for this > kind of thing" is actually a very good argument to add a self-tests. Sorry, I didn't mean to present it as an argument. I wanted a clarification if you inteded me to write a self-test to check functionality, or performance. And if the latter, what are we going to set as the pass-criteria for the test, seeing that the test may run on variety of hardwares/VMs. No objection to writing it, of course. > > (In our original bug the VLANs were deleted, it is just easier to perf > > one iproute command if it's a flush) > > Nice, so you already have the test infra ready :) Wouldn't call it a test infra ^^', I have a script that calls `ip` in a loop. Thanks, Gilad
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index c489a1e6aec9..893502787554 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -310,6 +310,16 @@ static inline bool addrconf_link_ready(const struct net_device *dev) return netif_oper_up(dev) && !qdisc_tx_is_noop(dev); } +static inline bool addrconf_perishable(int ifa_flags, u32 prefered_lft) +{ + /* When setting preferred_lft to a value not zero or + * infinity, while valid_lft is infinity + * IFA_F_PERMANENT has a non-infinity life time. + */ + return !((ifa_flags & IFA_F_PERMANENT) && + (prefered_lft == INFINITY_LIFE_TIME)); +} + static void addrconf_del_rs_timer(struct inet6_dev *idev) { if (del_timer(&idev->rs_timer)) @@ -3090,8 +3100,7 @@ static int inet6_addr_add(struct net *net, int ifindex, */ if (!(ifp->flags & (IFA_F_OPTIMISTIC | IFA_F_NODAD))) ipv6_ifa_notify(0, ifp); - /* - * Note that section 3.1 of RFC 4429 indicates + /* Note that section 3.1 of RFC 4429 indicates * that the Optimistic flag should not be set for * manually configured addresses */ @@ -3100,7 +3109,14 @@ static int inet6_addr_add(struct net *net, int ifindex, manage_tempaddrs(idev, ifp, cfg->valid_lft, cfg->preferred_lft, true, jiffies); in6_ifa_put(ifp); - addrconf_verify_rtnl(net); + + /* Verify only if it's possible that adding this address + * may modify the worker expiration time. + */ + if ((cfg->ifa_flags & IFA_F_MANAGETEMPADDR) || + addrconf_perishable(cfg->ifa_flags, cfg->preferred_lft)) + addrconf_verify_rtnl(net); + return 0; } else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) { ipv6_mc_config(net->ipv6.mc_autojoin_sk, false, @@ -3148,7 +3164,6 @@ static int inet6_addr_del(struct net *net, int ifindex, u32 ifa_flags, (ifp->flags & IFA_F_MANAGETEMPADDR)) delete_tempaddrs(idev, ifp); - addrconf_verify_rtnl(net); if (ipv6_addr_is_multicast(pfx)) { ipv6_mc_config(net->ipv6.mc_autojoin_sk, false, pfx, dev->ifindex); @@ -4645,12 +4660,7 @@ static void addrconf_verify_rtnl(struct net *net) hlist_for_each_entry_rcu_bh(ifp, &net->ipv6.inet6_addr_lst[i], addr_lst) { unsigned long age; - /* When setting preferred_lft to a value not zero or - * infinity, while valid_lft is infinity - * IFA_F_PERMANENT has a non-infinity life time. - */ - if ((ifp->flags & IFA_F_PERMANENT) && - (ifp->prefered_lft == INFINITY_LIFE_TIME)) + if (!addrconf_perishable(ifp->flags, ifp->prefered_lft)) continue; spin_lock(&ifp->lock); @@ -4979,7 +4989,9 @@ static int inet6_addr_modify(struct net *net, struct inet6_ifaddr *ifp, jiffies); } - addrconf_verify_rtnl(net); + if (was_managetempaddr || + addrconf_perishable(cfg->ifa_flags, cfg->preferred_lft)) + addrconf_verify_rtnl(net); return 0; }
Do not invoke costly `addrconf_verify_rtnl` if the added address wouldn't need it, or affect the delayed_work timer. For new/modified addresses, call "verify" only if added address has an expiration, or if any temporary address was created. This is done to account for a case where the new expiration time might be sooner than the current delayed_work's expiration. For deleted addresses, avoid calling verify at all: If the address being deleted is not perishable, and thus does not affect the delayed_work expiration, there is not point in going over the entire table. If the address IS perishable, but is not the soonest-to-be-expired address, calling "verify" would not change the expiration, and would be a very expensive nop. If the address IS perishable, and IS the soonest-to-be-expired address, calling or not-calling "verify" for a single address deletion is equivalent in cost. But calling "verify" immediately will result in a performance hit when deleting many addresses. Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> --- net/ipv6/addrconf.c | 34 +++++++++++++++++++++++----------- 1 file changed, 23 insertions(+), 11 deletions(-)