diff mbox series

[net] nexthop: Fix performance regression in nexthop deletion

Message ID 20201016172914.643282-1-idosch@idosch.org (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers
Headers show
Series [net] nexthop: Fix performance regression in nexthop deletion | expand

Commit Message

Ido Schimmel Oct. 16, 2020, 5:29 p.m. UTC
From: Ido Schimmel <idosch@nvidia.com>

While insertion of 16k nexthops all using the same netdev ('dummy10')
takes less than a second, deletion takes about 130 seconds:

# time -p ip -b nexthop.batch
real 0.29
user 0.01
sys 0.15

# time -p ip link set dev dummy10 down
real 131.03
user 0.06
sys 0.52

This is because of repeated calls to synchronize_rcu() whenever a
nexthop is removed from a nexthop group:

# /usr/share/bcc/tools/offcputime -p `pgrep -nx ip` -K
...
    b'finish_task_switch'
    b'schedule'
    b'schedule_timeout'
    b'wait_for_completion'
    b'__wait_rcu_gp'
    b'synchronize_rcu.part.0'
    b'synchronize_rcu'
    b'__remove_nexthop'
    b'remove_nexthop'
    b'nexthop_flush_dev'
    b'nh_netdev_event'
    b'raw_notifier_call_chain'
    b'call_netdevice_notifiers_info'
    b'__dev_notify_flags'
    b'dev_change_flags'
    b'do_setlink'
    b'__rtnl_newlink'
    b'rtnl_newlink'
    b'rtnetlink_rcv_msg'
    b'netlink_rcv_skb'
    b'rtnetlink_rcv'
    b'netlink_unicast'
    b'netlink_sendmsg'
    b'____sys_sendmsg'
    b'___sys_sendmsg'
    b'__sys_sendmsg'
    b'__x64_sys_sendmsg'
    b'do_syscall_64'
    b'entry_SYSCALL_64_after_hwframe'
    -                ip (277)
        126554955

Since nexthops are always deleted under RTNL, synchronize_net() can be
used instead. It will call synchronize_rcu_expedited() which only blocks
for several microseconds as opposed to multiple milliseconds like
synchronize_rcu().

With this patch deletion of 16k nexthops takes less than a second:

# time -p ip link set dev dummy10 down
real 0.12
user 0.00
sys 0.04

Tested with fib_nexthops.sh which includes torture tests that prompted
the initial change:

# ./fib_nexthops.sh
...
Tests passed: 134
Tests failed:   0

Fixes: 90f33bffa382 ("nexthops: don't modify published nexthop groups")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 net/ipv4/nexthop.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jesse Brandeburg Oct. 16, 2020, 9:46 p.m. UTC | #1
Ido Schimmel wrote:

> From: Ido Schimmel <idosch@nvidia.com>
> 
> While insertion of 16k nexthops all using the same netdev ('dummy10')
> takes less than a second, deletion takes about 130 seconds:
> 
> # time -p ip -b nexthop.batch
> real 0.29
> user 0.01
> sys 0.15
> 
> # time -p ip link set dev dummy10 down
> real 131.03
> user 0.06
> sys 0.52

snip...

> Since nexthops are always deleted under RTNL, synchronize_net() can be
> used instead. It will call synchronize_rcu_expedited() which only blocks
> for several microseconds as opposed to multiple milliseconds like
> synchronize_rcu().
> 
> With this patch deletion of 16k nexthops takes less than a second:
> 
> # time -p ip link set dev dummy10 down
> real 0.12
> user 0.00
> sys 0.04

That's a nice result! Well done! I can't really speak to whether or not
there is any horrible side effect of using synchronize_rcu_expedited(),
but FWIW:

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>


> 
> Tested with fib_nexthops.sh which includes torture tests that prompted
> the initial change:
> 
> # ./fib_nexthops.sh
> ...
> Tests passed: 134
> Tests failed:   0
> 
> Fixes: 90f33bffa382 ("nexthops: don't modify published nexthop groups")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>

Nice fix, good commit message, thanks!

> ---
>  net/ipv4/nexthop.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> index 8c0f17c6863c..0dc43ad28eb9 100644
> --- a/net/ipv4/nexthop.c
> +++ b/net/ipv4/nexthop.c
> @@ -845,7 +845,7 @@ static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
>  		remove_nh_grp_entry(net, nhge, nlinfo);
>  
>  	/* make sure all see the newly published array before releasing rtnl */
> -	synchronize_rcu();
> +	synchronize_net();
>  }
>  
>  static void remove_nexthop_group(struct nexthop *nh, struct nl_info *nlinfo)
David Ahern Oct. 17, 2020, 4:37 a.m. UTC | #2
On 10/16/20 11:29 AM, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> While insertion of 16k nexthops all using the same netdev ('dummy10')
> takes less than a second, deletion takes about 130 seconds:
> 
> # time -p ip -b nexthop.batch
> real 0.29
> user 0.01
> sys 0.15
> 
> # time -p ip link set dev dummy10 down
> real 131.03
> user 0.06
> sys 0.52
> 
> This is because of repeated calls to synchronize_rcu() whenever a
> nexthop is removed from a nexthop group:
> 
> # /usr/share/bcc/tools/offcputime -p `pgrep -nx ip` -K
> ...
>     b'finish_task_switch'
>     b'schedule'
>     b'schedule_timeout'
>     b'wait_for_completion'
>     b'__wait_rcu_gp'
>     b'synchronize_rcu.part.0'
>     b'synchronize_rcu'
>     b'__remove_nexthop'
>     b'remove_nexthop'
>     b'nexthop_flush_dev'
>     b'nh_netdev_event'
>     b'raw_notifier_call_chain'
>     b'call_netdevice_notifiers_info'
>     b'__dev_notify_flags'
>     b'dev_change_flags'
>     b'do_setlink'
>     b'__rtnl_newlink'
>     b'rtnl_newlink'
>     b'rtnetlink_rcv_msg'
>     b'netlink_rcv_skb'
>     b'rtnetlink_rcv'
>     b'netlink_unicast'
>     b'netlink_sendmsg'
>     b'____sys_sendmsg'
>     b'___sys_sendmsg'
>     b'__sys_sendmsg'
>     b'__x64_sys_sendmsg'
>     b'do_syscall_64'
>     b'entry_SYSCALL_64_after_hwframe'
>     -                ip (277)
>         126554955
> 
> Since nexthops are always deleted under RTNL, synchronize_net() can be
> used instead. It will call synchronize_rcu_expedited() which only blocks
> for several microseconds as opposed to multiple milliseconds like
> synchronize_rcu().
> 
> With this patch deletion of 16k nexthops takes less than a second:
> 
> # time -p ip link set dev dummy10 down
> real 0.12
> user 0.00
> sys 0.04
> 
> Tested with fib_nexthops.sh which includes torture tests that prompted
> the initial change:
> 
> # ./fib_nexthops.sh
> ...
> Tests passed: 134
> Tests failed:   0
> 
> Fixes: 90f33bffa382 ("nexthops: don't modify published nexthop groups")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/ipv4/nexthop.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Thanks for finding this, Ido.

Reviewed-by: David Ahern <dsahern@gmail.com>
Nikolay Aleksandrov Oct. 17, 2020, 9:16 a.m. UTC | #3
On Fri, 2020-10-16 at 20:29 +0300, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> While insertion of 16k nexthops all using the same netdev ('dummy10')
> takes less than a second, deletion takes about 130 seconds:
> 
> # time -p ip -b nexthop.batch
> real 0.29
> user 0.01
> sys 0.15
> 
> # time -p ip link set dev dummy10 down
> real 131.03
> user 0.06
> sys 0.52
> 
> This is because of repeated calls to synchronize_rcu() whenever a
> nexthop is removed from a nexthop group:
> 
> # /usr/share/bcc/tools/offcputime -p `pgrep -nx ip` -K
> ...
>     b'finish_task_switch'
>     b'schedule'
>     b'schedule_timeout'
>     b'wait_for_completion'
>     b'__wait_rcu_gp'
>     b'synchronize_rcu.part.0'
>     b'synchronize_rcu'
>     b'__remove_nexthop'
>     b'remove_nexthop'
>     b'nexthop_flush_dev'
>     b'nh_netdev_event'
>     b'raw_notifier_call_chain'
>     b'call_netdevice_notifiers_info'
>     b'__dev_notify_flags'
>     b'dev_change_flags'
>     b'do_setlink'
>     b'__rtnl_newlink'
>     b'rtnl_newlink'
>     b'rtnetlink_rcv_msg'
>     b'netlink_rcv_skb'
>     b'rtnetlink_rcv'
>     b'netlink_unicast'
>     b'netlink_sendmsg'
>     b'____sys_sendmsg'
>     b'___sys_sendmsg'
>     b'__sys_sendmsg'
>     b'__x64_sys_sendmsg'
>     b'do_syscall_64'
>     b'entry_SYSCALL_64_after_hwframe'
>     -                ip (277)
>         126554955
> 
> Since nexthops are always deleted under RTNL, synchronize_net() can be
> used instead. It will call synchronize_rcu_expedited() which only blocks
> for several microseconds as opposed to multiple milliseconds like
> synchronize_rcu().
> 
> With this patch deletion of 16k nexthops takes less than a second:
> 
> # time -p ip link set dev dummy10 down
> real 0.12
> user 0.00
> sys 0.04
> 
> Tested with fib_nexthops.sh which includes torture tests that prompted
> the initial change:
> 
> # ./fib_nexthops.sh
> ...
> Tests passed: 134
> Tests failed:   0
> 
> Fixes: 90f33bffa382 ("nexthops: don't modify published nexthop groups")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/ipv4/nexthop.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 

Looks good.
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Jakub Kicinski Oct. 20, 2020, 3:10 a.m. UTC | #4
On Fri, 16 Oct 2020 20:29:14 +0300 Ido Schimmel wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> While insertion of 16k nexthops all using the same netdev ('dummy10')
> takes less than a second, deletion takes about 130 seconds:

Applied, thank you!
diff mbox series

Patch

diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 8c0f17c6863c..0dc43ad28eb9 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -845,7 +845,7 @@  static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
 		remove_nh_grp_entry(net, nhge, nlinfo);
 
 	/* make sure all see the newly published array before releasing rtnl */
-	synchronize_rcu();
+	synchronize_net();
 }
 
 static void remove_nexthop_group(struct nexthop *nh, struct nl_info *nlinfo)