diff mbox series

[net] nexthop: Fix division by zero while replacing a resilient group

Message ID 20210917130218.560510-1-idosch@idosch.org (mailing list archive)
State Accepted
Delegated to: Netdev Maintainers
Headers show
Series [net] nexthop: Fix division by zero while replacing a resilient group | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers fail 1 blamed authors not CCed: dsahern@kernel.org; 2 maintainers not CCed: yoshfuji@linux-ipv6.org dsahern@kernel.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 2 this patch: 2
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 2 this patch: 2
netdev/header_inline success Link

Commit Message

Ido Schimmel Sept. 17, 2021, 1:02 p.m. UTC
From: Ido Schimmel <idosch@nvidia.com>

The resilient nexthop group torture tests in fib_nexthop.sh exposed a
possible division by zero while replacing a resilient group [1]. The
division by zero occurs when the data path sees a resilient nexthop
group with zero buckets.

The tests replace a resilient nexthop group in a loop while traffic is
forwarded through it. The tests do not specify the number of buckets
while performing the replacement, resulting in the kernel allocating a
stub resilient table (i.e, 'struct nh_res_table') with zero buckets.

This table should never be visible to the data path, but the old nexthop
group (i.e., 'oldg') might still be used by the data path when the stub
table is assigned to it.

Fix this by only assigning the stub table to the old nexthop group after
making sure the group is no longer used by the data path.

Tested with fib_nexthops.sh:

Tests passed: 222
Tests failed:   0

[1]
 divide error: 0000 [#1] PREEMPT SMP KASAN
 CPU: 0 PID: 1850 Comm: ping Not tainted 5.14.0-custom-10271-ga86eb53057fe #1107
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
 RIP: 0010:nexthop_select_path+0x2d2/0x1a80
[...]
 Call Trace:
  fib_select_multipath+0x79b/0x1530
  fib_select_path+0x8fb/0x1c10
  ip_route_output_key_hash_rcu+0x1198/0x2da0
  ip_route_output_key_hash+0x190/0x340
  ip_route_output_flow+0x21/0x120
  raw_sendmsg+0x91d/0x2e10
  inet_sendmsg+0x9e/0xe0
  __sys_sendto+0x23d/0x360
  __x64_sys_sendto+0xe1/0x1b0
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Cc: stable@vger.kernel.org
Fixes: 283a72a5599e ("nexthop: Add implementation of resilient next-hop groups")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
---
 net/ipv4/nexthop.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

David Ahern Sept. 19, 2021, 5:08 p.m. UTC | #1
On 9/17/21 7:02 AM, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> The resilient nexthop group torture tests in fib_nexthop.sh exposed a
> possible division by zero while replacing a resilient group [1]. The
> division by zero occurs when the data path sees a resilient nexthop
> group with zero buckets.
> 
> The tests replace a resilient nexthop group in a loop while traffic is
> forwarded through it. The tests do not specify the number of buckets
> while performing the replacement, resulting in the kernel allocating a
> stub resilient table (i.e, 'struct nh_res_table') with zero buckets.
> 
> This table should never be visible to the data path, but the old nexthop
> group (i.e., 'oldg') might still be used by the data path when the stub
> table is assigned to it.
> 
> Fix this by only assigning the stub table to the old nexthop group after
> making sure the group is no longer used by the data path.
> 
> Tested with fib_nexthops.sh:
> 
> Tests passed: 222
> Tests failed:   0
> 
> [1]
>  divide error: 0000 [#1] PREEMPT SMP KASAN
>  CPU: 0 PID: 1850 Comm: ping Not tainted 5.14.0-custom-10271-ga86eb53057fe #1107
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
>  RIP: 0010:nexthop_select_path+0x2d2/0x1a80
> [...]
>  Call Trace:
>   fib_select_multipath+0x79b/0x1530
>   fib_select_path+0x8fb/0x1c10
>   ip_route_output_key_hash_rcu+0x1198/0x2da0
>   ip_route_output_key_hash+0x190/0x340
>   ip_route_output_flow+0x21/0x120
>   raw_sendmsg+0x91d/0x2e10
>   inet_sendmsg+0x9e/0xe0
>   __sys_sendto+0x23d/0x360
>   __x64_sys_sendto+0xe1/0x1b0
>   do_syscall_64+0x35/0x80
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> Cc: stable@vger.kernel.org
> Fixes: 283a72a5599e ("nexthop: Add implementation of resilient next-hop groups")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> ---
>  net/ipv4/nexthop.c | 2 ++
>  1 file changed, 2 insertions(+)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>
diff mbox series

Patch

diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 75ca4b6e484f..0e75fd3e57b4 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -1982,6 +1982,8 @@  static int replace_nexthop_grp(struct net *net, struct nexthop *old,
 	rcu_assign_pointer(old->nh_grp, newg);
 
 	if (newg->resilient) {
+		/* Make sure concurrent readers are not using 'oldg' anymore. */
+		synchronize_net();
 		rcu_assign_pointer(oldg->res_table, tmp_table);
 		rcu_assign_pointer(oldg->spare->res_table, tmp_table);
 	}