diff mbox series

[net-next,v2] ipv6: fix locking issues with loops over idev->addr_list

Message ID 20220403231523.45843-1-dossche.niels@gmail.com (mailing list archive)
State Accepted
Commit 51454ea42c1ab4e0c2828bb0d4d53957976980de
Delegated to: Netdev Maintainers
Headers show
Series [net-next,v2] ipv6: fix locking issues with loops over idev->addr_list | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 2136 this patch: 2136
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 564 this patch: 564
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 2261 this patch: 2261
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 96 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Niels Dossche April 3, 2022, 11:15 p.m. UTC
idev->addr_list needs to be protected by idev->lock. However, it is not
always possible to do so while iterating and performing actions on
inet6_ifaddr instances. For example, multiple functions (like
addrconf_{join,leave}_anycast) eventually call down to other functions
that acquire the idev->lock. The current code temporarily unlocked the
idev->lock during the loops, which can cause race conditions. Moving the
locks up is also not an appropriate solution as the ordering of lock
acquisition will be inconsistent with for example mc_lock.

This solution adds an additional field to inet6_ifaddr that is used
to temporarily add the instances to a temporary list while holding
idev->lock. The temporary list can then be traversed without holding
idev->lock. This change was done in two places. In addrconf_ifdown, the
list_for_each_entry_safe variant of the list loop is also no longer
necessary as there is no deletion within that specific loop.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
---

Changes in v2:
 - Applied the code style suggestions
 - Made sure the lines don't exceed 80 chars such that checkpatch
   becomes happy

 include/net/if_inet6.h |  8 ++++++++
 net/ipv6/addrconf.c    | 30 ++++++++++++++++++++++++------
 2 files changed, 32 insertions(+), 6 deletions(-)

Comments

Andrew Lunn April 4, 2022, 12:47 p.m. UTC | #1
On Mon, Apr 04, 2022 at 01:15:24AM +0200, Niels Dossche wrote:
> idev->addr_list needs to be protected by idev->lock. However, it is not
> always possible to do so while iterating and performing actions on
> inet6_ifaddr instances. For example, multiple functions (like
> addrconf_{join,leave}_anycast) eventually call down to other functions
> that acquire the idev->lock. The current code temporarily unlocked the
> idev->lock during the loops, which can cause race conditions. Moving the
> locks up is also not an appropriate solution as the ordering of lock
> acquisition will be inconsistent with for example mc_lock.

Hi Niels

What sort of issues could the race result in?

I've been chasing a netdev reference leak, when using the GNS3
simulator. Shutting down the system can result in one interface having
a netdev reference count of 5, and it never gets destroyed. Using the
tracker code Eric recently added, i found one of the leaks is idev,
its reference count does not go to 0, and hence the reference it holds
on the netdev is never released.

I will test this patch out, see if it helps, but i'm just wondering if
you think the issue i'm seeing is theoretically possible because of
this race? If it is, we might want this applied to stable, not just
net-next.

Thanks
	Andrew
Niels Dossche April 4, 2022, 1:57 p.m. UTC | #2
On 04/04/2022 14:47, Andrew Lunn wrote:
> On Mon, Apr 04, 2022 at 01:15:24AM +0200, Niels Dossche wrote:
>> idev->addr_list needs to be protected by idev->lock. However, it is not
>> always possible to do so while iterating and performing actions on
>> inet6_ifaddr instances. For example, multiple functions (like
>> addrconf_{join,leave}_anycast) eventually call down to other functions
>> that acquire the idev->lock. The current code temporarily unlocked the
>> idev->lock during the loops, which can cause race conditions. Moving the
>> locks up is also not an appropriate solution as the ordering of lock
>> acquisition will be inconsistent with for example mc_lock.
> 
> Hi Niels
> 
> What sort of issues could the race result in?

Hi Andrew

The issue is that the protection of the address list is lifted inside of the loop for a brief moment.
Therefore, the looping over the list loses its atomicity.
I believe the list's entries might become corrupted in case of a race occurring.

> 
> I've been chasing a netdev reference leak, when using the GNS3
> simulator. Shutting down the system can result in one interface having
> a netdev reference count of 5, and it never gets destroyed. Using the
> tracker code Eric recently added, i found one of the leaks is idev,
> its reference count does not go to 0, and hence the reference it holds
> on the netdev is never released.> 
> I will test this patch out, see if it helps, but i'm just wondering if
> you think the issue i'm seeing is theoretically possible because of
> this race? If it is, we might want this applied to stable, not just
> net-next.

I am not sure, but I believe that it may be related, although I believe it would be unlikely to happen.
In your case, it could be because of this non-atomic handling of the list entries:
this could perhaps, for example, result in skipping an instance of ifaddr in the loop if
there is another change happening to the list in the meantime. Then the instance would've never been put,
hence not changing its refcount. But again, I'm not sure about this for your case.

> 
> Thanks
> 	Andrew

Kind regards
Niels
patchwork-bot+netdevbpf@kernel.org April 7, 2022, 5:50 a.m. UTC | #3
Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon,  4 Apr 2022 01:15:24 +0200 you wrote:
> idev->addr_list needs to be protected by idev->lock. However, it is not
> always possible to do so while iterating and performing actions on
> inet6_ifaddr instances. For example, multiple functions (like
> addrconf_{join,leave}_anycast) eventually call down to other functions
> that acquire the idev->lock. The current code temporarily unlocked the
> idev->lock during the loops, which can cause race conditions. Moving the
> locks up is also not an appropriate solution as the ordering of lock
> acquisition will be inconsistent with for example mc_lock.
> 
> [...]

Here is the summary with links:
  - [net-next,v2] ipv6: fix locking issues with loops over idev->addr_list
    https://git.kernel.org/netdev/net-next/c/51454ea42c1a

You are awesome, thank you!
diff mbox series

Patch

diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index 4cfdef6ca4f6..c8490729b4ae 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -64,6 +64,14 @@  struct inet6_ifaddr {
 
 	struct hlist_node	addr_lst;
 	struct list_head	if_list;
+	/*
+	 * Used to safely traverse idev->addr_list in process context
+	 * if the idev->lock needed to protect idev->addr_list cannot be held.
+	 * In that case, add the items to this list temporarily and iterate
+	 * without holding idev->lock.
+	 * See addrconf_ifdown and dev_forward_change.
+	 */
+	struct list_head	if_list_aux;
 
 	struct list_head	tmp_list;
 	struct inet6_ifaddr	*ifpub;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index b22504176588..1afc4c024981 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -797,6 +797,7 @@  static void dev_forward_change(struct inet6_dev *idev)
 {
 	struct net_device *dev;
 	struct inet6_ifaddr *ifa;
+	LIST_HEAD(tmp_addr_list);
 
 	if (!idev)
 		return;
@@ -815,14 +816,24 @@  static void dev_forward_change(struct inet6_dev *idev)
 		}
 	}
 
+	read_lock_bh(&idev->lock);
 	list_for_each_entry(ifa, &idev->addr_list, if_list) {
 		if (ifa->flags&IFA_F_TENTATIVE)
 			continue;
+		list_add_tail(&ifa->if_list_aux, &tmp_addr_list);
+	}
+	read_unlock_bh(&idev->lock);
+
+	while (!list_empty(&tmp_addr_list)) {
+		ifa = list_first_entry(&tmp_addr_list,
+				       struct inet6_ifaddr, if_list_aux);
+		list_del(&ifa->if_list_aux);
 		if (idev->cnf.forwarding)
 			addrconf_join_anycast(ifa);
 		else
 			addrconf_leave_anycast(ifa);
 	}
+
 	inet6_netconf_notify_devconf(dev_net(dev), RTM_NEWNETCONF,
 				     NETCONFA_FORWARDING,
 				     dev->ifindex, &idev->cnf);
@@ -3728,7 +3739,8 @@  static int addrconf_ifdown(struct net_device *dev, bool unregister)
 	unsigned long event = unregister ? NETDEV_UNREGISTER : NETDEV_DOWN;
 	struct net *net = dev_net(dev);
 	struct inet6_dev *idev;
-	struct inet6_ifaddr *ifa, *tmp;
+	struct inet6_ifaddr *ifa;
+	LIST_HEAD(tmp_addr_list);
 	bool keep_addr = false;
 	bool was_ready;
 	int state, i;
@@ -3820,16 +3832,23 @@  static int addrconf_ifdown(struct net_device *dev, bool unregister)
 		write_lock_bh(&idev->lock);
 	}
 
-	list_for_each_entry_safe(ifa, tmp, &idev->addr_list, if_list) {
+	list_for_each_entry(ifa, &idev->addr_list, if_list)
+		list_add_tail(&ifa->if_list_aux, &tmp_addr_list);
+	write_unlock_bh(&idev->lock);
+
+	while (!list_empty(&tmp_addr_list)) {
 		struct fib6_info *rt = NULL;
 		bool keep;
 
+		ifa = list_first_entry(&tmp_addr_list,
+				       struct inet6_ifaddr, if_list_aux);
+		list_del(&ifa->if_list_aux);
+
 		addrconf_del_dad_work(ifa);
 
 		keep = keep_addr && (ifa->flags & IFA_F_PERMANENT) &&
 			!addr_is_local(&ifa->addr);
 
-		write_unlock_bh(&idev->lock);
 		spin_lock_bh(&ifa->lock);
 
 		if (keep) {
@@ -3860,15 +3879,14 @@  static int addrconf_ifdown(struct net_device *dev, bool unregister)
 			addrconf_leave_solict(ifa->idev, &ifa->addr);
 		}
 
-		write_lock_bh(&idev->lock);
 		if (!keep) {
+			write_lock_bh(&idev->lock);
 			list_del_rcu(&ifa->if_list);
+			write_unlock_bh(&idev->lock);
 			in6_ifa_put(ifa);
 		}
 	}
 
-	write_unlock_bh(&idev->lock);
-
 	/* Step 5: Discard anycast and multicast list */
 	if (unregister) {
 		ipv6_ac_destroy_dev(idev);