diff mbox series

[net] ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr

Message ID 8bbe1218656e66552ff28cbee8c7d1f0ffd8e9fd.1712314149.git.jbenc@redhat.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1354 this patch: 1354
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 3 maintainers not CCed: edumazet@google.com kuba@kernel.org pabeni@redhat.com
netdev/build_clang success Errors and warnings before: 956 this patch: 956
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1416 this patch: 1416
netdev/checkpatch warning WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: 5c578aedcb21 ("IPv6: convert addrconf hash list to RCU")'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 5 this patch: 5
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-04-07--00-00 (tests: 956)

Commit Message

Jiri Benc April 5, 2024, 10:54 a.m. UTC
Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it
still means hlist_for_each_entry_rcu can return an item that got removed
from the list. The memory itself of such item is not freed thanks to RCU
but nothing guarantees the actual content of the memory is sane.

In particular, the reference count can be zero. This can happen if
ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry
from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all
references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough
timing, this can happen:

1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry.

2. Then, the whole ipv6_del_addr is executed for the given entry. The
   reference count drops to zero and kfree_rcu is scheduled.

3. ipv6_get_ifaddr continues and increments the reference count
   (in6_ifa_hold).

4. The rcu is unlocked and the entry is freed.

5. Later, the reference count is dropped to zero (again) and kfree_rcu
   is scheduled (again).

Prevent increasing of the reference count in such case. The name
in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe.

Fixes: 5c578aedcb21d ("IPv6: convert addrconf hash list to RCU")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---

Side note: While this fixes one bug, there may be more locking bugs
lurking aroung inet6_ifaddr. The semantics of locking of inet6_ifaddr is
wild and fragile. Some of the fields are freed in ipv6_del_addr and
guarded by ifa->state == INET6_IFADDR_STATE_DEAD and RTNL. Some of the
fields are freed in inet6_ifa_finish_destroy and guarded by ifa->refcnt
and RCU. Needless to say, this semantics is undocumented. Worse,
ifa->state guard may not be enough. For example, ipv6_get_ifaddr can
still return an entry that proceeded through ipv6_del_addr, which means
ifa->state is INET6_IFADDR_STATE_DEAD. However, at least some callers
(e.g. ndisc_recv_ns) seem to change ifa->state to something else. As
another example, ipv6_del_addr relies on ifa->flags, which are changed
throughout the code without RTNL. All of this may be okay but it's far
from clear.
---
 include/net/addrconf.h | 4 ++++
 net/ipv6/addrconf.c    | 7 ++++---
 2 files changed, 8 insertions(+), 3 deletions(-)

Comments

Eric Dumazet April 5, 2024, 11:31 a.m. UTC | #1
On 4/5/24 12:54, Jiri Benc wrote:
> Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it
> still means hlist_for_each_entry_rcu can return an item that got removed
> from the list. The memory itself of such item is not freed thanks to RCU
> but nothing guarantees the actual content of the memory is sane.
>
> In particular, the reference count can be zero. This can happen if
> ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry
> from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all
> references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough
> timing, this can happen:
>
> 1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry.
>
> 2. Then, the whole ipv6_del_addr is executed for the given entry. The
>     reference count drops to zero and kfree_rcu is scheduled.
>
> 3. ipv6_get_ifaddr continues and increments the reference count
>     (in6_ifa_hold).
>
> 4. The rcu is unlocked and the entry is freed.
>
> 5. Later, the reference count is dropped to zero (again) and kfree_rcu
>     is scheduled (again).

refcount_t semantic should prevent this double transition to 0  ?

Can you include a stack trace in the changelog ?

Otherwise patch looks good to me, thanks !

Reviewed-by: Eric Dumazet <edumazet@google.com>


>
> Prevent increasing of the reference count in such case. The name
> in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe.
>
> Fixes: 5c578aedcb21d ("IPv6: convert addrconf hash list to RCU")
> Signed-off-by: Jiri Benc <jbenc@redhat.com>
> ---
>
> Side note: While this fixes one bug, there may be more locking bugs
> lurking aroung inet6_ifaddr. The semantics of locking of inet6_ifaddr is
> wild and fragile. Some of the fields are freed in ipv6_del_addr and
> guarded by ifa->state == INET6_IFADDR_STATE_DEAD and RTNL. Some of the
> fields are freed in inet6_ifa_finish_destroy and guarded by ifa->refcnt
> and RCU. Needless to say, this semantics is undocumented. Worse,
> ifa->state guard may not be enough. For example, ipv6_get_ifaddr can
> still return an entry that proceeded through ipv6_del_addr, which means
> ifa->state is INET6_IFADDR_STATE_DEAD. However, at least some callers
> (e.g. ndisc_recv_ns) seem to change ifa->state to something else. As
> another example, ipv6_del_addr relies on ifa->flags, which are changed
> throughout the code without RTNL. All of this may be okay but it's far
> from clear.
> ---
>   include/net/addrconf.h | 4 ++++
>   net/ipv6/addrconf.c    | 7 ++++---
>   2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/addrconf.h b/include/net/addrconf.h
> index 9d06eb945509..62a407db1bf5 100644
> --- a/include/net/addrconf.h
> +++ b/include/net/addrconf.h
> @@ -438,6 +438,10 @@ static inline void in6_ifa_hold(struct inet6_ifaddr *ifp)
>   	refcount_inc(&ifp->refcnt);
>   }
>   
> +static inline bool in6_ifa_hold_safe(struct inet6_ifaddr *ifp)
> +{
> +	return refcount_inc_not_zero(&ifp->refcnt);
> +}
>   
>   /*
>    *	compute link-local solicited-node multicast address
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 92db9b474f2b..779aa6ecdd49 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2091,9 +2091,10 @@ struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net, const struct in6_addr *add
>   		if (ipv6_addr_equal(&ifp->addr, addr)) {
>   			if (!dev || ifp->idev->dev == dev ||
>   			    !(ifp->scope&(IFA_LINK|IFA_HOST) || strict)) {
> -				result = ifp;
> -				in6_ifa_hold(ifp);
> -				break;
> +				if (in6_ifa_hold_safe(ifp)) {
> +					result = ifp;
> +					break;
> +				}
>   			}
>   		}
>   	}
Jiri Benc April 5, 2024, 11:47 a.m. UTC | #2
On Fri, 5 Apr 2024 13:31:52 +0200, Eric Dumazet wrote:
> refcount_t semantic should prevent this double transition to 0  ?

You're right. This was found in an old kernel where it manifested
by a crash. In not so ancient kernels there's a refcount_t WARN:

[  202.500654] refcount_t: addition on 0; use-after-free.
[  202.500665] WARNING: CPU: 0 PID: 1327 at lib/refcount.c:25 refcount_warn_saturate+0x74/0x110

I knew about that but I wrote the commit message on the old kernel and
when updating it I missed this spot. Sorry about that.

> Can you include a stack trace in the changelog ?

I don't think I have a stack trace from the latest net kernel; let me
get one.

> Otherwise patch looks good to me, thanks !
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks for the review! v2 with the updated commit message coming on
Monday.

 Jiri
David Ahern April 6, 2024, 7:51 p.m. UTC | #3
On 4/5/24 4:54 AM, Jiri Benc wrote:
> Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it
> still means hlist_for_each_entry_rcu can return an item that got removed
> from the list. The memory itself of such item is not freed thanks to RCU
> but nothing guarantees the actual content of the memory is sane.
> 
> In particular, the reference count can be zero. This can happen if
> ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry
> from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all
> references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough
> timing, this can happen:
> 
> 1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry.
> 
> 2. Then, the whole ipv6_del_addr is executed for the given entry. The
>    reference count drops to zero and kfree_rcu is scheduled.
> 
> 3. ipv6_get_ifaddr continues and increments the reference count
>    (in6_ifa_hold).
> 
> 4. The rcu is unlocked and the entry is freed.
> 
> 5. Later, the reference count is dropped to zero (again) and kfree_rcu
>    is scheduled (again).
> 
> Prevent increasing of the reference count in such case. The name
> in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe.
> 
> Fixes: 5c578aedcb21d ("IPv6: convert addrconf hash list to RCU")
> Signed-off-by: Jiri Benc <jbenc@redhat.com>
> ---
> 
> Side note: While this fixes one bug, there may be more locking bugs
> lurking aroung inet6_ifaddr. The semantics of locking of inet6_ifaddr is
> wild and fragile. Some of the fields are freed in ipv6_del_addr and
> guarded by ifa->state == INET6_IFADDR_STATE_DEAD and RTNL. Some of the
> fields are freed in inet6_ifa_finish_destroy and guarded by ifa->refcnt
> and RCU. Needless to say, this semantics is undocumented. Worse,
> ifa->state guard may not be enough. For example, ipv6_get_ifaddr can
> still return an entry that proceeded through ipv6_del_addr, which means
> ifa->state is INET6_IFADDR_STATE_DEAD. However, at least some callers
> (e.g. ndisc_recv_ns) seem to change ifa->state to something else. As
> another example, ipv6_del_addr relies on ifa->flags, which are changed
> throughout the code without RTNL. All of this may be okay but it's far
> from clear.
> ---
>  include/net/addrconf.h | 4 ++++
>  net/ipv6/addrconf.c    | 7 ++++---
>  2 files changed, 8 insertions(+), 3 deletions(-)
> 


Reviewed-by: David Ahern <dsahern@kernel.org>
diff mbox series

Patch

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 9d06eb945509..62a407db1bf5 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -438,6 +438,10 @@  static inline void in6_ifa_hold(struct inet6_ifaddr *ifp)
 	refcount_inc(&ifp->refcnt);
 }
 
+static inline bool in6_ifa_hold_safe(struct inet6_ifaddr *ifp)
+{
+	return refcount_inc_not_zero(&ifp->refcnt);
+}
 
 /*
  *	compute link-local solicited-node multicast address
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 92db9b474f2b..779aa6ecdd49 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2091,9 +2091,10 @@  struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net, const struct in6_addr *add
 		if (ipv6_addr_equal(&ifp->addr, addr)) {
 			if (!dev || ifp->idev->dev == dev ||
 			    !(ifp->scope&(IFA_LINK|IFA_HOST) || strict)) {
-				result = ifp;
-				in6_ifa_hold(ifp);
-				break;
+				if (in6_ifa_hold_safe(ifp)) {
+					result = ifp;
+					break;
+				}
 			}
 		}
 	}