Message ID | 8bbe1218656e66552ff28cbee8c7d1f0ffd8e9fd.1712314149.git.jbenc@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr | expand |
On 4/5/24 12:54, Jiri Benc wrote: > Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it > still means hlist_for_each_entry_rcu can return an item that got removed > from the list. The memory itself of such item is not freed thanks to RCU > but nothing guarantees the actual content of the memory is sane. > > In particular, the reference count can be zero. This can happen if > ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry > from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all > references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough > timing, this can happen: > > 1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry. > > 2. Then, the whole ipv6_del_addr is executed for the given entry. The > reference count drops to zero and kfree_rcu is scheduled. > > 3. ipv6_get_ifaddr continues and increments the reference count > (in6_ifa_hold). > > 4. The rcu is unlocked and the entry is freed. > > 5. Later, the reference count is dropped to zero (again) and kfree_rcu > is scheduled (again). refcount_t semantic should prevent this double transition to 0 ? Can you include a stack trace in the changelog ? Otherwise patch looks good to me, thanks ! Reviewed-by: Eric Dumazet <edumazet@google.com> > > Prevent increasing of the reference count in such case. The name > in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe. > > Fixes: 5c578aedcb21d ("IPv6: convert addrconf hash list to RCU") > Signed-off-by: Jiri Benc <jbenc@redhat.com> > --- > > Side note: While this fixes one bug, there may be more locking bugs > lurking aroung inet6_ifaddr. The semantics of locking of inet6_ifaddr is > wild and fragile. Some of the fields are freed in ipv6_del_addr and > guarded by ifa->state == INET6_IFADDR_STATE_DEAD and RTNL. Some of the > fields are freed in inet6_ifa_finish_destroy and guarded by ifa->refcnt > and RCU. Needless to say, this semantics is undocumented. Worse, > ifa->state guard may not be enough. For example, ipv6_get_ifaddr can > still return an entry that proceeded through ipv6_del_addr, which means > ifa->state is INET6_IFADDR_STATE_DEAD. However, at least some callers > (e.g. ndisc_recv_ns) seem to change ifa->state to something else. As > another example, ipv6_del_addr relies on ifa->flags, which are changed > throughout the code without RTNL. All of this may be okay but it's far > from clear. > --- > include/net/addrconf.h | 4 ++++ > net/ipv6/addrconf.c | 7 ++++--- > 2 files changed, 8 insertions(+), 3 deletions(-) > > diff --git a/include/net/addrconf.h b/include/net/addrconf.h > index 9d06eb945509..62a407db1bf5 100644 > --- a/include/net/addrconf.h > +++ b/include/net/addrconf.h > @@ -438,6 +438,10 @@ static inline void in6_ifa_hold(struct inet6_ifaddr *ifp) > refcount_inc(&ifp->refcnt); > } > > +static inline bool in6_ifa_hold_safe(struct inet6_ifaddr *ifp) > +{ > + return refcount_inc_not_zero(&ifp->refcnt); > +} > > /* > * compute link-local solicited-node multicast address > diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c > index 92db9b474f2b..779aa6ecdd49 100644 > --- a/net/ipv6/addrconf.c > +++ b/net/ipv6/addrconf.c > @@ -2091,9 +2091,10 @@ struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net, const struct in6_addr *add > if (ipv6_addr_equal(&ifp->addr, addr)) { > if (!dev || ifp->idev->dev == dev || > !(ifp->scope&(IFA_LINK|IFA_HOST) || strict)) { > - result = ifp; > - in6_ifa_hold(ifp); > - break; > + if (in6_ifa_hold_safe(ifp)) { > + result = ifp; > + break; > + } > } > } > }
On Fri, 5 Apr 2024 13:31:52 +0200, Eric Dumazet wrote: > refcount_t semantic should prevent this double transition to 0 ? You're right. This was found in an old kernel where it manifested by a crash. In not so ancient kernels there's a refcount_t WARN: [ 202.500654] refcount_t: addition on 0; use-after-free. [ 202.500665] WARNING: CPU: 0 PID: 1327 at lib/refcount.c:25 refcount_warn_saturate+0x74/0x110 I knew about that but I wrote the commit message on the old kernel and when updating it I missed this spot. Sorry about that. > Can you include a stack trace in the changelog ? I don't think I have a stack trace from the latest net kernel; let me get one. > Otherwise patch looks good to me, thanks ! > > Reviewed-by: Eric Dumazet <edumazet@google.com> Thanks for the review! v2 with the updated commit message coming on Monday. Jiri
On 4/5/24 4:54 AM, Jiri Benc wrote: > Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it > still means hlist_for_each_entry_rcu can return an item that got removed > from the list. The memory itself of such item is not freed thanks to RCU > but nothing guarantees the actual content of the memory is sane. > > In particular, the reference count can be zero. This can happen if > ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry > from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all > references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough > timing, this can happen: > > 1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry. > > 2. Then, the whole ipv6_del_addr is executed for the given entry. The > reference count drops to zero and kfree_rcu is scheduled. > > 3. ipv6_get_ifaddr continues and increments the reference count > (in6_ifa_hold). > > 4. The rcu is unlocked and the entry is freed. > > 5. Later, the reference count is dropped to zero (again) and kfree_rcu > is scheduled (again). > > Prevent increasing of the reference count in such case. The name > in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe. > > Fixes: 5c578aedcb21d ("IPv6: convert addrconf hash list to RCU") > Signed-off-by: Jiri Benc <jbenc@redhat.com> > --- > > Side note: While this fixes one bug, there may be more locking bugs > lurking aroung inet6_ifaddr. The semantics of locking of inet6_ifaddr is > wild and fragile. Some of the fields are freed in ipv6_del_addr and > guarded by ifa->state == INET6_IFADDR_STATE_DEAD and RTNL. Some of the > fields are freed in inet6_ifa_finish_destroy and guarded by ifa->refcnt > and RCU. Needless to say, this semantics is undocumented. Worse, > ifa->state guard may not be enough. For example, ipv6_get_ifaddr can > still return an entry that proceeded through ipv6_del_addr, which means > ifa->state is INET6_IFADDR_STATE_DEAD. However, at least some callers > (e.g. ndisc_recv_ns) seem to change ifa->state to something else. As > another example, ipv6_del_addr relies on ifa->flags, which are changed > throughout the code without RTNL. All of this may be okay but it's far > from clear. > --- > include/net/addrconf.h | 4 ++++ > net/ipv6/addrconf.c | 7 ++++--- > 2 files changed, 8 insertions(+), 3 deletions(-) > Reviewed-by: David Ahern <dsahern@kernel.org>
diff --git a/include/net/addrconf.h b/include/net/addrconf.h index 9d06eb945509..62a407db1bf5 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -438,6 +438,10 @@ static inline void in6_ifa_hold(struct inet6_ifaddr *ifp) refcount_inc(&ifp->refcnt); } +static inline bool in6_ifa_hold_safe(struct inet6_ifaddr *ifp) +{ + return refcount_inc_not_zero(&ifp->refcnt); +} /* * compute link-local solicited-node multicast address diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 92db9b474f2b..779aa6ecdd49 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -2091,9 +2091,10 @@ struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net, const struct in6_addr *add if (ipv6_addr_equal(&ifp->addr, addr)) { if (!dev || ifp->idev->dev == dev || !(ifp->scope&(IFA_LINK|IFA_HOST) || strict)) { - result = ifp; - in6_ifa_hold(ifp); - break; + if (in6_ifa_hold_safe(ifp)) { + result = ifp; + break; + } } } }
Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it still means hlist_for_each_entry_rcu can return an item that got removed from the list. The memory itself of such item is not freed thanks to RCU but nothing guarantees the actual content of the memory is sane. In particular, the reference count can be zero. This can happen if ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough timing, this can happen: 1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry. 2. Then, the whole ipv6_del_addr is executed for the given entry. The reference count drops to zero and kfree_rcu is scheduled. 3. ipv6_get_ifaddr continues and increments the reference count (in6_ifa_hold). 4. The rcu is unlocked and the entry is freed. 5. Later, the reference count is dropped to zero (again) and kfree_rcu is scheduled (again). Prevent increasing of the reference count in such case. The name in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe. Fixes: 5c578aedcb21d ("IPv6: convert addrconf hash list to RCU") Signed-off-by: Jiri Benc <jbenc@redhat.com> --- Side note: While this fixes one bug, there may be more locking bugs lurking aroung inet6_ifaddr. The semantics of locking of inet6_ifaddr is wild and fragile. Some of the fields are freed in ipv6_del_addr and guarded by ifa->state == INET6_IFADDR_STATE_DEAD and RTNL. Some of the fields are freed in inet6_ifa_finish_destroy and guarded by ifa->refcnt and RCU. Needless to say, this semantics is undocumented. Worse, ifa->state guard may not be enough. For example, ipv6_get_ifaddr can still return an entry that proceeded through ipv6_del_addr, which means ifa->state is INET6_IFADDR_STATE_DEAD. However, at least some callers (e.g. ndisc_recv_ns) seem to change ifa->state to something else. As another example, ipv6_del_addr relies on ifa->flags, which are changed throughout the code without RTNL. All of this may be okay but it's far from clear. --- include/net/addrconf.h | 4 ++++ net/ipv6/addrconf.c | 7 ++++--- 2 files changed, 8 insertions(+), 3 deletions(-)