diff mbox series

[net-next,1/4] ipv6: make inet6_fill_ifaddr() lockless

Message ID 20240306155144.870421-2-edumazet@google.com (mailing list archive)
State Accepted
Delegated to: Netdev Maintainers
Headers show
Series ipv6: lockless inet6_dump_addr() | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 940 this patch: 940
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 956 this patch: 956
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 956 this patch: 956
netdev/checkpatch warning CHECK: Unnecessary parentheses around 'preferred == INFINITY_LIFE_TIME' WARNING: 'approriate' may be misspelled - perhaps 'appropriate'?
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-03-06--21-00 (tests: 892)

Commit Message

Eric Dumazet March 6, 2024, 3:51 p.m. UTC
Make inet6_fill_ifaddr() lockless, and add approriate annotations
on ifa->tstamp, ifa->valid_lft, ifa->preferred_lft, ifa->ifa_proto
and ifa->rt_priority.

Also constify 2nd argument of inet6_fill_ifaddr(), inet6_fill_ifmcaddr()
and inet6_fill_ifacaddr().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/addrconf.c | 66 +++++++++++++++++++++++++--------------------
 1 file changed, 37 insertions(+), 29 deletions(-)

Comments

David Ahern March 6, 2024, 11:38 p.m. UTC | #1
On 3/6/24 8:51 AM, Eric Dumazet wrote:
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 2f84e6ecf19f48602cadb47bc378c9b5a1cdbf65..480a1f9492b590bb13008cede5ea7dc9c422af67 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2730,7 +2730,7 @@ int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev,
>  		if (update_lft) {
>  			ifp->valid_lft = valid_lft;
>  			ifp->prefered_lft = prefered_lft;
> -			ifp->tstamp = now;
> +			WRITE_ONCE(ifp->tstamp, now);

There are a lot of instances of ifp->tstamp not annotated with READ_ONCE
or WRITE_ONCE. Most of them before this function seem to be updated or
read under rtnl. What's the general mode of operation for this patch?
e.g., there are tstamp references just above this one in this function
not modified. Commit message does not describe why some are updated and
others not.
Eric Dumazet March 7, 2024, 6:24 a.m. UTC | #2
On Thu, Mar 7, 2024 at 12:38 AM David Ahern <dsahern@kernel.org> wrote:
>
> On 3/6/24 8:51 AM, Eric Dumazet wrote:
> > diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> > index 2f84e6ecf19f48602cadb47bc378c9b5a1cdbf65..480a1f9492b590bb13008cede5ea7dc9c422af67 100644
> > --- a/net/ipv6/addrconf.c
> > +++ b/net/ipv6/addrconf.c
> > @@ -2730,7 +2730,7 @@ int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev,
> >               if (update_lft) {
> >                       ifp->valid_lft = valid_lft;
> >                       ifp->prefered_lft = prefered_lft;
> > -                     ifp->tstamp = now;
> > +                     WRITE_ONCE(ifp->tstamp, now);
>
> There are a lot of instances of ifp->tstamp not annotated with READ_ONCE
> or WRITE_ONCE. Most of them before this function seem to be updated or
> read under rtnl. What's the general mode of operation for this patch?
> e.g., there are tstamp references just above this one in this function
> not modified. Commit message does not describe why some are updated and
> others not.


Writes on objects that are not yet visible to other threads/cpu do not
need a WRITE_ONCE()

ipv6_add_addr() allocates a fresh object, so

ifa->cstamp = ifa->tstamp = jiffies;  // do not need any WRITE_ONCE()


Reads while we own ifa->lock do no need a READ_ONCE()

So check_cleanup_prefix_route() :

  if (time_before(*expires, ifa->tstamp + lifetime * HZ))  // no need
       *expires = ifa->tstamp + lifetime * HZ;   // no need

In ipv6_create_tempaddr()

  age = (now - ifp->tstamp) / HZ; // no need because we hold ifp->lock;

In ipv6_create_tempaddr()

  age = (now - ifp->tstamp) / HZ; // no need, we hold ifp->lock;

  tmp_tstamp = ifp->tstamp; // no need, we hold ifp->lock;

 addrconf_prefix_rcv_add_addr()
  The reads are done under ifp->lock
  The write uses WRITE_ONCE()

I did a full audit and I think I did not miss any READ_ONCE()/WRITE_ONCE()

Of course, this is extra precaution anyway, the race has no impact
other than KCSAN and/or would require a dumb compiler in the first
place.

If I had to explain this in the changelog, I guess I would not do all
these changes, this would be too time consuming.

Thanks !
David Ahern March 8, 2024, 3:26 p.m. UTC | #3
On 3/6/24 11:24 PM, Eric Dumazet wrote:
> I did a full audit and I think I did not miss any READ_ONCE()/WRITE_ONCE()
> 
> Of course, this is extra precaution anyway, the race has no impact
> other than KCSAN and/or would require a dumb compiler in the first
> place.

Then I guess I do not need to waste my time on a detailed review:

Acked-by: David Ahern <dsahern@kernel.org>

> 
> If I had to explain this in the changelog, I guess I would not do all
> these changes, this would be too time consuming.

The request was something simple as the following in the changelog:

"New objects not in any list or table do not need the annotations, nor
do updates done while holding a lock."

Reminder for current reviewer and in the future of the intent behind the
change.
Jakub Kicinski March 8, 2024, 7:31 p.m. UTC | #4
On Fri, 8 Mar 2024 08:26:56 -0700 David Ahern wrote:
> > If I had to explain this in the changelog, I guess I would not do all
> > these changes, this would be too time consuming.  
> 
> The request was something simple as the following in the changelog:
> 
> "New objects not in any list or table do not need the annotations, nor
> do updates done while holding a lock."

I was gonna add this when apply but looks like DaveM's already merged
this :S
diff mbox series

Patch

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 2f84e6ecf19f48602cadb47bc378c9b5a1cdbf65..480a1f9492b590bb13008cede5ea7dc9c422af67 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2730,7 +2730,7 @@  int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev,
 		if (update_lft) {
 			ifp->valid_lft = valid_lft;
 			ifp->prefered_lft = prefered_lft;
-			ifp->tstamp = now;
+			WRITE_ONCE(ifp->tstamp, now);
 			flags = ifp->flags;
 			ifp->flags &= ~IFA_F_DEPRECATED;
 			spin_unlock_bh(&ifp->lock);
@@ -4898,13 +4898,13 @@  static int inet6_addr_modify(struct net *net, struct inet6_ifaddr *ifp,
 			IFA_F_HOMEADDRESS | IFA_F_MANAGETEMPADDR |
 			IFA_F_NOPREFIXROUTE);
 	ifp->flags |= cfg->ifa_flags;
-	ifp->tstamp = jiffies;
-	ifp->valid_lft = cfg->valid_lft;
-	ifp->prefered_lft = cfg->preferred_lft;
-	ifp->ifa_proto = cfg->ifa_proto;
+	WRITE_ONCE(ifp->tstamp, jiffies);
+	WRITE_ONCE(ifp->valid_lft, cfg->valid_lft);
+	WRITE_ONCE(ifp->prefered_lft, cfg->preferred_lft);
+	WRITE_ONCE(ifp->ifa_proto, cfg->ifa_proto);
 
 	if (cfg->rt_priority && cfg->rt_priority != ifp->rt_priority)
-		ifp->rt_priority = cfg->rt_priority;
+		WRITE_ONCE(ifp->rt_priority, cfg->rt_priority);
 
 	if (new_peer)
 		ifp->peer_addr = *cfg->peer_pfx;
@@ -5125,17 +5125,21 @@  struct inet6_fill_args {
 	enum addr_type_t type;
 };
 
-static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
+static int inet6_fill_ifaddr(struct sk_buff *skb,
+			     const struct inet6_ifaddr *ifa,
 			     struct inet6_fill_args *args)
 {
-	struct nlmsghdr  *nlh;
+	struct nlmsghdr *nlh;
 	u32 preferred, valid;
+	u32 flags, priority;
+	u8 proto;
 
 	nlh = nlmsg_put(skb, args->portid, args->seq, args->event,
 			sizeof(struct ifaddrmsg), args->flags);
 	if (!nlh)
 		return -EMSGSIZE;
 
+	flags = READ_ONCE(ifa->flags);
 	put_ifaddrmsg(nlh, ifa->prefix_len, ifa->flags, rt_scope(ifa->scope),
 		      ifa->idev->dev->ifindex);
 
@@ -5143,13 +5147,14 @@  static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
 	    nla_put_s32(skb, IFA_TARGET_NETNSID, args->netnsid))
 		goto error;
 
-	spin_lock_bh(&ifa->lock);
-	if (!((ifa->flags&IFA_F_PERMANENT) &&
-	      (ifa->prefered_lft == INFINITY_LIFE_TIME))) {
-		preferred = ifa->prefered_lft;
-		valid = ifa->valid_lft;
+	preferred = READ_ONCE(ifa->prefered_lft);
+	valid = READ_ONCE(ifa->valid_lft);
+
+	if (!((flags & IFA_F_PERMANENT) &&
+	      (preferred == INFINITY_LIFE_TIME))) {
 		if (preferred != INFINITY_LIFE_TIME) {
-			long tval = (jiffies - ifa->tstamp)/HZ;
+			long tval = (jiffies - READ_ONCE(ifa->tstamp)) / HZ;
+
 			if (preferred > tval)
 				preferred -= tval;
 			else
@@ -5165,28 +5170,29 @@  static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
 		preferred = INFINITY_LIFE_TIME;
 		valid = INFINITY_LIFE_TIME;
 	}
-	spin_unlock_bh(&ifa->lock);
 
 	if (!ipv6_addr_any(&ifa->peer_addr)) {
 		if (nla_put_in6_addr(skb, IFA_LOCAL, &ifa->addr) < 0 ||
 		    nla_put_in6_addr(skb, IFA_ADDRESS, &ifa->peer_addr) < 0)
 			goto error;
-	} else
+	} else {
 		if (nla_put_in6_addr(skb, IFA_ADDRESS, &ifa->addr) < 0)
 			goto error;
+	}
 
-	if (ifa->rt_priority &&
-	    nla_put_u32(skb, IFA_RT_PRIORITY, ifa->rt_priority))
+	priority = READ_ONCE(ifa->rt_priority);
+	if (priority && nla_put_u32(skb, IFA_RT_PRIORITY, priority))
 		goto error;
 
-	if (put_cacheinfo(skb, ifa->cstamp, ifa->tstamp, preferred, valid) < 0)
+	if (put_cacheinfo(skb, ifa->cstamp, READ_ONCE(ifa->tstamp),
+			  preferred, valid) < 0)
 		goto error;
 
-	if (nla_put_u32(skb, IFA_FLAGS, ifa->flags) < 0)
+	if (nla_put_u32(skb, IFA_FLAGS, flags) < 0)
 		goto error;
 
-	if (ifa->ifa_proto &&
-	    nla_put_u8(skb, IFA_PROTO, ifa->ifa_proto))
+	proto = READ_ONCE(ifa->ifa_proto);
+	if (proto && nla_put_u8(skb, IFA_PROTO, proto))
 		goto error;
 
 	nlmsg_end(skb, nlh);
@@ -5197,12 +5203,13 @@  static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
 	return -EMSGSIZE;
 }
 
-static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
+static int inet6_fill_ifmcaddr(struct sk_buff *skb,
+			       const struct ifmcaddr6 *ifmca,
 			       struct inet6_fill_args *args)
 {
-	struct nlmsghdr  *nlh;
-	u8 scope = RT_SCOPE_UNIVERSE;
 	int ifindex = ifmca->idev->dev->ifindex;
+	u8 scope = RT_SCOPE_UNIVERSE;
+	struct nlmsghdr *nlh;
 
 	if (ipv6_addr_scope(&ifmca->mca_addr) & IFA_SITE)
 		scope = RT_SCOPE_SITE;
@@ -5220,7 +5227,7 @@  static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
 
 	put_ifaddrmsg(nlh, 128, IFA_F_PERMANENT, scope, ifindex);
 	if (nla_put_in6_addr(skb, IFA_MULTICAST, &ifmca->mca_addr) < 0 ||
-	    put_cacheinfo(skb, ifmca->mca_cstamp, ifmca->mca_tstamp,
+	    put_cacheinfo(skb, ifmca->mca_cstamp, READ_ONCE(ifmca->mca_tstamp),
 			  INFINITY_LIFE_TIME, INFINITY_LIFE_TIME) < 0) {
 		nlmsg_cancel(skb, nlh);
 		return -EMSGSIZE;
@@ -5230,13 +5237,14 @@  static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
 	return 0;
 }
 
-static int inet6_fill_ifacaddr(struct sk_buff *skb, struct ifacaddr6 *ifaca,
+static int inet6_fill_ifacaddr(struct sk_buff *skb,
+			       const struct ifacaddr6 *ifaca,
 			       struct inet6_fill_args *args)
 {
 	struct net_device *dev = fib6_info_nh_dev(ifaca->aca_rt);
 	int ifindex = dev ? dev->ifindex : 1;
-	struct nlmsghdr  *nlh;
 	u8 scope = RT_SCOPE_UNIVERSE;
+	struct nlmsghdr *nlh;
 
 	if (ipv6_addr_scope(&ifaca->aca_addr) & IFA_SITE)
 		scope = RT_SCOPE_SITE;
@@ -5254,7 +5262,7 @@  static int inet6_fill_ifacaddr(struct sk_buff *skb, struct ifacaddr6 *ifaca,
 
 	put_ifaddrmsg(nlh, 128, IFA_F_PERMANENT, scope, ifindex);
 	if (nla_put_in6_addr(skb, IFA_ANYCAST, &ifaca->aca_addr) < 0 ||
-	    put_cacheinfo(skb, ifaca->aca_cstamp, ifaca->aca_tstamp,
+	    put_cacheinfo(skb, ifaca->aca_cstamp, READ_ONCE(ifaca->aca_tstamp),
 			  INFINITY_LIFE_TIME, INFINITY_LIFE_TIME) < 0) {
 		nlmsg_cancel(skb, nlh);
 		return -EMSGSIZE;