diff mbox series

[net-next,5/8] rtnetlink: do not depend on RTNL for many attributes

Message ID 20240503192059.3884225-6-edumazet@google.com (mailing list archive)
State Accepted
Commit 6747a5d4990b8c8d7392f7a06b7a4bb5f4ada80e
Delegated to: Netdev Maintainers
Headers show
Series rtnetlink: more rcu conversions for rtnl_fill_ifinfo() | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 928 this patch: 928
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 4 of 4 maintainers
netdev/build_clang success Errors and warnings before: 938 this patch: 938
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 939 this patch: 939
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 70 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 6 this patch: 6
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-05-05--03-00 (tests: 1003)

Commit Message

Eric Dumazet May 3, 2024, 7:20 p.m. UTC
Following device fields can be read locklessly
in rtnl_fill_ifinfo() :

type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
tso_max_segs, num_rx_queues.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/rtnetlink.c | 51 +++++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 20 deletions(-)

Comments

Simon Horman May 5, 2024, 2:46 p.m. UTC | #1
On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote:
> Following device fields can be read locklessly
> in rtnl_fill_ifinfo() :
> 
> type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
> promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
> gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
> tso_max_segs, num_rx_queues.

Hi Eric,

* Regarding mtu, as the comment you added to sruct net_device
  some time ago mentions, mtu is written in many places.

  I'm wondering if, in particular wrt ndo_change_mtu implementations,
  if some it is appropriate to add WRITE_ONCE() annotations.

* Likewise, is it appropriate to add WRITE_ONCE() to dev_set_group() ?
Eric Dumazet May 5, 2024, 3 p.m. UTC | #2
On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote:
>
> On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote:
> > Following device fields can be read locklessly
> > in rtnl_fill_ifinfo() :
> >
> > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
> > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
> > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
> > tso_max_segs, num_rx_queues.
>
> Hi Eric,
>
> * Regarding mtu, as the comment you added to sruct net_device
>   some time ago mentions, mtu is written in many places.
>
>   I'm wondering if, in particular wrt ndo_change_mtu implementations,
>   if some it is appropriate to add WRITE_ONCE() annotations.

Sure thing. I called for these changes in commit
501a90c94510 ("inet: protect against too small mtu values.")
when I said "Hopefully we will add the missing ones in followup patches."


>
> * Likewise, is it appropriate to add WRITE_ONCE() to dev_set_group() ?

In general, a lot of write sides would need to be changed.

In practice, most compilers will not perform store tearing, this would
be quite expensive.

Also, adding WRITE_ONCE() will not prevent a reader from reading some
temporary values,
take a look at dev_change_tx_queue_len() for instance.
Simon Horman May 5, 2024, 3:06 p.m. UTC | #3
On Sun, May 05, 2024 at 05:00:10PM +0200, Eric Dumazet wrote:
> On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote:
> >
> > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote:
> > > Following device fields can be read locklessly
> > > in rtnl_fill_ifinfo() :
> > >
> > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
> > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
> > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
> > > tso_max_segs, num_rx_queues.
> >
> > Hi Eric,
> >
> > * Regarding mtu, as the comment you added to sruct net_device
> >   some time ago mentions, mtu is written in many places.
> >
> >   I'm wondering if, in particular wrt ndo_change_mtu implementations,
> >   if some it is appropriate to add WRITE_ONCE() annotations.
> 
> Sure thing. I called for these changes in commit
> 501a90c94510 ("inet: protect against too small mtu values.")
> when I said "Hopefully we will add the missing ones in followup patches."

Ok, so basically it would be nice to add them,
but they don't block progress of this patchset?

> > * Likewise, is it appropriate to add WRITE_ONCE() to dev_set_group() ?
> 
> In general, a lot of write sides would need to be changed.
> 
> In practice, most compilers will not perform store tearing, this would
> be quite expensive.
> 
> Also, adding WRITE_ONCE() will not prevent a reader from reading some
> temporary values,
> take a look at dev_change_tx_queue_len() for instance.

Thank you, I will study this more closely.
Eric Dumazet May 5, 2024, 3:14 p.m. UTC | #4
On Sun, May 5, 2024 at 5:06 PM Simon Horman <horms@kernel.org> wrote:
>
> On Sun, May 05, 2024 at 05:00:10PM +0200, Eric Dumazet wrote:
> > On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote:
> > >
> > > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote:
> > > > Following device fields can be read locklessly
> > > > in rtnl_fill_ifinfo() :
> > > >
> > > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
> > > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
> > > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
> > > > tso_max_segs, num_rx_queues.
> > >
> > > Hi Eric,
> > >
> > > * Regarding mtu, as the comment you added to sruct net_device
> > >   some time ago mentions, mtu is written in many places.
> > >
> > >   I'm wondering if, in particular wrt ndo_change_mtu implementations,
> > >   if some it is appropriate to add WRITE_ONCE() annotations.
> >
> > Sure thing. I called for these changes in commit
> > 501a90c94510 ("inet: protect against too small mtu values.")
> > when I said "Hopefully we will add the missing ones in followup patches."
>
> Ok, so basically it would be nice to add them,
> but they don't block progress of this patchset?

A patch set adding WRITE_ONCE() on all dev->mtu would be great,
and seems orthogonal. Note that we already have many points in the
stack where dev->mtu is read locklessly.
Simon Horman May 7, 2024, 4:38 p.m. UTC | #5
On Sun, May 05, 2024 at 05:14:58PM +0200, Eric Dumazet wrote:
> On Sun, May 5, 2024 at 5:06 PM Simon Horman <horms@kernel.org> wrote:
> >
> > On Sun, May 05, 2024 at 05:00:10PM +0200, Eric Dumazet wrote:
> > > On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote:
> > > >
> > > > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote:
> > > > > Following device fields can be read locklessly
> > > > > in rtnl_fill_ifinfo() :
> > > > >
> > > > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
> > > > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
> > > > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
> > > > > tso_max_segs, num_rx_queues.
> > > >
> > > > Hi Eric,
> > > >
> > > > * Regarding mtu, as the comment you added to sruct net_device
> > > >   some time ago mentions, mtu is written in many places.
> > > >
> > > >   I'm wondering if, in particular wrt ndo_change_mtu implementations,
> > > >   if some it is appropriate to add WRITE_ONCE() annotations.
> > >
> > > Sure thing. I called for these changes in commit
> > > 501a90c94510 ("inet: protect against too small mtu values.")
> > > when I said "Hopefully we will add the missing ones in followup patches."
> >
> > Ok, so basically it would be nice to add them,
> > but they don't block progress of this patchset?
> 
> A patch set adding WRITE_ONCE() on all dev->mtu would be great,
> and seems orthogonal.

Ack. I'm guessing an incremental approach to getting better coverage would
be best. I'll add this to my todo list.

> Note that we already have many points in the
> stack where dev->mtu is read locklessly.

Understood.

For the record, I have no objections to this patch.

Reviewed-by: Simon Horman <horms@kernel.org>
Eric Dumazet May 7, 2024, 4:39 p.m. UTC | #6
On Tue, May 7, 2024 at 6:38 PM Simon Horman <horms@kernel.org> wrote:
>
> On Sun, May 05, 2024 at 05:14:58PM +0200, Eric Dumazet wrote:
> > On Sun, May 5, 2024 at 5:06 PM Simon Horman <horms@kernel.org> wrote:
> > >
> > > On Sun, May 05, 2024 at 05:00:10PM +0200, Eric Dumazet wrote:
> > > > On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote:
> > > > >
> > > > > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote:
> > > > > > Following device fields can be read locklessly
> > > > > > in rtnl_fill_ifinfo() :
> > > > > >
> > > > > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
> > > > > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
> > > > > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
> > > > > > tso_max_segs, num_rx_queues.
> > > > >
> > > > > Hi Eric,
> > > > >
> > > > > * Regarding mtu, as the comment you added to sruct net_device
> > > > >   some time ago mentions, mtu is written in many places.
> > > > >
> > > > >   I'm wondering if, in particular wrt ndo_change_mtu implementations,
> > > > >   if some it is appropriate to add WRITE_ONCE() annotations.
> > > >
> > > > Sure thing. I called for these changes in commit
> > > > 501a90c94510 ("inet: protect against too small mtu values.")
> > > > when I said "Hopefully we will add the missing ones in followup patches."
> > >
> > > Ok, so basically it would be nice to add them,
> > > but they don't block progress of this patchset?
> >
> > A patch set adding WRITE_ONCE() on all dev->mtu would be great,
> > and seems orthogonal.
>
> Ack. I'm guessing an incremental approach to getting better coverage would
> be best. I'll add this to my todo list.

I sent a single patch about that already ;)

https://patchwork.kernel.org/project/netdevbpf/patch/20240506102812.3025432-1-edumazet@google.com/

Thanks !
Simon Horman May 7, 2024, 4:55 p.m. UTC | #7
On Tue, May 07, 2024 at 06:39:54PM +0200, Eric Dumazet wrote:
> On Tue, May 7, 2024 at 6:38 PM Simon Horman <horms@kernel.org> wrote:
> >
> > On Sun, May 05, 2024 at 05:14:58PM +0200, Eric Dumazet wrote:
> > > On Sun, May 5, 2024 at 5:06 PM Simon Horman <horms@kernel.org> wrote:
> > > >
> > > > On Sun, May 05, 2024 at 05:00:10PM +0200, Eric Dumazet wrote:
> > > > > On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote:
> > > > > >
> > > > > > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote:
> > > > > > > Following device fields can be read locklessly
> > > > > > > in rtnl_fill_ifinfo() :
> > > > > > >
> > > > > > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group,
> > > > > > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size,
> > > > > > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size,
> > > > > > > tso_max_segs, num_rx_queues.
> > > > > >
> > > > > > Hi Eric,
> > > > > >
> > > > > > * Regarding mtu, as the comment you added to sruct net_device
> > > > > >   some time ago mentions, mtu is written in many places.
> > > > > >
> > > > > >   I'm wondering if, in particular wrt ndo_change_mtu implementations,
> > > > > >   if some it is appropriate to add WRITE_ONCE() annotations.
> > > > >
> > > > > Sure thing. I called for these changes in commit
> > > > > 501a90c94510 ("inet: protect against too small mtu values.")
> > > > > when I said "Hopefully we will add the missing ones in followup patches."
> > > >
> > > > Ok, so basically it would be nice to add them,
> > > > but they don't block progress of this patchset?
> > >
> > > A patch set adding WRITE_ONCE() on all dev->mtu would be great,
> > > and seems orthogonal.
> >
> > Ack. I'm guessing an incremental approach to getting better coverage would
> > be best. I'll add this to my todo list.
> 
> I sent a single patch about that already ;)

:)
diff mbox series

Patch

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 77d14528bdefc8b655f5da37ed88d0b937f35a61..242c24e857ec4c799f0239be3371fd589a8ed191 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1603,7 +1603,8 @@  static int put_master_ifindex(struct sk_buff *skb, struct net_device *dev)
 
 	upper_dev = netdev_master_upper_dev_get_rcu(dev);
 	if (upper_dev)
-		ret = nla_put_u32(skb, IFLA_MASTER, upper_dev->ifindex);
+		ret = nla_put_u32(skb, IFLA_MASTER,
+				  READ_ONCE(upper_dev->ifindex));
 
 	rcu_read_unlock();
 	return ret;
@@ -1825,8 +1826,8 @@  static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	ifm = nlmsg_data(nlh);
 	ifm->ifi_family = AF_UNSPEC;
 	ifm->__ifi_pad = 0;
-	ifm->ifi_type = dev->type;
-	ifm->ifi_index = dev->ifindex;
+	ifm->ifi_type = READ_ONCE(dev->type);
+	ifm->ifi_index = READ_ONCE(dev->ifindex);
 	ifm->ifi_flags = dev_get_flags(dev);
 	ifm->ifi_change = change;
 
@@ -1839,24 +1840,34 @@  static int rtnl_fill_ifinfo(struct sk_buff *skb,
 
 	if (nla_put_u32(skb, IFLA_TXQLEN, READ_ONCE(dev->tx_queue_len)) ||
 	    nla_put_u8(skb, IFLA_OPERSTATE,
-		       netif_running(dev) ? dev->operstate : IF_OPER_DOWN) ||
-	    nla_put_u8(skb, IFLA_LINKMODE, dev->link_mode) ||
-	    nla_put_u32(skb, IFLA_MTU, dev->mtu) ||
-	    nla_put_u32(skb, IFLA_MIN_MTU, dev->min_mtu) ||
-	    nla_put_u32(skb, IFLA_MAX_MTU, dev->max_mtu) ||
-	    nla_put_u32(skb, IFLA_GROUP, dev->group) ||
-	    nla_put_u32(skb, IFLA_PROMISCUITY, dev->promiscuity) ||
-	    nla_put_u32(skb, IFLA_ALLMULTI, dev->allmulti) ||
-	    nla_put_u32(skb, IFLA_NUM_TX_QUEUES, dev->num_tx_queues) ||
-	    nla_put_u32(skb, IFLA_GSO_MAX_SEGS, dev->gso_max_segs) ||
-	    nla_put_u32(skb, IFLA_GSO_MAX_SIZE, dev->gso_max_size) ||
-	    nla_put_u32(skb, IFLA_GRO_MAX_SIZE, dev->gro_max_size) ||
-	    nla_put_u32(skb, IFLA_GSO_IPV4_MAX_SIZE, dev->gso_ipv4_max_size) ||
-	    nla_put_u32(skb, IFLA_GRO_IPV4_MAX_SIZE, dev->gro_ipv4_max_size) ||
-	    nla_put_u32(skb, IFLA_TSO_MAX_SIZE, dev->tso_max_size) ||
-	    nla_put_u32(skb, IFLA_TSO_MAX_SEGS, dev->tso_max_segs) ||
+		       netif_running(dev) ? READ_ONCE(dev->operstate) :
+					    IF_OPER_DOWN) ||
+	    nla_put_u8(skb, IFLA_LINKMODE, READ_ONCE(dev->link_mode)) ||
+	    nla_put_u32(skb, IFLA_MTU, READ_ONCE(dev->mtu)) ||
+	    nla_put_u32(skb, IFLA_MIN_MTU, READ_ONCE(dev->min_mtu)) ||
+	    nla_put_u32(skb, IFLA_MAX_MTU, READ_ONCE(dev->max_mtu)) ||
+	    nla_put_u32(skb, IFLA_GROUP, READ_ONCE(dev->group)) ||
+	    nla_put_u32(skb, IFLA_PROMISCUITY, READ_ONCE(dev->promiscuity)) ||
+	    nla_put_u32(skb, IFLA_ALLMULTI, READ_ONCE(dev->allmulti)) ||
+	    nla_put_u32(skb, IFLA_NUM_TX_QUEUES,
+			READ_ONCE(dev->num_tx_queues)) ||
+	    nla_put_u32(skb, IFLA_GSO_MAX_SEGS,
+			READ_ONCE(dev->gso_max_segs)) ||
+	    nla_put_u32(skb, IFLA_GSO_MAX_SIZE,
+			READ_ONCE(dev->gso_max_size)) ||
+	    nla_put_u32(skb, IFLA_GRO_MAX_SIZE,
+			READ_ONCE(dev->gro_max_size)) ||
+	    nla_put_u32(skb, IFLA_GSO_IPV4_MAX_SIZE,
+			READ_ONCE(dev->gso_ipv4_max_size)) ||
+	    nla_put_u32(skb, IFLA_GRO_IPV4_MAX_SIZE,
+			READ_ONCE(dev->gro_ipv4_max_size)) ||
+	    nla_put_u32(skb, IFLA_TSO_MAX_SIZE,
+			READ_ONCE(dev->tso_max_size)) ||
+	    nla_put_u32(skb, IFLA_TSO_MAX_SEGS,
+			READ_ONCE(dev->tso_max_segs)) ||
 #ifdef CONFIG_RPS
-	    nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) ||
+	    nla_put_u32(skb, IFLA_NUM_RX_QUEUES,
+			READ_ONCE(dev->num_rx_queues)) ||
 #endif
 	    put_master_ifindex(skb, dev) ||
 	    nla_put_u8(skb, IFLA_CARRIER, netif_carrier_ok(dev)) ||