diff mbox series

[net-next] netdevice: define and allocate &net_device _properly_

Message ID 20240507123937.15364-1-aleksander.lobakin@intel.com (mailing list archive)
State Handled Elsewhere
Headers show
Series [net-next] netdevice: define and allocate &net_device _properly_ | expand

Commit Message

Alexander Lobakin May 7, 2024, 12:39 p.m. UTC
In fact, this structure contains a flexible array at the end, but
historically its size, alignment etc., is calculated manually.
There are several instances of the structure embedded into other
structures, but also there's ongoing effort to remove them and we
could in the meantime declare &net_device properly.
Declare the array explicitly, use struct_size() and store the array
size inside the structure, so that __counted_by() can be applied.
Don't use PTR_ALIGN(), as SLUB itself tries its best to ensure the
allocated buffer is aligned to what the user expects.
Also, change its alignment from %NETDEV_ALIGN to the cacheline size
as per several suggestions on the netdev ML.

bloat-o-meter for vmlinux:

free_netdev                                  445     440      -5
netdev_freemem                                24       -     -24
alloc_netdev_mqs                            1481    1450     -31

On x86_64 with several NICs of different vendors, I was never able to
get a &net_device pointer not aligned to the cacheline size after the
change.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/netdevice.h | 12 +++++++-----
 net/core/dev.c            | 31 +++++++------------------------
 net/core/net-sysfs.c      |  2 +-
 3 files changed, 15 insertions(+), 30 deletions(-)

Comments

Przemek Kitszel May 7, 2024, 3:30 p.m. UTC | #1
On 5/7/24 14:39, Alexander Lobakin wrote:
> In fact, this structure contains a flexible array at the end, but
> historically its size, alignment etc., is calculated manually.
> There are several instances of the structure embedded into other
> structures, but also there's ongoing effort to remove them and we
> could in the meantime declare &net_device properly.
> Declare the array explicitly, use struct_size() and store the array
> size inside the structure, so that __counted_by() can be applied.
> Don't use PTR_ALIGN(), as SLUB itself tries its best to ensure the
> allocated buffer is aligned to what the user expects.
> Also, change its alignment from %NETDEV_ALIGN to the cacheline size
> as per several suggestions on the netdev ML.
> 
> bloat-o-meter for vmlinux:
> 
> free_netdev                                  445     440      -5
> netdev_freemem                                24       -     -24
> alloc_netdev_mqs                            1481    1450     -31
> 
> On x86_64 with several NICs of different vendors, I was never able to
> get a &net_device pointer not aligned to the cacheline size after the
> change.
> 
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>   include/linux/netdevice.h | 12 +++++++-----
>   net/core/dev.c            | 31 +++++++------------------------
>   net/core/net-sysfs.c      |  2 +-
>   3 files changed, 15 insertions(+), 30 deletions(-)
> 

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Jakub Kicinski May 7, 2024, 6:10 p.m. UTC | #2
On Tue,  7 May 2024 14:39:37 +0200 Alexander Lobakin wrote:
> There are several instances of the structure embedded into other
> structures, but also there's ongoing effort to remove them and we
> could in the meantime declare &net_device properly.

Is there a reason you're reposting this before that effort is completed?
The warnings this adds come from sparse and you think they should be
ignored?

TBH since Breno is doing the heavy lifting of changing the embedders 
it'd seem more fair to me if he got to send this at the end. Or at
least, you know, got a mention or a CC.
Eric Dumazet May 7, 2024, 6:21 p.m. UTC | #3
On Tue, May 7, 2024 at 2:40 PM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> In fact, this structure contains a flexible array at the end, but
> historically its size, alignment etc., is calculated manually.
> There are several instances of the structure embedded into other
> structures, but also there's ongoing effort to remove them and we
> could in the meantime declare &net_device properly.
> Declare the array explicitly, use struct_size() and store the array
> size inside the structure, so that __counted_by() can be applied.
> Don't use PTR_ALIGN(), as SLUB itself tries its best to ensure the
> allocated buffer is aligned to what the user expects.
> Also, change its alignment from %NETDEV_ALIGN to the cacheline size
> as per several suggestions on the netdev ML.
>
> bloat-o-meter for vmlinux:
>
> free_netdev                                  445     440      -5
> netdev_freemem                                24       -     -24
> alloc_netdev_mqs                            1481    1450     -31
>
> On x86_64 with several NICs of different vendors, I was never able to
> get a &net_device pointer not aligned to the cacheline size after the
> change.
>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---

...

> -       p = kvzalloc(alloc_size, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
> -       if (!p)
> +       sizeof_priv = ALIGN(sizeof_priv, SMP_CACHE_BYTES);

If we have a __counted_by(priv_len), why do you ALIGN(sizeof_priv,
SMP_CACHE_BYTES) ?

If a driver pretends its private part is 4 bytes, we should get a
warning if 20 bytes are used instead.

You added two ____cacheline_aligned already in net_device already.
Alexander Lobakin May 8, 2024, 9:10 a.m. UTC | #4
From: Eric Dumazet <edumazet@google.com>
Date: Tue, 7 May 2024 20:21:58 +0200

> On Tue, May 7, 2024 at 2:40 PM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
>>
>> In fact, this structure contains a flexible array at the end, but
>> historically its size, alignment etc., is calculated manually.
>> There are several instances of the structure embedded into other
>> structures, but also there's ongoing effort to remove them and we
>> could in the meantime declare &net_device properly.
>> Declare the array explicitly, use struct_size() and store the array
>> size inside the structure, so that __counted_by() can be applied.
>> Don't use PTR_ALIGN(), as SLUB itself tries its best to ensure the
>> allocated buffer is aligned to what the user expects.
>> Also, change its alignment from %NETDEV_ALIGN to the cacheline size
>> as per several suggestions on the netdev ML.
>>
>> bloat-o-meter for vmlinux:
>>
>> free_netdev                                  445     440      -5
>> netdev_freemem                                24       -     -24
>> alloc_netdev_mqs                            1481    1450     -31
>>
>> On x86_64 with several NICs of different vendors, I was never able to
>> get a &net_device pointer not aligned to the cacheline size after the
>> change.
>>
>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>> ---
> 
> ...
> 
>> -       p = kvzalloc(alloc_size, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
>> -       if (!p)
>> +       sizeof_priv = ALIGN(sizeof_priv, SMP_CACHE_BYTES);
> 
> If we have a __counted_by(priv_len), why do you ALIGN(sizeof_priv,
> SMP_CACHE_BYTES) ?

To have the whole block of &net_device + private part aligned to a
cacheline size.

> 
> If a driver pretends its private part is 4 bytes, we should get a
> warning if 20 bytes are used instead.

Ah okay, so this should be

	p = kvzalloc(struct_size(ALIGN(sizeof_priv, SMP_CACHE_BYTES)));
	p->priv_len = sizeof_priv;

> 
> You added two ____cacheline_aligned already in net_device already.

The whole size passed to kvzalloc() must be cacheline-aligned, otherwise
the MM layer can miscalculate the pointer alignment.

Thanks,
Olek
Alexander Lobakin May 8, 2024, 9:13 a.m. UTC | #5
From: Jakub Kicinski <kuba@kernel.org>
Date: Tue, 7 May 2024 11:10:35 -0700

> On Tue,  7 May 2024 14:39:37 +0200 Alexander Lobakin wrote:
>> There are several instances of the structure embedded into other
>> structures, but also there's ongoing effort to remove them and we
>> could in the meantime declare &net_device properly.
> 
> Is there a reason you're reposting this before that effort is completed?

To speed up the conversion probably :D

> The warnings this adds come from sparse and you think they should be
> ignored?

For now...

> 
> TBH since Breno is doing the heavy lifting of changing the embedders 
> it'd seem more fair to me if he got to send this at the end. Or at
> least, you know, got a mention or a CC.

I was lazy enough to add tags, sorry. The idea of him sending this at
the end sounds reasonable.

Thanks,
Olek
Breno Leitao June 14, 2024, noon UTC | #6
On Wed, May 08, 2024 at 11:13:21AM +0200, Alexander Lobakin wrote:
> From: Jakub Kicinski <kuba@kernel.org>
> Date: Tue, 7 May 2024 11:10:35 -0700
> 
> > On Tue,  7 May 2024 14:39:37 +0200 Alexander Lobakin wrote:
> >> There are several instances of the structure embedded into other
> >> structures, but also there's ongoing effort to remove them and we
> >> could in the meantime declare &net_device properly.
> > 
> > Is there a reason you're reposting this before that effort is completed?
> 
> To speed up the conversion probably :D
> 
> > The warnings this adds come from sparse and you think they should be
> > ignored?
> 
> For now...
> 
> > 
> > TBH since Breno is doing the heavy lifting of changing the embedders 
> > it'd seem more fair to me if he got to send this at the end. Or at
> > least, you know, got a mention or a CC.
> 
> I was lazy enough to add tags, sorry. The idea of him sending this at
> the end sounds reasonable.

I think we are almost at the time to get rid of the last user of
embedded netdev.

	https://lore.kernel.org/all/20240614115317.657700-1-leitao@debian.org/

Once that patch lands, I will submit this patch on top of that final
fix.

Let me know if you have any concern.

--breno
Breno Leitao June 21, 2024, 1:11 p.m. UTC | #7
On Fri, Jun 14, 2024 at 05:00:54AM -0700, Breno Leitao wrote:
> On Wed, May 08, 2024 at 11:13:21AM +0200, Alexander Lobakin wrote:
> > From: Jakub Kicinski <kuba@kernel.org>
> > Date: Tue, 7 May 2024 11:10:35 -0700
> > 
> > > On Tue,  7 May 2024 14:39:37 +0200 Alexander Lobakin wrote:
> > >> There are several instances of the structure embedded into other
> > >> structures, but also there's ongoing effort to remove them and we
> > >> could in the meantime declare &net_device properly.
> > > 
> > > Is there a reason you're reposting this before that effort is completed?
> > 
> > To speed up the conversion probably :D
> > 
> > > The warnings this adds come from sparse and you think they should be
> > > ignored?
> > 
> > For now...
> > 
> > > 
> > > TBH since Breno is doing the heavy lifting of changing the embedders 
> > > it'd seem more fair to me if he got to send this at the end. Or at
> > > least, you know, got a mention or a CC.
> > 
> > I was lazy enough to add tags, sorry. The idea of him sending this at
> > the end sounds reasonable.
> 
> I think we are almost at the time to get rid of the last user of
> embedded netdev.
> 
> 	https://lore.kernel.org/all/20240614115317.657700-1-leitao@debian.org/
> 
> Once that patch lands, I will submit this patch on top of that final
> fix.

In fact, I jumped the gun, and reviewing the embedded users of
netdevice, I found there are three more devices, that will need some
care before we proceed

drivers/crypto/caam/caamalg_qi2.h:

  struct dpaa2_caam_priv_per_cpu {
	...
	struct net_device net_dev;


drivers/crypto/caam/qi.c:

  struct caam_qi_pcpu_priv {
	...
        struct net_device net_dev;

drivers/net/ethernet/cavium/thunder/thunder_bgx.c:

  struct lmac {
	...
        struct net_device       netdev;
diff mbox series

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cf261fb89d73..171d70618a70 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2199,10 +2199,10 @@  struct net_device {
 	unsigned short		neigh_priv_len;
 	unsigned short          dev_id;
 	unsigned short          dev_port;
-	unsigned short		padded;
+	int			irq;
+	u32			priv_len;
 
 	spinlock_t		addr_list_lock;
-	int			irq;
 
 	struct netdev_hw_addr_list	uc;
 	struct netdev_hw_addr_list	mc;
@@ -2403,7 +2403,10 @@  struct net_device {
 	/** @page_pools: page pools created for this netdevice */
 	struct hlist_head	page_pools;
 #endif
-};
+
+	u8			priv[] ____cacheline_aligned
+				       __counted_by(priv_len);
+} ____cacheline_aligned;
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
 /*
@@ -2593,7 +2596,7 @@  void dev_net_set(struct net_device *dev, struct net *net)
  */
 static inline void *netdev_priv(const struct net_device *dev)
 {
-	return (char *)dev + ALIGN(sizeof(struct net_device), NETDEV_ALIGN);
+	return (void *)dev->priv;
 }
 
 /* Set the sysfs physical device reference for the network logical device
@@ -3123,7 +3126,6 @@  static inline void unregister_netdevice(struct net_device *dev)
 
 int netdev_refcnt_read(const struct net_device *dev);
 void free_netdev(struct net_device *dev);
-void netdev_freemem(struct net_device *dev);
 void init_dummy_netdev(struct net_device *dev);
 
 struct net_device *netdev_get_xmit_slave(struct net_device *dev,
diff --git a/net/core/dev.c b/net/core/dev.c
index d6b24749eb2e..38c2e3c2df86 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10889,13 +10889,6 @@  void netdev_sw_irq_coalesce_default_on(struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(netdev_sw_irq_coalesce_default_on);
 
-void netdev_freemem(struct net_device *dev)
-{
-	char *addr = (char *)dev - dev->padded;
-
-	kvfree(addr);
-}
-
 /**
  * alloc_netdev_mqs - allocate network device
  * @sizeof_priv: size of private data to allocate space for
@@ -10915,8 +10908,6 @@  struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		unsigned int txqs, unsigned int rxqs)
 {
 	struct net_device *dev;
-	unsigned int alloc_size;
-	struct net_device *p;
 
 	BUG_ON(strlen(name) >= sizeof(dev->name));
 
@@ -10930,21 +10921,13 @@  struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		return NULL;
 	}
 
-	alloc_size = sizeof(struct net_device);
-	if (sizeof_priv) {
-		/* ensure 32-byte alignment of private area */
-		alloc_size = ALIGN(alloc_size, NETDEV_ALIGN);
-		alloc_size += sizeof_priv;
-	}
-	/* ensure 32-byte alignment of whole construct */
-	alloc_size += NETDEV_ALIGN - 1;
-
-	p = kvzalloc(alloc_size, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
-	if (!p)
+	sizeof_priv = ALIGN(sizeof_priv, SMP_CACHE_BYTES);
+	dev = kvzalloc(struct_size(dev, priv, sizeof_priv),
+		       GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
+	if (!dev)
 		return NULL;
 
-	dev = PTR_ALIGN(p, NETDEV_ALIGN);
-	dev->padded = (char *)dev - (char *)p;
+	dev->priv_len = sizeof_priv;
 
 	ref_tracker_dir_init(&dev->refcnt_tracker, 128, name);
 #ifdef CONFIG_PCPU_DEV_REFCNT
@@ -11034,7 +11017,7 @@  struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	free_percpu(dev->pcpu_refcnt);
 free_dev:
 #endif
-	netdev_freemem(dev);
+	kvfree(dev);
 	return NULL;
 }
 EXPORT_SYMBOL(alloc_netdev_mqs);
@@ -11090,7 +11073,7 @@  void free_netdev(struct net_device *dev)
 	/*  Compatibility with error handling in drivers */
 	if (dev->reg_state == NETREG_UNINITIALIZED ||
 	    dev->reg_state == NETREG_DUMMY) {
-		netdev_freemem(dev);
+		kvfree(dev);
 		return;
 	}
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 4c27a360c294..0e2084ce7b75 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -2028,7 +2028,7 @@  static void netdev_release(struct device *d)
 	 * device is dead and about to be freed.
 	 */
 	kfree(rcu_access_pointer(dev->ifalias));
-	netdev_freemem(dev);
+	kvfree(dev);
 }
 
 static const void *net_namespace(const struct device *d)