diff mbox series

[net-next] stmmac: align RX buffers

Message ID 20210614022504.24458-1-mcroce@linux.microsoft.com (mailing list archive)
State Accepted
Commit a955318fe67ec0d962760b5ee58e74bffaf649b8
Delegated to: Netdev Maintainers
Headers show
Series [net-next] stmmac: align RX buffers | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 5 maintainers not CCed: aou@eecs.berkeley.edu linux-arm-kernel@lists.infradead.org joabreu@synopsys.com linux-stm32@st-md-mailman.stormreply.com mcoquelin.stm32@gmail.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 20 this patch: 20
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 11 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 20 this patch: 20
netdev/header_inline success Link

Commit Message

Matteo Croce June 14, 2021, 2:25 a.m. UTC
From: Matteo Croce <mcroce@microsoft.com>

On RX an SKB is allocated and the received buffer is copied into it.
But on some architectures, the memcpy() needs the source and destination
buffers to have the same alignment to be efficient.

This is not our case, because SKB data pointer is misaligned by two bytes
to compensate the ethernet header.

Align the RX buffer the same way as the SKB one, so the copy is faster.
An iperf3 RX test gives a decent improvement on a RISC-V machine:

before:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec   88             sender
[  5]   0.00-10.01  sec   730 MBytes   612 Mbits/sec                  receiver

after:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver

And the memcpy() overhead during the RX drops dramatically.

before:
Overhead  Shared O  Symbol
  43.35%  [kernel]  [k] memcpy
  33.77%  [kernel]  [k] __asm_copy_to_user
   3.64%  [kernel]  [k] sifive_l2_flush64_range

after:
Overhead  Shared O  Symbol
  45.40%  [kernel]  [k] __asm_copy_to_user
  28.09%  [kernel]  [k] memcpy
   4.27%  [kernel]  [k] sifive_l2_flush64_range

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

David Miller June 14, 2021, 7:51 p.m. UTC | #1
But thois means the ethernet header will be misaliugned and this will kill performance on some cpus as misaligned accessed
are resolved wioth a trap handler.

Even on cpus that don't trap, the access will be slower.

Thanks.
Matteo Croce June 14, 2021, 11:21 p.m. UTC | #2
On Mon, 14 Jun 2021 12:51:11 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> 
> But thois means the ethernet header will be misaliugned and this will
> kill performance on some cpus as misaligned accessed are resolved
> wioth a trap handler.
> 
> Even on cpus that don't trap, the access will be slower.
> 
> Thanks.

Isn't the IP header which should be aligned to avoid expensive traps?
From include/linux/skbuff.h:

 * Since an ethernet header is 14 bytes network drivers often end up with
 * the IP header at an unaligned offset. The IP header can be aligned by
 * shifting the start of the packet by 2 bytes. Drivers should do this
 * with:
 *
 * skb_reserve(skb, NET_IP_ALIGN);

But the problem here really is not the header alignment, the problem is
that the rx buffer is copied into an skb, and the two buffers have
different alignments.
If I add this print, I get this for every packet:

--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -5460,6 +5460,8 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
+               printk("skb->data alignment: %lu\n", (uintptr_t)skb->data & 7);
+               printk("xdp.data alignment: %lu\n" , (uintptr_t)xdp.data & 7);
                skb_copy_to_linear_data(skb, xdp.data, buf1_len);

[ 1060.967768] skb->data alignment: 2
[ 1060.971174] xdp.data alignment: 0
[ 1061.967589] skb->data alignment: 2
[ 1061.970994] xdp.data alignment: 0

And many architectures do an optimized memcpy when the low order bits of the
two pointers match, to name a few:

arch/alpha/lib/memcpy.c:
	/* If both source and dest are word aligned copy words */
	if (!((unsigned int)dest_w & 3) && !((unsigned int)src_w & 3)) {

arch/xtensa/lib/memcopy.S:
	/*
	 * Destination and source are word-aligned, use word copy.
	 */
	# copy 16 bytes per iteration for word-aligned dst and word-aligned src

arch/openrisc/lib/memcpy.c:
	/* If both source and dest are word aligned copy words */
	if (!((unsigned int)dest_w & 3) && !((unsigned int)src_w & 3)) {

And so on. With my patch I (mis)align the two buffer at an offset 2
(NET_IP_ALIGN) so the data can be copied faster:

[   16.648485] skb->data alignment: 2
[   16.651894] xdp.data alignment: 2
[   16.714260] skb->data alignment: 2
[   16.717688] xdp.data alignment: 2

Does this make sense?

Regards,
David Miller June 15, 2021, 5:28 p.m. UTC | #3
Thank you for the explanation, I have appplied this patch.
patchwork-bot+netdevbpf@kernel.org June 15, 2021, 5:30 p.m. UTC | #4
Hello:

This patch was applied to netdev/net-next.git (refs/heads/master):

On Mon, 14 Jun 2021 04:25:04 +0200 you wrote:
> From: Matteo Croce <mcroce@microsoft.com>
> 
> On RX an SKB is allocated and the received buffer is copied into it.
> But on some architectures, the memcpy() needs the source and destination
> buffers to have the same alignment to be efficient.
> 
> This is not our case, because SKB data pointer is misaligned by two bytes
> to compensate the ethernet header.
> 
> [...]

Here is the summary with links:
  - [net-next] stmmac: align RX buffers
    https://git.kernel.org/netdev/net-next/c/a955318fe67e

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
diff mbox series

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index b6cd43eda7ac..04bdb3950d63 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -338,9 +338,9 @@  static inline bool stmmac_xdp_is_enabled(struct stmmac_priv *priv)
 static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
 {
 	if (stmmac_xdp_is_enabled(priv))
-		return XDP_PACKET_HEADROOM;
+		return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
 
-	return 0;
+	return NET_SKB_PAD + NET_IP_ALIGN;
 }
 
 void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);