diff mbox series

[net-next] stmmac: align RX buffers

Message ID 20210614022504.24458-1-mcroce@linux.microsoft.com (mailing list archive)
State Accepted
Commit a955318fe67ec0d962760b5ee58e74bffaf649b8
Delegated to: Netdev Maintainers
Headers show
Series [net-next] stmmac: align RX buffers | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 5 maintainers not CCed: aou@eecs.berkeley.edu linux-arm-kernel@lists.infradead.org joabreu@synopsys.com linux-stm32@st-md-mailman.stormreply.com mcoquelin.stm32@gmail.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 20 this patch: 20
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 11 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 20 this patch: 20
netdev/header_inline success Link

Commit Message

Matteo Croce June 14, 2021, 2:25 a.m. UTC
From: Matteo Croce <mcroce@microsoft.com>

On RX an SKB is allocated and the received buffer is copied into it.
But on some architectures, the memcpy() needs the source and destination
buffers to have the same alignment to be efficient.

This is not our case, because SKB data pointer is misaligned by two bytes
to compensate the ethernet header.

Align the RX buffer the same way as the SKB one, so the copy is faster.
An iperf3 RX test gives a decent improvement on a RISC-V machine:

before:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec   88             sender
[  5]   0.00-10.01  sec   730 MBytes   612 Mbits/sec                  receiver

after:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver

And the memcpy() overhead during the RX drops dramatically.

before:
Overhead  Shared O  Symbol
  43.35%  [kernel]  [k] memcpy
  33.77%  [kernel]  [k] __asm_copy_to_user
   3.64%  [kernel]  [k] sifive_l2_flush64_range

after:
Overhead  Shared O  Symbol
  45.40%  [kernel]  [k] __asm_copy_to_user
  28.09%  [kernel]  [k] memcpy
   4.27%  [kernel]  [k] sifive_l2_flush64_range

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

David Miller June 14, 2021, 7:51 p.m. UTC | #1
But thois means the ethernet header will be misaliugned and this will kill performance on some cpus as misaligned accessed
are resolved wioth a trap handler.

Even on cpus that don't trap, the access will be slower.

Thanks.
Matteo Croce June 14, 2021, 11:21 p.m. UTC | #2
On Mon, 14 Jun 2021 12:51:11 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> 
> But thois means the ethernet header will be misaliugned and this will
> kill performance on some cpus as misaligned accessed are resolved
> wioth a trap handler.
> 
> Even on cpus that don't trap, the access will be slower.
> 
> Thanks.

Isn't the IP header which should be aligned to avoid expensive traps?
From include/linux/skbuff.h:

 * Since an ethernet header is 14 bytes network drivers often end up with
 * the IP header at an unaligned offset. The IP header can be aligned by
 * shifting the start of the packet by 2 bytes. Drivers should do this
 * with:
 *
 * skb_reserve(skb, NET_IP_ALIGN);

But the problem here really is not the header alignment, the problem is
that the rx buffer is copied into an skb, and the two buffers have
different alignments.
If I add this print, I get this for every packet:

--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -5460,6 +5460,8 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
+               printk("skb->data alignment: %lu\n", (uintptr_t)skb->data & 7);
+               printk("xdp.data alignment: %lu\n" , (uintptr_t)xdp.data & 7);
                skb_copy_to_linear_data(skb, xdp.data, buf1_len);

[ 1060.967768] skb->data alignment: 2
[ 1060.971174] xdp.data alignment: 0
[ 1061.967589] skb->data alignment: 2
[ 1061.970994] xdp.data alignment: 0

And many architectures do an optimized memcpy when the low order bits of the
two pointers match, to name a few:

arch/alpha/lib/memcpy.c:
	/* If both source and dest are word aligned copy words */
	if (!((unsigned int)dest_w & 3) && !((unsigned int)src_w & 3)) {

arch/xtensa/lib/memcopy.S:
	/*
	 * Destination and source are word-aligned, use word copy.
	 */
	# copy 16 bytes per iteration for word-aligned dst and word-aligned src

arch/openrisc/lib/memcpy.c:
	/* If both source and dest are word aligned copy words */
	if (!((unsigned int)dest_w & 3) && !((unsigned int)src_w & 3)) {

And so on. With my patch I (mis)align the two buffer at an offset 2
(NET_IP_ALIGN) so the data can be copied faster:

[   16.648485] skb->data alignment: 2
[   16.651894] xdp.data alignment: 2
[   16.714260] skb->data alignment: 2
[   16.717688] xdp.data alignment: 2

Does this make sense?

Regards,
David Miller June 15, 2021, 5:28 p.m. UTC | #3
Thank you for the explanation, I have appplied this patch.
patchwork-bot+netdevbpf@kernel.org June 15, 2021, 5:30 p.m. UTC | #4
Hello:

This patch was applied to netdev/net-next.git (refs/heads/master):

On Mon, 14 Jun 2021 04:25:04 +0200 you wrote:
> From: Matteo Croce <mcroce@microsoft.com>
> 
> On RX an SKB is allocated and the received buffer is copied into it.
> But on some architectures, the memcpy() needs the source and destination
> buffers to have the same alignment to be efficient.
> 
> This is not our case, because SKB data pointer is misaligned by two bytes
> to compensate the ethernet header.
> 
> [...]

Here is the summary with links:
  - [net-next] stmmac: align RX buffers
    https://git.kernel.org/netdev/net-next/c/a955318fe67e

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Marc Zyngier Aug. 10, 2021, 7:07 p.m. UTC | #5
Hi all,

[adding Thierry, Jon and Will to the fun]

On Mon, 14 Jun 2021 03:25:04 +0100,
Matteo Croce <mcroce@linux.microsoft.com> wrote:
> 
> From: Matteo Croce <mcroce@microsoft.com>
> 
> On RX an SKB is allocated and the received buffer is copied into it.
> But on some architectures, the memcpy() needs the source and destination
> buffers to have the same alignment to be efficient.
> 
> This is not our case, because SKB data pointer is misaligned by two bytes
> to compensate the ethernet header.
> 
> Align the RX buffer the same way as the SKB one, so the copy is faster.
> An iperf3 RX test gives a decent improvement on a RISC-V machine:
> 
> before:
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec   88             sender
> [  5]   0.00-10.01  sec   730 MBytes   612 Mbits/sec                  receiver
> 
> after:
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
> [  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver
> 
> And the memcpy() overhead during the RX drops dramatically.
> 
> before:
> Overhead  Shared O  Symbol
>   43.35%  [kernel]  [k] memcpy
>   33.77%  [kernel]  [k] __asm_copy_to_user
>    3.64%  [kernel]  [k] sifive_l2_flush64_range
> 
> after:
> Overhead  Shared O  Symbol
>   45.40%  [kernel]  [k] __asm_copy_to_user
>   28.09%  [kernel]  [k] memcpy
>    4.27%  [kernel]  [k] sifive_l2_flush64_range
> 
> Signed-off-by: Matteo Croce <mcroce@microsoft.com>

This patch completely breaks my Jetson TX2 system, composed of 2
Nvidia Denver and 4 Cortex-A57, in a very "funny" way.

Any significant amount of traffic result in all sort of corruption
(ssh connections get dropped, Debian packages downloaded have the
wrong checksums) if any Denver core is involved in any significant way
(packet processing, interrupt handling). And it is all triggered by
this very change.

The only way I have to make it work on a Denver core is to route the
interrupt to that particular core and taskset the workload to it. Any
other configuration involving a Denver CPU results in some sort of
corruption. On their own, the A57s are fine.

This smells of memory ordering going really wrong, which this change
would expose. I haven't had a chance to dig into the driver yet (it
took me long enough to bisect it), but if someone points me at what is
supposed to synchronise the DMA when receiving an interrupt, I'll have
a look.

Thanks,

	M.

> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> index b6cd43eda7ac..04bdb3950d63 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> @@ -338,9 +338,9 @@ static inline bool stmmac_xdp_is_enabled(struct stmmac_priv *priv)
>  static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
>  {
>  	if (stmmac_xdp_is_enabled(priv))
> -		return XDP_PACKET_HEADROOM;
> +		return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
>  
> -	return 0;
> +	return NET_SKB_PAD + NET_IP_ALIGN;
>  }
>  
>  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
> -- 
> 2.31.1
> 
>
Thierry Reding Aug. 11, 2021, 10:28 a.m. UTC | #6
On Tue, Aug 10, 2021 at 08:07:47PM +0100, Marc Zyngier wrote:
> Hi all,
> 
> [adding Thierry, Jon and Will to the fun]
> 
> On Mon, 14 Jun 2021 03:25:04 +0100,
> Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > 
> > From: Matteo Croce <mcroce@microsoft.com>
> > 
> > On RX an SKB is allocated and the received buffer is copied into it.
> > But on some architectures, the memcpy() needs the source and destination
> > buffers to have the same alignment to be efficient.
> > 
> > This is not our case, because SKB data pointer is misaligned by two bytes
> > to compensate the ethernet header.
> > 
> > Align the RX buffer the same way as the SKB one, so the copy is faster.
> > An iperf3 RX test gives a decent improvement on a RISC-V machine:
> > 
> > before:
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec   88             sender
> > [  5]   0.00-10.01  sec   730 MBytes   612 Mbits/sec                  receiver
> > 
> > after:
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
> > [  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver
> > 
> > And the memcpy() overhead during the RX drops dramatically.
> > 
> > before:
> > Overhead  Shared O  Symbol
> >   43.35%  [kernel]  [k] memcpy
> >   33.77%  [kernel]  [k] __asm_copy_to_user
> >    3.64%  [kernel]  [k] sifive_l2_flush64_range
> > 
> > after:
> > Overhead  Shared O  Symbol
> >   45.40%  [kernel]  [k] __asm_copy_to_user
> >   28.09%  [kernel]  [k] memcpy
> >    4.27%  [kernel]  [k] sifive_l2_flush64_range
> > 
> > Signed-off-by: Matteo Croce <mcroce@microsoft.com>
> 
> This patch completely breaks my Jetson TX2 system, composed of 2
> Nvidia Denver and 4 Cortex-A57, in a very "funny" way.
> 
> Any significant amount of traffic result in all sort of corruption
> (ssh connections get dropped, Debian packages downloaded have the
> wrong checksums) if any Denver core is involved in any significant way
> (packet processing, interrupt handling). And it is all triggered by
> this very change.
> 
> The only way I have to make it work on a Denver core is to route the
> interrupt to that particular core and taskset the workload to it. Any
> other configuration involving a Denver CPU results in some sort of
> corruption. On their own, the A57s are fine.
> 
> This smells of memory ordering going really wrong, which this change
> would expose. I haven't had a chance to dig into the driver yet (it
> took me long enough to bisect it), but if someone points me at what is
> supposed to synchronise the DMA when receiving an interrupt, I'll have
> a look.

I recall that Jon was looking into a similar issue recently, though I
think the failure mode was slightly different. I also vaguely recall
that CPU frequency was impacting this to some degree (lower CPU
frequencies would increase the chances of this happening).

Jon's currently out of office, but let me try and dig up the details
on this.

Thierry

> 
> Thanks,
> 
> 	M.
> 
> > ---
> >  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > index b6cd43eda7ac..04bdb3950d63 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > @@ -338,9 +338,9 @@ static inline bool stmmac_xdp_is_enabled(struct stmmac_priv *priv)
> >  static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
> >  {
> >  	if (stmmac_xdp_is_enabled(priv))
> > -		return XDP_PACKET_HEADROOM;
> > +		return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
> >  
> > -	return 0;
> > +	return NET_SKB_PAD + NET_IP_ALIGN;
> >  }
> >  
> >  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
> > -- 
> > 2.31.1
> > 
> > 
> 
> -- 
> Without deviation from the norm, progress is not possible.
Thierry Reding Aug. 11, 2021, 10:41 a.m. UTC | #7
On Tue, Aug 10, 2021 at 08:07:47PM +0100, Marc Zyngier wrote:
> Hi all,
> 
> [adding Thierry, Jon and Will to the fun]
> 
> On Mon, 14 Jun 2021 03:25:04 +0100,
> Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > 
> > From: Matteo Croce <mcroce@microsoft.com>
> > 
> > On RX an SKB is allocated and the received buffer is copied into it.
> > But on some architectures, the memcpy() needs the source and destination
> > buffers to have the same alignment to be efficient.
> > 
> > This is not our case, because SKB data pointer is misaligned by two bytes
> > to compensate the ethernet header.
> > 
> > Align the RX buffer the same way as the SKB one, so the copy is faster.
> > An iperf3 RX test gives a decent improvement on a RISC-V machine:
> > 
> > before:
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec   88             sender
> > [  5]   0.00-10.01  sec   730 MBytes   612 Mbits/sec                  receiver
> > 
> > after:
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
> > [  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver
> > 
> > And the memcpy() overhead during the RX drops dramatically.
> > 
> > before:
> > Overhead  Shared O  Symbol
> >   43.35%  [kernel]  [k] memcpy
> >   33.77%  [kernel]  [k] __asm_copy_to_user
> >    3.64%  [kernel]  [k] sifive_l2_flush64_range
> > 
> > after:
> > Overhead  Shared O  Symbol
> >   45.40%  [kernel]  [k] __asm_copy_to_user
> >   28.09%  [kernel]  [k] memcpy
> >    4.27%  [kernel]  [k] sifive_l2_flush64_range
> > 
> > Signed-off-by: Matteo Croce <mcroce@microsoft.com>
> 
> This patch completely breaks my Jetson TX2 system, composed of 2
> Nvidia Denver and 4 Cortex-A57, in a very "funny" way.
> 
> Any significant amount of traffic result in all sort of corruption
> (ssh connections get dropped, Debian packages downloaded have the
> wrong checksums) if any Denver core is involved in any significant way
> (packet processing, interrupt handling). And it is all triggered by
> this very change.
> 
> The only way I have to make it work on a Denver core is to route the
> interrupt to that particular core and taskset the workload to it. Any
> other configuration involving a Denver CPU results in some sort of
> corruption. On their own, the A57s are fine.
> 
> This smells of memory ordering going really wrong, which this change
> would expose. I haven't had a chance to dig into the driver yet (it
> took me long enough to bisect it), but if someone points me at what is
> supposed to synchronise the DMA when receiving an interrupt, I'll have
> a look.

One other thing that kind of rings a bell when reading DMA and
interrupts is a recent report (and attempt to fix this) where upon
resume from system suspend, the DMA descriptors would get corrupted.

I don't think we ever figured out what exactly the problem was, but
interestingly the fix for the issue immediately caused things to go
haywire on... Jetson TX2.

I recall looking at this a bit and couldn't find where exactly the DMA
was being synchronized on suspend/resume, or what the mechanism was to
ensure that (in transit) packets were not received after the suspension
of the Ethernet device. Some information about this can be found here:

	https://lore.kernel.org/netdev/708edb92-a5df-ecc4-3126-5ab36707e275@nvidia.com/

It's interesting that this happens only on Jetson TX2. Apparently on the
newer Jetson AGX Xavier this problem does not occur. I think Jon also
narrowed this down to being related to the IOMMU being enabled on Jetson
TX2, whereas Jetson AGX Xavier didn't have it enabled. I wasn't able to
find any notes on whether disabling the IOMMU on Jetson TX2 did anything
to improve on this, so perhaps that's something worth trying.

We have since enabled the IOMMU on Jetson AGX Xavier, and I haven't seen
any test reports indicating that this is causing issues. So I don't
think this has anything directly to do with the IOMMU support.

That said, if these problems are all exclusive to Jetson TX2, or rather
Tegra186, that could indicate that we're missing something at a more
fundamental level (maybe some cache maintenance quirk?).

Thierry

> > ---
> >  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > index b6cd43eda7ac..04bdb3950d63 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > @@ -338,9 +338,9 @@ static inline bool stmmac_xdp_is_enabled(struct stmmac_priv *priv)
> >  static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
> >  {
> >  	if (stmmac_xdp_is_enabled(priv))
> > -		return XDP_PACKET_HEADROOM;
> > +		return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
> >  
> > -	return 0;
> > +	return NET_SKB_PAD + NET_IP_ALIGN;
> >  }
> >  
> >  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
> > -- 
> > 2.31.1
> > 
> > 
> 
> -- 
> Without deviation from the norm, progress is not possible.
Joakim Zhang Aug. 11, 2021, 10:56 a.m. UTC | #8
> -----Original Message-----
> From: Thierry Reding <thierry.reding@gmail.com>
> Sent: 2021年8月11日 18:42
> To: Marc Zyngier <maz@kernel.org>
> Cc: Matteo Croce <mcroce@linux.microsoft.com>; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-riscv@lists.infradead.org; Giuseppe
> Cavallaro <peppe.cavallaro@st.com>; Alexandre Torgue
> <alexandre.torgue@foss.st.com>; David S. Miller <davem@davemloft.net>;
> Jakub Kicinski <kuba@kernel.org>; Palmer Dabbelt <palmer@dabbelt.com>;
> Paul Walmsley <paul.walmsley@sifive.com>; Drew Fustini
> <drew@beagleboard.org>; Emil Renner Berthing <kernel@esmil.dk>; Jon
> Hunter <jonathanh@nvidia.com>; Will Deacon <will@kernel.org>
> Subject: Re: [PATCH net-next] stmmac: align RX buffers
> 
> On Tue, Aug 10, 2021 at 08:07:47PM +0100, Marc Zyngier wrote:
> > Hi all,
> >
> > [adding Thierry, Jon and Will to the fun]
> >
> > On Mon, 14 Jun 2021 03:25:04 +0100,
> > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > >
> > > From: Matteo Croce <mcroce@microsoft.com>
> > >
> > > On RX an SKB is allocated and the received buffer is copied into it.
> > > But on some architectures, the memcpy() needs the source and
> > > destination buffers to have the same alignment to be efficient.
> > >
> > > This is not our case, because SKB data pointer is misaligned by two
> > > bytes to compensate the ethernet header.
> > >
> > > Align the RX buffer the same way as the SKB one, so the copy is faster.
> > > An iperf3 RX test gives a decent improvement on a RISC-V machine:
> > >
> > > before:
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec   88
> sender
> > > [  5]   0.00-10.01  sec   730 MBytes   612 Mbits/sec
> receiver
> > >
> > > after:
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0
> sender
> > > [  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec
> receiver
> > >
> > > And the memcpy() overhead during the RX drops dramatically.
> > >
> > > before:
> > > Overhead  Shared O  Symbol
> > >   43.35%  [kernel]  [k] memcpy
> > >   33.77%  [kernel]  [k] __asm_copy_to_user
> > >    3.64%  [kernel]  [k] sifive_l2_flush64_range
> > >
> > > after:
> > > Overhead  Shared O  Symbol
> > >   45.40%  [kernel]  [k] __asm_copy_to_user
> > >   28.09%  [kernel]  [k] memcpy
> > >    4.27%  [kernel]  [k] sifive_l2_flush64_range
> > >
> > > Signed-off-by: Matteo Croce <mcroce@microsoft.com>
> >
> > This patch completely breaks my Jetson TX2 system, composed of 2
> > Nvidia Denver and 4 Cortex-A57, in a very "funny" way.
> >
> > Any significant amount of traffic result in all sort of corruption
> > (ssh connections get dropped, Debian packages downloaded have the
> > wrong checksums) if any Denver core is involved in any significant way
> > (packet processing, interrupt handling). And it is all triggered by
> > this very change.
> >
> > The only way I have to make it work on a Denver core is to route the
> > interrupt to that particular core and taskset the workload to it. Any
> > other configuration involving a Denver CPU results in some sort of
> > corruption. On their own, the A57s are fine.
> >
> > This smells of memory ordering going really wrong, which this change
> > would expose. I haven't had a chance to dig into the driver yet (it
> > took me long enough to bisect it), but if someone points me at what is
> > supposed to synchronise the DMA when receiving an interrupt, I'll have
> > a look.
> 
> One other thing that kind of rings a bell when reading DMA and interrupts is a
> recent report (and attempt to fix this) where upon resume from system
> suspend, the DMA descriptors would get corrupted.
> 
> I don't think we ever figured out what exactly the problem was, but
> interestingly the fix for the issue immediately caused things to go haywire on...
> Jetson TX2.
> 
> I recall looking at this a bit and couldn't find where exactly the DMA was being
> synchronized on suspend/resume, or what the mechanism was to ensure that
> (in transit) packets were not received after the suspension of the Ethernet
> device. Some information about this can be found here:
> 
> 	https://lore.kernel.org/netdev/708edb92-a5df-ecc4-3126-5ab36707e275
> @nvidia.com/
> 
> It's interesting that this happens only on Jetson TX2. Apparently on the newer
> Jetson AGX Xavier this problem does not occur. I think Jon also narrowed this
> down to being related to the IOMMU being enabled on Jetson TX2, whereas
> Jetson AGX Xavier didn't have it enabled. I wasn't able to find any notes on
> whether disabling the IOMMU on Jetson TX2 did anything to improve on this,
> so perhaps that's something worth trying.
> 
> We have since enabled the IOMMU on Jetson AGX Xavier, and I haven't seen
> any test reports indicating that this is causing issues. So I don't think this has
> anything directly to do with the IOMMU support.
> 
> That said, if these problems are all exclusive to Jetson TX2, or rather Tegra186,
> that could indicate that we're missing something at a more fundamental level
> (maybe some cache maintenance quirk?).


Hey Thierry,

Please also notice me if you found the root cause, that would be appreciated!
I have not upstream the fix you mentioned yet since your continuous NACK.

Thanks in advance 
Eric Dumazet Aug. 11, 2021, 12:53 p.m. UTC | #9
On 8/11/21 12:28 PM, Thierry Reding wrote:
> On Tue, Aug 10, 2021 at 08:07:47PM +0100, Marc Zyngier wrote:
>> Hi all,
>>
>> [adding Thierry, Jon and Will to the fun]
>>
>> On Mon, 14 Jun 2021 03:25:04 +0100,
>> Matteo Croce <mcroce@linux.microsoft.com> wrote:
>>>
>>> From: Matteo Croce <mcroce@microsoft.com>
>>>
>>> On RX an SKB is allocated and the received buffer is copied into it.
>>> But on some architectures, the memcpy() needs the source and destination
>>> buffers to have the same alignment to be efficient.
>>>
>>> This is not our case, because SKB data pointer is misaligned by two bytes
>>> to compensate the ethernet header.
>>>
>>> Align the RX buffer the same way as the SKB one, so the copy is faster.
>>> An iperf3 RX test gives a decent improvement on a RISC-V machine:
>>>
>>> before:
>>> [ ID] Interval           Transfer     Bitrate         Retr
>>> [  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec   88             sender
>>> [  5]   0.00-10.01  sec   730 MBytes   612 Mbits/sec                  receiver
>>>
>>> after:
>>> [ ID] Interval           Transfer     Bitrate         Retr
>>> [  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
>>> [  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver
>>>
>>> And the memcpy() overhead during the RX drops dramatically.
>>>
>>> before:
>>> Overhead  Shared O  Symbol
>>>   43.35%  [kernel]  [k] memcpy
>>>   33.77%  [kernel]  [k] __asm_copy_to_user
>>>    3.64%  [kernel]  [k] sifive_l2_flush64_range
>>>
>>> after:
>>> Overhead  Shared O  Symbol
>>>   45.40%  [kernel]  [k] __asm_copy_to_user
>>>   28.09%  [kernel]  [k] memcpy
>>>    4.27%  [kernel]  [k] sifive_l2_flush64_range
>>>
>>> Signed-off-by: Matteo Croce <mcroce@microsoft.com>
>>
>> This patch completely breaks my Jetson TX2 system, composed of 2
>> Nvidia Denver and 4 Cortex-A57, in a very "funny" way.
>>
>> Any significant amount of traffic result in all sort of corruption
>> (ssh connections get dropped, Debian packages downloaded have the
>> wrong checksums) if any Denver core is involved in any significant way
>> (packet processing, interrupt handling). And it is all triggered by
>> this very change.
>>
>> The only way I have to make it work on a Denver core is to route the
>> interrupt to that particular core and taskset the workload to it. Any
>> other configuration involving a Denver CPU results in some sort of
>> corruption. On their own, the A57s are fine.
>>
>> This smells of memory ordering going really wrong, which this change
>> would expose. I haven't had a chance to dig into the driver yet (it
>> took me long enough to bisect it), but if someone points me at what is
>> supposed to synchronise the DMA when receiving an interrupt, I'll have
>> a look.
> 
> I recall that Jon was looking into a similar issue recently, though I
> think the failure mode was slightly different. I also vaguely recall
> that CPU frequency was impacting this to some degree (lower CPU
> frequencies would increase the chances of this happening).
> 
> Jon's currently out of office, but let me try and dig up the details
> on this.
> 
> Thierry
> 
>>
>> Thanks,
>>
>> 	M.
>>
>>> ---
>>>  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
>>> index b6cd43eda7ac..04bdb3950d63 100644
>>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
>>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
>>> @@ -338,9 +338,9 @@ static inline bool stmmac_xdp_is_enabled(struct stmmac_priv *priv)
>>>  static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
>>>  {
>>>  	if (stmmac_xdp_is_enabled(priv))
>>> -		return XDP_PACKET_HEADROOM;
>>> +		return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
>>>  
>>> -	return 0;
>>> +	return NET_SKB_PAD + NET_IP_ALIGN;
>>>  }
>>>  
>>>  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
>>> -- 
>>> 2.31.1
>>>
>>>
>>
>> -- 
>> Without deviation from the norm, progress is not possible.

Are you sure you do not need to adjust stmmac_set_bfsize(), 
stmmac_rx_buf1_len() and stmmac_rx_buf2_len() ?

Presumably DEFAULT_BUFSIZE also want to be increased by NET_SKB_PAD

Patch for stmmac_rx_buf1_len() :

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 7b8404a21544cf29668e8a14240c3971e6bce0c3..041a74e7efca3436bfe3e17f972dd156173957a9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4508,12 +4508,12 @@ static unsigned int stmmac_rx_buf1_len(struct stmmac_priv *priv,
 
        /* First descriptor, not last descriptor and not split header */
        if (status & rx_not_ls)
-               return priv->dma_buf_sz;
+               return priv->dma_buf_sz - NET_SKB_PAD - NET_IP_ALIGN;
 
        plen = stmmac_get_rx_frame_len(priv, p, coe);
 
        /* First descriptor and last descriptor and not split header */
-       return min_t(unsigned int, priv->dma_buf_sz, plen);
+       return min_t(unsigned int, priv->dma_buf_sz - NET_SKB_PAD - NET_IP_ALIGN, plen);
 }
 
 static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv,
Marc Zyngier Aug. 11, 2021, 1:23 p.m. UTC | #10
On Wed, 11 Aug 2021 11:41:59 +0100,
Thierry Reding <thierry.reding@gmail.com> wrote:
> 
> On Tue, Aug 10, 2021 at 08:07:47PM +0100, Marc Zyngier wrote:
> > Hi all,
> > 
> > [adding Thierry, Jon and Will to the fun]
> > 
> > On Mon, 14 Jun 2021 03:25:04 +0100,
> > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > 
> > > From: Matteo Croce <mcroce@microsoft.com>
> > > 
> > > On RX an SKB is allocated and the received buffer is copied into it.
> > > But on some architectures, the memcpy() needs the source and destination
> > > buffers to have the same alignment to be efficient.
> > > 
> > > This is not our case, because SKB data pointer is misaligned by two bytes
> > > to compensate the ethernet header.
> > > 
> > > Align the RX buffer the same way as the SKB one, so the copy is faster.
> > > An iperf3 RX test gives a decent improvement on a RISC-V machine:
> > > 
> > > before:
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec   88             sender
> > > [  5]   0.00-10.01  sec   730 MBytes   612 Mbits/sec                  receiver
> > > 
> > > after:
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
> > > [  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver
> > > 
> > > And the memcpy() overhead during the RX drops dramatically.
> > > 
> > > before:
> > > Overhead  Shared O  Symbol
> > >   43.35%  [kernel]  [k] memcpy
> > >   33.77%  [kernel]  [k] __asm_copy_to_user
> > >    3.64%  [kernel]  [k] sifive_l2_flush64_range
> > > 
> > > after:
> > > Overhead  Shared O  Symbol
> > >   45.40%  [kernel]  [k] __asm_copy_to_user
> > >   28.09%  [kernel]  [k] memcpy
> > >    4.27%  [kernel]  [k] sifive_l2_flush64_range
> > > 
> > > Signed-off-by: Matteo Croce <mcroce@microsoft.com>
> > 
> > This patch completely breaks my Jetson TX2 system, composed of 2
> > Nvidia Denver and 4 Cortex-A57, in a very "funny" way.
> > 
> > Any significant amount of traffic result in all sort of corruption
> > (ssh connections get dropped, Debian packages downloaded have the
> > wrong checksums) if any Denver core is involved in any significant way
> > (packet processing, interrupt handling). And it is all triggered by
> > this very change.
> > 
> > The only way I have to make it work on a Denver core is to route the
> > interrupt to that particular core and taskset the workload to it. Any
> > other configuration involving a Denver CPU results in some sort of
> > corruption. On their own, the A57s are fine.
> > 
> > This smells of memory ordering going really wrong, which this change
> > would expose. I haven't had a chance to dig into the driver yet (it
> > took me long enough to bisect it), but if someone points me at what is
> > supposed to synchronise the DMA when receiving an interrupt, I'll have
> > a look.
> 
> One other thing that kind of rings a bell when reading DMA and
> interrupts is a recent report (and attempt to fix this) where upon
> resume from system suspend, the DMA descriptors would get corrupted.
> 
> I don't think we ever figured out what exactly the problem was, but
> interestingly the fix for the issue immediately caused things to go
> haywire on... Jetson TX2.

I love this machine... Did this issue occur with the Denver CPUs
disabled?

> I recall looking at this a bit and couldn't find where exactly the DMA
> was being synchronized on suspend/resume, or what the mechanism was to
> ensure that (in transit) packets were not received after the suspension
> of the Ethernet device. Some information about this can be found here:
> 
> 	https://lore.kernel.org/netdev/708edb92-a5df-ecc4-3126-5ab36707e275@nvidia.com/
> 
> It's interesting that this happens only on Jetson TX2. Apparently on the
> newer Jetson AGX Xavier this problem does not occur. I think Jon also
> narrowed this down to being related to the IOMMU being enabled on Jetson
> TX2, whereas Jetson AGX Xavier didn't have it enabled. I wasn't able to
> find any notes on whether disabling the IOMMU on Jetson TX2 did anything
> to improve on this, so perhaps that's something worth trying.

Actually, I was running with the SMMU disabled, as I use the upstream
u-boot provided DT. Switching to the kernel one didn't change a thing
(with passthough or not).

> We have since enabled the IOMMU on Jetson AGX Xavier, and I haven't seen
> any test reports indicating that this is causing issues. So I don't
> think this has anything directly to do with the IOMMU support.

No, it looks more like either ordering or cache management. The fact
that this patch messes with the buffer alignment makes me favour the
latter...

> That said, if these problems are all exclusive to Jetson TX2, or rather
> Tegra186, that could indicate that we're missing something at a more
> fundamental level (maybe some cache maintenance quirk?).

That'd be pretty annoying. Do you know if the Ethernet is a coherent
device on this machine? or does it need active cache maintenance?

Thanks,

	M.
Marc Zyngier Aug. 11, 2021, 2:16 p.m. UTC | #11
On Wed, 11 Aug 2021 13:53:59 +0100,
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> 
> 
> On 8/11/21 12:28 PM, Thierry Reding wrote:
> > On Tue, Aug 10, 2021 at 08:07:47PM +0100, Marc Zyngier wrote:
> >> Hi all,
> >>
> >> [adding Thierry, Jon and Will to the fun]
> >>
> >> On Mon, 14 Jun 2021 03:25:04 +0100,
> >> Matteo Croce <mcroce@linux.microsoft.com> wrote:
> >>>
> >>> From: Matteo Croce <mcroce@microsoft.com>
> >>>
> >>> On RX an SKB is allocated and the received buffer is copied into it.
> >>> But on some architectures, the memcpy() needs the source and destination
> >>> buffers to have the same alignment to be efficient.
> >>>
> >>> This is not our case, because SKB data pointer is misaligned by two bytes
> >>> to compensate the ethernet header.
> >>>
> >>> Align the RX buffer the same way as the SKB one, so the copy is faster.
> >>> An iperf3 RX test gives a decent improvement on a RISC-V machine:
> >>>
> >>> before:
> >>> [ ID] Interval           Transfer     Bitrate         Retr
> >>> [  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec   88             sender
> >>> [  5]   0.00-10.01  sec   730 MBytes   612 Mbits/sec                  receiver
> >>>
> >>> after:
> >>> [ ID] Interval           Transfer     Bitrate         Retr
> >>> [  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
> >>> [  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver
> >>>
> >>> And the memcpy() overhead during the RX drops dramatically.
> >>>
> >>> before:
> >>> Overhead  Shared O  Symbol
> >>>   43.35%  [kernel]  [k] memcpy
> >>>   33.77%  [kernel]  [k] __asm_copy_to_user
> >>>    3.64%  [kernel]  [k] sifive_l2_flush64_range
> >>>
> >>> after:
> >>> Overhead  Shared O  Symbol
> >>>   45.40%  [kernel]  [k] __asm_copy_to_user
> >>>   28.09%  [kernel]  [k] memcpy
> >>>    4.27%  [kernel]  [k] sifive_l2_flush64_range
> >>>
> >>> Signed-off-by: Matteo Croce <mcroce@microsoft.com>
> >>
> >> This patch completely breaks my Jetson TX2 system, composed of 2
> >> Nvidia Denver and 4 Cortex-A57, in a very "funny" way.
> >>
> >> Any significant amount of traffic result in all sort of corruption
> >> (ssh connections get dropped, Debian packages downloaded have the
> >> wrong checksums) if any Denver core is involved in any significant way
> >> (packet processing, interrupt handling). And it is all triggered by
> >> this very change.
> >>
> >> The only way I have to make it work on a Denver core is to route the
> >> interrupt to that particular core and taskset the workload to it. Any
> >> other configuration involving a Denver CPU results in some sort of
> >> corruption. On their own, the A57s are fine.
> >>
> >> This smells of memory ordering going really wrong, which this change
> >> would expose. I haven't had a chance to dig into the driver yet (it
> >> took me long enough to bisect it), but if someone points me at what is
> >> supposed to synchronise the DMA when receiving an interrupt, I'll have
> >> a look.
> > 
> > I recall that Jon was looking into a similar issue recently, though I
> > think the failure mode was slightly different. I also vaguely recall
> > that CPU frequency was impacting this to some degree (lower CPU
> > frequencies would increase the chances of this happening).
> > 
> > Jon's currently out of office, but let me try and dig up the details
> > on this.
> > 
> > Thierry
> > 
> >>
> >> Thanks,
> >>
> >> 	M.
> >>
> >>> ---
> >>>  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 4 ++--
> >>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> >>> index b6cd43eda7ac..04bdb3950d63 100644
> >>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> >>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> >>> @@ -338,9 +338,9 @@ static inline bool stmmac_xdp_is_enabled(struct stmmac_priv *priv)
> >>>  static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
> >>>  {
> >>>  	if (stmmac_xdp_is_enabled(priv))
> >>> -		return XDP_PACKET_HEADROOM;
> >>> +		return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
> >>>  
> >>> -	return 0;
> >>> +	return NET_SKB_PAD + NET_IP_ALIGN;
> >>>  }
> >>>  
> >>>  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
> >>> -- 
> >>> 2.31.1
> >>>
> >>>
> >>
> >> -- 
> >> Without deviation from the norm, progress is not possible.
> 
> Are you sure you do not need to adjust stmmac_set_bfsize(), 
> stmmac_rx_buf1_len() and stmmac_rx_buf2_len() ?
> 
> Presumably DEFAULT_BUFSIZE also want to be increased by NET_SKB_PAD
> 
> Patch for stmmac_rx_buf1_len() :
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 7b8404a21544cf29668e8a14240c3971e6bce0c3..041a74e7efca3436bfe3e17f972dd156173957a9 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -4508,12 +4508,12 @@ static unsigned int stmmac_rx_buf1_len(struct stmmac_priv *priv,
>  
>         /* First descriptor, not last descriptor and not split header */
>         if (status & rx_not_ls)
> -               return priv->dma_buf_sz;
> +               return priv->dma_buf_sz - NET_SKB_PAD - NET_IP_ALIGN;
>  
>         plen = stmmac_get_rx_frame_len(priv, p, coe);
>  
>         /* First descriptor and last descriptor and not split header */
> -       return min_t(unsigned int, priv->dma_buf_sz, plen);
> +       return min_t(unsigned int, priv->dma_buf_sz - NET_SKB_PAD - NET_IP_ALIGN, plen);
>  }
>  
>  static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv,

Feels like a major deficiency of the original patch. Happy to test a
more complete patch if/when you have one.

Thanks,

	M.
Eric Dumazet Aug. 12, 2021, 8:48 a.m. UTC | #12
On 8/11/21 4:16 PM, Marc Zyngier wrote:
> On Wed, 11 Aug 2021 13:53:59 +0100,
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> Are you sure you do not need to adjust stmmac_set_bfsize(), 
>> stmmac_rx_buf1_len() and stmmac_rx_buf2_len() ?
>>
>> Presumably DEFAULT_BUFSIZE also want to be increased by NET_SKB_PAD
>>
>> Patch for stmmac_rx_buf1_len() :
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> index 7b8404a21544cf29668e8a14240c3971e6bce0c3..041a74e7efca3436bfe3e17f972dd156173957a9 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> @@ -4508,12 +4508,12 @@ static unsigned int stmmac_rx_buf1_len(struct stmmac_priv *priv,
>>  
>>         /* First descriptor, not last descriptor and not split header */
>>         if (status & rx_not_ls)
>> -               return priv->dma_buf_sz;
>> +               return priv->dma_buf_sz - NET_SKB_PAD - NET_IP_ALIGN;
>>  
>>         plen = stmmac_get_rx_frame_len(priv, p, coe);
>>  
>>         /* First descriptor and last descriptor and not split header */
>> -       return min_t(unsigned int, priv->dma_buf_sz, plen);
>> +       return min_t(unsigned int, priv->dma_buf_sz - NET_SKB_PAD - NET_IP_ALIGN, plen);
>>  }
>>  
>>  static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv,
> 
> Feels like a major deficiency of the original patch. Happy to test a
> more complete patch if/when you have one.

I wont have time in the immediate future.

Matteo, if you do not work on a fix, I suggest we revert
 a955318fe67ec0d962760b5ee58e74bffaf649b8 stmmac: align RX buffers

before a more polished version can be submitted.
Matteo Croce Aug. 12, 2021, 10:18 a.m. UTC | #13
On Thu, 12 Aug 2021 10:48:03 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> 
> 
> On 8/11/21 4:16 PM, Marc Zyngier wrote:
> > On Wed, 11 Aug 2021 13:53:59 +0100,
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> >> Are you sure you do not need to adjust stmmac_set_bfsize(), 
> >> stmmac_rx_buf1_len() and stmmac_rx_buf2_len() ?
> >>
> >> Presumably DEFAULT_BUFSIZE also want to be increased by NET_SKB_PAD
> >>
> >> Patch for stmmac_rx_buf1_len() :
> >>
> >> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> >> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index
> >> 7b8404a21544cf29668e8a14240c3971e6bce0c3..041a74e7efca3436bfe3e17f972dd156173957a9
> >> 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++
> >> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -4508,12
> >> +4508,12 @@ static unsigned int stmmac_rx_buf1_len(struct
> >> stmmac_priv *priv, /* First descriptor, not last descriptor and
> >> not split header */ if (status & rx_not_ls)
> >> -               return priv->dma_buf_sz;
> >> +               return priv->dma_buf_sz - NET_SKB_PAD -
> >> NET_IP_ALIGN; 
> >>         plen = stmmac_get_rx_frame_len(priv, p, coe);
> >>  
> >>         /* First descriptor and last descriptor and not split
> >> header */
> >> -       return min_t(unsigned int, priv->dma_buf_sz, plen);
> >> +       return min_t(unsigned int, priv->dma_buf_sz - NET_SKB_PAD
> >> - NET_IP_ALIGN, plen); }
> >>  
> >>  static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv,
> > 
> > Feels like a major deficiency of the original patch. Happy to test a
> > more complete patch if/when you have one.
> 
> I wont have time in the immediate future.
> 
> Matteo, if you do not work on a fix, I suggest we revert
>  a955318fe67ec0d962760b5ee58e74bffaf649b8 stmmac: align RX buffers
> 
> before a more polished version can be submitted.
> 

Better to use stmmac_rx_offset() so to have the correct length when
using XDP. Also, when XDP is enabled, the offset was
XDP_PACKET_HEADROOM (i.e. 256 bytes) even before the change, so it
could be already broken. Mark, can you try on the Jetson TX2 by
attaching an XDP program and see if it works without my change?

A possible fix, which takes in account also the XDP headroom for
stmmac_rx_buf1_len() only could be (only compile tested, I don't have
the hardware now):

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 7b8404a21544..b205f43f849a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -93,7 +93,7 @@ static int tc = TC_DEFAULT;
 module_param(tc, int, 0644);
 MODULE_PARM_DESC(tc, "DMA threshold control value");
 
-#define	DEFAULT_BUFSIZE	1536
+#define	DEFAULT_BUFSIZE	1536 + XDP_PACKET_HEADROOM + NET_IP_ALIGN
 static int buf_sz = DEFAULT_BUFSIZE;
 module_param(buf_sz, int, 0644);
 MODULE_PARM_DESC(buf_sz, "DMA buffer size");
@@ -4508,12 +4508,12 @@ static unsigned int stmmac_rx_buf1_len(struct stmmac_priv *priv,
 
 	/* First descriptor, not last descriptor and not split header */
 	if (status & rx_not_ls)
-		return priv->dma_buf_sz;
+		return priv->dma_buf_sz - stmmac_rx_offset(priv);
 
 	plen = stmmac_get_rx_frame_len(priv, p, coe);
 
 	/* First descriptor and last descriptor and not split header */
-	return min_t(unsigned int, priv->dma_buf_sz, plen);
+	return min_t(unsigned int, priv->dma_buf_sz - stmmac_rx_offset(priv), plen);
 }
 
 static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv,
Marc Zyngier Aug. 12, 2021, 11:05 a.m. UTC | #14
On Thu, 12 Aug 2021 11:18:35 +0100,
Matteo Croce <mcroce@linux.microsoft.com> wrote:
> 
> On Thu, 12 Aug 2021 10:48:03 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > 
> > 
> > On 8/11/21 4:16 PM, Marc Zyngier wrote:
> > > On Wed, 11 Aug 2021 13:53:59 +0100,
> > > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > >
> > >> Are you sure you do not need to adjust stmmac_set_bfsize(), 
> > >> stmmac_rx_buf1_len() and stmmac_rx_buf2_len() ?
> > >>
> > >> Presumably DEFAULT_BUFSIZE also want to be increased by NET_SKB_PAD
> > >>
> > >> Patch for stmmac_rx_buf1_len() :
> > >>
> > >> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > >> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index
> > >> 7b8404a21544cf29668e8a14240c3971e6bce0c3..041a74e7efca3436bfe3e17f972dd156173957a9
> > >> 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++
> > >> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -4508,12
> > >> +4508,12 @@ static unsigned int stmmac_rx_buf1_len(struct
> > >> stmmac_priv *priv, /* First descriptor, not last descriptor and
> > >> not split header */ if (status & rx_not_ls)
> > >> -               return priv->dma_buf_sz;
> > >> +               return priv->dma_buf_sz - NET_SKB_PAD -
> > >> NET_IP_ALIGN; 
> > >>         plen = stmmac_get_rx_frame_len(priv, p, coe);
> > >>  
> > >>         /* First descriptor and last descriptor and not split
> > >> header */
> > >> -       return min_t(unsigned int, priv->dma_buf_sz, plen);
> > >> +       return min_t(unsigned int, priv->dma_buf_sz - NET_SKB_PAD
> > >> - NET_IP_ALIGN, plen); }
> > >>  
> > >>  static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv,
> > > 
> > > Feels like a major deficiency of the original patch. Happy to test a
> > > more complete patch if/when you have one.
> > 
> > I wont have time in the immediate future.
> > 
> > Matteo, if you do not work on a fix, I suggest we revert
> >  a955318fe67ec0d962760b5ee58e74bffaf649b8 stmmac: align RX buffers
> > 
> > before a more polished version can be submitted.
> > 
> 
> Better to use stmmac_rx_offset() so to have the correct length when
> using XDP. Also, when XDP is enabled, the offset was
> XDP_PACKET_HEADROOM (i.e. 256 bytes) even before the change, so it
> could be already broken. Mark, can you try on the Jetson TX2 by
> attaching an XDP program and see if it works without my change?

Sorry, you'll have to hold my hand here, as I know exactly nothing
about XDP....

> A possible fix, which takes in account also the XDP headroom for
> stmmac_rx_buf1_len() only could be (only compile tested, I don't have
> the hardware now):

However, this doesn't fix my issue. I still get all sort of
corruption. Probably stmmac_rx_buf2_len() also need adjusting (it has
a similar logic as its buf1 counterpart...)

Unless you can fix it very quickly, and given that we're towards the
end of the cycle, I'd be more comfortable if we reverted this patch.

Thanks,

	M.
Matteo Croce Aug. 12, 2021, 11:18 a.m. UTC | #15
On Thu, Aug 12, 2021 at 1:05 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 12 Aug 2021 11:18:35 +0100,
> Matteo Croce <mcroce@linux.microsoft.com> wrote:
> >
> > On Thu, 12 Aug 2021 10:48:03 +0200
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > >
> > >
> > > On 8/11/21 4:16 PM, Marc Zyngier wrote:
> > > > On Wed, 11 Aug 2021 13:53:59 +0100,
> > > > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > >
> > > >> Are you sure you do not need to adjust stmmac_set_bfsize(),
> > > >> stmmac_rx_buf1_len() and stmmac_rx_buf2_len() ?
> > > >>
> > > >> Presumably DEFAULT_BUFSIZE also want to be increased by NET_SKB_PAD
> > > >>
> > > >> Patch for stmmac_rx_buf1_len() :
> > > >>
> > > >> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > > >> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index
> > > >> 7b8404a21544cf29668e8a14240c3971e6bce0c3..041a74e7efca3436bfe3e17f972dd156173957a9
> > > >> 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++
> > > >> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -4508,12
> > > >> +4508,12 @@ static unsigned int stmmac_rx_buf1_len(struct
> > > >> stmmac_priv *priv, /* First descriptor, not last descriptor and
> > > >> not split header */ if (status & rx_not_ls)
> > > >> -               return priv->dma_buf_sz;
> > > >> +               return priv->dma_buf_sz - NET_SKB_PAD -
> > > >> NET_IP_ALIGN;
> > > >>         plen = stmmac_get_rx_frame_len(priv, p, coe);
> > > >>
> > > >>         /* First descriptor and last descriptor and not split
> > > >> header */
> > > >> -       return min_t(unsigned int, priv->dma_buf_sz, plen);
> > > >> +       return min_t(unsigned int, priv->dma_buf_sz - NET_SKB_PAD
> > > >> - NET_IP_ALIGN, plen); }
> > > >>
> > > >>  static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv,
> > > >
> > > > Feels like a major deficiency of the original patch. Happy to test a
> > > > more complete patch if/when you have one.
> > >
> > > I wont have time in the immediate future.
> > >
> > > Matteo, if you do not work on a fix, I suggest we revert
> > >  a955318fe67ec0d962760b5ee58e74bffaf649b8 stmmac: align RX buffers
> > >
> > > before a more polished version can be submitted.
> > >
> >
> > Better to use stmmac_rx_offset() so to have the correct length when
> > using XDP. Also, when XDP is enabled, the offset was
> > XDP_PACKET_HEADROOM (i.e. 256 bytes) even before the change, so it
> > could be already broken. Mark, can you try on the Jetson TX2 by
> > attaching an XDP program and see if it works without my change?
>
> Sorry, you'll have to hold my hand here, as I know exactly nothing
> about XDP....
>

Attach the attached object with:

ip link set eth0 xdp object passall.o

This is an empty XDP program, its source:

__attribute__((section("prog"), used))
int xdp_main(struct xdp_md *ctx)
{
       return XDP_PASS;
}

Every packet will pass untouched, but the offset will be shifted from
0 to 256 bytes, which could trigger the problem anyway:

> > A possible fix, which takes in account also the XDP headroom for
> > stmmac_rx_buf1_len() only could be (only compile tested, I don't have
> > the hardware now):
>
> However, this doesn't fix my issue. I still get all sort of
> corruption. Probably stmmac_rx_buf2_len() also need adjusting (it has
> a similar logic as its buf1 counterpart...)
>
> Unless you can fix it very quickly, and given that we're towards the
> end of the cycle, I'd be more comfortable if we reverted this patch.
>

Can it be that the HW can't do DMA on an address which is not word aligned?
What if you replace NET_SKB_PAD with, let's say, 8?

Regards,
Thierry Reding Aug. 12, 2021, 2:29 p.m. UTC | #16
On Wed, Aug 11, 2021 at 02:23:10PM +0100, Marc Zyngier wrote:
> On Wed, 11 Aug 2021 11:41:59 +0100,
> Thierry Reding <thierry.reding@gmail.com> wrote:
> > 
> > On Tue, Aug 10, 2021 at 08:07:47PM +0100, Marc Zyngier wrote:
> > > Hi all,
> > > 
> > > [adding Thierry, Jon and Will to the fun]
> > > 
> > > On Mon, 14 Jun 2021 03:25:04 +0100,
> > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > > 
> > > > From: Matteo Croce <mcroce@microsoft.com>
> > > > 
> > > > On RX an SKB is allocated and the received buffer is copied into it.
> > > > But on some architectures, the memcpy() needs the source and destination
> > > > buffers to have the same alignment to be efficient.
> > > > 
> > > > This is not our case, because SKB data pointer is misaligned by two bytes
> > > > to compensate the ethernet header.
> > > > 
> > > > Align the RX buffer the same way as the SKB one, so the copy is faster.
> > > > An iperf3 RX test gives a decent improvement on a RISC-V machine:
> > > > 
> > > > before:
> > > > [ ID] Interval           Transfer     Bitrate         Retr
> > > > [  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec   88             sender
> > > > [  5]   0.00-10.01  sec   730 MBytes   612 Mbits/sec                  receiver
> > > > 
> > > > after:
> > > > [ ID] Interval           Transfer     Bitrate         Retr
> > > > [  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
> > > > [  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver
> > > > 
> > > > And the memcpy() overhead during the RX drops dramatically.
> > > > 
> > > > before:
> > > > Overhead  Shared O  Symbol
> > > >   43.35%  [kernel]  [k] memcpy
> > > >   33.77%  [kernel]  [k] __asm_copy_to_user
> > > >    3.64%  [kernel]  [k] sifive_l2_flush64_range
> > > > 
> > > > after:
> > > > Overhead  Shared O  Symbol
> > > >   45.40%  [kernel]  [k] __asm_copy_to_user
> > > >   28.09%  [kernel]  [k] memcpy
> > > >    4.27%  [kernel]  [k] sifive_l2_flush64_range
> > > > 
> > > > Signed-off-by: Matteo Croce <mcroce@microsoft.com>
> > > 
> > > This patch completely breaks my Jetson TX2 system, composed of 2
> > > Nvidia Denver and 4 Cortex-A57, in a very "funny" way.
> > > 
> > > Any significant amount of traffic result in all sort of corruption
> > > (ssh connections get dropped, Debian packages downloaded have the
> > > wrong checksums) if any Denver core is involved in any significant way
> > > (packet processing, interrupt handling). And it is all triggered by
> > > this very change.
> > > 
> > > The only way I have to make it work on a Denver core is to route the
> > > interrupt to that particular core and taskset the workload to it. Any
> > > other configuration involving a Denver CPU results in some sort of
> > > corruption. On their own, the A57s are fine.
> > > 
> > > This smells of memory ordering going really wrong, which this change
> > > would expose. I haven't had a chance to dig into the driver yet (it
> > > took me long enough to bisect it), but if someone points me at what is
> > > supposed to synchronise the DMA when receiving an interrupt, I'll have
> > > a look.
> > 
> > One other thing that kind of rings a bell when reading DMA and
> > interrupts is a recent report (and attempt to fix this) where upon
> > resume from system suspend, the DMA descriptors would get corrupted.
> > 
> > I don't think we ever figured out what exactly the problem was, but
> > interestingly the fix for the issue immediately caused things to go
> > haywire on... Jetson TX2.
> 
> I love this machine... Did this issue occur with the Denver CPUs
> disabled?

Interestingly I've been doing some work on a newer device called Jetson
TX2 NX (which is kind of a trimmed-down version of Jetson TX2, in the
spirit of the Jetson Nano) and I can't seem to reproduce these failures
there (tested on next-20210812).

I'll go dig out my Jetson TX2 to run the same tests there, because I've
also been using a development version of the bootloader stack and
flashing tools and all that, so it's possible that something was fixed
at that level. I don't think I've ever tried disabling the Denver CPUs,
but then I've also never seen these issues myself.

Just out of curiosity, what version of the BSP have you been using to
flash?

One other thing that I ran into: there's a known issue with the PHY
configuration. We mark the PHY on most devices as "rgmii-id" on most
devices and then the Marvell PHY driver needs to be enabled. Jetson TX2
has phy-mode = "rgmii", so it /should/ work okay.

Typically what we're seeing with that misconfiguration is that the
device fails to get an IP address, but it might still be worth trying to
switch Jetson TX2 to rgmii-id and using the Marvell PHY, to see if that
improves anything.

> > I recall looking at this a bit and couldn't find where exactly the DMA
> > was being synchronized on suspend/resume, or what the mechanism was to
> > ensure that (in transit) packets were not received after the suspension
> > of the Ethernet device. Some information about this can be found here:
> > 
> > 	https://lore.kernel.org/netdev/708edb92-a5df-ecc4-3126-5ab36707e275@nvidia.com/
> > 
> > It's interesting that this happens only on Jetson TX2. Apparently on the
> > newer Jetson AGX Xavier this problem does not occur. I think Jon also
> > narrowed this down to being related to the IOMMU being enabled on Jetson
> > TX2, whereas Jetson AGX Xavier didn't have it enabled. I wasn't able to
> > find any notes on whether disabling the IOMMU on Jetson TX2 did anything
> > to improve on this, so perhaps that's something worth trying.
> 
> Actually, I was running with the SMMU disabled, as I use the upstream
> u-boot provided DT. Switching to the kernel one didn't change a thing
> (with passthough or not).
> 
> > We have since enabled the IOMMU on Jetson AGX Xavier, and I haven't seen
> > any test reports indicating that this is causing issues. So I don't
> > think this has anything directly to do with the IOMMU support.
> 
> No, it looks more like either ordering or cache management. The fact
> that this patch messes with the buffer alignment makes me favour the
> latter...
> 
> > That said, if these problems are all exclusive to Jetson TX2, or rather
> > Tegra186, that could indicate that we're missing something at a more
> > fundamental level (maybe some cache maintenance quirk?).
> 
> That'd be pretty annoying. Do you know if the Ethernet is a coherent
> device on this machine? or does it need active cache maintenance?

I don't think Ethernet is a coherent device on Tegra186. I think
Tegra194 had various improvements with regard to coherency, but most
devices on Tegra186 do need active cache maintenance.

Let me dig through some old patches and mailing list threads. I vaguely
recall prototyping a patch that did something special for outer cache
flushing, but that may have been Tegra132, not Tegra186. I also don't
think we ended up merging that because it turned out to not be needed.

Thierry
Marc Zyngier Aug. 12, 2021, 3:26 p.m. UTC | #17
On Thu, 12 Aug 2021 15:29:06 +0100,
Thierry Reding <thierry.reding@gmail.com> wrote:
> 
> On Wed, Aug 11, 2021 at 02:23:10PM +0100, Marc Zyngier wrote:

[...]

> > I love this machine... Did this issue occur with the Denver CPUs
> > disabled?
> 
> Interestingly I've been doing some work on a newer device called Jetson
> TX2 NX (which is kind of a trimmed-down version of Jetson TX2, in the
> spirit of the Jetson Nano) and I can't seem to reproduce these failures
> there (tested on next-20210812).
> 
> I'll go dig out my Jetson TX2 to run the same tests there, because I've
> also been using a development version of the bootloader stack and
> flashing tools and all that, so it's possible that something was fixed
> at that level. I don't think I've ever tried disabling the Denver CPUs,
> but then I've also never seen these issues myself.
> 
> Just out of curiosity, what version of the BSP have you been using to
> flash?

I've only used the BSP for a few weeks when I got the board last
year. The only thing I use from it is u-boot to chainload an upstream
u-boot, and boot Debian from there.

> One other thing that I ran into: there's a known issue with the PHY
> configuration. We mark the PHY on most devices as "rgmii-id" on most
> devices and then the Marvell PHY driver needs to be enabled. Jetson TX2
> has phy-mode = "rgmii", so it /should/ work okay.
> 
> Typically what we're seeing with that misconfiguration is that the
> device fails to get an IP address, but it might still be worth trying to
> switch Jetson TX2 to rgmii-id and using the Marvell PHY, to see if that
> improves anything.

I never failed to get an IP address. Overall, networking has been
solid on this machine until this patch. I'll try and mess with this
when I get time, but that's probably going to be next week now.

[...]

> > That'd be pretty annoying. Do you know if the Ethernet is a coherent
> > device on this machine? or does it need active cache maintenance?
> 
> I don't think Ethernet is a coherent device on Tegra186. I think
> Tegra194 had various improvements with regard to coherency, but most
> devices on Tegra186 do need active cache maintenance.
> 
> Let me dig through some old patches and mailing list threads. I vaguely
> recall prototyping a patch that did something special for outer cache
> flushing, but that may have been Tegra132, not Tegra186. I also don't
> think we ended up merging that because it turned out to not be needed.

ARMv8 forbid any sort of *visible* outer cache, so I really hope this
is not required. We wouldn't be able to support it.

Thanks,

	M.
Thierry Reding Aug. 13, 2021, 2:44 p.m. UTC | #18
On Thu, Aug 12, 2021 at 04:26:41PM +0100, Marc Zyngier wrote:
> On Thu, 12 Aug 2021 15:29:06 +0100,
> Thierry Reding <thierry.reding@gmail.com> wrote:
> > 
> > On Wed, Aug 11, 2021 at 02:23:10PM +0100, Marc Zyngier wrote:
> 
> [...]
> 
> > > I love this machine... Did this issue occur with the Denver CPUs
> > > disabled?
> > 
> > Interestingly I've been doing some work on a newer device called Jetson
> > TX2 NX (which is kind of a trimmed-down version of Jetson TX2, in the
> > spirit of the Jetson Nano) and I can't seem to reproduce these failures
> > there (tested on next-20210812).
> > 
> > I'll go dig out my Jetson TX2 to run the same tests there, because I've
> > also been using a development version of the bootloader stack and
> > flashing tools and all that, so it's possible that something was fixed
> > at that level. I don't think I've ever tried disabling the Denver CPUs,
> > but then I've also never seen these issues myself.
> > 
> > Just out of curiosity, what version of the BSP have you been using to
> > flash?
> 
> I've only used the BSP for a few weeks when I got the board last
> year. The only thing I use from it is u-boot to chainload an upstream
> u-boot, and boot Debian from there.

That's interesting... have you ever tried to inject a version of
upstream U-Boot into the BSP and have it flash that instead? That should
allow you to drop the chainloading step.

Not that that's likely to have anything to do with this.

> > One other thing that I ran into: there's a known issue with the PHY
> > configuration. We mark the PHY on most devices as "rgmii-id" on most
> > devices and then the Marvell PHY driver needs to be enabled. Jetson TX2
> > has phy-mode = "rgmii", so it /should/ work okay.
> > 
> > Typically what we're seeing with that misconfiguration is that the
> > device fails to get an IP address, but it might still be worth trying to
> > switch Jetson TX2 to rgmii-id and using the Marvell PHY, to see if that
> > improves anything.
> 
> I never failed to get an IP address. Overall, networking has been
> solid on this machine until this patch. I'll try and mess with this
> when I get time, but that's probably going to be next week now.

So I've hooked up my Jetson TX2 and tried various workloads. I wasn't
able to reproduce this on next-20210813. I've tried both the L4T 32.6.1
release and a local development build.

Perhaps one thing to try would be to upgrade your L4T BSP to something
newer. I know that there have occasionally been bugs in the MTS
firmware, which is what's running on the Denver cores, and newer BSPs
can fix those kinds of issues.

If that doesn't help, perhaps try to read out the SoC version numbers so
that we can compare. I know that some newer Tegra186 chips behave
slightly differently, so that's perhaps a difference that would explain
why it's not happening on all devices.

You can read the version and revision from sysfs using something like:

	# cat /sys/devices/soc0/{major,minor,revision}

> [...]
> 
> > > That'd be pretty annoying. Do you know if the Ethernet is a coherent
> > > device on this machine? or does it need active cache maintenance?
> > 
> > I don't think Ethernet is a coherent device on Tegra186. I think
> > Tegra194 had various improvements with regard to coherency, but most
> > devices on Tegra186 do need active cache maintenance.
> > 
> > Let me dig through some old patches and mailing list threads. I vaguely
> > recall prototyping a patch that did something special for outer cache
> > flushing, but that may have been Tegra132, not Tegra186. I also don't
> > think we ended up merging that because it turned out to not be needed.
> 
> ARMv8 forbid any sort of *visible* outer cache, so I really hope this
> is not required. We wouldn't be able to support it.

I couldn't find any trace of this anywhere. So I'm possibly
misremembering. It's also more likely that this was on an earlier SoC
generation, otherwise I'd probably remember more clearly.

Thierry
Jakub Kicinski Aug. 16, 2021, 3:12 p.m. UTC | #19
On Thu, 12 Aug 2021 12:05:38 +0100 Marc Zyngier wrote:
> > A possible fix, which takes in account also the XDP headroom for
> > stmmac_rx_buf1_len() only could be (only compile tested, I don't have
> > the hardware now):  
> 
> However, this doesn't fix my issue. I still get all sort of
> corruption. Probably stmmac_rx_buf2_len() also need adjusting (it has
> a similar logic as its buf1 counterpart...)
> 
> Unless you can fix it very quickly, and given that we're towards the
> end of the cycle, I'd be more comfortable if we reverted this patch.

Any luck investigating this one? The rc6 announcement sounds like there
may not be that many more rc releases for 5.14.
Matteo Croce Aug. 17, 2021, 12:01 a.m. UTC | #20
On Mon, 16 Aug 2021 08:12:08 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> On Thu, 12 Aug 2021 12:05:38 +0100 Marc Zyngier wrote:
> > > A possible fix, which takes in account also the XDP headroom for
> > > stmmac_rx_buf1_len() only could be (only compile tested, I don't
> > > have the hardware now):  
> > 
> > However, this doesn't fix my issue. I still get all sort of
> > corruption. Probably stmmac_rx_buf2_len() also need adjusting (it
> > has a similar logic as its buf1 counterpart...)
> > 
> > Unless you can fix it very quickly, and given that we're towards the
> > end of the cycle, I'd be more comfortable if we reverted this patch.
> 
> Any luck investigating this one? The rc6 announcement sounds like
> there may not be that many more rc releases for 5.14.

Hi Jackub.

Unfortunately I have only a device with stmmac, and it works fine with
the patch. It seems that not all hardware suffers from this issue.

Also, using NET_IP_ALIGN on RX is a common pattern, I think that any
ethernet device is doing the same to align the IPv4 header.

Anyway, I asked for two tests on the affected device:
1. Change NET_IP_ALIGN with 8, to see if the DMA has problems in
   receiving to a non word aligned address
2. load a nop XDP program (I provided one), to see if the problem is
   already there when XDP is used

I doubt that changing also stmmac_rx_buf2_len would help,
but it's worth a try, here is a patch:

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 7b8404a21544..73d1f0ec66ff 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -93,7 +93,7 @@ static int tc = TC_DEFAULT;
 module_param(tc, int, 0644);
 MODULE_PARM_DESC(tc, "DMA threshold control value");
 
-#define	DEFAULT_BUFSIZE	1536
+#define	DEFAULT_BUFSIZE	1536 + XDP_PACKET_HEADROOM + NET_IP_ALIGN
 static int buf_sz = DEFAULT_BUFSIZE;
 module_param(buf_sz, int, 0644);
 MODULE_PARM_DESC(buf_sz, "DMA buffer size");
@@ -4508,12 +4508,12 @@ static unsigned int stmmac_rx_buf1_len(struct stmmac_priv *priv,
 
 	/* First descriptor, not last descriptor and not split header */
 	if (status & rx_not_ls)
-		return priv->dma_buf_sz;
+		return priv->dma_buf_sz - stmmac_rx_offset(priv);
 
 	plen = stmmac_get_rx_frame_len(priv, p, coe);
 
 	/* First descriptor and last descriptor and not split header */
-	return min_t(unsigned int, priv->dma_buf_sz, plen);
+	return min_t(unsigned int, priv->dma_buf_sz - stmmac_rx_offset(priv), plen);
 }
 
 static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv,
@@ -4529,12 +4529,12 @@ static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv,
 
 	/* Not last descriptor */
 	if (status & rx_not_ls)
-		return priv->dma_buf_sz;
+		return priv->dma_buf_sz - stmmac_rx_offset(priv);
 
 	plen = stmmac_get_rx_frame_len(priv, p, coe);
 
 	/* Last descriptor */
-	return plen - len;
+	return plen - len - stmmac_rx_offset(priv);
 }
 
 static int stmmac_xdp_xmit_xdpf(struct stmmac_priv *priv, int queue,


Regards,
Marc Zyngier Aug. 19, 2021, 3:26 p.m. UTC | #21
On Tue, 17 Aug 2021 01:01:57 +0100,
Matteo Croce <mcroce@linux.microsoft.com> wrote:
> 
> On Mon, 16 Aug 2021 08:12:08 -0700
> Jakub Kicinski <kuba@kernel.org> wrote:
> 
> > On Thu, 12 Aug 2021 12:05:38 +0100 Marc Zyngier wrote:
> > > > A possible fix, which takes in account also the XDP headroom for
> > > > stmmac_rx_buf1_len() only could be (only compile tested, I don't
> > > > have the hardware now):  
> > > 
> > > However, this doesn't fix my issue. I still get all sort of
> > > corruption. Probably stmmac_rx_buf2_len() also need adjusting (it
> > > has a similar logic as its buf1 counterpart...)
> > > 
> > > Unless you can fix it very quickly, and given that we're towards the
> > > end of the cycle, I'd be more comfortable if we reverted this patch.
> > 
> > Any luck investigating this one? The rc6 announcement sounds like
> > there may not be that many more rc releases for 5.14.
> 
> Hi Jackub.
> 
> Unfortunately I have only a device with stmmac, and it works fine with
> the patch. It seems that not all hardware suffers from this issue.
> 
> Also, using NET_IP_ALIGN on RX is a common pattern, I think that any
> ethernet device is doing the same to align the IPv4 header.
> 
> Anyway, I asked for two tests on the affected device:
> 1. Change NET_IP_ALIGN with 8, to see if the DMA has problems in
>    receiving to a non word aligned address
> 2. load a nop XDP program (I provided one), to see if the problem is
>    already there when XDP is used

Catching up on email, as I was away earlier this week. I'm yet to test
these two things (hopefully by the end of the day), but I just gave
the patch below a go, just in case...

> 
> I doubt that changing also stmmac_rx_buf2_len would help,
> but it's worth a try, here is a patch:
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 7b8404a21544..73d1f0ec66ff 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -93,7 +93,7 @@ static int tc = TC_DEFAULT;
>  module_param(tc, int, 0644);
>  MODULE_PARM_DESC(tc, "DMA threshold control value");
>  
> -#define	DEFAULT_BUFSIZE	1536
> +#define	DEFAULT_BUFSIZE	1536 + XDP_PACKET_HEADROOM + NET_IP_ALIGN
>  static int buf_sz = DEFAULT_BUFSIZE;
>  module_param(buf_sz, int, 0644);
>  MODULE_PARM_DESC(buf_sz, "DMA buffer size");
> @@ -4508,12 +4508,12 @@ static unsigned int stmmac_rx_buf1_len(struct stmmac_priv *priv,
>  
>  	/* First descriptor, not last descriptor and not split header */
>  	if (status & rx_not_ls)
> -		return priv->dma_buf_sz;
> +		return priv->dma_buf_sz - stmmac_rx_offset(priv);
>  
>  	plen = stmmac_get_rx_frame_len(priv, p, coe);
>  
>  	/* First descriptor and last descriptor and not split header */
> -	return min_t(unsigned int, priv->dma_buf_sz, plen);
> +	return min_t(unsigned int, priv->dma_buf_sz - stmmac_rx_offset(priv), plen);
>  }
>  
>  static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv,
> @@ -4529,12 +4529,12 @@ static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv,
>  
>  	/* Not last descriptor */
>  	if (status & rx_not_ls)
> -		return priv->dma_buf_sz;
> +		return priv->dma_buf_sz - stmmac_rx_offset(priv);
>  
>  	plen = stmmac_get_rx_frame_len(priv, p, coe);
>  
>  	/* Last descriptor */
> -	return plen - len;
> +	return plen - len - stmmac_rx_offset(priv);
>  }
>  
>  static int stmmac_xdp_xmit_xdpf(struct stmmac_priv *priv, int queue,

This patch results in a cosmic explosion, as it tries to invalidate a
range of VAs that cannot possibly be allocated to the DMA. Not really
what was expected.

Thanks,

	M.

[   19.587499] Unable to handle kernel write to read-only memory at virtual address ffff000088f89000
[   19.596375] Mem abort info:
[   19.599165]   ESR = 0x9600014f
[   19.602215]   EC = 0x25: DABT (current EL), IL = 32 bits
[   19.607519]   SET = 0, FnV = 0
[   19.610568]   EA = 0, S1PTW = 0
[   19.613703]   FSC = 0x0f: level 3 permission fault
[   19.618487] Data abort info:
[   19.621362]   ISV = 0, ISS = 0x0000014f
[   19.625192]   CM = 1, WnR = 1
[   19.628154] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000fa7b2000
[   19.634844] [ffff000088f89000] pgd=18000002771f7003, p4d=18000002771f7003, pud=1800000275ffd003, pmd=1800000275fb5003, pte=0060000108f89f87
[   19.647358] Internal error: Oops: 9600014f [#1] PREEMPT SMP
[   19.652920] Modules linked in: nls_ascii(E) nls_cp437(E) vfat(E) fat(E) tegra_drm(E) cec(E) drm_kms_helper(E) evdev(E) snd_hda_codec_hdmi(E) snd_hda_tegra(E) snd_hda_codec(E) aes_ce_blk(E) crypto_simd(E) snd_hda_core(E) cryptd(E) snd_hwdep(E) at24(E) aes_ce_cipher(E) snd_pcm(E) ghash_ce(E) gf128mul(E) snd_timer(E) sha2_ce(E) sha256_arm64(E) sha1_ce(E) snd(E) soundcore(E) tegra_bpmp_thermal(E) efi_pstore(E) tegra_xudc(E) udc_core(E) host1x(E) ina3221(E) fuse(E) drm(E) configfs(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) nvme(E) nvme_core(E) t10_pi(E) broadcom(E) bcm_phy_lib(E) ahci_tegra(E) libahci_platform(E) gpio_keys(E) libahci(E) sdhci_tegra(E) dwmac_dwc_qos_eth(E) stmmac_platform(E) libata(E) stmmac(E) sdhci_pltfm(E) pcs_xpcs(E) max77620_regulator(E) xhci_tegra(E) phylink(E) of_mdio(E) cqhci(E) scsi_mod(E) fixed_phy(E) sdhci(E) fwnode_mdio(E) libphy(E)
[   19.729311] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G S          E     5.14.0-rc6-dirty #4155
[   19.737734] Hardware name:  , BIOS 2021.04-rc2-00042-g2dddc1bb29 02/18/2021
[   19.744679] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[   19.750673] pc : dcache_inval_poc+0x40/0x58
[   19.754852] lr : arch_sync_dma_for_cpu+0x2c/0x40
[   19.759460] sp : ffff800010003be0
[   19.762765] x29: ffff800010003be0 x28: ffff000085338000 x27: 0000000000000028
[   19.769889] x26: ffff000080a8fc00 x25: ffff000085338028 x24: 0000000000000000
[   19.777012] x23: 0000000000000002 x22: 0000000000000002 x21: 00000000ffffffbc
[   19.784135] x20: 0000000108f43000 x19: ffff000080223010 x18: 0000000000000000
[   19.791258] x17: ffff8001e3815000 x16: ffff800010004000 x15: 0000000000004000
[   19.798380] x14: 0000000000000000 x13: 0000000080000000 x12: 0000000000000001
[   19.805504] x11: 0000000000000004 x10: 0000000000000008 x9 : ffff8000109b9a2c
[   19.812626] x8 : 0000000000000000 x7 : 0000000000000001 x6 : ffff000088246500
[   19.819748] x5 : fffffffffffff000 x4 : 0003000108f40000 x3 : 000000000000003f
[   19.826869] x2 : 0000000000000040 x1 : ffff000188f42f80 x0 : ffff000088f89000
[   19.833994] Call trace:
[   19.836434]  dcache_inval_poc+0x40/0x58
[   19.840262]  iommu_dma_sync_single_for_cpu+0xdc/0xe0
[   19.845219]  dma_sync_single_for_cpu+0x3c/0x124
[   19.849742]  stmmac_rx+0x448/0x930 [stmmac]
[   19.853937]  stmmac_napi_poll_rx+0x4c/0xec [stmmac]
[   19.858818]  __napi_poll+0x40/0x210
[   19.862300]  net_rx_action+0x304/0x370
[   19.866041]  __do_softirq+0x130/0x3d8
[   19.869695]  __irq_exit_rcu+0x100/0x110
[   19.873525]  irq_exit+0x1c/0x30
[   19.876658]  handle_domain_irq+0x70/0x9c
[   19.880573]  gic_handle_irq+0x58/0xe0
[   19.884225]  call_on_irq_stack+0x2c/0x54
[   19.888138]  do_interrupt_handler+0x5c/0x70
[   19.892313]  el1_interrupt+0x30/0x70
[   19.895881]  el1h_64_irq_handler+0x18/0x24
[   19.899970]  el1h_64_irq+0x78/0x7c
[   19.903361]  arch_cpu_idle+0x18/0x3c
[   19.906930]  default_idle_call+0x4c/0x1d4
[   19.910931]  cpuidle_idle_call+0x168/0x1e0
[   19.915019]  do_idle+0xb8/0x110
[   19.918152]  cpu_startup_entry+0x30/0x8c
[   19.922065]  rest_init+0xf0/0x100
[   19.925372]  arch_call_rest_init+0x1c/0x28
[   19.929463]  start_kernel+0x4a0/0x4d8
[   19.933114]  __primary_switched+0xc0/0xc8
[   19.937118] Code: 8a230000 54000060 d50b7e20 14000002 (d5087620) 
[   19.943204] ---[ end trace f26fd0d14eea9a7d ]---
[   19.947811] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[   19.954670] SMP: stopping secondary CPUs
[   19.958592] Kernel Offset: 0x1a0000 from 0xffff800010000000
[   19.964151] PHYS_OFFSET: 0x80000000
[   19.967630] CPU features: 0x10000231,00000846
[   19.971976] Memory Limit: none
[   19.975026] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
Marc Zyngier Aug. 19, 2021, 4:29 p.m. UTC | #22
On Thu, 12 Aug 2021 12:18:48 +0100,
Matteo Croce <mcroce@linux.microsoft.com> wrote:
> 
> [1  <text/plain; UTF-8 (7bit)>]
> On Thu, Aug 12, 2021 at 1:05 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Thu, 12 Aug 2021 11:18:35 +0100,
> > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > >
> > > On Thu, 12 Aug 2021 10:48:03 +0200
> > > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > >
> > > >
> > > >
> > > > On 8/11/21 4:16 PM, Marc Zyngier wrote:
> > > > > On Wed, 11 Aug 2021 13:53:59 +0100,
> > > > > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > > >
> > > > >> Are you sure you do not need to adjust stmmac_set_bfsize(),
> > > > >> stmmac_rx_buf1_len() and stmmac_rx_buf2_len() ?
> > > > >>
> > > > >> Presumably DEFAULT_BUFSIZE also want to be increased by NET_SKB_PAD
> > > > >>
> > > > >> Patch for stmmac_rx_buf1_len() :
> > > > >>
> > > > >> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > > > >> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index
> > > > >> 7b8404a21544cf29668e8a14240c3971e6bce0c3..041a74e7efca3436bfe3e17f972dd156173957a9
> > > > >> 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++
> > > > >> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -4508,12
> > > > >> +4508,12 @@ static unsigned int stmmac_rx_buf1_len(struct
> > > > >> stmmac_priv *priv, /* First descriptor, not last descriptor and
> > > > >> not split header */ if (status & rx_not_ls)
> > > > >> -               return priv->dma_buf_sz;
> > > > >> +               return priv->dma_buf_sz - NET_SKB_PAD -
> > > > >> NET_IP_ALIGN;
> > > > >>         plen = stmmac_get_rx_frame_len(priv, p, coe);
> > > > >>
> > > > >>         /* First descriptor and last descriptor and not split
> > > > >> header */
> > > > >> -       return min_t(unsigned int, priv->dma_buf_sz, plen);
> > > > >> +       return min_t(unsigned int, priv->dma_buf_sz - NET_SKB_PAD
> > > > >> - NET_IP_ALIGN, plen); }
> > > > >>
> > > > >>  static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv,
> > > > >
> > > > > Feels like a major deficiency of the original patch. Happy to test a
> > > > > more complete patch if/when you have one.
> > > >
> > > > I wont have time in the immediate future.
> > > >
> > > > Matteo, if you do not work on a fix, I suggest we revert
> > > >  a955318fe67ec0d962760b5ee58e74bffaf649b8 stmmac: align RX buffers
> > > >
> > > > before a more polished version can be submitted.
> > > >
> > >
> > > Better to use stmmac_rx_offset() so to have the correct length when
> > > using XDP. Also, when XDP is enabled, the offset was
> > > XDP_PACKET_HEADROOM (i.e. 256 bytes) even before the change, so it
> > > could be already broken. Mark, can you try on the Jetson TX2 by
> > > attaching an XDP program and see if it works without my change?
> >
> > Sorry, you'll have to hold my hand here, as I know exactly nothing
> > about XDP....
> >
> 
> Attach the attached object with:
> 
> ip link set eth0 xdp object passall.o
> 
> This is an empty XDP program, its source:
> 
> __attribute__((section("prog"), used))
> int xdp_main(struct xdp_md *ctx)
> {
>        return XDP_PASS;
> }
> 
> Every packet will pass untouched, but the offset will be shifted from
> 0 to 256 bytes, which could trigger the problem anyway:

Nope. On 5.13, which doesn't have the issue, adding this payload
doesn't result in any problem and the whole thing is rock solid.

> 
> > > A possible fix, which takes in account also the XDP headroom for
> > > stmmac_rx_buf1_len() only could be (only compile tested, I don't have
> > > the hardware now):
> >
> > However, this doesn't fix my issue. I still get all sort of
> > corruption. Probably stmmac_rx_buf2_len() also need adjusting (it has
> > a similar logic as its buf1 counterpart...)
> >
> > Unless you can fix it very quickly, and given that we're towards the
> > end of the cycle, I'd be more comfortable if we reverted this patch.
> >
> 
> Can it be that the HW can't do DMA on an address which is not word aligned?
> What if you replace NET_SKB_PAD with, let's say, 8?

With this:

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index fcdb1d20389b..244aa6579ef4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -341,7 +341,7 @@ static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
 	if (stmmac_xdp_is_enabled(priv))
 		return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
 
-	return NET_SKB_PAD + NET_IP_ALIGN;
+	return 8 + NET_IP_ALIGN;
 }
 
 void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);

I don't see the system corrupting packets anymore. Is that exactly
what you had in mind? This really seems to point to a basic buffer
overflow.

Thanks,

	M.
Matteo Croce Aug. 20, 2021, 10:37 a.m. UTC | #23
On Thu, Aug 19, 2021 at 6:29 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 12 Aug 2021 12:18:48 +0100,
> Matteo Croce <mcroce@linux.microsoft.com> wrote:
> >
> > [1  <text/plain; UTF-8 (7bit)>]
> > On Thu, Aug 12, 2021 at 1:05 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Thu, 12 Aug 2021 11:18:35 +0100,
> > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > >
> > > > On Thu, 12 Aug 2021 10:48:03 +0200
> > > > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > >
> > > > >
> > > > >
> > > > > On 8/11/21 4:16 PM, Marc Zyngier wrote:
> > > > > > On Wed, 11 Aug 2021 13:53:59 +0100,
> > > > > > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > > > >
> > > > > >> Are you sure you do not need to adjust stmmac_set_bfsize(),
> > > > > >> stmmac_rx_buf1_len() and stmmac_rx_buf2_len() ?
> > > > > >>
> > > > > >> Presumably DEFAULT_BUFSIZE also want to be increased by NET_SKB_PAD
> > > > > >>
> > > > > >> Patch for stmmac_rx_buf1_len() :
> > > > > >>
> > > > > >> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > > > > >> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index
> > > > > >> 7b8404a21544cf29668e8a14240c3971e6bce0c3..041a74e7efca3436bfe3e17f972dd156173957a9
> > > > > >> 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++
> > > > > >> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -4508,12
> > > > > >> +4508,12 @@ static unsigned int stmmac_rx_buf1_len(struct
> > > > > >> stmmac_priv *priv, /* First descriptor, not last descriptor and
> > > > > >> not split header */ if (status & rx_not_ls)
> > > > > >> -               return priv->dma_buf_sz;
> > > > > >> +               return priv->dma_buf_sz - NET_SKB_PAD -
> > > > > >> NET_IP_ALIGN;
> > > > > >>         plen = stmmac_get_rx_frame_len(priv, p, coe);
> > > > > >>
> > > > > >>         /* First descriptor and last descriptor and not split
> > > > > >> header */
> > > > > >> -       return min_t(unsigned int, priv->dma_buf_sz, plen);
> > > > > >> +       return min_t(unsigned int, priv->dma_buf_sz - NET_SKB_PAD
> > > > > >> - NET_IP_ALIGN, plen); }
> > > > > >>
> > > > > >>  static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv,
> > > > > >
> > > > > > Feels like a major deficiency of the original patch. Happy to test a
> > > > > > more complete patch if/when you have one.
> > > > >
> > > > > I wont have time in the immediate future.
> > > > >
> > > > > Matteo, if you do not work on a fix, I suggest we revert
> > > > >  a955318fe67ec0d962760b5ee58e74bffaf649b8 stmmac: align RX buffers
> > > > >
> > > > > before a more polished version can be submitted.
> > > > >
> > > >
> > > > Better to use stmmac_rx_offset() so to have the correct length when
> > > > using XDP. Also, when XDP is enabled, the offset was
> > > > XDP_PACKET_HEADROOM (i.e. 256 bytes) even before the change, so it
> > > > could be already broken. Mark, can you try on the Jetson TX2 by
> > > > attaching an XDP program and see if it works without my change?
> > >
> > > Sorry, you'll have to hold my hand here, as I know exactly nothing
> > > about XDP....
> > >
> >
> > Attach the attached object with:
> >
> > ip link set eth0 xdp object passall.o
> >
> > This is an empty XDP program, its source:
> >
> > __attribute__((section("prog"), used))
> > int xdp_main(struct xdp_md *ctx)
> > {
> >        return XDP_PASS;
> > }
> >
> > Every packet will pass untouched, but the offset will be shifted from
> > 0 to 256 bytes, which could trigger the problem anyway:
>
> Nope. On 5.13, which doesn't have the issue, adding this payload
> doesn't result in any problem and the whole thing is rock solid.
>
> >
> > > > A possible fix, which takes in account also the XDP headroom for
> > > > stmmac_rx_buf1_len() only could be (only compile tested, I don't have
> > > > the hardware now):
> > >
> > > However, this doesn't fix my issue. I still get all sort of
> > > corruption. Probably stmmac_rx_buf2_len() also need adjusting (it has
> > > a similar logic as its buf1 counterpart...)
> > >
> > > Unless you can fix it very quickly, and given that we're towards the
> > > end of the cycle, I'd be more comfortable if we reverted this patch.
> > >
> >
> > Can it be that the HW can't do DMA on an address which is not word aligned?
> > What if you replace NET_SKB_PAD with, let's say, 8?
>
> With this:
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> index fcdb1d20389b..244aa6579ef4 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> @@ -341,7 +341,7 @@ static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
>         if (stmmac_xdp_is_enabled(priv))
>                 return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
>
> -       return NET_SKB_PAD + NET_IP_ALIGN;
> +       return 8 + NET_IP_ALIGN;
>  }
>
>  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
>
> I don't see the system corrupting packets anymore. Is that exactly
> what you had in mind? This really seems to point to a basic buffer
> overflow.
>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

Sorry, I meant something like:

-       return NET_SKB_PAD + NET_IP_ALIGN;
+       return 8;

I had some hardware which DMA fails if the receive buffer was not word
aligned, but this seems not the case, as 8 + NET_IP_ALIGN = 10, and
it's not aligned too.
Marc Zyngier Aug. 20, 2021, 4:26 p.m. UTC | #24
On Fri, 20 Aug 2021 11:37:03 +0100,
Matteo Croce <mcroce@linux.microsoft.com> wrote:
> 
> On Thu, Aug 19, 2021 at 6:29 PM Marc Zyngier <maz@kernel.org> wrote:

[...]

> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > index fcdb1d20389b..244aa6579ef4 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > @@ -341,7 +341,7 @@ static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
> >         if (stmmac_xdp_is_enabled(priv))
> >                 return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
> >
> > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > +       return 8 + NET_IP_ALIGN;
> >  }
> >
> >  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
> >
> > I don't see the system corrupting packets anymore. Is that exactly
> > what you had in mind? This really seems to point to a basic buffer
> > overflow.

[...]

> Sorry, I meant something like:
> 
> -       return NET_SKB_PAD + NET_IP_ALIGN;
> +       return 8;
> 
> I had some hardware which DMA fails if the receive buffer was not word
> aligned, but this seems not the case, as 8 + NET_IP_ALIGN = 10, and
> it's not aligned too.

No error in that case either, as expected. Given that NET_SKB_PAD is
likely to expand to 64, it is likely a DMA buffer overflow which
probably only triggers for large-ish packets.

Now, we're almost at -rc7, and we don't have a solution in sight.

Can we please revert this until we have an understanding of what is
happening? I'll hopefully have more cycles to work on the issue once
5.14 is out, and hopefully the maintainers of this driver can chime in
(they have been pretty quiet so far).

Thanks,

	M.
Matteo Croce Aug. 20, 2021, 4:38 p.m. UTC | #25
On Fri, Aug 20, 2021 at 6:26 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 20 Aug 2021 11:37:03 +0100,
> Matteo Croce <mcroce@linux.microsoft.com> wrote:
> >
> > On Thu, Aug 19, 2021 at 6:29 PM Marc Zyngier <maz@kernel.org> wrote:
>
> [...]
>
> > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > index fcdb1d20389b..244aa6579ef4 100644
> > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > @@ -341,7 +341,7 @@ static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
> > >         if (stmmac_xdp_is_enabled(priv))
> > >                 return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
> > >
> > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > +       return 8 + NET_IP_ALIGN;
> > >  }
> > >
> > >  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
> > >
> > > I don't see the system corrupting packets anymore. Is that exactly
> > > what you had in mind? This really seems to point to a basic buffer
> > > overflow.
>
> [...]
>
> > Sorry, I meant something like:
> >
> > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > +       return 8;
> >
> > I had some hardware which DMA fails if the receive buffer was not word
> > aligned, but this seems not the case, as 8 + NET_IP_ALIGN = 10, and
> > it's not aligned too.
>
> No error in that case either, as expected. Given that NET_SKB_PAD is
> likely to expand to 64, it is likely a DMA buffer overflow which
> probably only triggers for large-ish packets.
>
> Now, we're almost at -rc7, and we don't have a solution in sight.
>
> Can we please revert this until we have an understanding of what is
> happening? I'll hopefully have more cycles to work on the issue once
> 5.14 is out, and hopefully the maintainers of this driver can chime in
> (they have been pretty quiet so far).
>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

Last try, what about adding only NET_IP_ALIGN and leaving NET_SKB_PAD?

-       return NET_SKB_PAD + NET_IP_ALIGN;
+       return NET_IP_ALIGN;

I think that alloc_skb adds another NET_SKB_PAD anyway.
Marc Zyngier Aug. 20, 2021, 5:09 p.m. UTC | #26
On Fri, 20 Aug 2021 17:38:14 +0100,
Matteo Croce <mcroce@linux.microsoft.com> wrote:
> 
> On Fri, Aug 20, 2021 at 6:26 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Fri, 20 Aug 2021 11:37:03 +0100,
> > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > >
> > > On Thu, Aug 19, 2021 at 6:29 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > [...]
> >
> > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > index fcdb1d20389b..244aa6579ef4 100644
> > > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > @@ -341,7 +341,7 @@ static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
> > > >         if (stmmac_xdp_is_enabled(priv))
> > > >                 return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
> > > >
> > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > +       return 8 + NET_IP_ALIGN;
> > > >  }
> > > >
> > > >  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
> > > >
> > > > I don't see the system corrupting packets anymore. Is that exactly
> > > > what you had in mind? This really seems to point to a basic buffer
> > > > overflow.
> >
> > [...]
> >
> > > Sorry, I meant something like:
> > >
> > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > +       return 8;
> > >
> > > I had some hardware which DMA fails if the receive buffer was not word
> > > aligned, but this seems not the case, as 8 + NET_IP_ALIGN = 10, and
> > > it's not aligned too.
> >
> > No error in that case either, as expected. Given that NET_SKB_PAD is
> > likely to expand to 64, it is likely a DMA buffer overflow which
> > probably only triggers for large-ish packets.
> >
> > Now, we're almost at -rc7, and we don't have a solution in sight.
> >
> > Can we please revert this until we have an understanding of what is
> > happening? I'll hopefully have more cycles to work on the issue once
> > 5.14 is out, and hopefully the maintainers of this driver can chime in
> > (they have been pretty quiet so far).
> >
> > Thanks,
> >
> >         M.
> >
> > --
> > Without deviation from the norm, progress is not possible.
> 
> Last try, what about adding only NET_IP_ALIGN and leaving NET_SKB_PAD?
> 
> -       return NET_SKB_PAD + NET_IP_ALIGN;
> +       return NET_IP_ALIGN;
> 
> I think that alloc_skb adds another NET_SKB_PAD anyway.

I don't see any packet corruption with this. However, this doesn't
prove that this is correct either. What was the rational for adding
NET_SKB_PAD the first place?

	M.
Matteo Croce Aug. 20, 2021, 5:14 p.m. UTC | #27
On Fri, Aug 20, 2021 at 7:09 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 20 Aug 2021 17:38:14 +0100,
> Matteo Croce <mcroce@linux.microsoft.com> wrote:
> >
> > On Fri, Aug 20, 2021 at 6:26 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Fri, 20 Aug 2021 11:37:03 +0100,
> > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > >
> > > > On Thu, Aug 19, 2021 at 6:29 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > [...]
> > >
> > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > index fcdb1d20389b..244aa6579ef4 100644
> > > > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > @@ -341,7 +341,7 @@ static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
> > > > >         if (stmmac_xdp_is_enabled(priv))
> > > > >                 return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
> > > > >
> > > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > > +       return 8 + NET_IP_ALIGN;
> > > > >  }
> > > > >
> > > > >  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
> > > > >
> > > > > I don't see the system corrupting packets anymore. Is that exactly
> > > > > what you had in mind? This really seems to point to a basic buffer
> > > > > overflow.
> > >
> > > [...]
> > >
> > > > Sorry, I meant something like:
> > > >
> > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > +       return 8;
> > > >
> > > > I had some hardware which DMA fails if the receive buffer was not word
> > > > aligned, but this seems not the case, as 8 + NET_IP_ALIGN = 10, and
> > > > it's not aligned too.
> > >
> > > No error in that case either, as expected. Given that NET_SKB_PAD is
> > > likely to expand to 64, it is likely a DMA buffer overflow which
> > > probably only triggers for large-ish packets.
> > >
> > > Now, we're almost at -rc7, and we don't have a solution in sight.
> > >
> > > Can we please revert this until we have an understanding of what is
> > > happening? I'll hopefully have more cycles to work on the issue once
> > > 5.14 is out, and hopefully the maintainers of this driver can chime in
> > > (they have been pretty quiet so far).
> > >
> > > Thanks,
> > >
> > >         M.
> > >
> > > --
> > > Without deviation from the norm, progress is not possible.
> >
> > Last try, what about adding only NET_IP_ALIGN and leaving NET_SKB_PAD?
> >
> > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > +       return NET_IP_ALIGN;
> >
> > I think that alloc_skb adds another NET_SKB_PAD anyway.
>
> I don't see any packet corruption with this. However, this doesn't
> prove that this is correct either. What was the rational for adding
> NET_SKB_PAD the first place?
>

I think it's wrong. The original offset was 0, and to align it to the
boundary we need to add just NET_IP_ALIGN, which is two.
NET_SKB_PAD is a much bigger value, (I think 64), which is used to
reserve space to prepend an header, e.g. with tunnels.
Marc Zyngier Aug. 20, 2021, 5:24 p.m. UTC | #28
On Fri, 20 Aug 2021 18:14:30 +0100,
Matteo Croce <mcroce@linux.microsoft.com> wrote:
> 
> On Fri, Aug 20, 2021 at 7:09 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Fri, 20 Aug 2021 17:38:14 +0100,
> > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > >
> > > On Fri, Aug 20, 2021 at 6:26 PM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > On Fri, 20 Aug 2021 11:37:03 +0100,
> > > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > > >
> > > > > On Thu, Aug 19, 2021 at 6:29 PM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > [...]
> > > >
> > > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > > index fcdb1d20389b..244aa6579ef4 100644
> > > > > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > > @@ -341,7 +341,7 @@ static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
> > > > > >         if (stmmac_xdp_is_enabled(priv))
> > > > > >                 return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
> > > > > >
> > > > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > > > +       return 8 + NET_IP_ALIGN;
> > > > > >  }
> > > > > >
> > > > > >  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
> > > > > >
> > > > > > I don't see the system corrupting packets anymore. Is that exactly
> > > > > > what you had in mind? This really seems to point to a basic buffer
> > > > > > overflow.
> > > >
> > > > [...]
> > > >
> > > > > Sorry, I meant something like:
> > > > >
> > > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > > +       return 8;
> > > > >
> > > > > I had some hardware which DMA fails if the receive buffer was not word
> > > > > aligned, but this seems not the case, as 8 + NET_IP_ALIGN = 10, and
> > > > > it's not aligned too.
> > > >
> > > > No error in that case either, as expected. Given that NET_SKB_PAD is
> > > > likely to expand to 64, it is likely a DMA buffer overflow which
> > > > probably only triggers for large-ish packets.
> > > >
> > > > Now, we're almost at -rc7, and we don't have a solution in sight.
> > > >
> > > > Can we please revert this until we have an understanding of what is
> > > > happening? I'll hopefully have more cycles to work on the issue once
> > > > 5.14 is out, and hopefully the maintainers of this driver can chime in
> > > > (they have been pretty quiet so far).
> > > >
> > > > Thanks,
> > > >
> > > >         M.
> > > >
> > > > --
> > > > Without deviation from the norm, progress is not possible.
> > >
> > > Last try, what about adding only NET_IP_ALIGN and leaving NET_SKB_PAD?
> > >
> > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > +       return NET_IP_ALIGN;
> > >
> > > I think that alloc_skb adds another NET_SKB_PAD anyway.
> >
> > I don't see any packet corruption with this. However, this doesn't
> > prove that this is correct either. What was the rational for adding
> > NET_SKB_PAD the first place?
> >
> 
> I think it's wrong. The original offset was 0, and to align it to the
> boundary we need to add just NET_IP_ALIGN, which is two.
> NET_SKB_PAD is a much bigger value, (I think 64), which is used to
> reserve space to prepend an header, e.g. with tunnels.

How about the other adjustments that Eric mentioned regarding the size
of the buffer? Aren't they required?

	M.
Matteo Croce Aug. 20, 2021, 5:35 p.m. UTC | #29
On Fri, Aug 20, 2021 at 7:24 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 20 Aug 2021 18:14:30 +0100,
> Matteo Croce <mcroce@linux.microsoft.com> wrote:
> >
> > On Fri, Aug 20, 2021 at 7:09 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Fri, 20 Aug 2021 17:38:14 +0100,
> > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > >
> > > > On Fri, Aug 20, 2021 at 6:26 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > >
> > > > > On Fri, 20 Aug 2021 11:37:03 +0100,
> > > > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > > > >
> > > > > > On Thu, Aug 19, 2021 at 6:29 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > >
> > > > > [...]
> > > > >
> > > > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > > > index fcdb1d20389b..244aa6579ef4 100644
> > > > > > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > > > @@ -341,7 +341,7 @@ static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
> > > > > > >         if (stmmac_xdp_is_enabled(priv))
> > > > > > >                 return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
> > > > > > >
> > > > > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > > > > +       return 8 + NET_IP_ALIGN;
> > > > > > >  }
> > > > > > >
> > > > > > >  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
> > > > > > >
> > > > > > > I don't see the system corrupting packets anymore. Is that exactly
> > > > > > > what you had in mind? This really seems to point to a basic buffer
> > > > > > > overflow.
> > > > >
> > > > > [...]
> > > > >
> > > > > > Sorry, I meant something like:
> > > > > >
> > > > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > > > +       return 8;
> > > > > >
> > > > > > I had some hardware which DMA fails if the receive buffer was not word
> > > > > > aligned, but this seems not the case, as 8 + NET_IP_ALIGN = 10, and
> > > > > > it's not aligned too.
> > > > >
> > > > > No error in that case either, as expected. Given that NET_SKB_PAD is
> > > > > likely to expand to 64, it is likely a DMA buffer overflow which
> > > > > probably only triggers for large-ish packets.
> > > > >
> > > > > Now, we're almost at -rc7, and we don't have a solution in sight.
> > > > >
> > > > > Can we please revert this until we have an understanding of what is
> > > > > happening? I'll hopefully have more cycles to work on the issue once
> > > > > 5.14 is out, and hopefully the maintainers of this driver can chime in
> > > > > (they have been pretty quiet so far).
> > > > >
> > > > > Thanks,
> > > > >
> > > > >         M.
> > > > >
> > > > > --
> > > > > Without deviation from the norm, progress is not possible.
> > > >
> > > > Last try, what about adding only NET_IP_ALIGN and leaving NET_SKB_PAD?
> > > >
> > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > +       return NET_IP_ALIGN;
> > > >
> > > > I think that alloc_skb adds another NET_SKB_PAD anyway.
> > >
> > > I don't see any packet corruption with this. However, this doesn't
> > > prove that this is correct either. What was the rational for adding
> > > NET_SKB_PAD the first place?
> > >
> >
> > I think it's wrong. The original offset was 0, and to align it to the
> > boundary we need to add just NET_IP_ALIGN, which is two.
> > NET_SKB_PAD is a much bigger value, (I think 64), which is used to
> > reserve space to prepend an header, e.g. with tunnels.
>
> How about the other adjustments that Eric mentioned regarding the size
> of the buffer? Aren't they required?
>

I guess that if stmmac_rx_buf1_len() needed such adjustment, it would
be already broken when XDP is in use.
When you use XDP, stmmac_rx_offset() adds a pretty big headroom of 256
byte, which would easily trigger an overflow if not accounted.
Did you try attaching a simple XDP program on a stock 5.13 kernel?
Marc Zyngier Aug. 20, 2021, 5:51 p.m. UTC | #30
On Fri, 20 Aug 2021 18:35:45 +0100,
Matteo Croce <mcroce@linux.microsoft.com> wrote:
> 
> On Fri, Aug 20, 2021 at 7:24 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Fri, 20 Aug 2021 18:14:30 +0100,
> > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > >
> > > On Fri, Aug 20, 2021 at 7:09 PM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > On Fri, 20 Aug 2021 17:38:14 +0100,
> > > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > > >
> > > > > On Fri, Aug 20, 2021 at 6:26 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > >
> > > > > > On Fri, 20 Aug 2021 11:37:03 +0100,
> > > > > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > > > > >
> > > > > > > On Thu, Aug 19, 2021 at 6:29 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > > > > index fcdb1d20389b..244aa6579ef4 100644
> > > > > > > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > > > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > > > > @@ -341,7 +341,7 @@ static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
> > > > > > > >         if (stmmac_xdp_is_enabled(priv))
> > > > > > > >                 return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
> > > > > > > >
> > > > > > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > > > > > +       return 8 + NET_IP_ALIGN;
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
> > > > > > > >
> > > > > > > > I don't see the system corrupting packets anymore. Is that exactly
> > > > > > > > what you had in mind? This really seems to point to a basic buffer
> > > > > > > > overflow.
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > > Sorry, I meant something like:
> > > > > > >
> > > > > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > > > > +       return 8;
> > > > > > >
> > > > > > > I had some hardware which DMA fails if the receive buffer was not word
> > > > > > > aligned, but this seems not the case, as 8 + NET_IP_ALIGN = 10, and
> > > > > > > it's not aligned too.
> > > > > >
> > > > > > No error in that case either, as expected. Given that NET_SKB_PAD is
> > > > > > likely to expand to 64, it is likely a DMA buffer overflow which
> > > > > > probably only triggers for large-ish packets.
> > > > > >
> > > > > > Now, we're almost at -rc7, and we don't have a solution in sight.
> > > > > >
> > > > > > Can we please revert this until we have an understanding of what is
> > > > > > happening? I'll hopefully have more cycles to work on the issue once
> > > > > > 5.14 is out, and hopefully the maintainers of this driver can chime in
> > > > > > (they have been pretty quiet so far).
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > >         M.
> > > > > >
> > > > > > --
> > > > > > Without deviation from the norm, progress is not possible.
> > > > >
> > > > > Last try, what about adding only NET_IP_ALIGN and leaving NET_SKB_PAD?
> > > > >
> > > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > > +       return NET_IP_ALIGN;
> > > > >
> > > > > I think that alloc_skb adds another NET_SKB_PAD anyway.
> > > >
> > > > I don't see any packet corruption with this. However, this doesn't
> > > > prove that this is correct either. What was the rational for adding
> > > > NET_SKB_PAD the first place?
> > > >
> > >
> > > I think it's wrong. The original offset was 0, and to align it to the
> > > boundary we need to add just NET_IP_ALIGN, which is two.
> > > NET_SKB_PAD is a much bigger value, (I think 64), which is used to
> > > reserve space to prepend an header, e.g. with tunnels.
> >
> > How about the other adjustments that Eric mentioned regarding the size
> > of the buffer? Aren't they required?
> >
> 
> I guess that if stmmac_rx_buf1_len() needed such adjustment, it would
> be already broken when XDP is in use.
> When you use XDP, stmmac_rx_offset() adds a pretty big headroom of 256
> byte, which would easily trigger an overflow if not accounted.
> Did you try attaching a simple XDP program on a stock 5.13 kernel?

Yes, as mentioned in [1], to which you replied...

	M.

[1] https://lore.kernel.org/r/87wnohqty1.wl-maz@kernel.org
Matteo Croce Aug. 20, 2021, 5:56 p.m. UTC | #31
On Fri, Aug 20, 2021 at 7:51 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 20 Aug 2021 18:35:45 +0100,
> Matteo Croce <mcroce@linux.microsoft.com> wrote:
> >
> > On Fri, Aug 20, 2021 at 7:24 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Fri, 20 Aug 2021 18:14:30 +0100,
> > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > >
> > > > On Fri, Aug 20, 2021 at 7:09 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > >
> > > > > On Fri, 20 Aug 2021 17:38:14 +0100,
> > > > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > > > >
> > > > > > On Fri, Aug 20, 2021 at 6:26 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > >
> > > > > > > On Fri, 20 Aug 2021 11:37:03 +0100,
> > > > > > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Aug 19, 2021 at 6:29 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > > > > > index fcdb1d20389b..244aa6579ef4 100644
> > > > > > > > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > > > > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > > > > > @@ -341,7 +341,7 @@ static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
> > > > > > > > >         if (stmmac_xdp_is_enabled(priv))
> > > > > > > > >                 return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
> > > > > > > > >
> > > > > > > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > > > > > > +       return 8 + NET_IP_ALIGN;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > >  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
> > > > > > > > >
> > > > > > > > > I don't see the system corrupting packets anymore. Is that exactly
> > > > > > > > > what you had in mind? This really seems to point to a basic buffer
> > > > > > > > > overflow.
> > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > > > Sorry, I meant something like:
> > > > > > > >
> > > > > > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > > > > > +       return 8;
> > > > > > > >
> > > > > > > > I had some hardware which DMA fails if the receive buffer was not word
> > > > > > > > aligned, but this seems not the case, as 8 + NET_IP_ALIGN = 10, and
> > > > > > > > it's not aligned too.
> > > > > > >
> > > > > > > No error in that case either, as expected. Given that NET_SKB_PAD is
> > > > > > > likely to expand to 64, it is likely a DMA buffer overflow which
> > > > > > > probably only triggers for large-ish packets.
> > > > > > >
> > > > > > > Now, we're almost at -rc7, and we don't have a solution in sight.
> > > > > > >
> > > > > > > Can we please revert this until we have an understanding of what is
> > > > > > > happening? I'll hopefully have more cycles to work on the issue once
> > > > > > > 5.14 is out, and hopefully the maintainers of this driver can chime in
> > > > > > > (they have been pretty quiet so far).
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > >         M.
> > > > > > >
> > > > > > > --
> > > > > > > Without deviation from the norm, progress is not possible.
> > > > > >
> > > > > > Last try, what about adding only NET_IP_ALIGN and leaving NET_SKB_PAD?
> > > > > >
> > > > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > > > +       return NET_IP_ALIGN;
> > > > > >
> > > > > > I think that alloc_skb adds another NET_SKB_PAD anyway.
> > > > >
> > > > > I don't see any packet corruption with this. However, this doesn't
> > > > > prove that this is correct either. What was the rational for adding
> > > > > NET_SKB_PAD the first place?
> > > > >
> > > >
> > > > I think it's wrong. The original offset was 0, and to align it to the
> > > > boundary we need to add just NET_IP_ALIGN, which is two.
> > > > NET_SKB_PAD is a much bigger value, (I think 64), which is used to
> > > > reserve space to prepend an header, e.g. with tunnels.
> > >
> > > How about the other adjustments that Eric mentioned regarding the size
> > > of the buffer? Aren't they required?
> > >
> >
> > I guess that if stmmac_rx_buf1_len() needed such adjustment, it would
> > be already broken when XDP is in use.
> > When you use XDP, stmmac_rx_offset() adds a pretty big headroom of 256
> > byte, which would easily trigger an overflow if not accounted.
> > Did you try attaching a simple XDP program on a stock 5.13 kernel?
>
> Yes, as mentioned in [1], to which you replied...
>
>         M.
>
> [1] https://lore.kernel.org/r/87wnohqty1.wl-maz@kernel.org
>

Great.
So I doubt that the adjustment is needed.
Does it work with all the frame size?
Matteo Croce Aug. 20, 2021, 6:05 p.m. UTC | #32
On Fri, Aug 20, 2021 at 7:56 PM Matteo Croce <mcroce@linux.microsoft.com> wrote:
>
> On Fri, Aug 20, 2021 at 7:51 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Fri, 20 Aug 2021 18:35:45 +0100,
> > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > >
> > > On Fri, Aug 20, 2021 at 7:24 PM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > On Fri, 20 Aug 2021 18:14:30 +0100,
> > > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > > >
> > > > > On Fri, Aug 20, 2021 at 7:09 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > >
> > > > > > On Fri, 20 Aug 2021 17:38:14 +0100,
> > > > > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > > > > >
> > > > > > > On Fri, Aug 20, 2021 at 6:26 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Fri, 20 Aug 2021 11:37:03 +0100,
> > > > > > > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Aug 19, 2021 at 6:29 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > > >
> > > > > > > > [...]
> > > > > > > >
> > > > > > > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > > > > > > index fcdb1d20389b..244aa6579ef4 100644
> > > > > > > > > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > > > > > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > > > > > > > > > @@ -341,7 +341,7 @@ static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
> > > > > > > > > >         if (stmmac_xdp_is_enabled(priv))
> > > > > > > > > >                 return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
> > > > > > > > > >
> > > > > > > > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > > > > > > > +       return 8 + NET_IP_ALIGN;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > >  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
> > > > > > > > > >
> > > > > > > > > > I don't see the system corrupting packets anymore. Is that exactly
> > > > > > > > > > what you had in mind? This really seems to point to a basic buffer
> > > > > > > > > > overflow.
> > > > > > > >
> > > > > > > > [...]
> > > > > > > >
> > > > > > > > > Sorry, I meant something like:
> > > > > > > > >
> > > > > > > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > > > > > > +       return 8;
> > > > > > > > >
> > > > > > > > > I had some hardware which DMA fails if the receive buffer was not word
> > > > > > > > > aligned, but this seems not the case, as 8 + NET_IP_ALIGN = 10, and
> > > > > > > > > it's not aligned too.
> > > > > > > >
> > > > > > > > No error in that case either, as expected. Given that NET_SKB_PAD is
> > > > > > > > likely to expand to 64, it is likely a DMA buffer overflow which
> > > > > > > > probably only triggers for large-ish packets.
> > > > > > > >
> > > > > > > > Now, we're almost at -rc7, and we don't have a solution in sight.
> > > > > > > >
> > > > > > > > Can we please revert this until we have an understanding of what is
> > > > > > > > happening? I'll hopefully have more cycles to work on the issue once
> > > > > > > > 5.14 is out, and hopefully the maintainers of this driver can chime in
> > > > > > > > (they have been pretty quiet so far).
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > >         M.
> > > > > > > >
> > > > > > > > --
> > > > > > > > Without deviation from the norm, progress is not possible.
> > > > > > >
> > > > > > > Last try, what about adding only NET_IP_ALIGN and leaving NET_SKB_PAD?
> > > > > > >
> > > > > > > -       return NET_SKB_PAD + NET_IP_ALIGN;
> > > > > > > +       return NET_IP_ALIGN;
> > > > > > >
> > > > > > > I think that alloc_skb adds another NET_SKB_PAD anyway.
> > > > > >
> > > > > > I don't see any packet corruption with this. However, this doesn't
> > > > > > prove that this is correct either. What was the rational for adding
> > > > > > NET_SKB_PAD the first place?
> > > > > >
> > > > >
> > > > > I think it's wrong. The original offset was 0, and to align it to the
> > > > > boundary we need to add just NET_IP_ALIGN, which is two.
> > > > > NET_SKB_PAD is a much bigger value, (I think 64), which is used to
> > > > > reserve space to prepend an header, e.g. with tunnels.
> > > >
> > > > How about the other adjustments that Eric mentioned regarding the size
> > > > of the buffer? Aren't they required?
> > > >
> > >
> > > I guess that if stmmac_rx_buf1_len() needed such adjustment, it would
> > > be already broken when XDP is in use.
> > > When you use XDP, stmmac_rx_offset() adds a pretty big headroom of 256
> > > byte, which would easily trigger an overflow if not accounted.
> > > Did you try attaching a simple XDP program on a stock 5.13 kernel?
> >
> > Yes, as mentioned in [1], to which you replied...
> >
> >         M.
> >
> > [1] https://lore.kernel.org/r/87wnohqty1.wl-maz@kernel.org
> >
>
> Great.
> So I doubt that the adjustment is needed.
> Does it work with all the frame size?
>

Last check, are you sure that the bpf program was loaded in the driver
and not as generic XDP?
You can force it as native with "xdpdrv":

ip link set eth xdpdrv object kernel_passall.o
Marc Zyngier Aug. 20, 2021, 6:09 p.m. UTC | #33
On Fri, 20 Aug 2021 18:56:33 +0100,
Matteo Croce <mcroce@linux.microsoft.com> wrote:
> 
> On Fri, Aug 20, 2021 at 7:51 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Fri, 20 Aug 2021 18:35:45 +0100,
> > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > >
> > > > > I think it's wrong. The original offset was 0, and to align it to the
> > > > > boundary we need to add just NET_IP_ALIGN, which is two.
> > > > > NET_SKB_PAD is a much bigger value, (I think 64), which is used to
> > > > > reserve space to prepend an header, e.g. with tunnels.
> > > >
> > > > How about the other adjustments that Eric mentioned regarding the size
> > > > of the buffer? Aren't they required?
> > > >
> > >
> > > I guess that if stmmac_rx_buf1_len() needed such adjustment, it would
> > > be already broken when XDP is in use.
> > > When you use XDP, stmmac_rx_offset() adds a pretty big headroom of 256
> > > byte, which would easily trigger an overflow if not accounted.
> > > Did you try attaching a simple XDP program on a stock 5.13 kernel?
> >
> > Yes, as mentioned in [1], to which you replied...
> >
> >         M.
> >
> > [1] https://lore.kernel.org/r/87wnohqty1.wl-maz@kernel.org
> >
> 
> Great.
> So I doubt that the adjustment is needed.
> Does it work with all the frame size?

I have no idea. Honestly, you are the one who should be able to answer
these questions, given that you should have worked out how the buffer
allocations work in this particular driver.

This whole "let's try another random set of values until something
sticks" is not how things ought to be done, and doesn't fill me with
the utmost confidence that 5.14 (which apparently may well be cut in
*two days*) is going to have a solid stmmac driver.

I re-re-request that this patch gets reverted until you figure out
what is wrong with the initial patch.

Thanks,

	M.
Matteo Croce Aug. 20, 2021, 6:14 p.m. UTC | #34
On Fri, Aug 20, 2021 at 8:09 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 20 Aug 2021 18:56:33 +0100,
> Matteo Croce <mcroce@linux.microsoft.com> wrote:
> >
> > On Fri, Aug 20, 2021 at 7:51 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Fri, 20 Aug 2021 18:35:45 +0100,
> > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > >
> > > > > > I think it's wrong. The original offset was 0, and to align it to the
> > > > > > boundary we need to add just NET_IP_ALIGN, which is two.
> > > > > > NET_SKB_PAD is a much bigger value, (I think 64), which is used to
> > > > > > reserve space to prepend an header, e.g. with tunnels.
> > > > >
> > > > > How about the other adjustments that Eric mentioned regarding the size
> > > > > of the buffer? Aren't they required?
> > > > >
> > > >
> > > > I guess that if stmmac_rx_buf1_len() needed such adjustment, it would
> > > > be already broken when XDP is in use.
> > > > When you use XDP, stmmac_rx_offset() adds a pretty big headroom of 256
> > > > byte, which would easily trigger an overflow if not accounted.
> > > > Did you try attaching a simple XDP program on a stock 5.13 kernel?
> > >
> > > Yes, as mentioned in [1], to which you replied...
> > >
> > >         M.
> > >
> > > [1] https://lore.kernel.org/r/87wnohqty1.wl-maz@kernel.org
> > >
> >
> > Great.
> > So I doubt that the adjustment is needed.
> > Does it work with all the frame size?
>
> I have no idea. Honestly, you are the one who should be able to answer
> these questions, given that you should have worked out how the buffer
> allocations work in this particular driver.
>
> This whole "let's try another random set of values until something
> sticks" is not how things ought to be done, and doesn't fill me with
> the utmost confidence that 5.14 (which apparently may well be cut in
> *two days*) is going to have a solid stmmac driver.
>
> I re-re-request that this patch gets reverted until you figure out
> what is wrong with the initial patch.
>
> Thanks,
>

I would have done it, but I'll not have the hardware until next week at least,
otherwise I'd have tried all these tests myself.

I'm sure that NET_SKB_PAD doesn't need to be there, if just removing
it fixes the problem, consider applying it and put a Fixes tag.
Marc Zyngier Aug. 20, 2021, 6:14 p.m. UTC | #35
On Fri, 20 Aug 2021 19:05:57 +0100,
Matteo Croce <mcroce@linux.microsoft.com> wrote:
> 
> On Fri, Aug 20, 2021 at 7:56 PM Matteo Croce <mcroce@linux.microsoft.com> wrote:
> >
> > On Fri, Aug 20, 2021 at 7:51 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Fri, 20 Aug 2021 18:35:45 +0100,
> > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > Great.
> > So I doubt that the adjustment is needed.
> > Does it work with all the frame size?
> >
> 
> Last check, are you sure that the bpf program was loaded in the driver
> and not as generic XDP?
> You can force it as native with "xdpdrv":
> 
> ip link set eth xdpdrv object kernel_passall.o

No issue whatsoever with 5.13.

	M.
Marc Zyngier Aug. 20, 2021, 6:41 p.m. UTC | #36
On Fri, 20 Aug 2021 19:14:22 +0100,
Matteo Croce <mcroce@linux.microsoft.com> wrote:
> 
> On Fri, Aug 20, 2021 at 8:09 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Fri, 20 Aug 2021 18:56:33 +0100,
> > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > >
> > > On Fri, Aug 20, 2021 at 7:51 PM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > On Fri, 20 Aug 2021 18:35:45 +0100,
> > > > Matteo Croce <mcroce@linux.microsoft.com> wrote:
> > > > >
> > > > > > > I think it's wrong. The original offset was 0, and to align it to the
> > > > > > > boundary we need to add just NET_IP_ALIGN, which is two.
> > > > > > > NET_SKB_PAD is a much bigger value, (I think 64), which is used to
> > > > > > > reserve space to prepend an header, e.g. with tunnels.
> > > > > >
> > > > > > How about the other adjustments that Eric mentioned regarding the size
> > > > > > of the buffer? Aren't they required?
> > > > > >
> > > > >
> > > > > I guess that if stmmac_rx_buf1_len() needed such adjustment, it would
> > > > > be already broken when XDP is in use.
> > > > > When you use XDP, stmmac_rx_offset() adds a pretty big headroom of 256
> > > > > byte, which would easily trigger an overflow if not accounted.
> > > > > Did you try attaching a simple XDP program on a stock 5.13 kernel?
> > > >
> > > > Yes, as mentioned in [1], to which you replied...
> > > >
> > > >         M.
> > > >
> > > > [1] https://lore.kernel.org/r/87wnohqty1.wl-maz@kernel.org
> > > >
> > >
> > > Great.
> > > So I doubt that the adjustment is needed.
> > > Does it work with all the frame size?
> >
> > I have no idea. Honestly, you are the one who should be able to answer
> > these questions, given that you should have worked out how the buffer
> > allocations work in this particular driver.
> >
> > This whole "let's try another random set of values until something
> > sticks" is not how things ought to be done, and doesn't fill me with
> > the utmost confidence that 5.14 (which apparently may well be cut in
> > *two days*) is going to have a solid stmmac driver.
> >
> > I re-re-request that this patch gets reverted until you figure out
> > what is wrong with the initial patch.
> >
> > Thanks,
> >
> 
> I would have done it, but I'll not have the hardware until next week at least,
> otherwise I'd have tried all these tests myself.
> 
> I'm sure that NET_SKB_PAD doesn't need to be there, if just removing
> it fixes the problem, consider applying it and put a Fixes tag.

No, I don't think that's the right thing to do. A patch breaks a
driver, and the author of the patch is not in a position to fix it.
That's OK, these things happen, it's just bad timing.

But I don't understand this part of the kernel well enough to submit a
patch based on a sample of *one*, at the last minute, just because "it
works for me", and have the confidence that it doesn't break anything
else.

I have now posted a revert of the original patch[1]. I'll be happy to
work with you, with a less pressure, in order to have something that
works for everyone in the next cycle.

Thanks,

	M.

[1] https://lore.kernel.org/netdev/20210820183002.457226-1-maz@kernel.org
diff mbox series

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index b6cd43eda7ac..04bdb3950d63 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -338,9 +338,9 @@  static inline bool stmmac_xdp_is_enabled(struct stmmac_priv *priv)
 static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
 {
 	if (stmmac_xdp_is_enabled(priv))
-		return XDP_PACKET_HEADROOM;
+		return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
 
-	return 0;
+	return NET_SKB_PAD + NET_IP_ALIGN;
 }
 
 void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);