Message ID | 02afb707-65de-5101-a79b-355929c4e00b@nelint.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Sep 23, 2016 at 9:43 AM, Eric Nelson <eric@nelint.com> wrote: > > Hello all, > > We're seeing alignment issues from the ethernet stack on an i.MX6UL board: > > root@mx6ul:~# cat /proc/cpu/alignment > User: 0 > System: 470010 (inet_gro_receive+0x104/0x278) > > This seems to be related to the ip header alignment, and there > was much discussion in mailing list threads [1] and [2]. > > In particular, Russell referred to a patch here, but I haven't been > able to find it: > https://lists.linaro.org/pipermail/linaro-toolchain/2012-October/002844.html > > Eric Dumazet also suggested a path toward fixing it, but I don't quite > understand the suggestion: > http://www.spinics.net/lists/netdev/msg213166.htm > > The immediate problem is addressed by just reading the id and frag_offs > fields in the iphdr structure as shown in this patch: > > commit 98810abc911b1286a7e4a2ebdfbad66f12fae19d > Author: Eric Nelson <eric@nelint.com> > Date: Fri Sep 23 08:26:03 2016 -0700 > > net: ipv4: af_inet: don't read multiple 16-bit iphdr fields as a 32-bit > value > > Change-Id: Idc7122c22c13ca078be31907d30ab1c3148ba807 > Signed-off-by: Eric Nelson <eric@nelint.com> > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > index 0cc98b1..c17ef6e 100644 > --- a/net/ipv4/af_inet.c > +++ b/net/ipv4/af_inet.c > @@ -1301,6 +1301,7 @@ static struct sk_buff **inet_gro_receive(struct > sk_buff **head, > unsigned int hlen; > unsigned int off; > unsigned int id; > + unsigned int frag; > int flush = 1; > int proto; > > @@ -1326,9 +1327,9 @@ static struct sk_buff **inet_gro_receive(struct > sk_buff **head, > if (unlikely(ip_fast_csum((u8 *)iph, 5))) > goto out_unlock; > > - id = ntohl(*(__be32 *)&iph->id); > - flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF)); > - id >>= 16; > + id = ntohs(*(__be16 *)&iph->id); > + frag = ntohs(*(__be16 *)&iph->frag_off); > + flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (frag & > ~IP_DF)); > > for (p = *head; p; p = p->next) { > struct iphdr *iph2; > This solves nothing, because a few lines after you'll have yet another unaligned access : ((__force u32)iph->saddr ^ (__force u32)iph2->saddr) | ((__force u32)iph->daddr ^ (__force u32)iph2->daddr)) { So you might have one less problematic access, out of hundreds of them all over the places. Really the problem is that whole stack depends on the assumption that IP headers are aligned on arches that care (ie where NET_IP_ALIGN == 2) If your build does have NET_IP_ALIGN = 2 and you get a fault here, it might be because of a buggy driver. The other known case is some GRE encapsulations that break the assumption, and this is discussed somewhere else.
Thanks Eric, On 09/23/2016 09:54 AM, Eric Dumazet wrote: > On Fri, Sep 23, 2016 at 9:43 AM, Eric Nelson <eric@nelint.com> wrote: >> >> Hello all, >> >> We're seeing alignment issues from the ethernet stack on an i.MX6UL board: >> >> <snip> >> >> - id = ntohl(*(__be32 *)&iph->id); >> - flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF)); >> - id >>= 16; >> + id = ntohs(*(__be16 *)&iph->id); >> + frag = ntohs(*(__be16 *)&iph->frag_off); >> + flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (frag & >> ~IP_DF)); >> >> for (p = *head; p; p = p->next) { >> struct iphdr *iph2; >> > > This solves nothing, because a few lines after you'll have yet another > unaligned access : > Oddly, it does prevent the vast majority (90%+) of the alignment errors. I believe this is because the compiler is generating an ldm instruction when the ntohl() call is used, but I'm stumped about why these aren't generating faults: > ((__force u32)iph->saddr ^ (__force u32)iph2->saddr) | > ((__force u32)iph->daddr ^ (__force u32)iph2->daddr)) { > > So you might have one less problematic access, out of hundreds of them > all over the places. > > Really the problem is that whole stack depends on the assumption that > IP headers are aligned on arches that care > (ie where NET_IP_ALIGN == 2) > > If your build does have NET_IP_ALIGN = 2 and you get a fault here, it > might be because of a buggy driver. > NET_IP_ALIGN is set to 2. > The other known case is some GRE encapsulations that break the > assumption, and this is discussed somewhere else. > I don't think that's the case. # CONFIG_IPV6_GRE is not set Hmm... Instrumenting the kernel, it seems that iphdr **is** aligned on a 4-byte boundary. Does the ldm instruction require 8-byte alignment? There's definitely a compiler-version dependency involved here, since using gcc 4.9 also reduced the number of faults dramatically.
Hi Eric, On 09/23/2016 10:19 AM, Eric Nelson wrote: > Thanks Eric, > > On 09/23/2016 09:54 AM, Eric Dumazet wrote: >> On Fri, Sep 23, 2016 at 9:43 AM, Eric Nelson <eric@nelint.com> wrote: >>> >>> Hello all, >>> >>> We're seeing alignment issues from the ethernet stack on an i.MX6UL board: >>> >>> > > <snip> > >>> >>> - id = ntohl(*(__be32 *)&iph->id); >>> - flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF)); >>> - id >>= 16; >>> + id = ntohs(*(__be16 *)&iph->id); >>> + frag = ntohs(*(__be16 *)&iph->frag_off); >>> + flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (frag & >>> ~IP_DF)); >>> >>> for (p = *head; p; p = p->next) { >>> struct iphdr *iph2; >>> >> >> This solves nothing, because a few lines after you'll have yet another >> unaligned access : >> > > Oddly, it does prevent the vast majority (90%+) of the alignment errors. > > I believe this is because the compiler is generating an ldm instruction > when the ntohl() call is used, but I'm stumped about why these aren't > generating faults: > >> ((__force u32)iph->saddr ^ (__force u32)iph2->saddr) | >> ((__force u32)iph->daddr ^ (__force u32)iph2->daddr)) { >> >> So you might have one less problematic access, out of hundreds of them >> all over the places. >> >> Really the problem is that whole stack depends on the assumption that >> IP headers are aligned on arches that care >> (ie where NET_IP_ALIGN == 2) >> >> If your build does have NET_IP_ALIGN = 2 and you get a fault here, it >> might be because of a buggy driver. >> > > NET_IP_ALIGN is set to 2. > >> The other known case is some GRE encapsulations that break the >> assumption, and this is discussed somewhere else. >> > I don't think that's the case. > > # CONFIG_IPV6_GRE is not set > > Hmm... Instrumenting the kernel, it seems that iphdr **is** aligned on > a 4-byte boundary. > No. That was wrong. The iphdr is aligned at offsets of 14 from the ethernet frame, which itself is longword aligned. I mistakenly tested before the call to skb_gro_header_slow(), when iph was NULL. After putting a test in the right place, I'm seeing an address of 888a364e for the first un-aligned packet. Since the hardware requires longword alignment for its' DMA transfers, aligning the IP header will require a memcpy, right? You hinted at a solution in this post: http://www.spinics.net/lists/netdev/msg213166.html Are you aware of another driver doing this that could be used as a reference? Please advise, Eric
On Fri, Sep 23, 2016 at 10:19:50AM -0700, Eric Nelson wrote: > Oddly, it does prevent the vast majority (90%+) of the alignment errors. > > I believe this is because the compiler is generating an ldm instruction > when the ntohl() call is used, but I'm stumped about why these aren't > generating faults: ldm generates alignment faults when the address is not aligned to a 32-bit boundary. ldr on ARMv6+ does not. > I don't think that's the case. > > # CONFIG_IPV6_GRE is not set > > Hmm... Instrumenting the kernel, it seems that iphdr **is** aligned on > a 4-byte boundary. > > Does the ldm instruction require 8-byte alignment? > > There's definitely a compiler-version dependency involved here, > since using gcc 4.9 also reduced the number of faults dramatically. Well, I don't think it's that gcc related: User: 0 System: 312855 (ip6_route_input+0x6c/0x1e0) Skipped: 0 Half: 0 Word: 0 DWord: 2 Multi: 312853 c06d8998 <ip6_route_input>: c06d89ac: e1a04000 mov r4, r0 c06d89b0: e1d489b4 ldrh r8, [r4, #148] ; 0x94 c06d89b8: e594a0a0 ldr sl, [r4, #160] ; 0xa0 c06d89cc: e08ac008 add ip, sl, r8 c06d89d4: e28c3018 add r3, ip, #24 c06d89dc: e28c7008 add r7, ip, #8 c06d89e4: e893000f ldm r3, {r0, r1, r2, r3} c06d89ec: e24be044 sub lr, fp, #68 ; 0x44 c06d89f4: e24b5054 sub r5, fp, #84 ; 0x54 c06d89fc: e885000f stm r5, {r0, r1, r2, r3} c06d8a04: e897000f ldm r7, {r0, r1, r2, r3} c06d8a10: e88e000f stm lr, {r0, r1, r2, r3} This is from: struct flowi6 fl6 = { .flowi6_iif = l3mdev_fib_oif(skb->dev), .daddr = iph->daddr, .saddr = iph->saddr, .flowlabel = ip6_flowinfo(iph), .flowi6_mark = skb->mark, .flowi6_proto = iph->nexthdr, }; specifically, I suspect, the saddr and daddr initialisations. There's not much to get away from this - the FEC on iMX requires a 16-byte alignment for DMA addresses, which violates the network stack's requirement for the ethernet packet to be received with a two byte offset. So the IP header (and IPv6 headers) will always be mis-aligned in memory, which leads to a huge number of alignment faults. There's not much getting away from this - the problem is not in the networking stack, but the FEC hardware/network driver. See: struct fec_enet_private *fep = netdev_priv(ndev); int off; off = ((unsigned long)skb->data) & fep->rx_align; if (off) skb_reserve(skb, fep->rx_align + 1 - off); bdp->cbd_bufaddr = cpu_to_fec32(dma_map_single(&fep->pdev->dev, skb->data, FEC_ENET_RX_FRSIZE - fep->rx_align, DMA_FROM_DEVICE)); in fec_enet_new_rxbdp().
> Since the hardware requires longword alignment for its' DMA transfers, > aligning the IP header will require a memcpy, right? The vf610 FEC has an SHIFT16 bit in register ENETx_TACC, which inserts two padding bits on transmit. ENETx_RACC has the same. What about your hardware? Andrew
Thanks Russell, On 09/23/2016 10:37 AM, Russell King - ARM Linux wrote: > On Fri, Sep 23, 2016 at 10:19:50AM -0700, Eric Nelson wrote: >> Oddly, it does prevent the vast majority (90%+) of the alignment errors. >> >> I believe this is because the compiler is generating an ldm instruction >> when the ntohl() call is used, but I'm stumped about why these aren't >> generating faults: After looking at it, I have to think that the code that reads iph->id is just hit more frequently than the other code in this routine. > > ldm generates alignment faults when the address is not aligned to a > 32-bit boundary. ldr on ARMv6+ does not. > >> I don't think that's the case. >> >> # CONFIG_IPV6_GRE is not set >> >> Hmm... Instrumenting the kernel, it seems that iphdr **is** aligned on >> a 4-byte boundary. >> >> Does the ldm instruction require 8-byte alignment? >> >> There's definitely a compiler-version dependency involved here, >> since using gcc 4.9 also reduced the number of faults dramatically. > > Well, I don't think it's that gcc related: > I can only say that I noticed a dramatic drop in the number of faults, and didn't see the inet_gro_receive reported in /proc/cpu/alignment with gcc 4.9 when trying to identify the issue. > User: 0 > System: 312855 (ip6_route_input+0x6c/0x1e0) > Skipped: 0 > Half: 0 > Word: 0 > DWord: 2 > Multi: 312853 > > c06d8998 <ip6_route_input>: > c06d89ac: e1a04000 mov r4, r0 > c06d89b0: e1d489b4 ldrh r8, [r4, #148] ; 0x94 > c06d89b8: e594a0a0 ldr sl, [r4, #160] ; 0xa0 > c06d89cc: e08ac008 add ip, sl, r8 > c06d89d4: e28c3018 add r3, ip, #24 > c06d89dc: e28c7008 add r7, ip, #8 > c06d89e4: e893000f ldm r3, {r0, r1, r2, r3} > c06d89ec: e24be044 sub lr, fp, #68 ; 0x44 > c06d89f4: e24b5054 sub r5, fp, #84 ; 0x54 > c06d89fc: e885000f stm r5, {r0, r1, r2, r3} > c06d8a04: e897000f ldm r7, {r0, r1, r2, r3} > c06d8a10: e88e000f stm lr, {r0, r1, r2, r3} > > This is from: > > struct flowi6 fl6 = { > .flowi6_iif = l3mdev_fib_oif(skb->dev), > .daddr = iph->daddr, > .saddr = iph->saddr, > .flowlabel = ip6_flowinfo(iph), > .flowi6_mark = skb->mark, > .flowi6_proto = iph->nexthdr, > }; > > specifically, I suspect, the saddr and daddr initialisations. > > There's not much to get away from this - the FEC on iMX requires a > 16-byte alignment for DMA addresses, which violates the network > stack's requirement for the ethernet packet to be received with a > two byte offset. So the IP header (and IPv6 headers) will always > be mis-aligned in memory, which leads to a huge number of alignment > faults. > > There's not much getting away from this - the problem is not in the > networking stack, but the FEC hardware/network driver. See: > > struct fec_enet_private *fep = netdev_priv(ndev); > int off; > > off = ((unsigned long)skb->data) & fep->rx_align; > if (off) > skb_reserve(skb, fep->rx_align + 1 - off); > > bdp->cbd_bufaddr = cpu_to_fec32(dma_map_single(&fep->pdev->dev, skb->data, FEC_ENET_RX_FRSIZE - fep->rx_align, DMA_FROM_DEVICE)); > > in fec_enet_new_rxbdp(). > So the question is: should we just live with this and acknowledge a performance penalty of bad alignment or do something about it? I'm not sure the cost (or the details) of Eric's proposed fix of allocating and copying the header to another skb. The original report was of bad network performance, but I haven't been able to see an impact doing some simple tests using wget and SSH.
On Fri, Sep 23, 2016 at 08:13:01PM +0200, Andrew Lunn wrote: > > Since the hardware requires longword alignment for its' DMA transfers, > > aligning the IP header will require a memcpy, right? > > The vf610 FEC has an SHIFT16 bit in register ENETx_TACC, which inserts > two padding bits on transmit. ENETx_RACC has the same. > > What about your hardware? The iMX6 FEC also has that ability - as part of my FEC patch stack from ages ago, I implemented support for it. "net:fec: implement almost zero-copy receive path" in my public fec-testing branch. That patch stack is sadly now totally dead and I've no interest in reviving it myself. There was some interest from others in taking my patch stack over, but that went quiet.
Thanks Andrew. On 09/23/2016 11:13 AM, Andrew Lunn wrote: >> Since the hardware requires longword alignment for its' DMA transfers, >> aligning the IP header will require a memcpy, right? > > The vf610 FEC has an SHIFT16 bit in register ENETx_TACC, which inserts > two padding bits on transmit. ENETx_RACC has the same. > > What about your hardware? > You got me with the RTFM! From the i.MX6DQ reference manual, bit 7 of ENET_RACC says this: "RX FIFO Shift-16 When this field is set, the actual frame data starts at bit 16 of the first word read from the RX FIFO aligning the Ethernet payload on a 32-bit boundary." Same for the i.MX6UL. I'm not sure what it will take to use this, but it seems to be exactly what we're looking for.
On Fri, Sep 23, 2016 at 11:26:18AM -0700, Eric Nelson wrote: > So the question is: should we just live with this and acknowledge a > performance penalty of bad alignment or do something about it? Well, I've no interest in trying to do anything with the FEC driver anymore, as I'll just generate another big patch stack which won't make it into the kernel in a timely fashion - my last attempt at improving the FEC driver was dogged with conflicting changes and I gave up with it in the end. I ended up spending a full cycle rebasing, re-testing, and re-evaluating their performance only to find that I'd missed the merge window again, and other conflicting changes got merged which meant that I had to start from the beginning again. > I'm not sure the cost (or the details) of Eric's proposed fix of allocating > and copying the header to another skb. I had a quick look at this, and although Eric's idea may be a good idea, it doesn't contain enough details for me to be able to implement it - eg, I've no idea how to attach the 128-byte skb to the beginning of a previously allocated skb containing the rest of the packet. I've just looked through linux/skbuff.h and I can't see anything that takes two sk_buff's that would do the job. However, I don't think that's necessary in this case, because the iMX6 FEC supports the 16-bit alignment of the packet, if only it was enabled in hardware and the driver caters for it.
Thanks Russell, On 09/23/2016 11:30 AM, Russell King - ARM Linux wrote: > On Fri, Sep 23, 2016 at 08:13:01PM +0200, Andrew Lunn wrote: >>> Since the hardware requires longword alignment for its' DMA transfers, >>> aligning the IP header will require a memcpy, right? >> >> The vf610 FEC has an SHIFT16 bit in register ENETx_TACC, which inserts >> two padding bits on transmit. ENETx_RACC has the same. >> >> What about your hardware? > > The iMX6 FEC also has that ability - as part of my FEC patch stack from > ages ago, I implemented support for it. > > "net:fec: implement almost zero-copy receive path" > > in my public fec-testing branch. > > That patch stack is sadly now totally dead and I've no interest in > reviving it myself. There was some interest from others in taking my > patch stack over, but that went quiet. > I'll take a look and hopefully revive at least part of the patch set.
Thanks Russell, On 09/23/2016 11:37 AM, Russell King - ARM Linux wrote: > On Fri, Sep 23, 2016 at 11:26:18AM -0700, Eric Nelson wrote: >> So the question is: should we just live with this and acknowledge a >> performance penalty of bad alignment or do something about it? > > Well, I've no interest in trying to do anything with the FEC driver > anymore, as I'll just generate another big patch stack which won't > make it into the kernel in a timely fashion - my last attempt at > improving the FEC driver was dogged with conflicting changes and I > gave up with it in the end. I ended up spending a full cycle > rebasing, re-testing, and re-evaluating their performance only to find > that I'd missed the merge window again, and other conflicting changes > got merged which meant that I had to start from the beginning again. > That's sad. I recall reading your notes on that patch series and it was a model for how to structure and document a patch set. I hadn't noticed that you abandoned it and it's frustrating that the merge process prevented your efforts from being used. I'm also disheartened to hear your frustration about getting things pushed up-stream and the entire Linux community should take note. >> I'm not sure the cost (or the details) of Eric's proposed fix of allocating >> and copying the header to another skb. > > I had a quick look at this, and although Eric's idea may be a good > idea, it doesn't contain enough details for me to be able to > implement it - eg, I've no idea how to attach the 128-byte skb to the > beginning of a previously allocated skb containing the rest of the > packet. I've just looked through linux/skbuff.h and I can't see > anything that takes two sk_buff's that would do the job. > > However, I don't think that's necessary in this case, because the > iMX6 FEC supports the 16-bit alignment of the packet, if only it was > enabled in hardware and the driver caters for it. > Right. If the hardware supports placing things at a suitable address, that's the right approach. I'll try to review your earlier patch set and at least find a way to address the alignment issues. I'm a bit booked until LinuxCon but will try to get something out soon. Regards, Eric
Hello Russell, On Fri, Sep 23, 2016 at 07:37:25PM +0100, Russell King - ARM Linux wrote: > On Fri, Sep 23, 2016 at 11:26:18AM -0700, Eric Nelson wrote: > > So the question is: should we just live with this and acknowledge a > > performance penalty of bad alignment or do something about it? > > Well, I've no interest in trying to do anything with the FEC driver > anymore, as I'll just generate another big patch stack which won't > make it into the kernel in a timely fashion - my last attempt at > improving the FEC driver was dogged with conflicting changes and I > gave up with it in the end. I ended up spending a full cycle > rebasing, re-testing, and re-evaluating their performance only to find > that I'd missed the merge window again, and other conflicting changes > got merged which meant that I had to start from the beginning again. I'm not included in the set of people who are responsible to review and merge fec patches, but I'd be surprised if you couldn't get an exclusive lock for that driver. Something like: After 4.X-rc1 the fec isn't touched any more until you got your series ready for the 4.X+1 merge window. Of course some fixes might have to go in, but these hopefully won't disturb much. Best regards Uwe
From: Eric Nelson <eric@nelint.com> Date: Fri, 23 Sep 2016 10:33:29 -0700 > Since the hardware requires longword alignment for its' DMA transfers, > aligning the IP header will require a memcpy, right? I wish hardware designers didn't do this. There is no conflict between DMA alignment and properly offseting the packet data by two bytes. All hardware designers have to do is allow 2 padding bytes to be emitted by the chip before the actual packet data. Then the longword or whatever DMA transfer alignment is met whilst still giving the necessary flexibility for where the packet data lands.
From: Eric Nelson <eric@nelint.com> Date: Fri, 23 Sep 2016 11:35:17 -0700 > From the i.MX6DQ reference manual, bit 7 of ENET_RACC says this: > > "RX FIFO Shift-16 > > When this field is set, the actual frame data starts at bit 16 of the first > word read from the RX FIFO aligning the Ethernet payload on a > 32-bit boundary." > > Same for the i.MX6UL. > > I'm not sure what it will take to use this, but it seems to be exactly > what we're looking for. +1
From: David Miller <davem@davemloft.net> Sent: Saturday, September 24, 2016 10:46 AM > To: eric@nelint.com > Cc: andrew@lunn.ch; edumazet@google.com; Andy Duan > <fugang.duan@nxp.com>; otavio@ossystems.com.br; > netdev@vger.kernel.org; troy.kisky@boundarydevices.com; > rmk+kernel@arm.linux.org.uk; cjb.sw.nospam@gmail.com; linux-arm- > kernel@lists.infradead.org > Subject: Re: Alignment issues with freescale FEC driver > > From: Eric Nelson <eric@nelint.com> > Date: Fri, 23 Sep 2016 11:35:17 -0700 > > > From the i.MX6DQ reference manual, bit 7 of ENET_RACC says this: > > > > "RX FIFO Shift-16 > > > > When this field is set, the actual frame data starts at bit 16 of the > > first word read from the RX FIFO aligning the Ethernet payload on a > > 32-bit boundary." > > > > Same for the i.MX6UL. > > > > I'm not sure what it will take to use this, but it seems to be exactly > > what we're looking for. > > +1 RACC[SHIFT16] just instructs the MAC to write two additional bytes in front of each frame received into the RX FIFO to align the Ethernet payload on a 32-bit boundary. Eric's patch "net: fec: support RRACC_SHIFT16 to align IP header" works fine. For the alignment issues, that is introduced by commit 1b7bde6d6 and c259c132a in net-next tree. Before these commits, no alignment issue. How to fix the issue: Solution1: to enable HW RRACC_SHIFT16 feature (test pass): Eric's patch "net: fec: support RRACC_SHIFT16 to align IP header". Solution2: include the correct prefetch() header (test pass): --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -59,7 +59,7 @@ #include <linux/pinctrl/consumer.h> #include <linux/pm_runtime.h> #include <linux/busfreq-imx.h> -#include <linux/prefetch.h> +#include <asm/processor.h> Solution3: use __netdev_alloc_skb_ip_align() instead of netdev_alloc_skb(). Or: still use the previous method before commit 1b7bde6d6: skb = netdev_alloc_skb(ndev, pkt_len - 4 + NET_IP_ALIGN); skb_reserve(skb, NET_IP_ALIGN); Comparing these solutions: From sw effort and performance, I think these are the similar. Enable RRACC_SHIFT16 doesn't take extra advantage. Correct my if I am wrong. Thanks. Regards, Andy
Hi David, On 09/23/2016 07:43 PM, David Miller wrote: > From: Eric Nelson <eric@nelint.com> > Date: Fri, 23 Sep 2016 10:33:29 -0700 > >> Since the hardware requires longword alignment for its' DMA transfers, >> aligning the IP header will require a memcpy, right? > > I wish hardware designers didn't do this. > > There is no conflict between DMA alignment and properly offseting > the packet data by two bytes. > > All hardware designers have to do is allow 2 padding bytes to be > emitted by the chip before the actual packet data. > Andrew Lunn pointed out that the hardware does support this, and I just pushed a patch for the vendor kernel to the meta-freescale mailing list: https://lists.yoctoproject.org/pipermail/meta-freescale/2016-September/019228.html > Then the longword or whatever DMA transfer alignment is met > whilst still giving the necessary flexibility for where the > packet data lands. > Right. A relatively small change fixes things right up. Many thanks to Andrew for pointing this out and Russell for providing the basis for my patch. I'll re-work this for the up-stream kernel when I get out from under a couple of unrelated things.
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 0cc98b1..c17ef6e 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1301,6 +1301,7 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head, unsigned int hlen; unsigned int off; unsigned int id; + unsigned int frag; int flush = 1; int proto;
Hello all, We're seeing alignment issues from the ethernet stack on an i.MX6UL board: root@mx6ul:~# cat /proc/cpu/alignment User: 0 System: 470010 (inet_gro_receive+0x104/0x278) This seems to be related to the ip header alignment, and there was much discussion in mailing list threads [1] and [2]. In particular, Russell referred to a patch here, but I haven't been able to find it: https://lists.linaro.org/pipermail/linaro-toolchain/2012-October/002844.html Eric Dumazet also suggested a path toward fixing it, but I don't quite understand the suggestion: http://www.spinics.net/lists/netdev/msg213166.htm The immediate problem is addressed by just reading the id and frag_offs fields in the iphdr structure as shown in this patch: commit 98810abc911b1286a7e4a2ebdfbad66f12fae19d Author: Eric Nelson <eric@nelint.com> Date: Fri Sep 23 08:26:03 2016 -0700 net: ipv4: af_inet: don't read multiple 16-bit iphdr fields as a 32-bit value Change-Id: Idc7122c22c13ca078be31907d30ab1c3148ba807 Signed-off-by: Eric Nelson <eric@nelint.com> @@ -1326,9 +1327,9 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head, if (unlikely(ip_fast_csum((u8 *)iph, 5))) goto out_unlock; - id = ntohl(*(__be32 *)&iph->id); - flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF)); - id >>= 16; + id = ntohs(*(__be16 *)&iph->id); + frag = ntohs(*(__be16 *)&iph->frag_off); + flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (frag & ~IP_DF)); for (p = *head; p; p = p->next) { struct iphdr *iph2; The reading of both fields in one "ntohl" seems obfuscated at best and certainly worthy of a comment about the optimization but I understand from other notes that the fundamental problem is that the IP header should be aligned on a 4-byte boundary and that's not possible without a memcpy. I'd like to hear suggestions about how we can address this. Regards, Eric [1] - http://www.spinics.net/lists/netdev/msg213114.html [2] - https://lists.linaro.org/pipermail/linaro-toolchain/2012-October/002828.html