diff mbox

Alignment issues with freescale FEC driver

Message ID 02afb707-65de-5101-a79b-355929c4e00b@nelint.com (mailing list archive)
State New, archived
Headers show

Commit Message

Eric Nelson Sept. 23, 2016, 4:43 p.m. UTC
Hello all,

We're seeing alignment issues from the ethernet stack on an i.MX6UL board:

root@mx6ul:~# cat /proc/cpu/alignment
User: 0
System: 470010 (inet_gro_receive+0x104/0x278)

This seems to be related to the ip header alignment, and there
was much discussion in mailing list threads [1] and [2].

In particular, Russell referred to a patch here, but I haven't been
able to find it:
https://lists.linaro.org/pipermail/linaro-toolchain/2012-October/002844.html

Eric Dumazet also suggested a path toward fixing it, but I don't quite
understand the suggestion:
http://www.spinics.net/lists/netdev/msg213166.htm

The immediate problem is addressed by just reading the id and frag_offs
fields in the iphdr structure as shown in this patch:

commit 98810abc911b1286a7e4a2ebdfbad66f12fae19d
Author: Eric Nelson <eric@nelint.com>
Date: Fri Sep 23 08:26:03 2016 -0700

net: ipv4: af_inet: don't read multiple 16-bit iphdr fields as a 32-bit
value

Change-Id: Idc7122c22c13ca078be31907d30ab1c3148ba807
Signed-off-by: Eric Nelson <eric@nelint.com>


@@ -1326,9 +1327,9 @@ static struct sk_buff **inet_gro_receive(struct
sk_buff **head,
if (unlikely(ip_fast_csum((u8 *)iph, 5)))
goto out_unlock;

- id = ntohl(*(__be32 *)&iph->id);
- flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF));
- id >>= 16;
+ id = ntohs(*(__be16 *)&iph->id);
+ frag = ntohs(*(__be16 *)&iph->frag_off);
+ flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (frag &
~IP_DF));

for (p = *head; p; p = p->next) {
struct iphdr *iph2;


The reading of both fields in one "ntohl" seems obfuscated at best and
certainly worthy of a comment about the optimization but I understand
from other notes that the fundamental problem is that the IP header should
be aligned on a 4-byte boundary and that's not possible without a memcpy.

I'd like to hear suggestions about how we can address this.

Regards,


Eric

[1] - http://www.spinics.net/lists/netdev/msg213114.html
[2] -
https://lists.linaro.org/pipermail/linaro-toolchain/2012-October/002828.html

Comments

Eric Dumazet Sept. 23, 2016, 4:54 p.m. UTC | #1
On Fri, Sep 23, 2016 at 9:43 AM, Eric Nelson <eric@nelint.com> wrote:
>
> Hello all,
>
> We're seeing alignment issues from the ethernet stack on an i.MX6UL board:
>
> root@mx6ul:~# cat /proc/cpu/alignment
> User: 0
> System: 470010 (inet_gro_receive+0x104/0x278)
>
> This seems to be related to the ip header alignment, and there
> was much discussion in mailing list threads [1] and [2].
>
> In particular, Russell referred to a patch here, but I haven't been
> able to find it:
> https://lists.linaro.org/pipermail/linaro-toolchain/2012-October/002844.html
>
> Eric Dumazet also suggested a path toward fixing it, but I don't quite
> understand the suggestion:
> http://www.spinics.net/lists/netdev/msg213166.htm
>
> The immediate problem is addressed by just reading the id and frag_offs
> fields in the iphdr structure as shown in this patch:
>
> commit 98810abc911b1286a7e4a2ebdfbad66f12fae19d
> Author: Eric Nelson <eric@nelint.com>
> Date: Fri Sep 23 08:26:03 2016 -0700
>
> net: ipv4: af_inet: don't read multiple 16-bit iphdr fields as a 32-bit
> value
>
> Change-Id: Idc7122c22c13ca078be31907d30ab1c3148ba807
> Signed-off-by: Eric Nelson <eric@nelint.com>
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 0cc98b1..c17ef6e 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1301,6 +1301,7 @@ static struct sk_buff **inet_gro_receive(struct
> sk_buff **head,
> unsigned int hlen;
> unsigned int off;
> unsigned int id;
> + unsigned int frag;
> int flush = 1;
> int proto;
>
> @@ -1326,9 +1327,9 @@ static struct sk_buff **inet_gro_receive(struct
> sk_buff **head,
> if (unlikely(ip_fast_csum((u8 *)iph, 5)))
> goto out_unlock;
>
> - id = ntohl(*(__be32 *)&iph->id);
> - flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF));
> - id >>= 16;
> + id = ntohs(*(__be16 *)&iph->id);
> + frag = ntohs(*(__be16 *)&iph->frag_off);
> + flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (frag &
> ~IP_DF));
>
> for (p = *head; p; p = p->next) {
> struct iphdr *iph2;
>

This solves nothing, because a few lines after you'll have yet another
unaligned access :


((__force u32)iph->saddr ^ (__force u32)iph2->saddr) |
((__force u32)iph->daddr ^ (__force u32)iph2->daddr)) {

So you might have one less problematic access, out of hundreds of them
all over the places.

Really the problem is that whole stack depends on the assumption that
IP headers are aligned on arches that care
(ie where NET_IP_ALIGN == 2)

If your build does have NET_IP_ALIGN = 2 and you get a fault here, it
might be because of a buggy driver.

The other known case is some GRE encapsulations that break the
assumption, and this is discussed somewhere else.
Eric Nelson Sept. 23, 2016, 5:19 p.m. UTC | #2
Thanks Eric,

On 09/23/2016 09:54 AM, Eric Dumazet wrote:
> On Fri, Sep 23, 2016 at 9:43 AM, Eric Nelson <eric@nelint.com> wrote:
>>
>> Hello all,
>>
>> We're seeing alignment issues from the ethernet stack on an i.MX6UL board:
>>
>>

<snip>

>>
>> - id = ntohl(*(__be32 *)&iph->id);
>> - flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF));
>> - id >>= 16;
>> + id = ntohs(*(__be16 *)&iph->id);
>> + frag = ntohs(*(__be16 *)&iph->frag_off);
>> + flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (frag &
>> ~IP_DF));
>>
>> for (p = *head; p; p = p->next) {
>> struct iphdr *iph2;
>>
> 
> This solves nothing, because a few lines after you'll have yet another
> unaligned access :
> 

Oddly, it does prevent the vast majority (90%+) of the alignment errors.

I believe this is because the compiler is generating an ldm instruction
when the ntohl() call is used, but I'm stumped about why these aren't
generating faults:

> ((__force u32)iph->saddr ^ (__force u32)iph2->saddr) |
> ((__force u32)iph->daddr ^ (__force u32)iph2->daddr)) {
> 
> So you might have one less problematic access, out of hundreds of them
> all over the places.
> 
> Really the problem is that whole stack depends on the assumption that
> IP headers are aligned on arches that care
> (ie where NET_IP_ALIGN == 2)
> 
> If your build does have NET_IP_ALIGN = 2 and you get a fault here, it
> might be because of a buggy driver.
> 

NET_IP_ALIGN is set to 2.

> The other known case is some GRE encapsulations that break the
> assumption, and this is discussed somewhere else.
> 
I don't think that's the case.

# CONFIG_IPV6_GRE is not set

Hmm... Instrumenting the kernel, it seems that iphdr **is** aligned on
a 4-byte boundary.

Does the ldm instruction require 8-byte alignment?

There's definitely a compiler-version dependency involved here,
since using gcc 4.9 also reduced the number of faults dramatically.
Eric Nelson Sept. 23, 2016, 5:33 p.m. UTC | #3
Hi Eric,

On 09/23/2016 10:19 AM, Eric Nelson wrote:
> Thanks Eric,
> 
> On 09/23/2016 09:54 AM, Eric Dumazet wrote:
>> On Fri, Sep 23, 2016 at 9:43 AM, Eric Nelson <eric@nelint.com> wrote:
>>>
>>> Hello all,
>>>
>>> We're seeing alignment issues from the ethernet stack on an i.MX6UL board:
>>>
>>>
> 
> <snip>
> 
>>>
>>> - id = ntohl(*(__be32 *)&iph->id);
>>> - flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF));
>>> - id >>= 16;
>>> + id = ntohs(*(__be16 *)&iph->id);
>>> + frag = ntohs(*(__be16 *)&iph->frag_off);
>>> + flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (frag &
>>> ~IP_DF));
>>>
>>> for (p = *head; p; p = p->next) {
>>> struct iphdr *iph2;
>>>
>>
>> This solves nothing, because a few lines after you'll have yet another
>> unaligned access :
>>
> 
> Oddly, it does prevent the vast majority (90%+) of the alignment errors.
> 
> I believe this is because the compiler is generating an ldm instruction
> when the ntohl() call is used, but I'm stumped about why these aren't
> generating faults:
> 
>> ((__force u32)iph->saddr ^ (__force u32)iph2->saddr) |
>> ((__force u32)iph->daddr ^ (__force u32)iph2->daddr)) {
>>
>> So you might have one less problematic access, out of hundreds of them
>> all over the places.
>>
>> Really the problem is that whole stack depends on the assumption that
>> IP headers are aligned on arches that care
>> (ie where NET_IP_ALIGN == 2)
>>
>> If your build does have NET_IP_ALIGN = 2 and you get a fault here, it
>> might be because of a buggy driver.
>>
> 
> NET_IP_ALIGN is set to 2.
> 
>> The other known case is some GRE encapsulations that break the
>> assumption, and this is discussed somewhere else.
>>
> I don't think that's the case.
> 
> # CONFIG_IPV6_GRE is not set
> 
> Hmm... Instrumenting the kernel, it seems that iphdr **is** aligned on
> a 4-byte boundary.
> 

No. That was wrong.

The iphdr is aligned at offsets of 14 from the ethernet frame, which itself
is longword aligned.

I mistakenly tested before the call to skb_gro_header_slow(), when
iph was NULL.

After putting a test in the right place, I'm seeing an address of
888a364e for the first un-aligned packet.

Since the hardware requires longword alignment for its' DMA transfers,
aligning the IP header will require a memcpy, right?

You hinted at a solution in this post:

http://www.spinics.net/lists/netdev/msg213166.html

Are you aware of another driver doing this that could be used as
a reference?

Please advise,


Eric
Russell King (Oracle) Sept. 23, 2016, 5:37 p.m. UTC | #4
On Fri, Sep 23, 2016 at 10:19:50AM -0700, Eric Nelson wrote:
> Oddly, it does prevent the vast majority (90%+) of the alignment errors.
> 
> I believe this is because the compiler is generating an ldm instruction
> when the ntohl() call is used, but I'm stumped about why these aren't
> generating faults:

ldm generates alignment faults when the address is not aligned to a
32-bit boundary.  ldr on ARMv6+ does not.

> I don't think that's the case.
> 
> # CONFIG_IPV6_GRE is not set
> 
> Hmm... Instrumenting the kernel, it seems that iphdr **is** aligned on
> a 4-byte boundary.
> 
> Does the ldm instruction require 8-byte alignment?
> 
> There's definitely a compiler-version dependency involved here,
> since using gcc 4.9 also reduced the number of faults dramatically.

Well, I don't think it's that gcc related:

User:           0
System:         312855 (ip6_route_input+0x6c/0x1e0)
Skipped:        0
Half:           0
Word:           0
DWord:          2
Multi:          312853

c06d8998 <ip6_route_input>:
c06d89ac:       e1a04000        mov     r4, r0
c06d89b0:       e1d489b4        ldrh    r8, [r4, #148]  ; 0x94
c06d89b8:       e594a0a0        ldr     sl, [r4, #160]  ; 0xa0
c06d89cc:       e08ac008        add     ip, sl, r8
c06d89d4:       e28c3018        add     r3, ip, #24
c06d89dc:       e28c7008        add     r7, ip, #8
c06d89e4:       e893000f        ldm     r3, {r0, r1, r2, r3}
c06d89ec:       e24be044        sub     lr, fp, #68     ; 0x44
c06d89f4:       e24b5054        sub     r5, fp, #84     ; 0x54
c06d89fc:       e885000f        stm     r5, {r0, r1, r2, r3}
c06d8a04:       e897000f        ldm     r7, {r0, r1, r2, r3}
c06d8a10:       e88e000f        stm     lr, {r0, r1, r2, r3}

This is from:

        struct flowi6 fl6 = {
                .flowi6_iif = l3mdev_fib_oif(skb->dev),
                .daddr = iph->daddr,
                .saddr = iph->saddr,
                .flowlabel = ip6_flowinfo(iph),
                .flowi6_mark = skb->mark,
                .flowi6_proto = iph->nexthdr,
        };

specifically, I suspect, the saddr and daddr initialisations.

There's not much to get away from this - the FEC on iMX requires a
16-byte alignment for DMA addresses, which violates the network
stack's requirement for the ethernet packet to be received with a
two byte offset.  So the IP header (and IPv6 headers) will always
be mis-aligned in memory, which leads to a huge number of alignment
faults.

There's not much getting away from this - the problem is not in the
networking stack, but the FEC hardware/network driver.  See:

        struct  fec_enet_private *fep = netdev_priv(ndev);
        int off;

        off = ((unsigned long)skb->data) & fep->rx_align;
        if (off)
                skb_reserve(skb, fep->rx_align + 1 - off);

        bdp->cbd_bufaddr = cpu_to_fec32(dma_map_single(&fep->pdev->dev, skb->data, FEC_ENET_RX_FRSIZE - fep->rx_align, DMA_FROM_DEVICE));

in fec_enet_new_rxbdp().
Andrew Lunn Sept. 23, 2016, 6:13 p.m. UTC | #5
> Since the hardware requires longword alignment for its' DMA transfers,
> aligning the IP header will require a memcpy, right?

The vf610 FEC has an SHIFT16 bit in register ENETx_TACC, which inserts
two padding bits on transmit. ENETx_RACC has the same.

What about your hardware?

     Andrew
Eric Nelson Sept. 23, 2016, 6:26 p.m. UTC | #6
Thanks Russell,

On 09/23/2016 10:37 AM, Russell King - ARM Linux wrote:
> On Fri, Sep 23, 2016 at 10:19:50AM -0700, Eric Nelson wrote:
>> Oddly, it does prevent the vast majority (90%+) of the alignment errors.
>>
>> I believe this is because the compiler is generating an ldm instruction
>> when the ntohl() call is used, but I'm stumped about why these aren't
>> generating faults:

After looking at it, I have to think that the code that reads iph->id
is just hit more frequently than the other code in this routine.

> 
> ldm generates alignment faults when the address is not aligned to a
> 32-bit boundary.  ldr on ARMv6+ does not.
> 
>> I don't think that's the case.
>>
>> # CONFIG_IPV6_GRE is not set
>>
>> Hmm... Instrumenting the kernel, it seems that iphdr **is** aligned on
>> a 4-byte boundary.
>>
>> Does the ldm instruction require 8-byte alignment?
>>
>> There's definitely a compiler-version dependency involved here,
>> since using gcc 4.9 also reduced the number of faults dramatically.
> 
> Well, I don't think it's that gcc related:
> 

I can only say that I noticed a dramatic drop in the number of faults, and
didn't see the inet_gro_receive reported in /proc/cpu/alignment with gcc 4.9
when trying to identify the issue.

> User:           0
> System:         312855 (ip6_route_input+0x6c/0x1e0)
> Skipped:        0
> Half:           0
> Word:           0
> DWord:          2
> Multi:          312853
> 
> c06d8998 <ip6_route_input>:
> c06d89ac:       e1a04000        mov     r4, r0
> c06d89b0:       e1d489b4        ldrh    r8, [r4, #148]  ; 0x94
> c06d89b8:       e594a0a0        ldr     sl, [r4, #160]  ; 0xa0
> c06d89cc:       e08ac008        add     ip, sl, r8
> c06d89d4:       e28c3018        add     r3, ip, #24
> c06d89dc:       e28c7008        add     r7, ip, #8
> c06d89e4:       e893000f        ldm     r3, {r0, r1, r2, r3}
> c06d89ec:       e24be044        sub     lr, fp, #68     ; 0x44
> c06d89f4:       e24b5054        sub     r5, fp, #84     ; 0x54
> c06d89fc:       e885000f        stm     r5, {r0, r1, r2, r3}
> c06d8a04:       e897000f        ldm     r7, {r0, r1, r2, r3}
> c06d8a10:       e88e000f        stm     lr, {r0, r1, r2, r3}
> 
> This is from:
> 
>         struct flowi6 fl6 = {
>                 .flowi6_iif = l3mdev_fib_oif(skb->dev),
>                 .daddr = iph->daddr,
>                 .saddr = iph->saddr,
>                 .flowlabel = ip6_flowinfo(iph),
>                 .flowi6_mark = skb->mark,
>                 .flowi6_proto = iph->nexthdr,
>         };
> 
> specifically, I suspect, the saddr and daddr initialisations.
> 
> There's not much to get away from this - the FEC on iMX requires a
> 16-byte alignment for DMA addresses, which violates the network
> stack's requirement for the ethernet packet to be received with a
> two byte offset.  So the IP header (and IPv6 headers) will always
> be mis-aligned in memory, which leads to a huge number of alignment
> faults.
> 
> There's not much getting away from this - the problem is not in the
> networking stack, but the FEC hardware/network driver.  See:
> 
>         struct  fec_enet_private *fep = netdev_priv(ndev);
>         int off;
> 
>         off = ((unsigned long)skb->data) & fep->rx_align;
>         if (off)
>                 skb_reserve(skb, fep->rx_align + 1 - off);
> 
>         bdp->cbd_bufaddr = cpu_to_fec32(dma_map_single(&fep->pdev->dev, skb->data, FEC_ENET_RX_FRSIZE - fep->rx_align, DMA_FROM_DEVICE));
> 
> in fec_enet_new_rxbdp().
> 

So the question is: should we just live with this and acknowledge a
performance penalty of bad alignment or do something about it?

I'm not sure the cost (or the details) of Eric's proposed fix of allocating
and copying the header to another skb.

The original report was of bad network performance, but I haven't
been able to see an impact doing some simple tests using wget
and SSH.
Russell King (Oracle) Sept. 23, 2016, 6:30 p.m. UTC | #7
On Fri, Sep 23, 2016 at 08:13:01PM +0200, Andrew Lunn wrote:
> > Since the hardware requires longword alignment for its' DMA transfers,
> > aligning the IP header will require a memcpy, right?
> 
> The vf610 FEC has an SHIFT16 bit in register ENETx_TACC, which inserts
> two padding bits on transmit. ENETx_RACC has the same.
> 
> What about your hardware?

The iMX6 FEC also has that ability - as part of my FEC patch stack from
ages ago, I implemented support for it.

  "net:fec: implement almost zero-copy receive path"

in my public fec-testing branch.

That patch stack is sadly now totally dead and I've no interest in
reviving it myself.  There was some interest from others in taking my
patch stack over, but that went quiet.
Eric Nelson Sept. 23, 2016, 6:35 p.m. UTC | #8
Thanks Andrew.

On 09/23/2016 11:13 AM, Andrew Lunn wrote:
>> Since the hardware requires longword alignment for its' DMA transfers,
>> aligning the IP header will require a memcpy, right?
> 
> The vf610 FEC has an SHIFT16 bit in register ENETx_TACC, which inserts
> two padding bits on transmit. ENETx_RACC has the same.
> 
> What about your hardware?
> 

You got me with the RTFM!

From the i.MX6DQ reference manual, bit 7 of ENET_RACC says this:

"RX FIFO Shift-16

When this field is set, the actual frame data starts at bit 16 of the first
word read from the RX FIFO aligning the Ethernet payload on a
32-bit boundary."

Same for the i.MX6UL.

I'm not sure what it will take to use this, but it seems to be exactly
what we're looking for.
Russell King (Oracle) Sept. 23, 2016, 6:37 p.m. UTC | #9
On Fri, Sep 23, 2016 at 11:26:18AM -0700, Eric Nelson wrote:
> So the question is: should we just live with this and acknowledge a
> performance penalty of bad alignment or do something about it?

Well, I've no interest in trying to do anything with the FEC driver
anymore, as I'll just generate another big patch stack which won't
make it into the kernel in a timely fashion - my last attempt at
improving the FEC driver was dogged with conflicting changes and I
gave up with it in the end.  I ended up spending a full cycle
rebasing, re-testing, and re-evaluating their performance only to find
that I'd missed the merge window again, and other conflicting changes
got merged which meant that I had to start from the beginning again.

> I'm not sure the cost (or the details) of Eric's proposed fix of allocating
> and copying the header to another skb.

I had a quick look at this, and although Eric's idea may be a good
idea, it doesn't contain enough details for me to be able to
implement it - eg, I've no idea how to attach the 128-byte skb to the
beginning of a previously allocated skb containing the rest of the
packet.  I've just looked through linux/skbuff.h and I can't see
anything that takes two sk_buff's that would do the job.

However, I don't think that's necessary in this case, because the
iMX6 FEC supports the 16-bit alignment of the packet, if only it was
enabled in hardware and the driver caters for it.
Eric Nelson Sept. 23, 2016, 6:39 p.m. UTC | #10
Thanks Russell,

On 09/23/2016 11:30 AM, Russell King - ARM Linux wrote:
> On Fri, Sep 23, 2016 at 08:13:01PM +0200, Andrew Lunn wrote:
>>> Since the hardware requires longword alignment for its' DMA transfers,
>>> aligning the IP header will require a memcpy, right?
>>
>> The vf610 FEC has an SHIFT16 bit in register ENETx_TACC, which inserts
>> two padding bits on transmit. ENETx_RACC has the same.
>>
>> What about your hardware?
> 
> The iMX6 FEC also has that ability - as part of my FEC patch stack from
> ages ago, I implemented support for it.
> 
>   "net:fec: implement almost zero-copy receive path"
> 
> in my public fec-testing branch.
> 
> That patch stack is sadly now totally dead and I've no interest in
> reviving it myself.  There was some interest from others in taking my
> patch stack over, but that went quiet.
> 

I'll take a look and hopefully revive at least part of the patch set.
Eric Nelson Sept. 23, 2016, 6:49 p.m. UTC | #11
Thanks Russell,

On 09/23/2016 11:37 AM, Russell King - ARM Linux wrote:
> On Fri, Sep 23, 2016 at 11:26:18AM -0700, Eric Nelson wrote:
>> So the question is: should we just live with this and acknowledge a
>> performance penalty of bad alignment or do something about it?
> 
> Well, I've no interest in trying to do anything with the FEC driver
> anymore, as I'll just generate another big patch stack which won't
> make it into the kernel in a timely fashion - my last attempt at
> improving the FEC driver was dogged with conflicting changes and I
> gave up with it in the end.  I ended up spending a full cycle
> rebasing, re-testing, and re-evaluating their performance only to find
> that I'd missed the merge window again, and other conflicting changes
> got merged which meant that I had to start from the beginning again.
> 

That's sad. I recall reading your notes on that patch series and it was
a model for how to structure and document a patch set.

I hadn't noticed that you abandoned it and it's frustrating that the
merge process prevented your efforts from being used.

I'm also disheartened to hear your frustration about getting things
pushed up-stream and the entire Linux community should take note.

>> I'm not sure the cost (or the details) of Eric's proposed fix of allocating
>> and copying the header to another skb.
> 
> I had a quick look at this, and although Eric's idea may be a good
> idea, it doesn't contain enough details for me to be able to
> implement it - eg, I've no idea how to attach the 128-byte skb to the
> beginning of a previously allocated skb containing the rest of the
> packet.  I've just looked through linux/skbuff.h and I can't see
> anything that takes two sk_buff's that would do the job.
> 
> However, I don't think that's necessary in this case, because the
> iMX6 FEC supports the 16-bit alignment of the packet, if only it was
> enabled in hardware and the driver caters for it.
> 

Right. If the hardware supports placing things at a suitable address,
that's the right approach.

I'll try to review your earlier patch set and at least find a way to address
the alignment issues.

I'm a bit booked until LinuxCon but will try to get something out soon.

Regards,


Eric
Uwe Kleine-König Sept. 23, 2016, 8:22 p.m. UTC | #12
Hello Russell,

On Fri, Sep 23, 2016 at 07:37:25PM +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 23, 2016 at 11:26:18AM -0700, Eric Nelson wrote:
> > So the question is: should we just live with this and acknowledge a
> > performance penalty of bad alignment or do something about it?
> 
> Well, I've no interest in trying to do anything with the FEC driver
> anymore, as I'll just generate another big patch stack which won't
> make it into the kernel in a timely fashion - my last attempt at
> improving the FEC driver was dogged with conflicting changes and I
> gave up with it in the end.  I ended up spending a full cycle
> rebasing, re-testing, and re-evaluating their performance only to find
> that I'd missed the merge window again, and other conflicting changes
> got merged which meant that I had to start from the beginning again.

I'm not included in the set of people who are responsible to review and
merge fec patches, but I'd be surprised if you couldn't get an exclusive
lock for that driver. Something like: After 4.X-rc1 the fec isn't
touched any more until you got your series ready for the 4.X+1 merge
window. Of course some fixes might have to go in, but these hopefully
won't disturb much.

Best regards
Uwe
David Miller Sept. 24, 2016, 2:43 a.m. UTC | #13
From: Eric Nelson <eric@nelint.com>
Date: Fri, 23 Sep 2016 10:33:29 -0700

> Since the hardware requires longword alignment for its' DMA transfers,
> aligning the IP header will require a memcpy, right?

I wish hardware designers didn't do this.

There is no conflict between DMA alignment and properly offseting
the packet data by two bytes.

All hardware designers have to do is allow 2 padding bytes to be
emitted by the chip before the actual packet data.

Then the longword or whatever DMA transfer alignment is met
whilst still giving the necessary flexibility for where the
packet data lands.
David Miller Sept. 24, 2016, 2:45 a.m. UTC | #14
From: Eric Nelson <eric@nelint.com>
Date: Fri, 23 Sep 2016 11:35:17 -0700

> From the i.MX6DQ reference manual, bit 7 of ENET_RACC says this:
> 
> "RX FIFO Shift-16
> 
> When this field is set, the actual frame data starts at bit 16 of the first
> word read from the RX FIFO aligning the Ethernet payload on a
> 32-bit boundary."
> 
> Same for the i.MX6UL.
> 
> I'm not sure what it will take to use this, but it seems to be exactly
> what we're looking for.

+1
Andy Duan Sept. 24, 2016, 5:13 a.m. UTC | #15
From: David Miller <davem@davemloft.net> Sent: Saturday, September 24, 2016 10:46 AM
> To: eric@nelint.com
> Cc: andrew@lunn.ch; edumazet@google.com; Andy Duan
> <fugang.duan@nxp.com>; otavio@ossystems.com.br;
> netdev@vger.kernel.org; troy.kisky@boundarydevices.com;
> rmk+kernel@arm.linux.org.uk; cjb.sw.nospam@gmail.com; linux-arm-
> kernel@lists.infradead.org
> Subject: Re: Alignment issues with freescale FEC driver
> 
> From: Eric Nelson <eric@nelint.com>
> Date: Fri, 23 Sep 2016 11:35:17 -0700
> 
> > From the i.MX6DQ reference manual, bit 7 of ENET_RACC says this:
> >
> > "RX FIFO Shift-16
> >
> > When this field is set, the actual frame data starts at bit 16 of the
> > first word read from the RX FIFO aligning the Ethernet payload on a
> > 32-bit boundary."
> >
> > Same for the i.MX6UL.
> >
> > I'm not sure what it will take to use this, but it seems to be exactly
> > what we're looking for.
> 
> +1

RACC[SHIFT16] just instructs the MAC to write two additional bytes in front of each frame received into the RX FIFO to align
the Ethernet payload on a 32-bit boundary.
Eric's patch "net: fec: support RRACC_SHIFT16 to align IP header" works fine.

For the alignment issues, that is introduced by commit 1b7bde6d6 and c259c132a in net-next tree. Before these commits, no alignment issue.

How to fix the issue:
Solution1:  to enable HW RRACC_SHIFT16 feature (test pass):
	Eric's patch  "net: fec: support RRACC_SHIFT16 to align IP header".
Solution2: include the correct prefetch() header (test pass):
	--- a/drivers/net/ethernet/freescale/fec_main.c
	+++ b/drivers/net/ethernet/freescale/fec_main.c
	@@ -59,7 +59,7 @@
	 #include <linux/pinctrl/consumer.h>
 	#include <linux/pm_runtime.h>
 	#include <linux/busfreq-imx.h>
	-#include <linux/prefetch.h>
	+#include <asm/processor.h>
Solution3: use __netdev_alloc_skb_ip_align() instead of netdev_alloc_skb(). 
	     Or: still use the previous method before commit 1b7bde6d6:
		skb = netdev_alloc_skb(ndev, pkt_len - 4 + NET_IP_ALIGN);
		skb_reserve(skb, NET_IP_ALIGN);

Comparing these solutions:
	From sw effort and performance, I think these are the similar.  Enable RRACC_SHIFT16 doesn't take extra advantage.

Correct my if I am wrong. Thanks.

Regards,
Andy
Eric Nelson Sept. 24, 2016, 12:27 p.m. UTC | #16
Hi David,

On 09/23/2016 07:43 PM, David Miller wrote:
> From: Eric Nelson <eric@nelint.com>
> Date: Fri, 23 Sep 2016 10:33:29 -0700
> 
>> Since the hardware requires longword alignment for its' DMA transfers,
>> aligning the IP header will require a memcpy, right?
> 
> I wish hardware designers didn't do this.
> 
> There is no conflict between DMA alignment and properly offseting
> the packet data by two bytes.
> 
> All hardware designers have to do is allow 2 padding bytes to be
> emitted by the chip before the actual packet data.
> 

Andrew Lunn pointed out that the hardware does support this,
and I just pushed a patch for the vendor kernel to the meta-freescale
mailing list:

https://lists.yoctoproject.org/pipermail/meta-freescale/2016-September/019228.html

> Then the longword or whatever DMA transfer alignment is met
> whilst still giving the necessary flexibility for where the
> packet data lands.
> 

Right. A relatively small change fixes things right up.

Many thanks to Andrew for pointing this out and Russell for providing
the basis for my patch.

I'll re-work this for the up-stream kernel when I get out from
under a couple of unrelated things.
diff mbox

Patch

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 0cc98b1..c17ef6e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1301,6 +1301,7 @@  static struct sk_buff **inet_gro_receive(struct
sk_buff **head,
unsigned int hlen;
unsigned int off;
unsigned int id;
+ unsigned int frag;
int flush = 1;
int proto;