alignment faults in 3.6

Message ID	1349959248.21172.8970.camel@edumazet-glaptop (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org> Subject: Re: alignment faults in 3.6 From: Eric Dumazet <eric.dumazet@gmail.com> To: Arnd Bergmann <arnd@arndb.de> In-Reply-To: <201210111228.25995.arnd@arndb.de> References: <20121005082439.GF4625@n2100.arm.linux.org.uk> <20121011103257.GO4625@n2100.arm.linux.org.uk> <1349952574.21172.8604.camel@edumazet-glaptop> <201210111228.25995.arnd@arndb.de> Date: Thu, 11 Oct 2012 14:40:48 +0200 Message-ID: <1349959248.21172.8970.camel@edumazet-glaptop> Mime-Version: 1.0 summary: Content analysis details: (-2.7 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low trust [209.85.214.49 listed in list.dnswl.org] 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (eric.dumazet[at]gmail.com) -0.0 SPF_PASS SPF: sender matches SPF record -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature Cc: =?ISO-8859-1?Q?M=E5ns_Rullg=E5rd?= <mans@mansr.com>, Russell King - ARM Linux <linux@arm.linux.org.uk>, netdev@vger.kernel.org, David Laight <David.Laight@aculab.com>, Jon Masters <jonathan@jonmasters.org>, linux-arm-kernel@lists.infradead.org Precedence: list Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: linux-arm-kernel-bounces@lists.infradead.org Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Eric Dumazet Oct. 11, 2012, 12:40 p.m. UTC

On Thu, 2012-10-11 at 12:28 +0000, Arnd Bergmann wrote:

> 
> Rob Herring as the original reporter has dropped off the Cc list, adding
> him back.
> 
> I assume that the calxeda xgmac driver is the culprit then. It uses
> netdev_alloc_skb() rather than netdev_alloc_skb_ip_align() in
> xgmac_rx_refill but it is not clear whether it does so intentionally
> or by accident.

Thanks Arnd

It seems an accident, since driver doesnt check skb->data alignment at
all (this can change with SLAB debug on/off)

It also incorrectly adds 64 bytes to bfsize, there is no need for this.

(or if its needed, a comment would be nice, because on prior kernels,
this makes skb->head allocations uses kmalloc-4096 instead of
kmalloc-2048 slab cache... With 3.7 its less an issue now we use order-3
pages to deliver fragments for rx skbs

So the following patch should fix the alignment, and makes driver uses
half memory than before for stable kernels

Rob Herring Oct. 11, 2012, 1:20 p.m. UTC | #1

On 10/11/2012 07:40 AM, Eric Dumazet wrote:
> On Thu, 2012-10-11 at 12:28 +0000, Arnd Bergmann wrote:
> 
>>
>> Rob Herring as the original reporter has dropped off the Cc list, adding
>> him back.
>>
>> I assume that the calxeda xgmac driver is the culprit then. It uses
>> netdev_alloc_skb() rather than netdev_alloc_skb_ip_align() in
>> xgmac_rx_refill but it is not clear whether it does so intentionally
>> or by accident.

This in fact does work and eliminates the unaligned traps. However, not
all h/w can do IP aligned DMA (i.MX FEC for example), so I still think
this is a questionable optimization by the compiler. We're saving 1 load
instruction here for data that is likely already in the cache. It may be
legal per the ABI, but the downside of this optimization is much greater
than the upside.

> 
> Thanks Arnd
> 
> It seems an accident, since driver doesnt check skb->data alignment at
> all (this can change with SLAB debug on/off)
> 
> It also incorrectly adds 64 bytes to bfsize, there is no need for this.

I'm pretty sure this was needed as the h/w writes out full bursts of
data, but I'll go back and check.

Rob

> (or if its needed, a comment would be nice, because on prior kernels,
> this makes skb->head allocations uses kmalloc-4096 instead of
> kmalloc-2048 slab cache... With 3.7 its less an issue now we use order-3
> pages to deliver fragments for rx skbs
>
> So the following patch should fix the alignment, and makes driver uses
> half memory than before for stable kernels
> 
> diff --git a/drivers/net/ethernet/calxeda/xgmac.c b/drivers/net/ethernet/calxeda/xgmac.c
> index 16814b3..a895e18 100644
> --- a/drivers/net/ethernet/calxeda/xgmac.c
> +++ b/drivers/net/ethernet/calxeda/xgmac.c
> @@ -671,7 +671,8 @@ static void xgmac_rx_refill(struct xgmac_priv *priv)
>  		p = priv->dma_rx + entry;
>  
>  		if (priv->rx_skbuff[entry] == NULL) {
> -			skb = netdev_alloc_skb(priv->dev, priv->dma_buf_sz);
> +			skb = netdev_alloc_skb_ip_align(priv->dev,
> +							priv->dma_buf_sz);
>  			if (unlikely(skb == NULL))
>  				break;
>  
> @@ -703,7 +704,7 @@ static int xgmac_dma_desc_rings_init(struct net_device *dev)
>  	/* Set the Buffer size according to the MTU;
>  	 * indeed, in case of jumbo we need to bump-up the buffer sizes.
>  	 */
> -	bfsize = ALIGN(dev->mtu + ETH_HLEN + ETH_FCS_LEN + NET_IP_ALIGN + 64,
> +	bfsize = ALIGN(dev->mtu + ETH_HLEN + ETH_FCS_LEN,
>  		       64);
>  
>  	netdev_dbg(priv->dev, "mtu [%d] bfsize [%d]\n", dev->mtu, bfsize);
> 
>

Måns Rullgård Oct. 11, 2012, 1:32 p.m. UTC | #2

Rob Herring <robherring2@gmail.com> writes:

> On 10/11/2012 07:40 AM, Eric Dumazet wrote:
>> On Thu, 2012-10-11 at 12:28 +0000, Arnd Bergmann wrote:
>> 
>>>
>>> Rob Herring as the original reporter has dropped off the Cc list, adding
>>> him back.
>>>
>>> I assume that the calxeda xgmac driver is the culprit then. It uses
>>> netdev_alloc_skb() rather than netdev_alloc_skb_ip_align() in
>>> xgmac_rx_refill but it is not clear whether it does so intentionally
>>> or by accident.
>
> This in fact does work and eliminates the unaligned traps. However, not
> all h/w can do IP aligned DMA (i.MX FEC for example), so I still think
> this is a questionable optimization by the compiler. We're saving 1 load
> instruction here for data that is likely already in the cache. It may be
> legal per the ABI, but the downside of this optimization is much greater
> than the upside.

The compiler is working *exactly* as it should.  Merging the loads saves
cycles *and* code size.  Many of these added up can make a real difference.

When writing code, you must follow all the rules, whether you like them
or not.  Without rules, the compiler would be very limited in the
optimisations it could perform.

Unfortunately, new optimisations occasionally uncover broken code
violating some constraint or other.  When this happens, the correct
course of action is to fix the code, not cripple the compiler.

Arnd Bergmann Oct. 11, 2012, 1:35 p.m. UTC | #3

On Thursday 11 October 2012, Rob Herring wrote:
> 
> On 10/11/2012 07:40 AM, Eric Dumazet wrote:
> > On Thu, 2012-10-11 at 12:28 +0000, Arnd Bergmann wrote:
> > 
> >>
> >> Rob Herring as the original reporter has dropped off the Cc list, adding
> >> him back.
> >>
> >> I assume that the calxeda xgmac driver is the culprit then. It uses
> >> netdev_alloc_skb() rather than netdev_alloc_skb_ip_align() in
> >> xgmac_rx_refill but it is not clear whether it does so intentionally
> >> or by accident.
> 
> This in fact does work and eliminates the unaligned traps. However, not
> all h/w can do IP aligned DMA (i.MX FEC for example), so I still think
> this is a questionable optimization by the compiler. We're saving 1 load
> instruction here for data that is likely already in the cache. It may be
> legal per the ABI, but the downside of this optimization is much greater
> than the upside.
> 

What about the other approach that Eric suggested for such hardware
in http://www.spinics.net/lists/arm-kernel/msg200206.html ?

	Arnd

Eric Dumazet Oct. 11, 2012, 1:47 p.m. UTC | #4

On Thu, 2012-10-11 at 08:20 -0500, Rob Herring wrote:
> On 10/11/2012 07:40 AM, Eric Dumazet wrote:
> > On Thu, 2012-10-11 at 12:28 +0000, Arnd Bergmann wrote:
> > 
> >>
> >> Rob Herring as the original reporter has dropped off the Cc list, adding
> >> him back.
> >>
> >> I assume that the calxeda xgmac driver is the culprit then. It uses
> >> netdev_alloc_skb() rather than netdev_alloc_skb_ip_align() in
> >> xgmac_rx_refill but it is not clear whether it does so intentionally
> >> or by accident.
> 
> This in fact does work and eliminates the unaligned traps. However, not
> all h/w can do IP aligned DMA (i.MX FEC for example), so I still think
> this is a questionable optimization by the compiler. We're saving 1 load
> instruction here for data that is likely already in the cache. It may be
> legal per the ABI, but the downside of this optimization is much greater
> than the upside.

Compiler is asked to perform a 32bit load, it does it.

There is no questionable optimization here. Really.
Please stop pretending this, this makes no sense.

As I said, if some h/w cannot do IP aligned DMA, driver can use a
workaround, or a plain memmove() (some drivers seems to do this to work
around this h/w limitation, just grep for memmove() in drivers/net)

> 
> > 
> > Thanks Arnd
> > 
> > It seems an accident, since driver doesnt check skb->data alignment at
> > all (this can change with SLAB debug on/off)
> > 
> > It also incorrectly adds 64 bytes to bfsize, there is no need for this.
> 
> I'm pretty sure this was needed as the h/w writes out full bursts of
> data, but I'll go back and check.

Maybe the ALIGN() was needed then. But the 64 + NE_IP_ALIGN sounds like
the head room that we allocate/reserve in netdev_alloc_skb_ip_align()

So you allocate this extra room twice.

Thanks

Rob Herring Oct. 11, 2012, 3:23 p.m. UTC | #5

On 10/11/2012 08:47 AM, Eric Dumazet wrote:
> On Thu, 2012-10-11 at 08:20 -0500, Rob Herring wrote:
>> On 10/11/2012 07:40 AM, Eric Dumazet wrote:
>>> On Thu, 2012-10-11 at 12:28 +0000, Arnd Bergmann wrote:
>>>
>>>>
>>>> Rob Herring as the original reporter has dropped off the Cc list, adding
>>>> him back.
>>>>
>>>> I assume that the calxeda xgmac driver is the culprit then. It uses
>>>> netdev_alloc_skb() rather than netdev_alloc_skb_ip_align() in
>>>> xgmac_rx_refill but it is not clear whether it does so intentionally
>>>> or by accident.
>>
>> This in fact does work and eliminates the unaligned traps. However, not
>> all h/w can do IP aligned DMA (i.MX FEC for example), so I still think
>> this is a questionable optimization by the compiler. We're saving 1 load
>> instruction here for data that is likely already in the cache. It may be
>> legal per the ABI, but the downside of this optimization is much greater
>> than the upside.
> 
> Compiler is asked to perform a 32bit load, it does it.

Not exactly. It is asked to to perform 2 32-bit loads which are combined
into a single ldm (load multiple) which cannot handle unaligned
accesses. Here's a simple example that does the same thing:

void test(char * buf)
{
	printf("%d, %d\n", *((unsigned int *)&buf[0]), *((unsigned int *)&buf[4]));
}

So I guess the only ABI legal unaligned access is in a packed struct.

> There is no questionable optimization here. Really.
> Please stop pretending this, this makes no sense.

I'm not the one calling the networking stack bad code.

I can fix my h/w, so I'll stop caring about this. Others can all get
bitten by this new behavior in gcc 4.7.

Rob

> As I said, if some h/w cannot do IP aligned DMA, driver can use a
> workaround, or a plain memmove() (some drivers seems to do this to work
> around this h/w limitation, just grep for memmove() in drivers/net)
> 
>>
>>>
>>> Thanks Arnd
>>>
>>> It seems an accident, since driver doesnt check skb->data alignment at
>>> all (this can change with SLAB debug on/off)
>>>
>>> It also incorrectly adds 64 bytes to bfsize, there is no need for this.
>>
>> I'm pretty sure this was needed as the h/w writes out full bursts of
>> data, but I'll go back and check.
> 
> Maybe the ALIGN() was needed then. But the 64 + NE_IP_ALIGN sounds like
> the head room that we allocate/reserve in netdev_alloc_skb_ip_align()
> 
> So you allocate this extra room twice.
> 
> Thanks
> 
>

David Laight Oct. 11, 2012, 3:39 p.m. UTC | #6

> Not exactly. It is asked to to perform 2 32-bit loads which are combined
> into a single ldm (load multiple) which cannot handle unaligned
> accesses. Here's a simple example that does the same thing:
> 
> void test(char * buf)
> {
> 	printf("%d, %d\n", *((unsigned int *)&buf[0]), *((unsigned int *)&buf[4]));
> }

Have you actually looked at what an ARM processor traditionally did
with misaligned memory reads?
While useful, it probably wasn't what was intended.

Actually, and IIRC, some very recent ARM cpus will do the 'expected'
thing for single-word loads from misaligned addesses.
However they almost certainly won't for ldm/stm.

The 'ldm' optimisation for adjacent memory loads is also dubious.
On at least some ARMs it is very slow (might only be strongarms).

> So I guess the only ABI legal unaligned access is in a packed struct.

Correct. And you mustn't try casting the address, the compiler is
allowed to remember where it came from.
(This causes a lot of grief...)

If you are targeting the ARM cpu that can do misaligned transfers,
then gcc should generate single instructions for misaligned structure
members, and never do the 'ldm' optimisations.

But, the IP header is expected to be aligned.

	David

Eric Dumazet Oct. 11, 2012, 4:15 p.m. UTC | #7

On Thu, 2012-10-11 at 10:23 -0500, Rob Herring wrote:
> On 10/11/2012 08:47 AM, Eric Dumazet wrote:

> > Compiler is asked to perform a 32bit load, it does it.
> 
> Not exactly. It is asked to to perform 2 32-bit loads which are combined
> into a single ldm (load multiple) which cannot handle unaligned
> accesses. Here's a simple example that does the same thing:

Thats simply not true. You are severely mistaken.

ldm does a load of seeral 32bit words.

And the compiler would not use it if the alignment was not matching the
prereq (alignment >= 4)

> 
> void test(char * buf)
> {
> 	printf("%d, %d\n", *((unsigned int *)&buf[0]), *((unsigned int *)&buf[4]));
> }

But you completely miss the fact that network doesnt pass a "char *buf"
but a "be32 *buf". Your example is not relevant at all.

So the compiler is absolutely right, and network stack is _right_ too.

The prereq is that IP header are aligned to 4 bytes boundary.

Denying this fact is not going to help you

> 
> So I guess the only ABI legal unaligned access is in a packed struct.
> 
> > There is no questionable optimization here. Really.
> > Please stop pretending this, this makes no sense.
> 
> I'm not the one calling the networking stack bad code.

Once you understand the issues, you can explain us where is the bad
code. But given you say "Bug is in compiler, and/or network stack, but
my driver is fine", its not very wise.

For the moment, the bug is in your driver.

> 
> I can fix my h/w, so I'll stop caring about this. Others can all get
> bitten by this new behavior in gcc 4.7.

Again compiler is fine. How should we say that so that you stop your
rants ?

Stop trying to find an excuse, dont try to fool us, this is really
embarrassing. Just fix the driver, there is no shame to fix an error.

Måns Rullgård Oct. 11, 2012, 4:18 p.m. UTC | #8

"David Laight" <David.Laight@ACULAB.COM> writes:

>> Not exactly. It is asked to to perform 2 32-bit loads which are combined
>> into a single ldm (load multiple) which cannot handle unaligned
>> accesses. Here's a simple example that does the same thing:
>> 
>> void test(char * buf)
>> {
>> 	printf("%d, %d\n", *((unsigned int *)&buf[0]), *((unsigned int *)&buf[4]));
>> }
>
> Have you actually looked at what an ARM processor traditionally did
> with misaligned memory reads?
> While useful, it probably wasn't what was intended.
>
> Actually, and IIRC, some very recent ARM cpus will do the 'expected'
> thing for single-word loads from misaligned addesses.

What various CPUs do with unaligned accesses is not the issue here.  The
casts in the code above act as a promise to the compiler that the
address is in fact properly align for the pointer type.

> However they almost certainly won't for ldm/stm.
>
> The 'ldm' optimisation for adjacent memory loads is also dubious.

There is nothing whatsoever dubious about the compiler using the most
efficient instruction sequence to accomplish what the code asks for.

> On at least some ARMs it is very slow (might only be strongarms).

The compiler will pick instructions suitable for the CPU you specify.

>> So I guess the only ABI legal unaligned access is in a packed struct.
>
> Correct. And you mustn't try casting the address, the compiler is
> allowed to remember where it came from.
> (This causes a lot of grief...)

It is only a problem when you try to outsmart the compiler.

> If you are targeting the ARM cpu that can do misaligned transfers,
> then gcc should generate single instructions for misaligned structure
> members, and never do the 'ldm' optimisations.

That is exactly how gcc works.

> But, the IP header is expected to be aligned.

Everything tells the compiler the struct is perfectly aligned.  When the
buggy driver passes a misaligned pointer, bad things happen.

Arnd Bergmann Oct. 12, 2012, 8:11 a.m. UTC | #9

On Thursday 11 October 2012, Måns Rullgård wrote:
> > But, the IP header is expected to be aligned.
> 
> Everything tells the compiler the struct is perfectly aligned.  When the
> buggy driver passes a misaligned pointer, bad things happen.

Would it be appropriate to add a WARN_ON_ONCE() in the alignment fault path
then? If all alignment faults in the kernel are caused by broken drivers,
that would at least give us some hope of finding those drivers while at the
same time not causing much overhead in the case where we need to do the
fixup in the meantime.

	Arnd

Russell King - ARM Linux Oct. 12, 2012, 9:03 a.m. UTC | #10

On Fri, Oct 12, 2012 at 08:11:42AM +0000, Arnd Bergmann wrote:
> On Thursday 11 October 2012, Måns Rullgård wrote:
> > > But, the IP header is expected to be aligned.
> > 
> > Everything tells the compiler the struct is perfectly aligned.  When the
> > buggy driver passes a misaligned pointer, bad things happen.
> 
> Would it be appropriate to add a WARN_ON_ONCE() in the alignment fault path
> then? If all alignment faults in the kernel are caused by broken drivers,
> that would at least give us some hope of finding those drivers while at the
> same time not causing much overhead in the case where we need to do the
> fixup in the meantime.

No.  It is my understanding that various IP option processing can also
cause the alignment fault handler to be invoked, even when the packet is
properly aligned, and then there's jffs2/mtd which also relies upon
alignment faults being fixed up.

Eric Dumazet Oct. 12, 2012, 10:04 a.m. UTC | #11

On Fri, 2012-10-12 at 10:03 +0100, Russell King - ARM Linux wrote:

> No.  It is my understanding that various IP option processing can also
> cause the alignment fault handler to be invoked, even when the packet is
> properly aligned, and then there's jffs2/mtd which also relies upon
> alignment faults being fixed up.

Oh well.

We normally make sure we dont have alignment faults on arches that dont
have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS (or a non null NET_IP_ALIGN)

So if you find an offender, please report a bug, because I can guarantee
you we will _fix_ it.

One example of a fix was the following subtle one.

commit 117632e64d2a5f464e491fe221d7169a3814a77b
tcp: take care of misalignments

We discovered that TCP stack could retransmit misaligned skbs if a
malicious peer acknowledged sub MSS frame. This currently can happen
only if output interface is non SG enabled : If SG is enabled, tcp
builds headless skbs (all payload is included in fragments), so the tcp
trimming process only removes parts of skb fragments, header stay
aligned.

Some arches cant handle misalignments, so force a head reallocation and
shrink headroom to MAX_TCP_HEADER.

Dont care about misaligments on x86 and PPC (or other arches setting
NET_IP_ALIGN to 0)

This patch introduces __pskb_copy() which can specify the headroom of
new head, and pskb_copy() becomes a wrapper on top of __pskb_copy()

Måns Rullgård Oct. 12, 2012, 11 a.m. UTC | #12

Russell King - ARM Linux <linux@arm.linux.org.uk> writes:

> On Fri, Oct 12, 2012 at 08:11:42AM +0000, Arnd Bergmann wrote:
>> On Thursday 11 October 2012, Måns Rullgård wrote:
>> > > But, the IP header is expected to be aligned.
>> > 
>> > Everything tells the compiler the struct is perfectly aligned.  When the
>> > buggy driver passes a misaligned pointer, bad things happen.
>> 
>> Would it be appropriate to add a WARN_ON_ONCE() in the alignment fault path
>> then?

I think that's an excellent idea.

>> If all alignment faults in the kernel are caused by broken drivers,
>> that would at least give us some hope of finding those drivers while
>> at the same time not causing much overhead in the case where we need
>> to do the fixup in the meantime.
>
> No.  It is my understanding that various IP option processing can also
> cause the alignment fault handler to be invoked, even when the packet is
> properly aligned, and then there's jffs2/mtd which also relies upon
> alignment faults being fixed up.

As far as I'm concerned, this is all hearsay, and I've only ever heard
it from you.  Why can't you let those who care fix these bugs instead?

Russell King - ARM Linux Oct. 12, 2012, 11:07 a.m. UTC | #13

On Fri, Oct 12, 2012 at 12:00:03PM +0100, Måns Rullgård wrote:
> Russell King - ARM Linux <linux@arm.linux.org.uk> writes:
> 
> > On Fri, Oct 12, 2012 at 08:11:42AM +0000, Arnd Bergmann wrote:
> >> On Thursday 11 October 2012, Måns Rullgård wrote:
> >> > > But, the IP header is expected to be aligned.
> >> > 
> >> > Everything tells the compiler the struct is perfectly aligned.  When the
> >> > buggy driver passes a misaligned pointer, bad things happen.
> >> 
> >> Would it be appropriate to add a WARN_ON_ONCE() in the alignment fault path
> >> then?
> 
> I think that's an excellent idea.

Well, I get the last word here and it's no.

> >> If all alignment faults in the kernel are caused by broken drivers,
> >> that would at least give us some hope of finding those drivers while
> >> at the same time not causing much overhead in the case where we need
> >> to do the fixup in the meantime.
> >
> > No.  It is my understanding that various IP option processing can also
> > cause the alignment fault handler to be invoked, even when the packet is
> > properly aligned, and then there's jffs2/mtd which also relies upon
> > alignment faults being fixed up.
> 
> As far as I'm concerned, this is all hearsay, and I've only ever heard
> it from you.  Why can't you let those who care fix these bugs instead?

You know, I'm giving you the benefit of my _knowledge_ which has been
built over the course of the last 20 years.  I've been in these
discussions with networking people before.  I ended up having to develop
the alignment fault handler because of those discussions.  And oh look,
Eric confirmed that the networking code isn't going to get "fixed" as
you were demanding, which is exactly what I said.

I've been in discussions with MTD people over these issues before, I've
discussed this with David Woodhouse when it came up in JFFS2.  I *KNOW*
these things.

You can call it hearsay if you wish, but it seems to be more accurate
than your wild outlandish and pathetic statements.

Måns Rullgård Oct. 12, 2012, 11:18 a.m. UTC | #14

Russell King - ARM Linux <linux@arm.linux.org.uk> writes:

> On Fri, Oct 12, 2012 at 12:00:03PM +0100, Måns Rullgård wrote:
>> Russell King - ARM Linux <linux@arm.linux.org.uk> writes:
>> 
>> > On Fri, Oct 12, 2012 at 08:11:42AM +0000, Arnd Bergmann wrote:
>> >> On Thursday 11 October 2012, Måns Rullgård wrote:
>> >> > > But, the IP header is expected to be aligned.
>> >> > 
>> >> > Everything tells the compiler the struct is perfectly aligned.  When the
>> >> > buggy driver passes a misaligned pointer, bad things happen.
>> >> 
>> >> Would it be appropriate to add a WARN_ON_ONCE() in the alignment
>> >> fault path then?
>> 
>> I think that's an excellent idea.
>
> Well, I get the last word here and it's no.

Sadly, yes.

>> >> If all alignment faults in the kernel are caused by broken drivers,
>> >> that would at least give us some hope of finding those drivers while
>> >> at the same time not causing much overhead in the case where we need
>> >> to do the fixup in the meantime.
>> >
>> > No.  It is my understanding that various IP option processing can also
>> > cause the alignment fault handler to be invoked, even when the packet is
>> > properly aligned, and then there's jffs2/mtd which also relies upon
>> > alignment faults being fixed up.
>> 
>> As far as I'm concerned, this is all hearsay, and I've only ever heard
>> it from you.  Why can't you let those who care fix these bugs instead?
>
> You know, I'm giving you the benefit of my _knowledge_ which has been
> built over the course of the last 20 years.

How proud you sound.  Now could you say something of substance instead?

> I've been in these discussions with networking people before.  I ended
> up having to develop the alignment fault handler because of those
> discussions.  And oh look, Eric confirmed that the networking code
> isn't going to get "fixed" as you were demanding, which is exactly
> what I said.

Funny, I saw him say the exact opposite:

  So if you find an offender, please report a bug, because I can
  guarantee you we will _fix_ it.

> I've been in discussions with MTD people over these issues before, I've
> discussed this with David Woodhouse when it came up in JFFS2.  I *KNOW*
> these things.

In the same way you "know" the networking people won't fix their code,
despite them _clearly_ stating the opposite?

> You can call it hearsay if you wish, but it seems to be more accurate
> than your wild outlandish and pathetic statements.

So you're resorting to name-calling.  Not taking that bait.

Russell King - ARM Linux Oct. 12, 2012, 11:19 a.m. UTC | #15

On Fri, Oct 12, 2012 at 12:07:50PM +0100, Russell King - ARM Linux wrote:
> On Fri, Oct 12, 2012 at 12:00:03PM +0100, Måns Rullgård wrote:
> > Russell King - ARM Linux <linux@arm.linux.org.uk> writes:
> > 
> > > On Fri, Oct 12, 2012 at 08:11:42AM +0000, Arnd Bergmann wrote:
> > >> On Thursday 11 October 2012, Måns Rullgård wrote:
> > >> > > But, the IP header is expected to be aligned.
> > >> > 
> > >> > Everything tells the compiler the struct is perfectly aligned.  When the
> > >> > buggy driver passes a misaligned pointer, bad things happen.
> > >> 
> > >> Would it be appropriate to add a WARN_ON_ONCE() in the alignment fault path
> > >> then?
> > 
> > I think that's an excellent idea.
> 
> Well, I get the last word here and it's no.
> 
> > >> If all alignment faults in the kernel are caused by broken drivers,
> > >> that would at least give us some hope of finding those drivers while
> > >> at the same time not causing much overhead in the case where we need
> > >> to do the fixup in the meantime.
> > >
> > > No.  It is my understanding that various IP option processing can also
> > > cause the alignment fault handler to be invoked, even when the packet is
> > > properly aligned, and then there's jffs2/mtd which also relies upon
> > > alignment faults being fixed up.
> > 
> > As far as I'm concerned, this is all hearsay, and I've only ever heard
> > it from you.  Why can't you let those who care fix these bugs instead?
> 
> You know, I'm giving you the benefit of my _knowledge_ which has been
> built over the course of the last 20 years.  I've been in these
> discussions with networking people before.  I ended up having to develop
> the alignment fault handler because of those discussions.

Correction: San Mehat at Corel Inc had to develop it, but I was involved
in those discussions over it nevertheless, as I was the person who merged
the code and then subsequently maintained it.

Russell King - ARM Linux Oct. 12, 2012, 11:44 a.m. UTC | #16

On Fri, Oct 12, 2012 at 12:18:08PM +0100, Måns Rullgård wrote:
> Russell King - ARM Linux <linux@arm.linux.org.uk> writes:
> > Well, I get the last word here and it's no.
> 
> Sadly, yes.

It's not "sadly" - it's a matter of fact that the kernel does from time
to time generate misaligned accesses and they are _not_ bugs.  If they
were bugs, then the code to fix up misaligned accesses would not have
been developed, and we'd instead take the fault and panic or something
like that.

> >> >> If all alignment faults in the kernel are caused by broken drivers,
> >> >> that would at least give us some hope of finding those drivers while
> >> >> at the same time not causing much overhead in the case where we need
> >> >> to do the fixup in the meantime.
> >> >
> >> > No.  It is my understanding that various IP option processing can also
> >> > cause the alignment fault handler to be invoked, even when the packet is
> >> > properly aligned, and then there's jffs2/mtd which also relies upon
> >> > alignment faults being fixed up.
> >> 
> >> As far as I'm concerned, this is all hearsay, and I've only ever heard
> >> it from you.  Why can't you let those who care fix these bugs instead?
> >
> > You know, I'm giving you the benefit of my _knowledge_ which has been
> > built over the course of the last 20 years.
> 
> How proud you sound.  Now could you say something of substance instead?

You're proving yourself to be idiot?  There, that's substance.

> > I've been in these discussions with networking people before.  I ended
> > up having to develop the alignment fault handler because of those
> > discussions.  And oh look, Eric confirmed that the networking code
> > isn't going to get "fixed" as you were demanding, which is exactly
> > what I said.
> 
> Funny, I saw him say the exact opposite:
> 
>   So if you find an offender, please report a bug, because I can
>   guarantee you we will _fix_ it.

No, let's go back.

- You were demanding that the ipv4 header structure should be packed.
  I said that wasn't going to happen because the networking people
  wouldn't allow it, and it seems that's been proven correct.
- You were demanding that the ipv4 code used the unaligned accessors.
  I said that networking people wouldn't allow it either, and that's
  also been proven correct.

Both these points have been proven correct because Eric has said that the
core networking code is _not_ going to be changed to suit this.

What Eric _has_ said is that networking people consider packets supplied
to the networking layer where the IPv4 header is not aligned on architectures
where misaligned accesses are a problem to be a bug in the network driver,
not the network code, and proposed a solution.

That's entirely different from all your claims that the core networking
code needs fixing to avoid these misaligned accesses.

> > I've been in discussions with MTD people over these issues before, I've
> > discussed this with David Woodhouse when it came up in JFFS2.  I *KNOW*
> > these things.
> 
> In the same way you "know" the networking people won't fix their code,
> despite them _clearly_ stating the opposite?

I'll tell you exactly how I *KNOW* this.  The issue came up because of
noMMU, which does not have any way to fix up alignment faults.  JFFS2
passes randomly aligned buffers to the MTD drivers, and the MTD drivers
assume that they're aligned and they do word accesses on them.

See the thread http://lists.arm.linux.org.uk/lurker/thread/20021204.191632.4473796b.en.html

See: http://lists.arm.linux.org.uk/lurker/message/20020225.195925.02bdbd47.en.html
and: http://lists.arm.linux.org.uk/lurker/message/20020313.150932.081a7592.en.html

There's several other threads where it's also discussed.

And while you're there, note the date.  There is nothing recent about this
issue.  It's well known, and well understood by those who have a grasp of
the issues that alignment faults are a part of normal operation by the
ARM kernel - and is one of the penalties of being tied into architecture
independent code.

Compromises have to be sought, and that's the compromise we get to live
with.

> > You can call it hearsay if you wish, but it seems to be more accurate
> > than your wild outlandish and pathetic statements.
> 
> So you're resorting to name-calling.  Not taking that bait.

Sorry?  So what you're saying is that it's fine for you to call my
comments hearsay, but I'm not allowed to express a view on your comments.
How arrogant of you.

Eric Dumazet Oct. 12, 2012, 12:08 p.m. UTC | #17

On Fri, 2012-10-12 at 12:44 +0100, Russell King - ARM Linux wrote:

> - You were demanding that the ipv4 header structure should be packed.
>   I said that wasn't going to happen because the networking people
>   wouldn't allow it, and it seems that's been proven correct.
> - You were demanding that the ipv4 code used the unaligned accessors.
>   I said that networking people wouldn't allow it either, and that's
>   also been proven correct.
> 
> Both these points have been proven correct because Eric has said that the
> core networking code is _not_ going to be changed to suit this.
> 
> What Eric _has_ said is that networking people consider packets supplied
> to the networking layer where the IPv4 header is not aligned on architectures
> where misaligned accesses are a problem to be a bug in the network driver,
> not the network code, and proposed a solution.
> 
> That's entirely different from all your claims that the core networking
> code needs fixing to avoid these misaligned accesses.

It seems we agree then, but your words were a bit misleading (at least
for me)

So yes, we built network stack with the prereq that IP headers are
aligned, but unfortunately many developers use x86 which doesnt care, so
its possible some bugs are added.

But its not by intent, only by accident.

Thanks

Måns Rullgård Oct. 12, 2012, 12:16 p.m. UTC | #18

Russell King - ARM Linux <linux@arm.linux.org.uk> writes:

> On Fri, Oct 12, 2012 at 12:18:08PM +0100, Måns Rullgård wrote:
>> Russell King - ARM Linux <linux@arm.linux.org.uk> writes:
>> > Well, I get the last word here and it's no.
>> 
>> Sadly, yes.
>
> It's not "sadly" - it's a matter of fact that the kernel does from time
> to time generate misaligned accesses and they are _not_ bugs.  If they
> were bugs, then the code to fix up misaligned accesses would not have
> been developed, and we'd instead take the fault and panic or something
> like that.
>
>> >> >> If all alignment faults in the kernel are caused by broken drivers,
>> >> >> that would at least give us some hope of finding those drivers while
>> >> >> at the same time not causing much overhead in the case where we need
>> >> >> to do the fixup in the meantime.
>> >> >
>> >> > No.  It is my understanding that various IP option processing can also
>> >> > cause the alignment fault handler to be invoked, even when the packet is
>> >> > properly aligned, and then there's jffs2/mtd which also relies upon
>> >> > alignment faults being fixed up.
>> >> 
>> >> As far as I'm concerned, this is all hearsay, and I've only ever heard
>> >> it from you.  Why can't you let those who care fix these bugs instead?
>> >
>> > You know, I'm giving you the benefit of my _knowledge_ which has been
>> > built over the course of the last 20 years.
>> 
>> How proud you sound.  Now could you say something of substance instead?
>
> You're proving yourself to be idiot?  There, that's substance.
>
>> > I've been in these discussions with networking people before.  I ended
>> > up having to develop the alignment fault handler because of those
>> > discussions.  And oh look, Eric confirmed that the networking code
>> > isn't going to get "fixed" as you were demanding, which is exactly
>> > what I said.
>> 
>> Funny, I saw him say the exact opposite:
>> 
>>   So if you find an offender, please report a bug, because I can
>>   guarantee you we will _fix_ it.
>
> No, let's go back.
>
> - You were demanding that the ipv4 header structure should be packed.

I demanded no such thing.

>   I said that wasn't going to happen because the networking people
>   wouldn't allow it, and it seems that's been proven correct.
> - You were demanding that the ipv4 code used the unaligned accessors.

Again, I made no such demand.

>   I said that networking people wouldn't allow it either, and that's
>   also been proven correct.

I did not, in fact, _demand_ anything at all.  What I did say was that
to fix the problem of unaligned access traps the OP was having, the
(driver) code supplying the unaligned pointer should be fixed, or *if
that is not possible* mark the structs __packed.  As it turns out,
fixing the driver is easy, so there is no need to change the structs or
how they are accessed.

> Both these points have been proven correct because Eric has said that the
> core networking code is _not_ going to be changed to suit this.
>
> What Eric _has_ said is that networking people consider packets supplied
> to the networking layer where the IPv4 header is not aligned on architectures
> where misaligned accesses are a problem to be a bug in the network driver,
> not the network code, and proposed a solution.
>
> That's entirely different from all your claims that the core networking
> code needs fixing to avoid these misaligned accesses.

I never said that.  I said whatever code is responsible for the pointer
should be fixed.  That code turns out to be the driver.

>> > I've been in discussions with MTD people over these issues before, I've
>> > discussed this with David Woodhouse when it came up in JFFS2.  I *KNOW*
>> > these things.
>> 
>> In the same way you "know" the networking people won't fix their code,
>> despite them _clearly_ stating the opposite?
>
> I'll tell you exactly how I *KNOW* this.  The issue came up because of
> noMMU, which does not have any way to fix up alignment faults.  JFFS2
> passes randomly aligned buffers to the MTD drivers, and the MTD drivers
> assume that they're aligned and they do word accesses on them.

So jffs2 is broken.  Why can't it be fixed?

> See the thread http://lists.arm.linux.org.uk/lurker/thread/20021204.191632.4473796b.en.html

That thread is about detecting misaligned accesses and what to do with
them if they do occur.  I don't see anyone successfully arguing against
fixing the code causing them.

> See: http://lists.arm.linux.org.uk/lurker/message/20020225.195925.02bdbd47.en.html

Yes, there seems to be a problem in jffs2.  Or at least there was one 10
years ago.

> and: http://lists.arm.linux.org.uk/lurker/message/20020313.150932.081a7592.en.html

Yes, disabling the alignment trap on armv5 is a bad idea.  Nobody has
argued otherwise.

> There's several other threads where it's also discussed.
>
> And while you're there, note the date.  There is nothing recent about this
> issue.  It's well known, and well understood by those who have a grasp of
> the issues that alignment faults are a part of normal operation by the
> ARM kernel - and is one of the penalties of being tied into architecture
> independent code.
>
> Compromises have to be sought, and that's the compromise we get to live
> with.

So because of some ancient history with jffs2, we should deny
present-day developers the tools to quickly identify problems in their
code?  I just _love_ such compromises.

>> > You can call it hearsay if you wish, but it seems to be more accurate
>> > than your wild outlandish and pathetic statements.
>> 
>> So you're resorting to name-calling.  Not taking that bait.
>
> Sorry?  So what you're saying is that it's fine for you to call my
> comments hearsay, but I'm not allowed to express a view on your comments.
> How arrogant of you.

Go ahead, pile it on.  I'm not falling into this trap.

Russell King - ARM Linux Oct. 12, 2012, 12:24 p.m. UTC | #19

On Fri, Oct 12, 2012 at 12:04:23PM +0200, Eric Dumazet wrote:
> On Fri, 2012-10-12 at 10:03 +0100, Russell King - ARM Linux wrote:
> 
> > No.  It is my understanding that various IP option processing can also
> > cause the alignment fault handler to be invoked, even when the packet is
> > properly aligned, and then there's jffs2/mtd which also relies upon
> > alignment faults being fixed up.
> 
> Oh well.
> 
> We normally make sure we dont have alignment faults on arches that dont
> have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS (or a non null NET_IP_ALIGN)
> 
> So if you find an offender, please report a bug, because I can guarantee
> you we will _fix_ it.

I think one change I will make to the ARM alignment fixup is to get it
to record the last PC where a misaligned kernel fault occurred, and
report it via our statistics procfs file.  That should allow us to
track down where some of these occur.

They aren't anywhere near regular though - looking at the statistics, my
firewall seems to do an average of around 2-3 a day, and a web server
around 7-8 a day.

Benjamin LaHaise Oct. 12, 2012, 2:22 p.m. UTC | #20

On Fri, Oct 12, 2012 at 02:08:12PM +0200, Eric Dumazet wrote:
> So yes, we built network stack with the prereq that IP headers are
> aligned, but unfortunately many developers use x86 which doesnt care, so
> its possible some bugs are added.

x86 does have an alignment check flag that can be set in the flags register.  
Somehow, I doubt anyone would be willing to walk through all the noise the 
faults would likely trigger.

		-ben

David Laight Oct. 12, 2012, 2:36 p.m. UTC | #21

> x86 does have an alignment check flag that can be set in the flags register.
> Somehow, I doubt anyone would be willing to walk through all the noise the
> faults would likely trigger.

Someone has tried to set that (in userspace) on NetBSD.
The fault reporting has been fixed, but really nothing
works since optimised parts of libc deliberately do
misaligned transfers.

	David

Eric Dumazet Oct. 12, 2012, 2:48 p.m. UTC | #22

On Fri, 2012-10-12 at 10:22 -0400, Benjamin LaHaise wrote:
> On Fri, Oct 12, 2012 at 02:08:12PM +0200, Eric Dumazet wrote:
> > So yes, we built network stack with the prereq that IP headers are
> > aligned, but unfortunately many developers use x86 which doesnt care, so
> > its possible some bugs are added.
> 
> x86 does have an alignment check flag that can be set in the flags register.  
> Somehow, I doubt anyone would be willing to walk through all the noise the 
> faults would likely trigger.

If this can be mapped to an event that can be used by perf tool, that
might be useful ?

Benjamin LaHaise Oct. 12, 2012, 3 p.m. UTC | #23

On Fri, Oct 12, 2012 at 04:48:20PM +0200, Eric Dumazet wrote:
> > Somehow, I doubt anyone would be willing to walk through all the noise the 
> > faults would likely trigger.
> 
> If this can be mapped to an event that can be used by perf tool, that
> might be useful ?

There are performance counters for the various different types of alignment 
faults supported by perf. Modern x86 makes the vast majority of unaligned 
accesses very low overhead -- the only ones that really hurt are those 
straddling different vm pages, but even those have little cost compared to 
obsolete microarchitectures (*cough* P4 *cough*).

		-ben

Ben Hutchings Oct. 12, 2012, 3:04 p.m. UTC | #24

On Fri, 2012-10-12 at 16:48 +0200, Eric Dumazet wrote:
> On Fri, 2012-10-12 at 10:22 -0400, Benjamin LaHaise wrote:
> > On Fri, Oct 12, 2012 at 02:08:12PM +0200, Eric Dumazet wrote:
> > > So yes, we built network stack with the prereq that IP headers are
> > > aligned, but unfortunately many developers use x86 which doesnt care, so
> > > its possible some bugs are added.
> > 
> > x86 does have an alignment check flag that can be set in the flags register.  
> > Somehow, I doubt anyone would be willing to walk through all the noise the 
> > faults would likely trigger.
> 
> If this can be mapped to an event that can be used by perf tool, that
> might be useful ?

AC was so useless that it has now been reallocated to use by SMAP
<https://lwn.net/Articles/517251/>.

Ben.

David Laight Oct. 12, 2012, 3:47 p.m. UTC | #25

> AC was so useless that it has now been reallocated to use by SMAP
> <https://lwn.net/Articles/517251/>.

That is a long time coming!
Wonder when it will appear in any cpus.

How am I going to get root access when I can't get the kernel
to execute code at address 0 any more :-)

Ben Hutchings Oct. 12, 2012, 4:13 p.m. UTC | #26

On Fri, 2012-10-12 at 16:47 +0100, David Laight wrote:
> > AC was so useless that it has now been reallocated to use by SMAP
> > <https://lwn.net/Articles/517251/>.
> 
> That is a long time coming!
> Wonder when it will appear in any cpus.
> 
> How am I going to get root access when I can't get the kernel
> to execute code at address 0 any more :-)

Moving further off the topic: that is supposed to be prevented by a
separate feature, SMEP, which I think is available in current Intel CPUs
(Ivy Bridge).  Also, unprivileged users are generally not permitted to
mmap() at address 0.

Ben.

alignment faults in 3.6

Commit Message

Comments

Patch