Problem with 64-bit registers on i.MX53

Message ID	20121008171011.GA4625@n2100.arm.linux.org.uk (mailing list archive)
State	New, archived
Headers	Date: Mon, 8 Oct 2012 18:10:11 +0100
	From: Russell King - ARM Linux <linux@arm.linux.org.uk>
	To: Michael Olbrich <m.olbrich@pengutronix.de>
	Cc: linux-arm-kernel@lists.infradead.org, kernel@pengutronix.de
	Subject: Re: Problem with 64-bit registers on i.MX53
	Message-ID: <20121008171011.GA4625@n2100.arm.linux.org.uk>
	References: <20121008160841.GM19651@pengutronix.de> <20121008170124.GZ4625@n2100.arm.linux.org.uk>
	In-Reply-To: <20121008170124.GZ4625@n2100.arm.linux.org.uk>

Russell King - ARM Linux Oct. 8, 2012, 5:10 p.m. UTC

On Mon, Oct 08, 2012 at 06:01:24PM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 08, 2012 at 06:08:41PM +0200, Michael Olbrich wrote:
> > 
> > Hi,
> > 
> > I have a problem that looks like that 64-bit registers (I think) are not
> > saved/restored correctly on a context switch. I've reduced it to the
> > following test case:
> > 
> > - Latest Linux mainline kernel (v3.6-8559-ge9eca4d)
> >   v3.5 is also affected
> > - imx_v6_v7_defconfig
> > - arch/arm/boot/dts/imx53-evk.dts
> > 
> > The following test program is compiled with "-mcpu=cortex-a8 -mfpu=neon
> > -O2".
> > ------------------------>8--------------------------------
> > #include <inttypes.h>
> > #include <assert.h>
> > 
> > volatile int x = 2;
> > volatile int64_t y = 2;
> > 
> > int main() {
> > 	volatile int a = 0;
> > 	volatile int64_t b = 0;
> > 	while (1) {
> > 		a = (a + x) % (1 << 30);
> > 		b = (b + y) % (1 << 30);
> > 		assert(a == b);
> > 	}
> > }
> > ------------------------>8--------------------------------
> > The ".. (b + y) .." should result in "vadd.i64 d19, d18, d16" or
> > something like that.
> 
> Hmm.
> 
> Can you send me the output of 'grep ^Features /proc/cpuinfo' please?

You may also like to try the patch below... it will probably fix your
problem.

Dave Martin Oct. 8, 2012, 5:50 p.m. UTC | #1

On Mon, Oct 08, 2012 at 06:10:11PM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 08, 2012 at 06:01:24PM +0100, Russell King - ARM Linux wrote:
> > On Mon, Oct 08, 2012 at 06:08:41PM +0200, Michael Olbrich wrote:
> > > 
> > > Hi,
> > > 
> > > I have a problem that looks like that 64-bit registers (I think) are not
> > > saved/restored correctly on a context switch. I've reduced it to the
> > > following test case:
> > > 
> > > - Latest Linux mainline kernel (v3.6-8559-ge9eca4d)
> > >   v3.5 is also affected
> > > - imx_v6_v7_defconfig
> > > - arch/arm/boot/dts/imx53-evk.dts
> > > 
> > > The following test program is compiled with "-mcpu=cortex-a8 -mfpu=neon
> > > -O2".
> > > ------------------------>8--------------------------------
> > > #include <inttypes.h>
> > > #include <assert.h>
> > > 
> > > volatile int x = 2;
> > > volatile int64_t y = 2;
> > > 
> > > int main() {
> > > 	volatile int a = 0;
> > > 	volatile int64_t b = 0;
> > > 	while (1) {
> > > 		a = (a + x) % (1 << 30);
> > > 		b = (b + y) % (1 << 30);
> > > 		assert(a == b);
> > > 	}
> > > }
> > > ------------------------>8--------------------------------
> > > The ".. (b + y) .." should result in "vadd.i64 d19, d18, d16" or
> > > something like that.

Just for my curiosity, can you let me know what compiler version you're
using and the disassembly?  I'm actually a little surprised to see
NEON code being generated here, though the patch below fixes what
definitely looks like a context switch bug for combined v6+v7 kernels...

Cheers
---Dave

> > 
> > Hmm.
> > 
> > Can you send me the output of 'grep ^Features /proc/cpuinfo' please?
> 
> You may also like to try the patch below... it will probably fix your
> problem.
> 
> diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
> index a7aadbd..6a6f1e4 100644
> --- a/arch/arm/include/asm/vfpmacros.h
> +++ b/arch/arm/include/asm/vfpmacros.h
> @@ -28,7 +28,7 @@
>  	ldr	\tmp, =elf_hwcap		    @ may not have MVFR regs
>  	ldr	\tmp, [\tmp, #0]
>  	tst	\tmp, #HWCAP_VFPv3D16
> -	ldceq	p11, cr0, [\base],#32*4		    @ FLDMIAD \base!, {d16-d31}
> +	ldceql	p11, cr0, [\base],#32*4		    @ FLDMIAD \base!, {d16-d31}
>  	addne	\base, \base, #32*4		    @ step over unused register space
>  #else
>  	VFPFMRX	\tmp, MVFR0			    @ Media and VFP Feature Register 0
> @@ -52,7 +52,7 @@
>  	ldr	\tmp, =elf_hwcap		    @ may not have MVFR regs
>  	ldr	\tmp, [\tmp, #0]
>  	tst	\tmp, #HWCAP_VFPv3D16
> -	stceq	p11, cr0, [\base],#32*4		    @ FSTMIAD \base!, {d16-d31}
> +	stceql	p11, cr0, [\base],#32*4		    @ FSTMIAD \base!, {d16-d31}
>  	addne	\base, \base, #32*4		    @ step over unused register space
>  #else
>  	VFPFMRX	\tmp, MVFR0			    @ Media and VFP Feature Register 0
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Russell King - ARM Linux Oct. 8, 2012, 6:03 p.m. UTC | #2

On Mon, Oct 08, 2012 at 06:50:52PM +0100, Dave Martin wrote:
> Just for my curiosity, can you let me know what compiler version you're
> using and the disassembly?  I'm actually a little surprised to see
> NEON code being generated here, though the patch below fixes what
> definitely looks like a context switch bug for combined v6+v7 kernels...

Well, one such compiler is gcc 4.6.3 in Ubuntu Precise LTS.

Dave Martin Oct. 8, 2012, 6:04 p.m. UTC | #3

On Mon, Oct 08, 2012 at 07:03:37PM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 08, 2012 at 06:50:52PM +0100, Dave Martin wrote:
> > Just for my curiosity, can you let me know what compiler version you're
> > using and the disassembly?  I'm actually a little surprised to see
> > NEON code being generated here, though the patch below fixes what
> > definitely looks like a context switch bug for combined v6+v7 kernels...
> 
> Well, one such compiler is gcc 4.6.3 in Ubuntu Precise LTS.

Hmmm, I really need to upgrade...

Cheers
---Dave

Michael Olbrich Oct. 9, 2012, 8:52 a.m. UTC | #4

On Mon, Oct 08, 2012 at 06:10:11PM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 08, 2012 at 06:01:24PM +0100, Russell King - ARM Linux wrote:
> > On Mon, Oct 08, 2012 at 06:08:41PM +0200, Michael Olbrich wrote:
> > > I have a problem that looks like that 64-bit registers (I think) are not
> > > saved/restored correctly on a context switch. I've reduced it to the
> > > following test case:
> > > 
> > > - Latest Linux mainline kernel (v3.6-8559-ge9eca4d)
> > >   v3.5 is also affected
> > > - imx_v6_v7_defconfig
> > > - arch/arm/boot/dts/imx53-evk.dts
> > > 
> > > The following test program is compiled with "-mcpu=cortex-a8 -mfpu=neon
> > > -O2".
> > > ------------------------>8--------------------------------
> > > #include <inttypes.h>
> > > #include <assert.h>
> > > 
> > > volatile int x = 2;
> > > volatile int64_t y = 2;
> > > 
> > > int main() {
> > > 	volatile int a = 0;
> > > 	volatile int64_t b = 0;
> > > 	while (1) {
> > > 		a = (a + x) % (1 << 30);
> > > 		b = (b + y) % (1 << 30);
> > > 		assert(a == b);
> > > 	}
> > > }
> > > ------------------------>8--------------------------------
> > > The ".. (b + y) .." should result in "vadd.i64 d19, d18, d16" or
> > > something like that.
> > 
> > Hmm.
> > 
> > Can you send me the output of 'grep ^Features /proc/cpuinfo' please?

Features        : swp half thumb fastmult vfp edsp neon vfpv3 tls 

> You may also like to try the patch below... it will probably fix your
> problem.

This does indeed fix my problem. Is this a real fix or just a test to
narrow down the issue? I don't really understand what it does.
If it is a real fix,

Tested-By: Michael Olbrich <m.olbrich@pengutronix.de>

Regards,
Michael

> diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
> index a7aadbd..6a6f1e4 100644
> --- a/arch/arm/include/asm/vfpmacros.h
> +++ b/arch/arm/include/asm/vfpmacros.h
> @@ -28,7 +28,7 @@
>  	ldr	\tmp, =elf_hwcap		    @ may not have MVFR regs
>  	ldr	\tmp, [\tmp, #0]
>  	tst	\tmp, #HWCAP_VFPv3D16
> -	ldceq	p11, cr0, [\base],#32*4		    @ FLDMIAD \base!, {d16-d31}
> +	ldceql	p11, cr0, [\base],#32*4		    @ FLDMIAD \base!, {d16-d31}
>  	addne	\base, \base, #32*4		    @ step over unused register space
>  #else
>  	VFPFMRX	\tmp, MVFR0			    @ Media and VFP Feature Register 0
> @@ -52,7 +52,7 @@
>  	ldr	\tmp, =elf_hwcap		    @ may not have MVFR regs
>  	ldr	\tmp, [\tmp, #0]
>  	tst	\tmp, #HWCAP_VFPv3D16
> -	stceq	p11, cr0, [\base],#32*4		    @ FSTMIAD \base!, {d16-d31}
> +	stceql	p11, cr0, [\base],#32*4		    @ FSTMIAD \base!, {d16-d31}
>  	addne	\base, \base, #32*4		    @ step over unused register space
>  #else
>  	VFPFMRX	\tmp, MVFR0			    @ Media and VFP Feature Register 0

Michael Olbrich Oct. 9, 2012, 9:02 a.m. UTC | #5

On Mon, Oct 08, 2012 at 06:50:52PM +0100, Dave Martin wrote:
> On Mon, Oct 08, 2012 at 06:10:11PM +0100, Russell King - ARM Linux wrote:
> > On Mon, Oct 08, 2012 at 06:01:24PM +0100, Russell King - ARM Linux wrote:
> > > On Mon, Oct 08, 2012 at 06:08:41PM +0200, Michael Olbrich wrote:
> > > > I have a problem that looks like that 64-bit registers (I think) are not
> > > > saved/restored correctly on a context switch. I've reduced it to the
> > > > following test case:
> > > > 
> > > > - Latest Linux mainline kernel (v3.6-8559-ge9eca4d)
> > > >   v3.5 is also affected
> > > > - imx_v6_v7_defconfig
> > > > - arch/arm/boot/dts/imx53-evk.dts
> > > > 
> > > > The following test program is compiled with "-mcpu=cortex-a8 -mfpu=neon
> > > > -O2".
> > > > ------------------------>8--------------------------------
> > > > #include <inttypes.h>
> > > > #include <assert.h>
> > > > 
> > > > volatile int x = 2;
> > > > volatile int64_t y = 2;
> > > > 
> > > > int main() {
> > > > 	volatile int a = 0;
> > > > 	volatile int64_t b = 0;
> > > > 	while (1) {
> > > > 		a = (a + x) % (1 << 30);
> > > > 		b = (b + y) % (1 << 30);
> > > > 		assert(a == b);
> > > > 	}
> > > > }
> > > > ------------------------>8--------------------------------
> > > > The ".. (b + y) .." should result in "vadd.i64 d19, d18, d16" or
> > > > something like that.
> 
> Just for my curiosity, can you let me know what compiler version you're
> using and the disassembly?  I'm actually a little surprised to see
> NEON code being generated here,

I'm using oselas.toolchain, which includes gcc-linaro-4.6-2011.11. It
generates quite a bit of NEON code actually. I originally tracked down the
issue to a commit in libxcb: "xcb_in: Use 64-bit sequence numbers
internally everywhere.". The compiler generated NEON code to calculate
sequence numbers...

Regards,
Michael

Uwe Kleine-König Oct. 9, 2012, 9:02 a.m. UTC | #6

Hello,

On Mon, Oct 08, 2012 at 06:10:11PM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 08, 2012 at 06:01:24PM +0100, Russell King - ARM Linux wrote:
> > On Mon, Oct 08, 2012 at 06:08:41PM +0200, Michael Olbrich wrote:
> You may also like to try the patch below... it will probably fix your
> problem.
> 
> diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
> index a7aadbd..6a6f1e4 100644
> --- a/arch/arm/include/asm/vfpmacros.h
> +++ b/arch/arm/include/asm/vfpmacros.h
> @@ -28,7 +28,7 @@
>  	ldr	\tmp, =elf_hwcap		    @ may not have MVFR regs
>  	ldr	\tmp, [\tmp, #0]
>  	tst	\tmp, #HWCAP_VFPv3D16
> -	ldceq	p11, cr0, [\base],#32*4		    @ FLDMIAD \base!, {d16-d31}
> +	ldceql	p11, cr0, [\base],#32*4		    @ FLDMIAD \base!, {d16-d31}
>  	addne	\base, \base, #32*4		    @ step over unused register space
>  #else
>  	VFPFMRX	\tmp, MVFR0			    @ Media and VFP Feature Register 0
> @@ -52,7 +52,7 @@
>  	ldr	\tmp, =elf_hwcap		    @ may not have MVFR regs
>  	ldr	\tmp, [\tmp, #0]
>  	tst	\tmp, #HWCAP_VFPv3D16
> -	stceq	p11, cr0, [\base],#32*4		    @ FSTMIAD \base!, {d16-d31}
> +	stceql	p11, cr0, [\base],#32*4		    @ FSTMIAD \base!, {d16-d31}

According to the ARMARM for v7-A and v7-R (ARM DDI 0406B errata 2010 Q2)
the syntax is "STC{L}<c> ...", with a note "The pre-UAL syntax STC<c>L
is equivalent to STCL<c>.". Maybe the UAL-syntax should better be used?

Best regards
Uwe

Dave Martin Oct. 9, 2012, 2:05 p.m. UTC | #7

On Tue, Oct 09, 2012 at 11:02:37AM +0200, Uwe Kleine-König wrote:
> Hello,
> 
> On Mon, Oct 08, 2012 at 06:10:11PM +0100, Russell King - ARM Linux wrote:
> > On Mon, Oct 08, 2012 at 06:01:24PM +0100, Russell King - ARM Linux wrote:
> > > On Mon, Oct 08, 2012 at 06:08:41PM +0200, Michael Olbrich wrote:
> > You may also like to try the patch below... it will probably fix your
> > problem.
> > 
> > diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
> > index a7aadbd..6a6f1e4 100644
> > --- a/arch/arm/include/asm/vfpmacros.h
> > +++ b/arch/arm/include/asm/vfpmacros.h
> > @@ -28,7 +28,7 @@
> >  	ldr	\tmp, =elf_hwcap		    @ may not have MVFR regs
> >  	ldr	\tmp, [\tmp, #0]
> >  	tst	\tmp, #HWCAP_VFPv3D16
> > -	ldceq	p11, cr0, [\base],#32*4		    @ FLDMIAD \base!, {d16-d31}
> > +	ldceql	p11, cr0, [\base],#32*4		    @ FLDMIAD \base!, {d16-d31}
> >  	addne	\base, \base, #32*4		    @ step over unused register space
> >  #else
> >  	VFPFMRX	\tmp, MVFR0			    @ Media and VFP Feature Register 0
> > @@ -52,7 +52,7 @@
> >  	ldr	\tmp, =elf_hwcap		    @ may not have MVFR regs
> >  	ldr	\tmp, [\tmp, #0]
> >  	tst	\tmp, #HWCAP_VFPv3D16
> > -	stceq	p11, cr0, [\base],#32*4		    @ FSTMIAD \base!, {d16-d31}
> > +	stceql	p11, cr0, [\base],#32*4		    @ FSTMIAD \base!, {d16-d31}
> According to the ARMARM for v7-A and v7-R (ARM DDI 0406B errata 2010 Q2)
> the syntax is "STC{L}<c> ...", with a note "The pre-UAL syntax STC<c>L
> is equivalent to STCL<c>.". Maybe the UAL-syntax should better be used?

The older stc<c>l type of syntax is used all over the place.  Code which
might need to be built by tools which pre-date unified syntax needs to
use the old syntax, so it is in common usage in the kernel in general.

This code presumably only gets built by new-enough tools for the unified
syntax to be usable, but support for the old syntax isn't going to
disappear from the tools any time soon, AFAIK.

The bug here was that the presence or absence of the "L" suffix is used
to encode bit 4 of the starting d-register number for these instructions.
The comment says d16-d31, but the instructions as written are actually
saving and restoring d0-d15...which is not so helpful since we already
handled those registers in the neighbouring code.

We could avoid this kind of bug by writing those VFP instructions
using the unified syntax native mnemonics (vstmia, vldmia -- since d16-
d31 never existed while the old fldmiad/fstmiad mnemonics were in use,
and the assembler doesn't accept them), but it is tricky to change the
assembler's notion of target CPU and FPU on-the-fly inside a header or
macro without messing things up.

Cheers
---Dave

> 
> Best regards
> Uwe
> 
> -- 
> Pengutronix e.K.                           | Uwe Kleine-König            |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
