Message ID | 20121008171011.GA4625@n2100.arm.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
On Mon, Oct 08, 2012 at 06:10:11PM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 08, 2012 at 06:01:24PM +0100, Russell King - ARM Linux wrote:
> > On Mon, Oct 08, 2012 at 06:08:41PM +0200, Michael Olbrich wrote:
> > >
> > > Hi,
> > >
> > > I have a problem that looks like that 64-bit registers (I think) are not
> > > saved/restored correctly on a context switch. I've reduced it to the
> > > following test case:
> > >
> > > - Latest Linux mainline kernel (v3.6-8559-ge9eca4d)
> > >   v3.5 is also affected
> > > - imx_v6_v7_defconfig
> > > - arch/arm/boot/dts/imx53-evk.dts
> > >
> > > The following test program is compiled with "-mcpu=cortex-a8 -mfpu=neon
> > > -O2".
> > > ------------------------>8--------------------------------
> > > #include <inttypes.h>
> > > #include <assert.h>
> > >
> > > volatile int x = 2;
> > > volatile int64_t y = 2;
> > >
> > > int main() {
> > >         volatile int a = 0;
> > >         volatile int64_t b = 0;
> > >         while (1) {
> > >                 a = (a + x) % (1 << 30);
> > >                 b = (b + y) % (1 << 30);
> > >                 assert(a == b);
> > >         }
> > > }
> > > ------------------------>8--------------------------------
> > > The ".. (b + y) .." should result in "vadd.i64 d19, d18, d16" or
> > > something like that.

Just for my curiosity, can you let me know what compiler version you're
using and the disassembly? I'm actually a little surprised to see
NEON code being generated here, though the patch below fixes what
definitely looks like a context switch bug for combined v6+v7 kernels...

Cheers
---Dave

> >
> > Hmm.
> >
> > Can you send me the output of 'grep ^Features /proc/cpuinfo' please?
>
> You may also like to try the patch below... it will probably fix your
> problem.
>
> diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
> index a7aadbd..6a6f1e4 100644
> --- a/arch/arm/include/asm/vfpmacros.h
> +++ b/arch/arm/include/asm/vfpmacros.h
> @@ -28,7 +28,7 @@
>      ldr \tmp, =elf_hwcap @ may not have MVFR regs
>      ldr \tmp, [\tmp, #0]
>      tst \tmp, #HWCAP_VFPv3D16
> -    ldceq p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
> +    ldceql p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
>      addne \base, \base, #32*4 @ step over unused register space
>  #else
>      VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0
> @@ -52,7 +52,7 @@
>      ldr \tmp, =elf_hwcap @ may not have MVFR regs
>      ldr \tmp, [\tmp, #0]
>      tst \tmp, #HWCAP_VFPv3D16
> -    stceq p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}
> +    stceql p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}
>      addne \base, \base, #32*4 @ step over unused register space
>  #else
>      VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Mon, Oct 08, 2012 at 06:50:52PM +0100, Dave Martin wrote:
> Just for my curiosity, can you let me know what compiler version you're
> using and the disassembly? I'm actually a little surprised to see
> NEON code being generated here, though the patch below fixes what
> definitely looks like a context switch bug for combined v6+v7 kernels...

Well, one such compiler is gcc 4.6.3 in Ubuntu Precise LTS.
On Mon, Oct 08, 2012 at 07:03:37PM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 08, 2012 at 06:50:52PM +0100, Dave Martin wrote:
> > Just for my curiosity, can you let me know what compiler version you're
> > using and the disassembly? I'm actually a little surprised to see
> > NEON code being generated here, though the patch below fixes what
> > definitely looks like a context switch bug for combined v6+v7 kernels...
>
> Well, one such compiler is gcc 4.6.3 in Ubuntu Precise LTS.

Hmmm, I really need to upgrade...

Cheers
---Dave
On Mon, Oct 08, 2012 at 06:10:11PM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 08, 2012 at 06:01:24PM +0100, Russell King - ARM Linux wrote:
> > On Mon, Oct 08, 2012 at 06:08:41PM +0200, Michael Olbrich wrote:
> > > I have a problem that looks like that 64-bit registers (I think) are not
> > > saved/restored correctly on a context switch. I've reduced it to the
> > > following test case:
> > >
> > > - Latest Linux mainline kernel (v3.6-8559-ge9eca4d)
> > >   v3.5 is also affected
> > > - imx_v6_v7_defconfig
> > > - arch/arm/boot/dts/imx53-evk.dts
> > >
> > > The following test program is compiled with "-mcpu=cortex-a8 -mfpu=neon
> > > -O2".
> > > ------------------------>8--------------------------------
> > > #include <inttypes.h>
> > > #include <assert.h>
> > >
> > > volatile int x = 2;
> > > volatile int64_t y = 2;
> > >
> > > int main() {
> > >         volatile int a = 0;
> > >         volatile int64_t b = 0;
> > >         while (1) {
> > >                 a = (a + x) % (1 << 30);
> > >                 b = (b + y) % (1 << 30);
> > >                 assert(a == b);
> > >         }
> > > }
> > > ------------------------>8--------------------------------
> > > The ".. (b + y) .." should result in "vadd.i64 d19, d18, d16" or
> > > something like that.
> >
> > Hmm.
> >
> > Can you send me the output of 'grep ^Features /proc/cpuinfo' please?

Features : swp half thumb fastmult vfp edsp neon vfpv3 tls

> You may also like to try the patch below... it will probably fix your
> problem.

This does indeed fix my problem. Is this a real fix or just a test to
narrow down the issue? I don't really understand what it does.

If it is a real fix,

Tested-By: Michael Olbrich <m.olbrich@pengutronix.de>

Regards,
Michael

> diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
> index a7aadbd..6a6f1e4 100644
> --- a/arch/arm/include/asm/vfpmacros.h
> +++ b/arch/arm/include/asm/vfpmacros.h
> @@ -28,7 +28,7 @@
>      ldr \tmp, =elf_hwcap @ may not have MVFR regs
>      ldr \tmp, [\tmp, #0]
>      tst \tmp, #HWCAP_VFPv3D16
> -    ldceq p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
> +    ldceql p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
>      addne \base, \base, #32*4 @ step over unused register space
>  #else
>      VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0
> @@ -52,7 +52,7 @@
>      ldr \tmp, =elf_hwcap @ may not have MVFR regs
>      ldr \tmp, [\tmp, #0]
>      tst \tmp, #HWCAP_VFPv3D16
> -    stceq p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}
> +    stceql p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}
>      addne \base, \base, #32*4 @ step over unused register space
>  #else
>      VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0
On Mon, Oct 08, 2012 at 06:50:52PM +0100, Dave Martin wrote:
> On Mon, Oct 08, 2012 at 06:10:11PM +0100, Russell King - ARM Linux wrote:
> > On Mon, Oct 08, 2012 at 06:01:24PM +0100, Russell King - ARM Linux wrote:
> > > On Mon, Oct 08, 2012 at 06:08:41PM +0200, Michael Olbrich wrote:
> > > > I have a problem that looks like that 64-bit registers (I think) are not
> > > > saved/restored correctly on a context switch. I've reduced it to the
> > > > following test case:
> > > >
> > > > - Latest Linux mainline kernel (v3.6-8559-ge9eca4d)
> > > >   v3.5 is also affected
> > > > - imx_v6_v7_defconfig
> > > > - arch/arm/boot/dts/imx53-evk.dts
> > > >
> > > > The following test program is compiled with "-mcpu=cortex-a8 -mfpu=neon
> > > > -O2".
> > > > ------------------------>8--------------------------------
> > > > #include <inttypes.h>
> > > > #include <assert.h>
> > > >
> > > > volatile int x = 2;
> > > > volatile int64_t y = 2;
> > > >
> > > > int main() {
> > > >         volatile int a = 0;
> > > >         volatile int64_t b = 0;
> > > >         while (1) {
> > > >                 a = (a + x) % (1 << 30);
> > > >                 b = (b + y) % (1 << 30);
> > > >                 assert(a == b);
> > > >         }
> > > > }
> > > > ------------------------>8--------------------------------
> > > > The ".. (b + y) .." should result in "vadd.i64 d19, d18, d16" or
> > > > something like that.
>
> Just for my curiosity, can you let me know what compiler version you're
> using and the disassembly? I'm actually a little surprised to see
> NEON code being generated here,

I'm using oselas.toolchain which includes a gcc-linaro-4.6-2011.11. It
generates quite a bit of NEON code actually.
I originally tracked down the issue to a commit in libxcb:
"xcb_in: Use 64-bit sequence numbers internally everywhere.". The compiler
generated NEON code to calculate sequence numbers...

Regards,
Michael
Hello,

On Mon, Oct 08, 2012 at 06:10:11PM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 08, 2012 at 06:01:24PM +0100, Russell King - ARM Linux wrote:
> > On Mon, Oct 08, 2012 at 06:08:41PM +0200, Michael Olbrich wrote:
> You may also like to try the patch below... it will probably fix your
> problem.
>
> diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
> index a7aadbd..6a6f1e4 100644
> --- a/arch/arm/include/asm/vfpmacros.h
> +++ b/arch/arm/include/asm/vfpmacros.h
> @@ -28,7 +28,7 @@
>      ldr \tmp, =elf_hwcap @ may not have MVFR regs
>      ldr \tmp, [\tmp, #0]
>      tst \tmp, #HWCAP_VFPv3D16
> -    ldceq p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
> +    ldceql p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
>      addne \base, \base, #32*4 @ step over unused register space
>  #else
>      VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0
> @@ -52,7 +52,7 @@
>      ldr \tmp, =elf_hwcap @ may not have MVFR regs
>      ldr \tmp, [\tmp, #0]
>      tst \tmp, #HWCAP_VFPv3D16
> -    stceq p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}
> +    stceql p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}

According to the ARMARM for v7-A and v7-R (ARM DDI 0406B errata 2010 Q2)
the syntax is "STC{L}<c> ...", with a note "The pre-UAL syntax STC<c>L
is equivalent to STCL<c>.". Maybe the UAL-syntax should better be used?

Best regards
Uwe
On Tue, Oct 09, 2012 at 11:02:37AM +0200, Uwe Kleine-König wrote:
> Hello,
>
> On Mon, Oct 08, 2012 at 06:10:11PM +0100, Russell King - ARM Linux wrote:
> > On Mon, Oct 08, 2012 at 06:01:24PM +0100, Russell King - ARM Linux wrote:
> > > On Mon, Oct 08, 2012 at 06:08:41PM +0200, Michael Olbrich wrote:
> > You may also like to try the patch below... it will probably fix your
> > problem.
> >
> > diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
> > index a7aadbd..6a6f1e4 100644
> > --- a/arch/arm/include/asm/vfpmacros.h
> > +++ b/arch/arm/include/asm/vfpmacros.h
> > @@ -28,7 +28,7 @@
> >      ldr \tmp, =elf_hwcap @ may not have MVFR regs
> >      ldr \tmp, [\tmp, #0]
> >      tst \tmp, #HWCAP_VFPv3D16
> > -    ldceq p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
> > +    ldceql p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
> >      addne \base, \base, #32*4 @ step over unused register space
> >  #else
> >      VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0
> > @@ -52,7 +52,7 @@
> >      ldr \tmp, =elf_hwcap @ may not have MVFR regs
> >      ldr \tmp, [\tmp, #0]
> >      tst \tmp, #HWCAP_VFPv3D16
> > -    stceq p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}
> > +    stceql p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}
> According to the ARMARM for v7-A and v7-R (ARM DDI 0406B errata 2010 Q2)
> the syntax is "STC{L}<c> ...", with a note "The pre-UAL syntax STC<c>L
> is equivalent to STCL<c>.". Maybe the UAL-syntax should better be used?

The older stc<c>l type of syntax is used all over the place. Code which
might need to be built by tools which pre-date unified syntax needs to
use the old syntax, so it is in common usage in the kernel in general.
This code presumably only gets built by new-enough tools for the unified
syntax to be usable, but support for the old syntax isn't going to
disappear from the tools any time soon, AFAIK.

The bug here was that the presence or absence of the "L" suffix is used
to encode bit 4 of the starting d-register number for these instructions.
The comment says d16-d31, but the instructions as written are actually
saving and restoring d0-d15... which is not so helpful since we already
handled those registers in the neighbouring code.

We could avoid this kind of bug by writing those VFP instructions using
the unified syntax native mnemonics (vstmia, vldmia -- since d16-d31
never existed while the old fldmiad/fstmiad mnemonics were in use, and
the assembler doesn't accept them), but it is tricky to change the
assembler's notion of target CPU and FPU on-the-fly inside a header or
macro without messing things up.

Cheers
---Dave

>
> Best regards
> Uwe
>
> --
> Pengutronix e.K.           | Uwe Kleine-König            |
> Industrial Linux Solutions | http://www.pengutronix.de/  |
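To make the effect of the "L" suffix concrete, here is a minimal C sketch of how a coprocessor-11 LDC/STC transfer maps onto D registers when the VFP unit treats it as VLDM/VSTM. It assumes the ARMv7 encoding, in which the "L" suffix sets bit 22 (the D bit), the first register number is D:CRd, and the byte offset written in the source is stored as a word count with two words per D register. The names `struct dreg_range` and `p11_transfer_range` are made up for illustration; they are not kernel code.

```c
#include <assert.h>

/* Hypothetical helper type: inclusive range of D registers transferred. */
struct dreg_range {
    unsigned first;   /* first D register number */
    unsigned last;    /* last D register number, inclusive */
};

/*
 * Model of "ldc{l}/stc{l} p11, cr<crd>, [base], #offset_bytes".
 *
 * long_form:    nonzero if the mnemonic carries the "L" suffix (bit 22 set)
 * crd:          coprocessor register operand (cr0 -> 0), the low 4 bits
 *               of the starting D register number
 * offset_bytes: immediate as written in the source (#32*4 -> 128)
 */
struct dreg_range p11_transfer_range(int long_form, unsigned crd,
                                     unsigned offset_bytes)
{
    struct dreg_range r;
    unsigned imm8, count;

    assert(crd < 16);
    /* A whole number of 8-byte D registers is expected here. */
    assert(offset_bytes > 0 && offset_bytes % 8 == 0);

    imm8 = offset_bytes / 4;   /* the immediate is encoded in words */
    count = imm8 / 2;          /* two words per D register */

    /* The "L" bit supplies bit 4 of the starting register number. */
    r.first = ((long_form ? 1u : 0u) << 4) | crd;
    r.last = r.first + count - 1;
    return r;
}
```

Under these assumptions, the unpatched `ldceq p11, cr0, [\base],#32*4` corresponds to `p11_transfer_range(0, 0, 32 * 4)`, which covers d0-d15, while the patched `ldceql` corresponds to `p11_transfer_range(1, 0, 32 * 4)`, which covers d16-d31 as the macro comment intends.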
diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
index a7aadbd..6a6f1e4 100644
--- a/arch/arm/include/asm/vfpmacros.h
+++ b/arch/arm/include/asm/vfpmacros.h
@@ -28,7 +28,7 @@
     ldr \tmp, =elf_hwcap @ may not have MVFR regs
     ldr \tmp, [\tmp, #0]
     tst \tmp, #HWCAP_VFPv3D16
-    ldceq p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
+    ldceql p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
     addne \base, \base, #32*4 @ step over unused register space
 #else
     VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0
@@ -52,7 +52,7 @@
     ldr \tmp, =elf_hwcap @ may not have MVFR regs
     ldr \tmp, [\tmp, #0]
     tst \tmp, #HWCAP_VFPv3D16
-    stceq p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}
+    stceql p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}
     addne \base, \base, #32*4 @ step over unused register space
 #else
     VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0