Message ID | 1361520485-30269-1-git-send-email-sboyd@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi guys, On Fri, Feb 22, 2013 at 08:08:05AM +0000, Stephen Boyd wrote: > From: Steve Muckle <smuckle@codeaurora.org> > > The subarchitecture field in the fpsid register is 7 bits wide. > The topmost bit is used to designate that the subarchitecture > designer is not ARM. We use this field to determine which VFP > version is supported by the CPU. Since the topmost bit is masked > off we detect non-ARM subarchitectures as supporting only > HWCAP_VFP and not HWCAP_VFPv3 as it should be for Qualcomm's > processors. I'm struggling to see why this has anything to do with the hwcaps being set incorrectly. What value do you have in fpsid? As far as I can tell, the subarchitecture bits 6:0 should start at 0x40 for you, right? I can see cases for changing this code, I just don't see why it would go wrong in the case you're describing. Cheers, Will
On 2/22/2013 10:27 AM, Will Deacon wrote: > Hi guys, > > On Fri, Feb 22, 2013 at 08:08:05AM +0000, Stephen Boyd wrote: >> From: Steve Muckle <smuckle@codeaurora.org> >> >> The subarchitecture field in the fpsid register is 7 bits wide. >> The topmost bit is used to designate that the subarchitecture >> designer is not ARM. We use this field to determine which VFP >> version is supported by the CPU. Since the topmost bit is masked >> off we detect non-ARM subarchitectures as supporting only >> HWCAP_VFP and not HWCAP_VFPv3 as it should be for Qualcomm's >> processors. > I'm struggling to see why this has anything to do with the hwcaps being set > incorrectly. What value do you have in fpsid? As far as I can tell, the > subarchitecture bits 6:0 should start at 0x40 for you, right? Yes it does. > > I can see cases for changing this code, I just don't see why it would go > wrong in the case you're describing. VFP_arch = (vfpsid & FPSID_ARCH_MASK) >> FPSID_ARCH_BIT; causes VFP_arch to be equal to 0 because 0x40 & 0xf == 0. and then a little bit later we have if (VFP_arch >= 2) { elf_hwcap |= HWCAP_VFPv3; The branch is not taken so we never set VFPv3.
On Fri, Feb 22, 2013 at 11:46:18PM +0000, Stephen Boyd wrote: > On 2/22/2013 10:27 AM, Will Deacon wrote: > > Hi guys, > > > > On Fri, Feb 22, 2013 at 08:08:05AM +0000, Stephen Boyd wrote: > >> From: Steve Muckle <smuckle@codeaurora.org> > >> > >> The subarchitecture field in the fpsid register is 7 bits wide. > >> The topmost bit is used to designate that the subarchitecture > >> designer is not ARM. We use this field to determine which VFP > >> version is supported by the CPU. Since the topmost bit is masked > >> off we detect non-ARM subarchitectures as supporting only > >> HWCAP_VFP and not HWCAP_VFPv3 as it should be for Qualcomm's > >> processors. > > I'm struggling to see why this has anything to do with the hwcaps being set > > incorrectly. What value do you have in fpsid? As far as I can tell, the > > subarchitecture bits 6:0 should start at 0x40 for you, right? > > Yes it does. Ok, good. Could you share the different subarchitecture encodings that you have please? (assumedly some/all of these are compatible with a variant of VFP). > > > > I can see cases for changing this code, I just don't see why it would go > > wrong in the case you're describing. > > VFP_arch = (vfpsid & FPSID_ARCH_MASK) >> FPSID_ARCH_BIT; > > causes VFP_arch to be equal to 0 because 0x40 & 0xf == 0. > > and then a little bit later we have > > if (VFP_arch >= 2) { > elf_hwcap |= HWCAP_VFPv3; > > > The branch is not taken so we never set VFPv3. Ah, that's what I feared: the low bits are zero yet you are compatible with VFPv3. That's fine, but the proposed fix feels like a kludge; the only reason we'd choose on VFPv3 is because the implementor is not ARM, which may not hold true for other vendors. I think it would be better if we translated vendor-specific subarchitectures that are compatible with VFPvN into the corresponding architecture number instead. This would also allow us to add extra hwcaps for extensions other than VFP. Cheers, Will
On Fri, Feb 22, 2013 at 12:08:05AM -0800, Stephen Boyd wrote: > From: Steve Muckle <smuckle@codeaurora.org> > > The subarchitecture field in the fpsid register is 7 bits wide. > The topmost bit is used to designate that the subarchitecture > designer is not ARM. We use this field to determine which VFP > version is supported by the CPU. Since the topmost bit is masked > off we detect non-ARM subarchitectures as supporting only > HWCAP_VFP and not HWCAP_VFPv3 as it should be for Qualcomm's > processors. > > Use the proper width for the mask so that we report the correct > elf_hwcap of non-ARM designed processors. This is a *big* can of worms. How this register is defined depends on what document you look at: DDI0238A/B/C: 0x41011090: (VFP9) 23: 0 = hw, 22:21: 00 = FSTMX/FLDMX format is standard format 1, 20: 0 = Single-precision and double-precision data supported 19:16: 0001 = architecture version DDI0133A: 0x410000A0: (VFP10) 23:16: 0 = architecture version DDI0274F: 0x410120B3: (VFP11) 23: 0 = hw, 22:21: 00 = FSTMX/FLDMX format is standard format 1, 20: 0 = Single-precision and double-precision data supported 19:16: 0001 = architecture version DDI0406A/B/C: 23: hw, 22:16: architecture version Now, DDI0238C was updated on 4th August 2010. The ARM ARM revs A and B were published before this version - only ARM ARM rev C post-dates this. The ARM ARM defines bit 22 to be zero for any ARM produced VFP hardware, and must be one for non-ARM designed VFP hardware. Now, the ARM ARM gives these as possible values for bits 22:16: 0000000 - VFPv1 0000001 - VFPv2 0000010 - VFPv3 with common VFPv2 0000011 - VFPv3 h/w only 0000100 - VFPv3 with common VFPv3 We _do_ test bit 20 to determine if there is support for 32 double precision registers, and if this bit is set, we ignore the VFP. From the description in the ARM ARM, this is wrong if the implementer is not ARM. If the information above _does_ correspond accurately with all ARM produced hardware (VFP9S, VFP10, VFP11, plus any other variants in ARMs CPUs) then they will always have bits 22:20 as zero. This means we can extend the architecture field to include those bits. We should also arrange for VFP support to fail if any of these bits are set. How do we handle non-ARM produced VFP hardware? I've no idea - the stinking pile of dogs poo that ARM Ltd have created here means that we're going to have to decode this register in a per-designer way - and if we don't have support for your designer, then the support code _must_ fail to initialize and report _any_ VFP support. This is going to be horrendously disruptive, because _I_ don't know what values of FPSID register can be found out in the wild. I know that the Dove chips have a Marvell implementer value (0x56), but these report a subarch value of 2 (which seems to correspond with ARM Ltd's implementation of architecture coding.) Is it fully compatible? Don't know. I think the only way we're going to find out is to end up with VFP disabled on most CPUs and wait for people to stream reports in of their hardware breaking. Again, way to go ARM Ltd! What a truely excellent situation this is!
On Mon, Feb 25, 2013 at 05:25:45PM +0000, Russell King - ARM Linux wrote: > On Fri, Feb 22, 2013 at 12:08:05AM -0800, Stephen Boyd wrote: > > From: Steve Muckle <smuckle@codeaurora.org> > > > > The subarchitecture field in the fpsid register is 7 bits wide. > > The topmost bit is used to designate that the subarchitecture > > designer is not ARM. We use this field to determine which VFP > > version is supported by the CPU. Since the topmost bit is masked > > off we detect non-ARM subarchitectures as supporting only > > HWCAP_VFP and not HWCAP_VFPv3 as it should be for Qualcomm's > > processors. > > > > Use the proper width for the mask so that we report the correct > > elf_hwcap of non-ARM designed processors. > > This is a *big* can of worms. How this register is defined depends on > what document you look at: This can of worms is getting bigger. We have more problems with our handling of the different VFP versions, specifically the handling of the EX=0 DEX=0 case. VFP common subarch 3 defines the EX=0, DEX=0 encoding to mean one of the following conditions have been met: 1. an unallocated VFP instruction was encountered. In other words, the VFP was the target of the co-processor instruction, but the instruction is not a known VFP instruction encoding. This should raise an undefined instruction exception. 2. an allocated VFP instruction was encountered, but not handled in hardware. In other words, the instruction is a valid VFP instruction, but the hardware has opted not to implement this instruction and wants software to emulate it instead. (Note: this can also be raised as EX=0, DEX=1 - implementation defined!) Now, VFP common subarch 2 removes condition (2) from this. VFP common subarch 1 further complicates this by changing the behaviour when IXE=1 (these are always 'synchronous' exceptions. Now, what does our code do? Well, the first area to look at is the assembly: look_for_VFP_exceptions: @ Check for synchronous or asynchronous exception tst r1, #FPEXC_EX | FPEXC_DEX bne process_exception @ On some implementations of the VFP subarch 1, setting FPSCR.IXE @ causes all the CDP instructions to be bounced synchronously without @ setting the FPEXC.EX bit VFPFMRX r5, FPSCR tst r5, #FPSCR_IXE bne process_exception @ Fall into hand on to next handler - appropriate coproc instr @ not recognised by VFP So, if EX or DEX is set, _or_ IXE is set, we pass control to VFP_bounce. This is problematical. (a) condition (2) above isn't correctly handled for common subarch v3 - it is always treated as an undefined instruction, and will result in a SIGILL being delivered. (b) if IXE is set, we _always_ treat the instruction as being defined, which means we will never raise a SIGILL for the faulting instruction, even if it is an undefined VFP instruction. Instead, they will receive a SIGFPE and the kernel will dump the entire VFP state into the kernel message log - eg: VFP: Error: unhandled bounce VFP: EXC 0x40000000 SCR 0x00001000 INST 0xec410b30 VFP: s 0: 0x00000000 s 1: 0x00000000 VFP: s 2: 0x00000000 s 3: 0x00000000 VFP: s 4: 0x00000000 s 5: 0x00000000 VFP: s 6: 0x00000000 s 7: 0x00000000 VFP: s 8: 0x00000000 s 9: 0x00000000 VFP: s10: 0x00000000 s11: 0x00000000 VFP: s12: 0x00000000 s13: 0x00000000 VFP: s14: 0x00000000 s15: 0x00000000 VFP: s16: 0x00000000 s17: 0x00000000 VFP: s18: 0x00000000 s19: 0x00000000 VFP: s20: 0x00000000 s21: 0x00000000 VFP: s22: 0x00000000 s23: 0x00000000 VFP: s24: 0x00000000 s25: 0x00000000 VFP: s26: 0x00000000 s27: 0x00000000 VFP: s28: 0x00000000 s29: 0x00000000 VFP: s30: 0x00000000 s31: 0x00000000 (Yes, I've just proven this on Marvell Dove.) Now, (a) is just bad behaviour - as we haven't had any reports of this yet, I suspect that no one has implemented VFP hardware with this behaviour yet. However, (b) is something of a problem: consider a userspace program which ignores SIGFPE signals, sets IXE, and executes an undefined instruction in a tight loop. Watch your kernel message log spew an endless stream of VFP data dumps... And now we come to the final niggle. As you can see from the above, to be able to handle the various VFP exceptions correctly, it is required to know which of the common subarchitecture versions has been implemented in the hardware. These common subarchitectures are entirely optional, and are not even guaranteed to be implemented this way even on ARM hardware. And now go back and look at my preceding mail on the implementer specific decoding of the "subarch" field in the FPSID register... So, what can we do? We _can_ fix some of these by decoding the subarch field for ARM implementations, and having the exception handling code decode these cases in the appropriate common subarchitecture way. What do we do for non-ARM implementations? For Marvell Dove, what little information I have says that it is a "VFPv2 architecture" implementation to the ARM ARM. But Marvell Dove is ARMv7, and the ARM ARM says VFPv2 is not permitted on ARMv7 CPUs. Also, it reports: VFP support v0.3: implementor 56 architecture 2 part 20 variant 9 rev 5 this decodes as "VFPv3 architecture, or later, with Common VFP subarchitecture v2." which is also contary to the Marvell statement. Given that it does appear to work without modifications, I'm willing to bet that it really is VFPv3. For others? That's a very good question to which I don't have any answer: if they don't implement the common subarchitecture, then the decoding of FPEXC, except for the EX and EN bits, is implementation defined. What joy.
On 02/25/13 03:18, Will Deacon wrote: > On Fri, Feb 22, 2013 at 11:46:18PM +0000, Stephen Boyd wrote: >> On 2/22/2013 10:27 AM, Will Deacon wrote: >>> What value do you have in fpsid? As far as I can tell, the >>> subarchitecture bits 6:0 should start at 0x40 for you, right? >> Yes it does. > Ok, good. Could you share the different subarchitecture encodings that you > have please? (assumedly some/all of these are compatible with a variant of > VFP). Definitely all Krait processors have 0x40 for the subarchitecture encoding. I need to check our Scorpions but I'm fairly certain they also have 0x40. > >>> I can see cases for changing this code, I just don't see why it would go >>> wrong in the case you're describing. >> VFP_arch = (vfpsid & FPSID_ARCH_MASK) >> FPSID_ARCH_BIT; >> >> causes VFP_arch to be equal to 0 because 0x40 & 0xf == 0. >> >> and then a little bit later we have >> >> if (VFP_arch >= 2) { >> elf_hwcap |= HWCAP_VFPv3; >> >> >> The branch is not taken so we never set VFPv3. > Ah, that's what I feared: the low bits are zero yet you are compatible with > VFPv3. That's fine, but the proposed fix feels like a kludge; the only reason > we'd choose on VFPv3 is because the implementor is not ARM, which may not hold > true for other vendors. I think it would be better if we translated > vendor-specific subarchitectures that are compatible with VFPvN into the > corresponding architecture number instead. This would also allow us to add > extra hwcaps for extensions other than VFP. Ok. We should be able to make VFP_arch into 0x4 if the implementer is 0x51 and the subarch bits are 0x40.
On Mon, Feb 25, 2013 at 07:01:11PM -0800, Stephen Boyd wrote: > On 02/25/13 03:18, Will Deacon wrote: > > On Fri, Feb 22, 2013 at 11:46:18PM +0000, Stephen Boyd wrote: > >> On 2/22/2013 10:27 AM, Will Deacon wrote: > >>> What value do you have in fpsid? As far as I can tell, the > >>> subarchitecture bits 6:0 should start at 0x40 for you, right? > >> Yes it does. > > Ok, good. Could you share the different subarchitecture encodings that you > > have please? (assumedly some/all of these are compatible with a variant of > > VFP). > > Definitely all Krait processors have 0x40 for the subarchitecture > encoding. I need to check our Scorpions but I'm fairly certain they also > have 0x40. > > > > >>> I can see cases for changing this code, I just don't see why it would go > >>> wrong in the case you're describing. > >> VFP_arch = (vfpsid & FPSID_ARCH_MASK) >> FPSID_ARCH_BIT; > >> > >> causes VFP_arch to be equal to 0 because 0x40 & 0xf == 0. > >> > >> and then a little bit later we have > >> > >> if (VFP_arch >= 2) { > >> elf_hwcap |= HWCAP_VFPv3; > >> > >> > >> The branch is not taken so we never set VFPv3. > > Ah, that's what I feared: the low bits are zero yet you are compatible with > > VFPv3. That's fine, but the proposed fix feels like a kludge; the only reason > > we'd choose on VFPv3 is because the implementor is not ARM, which may not hold > > true for other vendors. I think it would be better if we translated > > vendor-specific subarchitectures that are compatible with VFPvN into the > > corresponding architecture number instead. This would also allow us to add > > extra hwcaps for extensions other than VFP. > > Ok. We should be able to make VFP_arch into 0x4 if the implementer is > 0x51 and the subarch bits are 0x40. What I actually need from you is: for the Qualcomm implementation, what are the subarch bits defined as, and what do they correspond with - both the VFP version, and whether they correspond with any ARM common VFP subarchitecture version. The VFP version defines what the user-visible architecture of the VFP looks like. The common VFP subarchitecture version partly defines the behaviour of the interface between the VFP hardware and the support code. In ARM land, these are the possiblities - I've also listed those platforms which I definitely know of at the moment which use the particular version combination: VFP version VFP subarch V1 - V2 V1 Raspberry Pi V3 V2 Marvell Dove (Cubox) (though, not ARM) V3 NULL OMAP3430 / OMAP4430 V3 V3 There is also mooted to be a VFPv4... Now, we detect VFPv4 via testing for the "fused multiply accumulate" instructions, and flag that to userspace. These are the VFMA, VFMS, VFNMA, and VFNMS instructions. HOWEVER: we do not implement these in the support code, so should these ever get bounced, we will fail to deal with them correctly. So VFPv4 should not be flagged as being implemented yet. Not only that, but VFPv4 introduces the half-precision extension as mandatory - which the support code doesn't support. Also... there seems to be a variant of VFPv3 with half-precision support... which the support code doesn't support either. And finally we get into the issues surrounding trapping/nontrapping implementations - nontrapping implementations are ones where (for example) a floating point divide by zero can't raise a SIGFPE... Last comment to make: this evening I'm beginning to wonder whether I've made a messup with the VFP support code: if we get a bounce due to an unmasked trap, we perform the operation in software and store the result. I don't think this is what's intended from the support code. Problem - the OMAP platforms are nontrapping VFPv3 implementations which can't have their trap enable bits set, so I can't check this there. Dove does, but I don't use that as too much of a devel platform at the moment... and I don't have a RPi that I can build and boot kernels for (the one I've been experimenting with is someone elses, the other end of the country, who is not a software guy...)
On 02/25/13 12:02, Russell King - ARM Linux wrote: > > This can of worms is getting bigger. We have more problems with our > handling of the different VFP versions, specifically the handling of > the EX=0 DEX=0 case. > > VFP common subarch 3 defines the EX=0, DEX=0 encoding to mean one of > the following conditions have been met: > > 1. an unallocated VFP instruction was encountered. > > In other words, the VFP was the target of the co-processor instruction, > but the instruction is not a known VFP instruction encoding. This > should raise an undefined instruction exception. > > 2. an allocated VFP instruction was encountered, but not handled in > hardware. > > In other words, the instruction is a valid VFP instruction, but the > hardware has opted not to implement this instruction and wants > software to emulate it instead. > > (Note: this can also be raised as EX=0, DEX=1 - implementation > defined!) > [snip] > > So, if EX or DEX is set, _or_ IXE is set, we pass control to VFP_bounce. > This is problematical. > > (a) condition (2) above isn't correctly handled for common subarch v3 - it > is always treated as an undefined instruction, and will result in a > SIGILL being delivered. > [snip] > > Now, (a) is just bad behaviour - as we haven't had any reports of this > yet, I suspect that no one has implemented VFP hardware with this > behaviour yet. I believe we ran into this a while ago and fixed it for our chips. We never sent the patch upstream. Sorry. https://www.codeaurora.org/gitweb/quic/la/?p=kernel/msm.git;a=commitdiff;h=00a13be874f230159a6b7f8cc9d0ff23bc1b7d05 I'm looking into what our bits correspond to. Hopefully get back to you in 20 something hours.
On Tue, Feb 26, 2013 at 05:37:17PM -0800, Stephen Boyd wrote: > On 02/25/13 12:02, Russell King - ARM Linux wrote: > > > > This can of worms is getting bigger. We have more problems with our > > handling of the different VFP versions, specifically the handling of > > the EX=0 DEX=0 case. > > > > VFP common subarch 3 defines the EX=0, DEX=0 encoding to mean one of > > the following conditions have been met: > > > > 1. an unallocated VFP instruction was encountered. > > > > In other words, the VFP was the target of the co-processor instruction, > > but the instruction is not a known VFP instruction encoding. This > > should raise an undefined instruction exception. > > > > 2. an allocated VFP instruction was encountered, but not handled in > > hardware. > > > > In other words, the instruction is a valid VFP instruction, but the > > hardware has opted not to implement this instruction and wants > > software to emulate it instead. > > > > (Note: this can also be raised as EX=0, DEX=1 - implementation > > defined!) > > > [snip] > > > > So, if EX or DEX is set, _or_ IXE is set, we pass control to VFP_bounce. > > This is problematical. > > > > (a) condition (2) above isn't correctly handled for common subarch v3 - it > > is always treated as an undefined instruction, and will result in a > > SIGILL being delivered. > > > [snip] > > > > Now, (a) is just bad behaviour - as we haven't had any reports of this > > yet, I suspect that no one has implemented VFP hardware with this > > behaviour yet. > > I believe we ran into this a while ago and fixed it for our chips. We > never sent the patch upstream. Sorry. > > https://www.codeaurora.org/gitweb/quic/la/?p=kernel/msm.git;a=commitdiff;h=00a13be874f230159a6b7f8cc9d0ff23bc1b7d05 Yes, it looks like you did - because your short vector instructions are "allocated VFP instruction"s and your hardware response is to raise an exception with EX=0 DEX=0. As you've found out, with the VFPv2 exception handling that we have, that is interpreted as an undefined instruction, rather than an instruction which needs fixing up.
On 02/26/13 09:54, Russell King - ARM Linux wrote: > What I actually need from you is: for the Qualcomm implementation, what > are the subarch bits defined as, and what do they correspond with - both > the VFP version, and whether they correspond with any ARM common VFP > subarchitecture version. For the Qualcomm implementation, the subarch bits have always been 0x40 so far. And they are compatible with "VFPv3 architecture or later with common VFP subarchitecture v3". In your table below it would be the last row. > > In ARM land, these are the possiblities - I've also listed those > platforms which I definitely know of at the moment which use the > particular version combination: > > VFP version VFP subarch > V1 - > V2 V1 Raspberry Pi > V3 V2 Marvell Dove (Cubox) (though, not ARM) > V3 NULL OMAP3430 / OMAP4430 > V3 V3
diff --git a/arch/arm/include/asm/vfp.h b/arch/arm/include/asm/vfp.h index f4ab34f..76d3f69 100644 --- a/arch/arm/include/asm/vfp.h +++ b/arch/arm/include/asm/vfp.h @@ -21,7 +21,7 @@ #define FPSID_FORMAT_MASK (0x3 << FPSID_FORMAT_BIT) #define FPSID_NODOUBLE (1<<20) #define FPSID_ARCH_BIT (16) -#define FPSID_ARCH_MASK (0xF << FPSID_ARCH_BIT) +#define FPSID_ARCH_MASK (0x7F << FPSID_ARCH_BIT) #define FPSID_PART_BIT (8) #define FPSID_PART_MASK (0xFF << FPSID_PART_BIT) #define FPSID_VARIANT_BIT (4)