Message ID | alpine.LFD.2.20.1512110105450.26920@knanqh.ubzr (mailing list archive) |
---|---|
State | Not Applicable, archived |
Delegated to: | Andy Gross |
Headers | show |
On Friday 11 December 2015 01:26:25 Nicolas Pitre wrote: > This is my counter proposal to Stephen's version. Those patches and the > discussion that followed are available here: > > http://news.gmane.org/group/gmane.linux.kbuild.devel/thread=14007 > > I think what I propose here is much simpler and less intrusive. Going > to the full call site patching provides between moderate to no > additional performance benefits depending on the CPU core. That should > probably be compared with this solution with benchmark results to > determine if it is worth it. Looks really nice, as it's much simpler than Stephen's approach > Of course the ultimate performance will come from having gcc emit those > div instructions directly, but that would make the kernel binary > incompatible with some systems in a multi-arch config. Hence it has to > be a separate config option. Well, what we found is that we already have the same problem today when enabling LPAE makes the kernel incompatible with platforms that can be enabled. The affected platforms are basically the same, except for the earlier PJ4 and Krait variants that have some idiv support but no LPAE. > diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h > index 85e374f873..48c77d422a 100644 > --- a/arch/arm/include/asm/cputype.h > +++ b/arch/arm/include/asm/cputype.h > @@ -250,8 +250,14 @@ static inline int cpu_is_pj4(void) > > return 0; > } > + > +static inline bool __attribute_const__ cpu_is_pj4_nomp(void) > +{ > + return read_cpuid_part() == 0x56005810; > +} > #else > -#define cpu_is_pj4() 0 > +#define cpu_is_pj4() 0 > +#define cpu_is_pj4_nomp() 0 > #endif It would be nice to use a symbolic constant for the above, and maybe define all three known cpuid numbers for PJ4 variants. I've looked at it some more and I now have doubts about this code: #ifdef CONFIG_CPU_PJ4B .type __v7_pj4b_proc_info, #object __v7_pj4b_proc_info: .long 0x560f5800 .long 0xff0fff00 __v7_proc __v7_pj4b_proc_info, __v7_pj4b_setup, proc_fns = pj4b_processor_functions .size __v7_pj4b_proc_info, . - __v7_pj4b_proc_info #endif Can someone have a look and tell me that I'm wrong when I read this as matching both PJ4 and PJ4B (and PJ4B-MP)? Either I'm misreading this, or we do the wrong thing in configurations that include both PJ4B (berlin, mvebu) and PJ4 (MMP2/dove). > +#ifdef CONFIG_ARM_PATCH_IDIV > + > +/* "sdiv r0, r0, r1" or "mrc p6, 1, r0, CR0, CR1, 4" if we're on pj4 w/o MP */ > +static inline u32 __attribute_const__ sdiv_instruction(void) > +{ > + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) { > + u32 insn = __opcode_thumb32_compose(0xfb90, 0xf0f1); > + if (cpu_is_pj4_nomp()) > + insn = __opcode_thumb32_compose(0xee30, 0x0691); > + return __opcode_to_mem_thumb32(insn); > + } Strictly speaking, we can ignore the thumb instruction for pj4, as all variants also support the normal T2 opcode for udiv/sdiv. > diff --git a/arch/arm/lib/lib1funcs.S b/arch/arm/lib/lib1funcs.S > index af2267f6a5..9397b2e532 100644 > --- a/arch/arm/lib/lib1funcs.S > +++ b/arch/arm/lib/lib1funcs.S > @@ -205,6 +205,10 @@ Boston, MA 02111-1307, USA. */ > .endm > > > +#ifdef CONFIG_ARM_PATCH_IDIV > + .align 3 > +#endif > + > ENTRY(__udivsi3) > ENTRY(__aeabi_uidiv) > UNWIND(.fnstart) I'd probably just leave out the #ifdef and always align this, but it's not important. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 11 Dec 2015, Arnd Bergmann wrote: > On Friday 11 December 2015 01:26:25 Nicolas Pitre wrote: > > > This is my counter proposal to Stephen's version. Those patches and the > > discussion that followed are available here: > > > > http://news.gmane.org/group/gmane.linux.kbuild.devel/thread=14007 > > > > I think what I propose here is much simpler and less intrusive. Going > > to the full call site patching provides between moderate to no > > additional performance benefits depending on the CPU core. That should > > probably be compared with this solution with benchmark results to > > determine if it is worth it. > > Looks really nice, as it's much simpler than Stephen's approach > > > Of course the ultimate performance will come from having gcc emit those > > div instructions directly, but that would make the kernel binary > > incompatible with some systems in a multi-arch config. Hence it has to > > be a separate config option. > > Well, what we found is that we already have the same problem today > when enabling LPAE makes the kernel incompatible with platforms that > can be enabled. The affected platforms are basically the same, except > for the earlier PJ4 and Krait variants that have some idiv support > but no LPAE. Still, making native div support depend on CONFIG_LPAE would be strange. A separate config symbol would mmake it self explanatory. > > diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h > > index 85e374f873..48c77d422a 100644 > > --- a/arch/arm/include/asm/cputype.h > > +++ b/arch/arm/include/asm/cputype.h > > @@ -250,8 +250,14 @@ static inline int cpu_is_pj4(void) > > > > return 0; > > } > > + > > +static inline bool __attribute_const__ cpu_is_pj4_nomp(void) > > +{ > > + return read_cpuid_part() == 0x56005810; > > +} > > #else > > -#define cpu_is_pj4() 0 > > +#define cpu_is_pj4() 0 > > +#define cpu_is_pj4_nomp() 0 > > #endif > > It would be nice to use a symbolic constant for the above, and maybe define > all three known cpuid numbers for PJ4 variants. > > I've looked at it some more and I now have doubts about this code: > > #ifdef CONFIG_CPU_PJ4B > .type __v7_pj4b_proc_info, #object > __v7_pj4b_proc_info: > .long 0x560f5800 > .long 0xff0fff00 > __v7_proc __v7_pj4b_proc_info, __v7_pj4b_setup, proc_fns = pj4b_processor_functions > .size __v7_pj4b_proc_info, . - __v7_pj4b_proc_info > #endif > > > Can someone have a look and tell me that I'm wrong when I read this > as matching both PJ4 and PJ4B (and PJ4B-MP)? > > Either I'm misreading this, or we do the wrong thing in configurations > that include both PJ4B (berlin, mvebu) and PJ4 (MMP2/dove). I don't have the relevant documentation to validate it. And I'd prefer if this was sorted out in a separate patch. Maybe I should just drop the PJ4 variants from this patch for now. > > +#ifdef CONFIG_ARM_PATCH_IDIV > > + > > +/* "sdiv r0, r0, r1" or "mrc p6, 1, r0, CR0, CR1, 4" if we're on pj4 w/o MP */ > > +static inline u32 __attribute_const__ sdiv_instruction(void) > > +{ > > + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) { > > + u32 insn = __opcode_thumb32_compose(0xfb90, 0xf0f1); > > + if (cpu_is_pj4_nomp()) > > + insn = __opcode_thumb32_compose(0xee30, 0x0691); > > + return __opcode_to_mem_thumb32(insn); > > + } > > Strictly speaking, we can ignore the thumb instruction for pj4, as all > variants also support the normal T2 opcode for udiv/sdiv. OK it is gone. Now... The code bails out if ARM mode and HWCAP_IDIVA is not set so effectively the ARM mode on PJ4 is currently not used either. So I'm leaning towards having aving another patch to sort PJ4 out. > > diff --git a/arch/arm/lib/lib1funcs.S b/arch/arm/lib/lib1funcs.S > > index af2267f6a5..9397b2e532 100644 > > --- a/arch/arm/lib/lib1funcs.S > > +++ b/arch/arm/lib/lib1funcs.S > > @@ -205,6 +205,10 @@ Boston, MA 02111-1307, USA. */ > > .endm > > > > > > +#ifdef CONFIG_ARM_PATCH_IDIV > > + .align 3 > > +#endif > > + > > ENTRY(__udivsi3) > > ENTRY(__aeabi_uidiv) > > UNWIND(.fnstart) > > I'd probably just leave out the #ifdef and always align this, but it's not > important. This serves as "documentation". The alignment is important when CONFIG_ARM_PATCH_IDIV is selected and not otherwise. Nicolas -- To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday 11 December 2015 12:10:29 Nicolas Pitre wrote: > On Fri, 11 Dec 2015, Arnd Bergmann wrote: > > On Friday 11 December 2015 01:26:25 Nicolas Pitre wrote: > > > Of course the ultimate performance will come from having gcc emit those > > > div instructions directly, but that would make the kernel binary > > > incompatible with some systems in a multi-arch config. Hence it has to > > > be a separate config option. > > > > Well, what we found is that we already have the same problem today > > when enabling LPAE makes the kernel incompatible with platforms that > > can be enabled. The affected platforms are basically the same, except > > for the earlier PJ4 and Krait variants that have some idiv support > > but no LPAE. > > Still, making native div support depend on CONFIG_LPAE would be strange. > A separate config symbol would mmake it self explanatory. Based on the discussion we had, I'd use CONFIG_CPU_V7VE or some variation of that: * Almost all early ARMv7 cpu cores (Cortex-A8/A9/A5, Scorpion, PJ4, PJ4B) are not V7VE, and they support neither LPAE nor IDIV * All V7VE cores (Cortex-A7/A12/A15/A17, B15) by definition support both LPAE and IDIV. Krait400 and PJ4B-MP/PJ4C are not V7VE because they don't support virtualization, but for our purposes they are close enough that we want to build them with -march=armv7ve on toolchains that allow this. * Most Krait (pre-Krait400) don't support LPAE but do support IDIV. However, with the latest change to mach-qcom/Kconfig we no longer care because we don't have separate Kconfig symbols per SoC any more and we can build a kernel for QCOM/MSM with either V7 or V7VE and with either LPAE or not, and we get what the user asked for. * PJ4/PJ4B can be treated special when building a thumb2 kernel, we already have a Kconfig option for it that we could extend, but I'd just use your solution for this, it's good enough and it handles the special opcodes for ARM mode transparently (at least when you do a follow-up patch). We haven't solved the question how the new Kconfig option fits in, but I still think we should try to get this right, especially because the LPAE handling is currently suboptimal in the way that you can enable LPAE on any ARMv7 platform, whether that supports LPAE or not. > > #ifdef CONFIG_CPU_PJ4B > > .type __v7_pj4b_proc_info, #object > > __v7_pj4b_proc_info: > > .long 0x560f5800 > > .long 0xff0fff00 > > __v7_proc __v7_pj4b_proc_info, __v7_pj4b_setup, proc_fns = pj4b_processor_functions > > .size __v7_pj4b_proc_info, . - __v7_pj4b_proc_info > > #endif > > > > > > Can someone have a look and tell me that I'm wrong when I read this > > as matching both PJ4 and PJ4B (and PJ4B-MP)? > > > > Either I'm misreading this, or we do the wrong thing in configurations > > that include both PJ4B (berlin, mvebu) and PJ4 (MMP2/dove). > > I don't have the relevant documentation to validate it. And I'd prefer > if this was sorted out in a separate patch. Maybe I should just drop > the PJ4 variants from this patch for now. To clarify: that point had nothing to do with your patch, I just think I found an existing kernel bug that will cause pj4b_processor_functions to be used on PJ4 (Dove, MMP2) in a kernel that includes both PJ4 and PJ4B (Armada 370/XP, Berlin). > > > +#ifdef CONFIG_ARM_PATCH_IDIV > > > + > > > +/* "sdiv r0, r0, r1" or "mrc p6, 1, r0, CR0, CR1, 4" if we're on pj4 w/o MP */ > > > +static inline u32 __attribute_const__ sdiv_instruction(void) > > > +{ > > > + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) { > > > + u32 insn = __opcode_thumb32_compose(0xfb90, 0xf0f1); > > > + if (cpu_is_pj4_nomp()) > > > + insn = __opcode_thumb32_compose(0xee30, 0x0691); > > > + return __opcode_to_mem_thumb32(insn); > > > + } > > > > Strictly speaking, we can ignore the thumb instruction for pj4, as all > > variants also support the normal T2 opcode for udiv/sdiv. > > OK it is gone. > > Now... The code bails out if ARM mode and HWCAP_IDIVA is not set so > effectively the ARM mode on PJ4 is currently not used either. So I'm > leaning towards having aving another patch to sort PJ4 out. Sounds good. > > > diff --git a/arch/arm/lib/lib1funcs.S b/arch/arm/lib/lib1funcs.S > > > index af2267f6a5..9397b2e532 100644 > > > --- a/arch/arm/lib/lib1funcs.S > > > +++ b/arch/arm/lib/lib1funcs.S > > > @@ -205,6 +205,10 @@ Boston, MA 02111-1307, USA. */ > > > .endm > > > > > > > > > +#ifdef CONFIG_ARM_PATCH_IDIV > > > + .align 3 > > > +#endif > > > + > > > ENTRY(__udivsi3) > > > ENTRY(__aeabi_uidiv) > > > UNWIND(.fnstart) > > > > I'd probably just leave out the #ifdef and always align this, but it's not > > important. > > This serves as "documentation". The alignment is important when > CONFIG_ARM_PATCH_IDIV is selected and not otherwise. Fair enough. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 11 Dec 2015, Arnd Bergmann wrote: > On Friday 11 December 2015 12:10:29 Nicolas Pitre wrote: > > On Fri, 11 Dec 2015, Arnd Bergmann wrote: > > > On Friday 11 December 2015 01:26:25 Nicolas Pitre wrote: > > > > Of course the ultimate performance will come from having gcc emit those > > > > div instructions directly, but that would make the kernel binary > > > > incompatible with some systems in a multi-arch config. Hence it has to > > > > be a separate config option. > > > > > > Well, what we found is that we already have the same problem today > > > when enabling LPAE makes the kernel incompatible with platforms that > > > can be enabled. The affected platforms are basically the same, except > > > for the earlier PJ4 and Krait variants that have some idiv support > > > but no LPAE. > > > > Still, making native div support depend on CONFIG_LPAE would be strange. > > A separate config symbol would mmake it self explanatory. > > Based on the discussion we had, I'd use CONFIG_CPU_V7VE or some variation > of that: > > * Almost all early ARMv7 cpu cores (Cortex-A8/A9/A5, Scorpion, PJ4, PJ4B) are not > V7VE, and they support neither LPAE nor IDIV > > * All V7VE cores (Cortex-A7/A12/A15/A17, B15) by definition support both LPAE > and IDIV. Krait400 and PJ4B-MP/PJ4C are not V7VE because they don't support > virtualization, but for our purposes they are close enough that we want to > build them with -march=armv7ve on toolchains that allow this. > > * Most Krait (pre-Krait400) don't support LPAE but do support IDIV. However, > with the latest change to mach-qcom/Kconfig we no longer care because we > don't have separate Kconfig symbols per SoC any more and we can build a > kernel for QCOM/MSM with either V7 or V7VE and with either LPAE or not, and > we get what the user asked for. > > * PJ4/PJ4B can be treated special when building a thumb2 kernel, we already > have a Kconfig option for it that we could extend, but I'd just use your > solution for this, it's good enough and it handles the special opcodes > for ARM mode transparently (at least when you do a follow-up patch). > > We haven't solved the question how the new Kconfig option fits in, but > I still think we should try to get this right, especially because the LPAE > handling is currently suboptimal in the way that you can enable LPAE on > any ARMv7 platform, whether that supports LPAE or not. You seem to have a good grasp of the problem space. I'd suggest you make a patch! ;-) I already have a pending patch optimizing __do_div64 with udiv. I'm waiting for the config option to be in place. > > > #ifdef CONFIG_CPU_PJ4B > > > .type __v7_pj4b_proc_info, #object > > > __v7_pj4b_proc_info: > > > .long 0x560f5800 > > > .long 0xff0fff00 > > > __v7_proc __v7_pj4b_proc_info, __v7_pj4b_setup, proc_fns = pj4b_processor_functions > > > .size __v7_pj4b_proc_info, . - __v7_pj4b_proc_info > > > #endif > > > > > > > > > Can someone have a look and tell me that I'm wrong when I read this > > > as matching both PJ4 and PJ4B (and PJ4B-MP)? > > > > > > Either I'm misreading this, or we do the wrong thing in configurations > > > that include both PJ4B (berlin, mvebu) and PJ4 (MMP2/dove). > > > > I don't have the relevant documentation to validate it. And I'd prefer > > if this was sorted out in a separate patch. Maybe I should just drop > > the PJ4 variants from this patch for now. > > To clarify: that point had nothing to do with your patch, I just think > I found an existing kernel bug that will cause pj4b_processor_functions > to be used on PJ4 (Dove, MMP2) in a kernel that includes both PJ4 and > PJ4B (Armada 370/XP, Berlin). OK. I'd suggest starting another thread about this to get the right people's attention. > > This serves as "documentation". The alignment is important when > > CONFIG_ARM_PATCH_IDIV is selected and not otherwise. > > Fair enough. I tested my patch with both ARM and Thumb kernels, also dumping kernel memory to be sure the right instructions were in place. If someone is willing to ACK v2 of my patch I'll put it in the patch system. Nicolas -- To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Dec 11, 2015 at 10:50:02PM +0100, Arnd Bergmann wrote: > On Friday 11 December 2015 12:10:29 Nicolas Pitre wrote: > > On Fri, 11 Dec 2015, Arnd Bergmann wrote: > > > #ifdef CONFIG_CPU_PJ4B > > > .type __v7_pj4b_proc_info, #object > > > __v7_pj4b_proc_info: > > > .long 0x560f5800 > > > .long 0xff0fff00 > > > __v7_proc __v7_pj4b_proc_info, __v7_pj4b_setup, proc_fns = pj4b_processor_functions > > > .size __v7_pj4b_proc_info, . - __v7_pj4b_proc_info > > > #endif > > > > > > > > > Can someone have a look and tell me that I'm wrong when I read this > > > as matching both PJ4 and PJ4B (and PJ4B-MP)? > > > > > > Either I'm misreading this, or we do the wrong thing in configurations > > > that include both PJ4B (berlin, mvebu) and PJ4 (MMP2/dove). > > > > I don't have the relevant documentation to validate it. And I'd prefer > > if this was sorted out in a separate patch. Maybe I should just drop > > the PJ4 variants from this patch for now. > > To clarify: that point had nothing to do with your patch, I just think > I found an existing kernel bug that will cause pj4b_processor_functions > to be used on PJ4 (Dove, MMP2) in a kernel that includes both PJ4 and > PJ4B (Armada 370/XP, Berlin). It does look like it will end up matching a PJ4 CPU against the PJ4B entry if it's enabled. I think the question that needs to be asked is why the mask is soo loose, and the past history gives us some information. The loosening of the mask was done by Gregory Clement a couple of years ago: ARM: 7754/1: Fix the CPU ID and the mask associated to the PJ4B This commit fixes the ID and mask for the PJ4B which was too restrictive and didn't match the CPU of the Armada 370 SoC. __v7_pj4b_proc_info: - .long 0x562f5840 - .long 0xfffffff0 + .long 0x560f5800 + .long 0xff0fff00 So it was to include Armada 370. So this now brings up the question... what is the MIDR value used in Armada 370?
On Friday 11 December 2015 17:00:50 Nicolas Pitre wrote: > You seem to have a good grasp of the problem space. I'd suggest you make > a patch! ;-) Yes, I can do that once all the ARMv6/v7 multiplatform work is done, that should at least reduce the number of special cases. > > > > #ifdef CONFIG_CPU_PJ4B > > > > .type __v7_pj4b_proc_info, #object > > > > __v7_pj4b_proc_info: > > > > .long 0x560f5800 > > > > .long 0xff0fff00 > > > > __v7_proc __v7_pj4b_proc_info, __v7_pj4b_setup, proc_fns = pj4b_processor_functions > > > > .size __v7_pj4b_proc_info, . - __v7_pj4b_proc_info > > > > #endif > > > > > > > > > > > > Can someone have a look and tell me that I'm wrong when I read this > > > > as matching both PJ4 and PJ4B (and PJ4B-MP)? > > > > > > > > Either I'm misreading this, or we do the wrong thing in configurations > > > > that include both PJ4B (berlin, mvebu) and PJ4 (MMP2/dove). > > > > > > I don't have the relevant documentation to validate it. And I'd prefer > > > if this was sorted out in a separate patch. Maybe I should just drop > > > the PJ4 variants from this patch for now. > > > > To clarify: that point had nothing to do with your patch, I just think > > I found an existing kernel bug that will cause pj4b_processor_functions > > to be used on PJ4 (Dove, MMP2) in a kernel that includes both PJ4 and > > PJ4B (Armada 370/XP, Berlin). > > OK. I'd suggest starting another thread about this to get the right > people's attention. I've done a preliminary patch now, will send that out. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday 11 December 2015 22:34:16 Russell King - ARM Linux wrote: > > __v7_pj4b_proc_info: > - .long 0x562f5840 > - .long 0xfffffff0 > + .long 0x560f5800 > + .long 0xff0fff00 > > So it was to include Armada 370. So this now brings up the question... > what is the MIDR value used in Armada 370? I've listed them in an earlier thread, here is the list again: variant part revision name features mmp2: 0 0x581 5 PJ4 idivt dove: 0 0x581 5 PJ4 idivt Armada 370 1 0x581 1 PJ4B idivt mmp3: 2 0x584 2 PJ4-MP idiva idivt lpae Armada XP 2 0x584 2 PJ4-MP idiva idivt lpae Berlin 2 0x584 2 PJ4-MP idiva idivt lpae So the original table was wrong because it failed to include PJ4B (Armada 370), but the current version is wrong, because it also includes PJ4 (Dove and MMP2). Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 11 Dec 2015, Arnd Bergmann wrote: > On Friday 11 December 2015 22:34:16 Russell King - ARM Linux wrote: > > > > __v7_pj4b_proc_info: > > - .long 0x562f5840 > > - .long 0xfffffff0 > > + .long 0x560f5800 > > + .long 0xff0fff00 > > > > So it was to include Armada 370. So this now brings up the question... > > what is the MIDR value used in Armada 370? > > I've listed them in an earlier thread, here is the list again: > > variant part revision name features > mmp2: 0 0x581 5 PJ4 idivt > dove: 0 0x581 5 PJ4 idivt > Armada 370 1 0x581 1 PJ4B idivt > mmp3: 2 0x584 2 PJ4-MP idiva idivt lpae > Armada XP 2 0x584 2 PJ4-MP idiva idivt lpae > Berlin 2 0x584 2 PJ4-MP idiva idivt lpae > > So the original table was wrong because it failed to include PJ4B (Armada 370), > but the current version is wrong, because it also includes PJ4 (Dove and MMP2). I'd suggest you add the above table and conclusion to the commit log for your proposed fix. Next time the question comes up the info will be right there. Nicolas -- To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 34e1569a11..efea5fa975 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1604,6 +1604,21 @@ config THUMB2_AVOID_R_ARM_THM_JUMP11 config ARM_ASM_UNIFIED bool +config ARM_PATCH_IDIV + bool "Runtime patch udiv/sdiv instructions into __aeabi_{u}idiv()" + depends on CPU_32v7 && !XIP_KERNEL + default y + help + Some v7 CPUs have support for the udiv and sdiv instructions + that can be used to implement the __aeabi_uidiv and __aeabi_idiv + functions provided by the ARM runtime ABI. + + Enabling this option allows the kernel to modify itself to replace + the first two instructions of these library functions with the + udiv or sdiv plus "bx lr" instructions. Typically this will be + faster and less power intensive than running the original library + support code to do integer division. + config AEABI bool "Use the ARM EABI to compile the kernel" help diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h index 85e374f873..48c77d422a 100644 --- a/arch/arm/include/asm/cputype.h +++ b/arch/arm/include/asm/cputype.h @@ -250,8 +250,14 @@ static inline int cpu_is_pj4(void) return 0; } + +static inline bool __attribute_const__ cpu_is_pj4_nomp(void) +{ + return read_cpuid_part() == 0x56005810; +} #else -#define cpu_is_pj4() 0 +#define cpu_is_pj4() 0 +#define cpu_is_pj4_nomp() 0 #endif static inline int __attribute_const__ cpuid_feature_extract_field(u32 features, diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 20edd349d3..d8614775a8 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -375,6 +375,77 @@ void __init early_print(const char *str, ...) printk("%s", buf); } +#ifdef CONFIG_ARM_PATCH_IDIV + +/* "sdiv r0, r0, r1" or "mrc p6, 1, r0, CR0, CR1, 4" if we're on pj4 w/o MP */ +static inline u32 __attribute_const__ sdiv_instruction(void) +{ + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) { + u32 insn = __opcode_thumb32_compose(0xfb90, 0xf0f1); + if (cpu_is_pj4_nomp()) + insn = __opcode_thumb32_compose(0xee30, 0x0691); + return __opcode_to_mem_thumb32(insn); + } + + if (cpu_is_pj4_nomp()) + return __opcode_to_mem_arm(0xee300691); + return __opcode_to_mem_arm(0xe710f110); +} + +/* "udiv r0, r0, r1" or "mrc p6, 1, r0, CR0, CR1, 0" if we're on pj4 w/o MP */ +static inline u32 __attribute_const__ udiv_instruction(void) +{ + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) { + u32 insn = __opcode_thumb32_compose(0xfbb0, 0xf0f1); + if (cpu_is_pj4_nomp()) + insn = __opcode_thumb32_compose(0xee30, 0x0611); + return __opcode_to_mem_thumb32(insn); + } + + if (cpu_is_pj4_nomp()) + return __opcode_to_mem_arm(0xee300611); + return __opcode_to_mem_arm(0xe730f110); +} + +/* "bx lr" */ +static inline u32 __attribute_const__ bx_lr_instruction(void) +{ + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) { + u32 insn = __opcode_thumb32_compose(0x4770, 0x46c0); + return __opcode_to_mem_thumb32(insn); + } + + return __opcode_to_mem_arm(0xe12fff1e); +} + +static void __init patch_aeabi_uidiv(void) +{ + extern void __aeabi_uidiv(void); + extern void __aeabi_idiv(void); + uintptr_t fn_addr; + unsigned int mask; + + mask = IS_ENABLED(CONFIG_THUMB2_KERNEL) ? HWCAP_IDIVT : HWCAP_IDIVA; + if (!(elf_hwcap & mask)) + return; + + pr_info("CPU: div instructions available: patching division code\n"); + + fn_addr = ((uintptr_t)&__aeabi_uidiv) & ~1; + ((u32 *)fn_addr)[0] = udiv_instruction(); + ((u32 *)fn_addr)[1] = bx_lr_instruction(); + flush_icache_range(fn_addr, fn_addr + 8); + + fn_addr = ((uintptr_t)&__aeabi_idiv) & ~1; + ((u32 *)fn_addr)[0] = sdiv_instruction(); + ((u32 *)fn_addr)[1] = bx_lr_instruction(); + flush_icache_range(fn_addr, fn_addr + 8); +} + +#else +static inline void patch_aeabi_uidiv(void) { } +#endif + static void __init cpuid_init_hwcaps(void) { int block; @@ -642,6 +713,7 @@ static void __init setup_processor(void) elf_hwcap = list->elf_hwcap; cpuid_init_hwcaps(); + patch_aeabi_uidiv(); #ifndef CONFIG_ARM_THUMB elf_hwcap &= ~(HWCAP_THUMB | HWCAP_IDIVT); diff --git a/arch/arm/lib/lib1funcs.S b/arch/arm/lib/lib1funcs.S index af2267f6a5..9397b2e532 100644 --- a/arch/arm/lib/lib1funcs.S +++ b/arch/arm/lib/lib1funcs.S @@ -205,6 +205,10 @@ Boston, MA 02111-1307, USA. */ .endm +#ifdef CONFIG_ARM_PATCH_IDIV + .align 3 +#endif + ENTRY(__udivsi3) ENTRY(__aeabi_uidiv) UNWIND(.fnstart) @@ -253,6 +257,10 @@ UNWIND(.fnstart) UNWIND(.fnend) ENDPROC(__umodsi3) +#ifdef CONFIG_ARM_PATCH_IDIV + .align 3 +#endif + ENTRY(__divsi3) ENTRY(__aeabi_idiv) UNWIND(.fnstart)
The ARM compiler inserts calls to __aeabi_uidiv() and __aeabi_idiv() when it needs to perform division on signed and unsigned integers. If a processor has support for the udiv and sdiv instructions, the kernel may overwrite the beginning of those functions with those instructions and a "bx lr" to get better performance. To ensure those functions are aligned to a 32-bit word for easier patching (which might not always be the case in Thumb mode) and the two patched instructions for each case are contained in the same cache line, a 8-byte alignment is enforced when ARM_PATCH_IDIV is configured. This was heavily inspired by a previous patch by Stephen Boyd. Signed-off-by: Nicolas Pitre <nico@linaro.org> --- This is my counter proposal to Stephen's version. Those patches and the discussion that followed are available here: http://news.gmane.org/group/gmane.linux.kbuild.devel/thread=14007 I think what I propose here is much simpler and less intrusive. Going to the full call site patching provides between moderate to no additional performance benefits depending on the CPU core. That should probably be compared with this solution with benchmark results to determine if it is worth it. Of course the ultimate performance will come from having gcc emit those div instructions directly, but that would make the kernel binary incompatible with some systems in a multi-arch config. Hence it has to be a separate config option. -- To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html