Message ID | 3214675.EzzC1Ail5Z@wuerfel (mailing list archive) |
---|---|
State | New, archived |
On Wed, 24 Aug 2016 09:41:39 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Wednesday, August 24, 2016 2:00:44 PM CEST Nicholas Piggin wrote:
> > On Tue, 23 Aug 2016 14:01:29 +0200
> > Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > > On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:
> > > > Erratum 657417 is worked around by the linker by inserting additional
> > > > branch trampolines to avoid problematic branch target locations. This
> > > > results in much higher linking time and presumably slower and larger
> > > > generated code. The workaround also seems to only be required when
> > > > linking thumb2 code, but the linker applies it for non-thumb2 code as
> > > > well.
> > > >
> > > > The workaround today is left to the linker to apply, which is overly
> > > > conservative.
> > > >
> > > > https://sourceware.org/ml/binutils/2009-05/msg00297.html
> > > >
> > > > This patch adds an option which defaults to "y" in cases where we
> > > > could possibly be running Cortex A8 and using Thumb2 instructions.
> > > > In reality the workaround might not be required at all for the kernel
> > > > if virtual instruction memory is linear in physical memory. However it
> > > > is more conservative to keep the workaround, and it may be the case
> > > > that the TLB lookup would be required in order to catch branches to
> > > > unmapped or no-execute pages.
> > > >
> > > > In an allyesconfig build, this workaround causes a large load on
> > > > the linker's branch stub hash and slows down the final link by a
> > > > factor of 5.
> > > >
> > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > >
> > >
> > > Thanks a lot for finding this issue. I can confirm that your patch
> > > helps noticeably in all configurations, reducing time for a relink
> > > from 18 minutes to 9 minutes on my machine in the best case, but
> > > the factor 10 slowdown of the final link with your thin archives
> > > and gc-sections patches remains.
> > >
> > > I suspect there is still something else going on besides the 657417
> > > slowing things down, but it's also possible that I'm doing something
> > > wrong here.
> >
> > Okay, I was only testing thin archives. gc-sections I didn't look at.
> > With thin archives, one final arm allyesconfig link with this patch is
> > not showing a regression. gc-sections must be causing something else
> > ARM specific, because powerpc seems to link fast with gc-sections.
>
> Ok, I see. For completeness, here are my results with thin archives and
> without gc-sections on ARM:

This is about what I saw.

>
> || no THUMB2, thin archives, no gc-sections, before: 144 seconds
>
> 09:29:51  LINK    vmlinux
> 09:29:51  AR      built-in.o
> 09:29:52  LD      vmlinux.o
> 09:30:12  MODPOST vmlinux.o
> 09:30:14  GEN     .version
> 09:30:14  CHK     include/generated/compile.h
>           UPD     include/generated/compile.h
> 09:30:14  CC      init/version.o
> 09:30:15  AR      init/built-in.o
> 09:30:43  KSYM    .tmp_kallsyms1.o
> 09:31:28  KSYM    .tmp_kallsyms2.o
> 09:31:40  LD      vmlinux
> 09:32:13  SORTEX  vmlinux
> 09:32:13  SYSMAP  System.map
> 09:32:15  OBJCOPY arch/arm/boot/Image
>
> || no THUMB2, thin archives, no gc-sections, after: 70 seconds
>
> 09:33:54  LINK    vmlinux
> 09:33:54  AR      built-in.o
> 09:33:55  LD      vmlinux.o
> 09:34:13  MODPOST vmlinux.o
> 09:34:15  GEN     .version
> 09:34:16  CHK     include/generated/compile.h
>           UPD     include/generated/compile.h
> 09:34:16  CC      init/version.o
> 09:34:16  AR      init/built-in.o
> 09:34:24  KSYM    .tmp_kallsyms1.o
> 09:34:43  KSYM    .tmp_kallsyms2.o
> 09:34:55  LD      vmlinux
> 09:35:03  SORTEX  vmlinux
> 09:35:03  SYSMAP  System.map
> 09:35:04  OBJCOPY arch/arm/boot/Image
>
> The final 'LD' step is much faster here as you also found, and now
> the time for the complete link is mainly the initial 'LD vmlinux.o'
> step, which did not get faster with your patch.

The info here isn't very good because KSYM is printed after the link
but before the kallsyms generation.
We can kind of see what's happening if we take the time between the
KSYM and the LD vmlinux as the time for kallsyms:

                           before   after
    KSYM                   ~12s     ~12s
    LD vmlinux.o           20s      18s
    LD .tmp_kallsyms1.o    28s      8s
    LD .tmp_kallsyms2.o    33s      7s
    LD vmlinux             33s      8s

Probably the cortex a8 workaround does not get applied to vmlinux.o
link (due to being incremental), so we don't see any speedup with the
patch. It takes longer overall I guess because it keeps a lot of
symbols in the output file (due to incremental).

> > Can you send your latest ARM patch to enable this and I'll have a look
> > at it?
>
> See below. I have not updated the patch description yet, but included
> the changes that Nico suggested. The test above used the same patch
> but left out the 'select LD_DEAD_CODE_DATA_ELIMINATION' line.

Thanks, I'll take a look.
--
To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
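The per-step estimate in the message above can be reproduced mechanically. The following is a minimal sketch, not something posted in the thread: it assumes the log format shown in the quoted build output (an `HH:MM:SS` prefix followed by the kbuild step) and reports the gap between each stamped line and the next one. The function name `step_durations` is made up for illustration.

```python
from datetime import datetime

# "before" log copied from the quoted message
# (no THUMB2, thin archives, no gc-sections).
BEFORE_LOG = """\
09:29:51 LINK vmlinux
09:29:51 AR built-in.o
09:29:52 LD vmlinux.o
09:30:12 MODPOST vmlinux.o
09:30:14 GEN .version
09:30:14 CHK include/generated/compile.h
UPD include/generated/compile.h
09:30:14 CC init/version.o
09:30:15 AR init/built-in.o
09:30:43 KSYM .tmp_kallsyms1.o
09:31:28 KSYM .tmp_kallsyms2.o
09:31:40 LD vmlinux
09:32:13 SORTEX vmlinux
09:32:13 SYSMAP System.map
09:32:15 OBJCOPY arch/arm/boot/Image
"""


def step_durations(log):
    """Return (step, seconds) pairs: time from each stamped line to the next."""
    stamped = []
    for line in log.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue
        try:
            when = datetime.strptime(parts[0], "%H:%M:%S")
        except ValueError:
            continue  # lines such as "UPD ..." carry no timestamp
        stamped.append((when, parts[1]))
    return [
        (step, int((nxt - when).total_seconds()))
        for (when, step), (nxt, _) in zip(stamped, stamped[1:])
    ]


if __name__ == "__main__":
    for step, seconds in step_durations(BEFORE_LOG):
        print(f"{seconds:4d}s after '{step}'")
```

Because a KSYM line is printed after the preceding link has finished, the 28 s gap that follows `AR init/built-in.o` is the first kallsyms link, and the 45 s gap that follows the first KSYM line is roughly 12 s of kallsyms generation plus the 33 s second link, which matches the table in the message.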
On Wed, 24 Aug 2016 19:05:30 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On Wed, 24 Aug 2016 09:41:39 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
> > > Can you send your latest ARM patch to enable this and I'll have a look
> > > at it?
> >
> > See below. I have not updated the patch description yet, but included
> > the changes that Nico suggested. The test above used the same patch
> > but left out the 'select LD_DEAD_CODE_DATA_ELIMINATION' line.
>
> Thanks, I'll take a look.

Okay, I can't reproduce your bad linking times even with gc-sections. It's
possible I'm doing something wrong, but with my patches + your patch and
standard arm allyesconfig:

    20:33:56  AR      built-in.o
    20:33:57  LD      vmlinux.o
              MODPOST vmlinux.o
    20:34:12  GEN     .version
              CHK     include/generated/compile.h
              UPD     include/generated/compile.h
              CC      init/version.o
              AR      init/built-in.o
    20:34:24  KSYM    .tmp_kallsyms1.o
    20:34:45  KSYM    .tmp_kallsyms2.o
    20:34:54  LD      vmlinux
    20:35:07  SORTEX  vmlinux
    20:35:07  SYSMAP  System.map

I have about 71 seconds for the final link phase.
Command is:

    make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- -j3 vmlinux

    $ arm-linux-gnueabi-ld -v
    GNU ld (GNU Binutils for Debian) 2.27

Confirming function/data sections:

    $ objdump -h built-in.o | grep ^[[:space:]]*[0-9][0-9]* | wc -l
    1012848

    $ size vmlinux
        text     data      bss       dec     hex filename
    74205771 36746855 19072744 130025370 7c0079a vmlinux

    $ file vmlinux
    vmlinux: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=37d207288f81f83291dfb45294a94c555d9fbdb8, not stripped

Final link command is:

    arm-linux-gnueabi-ld -EL --no-fix-cortex-a8 -p --no-undefined -X --pic-veneer --build-id --gc-sections -X -o vmlinux -T ./arch/arm/kernel/vmlinux.lds --whole-archive built-in.o .tmp_kallsyms2.o

Which takes 12s

I'm building THUMB2 now, but if you have any hints for me...

Thanks,
Nick
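The `objdump -h ... | grep ... | wc -l` check above counts section header rows, which is how the message confirms that per-function and per-data sections are present. A rough Python equivalent is sketched below. The sample text is a hand-written illustration of the `objdump -h` layout, not output from the build in this thread, and the helper names are made up.

```python
import re

# A section row starts with optional whitespace, an index, then the name.
SECTION_ROW = re.compile(r"^\s*\d+\s+(\S+)")

# Illustrative input only: hand-written in the style of `objdump -h`.
SAMPLE = """\

example.o:     file format elf32-littlearm

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000000  00000000  00000000  00000034  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .text.foo     00000010  00000000  00000000  00000034  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .data.bar     00000004  00000000  00000000  00000044  2**2
                  CONTENTS, ALLOC, LOAD, DATA
"""


def section_names(objdump_output):
    """Return the section names found in `objdump -h` style text."""
    names = []
    for line in objdump_output.splitlines():
        match = SECTION_ROW.match(line)
        if match:
            names.append(match.group(1))
    return names


def count_per_symbol_sections(objdump_output):
    """Count sections named like .text.<sym> or .data.<sym>."""
    return sum(
        1
        for name in section_names(objdump_output)
        if name.startswith((".text.", ".data."))
    )


if __name__ == "__main__":
    print(len(section_names(SAMPLE)), "section rows")
    print(count_per_symbol_sections(SAMPLE), "per-symbol sections")
```

`len(section_names(...))` corresponds to the number the grep pipeline prints. The second helper narrows that to names that look like they came from per-function or per-data section splitting.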
On Wed, 24 Aug 2016 20:56:46 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> Okay, I can't reproduce your bad linking times even with gc-sections. It's
> possible I'm doing something wrong, but with my patches + your patch and
> standard arm allyesconfig:
>
> 20:33:56  AR      built-in.o
> 20:33:57  LD      vmlinux.o
>           MODPOST vmlinux.o
> 20:34:12  GEN     .version
>           CHK     include/generated/compile.h
>           UPD     include/generated/compile.h
>           CC      init/version.o
>           AR      init/built-in.o
> 20:34:24  KSYM    .tmp_kallsyms1.o
> 20:34:45  KSYM    .tmp_kallsyms2.o
> 20:34:54  LD      vmlinux
> 20:35:07  SORTEX  vmlinux
> 20:35:07  SYSMAP  System.map
>
> I have about 71 seconds for the final link phase.
>
> Command is:
> make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- -j3 vmlinux
>
> $ arm-linux-gnueabi-ld -v
> GNU ld (GNU Binutils for Debian) 2.27
>
> Confirming function/data sections:
> $ objdump -h built-in.o | grep ^[[:space:]]*[0-9][0-9]* | wc -l
> 1012848
>
> $ size vmlinux
>     text     data      bss       dec     hex filename
> 74205771 36746855 19072744 130025370 7c0079a vmlinux
>
> $ file vmlinux
> vmlinux: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=37d207288f81f83291dfb45294a94c555d9fbdb8, not stripped
>
> Final link command is:
> arm-linux-gnueabi-ld -EL --no-fix-cortex-a8 -p --no-undefined -X --pic-veneer --build-id --gc-sections -X -o vmlinux -T ./arch/arm/kernel/vmlinux.lds --whole-archive built-in.o .tmp_kallsyms2.o
>
> Which takes 12s
>
> I'm building THUMB2 now, but if you have any hints for me...

Just did a thumb2 build. With V6 disabled and thumb2 enabled, but otherwise
the same config and tree, it takes a long time to link, as expected, because
the --no-fix-cortex-a8 option is not being applied. Adding that option brings
the link time down to the same as allyesconfig (slightly faster). Confirmed
that it has output thumb2 instructions.

Thanks,
Nick
On Wednesday, August 24, 2016 8:56:46 PM CEST Nicholas Piggin wrote:
> On Wed, 24 Aug 2016 19:05:30 +1000
> Nicholas Piggin <npiggin@gmail.com> wrote:
> > Thanks, I'll take a look.
>
> Okay, I can't reproduce your bad linking times even with gc-sections. It's
> possible I'm doing something wrong, but with my patches + your patch and
> standard arm allyesconfig:
>
> 20:33:56  AR      built-in.o
> 20:33:57  LD      vmlinux.o
>           MODPOST vmlinux.o
> 20:34:12  GEN     .version
>           CHK     include/generated/compile.h
>           UPD     include/generated/compile.h
>           CC      init/version.o
>           AR      init/built-in.o
> 20:34:24  KSYM    .tmp_kallsyms1.o
> 20:34:45  KSYM    .tmp_kallsyms2.o
> 20:34:54  LD      vmlinux
> 20:35:07  SORTEX  vmlinux
> 20:35:07  SYSMAP  System.map
>
> I have about 71 seconds for the final link phase.

I've tracked down my remaining build time regression to
a bad binutils snapshot (2.26.51) I had been using, and upgraded
to the 2.27 release now, which is roughly the same as what
you have:

    14:45:41  LINK    vmlinux
    14:45:41  AR      built-in.o
    14:45:42  LD      vmlinux.o
    14:51:49  MODPOST vmlinux.o
    14:51:51  GEN     .version
    14:51:51  CHK     include/generated/compile.h
              UPD     include/generated/compile.h
    14:51:51  CC      init/version.o
    14:51:52  AR      init/built-in.o
    14:52:04  KSYM    .tmp_kallsyms1.o
    14:52:31  KSYM    .tmp_kallsyms2.o
    14:52:43  LD      vmlinux
    14:52:55  SORTEX  vmlinux
    14:52:55  SYSMAP  System.map
    14:52:56  OBJCOPY arch/arm/boot/Image

The long minutes that were spent in
"arm-linux-gnueabi-ld -r -o vmlinux.o --whole-archive built-in.o"
are all gone now.

I still see a problem with big-endian builds failing with
thinarc/gc-sections. I'll investigate that some other day,
or you could have a look at that if you want to make sure
it's an ARM specific problem, not something with your
patches in general.

The patch that I sent for enabling the two on ARM blocks
out CONFIG_CPU_BIG_ENDIAN, so just revert that hunk to see
the problem. It's possible that it only breaks when doing
a big-endian build after a little-endian build without
a "make clean" in between.

Arnd
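Since the remaining regression came down to which linker build was in use, a build wrapper could check the `ld -v` banner before timing anything. The sketch below is an illustration under assumptions, not part of the thread: the 2.27 banner is the one quoted earlier, while the snapshot banner used in the usage lines is a guessed format, and both function names are hypothetical. It flags only the 2.26.51 series that the message above identifies as slow.

```python
import re


def ld_version(banner):
    """Extract the trailing dotted version from an `ld -v` banner as ints."""
    match = re.search(r"(\d+(?:\.\d+)+)\s*$", banner.strip())
    if match is None:
        raise ValueError(f"no version found in {banner!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_known_slow_snapshot(banner):
    """True for the 2.26.51 snapshot series reported as slow in this thread."""
    return ld_version(banner)[:3] == (2, 26, 51)


if __name__ == "__main__":
    release = "GNU ld (GNU Binutils for Debian) 2.27"
    # Assumed banner format for a development snapshot.
    snapshot = "GNU ld (GNU Binutils) 2.26.51.20160816"
    for banner in (release, snapshot):
        print(ld_version(banner), is_known_slow_snapshot(banner))
```

The check is deliberately narrow. The thread only establishes that one 2.26.51 snapshot was slow and that the 2.27 release was not, so the sketch does not claim anything about other versions.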
On Wed, 24 Aug 2016 17:01:30 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Wednesday, August 24, 2016 8:56:46 PM CEST Nicholas Piggin wrote:
> > On Wed, 24 Aug 2016 19:05:30 +1000
> > Nicholas Piggin <npiggin@gmail.com> wrote:
> > > Thanks, I'll take a look.
> >
> > Okay, I can't reproduce your bad linking times even with gc-sections. It's
> > possible I'm doing something wrong, but with my patches + your patch and
> > standard arm allyesconfig:
> >
> > 20:33:56  AR      built-in.o
> > 20:33:57  LD      vmlinux.o
> >           MODPOST vmlinux.o
> > 20:34:12  GEN     .version
> >           CHK     include/generated/compile.h
> >           UPD     include/generated/compile.h
> >           CC      init/version.o
> >           AR      init/built-in.o
> > 20:34:24  KSYM    .tmp_kallsyms1.o
> > 20:34:45  KSYM    .tmp_kallsyms2.o
> > 20:34:54  LD      vmlinux
> > 20:35:07  SORTEX  vmlinux
> > 20:35:07  SYSMAP  System.map
> >
> > I have about 71 seconds for the final link phase.
>
>
> I've tracked down my remaining build time regression to
> a bad binutils snapshot (2.26.51) I had been using, and upgraded
> to the 2.27 release now, which is roughly the same as what
> you have:
>
> 14:45:41  LINK    vmlinux
> 14:45:41  AR      built-in.o
> 14:45:42  LD      vmlinux.o
> 14:51:49  MODPOST vmlinux.o
> 14:51:51  GEN     .version
> 14:51:51  CHK     include/generated/compile.h
>           UPD     include/generated/compile.h
> 14:51:51  CC      init/version.o
> 14:51:52  AR      init/built-in.o
> 14:52:04  KSYM    .tmp_kallsyms1.o
> 14:52:31  KSYM    .tmp_kallsyms2.o
> 14:52:43  LD      vmlinux
> 14:52:55  SORTEX  vmlinux
> 14:52:55  SYSMAP  System.map
> 14:52:56  OBJCOPY arch/arm/boot/Image
>
> The long minutes that were spent in "arm-linux-gnueabi-ld
> -r -o vmlinux.o --whole-archive built-in.o" are all gone now.

Thanks for tracking it down, that's good to hear.

> I still see a problem with big-endian builds failing with
> thinarc/gc-sections, I'll investigate that some other day,
> or you could have a look at that if you want to make sure
> it's an ARM specific problem, not something with your
> patches in general.
>
> The patch that I sent for enabling the two on ARM blocks
> out CONFIG_CPU_BIG_ENDIAN, so just revert that hunk to see
> the problem. It's possible that it only breaks when doing
> a big-endian build after a little-endian build without
> a "make clean" in between.

I'm able to build big endian ARM allyesconfig with thin
archives and gc-sections; it worked fine.

There have been a few bugs in powerpc where the build failed to
notice that the endianness had been swapped, so maybe you
got bitten by something similar.

Thanks,
Nick
On Friday 26 August 2016, Nicholas Piggin wrote:
> > I still see a problem with big-endian builds failing with
> > thinarc/gc-sections, I'll investigate that some other day,
> > or you could have a look at that if you want to make sure
> > it's an ARM specific problem, not something with your
> > patches in general.
> >
> > The patch that I sent for enabling the two on ARM blocks
> > out CONFIG_CPU_BIG_ENDIAN, so just revert that hunk to see
> > the problem. It's possible that it only breaks when doing
> > a big-endian build after a little-endian build without
> > a "make clean" in between.
>
> I'm able to build big endian ARM allyesconfig with thin
> archives and gc-sections, worked fine.
>
> There have been a few bugs in powerpc when failing to notice
> the change when endian of builds was swapped, so maybe you
> got bitten by something similar.

I tracked this down as well now, and it's also a problem on my
local machine; your patches are fine. In order to debug the
other problem, I was building with "make LD=/home/arnd/..../ld"
to try out different versions of the linker, and that caused
the "LD += -EB" line from arch/arm/Makefile to be ignored.

We should probably override LDFLAGS rather than LD (and AFLAGS
instead of AS) for big-endian builds, but that is unrelated to
your work.

Arnd
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b62ae32f8a1e..9bf37a6e7384 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -83,6 +83,7 @@ config ARM
     select HAVE_UID16
     select HAVE_VIRT_CPU_ACCOUNTING_GEN
     select IRQ_FORCED_THREADING
+    select LD_DEAD_CODE_DATA_ELIMINATION
     select MODULES_USE_ELF_REL
     select NO_BOOTMEM
     select OF_EARLY_FLATTREE if OF
@@ -92,6 +93,7 @@ config ARM
     select PERF_USE_VMALLOC
     select RTC_LIB
     select SYS_SUPPORTS_APM_EMULATION
+    select THIN_ARCHIVES
     # Above selects are sorted alphabetically; please add new ones
     # according to that. Thanks.
     help
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index ad325a8c7e1e..b7f2a41fd940 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -13,6 +13,9 @@ endif
 
 CFLAGS_REMOVE_return_address.o = -pg
 
+ccflags-y += -fno-function-sections -fno-data-sections
+subdir-ccflags-y += -fno-function-sections -fno-data-sections
+
 # Object file lists.
 
 obj-y := elf.o entry-common.o irq.o opcodes.o \
diff --git a/arch/arm/kernel/vmlinux-xip.lds.S b/arch/arm/kernel/vmlinux-xip.lds.S
index 56c8bdf776bd..4b515ae498e2 100644
--- a/arch/arm/kernel/vmlinux-xip.lds.S
+++ b/arch/arm/kernel/vmlinux-xip.lds.S
@@ -12,17 +12,17 @@
 #define PROC_INFO \
     . = ALIGN(4); \
     VMLINUX_SYMBOL(__proc_info_begin) = .; \
-    *(.proc.info.init) \
+    KEEP(*(.proc.info.init)) \
     VMLINUX_SYMBOL(__proc_info_end) = .;
 
 #define IDMAP_TEXT \
     ALIGN_FUNCTION(); \
     VMLINUX_SYMBOL(__idmap_text_start) = .; \
-    *(.idmap.text) \
+    KEEP(*(.idmap.text)) \
     VMLINUX_SYMBOL(__idmap_text_end) = .; \
     . = ALIGN(PAGE_SIZE); \
     VMLINUX_SYMBOL(__hyp_idmap_text_start) = .; \
-    *(.hyp.idmap.text) \
+    KEEP(*(.hyp.idmap.text)) \
     VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -114,7 +114,7 @@ SECTIONS
     __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
         __start___ex_table = .;
 #ifdef CONFIG_MMU
-        *(__ex_table)
+        KEEP(*(__ex_table))
 #endif
         __stop___ex_table = .;
     }
@@ -126,12 +126,12 @@ SECTIONS
     . = ALIGN(8);
     .ARM.unwind_idx : {
         __start_unwind_idx = .;
-        *(.ARM.exidx*)
+        KEEP(*(.ARM.exidx*))
         __stop_unwind_idx = .;
     }
     .ARM.unwind_tab : {
         __start_unwind_tab = .;
-        *(.ARM.extab*)
+        KEEP(*(.ARM.extab*))
         __stop_unwind_tab = .;
     }
 #endif
@@ -146,7 +146,7 @@ SECTIONS
      */
     __vectors_start = .;
     .vectors 0xffff0000 : AT(__vectors_start) {
-        *(.vectors)
+        KEEP(*(.vectors))
     }
     . = __vectors_start + SIZEOF(.vectors);
     __vectors_end = .;
@@ -169,24 +169,24 @@ SECTIONS
     }
     .init.arch.info : {
         __arch_info_begin = .;
-        *(.arch.info.init)
+        KEEP(*(.arch.info.init))
         __arch_info_end = .;
     }
     .init.tagtable : {
         __tagtable_begin = .;
-        *(.taglist.init)
+        KEEP(*(.taglist.init))
         __tagtable_end = .;
     }
 #ifdef CONFIG_SMP_ON_UP
     .init.smpalt : {
         __smpalt_begin = .;
-        *(.alt.smp.init)
+        KEEP(*(.alt.smp.init))
         __smpalt_end = .;
     }
 #endif
     .init.pv_table : {
         __pv_table_begin = .;
-        *(.pv_table)
+        KEEP(*(.pv_table))
         __pv_table_end = .;
     }
     .init.data : {
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 7396a5f00c5f..abb59e4c12db 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -17,7 +17,7 @@
 #define PROC_INFO \
     . = ALIGN(4); \
     VMLINUX_SYMBOL(__proc_info_begin) = .; \
-    *(.proc.info.init) \
+    KEEP(*(.proc.info.init)) \
     VMLINUX_SYMBOL(__proc_info_end) = .;
 
 #define HYPERVISOR_TEXT \
@@ -169,7 +169,7 @@ SECTIONS
      */
     __vectors_start = .;
     .vectors 0xffff0000 : AT(__vectors_start) {
-        *(.vectors)
+        KEEP(*(.vectors))
     }
     . = __vectors_start + SIZEOF(.vectors);
     __vectors_end = .;
@@ -192,24 +192,24 @@ SECTIONS
     }
     .init.arch.info : {
         __arch_info_begin = .;
-        *(.arch.info.init)
+        KEEP(*(.arch.info.init))
         __arch_info_end = .;
     }
     .init.tagtable : {
         __tagtable_begin = .;
-        *(.taglist.init)
+        KEEP(*(.taglist.init))
         __tagtable_end = .;
     }
 #ifdef CONFIG_SMP_ON_UP
     .init.smpalt : {
         __smpalt_begin = .;
-        *(.alt.smp.init)
+        KEEP(*(.alt.smp.init))
         __smpalt_end = .;
     }
 #endif
     .init.pv_table : {
         __pv_table_begin = .;
-        *(.pv_table)
+        KEEP(*(.pv_table))
         __pv_table_end = .;
     }
     .init.data : {
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 6a09cc204b07..7117b8e99de8 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -717,6 +717,7 @@ config SWP_EMULATE
 config CPU_BIG_ENDIAN
     bool "Build big-endian kernel"
     depends on ARCH_SUPPORTS_BIG_ENDIAN
+    depends on !THIN_ARCHIVES
     help
       Say Y if you plan on running a kernel in big-endian mode.
       Note that your board must be properly built and your board
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 9136c3afd3c6..e01f0b00a678 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -433,7 +433,9 @@
  * during second ld run in second ld pass when generating System.map */
 #define TEXT_TEXT \
         ALIGN_FUNCTION(); \
-        *(.text.hot .text .text.fixup .text.unlikely .text.*) \
+        *(.text.hot .text.hot.*) \
+        *(.text.unlikely .text.unlikely.*) \
+        *(.text .text.*) \
         *(.ref.text) \
     MEM_KEEP(init.text) \
     MEM_KEEP(exit.text) \
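The TEXT_TEXT hunk splits one input-section statement into three. The sketch below is a rough model of what that does to section ordering, and it rests on assumptions not stated in the thread: that GNU ld places each input section with the first statement whose pattern matches it, and that it keeps input order within a statement. The input section names and the `layout` helper are hypothetical, and Python's `fnmatch` stands in for the linker's wildcard matching.

```python
from fnmatch import fnmatchcase

# Each inner list models one "*( ... )" statement from TEXT_TEXT.
OLD_TEXT_TEXT = [
    [".text.hot", ".text", ".text.fixup", ".text.unlikely", ".text.*"],
]
NEW_TEXT_TEXT = [
    [".text.hot", ".text.hot.*"],
    [".text.unlikely", ".text.unlikely.*"],
    [".text", ".text.*"],
]

# Hypothetical input sections, in the order the linker would see them.
INPUT = [
    ".text.foo",
    ".text.hot.bar",
    ".text",
    ".text.unlikely.baz",
    ".text.fixup",
]


def layout(statements, input_sections):
    """Place sections statement by statement; the first match wins."""
    placed = []
    remaining = list(input_sections)
    for patterns in statements:
        matched = [
            name
            for name in remaining
            if any(fnmatchcase(name, pattern) for pattern in patterns)
        ]
        placed.extend(matched)
        remaining = [name for name in remaining if name not in matched]
    return placed


if __name__ == "__main__":
    print("old:", layout(OLD_TEXT_TEXT, INPUT))
    print("new:", layout(NEW_TEXT_TEXT, INPUT))
```

Under this model the old single statement leaves everything in input order, because `.text.*` also matches the hot and unlikely sections. The new statements group hot sections first, then unlikely ones, then the rest, including `.text.fixup`, which still matches `.text.*`.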