Message ID | 20191013221310.30748-7-sebastian@breakpoint.cc (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Queued spinlocks/RW-locks for ARM | expand |
On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior <sebastian@breakpoint.cc> wrote: > > On non-preemptive kernels, the locking instruction is less than 64 bytes > and it makes sense to inline it. With PREEMPTION the kernel becomes very > big if the locks are inlined. > > Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> > --- At the moment, we have two architectures selecting all 28 symbols and you are adding a third, all other architecture select none of them. This tells me that the configurability has gone a little overboard. How about adding a shortcut ARCH_INLINE_ALL_SPINLOCKS that selects the 28 symbols and using that for arm/arm64/s390? Also, the output of 'size vmlinux' before and after the patch for multi_v7_defconfig would be useful to have in the changelog, as there are a couple of platforms that are particularly sensitive to object code size changes. Arnd > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index 4c780e7387208..21edc8e649261 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -26,6 +26,32 @@ config ARM > select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST > select ARCH_HAVE_CUSTOM_GPIO_H > select ARCH_HAS_GCOV_PROFILE_ALL > + select ARCH_INLINE_READ_LOCK if !PREEMPTION > + select ARCH_INLINE_READ_LOCK_BH if !PREEMPTION > + select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPTION > + select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPTION > + select ARCH_INLINE_READ_UNLOCK if !PREEMPTION > + select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPTION > + select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPTION > + select ARCH_INLINE_READ_UNLOCK_IRQRESTORE if !PREEMPTION > + select ARCH_INLINE_WRITE_LOCK if !PREEMPTION > + select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPTION > + select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPTION > + select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPTION > + select ARCH_INLINE_WRITE_UNLOCK if !PREEMPTION > + select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPTION > + select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPTION > + select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPTION > + select ARCH_INLINE_SPIN_TRYLOCK if !PREEMPTION > + select ARCH_INLINE_SPIN_TRYLOCK_BH if !PREEMPTION > + select ARCH_INLINE_SPIN_LOCK if !PREEMPTION > + select ARCH_INLINE_SPIN_LOCK_BH if !PREEMPTION > + select ARCH_INLINE_SPIN_LOCK_IRQ if !PREEMPTION > + select ARCH_INLINE_SPIN_LOCK_IRQSAVE if !PREEMPTION > + select ARCH_INLINE_SPIN_UNLOCK if !PREEMPTION > + select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION > + select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION > + select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION > select ARCH_KEEP_MEMBLOCK if HAVE_ARCH_PFN_VALID || KEXEC > select ARCH_MIGHT_HAVE_PC_PARPORT > select ARCH_NO_SG_CHAIN if !ARM_HAS_SG_CHAIN > -- > 2.23.0 >
On Mon, Oct 14, 2019 at 9:43 AM Arnd Bergmann <arnd@arndb.de> wrote: > On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior <sebastian@breakpoint.cc> wrote > > Also, the output of 'size vmlinux' before and after the patch for > multi_v7_defconfig would be useful to have in the changelog, as there > are a couple of platforms that are particularly sensitive to object code > size changes. To follow up, here are the numbers I get, building the linux-5.4-rc2 multi_v7_defconfig with clang-9, comparing the original spinlock and the qspinlock, combined with inlining all locks or leaving them out of line: text data bss dec hex filename 15294816 6958636 404480 22657932 159bb8c vmlinux-orig 16004898 6962060 404480 23371438 1649eae vmlinux-orig-inline 15198619 6958812 404560 22561991 15844c7 vmlinux-qlock-noinline 15622897 6962116 404560 22989573 15ecb05 vmlinux-qlock-inline This gives us a 1.5% size increase over the original code with inlining, or a 0.5% decrease without inlining, or about 1.9% size difference for the Kconfig change itself, which does sound significant. Maybe it should be configurable? Arnd
On 2019-10-14 09:43:53 [+0200], Arnd Bergmann wrote: > On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior > <sebastian@breakpoint.cc> wrote: > > > > On non-preemptive kernels, the locking instruction is less than 64 bytes > > and it makes sense to inline it. With PREEMPTION the kernel becomes very > > big if the locks are inlined. > > > > Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> > > --- > > At the moment, we have two architectures selecting all 28 symbols > and you are adding a third, all other architecture select none of them. > > This tells me that the configurability has gone a little overboard. How about > adding a shortcut ARCH_INLINE_ALL_SPINLOCKS that selects the 28 > symbols and using that for arm/arm64/s390? Sounds reasonable. > Also, the output of 'size vmlinux' before and after the patch for > multi_v7_defconfig would be useful to have in the changelog, as there > are a couple of platforms that are particularly sensitive to object code > size changes. okay. > Arnd Sebastian
On 2019-10-14 12:01:02 [+0200], Arnd Bergmann wrote: > On Mon, Oct 14, 2019 at 9:43 AM Arnd Bergmann <arnd@arndb.de> wrote: > > On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior <sebastian@breakpoint.cc> wrote > > > > Also, the output of 'size vmlinux' before and after the patch for > > multi_v7_defconfig would be useful to have in the changelog, as there > > are a couple of platforms that are particularly sensitive to object code > > size changes. > > To follow up, here are the numbers I get, building the linux-5.4-rc2 > multi_v7_defconfig with clang-9, comparing the original spinlock > and the qspinlock, combined with inlining all locks or leaving them > out of line: > > text data bss dec hex filename > 15294816 6958636 404480 22657932 159bb8c vmlinux-orig > 16004898 6962060 404480 23371438 1649eae vmlinux-orig-inline > 15198619 6958812 404560 22561991 15844c7 vmlinux-qlock-noinline > 15622897 6962116 404560 22989573 15ecb05 vmlinux-qlock-inline > > This gives us a 1.5% size increase over the original code with > inlining, or a 0.5% decrease without inlining, or about 1.9% size > difference for the Kconfig change itself, which does sound > significant. I had 2% increase (vmlinux-orig -> vmlinux-qlock-inline) but my vmlinux was only half the size. Performance wise the inlining improved the hackbench numbers in my test. How do we account that? > Maybe it should be configurable? Any comment from the locking department? I would prefer to avoid an extra knob for it. The v7 config is PREEMPT_NONE and HZ_100. Based on the perf numbers I posted last time: with inlining I get more or less to the performance of the ticket implementation on imx6 and it makes no difference on AM572x. Let me run the hackbench test with the multi_v7_defconfig on my two boards with ORIG/qlock/qlock-inline and come with some numbers here. > Arnd Sebastian
On 10/15/19 6:30 PM, Sebastian Andrzej Siewior wrote: > On 2019-10-14 12:01:02 [+0200], Arnd Bergmann wrote: >> On Mon, Oct 14, 2019 at 9:43 AM Arnd Bergmann <arnd@arndb.de> wrote: >>> On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior <sebastian@breakpoint.cc> wrote >>> >>> Also, the output of 'size vmlinux' before and after the patch for >>> multi_v7_defconfig would be useful to have in the changelog, as there >>> are a couple of platforms that are particularly sensitive to object code >>> size changes. >> To follow up, here are the numbers I get, building the linux-5.4-rc2 >> multi_v7_defconfig with clang-9, comparing the original spinlock >> and the qspinlock, combined with inlining all locks or leaving them >> out of line: >> >> text data bss dec hex filename >> 15294816 6958636 404480 22657932 159bb8c vmlinux-orig >> 16004898 6962060 404480 23371438 1649eae vmlinux-orig-inline >> 15198619 6958812 404560 22561991 15844c7 vmlinux-qlock-noinline >> 15622897 6962116 404560 22989573 15ecb05 vmlinux-qlock-inline >> >> This gives us a 1.5% size increase over the original code with >> inlining, or a 0.5% decrease without inlining, or about 1.9% size >> difference for the Kconfig change itself, which does sound >> significant. > I had 2% increase (vmlinux-orig -> vmlinux-qlock-inline) but my vmlinux > was only half the size. Performance wise the inlining improved the > hackbench numbers in my test. How do we account that? > >> Maybe it should be configurable? > Any comment from the locking department? I would prefer to avoid an > extra knob for it. > The v7 config is PREEMPT_NONE and HZ_100. Based on the perf numbers I > posted last time: with inlining I get more or less to the performance of > the ticket implementation on imx6 and it makes no difference on AM572x. > Let me run the hackbench test with the multi_v7_defconfig on my two > boards with ORIG/qlock/qlock-inline and come with some numbers here. Perhaps, we should not just looking at the all inlined or all uninlined cases. Different variants of the lock and unlock functions can differ widely in size depends on how the irq handling code is handled in each architecture. Maybe we can inline the small ones but leave the bigger ones uninlined. That can increase performance without too much overhead in kernel size. Cheers, Longman
On Wed, Oct 16, 2019 at 12:37 AM Waiman Long <longman@redhat.com> wrote: > On 10/15/19 6:30 PM, Sebastian Andrzej Siewior wrote: > > On 2019-10-14 12:01:02 [+0200], Arnd Bergmann wrote: > >> On Mon, Oct 14, 2019 at 9:43 AM Arnd Bergmann <arnd@arndb.de> wrote: > >>> On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior <sebastian@breakpoint.cc> wrote > >> Maybe it should be configurable? > > Any comment from the locking department? I would prefer to avoid an > > extra knob for it. > > The v7 config is PREEMPT_NONE and HZ_100. Based on the perf numbers I > > posted last time: with inlining I get more or less to the performance of > > the ticket implementation on imx6 and it makes no difference on AM572x. > > Let me run the hackbench test with the multi_v7_defconfig on my two > > boards with ORIG/qlock/qlock-inline and come with some numbers here. > > Perhaps, we should not just looking at the all inlined or all uninlined > cases. Different variants of the lock and unlock functions can differ > widely in size depends on how the irq handling code is handled in each > architecture. Maybe we can inline the small ones but leave the bigger > ones uninlined. That can increase performance without too much overhead > in kernel size. We also have CONFIG_CC_OPTIMIZE_FOR_SIZE, which at the moment only controls the -O2 / -Os options, but basing it on that would be an easy way to avoid another user-visible option. Arnd
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 4c780e7387208..21edc8e649261 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -26,6 +26,32 @@ config ARM select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_HAVE_CUSTOM_GPIO_H select ARCH_HAS_GCOV_PROFILE_ALL + select ARCH_INLINE_READ_LOCK if !PREEMPTION + select ARCH_INLINE_READ_LOCK_BH if !PREEMPTION + select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPTION + select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPTION + select ARCH_INLINE_READ_UNLOCK if !PREEMPTION + select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPTION + select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPTION + select ARCH_INLINE_READ_UNLOCK_IRQRESTORE if !PREEMPTION + select ARCH_INLINE_WRITE_LOCK if !PREEMPTION + select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPTION + select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPTION + select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPTION + select ARCH_INLINE_WRITE_UNLOCK if !PREEMPTION + select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPTION + select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPTION + select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPTION + select ARCH_INLINE_SPIN_TRYLOCK if !PREEMPTION + select ARCH_INLINE_SPIN_TRYLOCK_BH if !PREEMPTION + select ARCH_INLINE_SPIN_LOCK if !PREEMPTION + select ARCH_INLINE_SPIN_LOCK_BH if !PREEMPTION + select ARCH_INLINE_SPIN_LOCK_IRQ if !PREEMPTION + select ARCH_INLINE_SPIN_LOCK_IRQSAVE if !PREEMPTION + select ARCH_INLINE_SPIN_UNLOCK if !PREEMPTION + select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION + select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION + select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION select ARCH_KEEP_MEMBLOCK if HAVE_ARCH_PFN_VALID || KEXEC select ARCH_MIGHT_HAVE_PC_PARPORT select ARCH_NO_SG_CHAIN if !ARM_HAS_SG_CHAIN
On non-preemptive kernels, the locking instruction is less than 64 bytes and it makes sense to inline it. With PREEMPTION the kernel becomes very big if the locks are inlined. Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> --- arch/arm/Kconfig | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)