[6/6] ARM: Inline locking functions for !PREEMPTION

Message ID	20191013221310.30748-7-sebastian@breakpoint.cc (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=cQrh=YG=lists.infradead.org=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D9597207FF From: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> To: linux-arm-kernel@lists.infradead.org Subject: [PATCH 6/6] ARM: Inline locking functions for !PREEMPTION Date: Mon, 14 Oct 2019 00:13:10 +0200 Message-Id: <20191013221310.30748-7-sebastian@breakpoint.cc> In-Reply-To: <20191013221310.30748-1-sebastian@breakpoint.cc> References: <20191013221310.30748-1-sebastian@breakpoint.cc> MIME-Version: 1.0 summary: Content analysis details: (-1.5 points) pts rule name description ---- ---------------------- -------------------------------------------------- -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at https://www.dnswl.org/, medium trust [2a0a:51c0:0:12e:520:0:0:1 listed in] [list.dnswl.org] -0.0 SPF_HELO_PASS SPF: HELO matches SPF record -0.0 SPF_PASS SPF: sender matches SPF record 0.8 UPPERCASE_50_75 message body is 50-75% uppercase Precedence: list Cc: Arnd Bergmann <arnd@arndb.de>, Peter Zijlstra <peterz@infradead.org>, Russell King <linux@armlinux.org.uk>, Sebastian Andrzej Siewior <sebastian@breakpoint.cc>, Ingo Molnar <mingo@redhat.com>, Waiman Long <longman@redhat.com>, Will Deacon <will@kernel.org> Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org
Series	Queued spinlocks/RW-locks for ARM \| expand [RFC,0/6,v2] Queued spinlocks/RW-locks for ARM [1/6] sh: Move cmpxchg-xchg.h to asm-generic [2/6] ARM: cmpxchg: Define first cmpxchg() followed by xchg() [3/6] ARM: Add xchg_{8\|16}() on generic cmpxchg() on CPU_V6 [4/6] ARM: Use qrwlock implementation [5/6] ARM: Use qspinlock implementation [6/6] ARM: Inline locking functions for !PREEMPTION

Sebastian Andrzej Siewior Oct. 13, 2019, 10:13 p.m. UTC

On non-preemptive kernels, the locking instruction is less than 64 bytes
and it makes sense to inline it. With PREEMPTION the kernel becomes very
big if the locks are inlined.

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
 arch/arm/Kconfig | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

Arnd Bergmann Oct. 14, 2019, 7:43 a.m. UTC | #1

On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior
<sebastian@breakpoint.cc> wrote:
>
> On non-preemptive kernels, the locking instruction is less than 64 bytes
> and it makes sense to inline it. With PREEMPTION the kernel becomes very
> big if the locks are inlined.
>
> Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
> ---

At the moment, we have two architectures selecting all 28 symbols
and you are adding a third, all other architecture select none of them.

This tells me that the configurability has gone a little overboard. How about
adding a shortcut ARCH_INLINE_ALL_SPINLOCKS that selects the 28
symbols and using that for arm/arm64/s390?

Also, the output of 'size vmlinux' before and after the patch for
multi_v7_defconfig would be useful to have in the changelog, as there
are a couple of platforms that are particularly sensitive to object code
size changes.

      Arnd

> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 4c780e7387208..21edc8e649261 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -26,6 +26,32 @@ config ARM
>         select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>         select ARCH_HAVE_CUSTOM_GPIO_H
>         select ARCH_HAS_GCOV_PROFILE_ALL
> +       select ARCH_INLINE_READ_LOCK if !PREEMPTION
> +       select ARCH_INLINE_READ_LOCK_BH if !PREEMPTION
> +       select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPTION
> +       select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPTION
> +       select ARCH_INLINE_READ_UNLOCK if !PREEMPTION
> +       select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPTION
> +       select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPTION
> +       select ARCH_INLINE_READ_UNLOCK_IRQRESTORE if !PREEMPTION
> +       select ARCH_INLINE_WRITE_LOCK if !PREEMPTION
> +       select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPTION
> +       select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPTION
> +       select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPTION
> +       select ARCH_INLINE_WRITE_UNLOCK if !PREEMPTION
> +       select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPTION
> +       select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPTION
> +       select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPTION
> +       select ARCH_INLINE_SPIN_TRYLOCK if !PREEMPTION
> +       select ARCH_INLINE_SPIN_TRYLOCK_BH if !PREEMPTION
> +       select ARCH_INLINE_SPIN_LOCK if !PREEMPTION
> +       select ARCH_INLINE_SPIN_LOCK_BH if !PREEMPTION
> +       select ARCH_INLINE_SPIN_LOCK_IRQ if !PREEMPTION
> +       select ARCH_INLINE_SPIN_LOCK_IRQSAVE if !PREEMPTION
> +       select ARCH_INLINE_SPIN_UNLOCK if !PREEMPTION
> +       select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION
> +       select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
> +       select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
>         select ARCH_KEEP_MEMBLOCK if HAVE_ARCH_PFN_VALID || KEXEC
>         select ARCH_MIGHT_HAVE_PC_PARPORT
>         select ARCH_NO_SG_CHAIN if !ARM_HAS_SG_CHAIN
> --
> 2.23.0
>

Arnd Bergmann Oct. 14, 2019, 10:01 a.m. UTC | #2

On Mon, Oct 14, 2019 at 9:43 AM Arnd Bergmann <arnd@arndb.de> wrote:
> On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior <sebastian@breakpoint.cc> wrote
>
> Also, the output of 'size vmlinux' before and after the patch for
> multi_v7_defconfig would be useful to have in the changelog, as there
> are a couple of platforms that are particularly sensitive to object code
> size changes.

To follow up, here are the numbers I get, building the linux-5.4-rc2
multi_v7_defconfig with clang-9, comparing the original spinlock
and the qspinlock, combined with inlining all locks or leaving them
out of line:

   text    data     bss     dec     hex filename
15294816 6958636 404480 22657932 159bb8c vmlinux-orig
16004898 6962060 404480 23371438 1649eae vmlinux-orig-inline
15198619 6958812 404560 22561991 15844c7 vmlinux-qlock-noinline
15622897 6962116 404560 22989573 15ecb05 vmlinux-qlock-inline

This gives us a 1.5% size increase over the original code with
inlining, or a 0.5% decrease without inlining, or about 1.9% size
difference for the Kconfig change itself, which does sound
significant.

Maybe it should be configurable?

      Arnd

Sebastian Andrzej Siewior Oct. 15, 2019, 10:04 p.m. UTC | #3

On 2019-10-14 09:43:53 [+0200], Arnd Bergmann wrote:
> On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior
> <sebastian@breakpoint.cc> wrote:
> >
> > On non-preemptive kernels, the locking instruction is less than 64 bytes
> > and it makes sense to inline it. With PREEMPTION the kernel becomes very
> > big if the locks are inlined.
> >
> > Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
> > ---
> 
> At the moment, we have two architectures selecting all 28 symbols
> and you are adding a third, all other architecture select none of them.
> 
> This tells me that the configurability has gone a little overboard. How about
> adding a shortcut ARCH_INLINE_ALL_SPINLOCKS that selects the 28
> symbols and using that for arm/arm64/s390?

Sounds reasonable. 

> Also, the output of 'size vmlinux' before and after the patch for
> multi_v7_defconfig would be useful to have in the changelog, as there
> are a couple of platforms that are particularly sensitive to object code
> size changes.

okay.

>       Arnd

Sebastian

Sebastian Andrzej Siewior Oct. 15, 2019, 10:30 p.m. UTC | #4

On 2019-10-14 12:01:02 [+0200], Arnd Bergmann wrote:
> On Mon, Oct 14, 2019 at 9:43 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior <sebastian@breakpoint.cc> wrote
> >
> > Also, the output of 'size vmlinux' before and after the patch for
> > multi_v7_defconfig would be useful to have in the changelog, as there
> > are a couple of platforms that are particularly sensitive to object code
> > size changes.
> 
> To follow up, here are the numbers I get, building the linux-5.4-rc2
> multi_v7_defconfig with clang-9, comparing the original spinlock
> and the qspinlock, combined with inlining all locks or leaving them
> out of line:
> 
>    text    data     bss     dec     hex filename
> 15294816 6958636 404480 22657932 159bb8c vmlinux-orig
> 16004898 6962060 404480 23371438 1649eae vmlinux-orig-inline
> 15198619 6958812 404560 22561991 15844c7 vmlinux-qlock-noinline
> 15622897 6962116 404560 22989573 15ecb05 vmlinux-qlock-inline
> 
> This gives us a 1.5% size increase over the original code with
> inlining, or a 0.5% decrease without inlining, or about 1.9% size
> difference for the Kconfig change itself, which does sound
> significant.

I had 2% increase (vmlinux-orig -> vmlinux-qlock-inline) but my vmlinux
was only half the size. Performance wise the inlining improved the
hackbench numbers in my test. How do we account that?

> Maybe it should be configurable?

Any comment from the locking department? I would prefer to avoid an
extra knob for it. 
The v7 config is PREEMPT_NONE and HZ_100. Based on the perf numbers I
posted last time: with inlining I get more or less to the performance of
the ticket implementation on imx6 and it makes no difference on AM572x.
Let me run the hackbench test with the multi_v7_defconfig on my two
boards with ORIG/qlock/qlock-inline and come with some numbers here.

>       Arnd

Sebastian

Waiman Long Oct. 15, 2019, 10:37 p.m. UTC | #5

On 10/15/19 6:30 PM, Sebastian Andrzej Siewior wrote:
> On 2019-10-14 12:01:02 [+0200], Arnd Bergmann wrote:
>> On Mon, Oct 14, 2019 at 9:43 AM Arnd Bergmann <arnd@arndb.de> wrote:
>>> On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior <sebastian@breakpoint.cc> wrote
>>>
>>> Also, the output of 'size vmlinux' before and after the patch for
>>> multi_v7_defconfig would be useful to have in the changelog, as there
>>> are a couple of platforms that are particularly sensitive to object code
>>> size changes.
>> To follow up, here are the numbers I get, building the linux-5.4-rc2
>> multi_v7_defconfig with clang-9, comparing the original spinlock
>> and the qspinlock, combined with inlining all locks or leaving them
>> out of line:
>>
>>    text    data     bss     dec     hex filename
>> 15294816 6958636 404480 22657932 159bb8c vmlinux-orig
>> 16004898 6962060 404480 23371438 1649eae vmlinux-orig-inline
>> 15198619 6958812 404560 22561991 15844c7 vmlinux-qlock-noinline
>> 15622897 6962116 404560 22989573 15ecb05 vmlinux-qlock-inline
>>
>> This gives us a 1.5% size increase over the original code with
>> inlining, or a 0.5% decrease without inlining, or about 1.9% size
>> difference for the Kconfig change itself, which does sound
>> significant.
> I had 2% increase (vmlinux-orig -> vmlinux-qlock-inline) but my vmlinux
> was only half the size. Performance wise the inlining improved the
> hackbench numbers in my test. How do we account that?
>
>> Maybe it should be configurable?
> Any comment from the locking department? I would prefer to avoid an
> extra knob for it. 
> The v7 config is PREEMPT_NONE and HZ_100. Based on the perf numbers I
> posted last time: with inlining I get more or less to the performance of
> the ticket implementation on imx6 and it makes no difference on AM572x.
> Let me run the hackbench test with the multi_v7_defconfig on my two
> boards with ORIG/qlock/qlock-inline and come with some numbers here.

Perhaps, we should not just looking at the all inlined or all uninlined
cases. Different variants of the lock and unlock functions can differ
widely in size depends on how the irq handling code is handled in each
architecture. Maybe we can inline the small ones but leave the bigger
ones uninlined. That can increase performance without too much overhead
in kernel size.

Cheers,
Longman

Arnd Bergmann Oct. 15, 2019, 10:47 p.m. UTC | #6

On Wed, Oct 16, 2019 at 12:37 AM Waiman Long <longman@redhat.com> wrote:
> On 10/15/19 6:30 PM, Sebastian Andrzej Siewior wrote:
> > On 2019-10-14 12:01:02 [+0200], Arnd Bergmann wrote:
> >> On Mon, Oct 14, 2019 at 9:43 AM Arnd Bergmann <arnd@arndb.de> wrote:
> >>> On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior <sebastian@breakpoint.cc> wrote
> >> Maybe it should be configurable?
> > Any comment from the locking department? I would prefer to avoid an
> > extra knob for it.
> > The v7 config is PREEMPT_NONE and HZ_100. Based on the perf numbers I
> > posted last time: with inlining I get more or less to the performance of
> > the ticket implementation on imx6 and it makes no difference on AM572x.
> > Let me run the hackbench test with the multi_v7_defconfig on my two
> > boards with ORIG/qlock/qlock-inline and come with some numbers here.
>
> Perhaps, we should not just looking at the all inlined or all uninlined
> cases. Different variants of the lock and unlock functions can differ
> widely in size depends on how the irq handling code is handled in each
> architecture. Maybe we can inline the small ones but leave the bigger
> ones uninlined. That can increase performance without too much overhead
> in kernel size.

We also have CONFIG_CC_OPTIMIZE_FOR_SIZE, which at the moment
only controls the -O2 / -Os options, but basing it on that would be
an easy way to avoid another user-visible option.

      Arnd

[6/6] ARM: Inline locking functions for !PREEMPTION

Commit Message

Comments

Patch