mbox series

[0/3] IRQ stack support for ARM

Message ID 1602141333-17822-1-git-send-email-maninder1.s@samsung.com (mailing list archive)
Headers show
Series IRQ stack support for ARM | expand

Message

Maninder Singh Oct. 8, 2020, 7:15 a.m. UTC
Observed Stack Overflow on 8KB kernel stack on ARM specially 
incase on network interrupts, which results in undeterministic behaviour. 
So there is need for per cpu dedicated IRQ stack for ARM.

As ARm does not have extra co-processor register
to save thread info pointer, IRQ stack will be at some
performance cost, so code is under CONFIG_IRQ_STACK.

and we don't have much knowledge and set up for CLANG
and ARM_UNWIND, so dependency added for both cases.

Tested patch set with QEMU for latest kernel
and 4.1 kernel for ARM target with same patch set.

Maninder Singh, Vaneet Narang (3):
  arm: introduce self pointer in thread info
  arm: introduce IRQ stacks
  arm: Modify stack trace and dump for use with irq stack

 arch/arm/Kconfig                   | 10 ++++++++
 arch/arm/include/asm/assembler.h   | 11 +++++++++
 arch/arm/include/asm/irq.h         | 13 +++++++++++
 arch/arm/include/asm/thread_info.h | 27 ++++++++++++++++++++++
 arch/arm/kernel/entry-armv.S       | 41 ++++++++++++++++++++++++++++++++-
 arch/arm/kernel/irq.c              | 24 +++++++++++++++++++
 arch/arm/kernel/stacktrace.c       | 21 +++++++++++++++++
 arch/arm/kernel/traps.c            | 47 +++++++++++++++++++++++++++++++++++---
 arch/arm/lib/backtrace.S           | 18 +++++++++++++++
 include/linux/thread_info.h        |  4 ++++
 kernel/fork.c                      |  1 +
 11 files changed, 213 insertions(+), 4 deletions(-)

Comments

Sebastian Andrzej Siewior Oct. 8, 2020, 7:53 a.m. UTC | #1
On 2020-10-08 12:45:30 [+0530], Maninder Singh wrote:
> Observed Stack Overflow on 8KB kernel stack on ARM specially 
> incase on network interrupts, which results in undeterministic behaviour. 
> So there is need for per cpu dedicated IRQ stack for ARM.

You could try to look where this stack overflow is coming from. If this
is limited to softirq processing/NAPI (since you mentioned network) you
could try implementing do_softirq_own_stack().

Sebastian
Russell King (Oracle) Oct. 8, 2020, 8:30 a.m. UTC | #2
On Thu, Oct 08, 2020 at 12:45:30PM +0530, Maninder Singh wrote:
> Observed Stack Overflow on 8KB kernel stack on ARM specially 
> incase on network interrupts, which results in undeterministic behaviour. 
> So there is need for per cpu dedicated IRQ stack for ARM.
> 
> As ARm does not have extra co-processor register
> to save thread info pointer, IRQ stack will be at some
> performance cost, so code is under CONFIG_IRQ_STACK.
> 
> and we don't have much knowledge and set up for CLANG
> and ARM_UNWIND, so dependency added for both cases.
> 
> Tested patch set with QEMU for latest kernel
> and 4.1 kernel for ARM target with same patch set.

You need to investigate and show where and why this is happening. My
guess is you have a network driver that uses a lot of kernel stack
space, which itself would be a bug.

Note that there are compiler versions out there that mis-optimise and
eat stack space - the kernel build should be warning if a function
uses a large amount of stack.
Nick Desaulniers Oct. 15, 2020, 8:59 p.m. UTC | #3
On Thu, Oct 8, 2020 at 1:30 AM Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Thu, Oct 08, 2020 at 12:45:30PM +0530, Maninder Singh wrote:
> > Observed Stack Overflow on 8KB kernel stack on ARM specially
> > incase on network interrupts, which results in undeterministic behaviour.
> > So there is need for per cpu dedicated IRQ stack for ARM.
> >
> > As ARm does not have extra co-processor register
> > to save thread info pointer, IRQ stack will be at some
> > performance cost, so code is under CONFIG_IRQ_STACK.
> >
> > and we don't have much knowledge and set up for CLANG
> > and ARM_UNWIND, so dependency added for both cases.
> >
> > Tested patch set with QEMU for latest kernel
> > and 4.1 kernel for ARM target with same patch set.
>
> You need to investigate and show where and why this is happening. My
> guess is you have a network driver that uses a lot of kernel stack
> space, which itself would be a bug.
>
> Note that there are compiler versions out there that mis-optimise and
> eat stack space - the kernel build should be warning if a function
> uses a large amount of stack.

For tracking down those not-super-helpful compiler warnings, I wrote a
tool where if you rebuild with debug info, and give it the object file
and string of the function the compiler warned about it will parse the
DWARF to tell you the size of each local variable, and if it came from
an inline frame.  Generally, it's possible to stack allocate something
that's way too big; instead those should be allocated on the heap.
https://github.com/ClangBuiltLinux/frame-larger-than
(I haven't had time to sit down and use it to resolve all outstanding
issues, but it has worked well for me in the past)
Florian Fainelli Oct. 15, 2020, 9:16 p.m. UTC | #4
On 10/15/20 1:59 PM, Nick Desaulniers wrote:
> On Thu, Oct 8, 2020 at 1:30 AM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
>>
>> On Thu, Oct 08, 2020 at 12:45:30PM +0530, Maninder Singh wrote:
>>> Observed Stack Overflow on 8KB kernel stack on ARM specially
>>> incase on network interrupts, which results in undeterministic behaviour.
>>> So there is need for per cpu dedicated IRQ stack for ARM.
>>>
>>> As ARm does not have extra co-processor register
>>> to save thread info pointer, IRQ stack will be at some
>>> performance cost, so code is under CONFIG_IRQ_STACK.
>>>
>>> and we don't have much knowledge and set up for CLANG
>>> and ARM_UNWIND, so dependency added for both cases.
>>>
>>> Tested patch set with QEMU for latest kernel
>>> and 4.1 kernel for ARM target with same patch set.
>>
>> You need to investigate and show where and why this is happening. My
>> guess is you have a network driver that uses a lot of kernel stack
>> space, which itself would be a bug.
>>
>> Note that there are compiler versions out there that mis-optimise and
>> eat stack space - the kernel build should be warning if a function
>> uses a large amount of stack.
> 
> For tracking down those not-super-helpful compiler warnings, I wrote a
> tool where if you rebuild with debug info, and give it the object file
> and string of the function the compiler warned about it will parse the
> DWARF to tell you the size of each local variable, and if it came from
> an inline frame.  Generally, it's possible to stack allocate something
> that's way too big; instead those should be allocated on the heap.
> https://github.com/ClangBuiltLinux/frame-larger-than
> (I haven't had time to sit down and use it to resolve all outstanding
> issues, but it has worked well for me in the past)

Things get a bit more difficult with the network stack and you easily
recurse into functions and blow up the stack size. This is especially
true if you have some complex network tunneling or filtering going on.

For one, in the 4.1 kernel that appears to have been used as a basis for
this work, if you have CONFIG_BPF enabled but not
CONFIG_BPF_JIT_ALWAYS_ON, __bpf_prog_run will require about 724 bytes of
stack last I measured, that's nearly 10% of the stack that goes away
just like that.
--
Florian
Arnd Bergmann Oct. 21, 2020, 11:58 a.m. UTC | #5
(replying to my own mail, apparently my normal outgoing email server is
blacklisted, so resending from @kernel.org)

On Fri, Oct 16, 2020 at 12:09 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Thu, Oct 8, 2020 at 10:32 AM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> > On Thu, Oct 08, 2020 at 12:45:30PM +0530, Maninder Singh wrote:
> > > Observed Stack Overflow on 8KB kernel stack on ARM specially
> > > incase on network interrupts, which results in undeterministic behaviour.
> > > So there is need for per cpu dedicated IRQ stack for ARM.
> > >
> > > As ARm does not have extra co-processor register
> > > to save thread info pointer, IRQ stack will be at some
> > > performance cost, so code is under CONFIG_IRQ_STACK.
> > >
> > > and we don't have much knowledge and set up for CLANG
> > > and ARM_UNWIND, so dependency added for both cases.
> > >
> > > Tested patch set with QEMU for latest kernel
> > > and 4.1 kernel for ARM target with same patch set.
> >
> > You need to investigate and show where and why this is happening. My
> > guess is you have a network driver that uses a lot of kernel stack
> > space, which itself would be a bug.
>
> Agreed.
>
> > Note that there are compiler versions out there that mis-optimise and
> > eat stack space - the kernel build should be warning if a function
> > uses a large amount of stack.
>
> Some more ideas for figuring it out:
>
> CONFIG_DEBUG_STACK_USAGE may also be helpful in identifying
> code paths that are deeply nested with multiple functions taking a
> lot of stack space, but each one staying under the limit.
>
> CONFIG_DEBUG_STACKOVERFLOW would also help here but
> is not supported on Arm at the moment. There was a patch[1] from
> Uwe Kleine-König to add this, and I suppose we should still add
> that, in particular if it helps debug this problem.
>
> CONFIG_VMAP_STACK is probably the best way to debug
> random runtime stack overflows because using a guard page
> turns random memory corruption into an immediate oops,
> but I don't think there is an implementation for Arm yet and
> using a lot of vmalloc space means we might not be able to
> default to this.
>
> Regardless of identifying and fixing the bug Maninder found, I
> also think that supporting separate async stacks on Arm is useful
> for determinism. Most of the popular architectures use irqstack
> for this reason, and I was actually surprised that we don't do it
> on arch/arm/.
>
>      Arnd
>
> [1] https://lore.kernel.org/linux-arm-kernel/20200108082913.29710-1-u.kleine-koenig@pengutronix.de/
Russell King (Oracle) Oct. 21, 2020, 12:34 p.m. UTC | #6
On Wed, Oct 21, 2020 at 01:58:21PM +0200, Arnd Bergmann wrote:
> (replying to my own mail, apparently my normal outgoing email server is
> blacklisted, so resending from @kernel.org)
> 
> On Fri, Oct 16, 2020 at 12:09 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Thu, Oct 8, 2020 at 10:32 AM Russell King - ARM Linux admin
> > <linux@armlinux.org.uk> wrote:
> > > On Thu, Oct 08, 2020 at 12:45:30PM +0530, Maninder Singh wrote:
> > > > Observed Stack Overflow on 8KB kernel stack on ARM specially
> > > > incase on network interrupts, which results in undeterministic behaviour.
> > > > So there is need for per cpu dedicated IRQ stack for ARM.
> > > >
> > > > As ARm does not have extra co-processor register
> > > > to save thread info pointer, IRQ stack will be at some
> > > > performance cost, so code is under CONFIG_IRQ_STACK.
> > > >
> > > > and we don't have much knowledge and set up for CLANG
> > > > and ARM_UNWIND, so dependency added for both cases.
> > > >
> > > > Tested patch set with QEMU for latest kernel
> > > > and 4.1 kernel for ARM target with same patch set.
> > >
> > > You need to investigate and show where and why this is happening. My
> > > guess is you have a network driver that uses a lot of kernel stack
> > > space, which itself would be a bug.
> >
> > Agreed.
> >
> > > Note that there are compiler versions out there that mis-optimise and
> > > eat stack space - the kernel build should be warning if a function
> > > uses a large amount of stack.
> >
> > Some more ideas for figuring it out:
> >
> > CONFIG_DEBUG_STACK_USAGE may also be helpful in identifying
> > code paths that are deeply nested with multiple functions taking a
> > lot of stack space, but each one staying under the limit.
> >
> > CONFIG_DEBUG_STACKOVERFLOW would also help here but
> > is not supported on Arm at the moment. There was a patch[1] from
> > Uwe Kleine-König to add this, and I suppose we should still add
> > that, in particular if it helps debug this problem.
> >
> > CONFIG_VMAP_STACK is probably the best way to debug
> > random runtime stack overflows because using a guard page
> > turns random memory corruption into an immediate oops,
> > but I don't think there is an implementation for Arm yet and
> > using a lot of vmalloc space means we might not be able to
> > default to this.
> >
> > Regardless of identifying and fixing the bug Maninder found, I
> > also think that supporting separate async stacks on Arm is useful
> > for determinism. Most of the popular architectures use irqstack
> > for this reason, and I was actually surprised that we don't do it
> > on arch/arm/.
> >
> >      Arnd
> >
> > [1] https://lore.kernel.org/linux-arm-kernel/20200108082913.29710-1-u.kleine-koenig@pengutronix.de/

We don't do it because we don't have a separate register to be able
to store the thread_info pointer, and copying that lump between the
SVC and IRQ stack will add massively to IRQ latency, especially for
older machines.
Arnd Bergmann Oct. 21, 2020, 12:46 p.m. UTC | #7
On Wed, Oct 21, 2020 at 2:34 PM Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> We don't do it because we don't have a separate register to be able
> to store the thread_info pointer, and copying that lump between the
> SVC and IRQ stack will add massively to IRQ latency, especially for
> older machines.

I forwarded my other reply as well, in which I suggested using
CONFIG_THREAD_INFO_IN_TASK, wouldn't that solve the problem?

      Arnd