Message ID | 1602141333-17822-1-git-send-email-maninder1.s@samsung.com (mailing list archive) |
---|---|
Headers | show |
Series | IRQ stack support for ARM | expand |
On 2020-10-08 12:45:30 [+0530], Maninder Singh wrote: > Observed Stack Overflow on 8KB kernel stack on ARM specially > incase on network interrupts, which results in undeterministic behaviour. > So there is need for per cpu dedicated IRQ stack for ARM. You could try to look where this stack overflow is coming from. If this is limited to softirq processing/NAPI (since you mentioned network) you could try implementing do_softirq_own_stack(). Sebastian
On Thu, Oct 08, 2020 at 12:45:30PM +0530, Maninder Singh wrote: > Observed Stack Overflow on 8KB kernel stack on ARM specially > incase on network interrupts, which results in undeterministic behaviour. > So there is need for per cpu dedicated IRQ stack for ARM. > > As ARm does not have extra co-processor register > to save thread info pointer, IRQ stack will be at some > performance cost, so code is under CONFIG_IRQ_STACK. > > and we don't have much knowledge and set up for CLANG > and ARM_UNWIND, so dependency added for both cases. > > Tested patch set with QEMU for latest kernel > and 4.1 kernel for ARM target with same patch set. You need to investigate and show where and why this is happening. My guess is you have a network driver that uses a lot of kernel stack space, which itself would be a bug. Note that there are compiler versions out there that mis-optimise and eat stack space - the kernel build should be warning if a function uses a large amount of stack.
On Thu, Oct 8, 2020 at 1:30 AM Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Thu, Oct 08, 2020 at 12:45:30PM +0530, Maninder Singh wrote: > > Observed Stack Overflow on 8KB kernel stack on ARM specially > > incase on network interrupts, which results in undeterministic behaviour. > > So there is need for per cpu dedicated IRQ stack for ARM. > > > > As ARm does not have extra co-processor register > > to save thread info pointer, IRQ stack will be at some > > performance cost, so code is under CONFIG_IRQ_STACK. > > > > and we don't have much knowledge and set up for CLANG > > and ARM_UNWIND, so dependency added for both cases. > > > > Tested patch set with QEMU for latest kernel > > and 4.1 kernel for ARM target with same patch set. > > You need to investigate and show where and why this is happening. My > guess is you have a network driver that uses a lot of kernel stack > space, which itself would be a bug. > > Note that there are compiler versions out there that mis-optimise and > eat stack space - the kernel build should be warning if a function > uses a large amount of stack. For tracking down those not-super-helpful compiler warnings, I wrote a tool where if you rebuild with debug info, and give it the object file and string of the function the compiler warned about it will parse the DWARF to tell you the size of each local variable, and if it came from an inline frame. Generally, it's possible to stack allocate something that's way too big; instead those should be allocated on the heap. https://github.com/ClangBuiltLinux/frame-larger-than (I haven't had time to sit down and use it to resolve all outstanding issues, but it has worked well for me in the past)
On 10/15/20 1:59 PM, Nick Desaulniers wrote: > On Thu, Oct 8, 2020 at 1:30 AM Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: >> >> On Thu, Oct 08, 2020 at 12:45:30PM +0530, Maninder Singh wrote: >>> Observed Stack Overflow on 8KB kernel stack on ARM specially >>> incase on network interrupts, which results in undeterministic behaviour. >>> So there is need for per cpu dedicated IRQ stack for ARM. >>> >>> As ARm does not have extra co-processor register >>> to save thread info pointer, IRQ stack will be at some >>> performance cost, so code is under CONFIG_IRQ_STACK. >>> >>> and we don't have much knowledge and set up for CLANG >>> and ARM_UNWIND, so dependency added for both cases. >>> >>> Tested patch set with QEMU for latest kernel >>> and 4.1 kernel for ARM target with same patch set. >> >> You need to investigate and show where and why this is happening. My >> guess is you have a network driver that uses a lot of kernel stack >> space, which itself would be a bug. >> >> Note that there are compiler versions out there that mis-optimise and >> eat stack space - the kernel build should be warning if a function >> uses a large amount of stack. > > For tracking down those not-super-helpful compiler warnings, I wrote a > tool where if you rebuild with debug info, and give it the object file > and string of the function the compiler warned about it will parse the > DWARF to tell you the size of each local variable, and if it came from > an inline frame. Generally, it's possible to stack allocate something > that's way too big; instead those should be allocated on the heap. > https://github.com/ClangBuiltLinux/frame-larger-than > (I haven't had time to sit down and use it to resolve all outstanding > issues, but it has worked well for me in the past) Things get a bit more difficult with the network stack and you easily recurse into functions and blow up the stack size. This is especially true if you have some complex network tunneling or filtering going on. For one, in the 4.1 kernel that appears to have been used as a basis for this work, if you have CONFIG_BPF enabled but not CONFIG_BPF_JIT_ALWAYS_ON, __bpf_prog_run will require about 724 bytes of stack last I measured, that's nearly 10% of the stack that goes away just like that. -- Florian
(replying to my own mail, apparently my normal outgoing email server is blacklisted, so resending from @kernel.org) On Fri, Oct 16, 2020 at 12:09 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Thu, Oct 8, 2020 at 10:32 AM Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > On Thu, Oct 08, 2020 at 12:45:30PM +0530, Maninder Singh wrote: > > > Observed Stack Overflow on 8KB kernel stack on ARM specially > > > incase on network interrupts, which results in undeterministic behaviour. > > > So there is need for per cpu dedicated IRQ stack for ARM. > > > > > > As ARm does not have extra co-processor register > > > to save thread info pointer, IRQ stack will be at some > > > performance cost, so code is under CONFIG_IRQ_STACK. > > > > > > and we don't have much knowledge and set up for CLANG > > > and ARM_UNWIND, so dependency added for both cases. > > > > > > Tested patch set with QEMU for latest kernel > > > and 4.1 kernel for ARM target with same patch set. > > > > You need to investigate and show where and why this is happening. My > > guess is you have a network driver that uses a lot of kernel stack > > space, which itself would be a bug. > > Agreed. > > > Note that there are compiler versions out there that mis-optimise and > > eat stack space - the kernel build should be warning if a function > > uses a large amount of stack. > > Some more ideas for figuring it out: > > CONFIG_DEBUG_STACK_USAGE may also be helpful in identifying > code paths that are deeply nested with multiple functions taking a > lot of stack space, but each one staying under the limit. > > CONFIG_DEBUG_STACKOVERFLOW would also help here but > is not supported on Arm at the moment. There was a patch[1] from > Uwe Kleine-König to add this, and I suppose we should still add > that, in particular if it helps debug this problem. > > CONFIG_VMAP_STACK is probably the best way to debug > random runtime stack overflows because using a guard page > turns random memory corruption into an immediate oops, > but I don't think there is an implementation for Arm yet and > using a lot of vmalloc space means we might not be able to > default to this. > > Regardless of identifying and fixing the bug Maninder found, I > also think that supporting separate async stacks on Arm is useful > for determinism. Most of the popular architectures use irqstack > for this reason, and I was actually surprised that we don't do it > on arch/arm/. > > Arnd > > [1] https://lore.kernel.org/linux-arm-kernel/20200108082913.29710-1-u.kleine-koenig@pengutronix.de/
On Wed, Oct 21, 2020 at 01:58:21PM +0200, Arnd Bergmann wrote: > (replying to my own mail, apparently my normal outgoing email server is > blacklisted, so resending from @kernel.org) > > On Fri, Oct 16, 2020 at 12:09 PM Arnd Bergmann <arnd@arndb.de> wrote: > > > > On Thu, Oct 8, 2020 at 10:32 AM Russell King - ARM Linux admin > > <linux@armlinux.org.uk> wrote: > > > On Thu, Oct 08, 2020 at 12:45:30PM +0530, Maninder Singh wrote: > > > > Observed Stack Overflow on 8KB kernel stack on ARM specially > > > > incase on network interrupts, which results in undeterministic behaviour. > > > > So there is need for per cpu dedicated IRQ stack for ARM. > > > > > > > > As ARm does not have extra co-processor register > > > > to save thread info pointer, IRQ stack will be at some > > > > performance cost, so code is under CONFIG_IRQ_STACK. > > > > > > > > and we don't have much knowledge and set up for CLANG > > > > and ARM_UNWIND, so dependency added for both cases. > > > > > > > > Tested patch set with QEMU for latest kernel > > > > and 4.1 kernel for ARM target with same patch set. > > > > > > You need to investigate and show where and why this is happening. My > > > guess is you have a network driver that uses a lot of kernel stack > > > space, which itself would be a bug. > > > > Agreed. > > > > > Note that there are compiler versions out there that mis-optimise and > > > eat stack space - the kernel build should be warning if a function > > > uses a large amount of stack. > > > > Some more ideas for figuring it out: > > > > CONFIG_DEBUG_STACK_USAGE may also be helpful in identifying > > code paths that are deeply nested with multiple functions taking a > > lot of stack space, but each one staying under the limit. > > > > CONFIG_DEBUG_STACKOVERFLOW would also help here but > > is not supported on Arm at the moment. There was a patch[1] from > > Uwe Kleine-König to add this, and I suppose we should still add > > that, in particular if it helps debug this problem. > > > > CONFIG_VMAP_STACK is probably the best way to debug > > random runtime stack overflows because using a guard page > > turns random memory corruption into an immediate oops, > > but I don't think there is an implementation for Arm yet and > > using a lot of vmalloc space means we might not be able to > > default to this. > > > > Regardless of identifying and fixing the bug Maninder found, I > > also think that supporting separate async stacks on Arm is useful > > for determinism. Most of the popular architectures use irqstack > > for this reason, and I was actually surprised that we don't do it > > on arch/arm/. > > > > Arnd > > > > [1] https://lore.kernel.org/linux-arm-kernel/20200108082913.29710-1-u.kleine-koenig@pengutronix.de/ We don't do it because we don't have a separate register to be able to store the thread_info pointer, and copying that lump between the SVC and IRQ stack will add massively to IRQ latency, especially for older machines.
On Wed, Oct 21, 2020 at 2:34 PM Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > We don't do it because we don't have a separate register to be able > to store the thread_info pointer, and copying that lump between the > SVC and IRQ stack will add massively to IRQ latency, especially for > older machines. I forwarded my other reply as well, in which I suggested using CONFIG_THREAD_INFO_IN_TASK, wouldn't that solve the problem? Arnd