Message ID | 50007197.8030403@xenomai.org (mailing list archive) |
---|---|
State | New, archived |
On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:

> I do not know if it is really useful, but it seems it would be possible
> to reduce the number of memory accesses to just one in the irq_handler
> macro in the case where CONFIG_MULTI_IRQ_HANDLER is enabled, by using a
> PC relative load, with something like the following patch:

To be strict with code sections, you can't do this. The handle_arch_irq
symbol identifies a variable and with your patch you're moving it from
the .data section to the .text section. The .text section is meant to be
read only, and this is even more true when using a XIP kernel where .text
is in ROM, or if we could make the access protection of the kernel ro.

> diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
> index 0d1851c..48ee46a 100644
> --- a/arch/arm/kernel/entry-armv.S
> +++ b/arch/arm/kernel/entry-armv.S
> @@ -37,10 +37,9 @@
>   */
>  	.macro	irq_handler
>  #ifdef CONFIG_MULTI_IRQ_HANDLER
> -	ldr	r1, =handle_arch_irq
>  	mov	r0, sp
>  	adr	lr, BSYM(9997f)
> -	ldr	pc, [r1]
> +	ldr	pc, handle_arch_irq
>  #else
>  	arch_irq_handler_default
>  #endif
> @@ -325,6 +324,12 @@ ENDPROC(__pabt_svc)
>  #endif
>  .LCfp:
>  	.word	fp_enter
> +#ifdef CONFIG_MULTI_IRQ_HANDLER
> +	.globl	handle_arch_irq
> +handle_arch_irq:
> +	.space	4
> +#endif
> +
>
>  /*
>   * User mode handlers
> @@ -1151,9 +1156,3 @@ cr_alignment:
>  	.space	4
>  cr_no_alignment:
>  	.space	4
> -
> -#ifdef CONFIG_MULTI_IRQ_HANDLER
> -	.globl	handle_arch_irq
> -handle_arch_irq:
> -	.space	4
> -#endif
>
> --
> Gilles.
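For readers less familiar with the entry code, here is a minimal user-space
C sketch of what handle_arch_irq is: a writable function-pointer variable
that is assigned once during boot and read on every interrupt. It is an
illustration only, not kernel code. struct pt_regs is a stand-in type, and
demo_setup_irq, demo_irq_handler, demo_irq_entry and demo_handled are
hypothetical names. In the unpatched macro the indirect call corresponds to
two loads (the variable's address from the literal pool, then the pointer
itself); the patch merges them into one PC-relative load by placing the
variable next to the code, which is what moves it into .text.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's register frame; the layout is hypothetical. */
struct pt_regs {
    unsigned long uregs[18];
};

/*
 * A writable variable holding the handler address. Because it is assigned
 * at run time, the toolchain places it in a writable data section, not in
 * .text.
 */
static void (*handle_arch_irq)(struct pt_regs *);

/* Counts how many times the demo handler ran (illustration only). */
static int demo_handled;

/* Hypothetical platform handler, standing in for the real irq decoder. */
static void demo_irq_handler(struct pt_regs *regs)
{
    (void)regs;
    demo_handled++;
}

/* Hypothetical boot-time setup: the pointer is written exactly once. */
static void demo_setup_irq(void (*fn)(struct pt_regs *))
{
    handle_arch_irq = fn;
}

/*
 * Rough C equivalent of the irq_handler macro body: read the pointer
 * variable, then branch to the address it holds.
 */
static void demo_irq_entry(struct pt_regs *regs)
{
    assert(handle_arch_irq != NULL);
    handle_arch_irq(regs);
}
```

Calling demo_setup_irq(demo_irq_handler) once and then demo_irq_entry() for
each simulated interrupt mirrors the "set once at boot, read on every
interrupt" pattern that the rest of the thread discusses.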
On 07/13/2012 09:40 PM, Nicolas Pitre wrote:
> On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:
>
>> I do not know if it is really useful, but it seems it would be possible
>> to reduce the number of memory accesses to just one in the irq_handler
>> macro in the case where CONFIG_MULTI_IRQ_HANDLER is enabled, by using a
>> PC relative load, with something like the following patch:
>
> To be strict with code sections, you can't do this. The
> handle_arch_irq symbol identifies a variable and with your patch you're
> moving it from the .data section to the .text section. The .text
> section is meant to be read only, and this is even more true when using
> a XIP kernel where .text is in ROM, or if we could make the access
> protection of the kernel ro.

I understand that but, XIP kernel aside, the handle_arch_irq variable is
set only once, very early during the boot process, so it is almost
read-only. Is Linux not using self-modifying code in some cases anyway
(booting an SMP kernel on a UP processor, for instance)?
On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:

> On 07/13/2012 09:40 PM, Nicolas Pitre wrote:
> > On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:
> >
> >> I do not know if it is really useful, but it seems it would be possible
> >> to reduce the number of memory accesses to just one in the irq_handler
> >> macro in the case where CONFIG_MULTI_IRQ_HANDLER is enabled, by using a
> >> PC relative load, with something like the following patch:
> >
> > To be strict with code sections, you can't do this. The
> > handle_arch_irq symbol identifies a variable and with your patch you're
> > moving it from the .data section to the .text section. The .text
> > section is meant to be read only, and this is even more true when using
> > a XIP kernel where .text is in ROM, or if we could make the access
> > protection of the kernel ro.
>
> I understand that but, XIP kernel aside, the handle_arch_irq variable is
> set only once, very early during the boot process, so it is almost
> read-only. Is Linux not using self-modifying code in some cases anyway
> (booting an SMP kernel on a UP processor, for instance)?

There are limits to which such tricks should be applied. In the SMP on
UP case this is a matter of making the kernel boot at all, which is a
rather strong reason.

Do you have performance numbers like interrupt latency that show this
patch being worth it? Without concrete justifications I don't think we
should go down that path.

Nicolas
On 07/13/2012 10:09 PM, Nicolas Pitre wrote:
> On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:
>
>> On 07/13/2012 09:40 PM, Nicolas Pitre wrote:
>>> On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:
>>>
>>>> I do not know if it is really useful, but it seems it would be possible
>>>> to reduce the number of memory accesses to just one in the irq_handler
>>>> macro in the case where CONFIG_MULTI_IRQ_HANDLER is enabled, by using a
>>>> PC relative load, with something like the following patch:
>>>
>>> To be strict with code sections, you can't do this. The
>>> handle_arch_irq symbol identifies a variable and with your patch you're
>>> moving it from the .data section to the .text section. The .text
>>> section is meant to be read only, and this is even more true when using
>>> a XIP kernel where .text is in ROM, or if we could make the access
>>> protection of the kernel ro.
>>
>> I understand that but, XIP kernel aside, the handle_arch_irq variable is
>> set only once, very early during the boot process, so it is almost
>> read-only. Is Linux not using self-modifying code in some cases anyway
>> (booting an SMP kernel on a UP processor, for instance)?
>
> There are limits to which such tricks should be applied. In the SMP on
> UP case this is a matter of making the kernel boot at all, which is a
> rather strong reason.
>
> Do you have performance numbers like interrupt latency that show this
> patch being worth it? Without concrete justifications I don't think we
> should go down that path.

I intend to do some interrupt latency measurements soon. But I suspect
that CONFIG_MULTI_IRQ_HANDLER itself will make more of a difference than
these two memory accesses, because the irq handlers are now fat compiled
C code instead of carefully optimized assembly code. And in fact, chances
are that I will observe nothing at all, since the low end platforms I
have are AT91, which are not using CONFIG_MULTI_IRQ_HANDLER yet.
On 07/13/2012 10:09 PM, Nicolas Pitre wrote:
> On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:
>
>> On 07/13/2012 09:40 PM, Nicolas Pitre wrote:
>>> On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:
>>>
>>>> I do not know if it is really useful, but it seems it would be possible
>>>> to reduce the number of memory accesses to just one in the irq_handler
>>>> macro in the case where CONFIG_MULTI_IRQ_HANDLER is enabled, by using a
>>>> PC relative load, with something like the following patch:
>>>
>>> To be strict with code sections, you can't do this. The
>>> handle_arch_irq symbol identifies a variable and with your patch you're
>>> moving it from the .data section to the .text section. The .text
>>> section is meant to be read only, and this is even more true when using
>>> a XIP kernel where .text is in ROM, or if we could make the access
>>> protection of the kernel ro.
>>
>> I understand that but, XIP kernel aside, the handle_arch_irq variable is
>> set only once, very early during the boot process, so it is almost
>> read-only. Is Linux not using self-modifying code in some cases anyway
>> (booting an SMP kernel on a UP processor, for instance)?
>
> There are limits to which such tricks should be applied. In the SMP on
> UP case this is a matter of making the kernel boot at all, which is a
> rather strong reason.
>
> Do you have performance numbers like interrupt latency that show this
> patch being worth it? Without concrete justifications I don't think we
> should go down that path.

So, I ran a few tests on at91rm9200, where I expected the differences to
be most visible. First I enabled CONFIG_MULTI_IRQ_HANDLER and wrote the
irq decoding handler in plain C. This increases the irq latency by 1.2us
(measured as the average irq latency on an idle system). I rewrote this
irq decoding handler in assembly, using the macros in entry-macro.S. This
decreases the irq latency by 600ns. Then I tried the trick at the
beginning of this thread, and... could not measure any difference, so,
you were right. Anyway, given that on at91rm9200 worst case irq latencies
are in the 80us range, all these optimizations are pointless.
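To put these figures in proportion, the following C sketch expresses each
measured delta as a share of the reported worst case. It assumes that the
600ns gain is relative to the plain C decoder (so the assembly decoder
would cost about 600ns more than the original baseline), and it compares
average-latency deltas against a worst-case figure, which is only a rough
comparison. The constant and function names are hypothetical.

```c
#include <assert.h>

/* Figures reported in the thread, converted to nanoseconds. */
enum {
    C_DECODER_COST_NS   = 1200,  /* average latency added by the plain C decoder */
    ASM_DECODER_GAIN_NS = 600,   /* average latency recovered by the assembly rewrite */
    WORST_CASE_NS       = 80000  /* approximate worst-case latency on at91rm9200 */
};

/*
 * Share of the worst-case latency that a delta represents, in basis
 * points (hundredths of a percent), using integer arithmetic only.
 */
static int share_of_worst_case_bp(int delta_ns)
{
    assert(delta_ns >= 0);
    return delta_ns * 10000 / WORST_CASE_NS;
}
```

Under those assumptions, share_of_worst_case_bp(C_DECODER_COST_NS) gives
150 basis points (1.5%) and the remaining 600ns gives 75 basis points
(0.75%), which is consistent with the conclusion that a delta too small to
measure cannot matter on this platform.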
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 0d1851c..48ee46a 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -37,10 +37,9 @@
  */
 	.macro	irq_handler
 #ifdef CONFIG_MULTI_IRQ_HANDLER
-	ldr	r1, =handle_arch_irq
 	mov	r0, sp
 	adr	lr, BSYM(9997f)
-	ldr	pc, [r1]
+	ldr	pc, handle_arch_irq
 #else
 	arch_irq_handler_default
 #endif
@@ -325,6 +324,12 @@ ENDPROC(__pabt_svc)
 #endif
 .LCfp:
 	.word	fp_enter
+#ifdef CONFIG_MULTI_IRQ_HANDLER
+	.globl	handle_arch_irq
+handle_arch_irq:
+	.space	4
+#endif
+
 
 /*
  * User mode handlers
@@ -1151,9 +1156,3 @@ cr_alignment:
 	.space	4
 cr_no_alignment:
 	.space	4
-
-#ifdef CONFIG_MULTI_IRQ_HANDLER
-	.globl	handle_arch_irq
-handle_arch_irq:
-	.space	4
-#endif