
Loading handle_arch_irq with a PC relative load

Message ID 50007197.8030403@xenomai.org (mailing list archive)
State New, archived

Commit Message

Gilles Chanteperdrix July 13, 2012, 7:05 p.m. UTC
I do not know if it is really useful, but it seems it would be possible 
to reduce the number of memory accesses to just one in the irq_handler 
macro in the case where CONFIG_MULTI_IRQ_HANDLER is enabled, by using a 
PC relative load, with something like the following patch:

Comments

Nicolas Pitre July 13, 2012, 7:40 p.m. UTC | #1
On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:

> 
> I do not know if it is really useful, but it seems it would be possible 
> to reduce the number of memory accesses to just one in the irq_handler 
> macro in the case where CONFIG_MULTI_IRQ_HANDLER is enabled, by using a 
> PC relative load, with something like the following patch:

To be strict with code sections, you can't do this.  The 
handle_arch_irq symbol identifies a variable and with your patch you're 
moving it from the .data section to the .text section.  The .text 
section is meant to be read only, and this is even more true when using 
a XIP kernel where .text is in ROM, or if we could make the access 
protection of the kernel ro.
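
For readers following the thread, here is a minimal C model of the two load sequences under discussion. It is only a sketch: the real code is ARM assembly in entry-armv.S, and a C compiler is free to generate different loads. The names regs_model, literal_pool_entry, set_irq_handler_model, demo_handler and irq_count are hypothetical and exist only for illustration; handle_arch_irq is modelled as an ordinary writable function-pointer variable, which is the property that ties it to a writable section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's register frame. */
struct regs_model { unsigned long pc; };

typedef void (*irq_handler_fn)(struct regs_model *);

/* Models the 4-byte handle_arch_irq slot: a variable written once at boot. */
irq_handler_fn handle_arch_irq;

/* Models the literal pool word read by "ldr r1, =handle_arch_irq". */
irq_handler_fn *const literal_pool_entry = &handle_arch_irq;

/* Hypothetical counter so the effect of a dispatch can be observed. */
int irq_count;

void demo_handler(struct regs_model *regs)
{
    (void)regs;
    irq_count++;
}

/* Hypothetical boot-time setter: the only writer of the slot. */
void set_irq_handler_model(irq_handler_fn fn)
{
    handle_arch_irq = fn;
}

/* Existing sequence: two dependent memory reads before the call. */
void dispatch_two_loads(struct regs_model *regs)
{
    irq_handler_fn *slot = literal_pool_entry; /* read 1: address of the variable */
    irq_handler_fn fn = *slot;                 /* read 2: the handler pointer */
    assert(fn != NULL);
    fn(regs);
}

/* Proposed sequence: the slot sits next to the code, so one read suffices. */
void dispatch_one_load(struct regs_model *regs)
{
    irq_handler_fn fn = handle_arch_irq;       /* single read of the slot */
    assert(fn != NULL);
    fn(regs);
}
```

Both dispatch functions end up calling the same handler. The difference is only where the slot lives and how many reads it takes to reach it, and the slot has to stay writable for set_irq_handler_model to work.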


> diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
> index 0d1851c..48ee46a 100644
> --- a/arch/arm/kernel/entry-armv.S
> +++ b/arch/arm/kernel/entry-armv.S
> @@ -37,10 +37,9 @@
>   */
>  	.macro	irq_handler
>  #ifdef CONFIG_MULTI_IRQ_HANDLER
> -	ldr	r1, =handle_arch_irq
>  	mov	r0, sp
>  	adr	lr, BSYM(9997f)
> -	ldr	pc, [r1]
> +	ldr	pc, handle_arch_irq
>  #else
>  	arch_irq_handler_default
>  #endif
> @@ -325,6 +324,12 @@ ENDPROC(__pabt_svc)
>  #endif
>  .LCfp:
>  	.word	fp_enter
> +#ifdef CONFIG_MULTI_IRQ_HANDLER
> +	.globl	handle_arch_irq
> +handle_arch_irq:
> +	.space	4
> +#endif
> +
>  
>  /*
>   * User mode handlers
> @@ -1151,9 +1156,3 @@ cr_alignment:
>  	.space	4
>  cr_no_alignment:
>  	.space	4
> -
> -#ifdef CONFIG_MULTI_IRQ_HANDLER
> -	.globl	handle_arch_irq
> -handle_arch_irq:
> -	.space	4
> -#endif
Gilles Chanteperdrix July 13, 2012, 7:51 p.m. UTC | #2
On 07/13/2012 09:40 PM, Nicolas Pitre wrote:
> On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:
> 
>>
>> I do not know if it is really useful, but it seems it would be possible 
>> to reduce the number of memory accesses to just one in the irq_handler 
>> macro in the case where CONFIG_MULTI_IRQ_HANDLER is enabled, by using a 
>> PC relative load, with something like the following patch:
> 
> To be strict with code sections, you can't do this.  The 
> handle_arch_irq symbol identifies a variable and with your patch you're 
> moving it from the .data section to the .text section.  The .text 
> section is meant to be read only, and this is even more true when using 
> a XIP kernel where .text is in ROM, or if we could make the access 
> protection of the kernel ro.

I understand that but, XIP kernel aside, the handle_arch_irq variable is
set only once very early during the boot process, so, almost read-only.
Is not Linux using self-modifying code in some cases anyway (booting an
SMP kernel on a UP processor, for instance)?
Nicolas Pitre July 13, 2012, 8:09 p.m. UTC | #3
On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:

> On 07/13/2012 09:40 PM, Nicolas Pitre wrote:
> > On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:
> > 
> >>
> >> I do not know if it is really useful, but it seems it would be possible 
> >> to reduce the number of memory accesses to just one in the irq_handler 
> >> macro in the case where CONFIG_MULTI_IRQ_HANDLER is enabled, by using a 
> >> PC relative load, with something like the following patch:
> > 
> > To be strict with code sections, you can't do this.  The 
> > handle_arch_irq symbol identifies a variable and with your patch you're 
> > moving it from the .data section to the .text section.  The .text 
> > section is meant to be read only, and this is even more true when using 
> > a XIP kernel where .text is in ROM, or if we could make the access 
> > protection of the kernel ro.
> 
> I understand that but, XIP kernel aside, the handle_arch_irq variable is
> set only once very early during the boot process, so, almost read-only.
> Is not Linux using self-modifying code in some cases anyway (booting an
> SMP kernel on a UP processor, for instance)?

There are limits to which such tricks should be applied.  In the SMP on 
UP case this is a matter of making the kernel boot at all which is a 
rather strong reason.

Do you have performance numbers like interrupt latency that show this 
patch being worth it?  Without concrete justifications I don't think we 
should go down that path.


Nicolas
Gilles Chanteperdrix July 13, 2012, 8:13 p.m. UTC | #4
On 07/13/2012 10:09 PM, Nicolas Pitre wrote:
> On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:
> 
>> On 07/13/2012 09:40 PM, Nicolas Pitre wrote:
>>> On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:
>>>
>>>>
>>>> I do not know if it is really useful, but it seems it would be possible 
>>>> to reduce the number of memory accesses to just one in the irq_handler 
>>>> macro in the case where CONFIG_MULTI_IRQ_HANDLER is enabled, by using a 
>>>> PC relative load, with something like the following patch:
>>>
>>> To be strict with code sections, you can't do this.  The 
>>> handle_arch_irq symbol identifies a variable and with your patch you're 
>>> moving it from the .data section to the .text section.  The .text 
>>> section is meant to be read only, and this is even more true when using 
>>> a XIP kernel where .text is in ROM, or if we could make the access 
>>> protection of the kernel ro.
>>
>> I understand that but, XIP kernel aside, the handle_arch_irq variable is
>> set only once very early during the boot process, so, almost read-only.
>> Is not Linux using self-modifying code in some cases anyway (booting an
>> SMP kernel on a UP processor, for instance)?
> 
> There are limits to which such tricks should be applied.  In the SMP on 
> UP case this is a matter of making the kernel boot at all which is a 
> rather strong reason.
> 
> Do you have performance numbers like interrupt latency that show this 
> patch being worth it?  Without concrete justifications I don't think we 
> should go down that path.

I intend to do some interrupt latency measurements soon. But I suspect
CONFIG_MULTI_IRQ_HANDLER will cause more differences due to the fact
that the irq handlers are now fat C compiled code instead of carefully
optimized assembly code, than because of these two memory accesses.

And in fact, chances are that I will observe nothing at all since the
low end platforms I have are AT91 which are not using
CONFIG_MULTI_IRQ_HANDLER yet.
Gilles Chanteperdrix July 14, 2012, 10:39 a.m. UTC | #5
On 07/13/2012 10:09 PM, Nicolas Pitre wrote:
> On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:
> 
>> On 07/13/2012 09:40 PM, Nicolas Pitre wrote:
>>> On Fri, 13 Jul 2012, Gilles Chanteperdrix wrote:
>>>
>>>>
>>>> I do not know if it is really useful, but it seems it would be possible 
>>>> to reduce the number of memory accesses to just one in the irq_handler 
>>>> macro in the case where CONFIG_MULTI_IRQ_HANDLER is enabled, by using a 
>>>> PC relative load, with something like the following patch:
>>>
>>> To be strict with code sections, you can't do this.  The 
>>> handle_arch_irq symbol identifies a variable and with your patch you're 
>>> moving it from the .data section to the .text section.  The .text 
>>> section is meant to be read only, and this is even more true when using 
>>> a XIP kernel where .text is in ROM, or if we could make the access 
>>> protection of the kernel ro.
>>
>> I understand that but, XIP kernel aside, the handle_arch_irq variable is
>> set only once very early during the boot process, so, almost read-only.
>> Is not Linux using self-modifying code in some cases anyway (booting an
>> SMP kernel on a UP processor, for instance)?
> 
> There are limits to which such tricks should be applied.  In the SMP on 
> UP case this is a matter of making the kernel boot at all which is a 
> rather strong reason.
> 
> Do you have performance numbers like interrupt latency that show this 
> patch being worth it?  Without concrete justifications I don't think we 
> should go down that path.

So, I ran a few tests on at91rm9200, where I expected the differences to
be most visible.

First I enabled CONFIG_MULTI_IRQ_HANDLER and wrote the irq decoding
handler in plain C. This increases the irq latency by 1.2us (measured
as the average irq latency on an idle system).

I rewrote this irq decoding handler in assembly, using the macros in
entry-macro.S. This decreases the irq latency by 600ns.

Then I tried the trick at the beginning of this thread, and... could not
measure any difference, so, you were right.

Anyway, given that on at91rm9200 worst case irq latencies are in the
80us range, all these optimizations are pointless.
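
To put those figures side by side, here is a small arithmetic sketch in C. It assumes the 600ns gain is measured against the plain C handler, which the message does not state explicitly, and it compares average-latency deltas with a worst-case figure, so the result is only an order-of-magnitude indication. The function names are hypothetical.

```c
#include <assert.h>

/* Figures reported in the message above, converted to nanoseconds. */
enum {
    C_HANDLER_COST_NS   = 1200,  /* +1.2us with the plain C decoding handler */
    ASM_HANDLER_GAIN_NS = 600,   /* -600ns after the assembly rewrite */
    WORST_CASE_NS       = 80000  /* roughly 80us worst-case irq latency */
};

/* Net added latency, ASSUMING the 600ns gain is relative to the C version. */
long net_added_latency_ns(void)
{
    return C_HANDLER_COST_NS - ASM_HANDLER_GAIN_NS;
}

/* Share of the worst-case latency, in basis points (1 bp = 0.01 %). */
long share_of_worst_case_bp(long delta_ns)
{
    assert(WORST_CASE_NS > 0);
    return delta_ns * 10000 / WORST_CASE_NS;
}
```

Under that assumption, the assembly handler adds a net 600ns, about 0.75% of the 80us worst case, and even the full 1.2us of the C handler is about 1.5%.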

Patch

diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 0d1851c..48ee46a 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -37,10 +37,9 @@ 
  */
 	.macro	irq_handler
 #ifdef CONFIG_MULTI_IRQ_HANDLER
-	ldr	r1, =handle_arch_irq
 	mov	r0, sp
 	adr	lr, BSYM(9997f)
-	ldr	pc, [r1]
+	ldr	pc, handle_arch_irq
 #else
 	arch_irq_handler_default
 #endif
@@ -325,6 +324,12 @@  ENDPROC(__pabt_svc)
 #endif
 .LCfp:
 	.word	fp_enter
+#ifdef CONFIG_MULTI_IRQ_HANDLER
+	.globl	handle_arch_irq
+handle_arch_irq:
+	.space	4
+#endif
+
 
 /*
  * User mode handlers
@@ -1151,9 +1156,3 @@  cr_alignment:
 	.space	4
 cr_no_alignment:
 	.space	4
-
-#ifdef CONFIG_MULTI_IRQ_HANDLER
-	.globl	handle_arch_irq
-handle_arch_irq:
-	.space	4
-#endif