[01/16] ARM: b.L: secondary kernel entry code

Message ID 1357777251-13541-2-git-send-email-nicolas.pitre@linaro.org (mailing list archive)
State New, archived

Commit Message

Nicolas Pitre Jan. 10, 2013, 12:20 a.m. UTC
CPUs in big.LITTLE systems have special needs when entering the kernel
due to a hotplug event, or when resuming from a deep sleep mode.

This is vectorized so multiple CPUs can enter the kernel in parallel
without serialization.

Only the basic structure is introduced here.  This will be extended
later.

TODO: MPIDR based indexing should eventually be made runtime adjusted.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/Kconfig                |  6 +++
 arch/arm/common/Makefile        |  3 ++
 arch/arm/common/bL_entry.c      | 30 +++++++++++++++
 arch/arm/common/bL_head.S       | 81 +++++++++++++++++++++++++++++++++++++++++
 arch/arm/include/asm/bL_entry.h | 35 ++++++++++++++++++
 5 files changed, 155 insertions(+)
 create mode 100644 arch/arm/common/bL_entry.c
 create mode 100644 arch/arm/common/bL_head.S
 create mode 100644 arch/arm/include/asm/bL_entry.h

Comments

Stephen Boyd Jan. 10, 2013, 7:12 a.m. UTC | #1
On 1/9/2013 4:20 PM, Nicolas Pitre wrote:
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index f95ba14ae3..2271f02e8e 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1579,6 +1579,12 @@ config HAVE_ARM_TWD
>  	help
>  	  This options enables support for the ARM timer and watchdog unit
>  
> +config BIG_LITTLE
> +	bool "big.LITTLE support (Experimental)"
> +	depends on CPU_V7 && SMP && EXPERIMENTAL

I thought EXPERIMENTAL was being phased out?

> diff --git a/arch/arm/common/Makefile b/arch/arm/common/Makefile
> index e8a4e58f1b..50880c494f 100644
> --- a/arch/arm/common/Makefile
> +++ b/arch/arm/common/Makefile
> @@ -13,3 +13,6 @@ obj-$(CONFIG_SHARP_PARAM)	+= sharpsl_param.o
>  obj-$(CONFIG_SHARP_SCOOP)	+= scoop.o
>  obj-$(CONFIG_PCI_HOST_ITE8152)  += it8152.o
>  obj-$(CONFIG_ARM_TIMER_SP804)	+= timer-sp.o
> +obj-$(CONFIG_FIQ_GLUE)		+= fiq_glue.o fiq_glue_setup.o
> +obj-$(CONFIG_FIQ_DEBUGGER)	+= fiq_debugger.o

This looks like non-related stuff?
Nicolas Pitre Jan. 10, 2013, 3:30 p.m. UTC | #2
On Wed, 9 Jan 2013, Stephen Boyd wrote:

> On 1/9/2013 4:20 PM, Nicolas Pitre wrote:
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index f95ba14ae3..2271f02e8e 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -1579,6 +1579,12 @@ config HAVE_ARM_TWD
> >  	help
> >  	  This options enables support for the ARM timer and watchdog unit
> >  
> > +config BIG_LITTLE
> > +	bool "big.LITTLE support (Experimental)"
> > +	depends on CPU_V7 && SMP && EXPERIMENTAL
> 
> I thought EXPERIMENTAL was being phased out?

True.

> > diff --git a/arch/arm/common/Makefile b/arch/arm/common/Makefile
> > index e8a4e58f1b..50880c494f 100644
> > --- a/arch/arm/common/Makefile
> > +++ b/arch/arm/common/Makefile
> > @@ -13,3 +13,6 @@ obj-$(CONFIG_SHARP_PARAM)	+= sharpsl_param.o
> >  obj-$(CONFIG_SHARP_SCOOP)	+= scoop.o
> >  obj-$(CONFIG_PCI_HOST_ITE8152)  += it8152.o
> >  obj-$(CONFIG_ARM_TIMER_SP804)	+= timer-sp.o
> > +obj-$(CONFIG_FIQ_GLUE)		+= fiq_glue.o fiq_glue_setup.o
> > +obj-$(CONFIG_FIQ_DEBUGGER)	+= fiq_debugger.o
> 
> This looks like non-related stuff?

Indeed.  Rebase fallout.

Thanks.


Nicolas
Catalin Marinas Jan. 10, 2013, 3:34 p.m. UTC | #3
Hi Nico,

On 10 January 2013 00:20, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> --- /dev/null
> +++ b/arch/arm/common/bL_entry.c
...
> +extern volatile unsigned long bL_entry_vectors[BL_NR_CLUSTERS][BL_CPUS_PER_CLUSTER];

IMHO, we should keep this array linear and ignore the cluster grouping
at this stage. This information could be added to later patches that
actually need to know about the b.L topology. This would also imply
that we treat the MPIDR just as an ID without digging into its bit
layout. But I haven't looked at the other patches yet to see how this
would fit.

> +void bL_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
> +{
> +       unsigned long val = ptr ? virt_to_phys(ptr) : 0;
> +       bL_entry_vectors[cluster][cpu] = val;
> +       smp_wmb();
> +       __cpuc_flush_dcache_area((void *)&bL_entry_vectors[cluster][cpu], 4);
> +       outer_clean_range(__pa(&bL_entry_vectors[cluster][cpu]),
> +                         __pa(&bL_entry_vectors[cluster][cpu + 1]));

Why are you using the smp_wmb() here? We don't need any barrier since
data cache ops by MVA are automatically ordered in relation to stores
to the same MVA (as long as the MVA is in Normal Cacheable memory).

> --- /dev/null
> +++ b/arch/arm/common/bL_head.S
...
> +ENTRY(bL_entry_point)
> +
> + THUMB(        adr     r12, BSYM(1f)   )
> + THUMB(        bx      r12             )
> + THUMB(        .thumb                  )
> +1:
> +       mrc     p15, 0, r0, c0, c0, 5

Minor thing, maybe a comment for this line like @ MPIDR.
Nicolas Pitre Jan. 10, 2013, 4:47 p.m. UTC | #4
On Thu, 10 Jan 2013, Catalin Marinas wrote:

> Hi Nico,
> 
> On 10 January 2013 00:20, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > --- /dev/null
> > +++ b/arch/arm/common/bL_entry.c
> ...
> > +extern volatile unsigned long bL_entry_vectors[BL_NR_CLUSTERS][BL_CPUS_PER_CLUSTER];
> 
> IMHO, we should keep this array linear and ignore the cluster grouping
> at this stage. This information could be added to later patches that
> actually need to know about the b.L topology.

That's virtually all of them.  Everything b.L related is always 
expressed in terms of a cpu,cluster tuple at the low level.

> This would also imply that we treat the MPIDR just as an ID without 
> digging into its bit layout.

That makes for too large an index space.  We always end up needing to 
break the MPIDR into a cpu,cluster thing as the MPIDR bits are too 
sparse.
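
For reference, this is roughly what the ubfx/mla sequence in bL_head.S
amounts to in C.  The helper names below are made up purely for
illustration and are not part of the patch:

#include <asm/bL_entry.h>	/* BL_CPUS_PER_CLUSTER */

/* Split the MPIDR the same way bL_head.S does, using 4-bit fields. */
static inline void mpidr_to_cpu_cluster(unsigned long mpidr,
					unsigned int *cpu,
					unsigned int *cluster)
{
	*cpu     = mpidr & 0xf;		/* MPIDR[3:0]: CPU within cluster */
	*cluster = (mpidr >> 8) & 0xf;	/* MPIDR[11:8]: cluster */
}

/* Canonical index used to address the entry vectors as a flat array. */
static inline unsigned int bL_cpu_index(unsigned int cpu, unsigned int cluster)
{
	return cluster * BL_CPUS_PER_CLUSTER + cpu;
}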

> > +void bL_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
> > +{
> > +       unsigned long val = ptr ? virt_to_phys(ptr) : 0;
> > +       bL_entry_vectors[cluster][cpu] = val;
> > +       smp_wmb();
> > +       __cpuc_flush_dcache_area((void *)&bL_entry_vectors[cluster][cpu], 4);
> > +       outer_clean_range(__pa(&bL_entry_vectors[cluster][cpu]),
> > +                         __pa(&bL_entry_vectors[cluster][cpu + 1]));
> 
> Why are you using the smp_wmb() here? We don't need any barrier since
> data cache ops by MVA are automatically ordered in relation to stores
> to the same MVA (as long as the MVA is in Normal Cacheable memory).

That was the result of monkeying the write_pen_release() code.  I'll 
remove that as the rest of the code added later doesn't use that anyway.
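
Something like this, then (untested sketch, just the barrier dropped):

void bL_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
{
	unsigned long val = ptr ? virt_to_phys(ptr) : 0;
	bL_entry_vectors[cluster][cpu] = val;
	/* no barrier needed: cache ops by MVA are ordered wrt the store above */
	__cpuc_flush_dcache_area((void *)&bL_entry_vectors[cluster][cpu], 4);
	outer_clean_range(__pa(&bL_entry_vectors[cluster][cpu]),
			  __pa(&bL_entry_vectors[cluster][cpu + 1]));
}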

> > --- /dev/null
> > +++ b/arch/arm/common/bL_head.S
> ...
> > +ENTRY(bL_entry_point)
> > +
> > + THUMB(        adr     r12, BSYM(1f)   )
> > + THUMB(        bx      r12             )
> > + THUMB(        .thumb                  )
> > +1:
> > +       mrc     p15, 0, r0, c0, c0, 5
> 
> Minor thing, maybe a comment for this line like @ MPIDR.

ACK.


Nicolas
Will Deacon Jan. 10, 2013, 11:05 p.m. UTC | #5
On Thu, Jan 10, 2013 at 12:20:36AM +0000, Nicolas Pitre wrote:
> CPUs in big.LITTLE systems have special needs when entering the kernel
> due to a hotplug event, or when resuming from a deep sleep mode.
> 
> This is vectorized so multiple CPUs can enter the kernel in parallel
> without serialization.
> 
> Only the basic structure is introduced here.  This will be extended
> later.
> 
> TODO: MPIDR based indexing should eventually be made runtime adjusted.

Agreed.

> diff --git a/arch/arm/common/bL_entry.c b/arch/arm/common/bL_entry.c
> new file mode 100644
> index 0000000000..80fff49417
> --- /dev/null
> +++ b/arch/arm/common/bL_entry.c
> @@ -0,0 +1,30 @@
> +/*
> + * arch/arm/common/bL_entry.c -- big.LITTLE kernel re-entry point
> + *
> + * Created by:  Nicolas Pitre, March 2012
> + * Copyright:   (C) 2012  Linaro Limited
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/init.h>
> +
> +#include <asm/bL_entry.h>
> +#include <asm/barrier.h>
> +#include <asm/proc-fns.h>
> +#include <asm/cacheflush.h>
> +
> +extern volatile unsigned long bL_entry_vectors[BL_NR_CLUSTERS][BL_CPUS_PER_CLUSTER];

Does this actually need to be volatile? I'd have thought a compiler
barrier in place of the smp_wmb below would be enough (following on from
Catalin's comments).

> +
> +void bL_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
> +{
> +	unsigned long val = ptr ? virt_to_phys(ptr) : 0;
> +	bL_entry_vectors[cluster][cpu] = val;
> +	smp_wmb();
> +	__cpuc_flush_dcache_area((void *)&bL_entry_vectors[cluster][cpu], 4);
> +	outer_clean_range(__pa(&bL_entry_vectors[cluster][cpu]),
> +			  __pa(&bL_entry_vectors[cluster][cpu + 1]));
> +}
> diff --git a/arch/arm/common/bL_head.S b/arch/arm/common/bL_head.S
> new file mode 100644
> index 0000000000..9d351f2b4c
> --- /dev/null
> +++ b/arch/arm/common/bL_head.S
> @@ -0,0 +1,81 @@
> +/*
> + * arch/arm/common/bL_head.S -- big.LITTLE kernel re-entry point
> + *
> + * Created by:  Nicolas Pitre, March 2012
> + * Copyright:   (C) 2012  Linaro Limited
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/linkage.h>
> +#include <asm/bL_entry.h>
> +
> +	.macro	pr_dbg	cpu, string
> +#if defined(CONFIG_DEBUG_LL) && defined(DEBUG)
> +	b	1901f
> +1902:	.ascii	"CPU 0: \0CPU 1: \0CPU 2: \0CPU 3: \0"
> +	.ascii	"CPU 4: \0CPU 5: \0CPU 6: \0CPU 7: \0"
> +1903:	.asciz	"\string"
> +	.align
> +1901:	adr	r0, 1902b
> +	add	r0, r0, \cpu, lsl #3
> +	bl	printascii
> +	adr	r0, 1903b
> +	bl	printascii
> +#endif
> +	.endm
> +
> +	.arm
> +	.align
> +
> +ENTRY(bL_entry_point)
> +
> + THUMB(	adr	r12, BSYM(1f)	)
> + THUMB(	bx	r12		)
> + THUMB(	.thumb			)
> +1:
> +	mrc	p15, 0, r0, c0, c0, 5
> +	ubfx	r9, r0, #0, #4			@ r9 = cpu
> +	ubfx	r10, r0, #8, #4			@ r10 = cluster
> +	mov	r3, #BL_CPUS_PER_CLUSTER
> +	mla	r4, r3, r10, r9			@ r4 = canonical CPU index
> +	cmp	r4, #(BL_CPUS_PER_CLUSTER * BL_NR_CLUSTERS)
> +	blo	2f
> +
> +	/* We didn't expect this CPU.  Try to make it quiet. */
> +1:	wfi
> +	wfe
> +	b	1b

I realise this CPU is stuck at this point, but you should have a dsb
before a wfi instruction. This could be problematic with the CCI this
early, so maybe just a comment saying that it doesn't matter because we
don't care about this core?

> +
> +2:	pr_dbg	r4, "kernel bL_entry_point\n"
> +
> +	/*
> +	 * MMU is off so we need to get to bL_entry_vectors in a
> +	 * position independent way.
> +	 */
> +	adr	r5, 3f
> +	ldr	r6, [r5]
> +	add	r6, r5, r6			@ r6 = bL_entry_vectors
> +
> +bL_entry_gated:
> +	ldr	r5, [r6, r4, lsl #2]		@ r5 = CPU entry vector
> +	cmp	r5, #0
> +	wfeeq
> +	beq	bL_entry_gated
> +	pr_dbg	r4, "released\n"
> +	bx	r5
> +
> +	.align	2
> +
> +3:	.word	bL_entry_vectors - .
> +
> +ENDPROC(bL_entry_point)
> +
> +	.bss
> +	.align	5
> +
> +	.type	bL_entry_vectors, #object
> +ENTRY(bL_entry_vectors)
> +	.space	4 * BL_NR_CLUSTERS * BL_CPUS_PER_CLUSTER

Is there a particular reason to put this in the bss?

> diff --git a/arch/arm/include/asm/bL_entry.h b/arch/arm/include/asm/bL_entry.h
> new file mode 100644
> index 0000000000..ff623333a1
> --- /dev/null
> +++ b/arch/arm/include/asm/bL_entry.h
> @@ -0,0 +1,35 @@
> +/*
> + * arch/arm/include/asm/bL_entry.h
> + *
> + * Created by:  Nicolas Pitre, April 2012
> + * Copyright:   (C) 2012  Linaro Limited
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef BL_ENTRY_H
> +#define BL_ENTRY_H
> +
> +#define BL_CPUS_PER_CLUSTER	4
> +#define BL_NR_CLUSTERS		2

Hmm, I see these have to be constant so you can allocate your space in
the assembly file. In which case, I think it's worth changing their
names to have MAX or LIMIT in them... maybe they could even be CONFIG
options?

Will
Nicolas Pitre Jan. 11, 2013, 1:26 a.m. UTC | #6
On Thu, 10 Jan 2013, Will Deacon wrote:

> On Thu, Jan 10, 2013 at 12:20:36AM +0000, Nicolas Pitre wrote:
> > CPUs in big.LITTLE systems have special needs when entering the kernel
> > due to a hotplug event, or when resuming from a deep sleep mode.
> > 
> > This is vectorized so multiple CPUs can enter the kernel in parallel
> > without serialization.
> > 
> > Only the basic structure is introduced here.  This will be extended
> > later.
> > 
> > TODO: MPIDR based indexing should eventually be made runtime adjusted.
> 
> Agreed.
> 
> > diff --git a/arch/arm/common/bL_entry.c b/arch/arm/common/bL_entry.c
> > new file mode 100644
> > index 0000000000..80fff49417
> > --- /dev/null
> > +++ b/arch/arm/common/bL_entry.c
> > @@ -0,0 +1,30 @@
> > +/*
> > + * arch/arm/common/bL_entry.c -- big.LITTLE kernel re-entry point
> > + *
> > + * Created by:  Nicolas Pitre, March 2012
> > + * Copyright:   (C) 2012  Linaro Limited
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/init.h>
> > +
> > +#include <asm/bL_entry.h>
> > +#include <asm/barrier.h>
> > +#include <asm/proc-fns.h>
> > +#include <asm/cacheflush.h>
> > +
> > +extern volatile unsigned long bL_entry_vectors[BL_NR_CLUSTERS][BL_CPUS_PER_CLUSTER];
> 
> Does this actually need to be volatile? I'd have thought a compiler
> barrier in place of the smp_wmb below would be enough (following on from
> Catalin's comments).

Actually, I did the reverse i.e. I removed the smp_wmb() entirely. A 
compiler barrier forces the whole world to memory while here we only 
want this particular assignment to be pushed out.

Furthermore, I like the volatile as it flags that this is a special 
variable which in this case is also accessed from CPUs with no cache.
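
To make the trade-off concrete (illustration only; the arrays and helpers
below are made-up names, not part of the patch):

#include <linux/compiler.h>	/* barrier() */
#include <asm/bL_entry.h>

extern volatile unsigned long vec[BL_NR_CLUSTERS][BL_CPUS_PER_CLUSTER];
extern unsigned long plain_vec[BL_NR_CLUSTERS][BL_CPUS_PER_CLUSTER];

static inline void publish_volatile(unsigned cpu, unsigned cluster,
				    unsigned long val)
{
	vec[cluster][cpu] = val;	/* volatile: emitted here, nothing else forced out */
}

static inline void publish_plain(unsigned cpu, unsigned cluster,
				 unsigned long val)
{
	plain_vec[cluster][cpu] = val;
	barrier();			/* full compiler barrier: spills everything */
}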

> > +void bL_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
> > +{
> > +	unsigned long val = ptr ? virt_to_phys(ptr) : 0;
> > +	bL_entry_vectors[cluster][cpu] = val;
> > +	smp_wmb();
> > +	__cpuc_flush_dcache_area((void *)&bL_entry_vectors[cluster][cpu], 4);
> > +	outer_clean_range(__pa(&bL_entry_vectors[cluster][cpu]),
> > +			  __pa(&bL_entry_vectors[cluster][cpu + 1]));
> > +}
> > diff --git a/arch/arm/common/bL_head.S b/arch/arm/common/bL_head.S
> > new file mode 100644
> > index 0000000000..9d351f2b4c
> > --- /dev/null
> > +++ b/arch/arm/common/bL_head.S
> > @@ -0,0 +1,81 @@
> > +/*
> > + * arch/arm/common/bL_head.S -- big.LITTLE kernel re-entry point
> > + *
> > + * Created by:  Nicolas Pitre, March 2012
> > + * Copyright:   (C) 2012  Linaro Limited
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/bL_entry.h>
> > +
> > +	.macro	pr_dbg	cpu, string
> > +#if defined(CONFIG_DEBUG_LL) && defined(DEBUG)
> > +	b	1901f
> > +1902:	.ascii	"CPU 0: \0CPU 1: \0CPU 2: \0CPU 3: \0"
> > +	.ascii	"CPU 4: \0CPU 5: \0CPU 6: \0CPU 7: \0"
> > +1903:	.asciz	"\string"
> > +	.align
> > +1901:	adr	r0, 1902b
> > +	add	r0, r0, \cpu, lsl #3
> > +	bl	printascii
> > +	adr	r0, 1903b
> > +	bl	printascii
> > +#endif
> > +	.endm
> > +
> > +	.arm
> > +	.align
> > +
> > +ENTRY(bL_entry_point)
> > +
> > + THUMB(	adr	r12, BSYM(1f)	)
> > + THUMB(	bx	r12		)
> > + THUMB(	.thumb			)
> > +1:
> > +	mrc	p15, 0, r0, c0, c0, 5
> > +	ubfx	r9, r0, #0, #4			@ r9 = cpu
> > +	ubfx	r10, r0, #8, #4			@ r10 = cluster
> > +	mov	r3, #BL_CPUS_PER_CLUSTER
> > +	mla	r4, r3, r10, r9			@ r4 = canonical CPU index
> > +	cmp	r4, #(BL_CPUS_PER_CLUSTER * BL_NR_CLUSTERS)
> > +	blo	2f
> > +
> > +	/* We didn't expect this CPU.  Try to make it quiet. */
> > +1:	wfi
> > +	wfe
> > +	b	1b
> 
> I realise this CPU is stuck at this point, but you should have a dsb
> before a wfi instruction. This could be problematic with the CCI this
> early, so maybe just a comment saying that it doesn't matter because we
> don't care about this core?

Why a dsb?  No data was even touched at this point.  And since this is 
meant to be a better "b ." kind of loop, I'd rather not try to make it 
more sophisticated than it already is.  And of course it is meant to 
never be executed in practice.

> > +
> > +2:	pr_dbg	r4, "kernel bL_entry_point\n"
> > +
> > +	/*
> > +	 * MMU is off so we need to get to bL_entry_vectors in a
> > +	 * position independent way.
> > +	 */
> > +	adr	r5, 3f
> > +	ldr	r6, [r5]
> > +	add	r6, r5, r6			@ r6 = bL_entry_vectors
> > +
> > +bL_entry_gated:
> > +	ldr	r5, [r6, r4, lsl #2]		@ r5 = CPU entry vector
> > +	cmp	r5, #0
> > +	wfeeq
> > +	beq	bL_entry_gated
> > +	pr_dbg	r4, "released\n"
> > +	bx	r5
> > +
> > +	.align	2
> > +
> > +3:	.word	bL_entry_vectors - .
> > +
> > +ENDPROC(bL_entry_point)
> > +
> > +	.bss
> > +	.align	5
> > +
> > +	.type	bL_entry_vectors, #object
> > +ENTRY(bL_entry_vectors)
> > +	.space	4 * BL_NR_CLUSTERS * BL_CPUS_PER_CLUSTER
> 
> Is there a particular reason to put this in the bss?

Yes, to have it zero initialized without taking up binary space.

> > diff --git a/arch/arm/include/asm/bL_entry.h b/arch/arm/include/asm/bL_entry.h
> > new file mode 100644
> > index 0000000000..ff623333a1
> > --- /dev/null
> > +++ b/arch/arm/include/asm/bL_entry.h
> > @@ -0,0 +1,35 @@
> > +/*
> > + * arch/arm/include/asm/bL_entry.h
> > + *
> > + * Created by:  Nicolas Pitre, April 2012
> > + * Copyright:   (C) 2012  Linaro Limited
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +
> > +#ifndef BL_ENTRY_H
> > +#define BL_ENTRY_H
> > +
> > +#define BL_CPUS_PER_CLUSTER	4
> > +#define BL_NR_CLUSTERS		2
> 
> Hmm, I see these have to be constant so you can allocate your space in
> the assembly file. In which case, I think it's worth changing their
> names to have MAX or LIMIT in them...

Yes, good point.  I'll change them.
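
Probably something like this (exact names still to be decided):

#define BL_MAX_CPUS_PER_CLUSTER	4
#define BL_MAX_NR_CLUSTERS	2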

>  maybe they could even be CONFIG options?

Nah.  I prefer not adding new config options unless this is really 
necessary or useful.  For the foreseeable future, we'll see systems with 
at most 2 clusters and at most 4 CPUs per cluster.  That could easily be 
revisited later if that becomes unsuitable for some new systems.

Initially I wanted all those things to be runtime sized in relation with 
the TODO item in the commit log.  That too can come later.


Nicolas
Will Deacon Jan. 11, 2013, 10:55 a.m. UTC | #7
On Fri, Jan 11, 2013 at 01:26:21AM +0000, Nicolas Pitre wrote:
> On Thu, 10 Jan 2013, Will Deacon wrote:
> > On Thu, Jan 10, 2013 at 12:20:36AM +0000, Nicolas Pitre wrote:
> > > +
> > > +extern volatile unsigned long bL_entry_vectors[BL_NR_CLUSTERS][BL_CPUS_PER_CLUSTER];
> > 
> > Does this actually need to be volatile? I'd have thought a compiler
> > barrier in place of the smp_wmb below would be enough (following on from
> > Catalin's comments).
> 
> Actually, I did the reverse i.e. I removed the smp_wmb() entirely. A 
> compiler barrier forces the whole world to memory while here we only 
> want this particular assignment to be pushed out.
> 
> Furthermore, I like the volatile as it flags that this is a special 
> variable which in this case is also accessed from CPUs with no cache.

Ok, fair enough. Given that the smp_wmb isn't needed that sounds better.

> > > +	/* We didn't expect this CPU.  Try to make it quiet. */
> > > +1:	wfi
> > > +	wfe
> > > +	b	1b
> > 
> > I realise this CPU is stuck at this point, but you should have a dsb
> > before a wfi instruction. This could be problematic with the CCI this
> > early, so maybe just a comment saying that it doesn't matter because we
> > don't care about this core?
> 
> Why a dsb?  No data was even touched at this point.  And since this is 
> meant to be a better "b ." kind of loop, I'd rather not try to make it 
> more sophisticated than it already is.  And of course it is meant to 
> never be executed in practice.

Sure, that's why I think just mentioning that we don't ever plan to boot
this CPU is a good idea (so people don't add code here later on).

> > > diff --git a/arch/arm/include/asm/bL_entry.h b/arch/arm/include/asm/bL_entry.h
> > > new file mode 100644
> > > index 0000000000..ff623333a1
> > > --- /dev/null
> > > +++ b/arch/arm/include/asm/bL_entry.h
> > > @@ -0,0 +1,35 @@
> > > +/*
> > > + * arch/arm/include/asm/bL_entry.h
> > > + *
> > > + * Created by:  Nicolas Pitre, April 2012
> > > + * Copyright:   (C) 2012  Linaro Limited
> > > + *
> > > + * This program is free software; you can redistribute it and/or modify
> > > + * it under the terms of the GNU General Public License version 2 as
> > > + * published by the Free Software Foundation.
> > > + */
> > > +
> > > +#ifndef BL_ENTRY_H
> > > +#define BL_ENTRY_H
> > > +
> > > +#define BL_CPUS_PER_CLUSTER	4
> > > +#define BL_NR_CLUSTERS		2
> > 
> > Hmm, I see these have to be constant so you can allocate your space in
> > the assembly file. In which case, I think it's worth changing their
> > names to have MAX or LIMIT in them...
> 
> Yes, good point.  I'll change them.

Thanks.

> >  maybe they could even be CONFIG options?
> 
> Nah.  I prefer not adding new config options unless this is really 
> > necessary or useful.  For the foreseeable future, we'll see systems with 
> at most 2 clusters and at most 4 CPUs per cluster.  That could easily be 
> revisited later if that becomes unsuitable for some new systems.

The current GIC is limited to 8 CPUs, so 4x2 is also a realistic possibility.

> Initially I wanted all those things to be runtime sized in relation with 
> the TODO item in the commit log.  That too can come later.

Out of interest: how would you achieve that? I also thought about getting
this information from the device tree, but I can't see how to plug that in
with static storage.

Cheers,

Will
Dave Martin Jan. 11, 2013, 11:35 a.m. UTC | #8
On Fri, Jan 11, 2013 at 10:55:26AM +0000, Will Deacon wrote:
> On Fri, Jan 11, 2013 at 01:26:21AM +0000, Nicolas Pitre wrote:
> > On Thu, 10 Jan 2013, Will Deacon wrote:
> > > On Thu, Jan 10, 2013 at 12:20:36AM +0000, Nicolas Pitre wrote:
> > > > +
> > > > +extern volatile unsigned long bL_entry_vectors[BL_NR_CLUSTERS][BL_CPUS_PER_CLUSTER];
> > > 
> > > Does this actually need to be volatile? I'd have thought a compiler
> > > barrier in place of the smp_wmb below would be enough (following on from
> > > Catalin's comments).
> > 
> > Actually, I did the reverse i.e. I removed the smp_wmb() entirely. A 
> > compiler barrier forces the whole world to memory while here we only 
> > want this particular assignment to be pushed out.
> > 
> > Furthermore, I like the volatile as it flags that this is a special 
> > variable which in this case is also accessed from CPUs with no cache.
> 
> Ok, fair enough. Given that the smp_wmb isn't needed that sounds better.
> 
> > > > +	/* We didn't expect this CPU.  Try to make it quiet. */
> > > > +1:	wfi
> > > > +	wfe
> > > > +	b	1b
> > > 
> > > I realise this CPU is stuck at this point, but you should have a dsb
> > > before a wfi instruction. This could be problematic with the CCI this
> > > early, so maybe just a comment saying that it doesn't matter because we
> > > don't care about this core?
> > 
> > Why a dsb?  No data was even touched at this point.  And since this is 
> > meant to be a better "b ." kind of loop, I'd rather not try to make it 
> > more sophisticated than it already is.  And of course it is meant to 
> > never be executed in practice.
> 
> Sure, that's why I think just mentioning that we don't ever plan to boot
> this CPU is a good idea (so people don't add code here later on).

I agree with the conclusions here.

> > > > diff --git a/arch/arm/include/asm/bL_entry.h b/arch/arm/include/asm/bL_entry.h
> > > > new file mode 100644
> > > > index 0000000000..ff623333a1
> > > > --- /dev/null
> > > > +++ b/arch/arm/include/asm/bL_entry.h
> > > > @@ -0,0 +1,35 @@
> > > > +/*
> > > > + * arch/arm/include/asm/bL_entry.h
> > > > + *
> > > > + * Created by:  Nicolas Pitre, April 2012
> > > > + * Copyright:   (C) 2012  Linaro Limited
> > > > + *
> > > > + * This program is free software; you can redistribute it and/or modify
> > > > + * it under the terms of the GNU General Public License version 2 as
> > > > + * published by the Free Software Foundation.
> > > > + */
> > > > +
> > > > +#ifndef BL_ENTRY_H
> > > > +#define BL_ENTRY_H
> > > > +
> > > > +#define BL_CPUS_PER_CLUSTER	4
> > > > +#define BL_NR_CLUSTERS		2
> > > 
> > > Hmm, I see these have to be constant so you can allocate your space in
> > > the assembly file. In which case, I think it's worth changing their
> > > names to have MAX or LIMIT in them...
> > 
> > Yes, good point.  I'll change them.
> 
> Thanks.
> 
> > >  maybe they could even be CONFIG options?
> > 
> > Nah.  I prefer not adding new config options unless this is really 
> > necessary or useful.  For the foreseeable future, we'll see systems with 
> > at most 2 clusters and at most 4 CPUs per cluster.  That could easily be 
> > revisited later if that becomes unsuitable for some new systems.
> 
> The current GIC is limited to 8 CPUs, so 4x2 is also a realistic possibility.
> 
> > Initially I wanted all those things to be runtime sized in relation with 
> > the TODO item in the commit log.  That too can come later.
> 
> Out of interest: how would you achieve that? I also thought about getting
> this information from the device tree, but I can't see how to plug that in
> with static storage.

I think you would just have to bite the bullet and go dynamic in this
case. But it's not a lot of data in total with the current limits, so
this feels like overkill.

If we eventually need to go many-CPU with this code, it will need
addressing, but there are no current plans for that that I know of.

Cheers
---Dave
Catalin Marinas Jan. 11, 2013, 11:45 a.m. UTC | #9
On Thu, Jan 10, 2013 at 04:47:09PM +0000, Nicolas Pitre wrote:
> On Thu, 10 Jan 2013, Catalin Marinas wrote:
> > On 10 January 2013 00:20, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > > --- /dev/null
> > > +++ b/arch/arm/common/bL_entry.c
> > ...
> > > +extern volatile unsigned long bL_entry_vectors[BL_NR_CLUSTERS][BL_CPUS_PER_CLUSTER];
> > 
> > IMHO, we should keep this array linear and ignore the cluster grouping
> > at this stage. This information could be added to later patches that
> > actually need to know about the b.L topology.
> 
> That's virtually all of them.  Everything b.L related is always 
> expressed in terms of a cpu,cluster tuple at the low level.
> 
> > This would also imply that we treat the MPIDR just as an ID without 
> > digging into its bit layout.
> 
> That makes for too large an index space.  We always end up needing to 
> break the MPIDR into a cpu,cluster thing as the MPIDR bits are too 
> sparse.

You could find a way to compress this with some mask and shifts. We can
look at this later if we are to generalise this to non-b.L systems.
Lorenzo Pieralisi Jan. 11, 2013, 12:05 p.m. UTC | #10
On Fri, Jan 11, 2013 at 11:45:53AM +0000, Catalin Marinas wrote:
> On Thu, Jan 10, 2013 at 04:47:09PM +0000, Nicolas Pitre wrote:
> > On Thu, 10 Jan 2013, Catalin Marinas wrote:
> > > On 10 January 2013 00:20, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > > > --- /dev/null
> > > > +++ b/arch/arm/common/bL_entry.c
> > > ...
> > > > +extern volatile unsigned long bL_entry_vectors[BL_NR_CLUSTERS][BL_CPUS_PER_CLUSTER];
> > > 
> > > IMHO, we should keep this array linear and ignore the cluster grouping
> > > at this stage. This information could be added to later patches that
> > > actually need to know about the b.L topology.
> > 
> > That's virtually all of them.  Everything b.L related is always 
> > expressed in terms of a cpu,cluster tuple at the low level.
> > 
> > > This would also imply that we treat the MPIDR just as an ID without 
> > > digging into its bit layout.
> > 
> > That makes for too large an index space.  We always end up needing to 
> > break the MPIDR into a cpu,cluster thing as the MPIDR bits are too 
> > sparse.
> 
> You could find a way to compress this with some mask and shifts. We can
> look at this later if we are to generalise this to non-b.L systems.

The MPIDR linearization (a simple hash to convert it to a linear index) is
planned anyway since code paths like cpu_{suspend/resume} do not work for
multi-cluster systems as things stand.

Lorenzo
Dave Martin Jan. 11, 2013, 12:19 p.m. UTC | #11
On Fri, Jan 11, 2013 at 11:45:53AM +0000, Catalin Marinas wrote:
> On Thu, Jan 10, 2013 at 04:47:09PM +0000, Nicolas Pitre wrote:
> > On Thu, 10 Jan 2013, Catalin Marinas wrote:
> > > On 10 January 2013 00:20, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > > > --- /dev/null
> > > > +++ b/arch/arm/common/bL_entry.c
> > > ...
> > > > +extern volatile unsigned long bL_entry_vectors[BL_NR_CLUSTERS][BL_CPUS_PER_CLUSTER];
> > > 
> > > IMHO, we should keep this array linear and ignore the cluster grouping
> > > at this stage. This information could be added to later patches that
> > > actually need to know about the b.L topology.
> > 
> > That's virtually all of them.  Everything b.L related is always 
> > expressed in terms of a cpu,cluster tuple at the low level.
> > 
> > > This would also imply that we treat the MPIDR just as an ID without 
> > > digging into its bit layout.
> > 
> > That makes for too large an index space.  We always end up needing to 
> > break the MPIDR into a cpu,cluster thing as the MPIDR bits are too 
> > sparse.
> 
> You could find a way to compress this with some mask and shifts. We can
> look at this later if we are to generalise this to non-b.L systems.

The b.L cluster handling code has multiple instances of this issue.
We should either try to fix them all, or defer them all as being
overkill for the foreseeable future.

For current platforms, the space saved is unlikely to be larger than
the amount of code required to implement the optimisation.

I do think we need a good, generic way to map sparsely-populated,
multidimensional topological node IDs to/from a linear space, but
we should avoid reinventing that too many times.

I think Lorenzo was already potentially looking at this issue in
relation to managing cpu_logical_map.

Cheers
---Dave
Santosh Shilimkar Jan. 11, 2013, 5:16 p.m. UTC | #12
On Thursday 10 January 2013 05:50 AM, Nicolas Pitre wrote:
> CPUs in big.LITTLE systems have special needs when entering the kernel
> due to a hotplug event, or when resuming from a deep sleep mode.
>
> This is vectorized so multiple CPUs can enter the kernel in parallel
> without serialization.
>
> Only the basic structure is introduced here.  This will be extended
> later.
>
> TODO: MPIDR based indexing should eventually be made runtime adjusted.
>
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> ---

[..]

> diff --git a/arch/arm/common/Makefile b/arch/arm/common/Makefile
> index e8a4e58f1b..50880c494f 100644
> --- a/arch/arm/common/Makefile
> +++ b/arch/arm/common/Makefile
> @@ -13,3 +13,6 @@ obj-$(CONFIG_SHARP_PARAM)	+= sharpsl_param.o
>   obj-$(CONFIG_SHARP_SCOOP)	+= scoop.o
>   obj-$(CONFIG_PCI_HOST_ITE8152)  += it8152.o
>   obj-$(CONFIG_ARM_TIMER_SP804)	+= timer-sp.o
> +obj-$(CONFIG_FIQ_GLUE)		+= fiq_glue.o fiq_glue_setup.o
> +obj-$(CONFIG_FIQ_DEBUGGER)	+= fiq_debugger.o
> +obj-$(CONFIG_BIG_LITTLE)	+= bL_head.o bL_entry.o
> diff --git a/arch/arm/common/bL_entry.c b/arch/arm/common/bL_entry.c
> new file mode 100644
> index 0000000000..80fff49417
> --- /dev/null
> +++ b/arch/arm/common/bL_entry.c
> @@ -0,0 +1,30 @@
> +/*
> + * arch/arm/common/bL_entry.c -- big.LITTLE kernel re-entry point
> + *
> + * Created by:  Nicolas Pitre, March 2012
> + * Copyright:   (C) 2012  Linaro Limited
2013 now :-)
Looks like you need to update the rest of the patches as well.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/init.h>
> +
> +#include <asm/bL_entry.h>
> +#include <asm/barrier.h>
> +#include <asm/proc-fns.h>
> +#include <asm/cacheflush.h>
> +
> +extern volatile unsigned long bL_entry_vectors[BL_NR_CLUSTERS][BL_CPUS_PER_CLUSTER];
> +
> +void bL_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
> +{
> +	unsigned long val = ptr ? virt_to_phys(ptr) : 0;
> +	bL_entry_vectors[cluster][cpu] = val;
> +	smp_wmb();
> +	__cpuc_flush_dcache_area((void *)&bL_entry_vectors[cluster][cpu], 4);
> +	outer_clean_range(__pa(&bL_entry_vectors[cluster][cpu]),
> +			  __pa(&bL_entry_vectors[cluster][cpu + 1]));
> +}
I had the same question about smp_wmb() as Catalin, but after following the
rest of the comments I understand it will be removed, so that's good.

> diff --git a/arch/arm/common/bL_head.S b/arch/arm/common/bL_head.S
> new file mode 100644
> index 0000000000..9d351f2b4c
> --- /dev/null
> +++ b/arch/arm/common/bL_head.S
> @@ -0,0 +1,81 @@
> +/*
> + * arch/arm/common/bL_head.S -- big.LITTLE kernel re-entry point
> + *
> + * Created by:  Nicolas Pitre, March 2012
> + * Copyright:   (C) 2012  Linaro Limited
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/linkage.h>
> +#include <asm/bL_entry.h>
> +
> +	.macro	pr_dbg	cpu, string
> +#if defined(CONFIG_DEBUG_LL) && defined(DEBUG)
> +	b	1901f
> +1902:	.ascii	"CPU 0: \0CPU 1: \0CPU 2: \0CPU 3: \0"
> +	.ascii	"CPU 4: \0CPU 5: \0CPU 6: \0CPU 7: \0"
> +1903:	.asciz	"\string"
> +	.align
> +1901:	adr	r0, 1902b
> +	add	r0, r0, \cpu, lsl #3
> +	bl	printascii
> +	adr	r0, 1903b
> +	bl	printascii
> +#endif
> +	.endm
> +
> +	.arm
> +	.align
> +
> +ENTRY(bL_entry_point)
> +
> + THUMB(	adr	r12, BSYM(1f)	)
> + THUMB(	bx	r12		)
> + THUMB(	.thumb			)
> +1:
> +	mrc	p15, 0, r0, c0, c0, 5
> +	ubfx	r9, r0, #0, #4			@ r9 = cpu
> +	ubfx	r10, r0, #8, #4			@ r10 = cluster
> +	mov	r3, #BL_CPUS_PER_CLUSTER
> +	mla	r4, r3, r10, r9			@ r4 = canonical CPU index
> +	cmp	r4, #(BL_CPUS_PER_CLUSTER * BL_NR_CLUSTERS)
> +	blo	2f
> +
> +	/* We didn't expect this CPU.  Try to make it quiet. */
> +1:	wfi
> +	wfe

Why do you need a wfe followed by a wfi?
Just curious.

Regards
Santosh
Nicolas Pitre Jan. 11, 2013, 6:10 p.m. UTC | #13
On Fri, 11 Jan 2013, Santosh Shilimkar wrote:

> On Thursday 10 January 2013 05:50 AM, Nicolas Pitre wrote:
> > +ENTRY(bL_entry_point)
> > +
> > + THUMB(	adr	r12, BSYM(1f)	)
> > + THUMB(	bx	r12		)
> > + THUMB(	.thumb			)
> > +1:
> > +	mrc	p15, 0, r0, c0, c0, 5
> > +	ubfx	r9, r0, #0, #4			@ r9 = cpu
> > +	ubfx	r10, r0, #8, #4			@ r10 = cluster
> > +	mov	r3, #BL_CPUS_PER_CLUSTER
> > +	mla	r4, r3, r10, r9			@ r4 = canonical CPU index
> > +	cmp	r4, #(BL_CPUS_PER_CLUSTER * BL_NR_CLUSTERS)
> > +	blo	2f
> > +
> > +	/* We didn't expect this CPU.  Try to make it quiet. */
> > +1:	wfi
> > +	wfe
> 
> Why do you need a wfe followed by a wfi?
> Just curious.

If the WFI doesn't work because an interrupt is pending then the WFE 
might work better.  But as I mentioned before, this is not intended to 
be used for other purposes than "we're really screwed so at least let's 
try to cheaply quieten this CPU" case.


Nicolas
Santosh Shilimkar Jan. 11, 2013, 6:30 p.m. UTC | #14
On Friday 11 January 2013 11:40 PM, Nicolas Pitre wrote:
> On Fri, 11 Jan 2013, Santosh Shilimkar wrote:
>
>> On Thursday 10 January 2013 05:50 AM, Nicolas Pitre wrote:
>>> +ENTRY(bL_entry_point)
>>> +
>>> + THUMB(	adr	r12, BSYM(1f)	)
>>> + THUMB(	bx	r12		)
>>> + THUMB(	.thumb			)
>>> +1:
>>> +	mrc	p15, 0, r0, c0, c0, 5
>>> +	ubfx	r9, r0, #0, #4			@ r9 = cpu
>>> +	ubfx	r10, r0, #8, #4			@ r10 = cluster
>>> +	mov	r3, #BL_CPUS_PER_CLUSTER
>>> +	mla	r4, r3, r10, r9			@ r4 = canonical CPU index
>>> +	cmp	r4, #(BL_CPUS_PER_CLUSTER * BL_NR_CLUSTERS)
>>> +	blo	2f
>>> +
>>> +	/* We didn't expect this CPU.  Try to make it quiet. */
>>> +1:	wfi
>>> +	wfe
>>
>> Why do you need a wfe followed by a wfi?
>> Just curious.
>
> If the WFI doesn't work because an interrupt is pending then the WFE
> might work better.  But as I mentioned before, this is not intended to
> be used for other purposes than "we're really screwed so at least let's
> try to cheaply quieten this CPU" case.
>
Thanks for clarification.

Regards
Santosh
Pavel Machek March 7, 2013, 7:37 a.m. UTC | #15
Hi!

> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1579,6 +1579,12 @@ config HAVE_ARM_TWD
>  	help
>  	  This options enables support for the ARM timer and watchdog unit
>  
> +config BIG_LITTLE
> +	bool "big.LITTLE support (Experimental)"
> +	depends on CPU_V7 && SMP && EXPERIMENTAL
> +	help
> +	  This option enables support for the big.LITTLE architecture.
> +

Perhaps a few lines of what big.LITTLE is would be nice here?

It is that "few high-performance cores + few low-power cores" on chip, right?

BTW... is it possible/good idea to run all the cores at the same time for 
max performance? From descriptions I understood that is not normally done
and I'd like to understand why....
Nicolas Pitre March 7, 2013, 8:57 a.m. UTC | #16
On Thu, 7 Mar 2013, Pavel Machek wrote:

> Hi!
> 
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -1579,6 +1579,12 @@ config HAVE_ARM_TWD
> >  	help
> >  	  This options enables support for the ARM timer and watchdog unit
> >  
> > +config BIG_LITTLE
> > +	bool "big.LITTLE support (Experimental)"
> > +	depends on CPU_V7 && SMP && EXPERIMENTAL
> > +	help
> > +	  This option enables support for the big.LITTLE architecture.
> > +
> 
> Perhaps a few lines of what big.LITTLE is would be nice here?

I'd invite you to look at the latest series posted on the list.  This 
patch is obsolete.

> It is that "few high-performance cores + few low-power cores" on chip, right?

Right.

> BTW... is it possible/good idea to run all the cores at the same time for 
> max performance? From descriptions I understood that is not normally done
> and I'd like to understand why....

It can be done.  But then the scheduler needs to be smarter about which 
task is put on which core.


Nicolas

Patch

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index f95ba14ae3..2271f02e8e 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1579,6 +1579,12 @@  config HAVE_ARM_TWD
 	help
 	  This options enables support for the ARM timer and watchdog unit
 
+config BIG_LITTLE
+	bool "big.LITTLE support (Experimental)"
+	depends on CPU_V7 && SMP && EXPERIMENTAL
+	help
+	  This option enables support for the big.LITTLE architecture.
+
 choice
 	prompt "Memory split"
 	default VMSPLIT_3G
diff --git a/arch/arm/common/Makefile b/arch/arm/common/Makefile
index e8a4e58f1b..50880c494f 100644
--- a/arch/arm/common/Makefile
+++ b/arch/arm/common/Makefile
@@ -13,3 +13,6 @@  obj-$(CONFIG_SHARP_PARAM)	+= sharpsl_param.o
 obj-$(CONFIG_SHARP_SCOOP)	+= scoop.o
 obj-$(CONFIG_PCI_HOST_ITE8152)  += it8152.o
 obj-$(CONFIG_ARM_TIMER_SP804)	+= timer-sp.o
+obj-$(CONFIG_FIQ_GLUE)		+= fiq_glue.o fiq_glue_setup.o
+obj-$(CONFIG_FIQ_DEBUGGER)	+= fiq_debugger.o
+obj-$(CONFIG_BIG_LITTLE)	+= bL_head.o bL_entry.o
diff --git a/arch/arm/common/bL_entry.c b/arch/arm/common/bL_entry.c
new file mode 100644
index 0000000000..80fff49417
--- /dev/null
+++ b/arch/arm/common/bL_entry.c
@@ -0,0 +1,30 @@ 
+/*
+ * arch/arm/common/bL_entry.c -- big.LITTLE kernel re-entry point
+ *
+ * Created by:  Nicolas Pitre, March 2012
+ * Copyright:   (C) 2012  Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+
+#include <asm/bL_entry.h>
+#include <asm/barrier.h>
+#include <asm/proc-fns.h>
+#include <asm/cacheflush.h>
+
+extern volatile unsigned long bL_entry_vectors[BL_NR_CLUSTERS][BL_CPUS_PER_CLUSTER];
+
+void bL_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
+{
+	unsigned long val = ptr ? virt_to_phys(ptr) : 0;
+	bL_entry_vectors[cluster][cpu] = val;
+	smp_wmb();
+	__cpuc_flush_dcache_area((void *)&bL_entry_vectors[cluster][cpu], 4);
+	outer_clean_range(__pa(&bL_entry_vectors[cluster][cpu]),
+			  __pa(&bL_entry_vectors[cluster][cpu + 1]));
+}
diff --git a/arch/arm/common/bL_head.S b/arch/arm/common/bL_head.S
new file mode 100644
index 0000000000..9d351f2b4c
--- /dev/null
+++ b/arch/arm/common/bL_head.S
@@ -0,0 +1,81 @@ 
+/*
+ * arch/arm/common/bL_head.S -- big.LITTLE kernel re-entry point
+ *
+ * Created by:  Nicolas Pitre, March 2012
+ * Copyright:   (C) 2012  Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/bL_entry.h>
+
+	.macro	pr_dbg	cpu, string
+#if defined(CONFIG_DEBUG_LL) && defined(DEBUG)
+	b	1901f
+1902:	.ascii	"CPU 0: \0CPU 1: \0CPU 2: \0CPU 3: \0"
+	.ascii	"CPU 4: \0CPU 5: \0CPU 6: \0CPU 7: \0"
+1903:	.asciz	"\string"
+	.align
+1901:	adr	r0, 1902b
+	add	r0, r0, \cpu, lsl #3
+	bl	printascii
+	adr	r0, 1903b
+	bl	printascii
+#endif
+	.endm
+
+	.arm
+	.align
+
+ENTRY(bL_entry_point)
+
+ THUMB(	adr	r12, BSYM(1f)	)
+ THUMB(	bx	r12		)
+ THUMB(	.thumb			)
+1:
+	mrc	p15, 0, r0, c0, c0, 5
+	ubfx	r9, r0, #0, #4			@ r9 = cpu
+	ubfx	r10, r0, #8, #4			@ r10 = cluster
+	mov	r3, #BL_CPUS_PER_CLUSTER
+	mla	r4, r3, r10, r9			@ r4 = canonical CPU index
+	cmp	r4, #(BL_CPUS_PER_CLUSTER * BL_NR_CLUSTERS)
+	blo	2f
+
+	/* We didn't expect this CPU.  Try to make it quiet. */
+1:	wfi
+	wfe
+	b	1b
+
+2:	pr_dbg	r4, "kernel bL_entry_point\n"
+
+	/*
+	 * MMU is off so we need to get to bL_entry_vectors in a
+	 * position independent way.
+	 */
+	adr	r5, 3f
+	ldr	r6, [r5]
+	add	r6, r5, r6			@ r6 = bL_entry_vectors
+
+bL_entry_gated:
+	ldr	r5, [r6, r4, lsl #2]		@ r5 = CPU entry vector
+	cmp	r5, #0
+	wfeeq
+	beq	bL_entry_gated
+	pr_dbg	r4, "released\n"
+	bx	r5
+
+	.align	2
+
+3:	.word	bL_entry_vectors - .
+
+ENDPROC(bL_entry_point)
+
+	.bss
+	.align	5
+
+	.type	bL_entry_vectors, #object
+ENTRY(bL_entry_vectors)
+	.space	4 * BL_NR_CLUSTERS * BL_CPUS_PER_CLUSTER
diff --git a/arch/arm/include/asm/bL_entry.h b/arch/arm/include/asm/bL_entry.h
new file mode 100644
index 0000000000..ff623333a1
--- /dev/null
+++ b/arch/arm/include/asm/bL_entry.h
@@ -0,0 +1,35 @@ 
+/*
+ * arch/arm/include/asm/bL_entry.h
+ *
+ * Created by:  Nicolas Pitre, April 2012
+ * Copyright:   (C) 2012  Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef BL_ENTRY_H
+#define BL_ENTRY_H
+
+#define BL_CPUS_PER_CLUSTER	4
+#define BL_NR_CLUSTERS		2
+
+#ifndef __ASSEMBLY__
+
+/*
+ * Platform specific code should use this symbol to set up secondary
+ * entry location for processors to use when released from reset.
+ */
+extern void bL_entry_point(void);
+
+/*
+ * This is used to indicate where the given CPU from given cluster should
+ * branch once it is ready to re-enter the kernel using ptr, or NULL if it
+ * should be gated.  A gated CPU is held in a WFE loop until its vector
+ * becomes non NULL.
+ */
+void bL_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr);
+
+#endif /* ! __ASSEMBLY__ */
+#endif
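
For context, here is a minimal sketch (not part of this patch) of how platform
SMP code might hand a secondary CPU over to this entry path.  The function
name and the power-up step are placeholders:

#include <asm/bL_entry.h>

static int my_platform_boot_secondary(unsigned int cpu, unsigned int cluster)
{
	/* Have the CPU jump to the common gate once it leaves reset. */
	bL_set_entry_vector(cpu, cluster, bL_entry_point);

	/* Platform-specific: power up the CPU or release it from reset here. */

	return 0;
}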