[0/3] ARM: mvebu: disable I/O coherency on !SMP

Message ID	20140702222817.1262a7b0@free-electrons.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org> Date: Wed, 2 Jul 2014 22:28:17 +0200 From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> To: Russell King - ARM Linux <linux@arm.linux.org.uk> Subject: Re: [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP Message-ID: <20140702222817.1262a7b0@free-electrons.com> In-Reply-To: <20140702175857.GR32514@n2100.arm.linux.org.uk> References: <1404318070-8503-1-git-send-email-thomas.petazzoni@free-electrons.com> <20140702164147.GQ32514@n2100.arm.linux.org.uk> <20140702191832.5e9f35fc@free-electrons.com> <20140702175857.GR32514@n2100.arm.linux.org.uk> Organization: Free Electrons MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="MP_/rluKPiJABAB_vPaJa8uJngr" Cc: Lior Amsalem <alior@marvell.com>, Andrew Lunn <andrew@lunn.ch>, Jason Cooper <jason@lakedaemon.net>, Tawfik Bayouk <tawfik@marvell.com>, Simon Guinot <simon.guinot@sequanux.org>, Christophe Vu-Brugier <cvubrugier@lacie.com>, Nadav Haklai <nadavh@marvell.com>, Ezequiel Garcia <ezequiel.garcia@free-electrons.com>, Gregory Clement <gregory.clement@free-electrons.com>, linux-arm-kernel@lists.infradead.org, Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Precedence: list Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Message ID

20140702222817.1262a7b0@free-electrons.com (mailing list archive)

State

New, archived

Headers

Date: Wed, 2 Jul 2014 22:28:17 +0200
From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
To: Russell King - ARM Linux <linux@arm.linux.org.uk>
Subject: Re: [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
Message-ID: <20140702222817.1262a7b0@free-electrons.com>
In-Reply-To: <20140702175857.GR32514@n2100.arm.linux.org.uk>
References: <1404318070-8503-1-git-send-email-thomas.petazzoni@free-electrons.com>
	<20140702164147.GQ32514@n2100.arm.linux.org.uk>
	<20140702191832.5e9f35fc@free-electrons.com>
	<20140702175857.GR32514@n2100.arm.linux.org.uk>
Organization: Free Electrons
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="MP_/rluKPiJABAB_vPaJa8uJngr"
Cc: Lior Amsalem <alior@marvell.com>, Andrew Lunn <andrew@lunn.ch>,
	Jason Cooper <jason@lakedaemon.net>, Tawfik Bayouk <tawfik@marvell.com>, 
	Simon Guinot <simon.guinot@sequanux.org>,
	Christophe Vu-Brugier <cvubrugier@lacie.com>,
	Nadav Haklai <nadavh@marvell.com>,
	Ezequiel Garcia <ezequiel.garcia@free-electrons.com>,
	Gregory Clement <gregory.clement@free-electrons.com>,
	linux-arm-kernel@lists.infradead.org,
	Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Precedence: list
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Commit Message

Thomas Petazzoni July 2, 2014, 8:28 p.m. UTC

Russell,

On Wed, 2 Jul 2014 18:58:57 +0100, Russell King - ARM Linux wrote:

> You say that, but Dove is PJ4B too (I forget whether it's PJ4B or not.)
> The only way to be sure is to compare the ID numbers.  The AP510 in
> Dove I have here has an ID register value of 0x560f5815.  This ties up
> with the PJ4B entry in proc-v7.S:
> 
>         .type   __v7_pj4b_proc_info, #object
> __v7_pj4b_proc_info:
>         .long   0x560f5800
>         .long   0xff0fff00
>         __v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions
>         .size   __v7_pj4b_proc_info, . - __v7_pj4b_proc_info
> 
> How does that compare to the ID you see on Armada 370 and XP?  (I've
> no idea what IDs you get there...)  If the ID is different, then we
> should be able to solve this pretty easily for these CPUs by creating
> a new proc_info record as I think I have previously suggested.

The ID for Armada 370 is 0x560f5811, but the last digit is the "Product
Revision Number", so I'm not sure it can be used to distinguish Dove
from Armada 370. However, I believe we could simply run Dove in a
Write-Allocate cache policy.

> With a separate record, we can set the S bit if we need to, and we can
> also set the write-allocate mode too, now that the patch I sent you
> is in mainline (which is something I definitely indicated along with
> that patch.)
> 
> I've not really said anything new in this email which I haven't said
> before, with the exception of stating what the ID is for the Dove SoC
> I have.  That's the only ID I know, and I assume that those working on
> mvebu have a better idea what ID numbers are found across all the
> families.

Russell, please again what I've asked numerous times. I already wrote
the patches that creates specific proc_info structures for Armada 370
and Armada XP to set the cache policy and S bit, as you suggested. But
this is only setting the PMD flags, but doesn't touch the TTB flags,
which remain only defined by ALT_UP/ALT_SMP conditional. And I remember
Catalin saying that having PMD flags varying from the TTB flags could
be a problem.

That is the question I'm asking to you since several e-mails that you
never answered, and that is completely unrelated to having separate
proc_info structure, because the TTB flags are *not* defined in the
proc_info structure.

Sorry to be a bit harsh here but you seem to assume that I'm an idiot
who didn't read what you said, and you simply repeat again and again
that your patch already solves the problem by allowing the proc_info
structure to define the PMD flags. But you never answered my question
about the TTB flags, which is the question I've been asking since
several e-mails already.

See the two attached patches for what I've already done for Armada 370
and Armada XP using the proc_info structures. Writing those patches
lead me to the problem of the TTB flags, which remains unanswered.

Thanks for your help!

Thomas

Comments

Russell King - ARM Linux July 2, 2014, 9:33 p.m. UTC | #1

On Wed, Jul 02, 2014 at 10:28:17PM +0200, Thomas Petazzoni wrote:
> Russell,
> 
> On Wed, 2 Jul 2014 18:58:57 +0100, Russell King - ARM Linux wrote:
> 
> > You say that, but Dove is PJ4B too (I forget whether it's PJ4B or not.)
> > The only way to be sure is to compare the ID numbers.  The AP510 in
> > Dove I have here has an ID register value of 0x560f5815.  This ties up
> > with the PJ4B entry in proc-v7.S:
> > 
> >         .type   __v7_pj4b_proc_info, #object
> > __v7_pj4b_proc_info:
> >         .long   0x560f5800
> >         .long   0xff0fff00
> >         __v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions
> >         .size   __v7_pj4b_proc_info, . - __v7_pj4b_proc_info
> > 
> > How does that compare to the ID you see on Armada 370 and XP?  (I've
> > no idea what IDs you get there...)  If the ID is different, then we
> > should be able to solve this pretty easily for these CPUs by creating
> > a new proc_info record as I think I have previously suggested.
> 
> The ID for Armada 370 is 0x560f5811, but the last digit is the "Product
> Revision Number", so I'm not sure it can be used to distinguish Dove
> from Armada 370. However, I believe we could simply run Dove in a
> Write-Allocate cache policy.
> 
> > With a separate record, we can set the S bit if we need to, and we can
> > also set the write-allocate mode too, now that the patch I sent you
> > is in mainline (which is something I definitely indicated along with
> > that patch.)
> > 
> > I've not really said anything new in this email which I haven't said
> > before, with the exception of stating what the ID is for the Dove SoC
> > I have.  That's the only ID I know, and I assume that those working on
> > mvebu have a better idea what ID numbers are found across all the
> > families.
> 
> Russell, please again what I've asked numerous times. I already wrote
> the patches that creates specific proc_info structures for Armada 370
> and Armada XP to set the cache policy and S bit, as you suggested. But
> this is only setting the PMD flags, but doesn't touch the TTB flags,
> which remain only defined by ALT_UP/ALT_SMP conditional. And I remember
> Catalin saying that having PMD flags varying from the TTB flags could
> be a problem.

That's where having a separate processor struct comes in (which we
already have for the pj4b) and having a separate switch_mm which knows
to set the TTB flags to write allocate.

> Sorry to be a bit harsh here but you seem to assume that I'm an idiot
> who didn't read what you said, and you simply repeat again and again
> that your patch already solves the problem by allowing the proc_info
> structure to define the PMD flags. But you never answered my question
> about the TTB flags, which is the question I've been asking since
> several e-mails already.

You're assuming that I remember all the details each time... I know
that you have sent me several times all the details about what each
Armada device requires, and each time I've briefly read it and thought
"yes, I must get to that" and then never have done.  That's why we're
not making progress - I haven't had the time to read and digest your
messages properly.  For example, if you asked me right now what
Armada XP requires, whether it's SMP or not, I really couldn't tell
you.

> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
> index 6e1b2da..310834e 100644
> --- a/arch/arm/mm/proc-v7.S
> +++ b/arch/arm/mm/proc-v7.S
> @@ -483,14 +483,14 @@ __v7_ca9mp_proc_info:
>  __v7_pj4b_proc_info:
>  	.long	0x560f5810
>  	.long	0xff0ffff0
> -	__v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions
> +	__v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions, mm_mmuflags = PMD_SECT_WBWA
>  	.size	__v7_pj4b_proc_info, . - __v7_pj4b_proc_info
>  
>  	.type   __v7_pj4b_mp_proc_info, #object
>  __v7_pj4b_mp_proc_info:
>  	.long	0x560f5840
>  	.long	0xff0ffff0
> -	__v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions
> +	__v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions, mm_mmuflags = PMD_SECT_WBWA|PMD_SECT_S
>  	.size	__v7_pj4b_mp_proc_info, . - __v7_pj4b_mp_proc_info
>  #endif

Right, so you're using pj4b_processor_functions here.

#ifdef CONFIG_CPU_PJ4B
        globl_equ       cpu_pj4b_switch_mm,     cpu_v7_switch_mm
        globl_equ       cpu_pj4b_set_pte_ext,   cpu_v7_set_pte_ext
        globl_equ       cpu_pj4b_proc_init,     cpu_v7_proc_init

So we need to delete the first (switch_mm) so we can replace it.  I
think we may need a second set of processor functions to deal with
the MP version as well.  Most of these would need to be aliased
to the cpu_pj4b versions, except for the switch_mm one too.

You may wish to consider changing switch_mm to look something like the
following.  I'm assuming that for pj4b you just need WBWA for both
the inner and the outer, but not the shared bit - you'll need to change
that if it needs something else:

ENTRY(cpu_pj4b_switch_mm)
	orr	r0, r0, #TTB_IRGN_WBWA | TTB_RGN_OC_WBWA
	b	1f
ENTRY(cpu_pj4b_mp_switch_mm)
	orr	r0, r0, #TTB_FLAGS_SMP
	b	1f

ENTRY(cpu_v7_switch_mm)
#ifdef CONFIG_MMU
        ALT_SMP(orr     r0, r0, #TTB_FLAGS_SMP)
        ALT_UP(orr      r0, r0, #TTB_FLAGS_UP)
1:      mmid    r1, r1                          @ get mm->context.id
        mov     r2, #0
#ifdef CONFIG_ARM_ERRATA_430973
        mcr     p15, 0, r2, c7, c5, 6           @ flush BTAC/BTB
#endif
...
ENDPROC(cpu_v7_switch_mm)
ENDPROC(cpu_pj4b_switch_mm)
ENDPROC(cpu_pj4b_mp_switch_mm)

That gets rid of the one in the switch_mm path.  The setup
function is much more icky.

I think __v7_setup needs splitting up - especially due to all the
errata in there.  I think we ought to add a whole load of processor
proc_info's - for ARM Cortex-A8, Cortex-A9, Cortex-A15 etc.  Each
one calls a separate setup function which runs only the appropriate
errata fixups for that core (which eliminates a load of testing in
__v7_setup).

This also means we can split the v7_ttb_setup macro and change it
to be per CPU.  (I'm not entirely convinced right now that TTB1
isn't initialised with an undefined value - I can't see where r8
is setup amongst the mess that this has turned into.

I'm not sure what this means for the 3level v7 code - that's something
I've never looked at except a brief glance when it went in.  The
v7_ttb_setup there looks rather complicated and might make the changes
I outlined above much harder.  That also isn't something I can play
around with because I've never been able to get Linux running on the
Versatile Express with the CPU card that would support that code.  We
may need to enlist Catalin's help to modify that code.

That's about the best guidance I can give this evening, and I hope
it helps you.

Thomas Petazzoni July 2, 2014, 9:50 p.m. UTC | #2

Russell,

On Wed, 2 Jul 2014 22:33:40 +0100, Russell King - ARM Linux wrote:

> That's where having a separate processor struct comes in (which we
> already have for the pj4b) and having a separate switch_mm which knows
> to set the TTB flags to write allocate.

Ah, okay.

> You're assuming that I remember all the details each time... I know
> that you have sent me several times all the details about what each
> Armada device requires, and each time I've briefly read it and thought
> "yes, I must get to that" and then never have done.  That's why we're
> not making progress - I haven't had the time to read and digest your
> messages properly.  For example, if you asked me right now what
> Armada XP requires, whether it's SMP or not, I really couldn't tell
> you.

That's surely something I understand perfectly. It took me a while to
get in my head all the requirements of the various SoCs, so I certainly
understand that someone not working on a daily with only those specific
SoCs can easily lose track of what each SoC requires.

Maybe I should write up a summary document/web-page of the requirements
of each SoC, with their identifier values, all progressively add all
the relevant details as the discussion progresses.

> Right, so you're using pj4b_processor_functions here.
> 
> #ifdef CONFIG_CPU_PJ4B
>         globl_equ       cpu_pj4b_switch_mm,     cpu_v7_switch_mm
>         globl_equ       cpu_pj4b_set_pte_ext,   cpu_v7_set_pte_ext
>         globl_equ       cpu_pj4b_proc_init,     cpu_v7_proc_init
> 
> So we need to delete the first (switch_mm) so we can replace it.  I
> think we may need a second set of processor functions to deal with
> the MP version as well.  Most of these would need to be aliased
> to the cpu_pj4b versions, except for the switch_mm one too.
> 
> You may wish to consider changing switch_mm to look something like the
> following.  I'm assuming that for pj4b you just need WBWA for both
> the inner and the outer, but not the shared bit - you'll need to change
> that if it needs something else:
> 
> ENTRY(cpu_pj4b_switch_mm)
> 	orr	r0, r0, #TTB_IRGN_WBWA | TTB_RGN_OC_WBWA
> 	b	1f
> ENTRY(cpu_pj4b_mp_switch_mm)
> 	orr	r0, r0, #TTB_FLAGS_SMP
> 	b	1f
> 	
> ENTRY(cpu_v7_switch_mm)
> #ifdef CONFIG_MMU
>         ALT_SMP(orr     r0, r0, #TTB_FLAGS_SMP)
>         ALT_UP(orr      r0, r0, #TTB_FLAGS_UP)
> 1:      mmid    r1, r1                          @ get mm->context.id
>         mov     r2, #0
> #ifdef CONFIG_ARM_ERRATA_430973
>         mcr     p15, 0, r2, c7, c5, 6           @ flush BTAC/BTB
> #endif
> ...
> ENDPROC(cpu_v7_switch_mm)
> ENDPROC(cpu_pj4b_switch_mm)
> ENDPROC(cpu_pj4b_mp_switch_mm)
> 
> That gets rid of the one in the switch_mm path.  The setup
> function is much more icky.
> 
> I think __v7_setup needs splitting up - especially due to all the
> errata in there.  I think we ought to add a whole load of processor
> proc_info's - for ARM Cortex-A8, Cortex-A9, Cortex-A15 etc.  Each
> one calls a separate setup function which runs only the appropriate
> errata fixups for that core (which eliminates a load of testing in
> __v7_setup).
> 
> This also means we can split the v7_ttb_setup macro and change it
> to be per CPU.  (I'm not entirely convinced right now that TTB1
> isn't initialised with an undefined value - I can't see where r8
> is setup amongst the mess that this has turned into.
> 
> I'm not sure what this means for the 3level v7 code - that's something
> I've never looked at except a brief glance when it went in.  The
> v7_ttb_setup there looks rather complicated and might make the changes
> I outlined above much harder.  That also isn't something I can play
> around with because I've never been able to get Linux running on the
> Versatile Express with the CPU card that would support that code.  We
> may need to enlist Catalin's help to modify that code.
> 
> That's about the best guidance I can give this evening, and I hope
> it helps you.

Wow, thanks a lot for all those details. It'll also take me a little
while to digest these informations, and see what I can implement based
on that.

Thanks again!

Thomas

From 9fb191627dbaa358e6356eb78dfd44847e964ed1 Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Date: Fri, 13 Jun 2014 16:06:17 +0200
Subject: [PATCH 2/2] ARM: mm: set appropriate mm_mmuflags for PJ4B variants

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/mm/proc-v7.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 6e1b2da..310834e 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -483,14 +483,14 @@  __v7_ca9mp_proc_info:
 __v7_pj4b_proc_info:
 	.long	0x560f5810
 	.long	0xff0ffff0
-	__v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions
+	__v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions, mm_mmuflags = PMD_SECT_WBWA
 	.size	__v7_pj4b_proc_info, . - __v7_pj4b_proc_info
 
 	.type   __v7_pj4b_mp_proc_info, #object
 __v7_pj4b_mp_proc_info:
 	.long	0x560f5840
 	.long	0xff0ffff0
-	__v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions
+	__v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions, mm_mmuflags = PMD_SECT_WBWA|PMD_SECT_S
 	.size	__v7_pj4b_mp_proc_info, . - __v7_pj4b_mp_proc_info
 #endif
 
-- 
2.0.0

[0/3] ARM: mvebu: disable I/O coherency on !SMP

Commit Message

Comments

Patch