diff mbox series

[PATCH/RFC] ARM: dts: marzen: Add FLASH node

Message ID 07cf5e2b466f3ba217403afc66a8246460609e09.1679330105.git.geert+renesas@glider.be (mailing list archive)
State New, archived
Headers show
Series [PATCH/RFC] ARM: dts: marzen: Add FLASH node | expand

Commit Message

Geert Uytterhoeven March 20, 2023, 4:43 p.m. UTC
Add a device node for the Spansion S29GL512N NOR FLASH on the Marzen
development board.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
Although the S29GL512N is a CFI FLASH, using "cfi-flash" instead of
"mtd-rom" does not work:
  1. Probing fails with "physmap-flash 0.flash: map_probe failed",
  2. The kernel crashes later in a spectacular way, cfr. the logs below.

U-Boot flinfo says:

    Bank # 1: CFI conformant flash (16 x 16)  Size: 64 MB in 512 Sectors
      AMD Standard command set, Manufacturer ID: 0x01, Device ID: 0x227E
      Erase timeout: 4096 ms, write timeout: 1 ms
      Buffer write timeout: 3 ms, buffer size: 64 bytes
      Sector Start Addresses:
      00000000   RO   00020000        00040000   RO   00060000   RO   00080000
      000A0000        000C0000        000E0000        00100000        00120000
      00140000        00160000        00180000        001A0000        001C0000
      ...
      03FC0000        03FE0000

    Bank # 2: missing or unknown FLASH type

Does anyone have a clue? Using "mtd-rom", I can at least read the FLASH
under Linux.

Thanks in advance!

Crash logs:

    WARNING: bad unlock balance detected!
    6.3.0-rc1-marzen-02372-g053d8eb1df8a-dirty #29 Not tainted
    -------------------------------------
    rcS/66 is trying to release lock ((null)) at:
    [<c011f038>] copy_process+0x1120/0x193c
    but there are no more locks to release!

    other info that might help us debug this:
    1 lock held by rcS/66:
     #0: c28008d8 (&mm->mmap_lock){++++}-{3:3}, at: copy_process+0xcc4/0x193c

    stack backtrace:
    CPU: 0 PID: 66 Comm: rcS Not tainted 6.3.0-rc1-marzen-02372-g053d8eb1df8a-dirty #29
    Hardware name: Generic R8A7779 (Flattened Device Tree)
     unwind_backtrace from show_stack+0x10/0x14
     show_stack from dump_stack_lvl+0x68/0x90
     dump_stack_lvl from lock_release+0x168/0x328
     lock_release from up_write+0x20/0x244
     up_write from copy_process+0x1120/0x193c
     copy_process from kernel_clone+0xa0/0x2e4
     kernel_clone from sys_clone+0x6c/0x94
     sys_clone from ret_fast_syscall+0x0/0x1c
    Exception stack(0xf0a2dfa8 to 0xf0a2dff0)
    dfa0:                   00200068 bed21738 01200011 00000000 00000000 00000000
    dfc0: 00200068 bed21738 00000042 00000078 00000000 00200000 0020123c bed2176c
    dfe0: 002004c0 bed21738 00000000 00119664
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 66 at kernel/locking/rwsem.c:1364 up_write+0x98/0x244
    DEBUG_RWSEMS_WARN_ON(sem->magic != sem): count = 0x0, magic = 0x0, owner = 0x0, curr 0xc27b04c0, list not empty
    Modules linked in:
    CPU: 0 PID: 66 Comm: rcS Not tainted 6.3.0-rc1-marzen-02372-g053d8eb1df8a-dirty #29
    Hardware name: Generic R8A7779 (Flattened Device Tree)
     unwind_backtrace from show_stack+0x10/0x14
     show_stack from dump_stack_lvl+0x68/0x90
     dump_stack_lvl from __warn+0x7c/0x1c0
     __warn from warn_slowpath_fmt+0xec/0x138
     warn_slowpath_fmt from up_write+0x98/0x244
     up_write from copy_process+0x1120/0x193c
     copy_process from kernel_clone+0xa0/0x2e4
     kernel_clone from sys_clone+0x6c/0x94
     sys_clone from ret_fast_syscall+0x0/0x1c
    Exception stack(0xf0a2dfa8 to 0xf0a2dff0)
    dfa0:                   00200068 bed21738 01200011 00000000 00000000 00000000
    dfc0: 00200068 bed21738 00000042 00000078 00000000 00200000 0020123c bed2176c
    dfe0: 002004c0 bed21738 00000000 00119664
    irq event stamp: 1863
    hardirqs last  enabled at (1863): [<c026c664>] kmem_cache_free+0x130/0x158
    hardirqs last disabled at (1862): [<c026c614>] kmem_cache_free+0xe0/0x158
    softirqs last  enabled at (1624): [<c0101474>] __do_softirq+0x178/0x3e8
    softirqs last disabled at (1611): [<c01276f0>] __irq_exit_rcu+0x110/0x168
    ---[ end trace 0000000000000000 ]---

  or:

    WARNING: bad unlock balance detected!
    6.3.0-rc1-marzen-02372-g053d8eb1df8a-dirty #27 Not tainted
    -------------------------------------
    systemd/1 is trying to release lock (
    8<--- cut here ---
    Unable to handle kernel paging request at virtual address 4e22202c when read
    [4e22202c] *pgd=00000000
    Internal error: Oops: 5 [#1] SMP ARM
    Modules linked in:
    CPU: 2 PID: 1 Comm: systemd Not tainted 6.3.0-rc1-marzen-02372-g053d8eb1df8a-dirty #27
    Hardware name: Generic R8A7779 (Flattened Device Tree)
    PC is at string_nocheck+0x44/0x64
    LR is at 0xffffffff
    pc : [<c080ac48>]    lr : [<ffffffff>]    psr: 000f0093
    sp : f0829a28  ip : f0829af4  fp : c0b13875
    r10: 00000008  r9 : f0829a7c  r8 : f0829aec
    r7 : f0829af4  r6 : f0829b7c  r5 : 4e22202c  r4 : f0829af4
    r3 : ffffff04  r2 : 4e22202c  r1 : 00000000  r0 : f0829aee
    Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
    Control: 10c5387d  Table: 6332004a  DAC: 00000051
    Register r0 information: 2-page vmalloc region starting at 0xf0828000 allocated at kernel_clone+0xa0/0x2e4
    Register r1 information: NULL pointer
    Register r2 information: non-paged memory
    Register r3 information: non-paged memory
    Register r4 information: 2-page vmalloc region starting at 0xf0828000 allocated at kernel_clone+0xa0/0x2e4
    Register r5 information: non-paged memory
    Register r6 information: 2-page vmalloc region starting at 0xf0828000 allocated at kernel_clone+0xa0/0x2e4
    Register r7 information: 2-page vmalloc region starting at 0xf0828000 allocated at kernel_clone+0xa0/0x2e4
    Register r8 information: 2-page vmalloc region starting at 0xf0828000 allocated at kernel_clone+0xa0/0x2e4
    Register r9 information: 2-page vmalloc region starting at 0xf0828000 allocated at kernel_clone+0xa0/0x2e4
    Register r10 information: non-paged memory
    Register r11 information: non-slab/vmalloc memory
    Register r12 information: 2-page vmalloc region starting at 0xf0828000 allocated at kernel_clone+0xa0/0x2e4
    Process systemd (pid: 1, stack limit = 0x(ptrval))
    Stack: (0xf0829a28 to 0xf082a000)
    9a20:                   f0829aee f0829af4 4e22202c c080c990 c18c0918 ffffff04
    [...]
     string_nocheck from string+0x54/0x64
     string from vsnprintf+0x220/0x36c
     vsnprintf from vprintk_store+0x130/0x3b4
     vprintk_store from vprintk_emit+0xa8/0x23c
     vprintk_emit from vprintk_default+0x1c/0x24
     vprintk_default from _printk+0x28/0x58
     _printk from print_lockdep_cache+0x3c/0x68
     print_lockdep_cache from print_unlock_imbalance_bug+0x6c/0xe0
     print_unlock_imbalance_bug from lock_release+0x168/0x328
     lock_release from up_write+0x20/0x244
     up_write from copy_process+0x1120/0x193c
     copy_process from kernel_clone+0xa0/0x2e4
     kernel_clone from sys_clone+0x6c/0x94
     sys_clone from ret_fast_syscall+0x0/0x1c
    Exception stack(0xf0829fa8 to 0xf0829ff0)
    9fa0:                   b64f47b8 b64f4c10 01200011 00000000 00000000 00000000
    9fc0: b64f47b8 b64f4c10 00000000 00000078 00000020 00000000 b64f4750 00000001
    9fe0: 00000078 be953908 b6ea5253 b6e47746
    Code: e28dd00c e49de004 e28dd008 e12fff1e (e7d23001)
    ---[ end trace 0000000000000000 ]---

---
 arch/arm/boot/dts/r8a7779-marzen.dts | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

Comments

Tudor Ambarus March 20, 2023, 5:04 p.m. UTC | #1
Hi, Geert!

Vignesh used to review CFI code, maybe he can intervene. I've never
worked with CFI, but I can try to help. I'll need more debug data though.

On 3/20/23 16:43, Geert Uytterhoeven wrote:
> Add a device node for the Spansion S29GL512N NOR FLASH on the Marzen
> development board.
> 
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> ---
> Although the S29GL512N is a CFI FLASH, using "cfi-flash" instead of
> "mtd-rom" does not work:
>   1. Probing fails with "physmap-flash 0.flash: map_probe failed",

I would first try to understand why the probe fails.

>   2. The kernel crashes later in a spectacular way, cfr. the logs below.
> 
> U-Boot flinfo says:
> 
>     Bank # 1: CFI conformant flash (16 x 16)  Size: 64 MB in 512 Sectors
>       AMD Standard command set, Manufacturer ID: 0x01, Device ID: 0x227E
>       Erase timeout: 4096 ms, write timeout: 1 ms
>       Buffer write timeout: 3 ms, buffer size: 64 bytes
>       Sector Start Addresses:
>       00000000   RO   00020000        00040000   RO   00060000   RO   00080000
>       000A0000        000C0000        000E0000        00100000        00120000
>       00140000        00160000        00180000        001A0000        001C0000
>       ...
>       03FC0000        03FE0000
> 
>     Bank # 2: missing or unknown FLASH type
> 

Do you use "cfi-flash" compatible in u-boot and it works just fine? If
yes, I would try to understand what are the differences between the
probe paths from u-boot and linux. I know it doesn't help much, but we
should start from somewhere if we don't have any other feedback.

Cheers,
ta
Geert Uytterhoeven March 20, 2023, 6:57 p.m. UTC | #2
Hi Tudor,

On Mon, Mar 20, 2023 at 6:04 PM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> Vignesh used to review CFI code, maybe he can intervene. I've never
> worked with CFI, but I can try to help. I'll need more debug data though.
>
> On 3/20/23 16:43, Geert Uytterhoeven wrote:
> > Add a device node for the Spansion S29GL512N NOR FLASH on the Marzen
> > development board.
> >
> > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > ---
> > Although the S29GL512N is a CFI FLASH, using "cfi-flash" instead of
> > "mtd-rom" does not work:
> >   1. Probing fails with "physmap-flash 0.flash: map_probe failed",
>
> I would first try to understand why the probe fails.

With debug code added, I saw that cfi_probe_chip() fails because
cfi_qry_mode_on() returns zero.  Printing actual vs. expected values
in map_word_equal() showed that nothing was read (all 0xff, IIRC).

Forcing big-endian (CONFIG_MTD_CFI_BE_BYTE_SWAP=y) didn't
help, and caused an unaligned access crash.

> >   2. The kernel crashes later in a spectacular way, cfr. the logs below.
> >
> > U-Boot flinfo says:
> >
> >     Bank # 1: CFI conformant flash (16 x 16)  Size: 64 MB in 512 Sectors
> >       AMD Standard command set, Manufacturer ID: 0x01, Device ID: 0x227E
> >       Erase timeout: 4096 ms, write timeout: 1 ms
> >       Buffer write timeout: 3 ms, buffer size: 64 bytes
> >       Sector Start Addresses:
> >       00000000   RO   00020000        00040000   RO   00060000   RO   00080000
> >       000A0000        000C0000        000E0000        00100000        00120000
> >       00140000        00160000        00180000        001A0000        001C0000
> >       ...
> >       03FC0000        03FE0000
> >
> >     Bank # 2: missing or unknown FLASH type
>
> Do you use "cfi-flash" compatible in u-boot and it works just fine? If
> yes, I would try to understand what are the differences between the
> probe paths from u-boot and linux. I know it doesn't help much, but we
> should start from somewhere if we don't have any other feedback.

It's a bit more complicated... This is a rather old board, and I only
have the originally installed U-Boot 2011.03, and no U-Boot sources.

U-Boot can write to the FLASH, as "saveenv" works (env at  0x40000).

Thanks for your help!

Gr{oetje,eeting}s,

                        Geert
Geert Uytterhoeven March 21, 2023, 3:01 p.m. UTC | #3
On Mon, Mar 20, 2023 at 7:57 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Mon, Mar 20, 2023 at 6:04 PM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> > Vignesh used to review CFI code, maybe he can intervene. I've never
> > worked with CFI, but I can try to help. I'll need more debug data though.
> >
> > On 3/20/23 16:43, Geert Uytterhoeven wrote:
> > > Add a device node for the Spansion S29GL512N NOR FLASH on the Marzen
> > > development board.
> > >
> > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > > ---
> > > Although the S29GL512N is a CFI FLASH, using "cfi-flash" instead of
> > > "mtd-rom" does not work:
> > >   1. Probing fails with "physmap-flash 0.flash: map_probe failed",
> >
> > I would first try to understand why the probe fails.
>
> With debug code added, I saw that cfi_probe_chip() fails because
> cfi_qry_mode_on() returns zero.  Printing actual vs. expected values
> in map_word_equal() showed that nothing was read (all 0xff, IIRC).

I misremembered:

    physmap-flash 0.flash: physmap platform flash device: [mem
0x00000000-0x03ffffff]
    cfi_qry_present:238: 0x5151 vs. 0xe0
    cfi_qry_present:238: 0x5151 vs. 0xe0
    cfi_qry_present:238: 0x5151 vs. 0xe0
    cfi_qry_present:238: 0x5151 vs. 0xe0
    cfi_qry_present:238: 0x5151 vs. 0xe0
    cfi_qry_present:238: 0x5151 vs. 0x0
    cfi_qry_present:238: 0x5151 vs. 0x0
    cfi_qry_present:238: 0x5151 vs. 0x0
    cfi_qry_present:238: 0x5151 vs. 0x0
    cfi_qry_present:238: 0x5151 vs. 0x0
    cfi_qry_present:238: 0x5151 vs. 0x0
    cfi_qry_present:238: 0x5151 vs. 0x0
    cfi_qry_present:238: 0x5151 vs. 0x0
    cfi_qry_present:238: 0x5151 vs. 0x0
    cfi_qry_present:238: 0x5151 vs. 0x0
    cfi_qry_present:238: 0x51 vs. 0xe0
    cfi_qry_present:238: 0x51 vs. 0xe0
    cfi_qry_present:238: 0x51 vs. 0xe0
    cfi_qry_present:238: 0x51 vs. 0xe0
    cfi_qry_present:238: 0x51 vs. 0xe0
    cfi_qry_present:238: 0x51 vs. 0x0
    cfi_qry_present:238: 0x51 vs. 0x0
    cfi_qry_present:238: 0x51 vs. 0x0
    cfi_qry_present:238: 0x51 vs. 0x0
    cfi_qry_present:238: 0x51 vs. 0x0
    physmap-flash 0.flash: map_probe failed

I suddenly remembered I have a different board (APE6-EVM), where
the CFI-FLASH stopped working after adding support for secondary
CPUs. I always thought that was a hardware quirk...

Turns out the CFI-FLASH on Marzen (quad Cortex-A9) is detected when
booting with "nosmp":

    physmap-flash 0.flash: physmap platform flash device: [mem
0x00000000-0x03ffffff]
    cfi_qry_present:238: 0x5151 vs. 0x51
    cfi_qry_present:238: 0x5151 vs. 0x51
    cfi_qry_present:238: 0x5151 vs. 0x51
    cfi_qry_present:238: 0x5151 vs. 0x1c0
    cfi_qry_present:238: 0x5151 vs. 0x1c0
    cfi_qry_present:238: 0x5151 vs. 0x0
    cfi_qry_present:238: 0x5151 vs. 0x0
    cfi_qry_present:238: 0x5151 vs. 0x0
    cfi_qry_present:238: 0x5151 vs. 0x0
    cfi_qry_present:238: 0x5151 vs. 0x0
    cfi_qry_present:238: 0x5151 vs. 0x6002
    cfi_qry_present:238: 0x5151 vs. 0x6002
    cfi_qry_present:238: 0x5151 vs. 0x6002
    cfi_qry_present:238: 0x5151 vs. 0x6002
    cfi_qry_present:238: 0x5151 vs. 0x6002
    0.flash: Found 1 x16 devices at 0x0 in 16-bit bank. Manufacturer
ID 0x000001 Chip ID 0x002301
    Amd/Fujitsu Extended Query Table at 0x0040
      Amd/Fujitsu Extended Query version 1.3.
    number of CFI chips: 1
    3 fixed-partitions partitions found on MTD device 0.flash
    Creating 3 MTD partitions on "0.flash":
    0x000000000000-0x000000040000 : "uboot"
    0x000000040000-0x000000080000 : "uboot-env"
    0x000000080000-0x000004000000 : "flash"

My first guess was that the probing process is migrated to a different
CPU core during probing, but printing smp_processor_id() didn't
confirm that; it's just running on a different CPU core than CPU0.
Wrapping the body of cfi_qry_mode_on() inside a get_cpu()/put_cpu()
pair to prevent migration also didn't fix it.

Is CFI-FLASH known-broken on SMP?
Thanks!

Gr{oetje,eeting}s,

                        Geert
Geert Uytterhoeven April 3, 2023, 4:29 p.m. UTC | #4
On Tue, Mar 21, 2023 at 4:01 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Mon, Mar 20, 2023 at 7:57 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Mon, Mar 20, 2023 at 6:04 PM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> > > Vignesh used to review CFI code, maybe he can intervene. I've never
> > > worked with CFI, but I can try to help. I'll need more debug data though.
> > >
> > > On 3/20/23 16:43, Geert Uytterhoeven wrote:
> > > > Add a device node for the Spansion S29GL512N NOR FLASH on the Marzen
> > > > development board.
> > > >
> > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > > > ---
> > > > Although the S29GL512N is a CFI FLASH, using "cfi-flash" instead of
> > > > "mtd-rom" does not work:
> > > >   1. Probing fails with "physmap-flash 0.flash: map_probe failed",
> > >
> > > I would first try to understand why the probe fails.

> I suddenly remembered I have a different board (APE6-EVM), where
> the CFI-FLASH stopped working after adding support for secondary
> CPUs. I always thought that was a hardware quirk...
>
> Turns out the CFI-FLASH on Marzen (quad Cortex-A9) is detected when
> booting with "nosmp":

> My first guess was that the probing process is migrated to a different
> CPU core during probing, but printing smp_processor_id() didn't
> confirm that; it's just running on a different CPU core than CPU0.
> Wrapping the body of cfi_qry_mode_on() inside a get_cpu()/put_cpu()
> pair to prevent migration also didn't fix it.
>
> Is CFI-FLASH known-broken on SMP?

After actively looking for more boards with CFI FLASHes, and finding one
more board where FLASH probing fails on SMP, I dug deeper.
Turns out they all have in common that (a) the CFI FLASH is located at
physical address zero, and (b) the secondary CPU bringup code relies
on mapping (by special hardware) the region at address zero to the
CPU boot code...

Hence fixing this involves making sure that accessing FLASH and bringing
CPU cores online do not happen concurrently...

Gr{oetje,eeting}s,

                        Geert
Tudor Ambarus April 7, 2023, 7:16 a.m. UTC | #5
Hi, Geert,

On 4/3/23 17:29, Geert Uytterhoeven wrote:
> On Tue, Mar 21, 2023 at 4:01 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>> On Mon, Mar 20, 2023 at 7:57 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>>> On Mon, Mar 20, 2023 at 6:04 PM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>>>> Vignesh used to review CFI code, maybe he can intervene. I've never
>>>> worked with CFI, but I can try to help. I'll need more debug data though.
>>>>
>>>> On 3/20/23 16:43, Geert Uytterhoeven wrote:
>>>>> Add a device node for the Spansion S29GL512N NOR FLASH on the Marzen
>>>>> development board.
>>>>>
>>>>> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
>>>>> ---
>>>>> Although the S29GL512N is a CFI FLASH, using "cfi-flash" instead of
>>>>> "mtd-rom" does not work:
>>>>>   1. Probing fails with "physmap-flash 0.flash: map_probe failed",
>>>>
>>>> I would first try to understand why the probe fails.
> 
>> I suddenly remembered I have a different board (APE6-EVM), where
>> the CFI-FLASH stopped working after adding support for secondary
>> CPUs. I always thought that was a hardware quirk...
>>
>> Turns out the CFI-FLASH on Marzen (quad Cortex-A9) is detected when
>> booting with "nosmp":
> 
>> My first guess was that the probing process is migrated to a different
>> CPU core during probing, but printing smp_processor_id() didn't
>> confirm that; it's just running on a different CPU core than CPU0.
>> Wrapping the body of cfi_qry_mode_on() inside a get_cpu()/put_cpu()
>> pair to prevent migration also didn't fix it.
>>
>> Is CFI-FLASH known-broken on SMP?
> 
> After actively looking for more boards with CFI FLASHes, and finding one
> more board where FLASH probing fails on SMP, I dug deeper.
> Turns out they all have in common that (a) the CFI FLASH is located at
> physical address zero, and (b) the secondary CPU bringup code relies
> on mapping (by special hardware) the region at address zero to the
> CPU boot code...
> 
> Hence fixing this involves making sure that accessing FLASH and bringing
> CPU cores online do not happen concurrently...
> 

Would deferring probe for CFI work?

ta
Geert Uytterhoeven April 10, 2023, 9:25 a.m. UTC | #6
Hi Tudor,

On Fri, Apr 7, 2023 at 9:16 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> On 4/3/23 17:29, Geert Uytterhoeven wrote:
> > On Tue, Mar 21, 2023 at 4:01 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >> On Mon, Mar 20, 2023 at 7:57 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >>> On Mon, Mar 20, 2023 at 6:04 PM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> >>>> Vignesh used to review CFI code, maybe he can intervene. I've never
> >>>> worked with CFI, but I can try to help. I'll need more debug data though.
> >>>>
> >>>> On 3/20/23 16:43, Geert Uytterhoeven wrote:
> >>>>> Add a device node for the Spansion S29GL512N NOR FLASH on the Marzen
> >>>>> development board.
> >>>>>
> >>>>> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> >>>>> ---
> >>>>> Although the S29GL512N is a CFI FLASH, using "cfi-flash" instead of
> >>>>> "mtd-rom" does not work:
> >>>>>   1. Probing fails with "physmap-flash 0.flash: map_probe failed",
> >>>>
> >>>> I would first try to understand why the probe fails.
> >
> >> I suddenly remembered I have a different board (APE6-EVM), where
> >> the CFI-FLASH stopped working after adding support for secondary
> >> CPUs. I always thought that was a hardware quirk...
> >>
> >> Turns out the CFI-FLASH on Marzen (quad Cortex-A9) is detected when
> >> booting with "nosmp":
> >
> >> My first guess was that the probing process is migrated to a different
> >> CPU core during probing, but printing smp_processor_id() didn't
> >> confirm that; it's just running on a different CPU core than CPU0.
> >> Wrapping the body of cfi_qry_mode_on() inside a get_cpu()/put_cpu()
> >> pair to prevent migration also didn't fix it.
> >>
> >> Is CFI-FLASH known-broken on SMP?
> >
> > After actively looking for more boards with CFI FLASHes, and finding one
> > more board where FLASH probing fails on SMP, I dug deeper.
> > Turns out they all have in common that (a) the CFI FLASH is located at
> > physical address zero, and (b) the secondary CPU bringup code relies
> > on mapping (by special hardware) the region at address zero to the
> > CPU boot code...
> >
> > Hence fixing this involves making sure that accessing FLASH and bringing
> > CPU cores online do not happen concurrently...
>
> Would deferring probe for CFI work?

Probe deferral of CFI FLASH would not help, as the FLASH is already
probed after secondary CPU startup.

Sequence of operations is:
  1. Map first page(s) of physical address space to RAM containing
     CPU startup code (not using the MMU, but using a special
     register in the SoC),
  2. Boot secondary CPU cores,
  3. Probe CFI-FLASH: fails, as accesses to the first page(s) of
     physical address space do not address the FLASH, but the CPU
     startup code in RAM.
  4. After boot, CPU cores can be offlined and onlined using CPU
      hotplug through sysfs.

When using "mtd-rom" instead of "cfi-flash", the system boots fine,
but the first page of /dev/mtd0 does not contain the FLASH contents,
but the CPU startup code, which I hadn't noticed originally...

Disabling the special mapping for the CPU startup code after all cores
have been brought up (in between steps 2 and 3) fixes the CFI-FLASH.
However, if a CPU core is offlined and onlined at run-time, the special
mapping has to be reinstantiated temporarily again, thus any FLASH
accesses (by other CPUs) must be prevented temporarily, too.

This issue is present on all Renesas SH/R-Mobile and R-Car SoCs
that support SMP. I am wondering if there are any other SMP SoCS that
suffer from similar issues, and that have solved them? I couldn't find
any in-tree DTS for a board with CFI-FLASH or mtd-rom at address zero
using an SMP ARM SoC...

I guess the simplest "solution" is to disable in DT any device at address zero
when SMP is used (when step 1 is executed)...

Thanks for your comments and suggestions!

Gr{oetje,eeting}s,

                        Geert
Geert Uytterhoeven Aug. 31, 2023, 11:19 a.m. UTC | #7
On Mon, Apr 10, 2023 at 11:25 AM Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> On Fri, Apr 7, 2023 at 9:16 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> > On 4/3/23 17:29, Geert Uytterhoeven wrote:
> > > On Tue, Mar 21, 2023 at 4:01 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > >> On Mon, Mar 20, 2023 at 7:57 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > >>> On Mon, Mar 20, 2023 at 6:04 PM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> > >>>> Vignesh used to review CFI code, maybe he can intervene. I've never
> > >>>> worked with CFI, but I can try to help. I'll need more debug data though.
> > >>>>
> > >>>> On 3/20/23 16:43, Geert Uytterhoeven wrote:
> > >>>>> Add a device node for the Spansion S29GL512N NOR FLASH on the Marzen
> > >>>>> development board.
> > >>>>>
> > >>>>> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > >>>>> ---
> > >>>>> Although the S29GL512N is a CFI FLASH, using "cfi-flash" instead of
> > >>>>> "mtd-rom" does not work:
> > >>>>>   1. Probing fails with "physmap-flash 0.flash: map_probe failed",
> > >>>>
> > >>>> I would first try to understand why the probe fails.
> > >
> > >> I suddenly remembered I have a different board (APE6-EVM), where
> > >> the CFI-FLASH stopped working after adding support for secondary
> > >> CPUs. I always thought that was a hardware quirk...
> > >>
> > >> Turns out the CFI-FLASH on Marzen (quad Cortex-A9) is detected when
> > >> booting with "nosmp":
> > >
> > >> My first guess was that the probing process is migrated to a different
> > >> CPU core during probing, but printing smp_processor_id() didn't
> > >> confirm that; it's just running on a different CPU core than CPU0.
> > >> Wrapping the body of cfi_qry_mode_on() inside a get_cpu()/put_cpu()
> > >> pair to prevent migration also didn't fix it.
> > >>
> > >> Is CFI-FLASH known-broken on SMP?
> > >
> > > After actively looking for more boards with CFI FLASHes, and finding one
> > > more board where FLASH probing fails on SMP, I dug deeper.
> > > Turns out they all have in common that (a) the CFI FLASH is located at
> > > physical address zero, and (b) the secondary CPU bringup code relies
> > > on mapping (by special hardware) the region at address zero to the
> > > CPU boot code...
> > >
> > > Hence fixing this involves making sure that accessing FLASH and bringing
> > > CPU cores online do not happen concurrently...
> >
> > Would deferring probe for CFI work?
>
> Probe deferral of CFI FLASH would not help, as the FLASH is already
> probed after secondary CPU startup.
>
> Sequence of operations is:
>   1. Map first page(s) of physical address space to RAM containing
>      CPU startup code (not using the MMU, but using a special
>      register in the SoC),
>   2. Boot secondary CPU cores,
>   3. Probe CFI-FLASH: fails, as accesses to the first page(s) of
>      physical address space do not address the FLASH, but the CPU
>      startup code in RAM.
>   4. After boot, CPU cores can be offlined and onlined using CPU
>       hotplug through sysfs.
>
> When using "mtd-rom" instead of "cfi-flash", the system boots fine,
> but the first page of /dev/mtd0 does not contain the FLASH contents,
> but the CPU startup code, which I hadn't noticed originally...
>
> Disabling the special mapping for the CPU startup code after all cores
> have been brought up (in between steps 2 and 3) fixes the CFI-FLASH.
> However, if a CPU core is offlined and onlined at run-time, the special
> mapping has to be reinstantiated temporarily again, thus any FLASH
> accesses (by other CPUs) must be prevented temporarily, too.
>
> This issue is present on all Renesas SH/R-Mobile and R-Car SoCs
> that support SMP. I am wondering if there are any other SMP SoCS that
> suffer from similar issues, and that have solved them? I couldn't find
> any in-tree DTS for a board with CFI-FLASH or mtd-rom at address zero
> using an SMP ARM SoC...
>
> I guess the simplest "solution" is to disable in DT any device at address zero
> when SMP is used (when step 1 is executed)...

Or reserve the boot area using request_mem_region().
I have posted an RFC patch series doing that:
"[PATCH/RFC 0/4] ARM: shmobile: Reserve boot area when SMP is enabled"
https://lore.kernel.org/all/cover.1693409184.git.geert+renesas@glider.be

Gr{oetje,eeting}s,

                        Geert
diff mbox series

Patch

diff --git a/arch/arm/boot/dts/r8a7779-marzen.dts b/arch/arm/boot/dts/r8a7779-marzen.dts
index fd40890bd77bc64c..632519edba6b5a47 100644
--- a/arch/arm/boot/dts/r8a7779-marzen.dts
+++ b/arch/arm/boot/dts/r8a7779-marzen.dts
@@ -26,6 +26,33 @@  chosen {
 		stdout-path = "serial0:115200n8";
 	};
 
+	flash@0 {
+		compatible = "mtd-rom";
+		reg = <0x0 0x04000000>;
+		bank-width = <2>;
+
+		partitions {
+			compatible = "fixed-partitions";
+			#address-cells = <1>;
+			#size-cells = <1>;
+
+			partition@0 {
+				label = "uboot";
+				reg = <0x00000000 0x00040000>;
+				read-only;
+			};
+			partition@40000 {
+				label = "uboot-env";
+				reg = <0x00040000 0x00040000>;
+				read-only;
+			};
+			partition@80000 {
+				label = "flash";
+				reg = <0x00080000 0x03f80000>;
+			};
+		};
+	};
+
 	memory@60000000 {
 		device_type = "memory";
 		reg = <0x60000000 0x40000000>;