diff mbox series

[v3,3/5] remoteproc: k3-r5: Add support for IPC-only mode for all R5Fs

Message ID 20220213201246.25952-4-s-anna@ti.com (mailing list archive)
State Accepted
Headers show
Series K3 R5F & DSP IPC-only mode support | expand

Commit Message

Suman Anna Feb. 13, 2022, 8:12 p.m. UTC
Add support to the K3 R5F remoteproc driver to configure all the R5F
cores to be either in IPC-only mode or the traditional remoteproc mode.
The IPC-only mode expects that the remote processors are already booted
by the bootloader, and only performs the minimum steps required to
initialize and deinitialize the virtio IPC transports. The remoteproc
mode allows the kernel remoteproc driver to do the regular load and
boot and other device management operations for a R5F core.

The IPC-only mode for a R5F core is detected and configured at driver
probe time by querying the System Firmware for the R5F power and reset
state and/or status and making sure that the R5F core is indeed started
by the bootloaders, otherwise the device is configured for remoteproc
mode.

Support for IPC-only mode is achieved through .attach(), .detach() and
.get_loaded_rsc_table() callback ops and zeroing out the regular rproc
ops .prepare(), .unprepare(), .start() and .stop(). The resource table
follows a design-by-contract approach and is expected to be at the base
of the DDR firmware region reserved for each remoteproc, it is mostly
expected to contain only the virtio device and trace resource entries.

NOTE:
The driver cannot configure a R5F core for remoteproc mode by any
means without rebooting the kernel if that R5F core has been started
by a bootloader. This is the current desired behavior and can be
enhanced in the future if the feature is needed.

Signed-off-by: Suman Anna <s-anna@ti.com>
---
v3: Revised version of v2 patch adjusted for dropping v2 Patch 1
 - Added rproc_detach logic in k3_r5_cluster_rproc_exit() before invoking
   rproc_del()
 - Added rproc_detach logic in cleanup path in k3_r5_cluster_rproc_exit()
   before invoking rproc_del()
v2: https://patchwork.kernel.org/project/linux-remoteproc/patch/20210723220248.6554-4-s-anna@ti.com/
v1: https://patchwork.kernel.org/project/linux-remoteproc/patch/20210522000309.26134-5-s-anna@ti.com/

 drivers/remoteproc/ti_k3_r5_remoteproc.c | 219 ++++++++++++++++++++++-
 1 file changed, 215 insertions(+), 4 deletions(-)

Comments

Jan Kiszka Nov. 2, 2023, 10:07 a.m. UTC | #1
On 13.02.22 21:12, Suman Anna wrote:
> Add support to the K3 R5F remoteproc driver to configure all the R5F
> cores to be either in IPC-only mode or the traditional remoteproc mode.
> The IPC-only mode expects that the remote processors are already booted
> by the bootloader, and only performs the minimum steps required to
> initialize and deinitialize the virtio IPC transports. The remoteproc
> mode allows the kernel remoteproc driver to do the regular load and
> boot and other device management operations for a R5F core.
> 
> The IPC-only mode for a R5F core is detected and configured at driver
> probe time by querying the System Firmware for the R5F power and reset
> state and/or status and making sure that the R5F core is indeed started
> by the bootloaders, otherwise the device is configured for remoteproc
> mode.
> 
> Support for IPC-only mode is achieved through .attach(), .detach() and
> .get_loaded_rsc_table() callback ops and zeroing out the regular rproc
> ops .prepare(), .unprepare(), .start() and .stop(). The resource table
> follows a design-by-contract approach and is expected to be at the base
> of the DDR firmware region reserved for each remoteproc, it is mostly
> expected to contain only the virtio device and trace resource entries.
> 
> NOTE:
> The driver cannot configure a R5F core for remoteproc mode by any
> means without rebooting the kernel if that R5F core has been started
> by a bootloader. This is the current desired behavior and can be
> enhanced in the future if the feature is needed.
> 

This change surfaced some complex issue in the K3 core: Turning on the
RTI1 watchdog also powers up R5F core 1. And this could happen either in
U-Boot or in the kernel. If the kernel finds the core running, it also
expects a resource table in the reserved RAM. But as the core is
supposed to start via remoteproc, there is none, rather often garbage.
Sometimes, a consistency check catches that, but not always:

[   38.372844] remoteproc remoteproc18: 41000000.r5f is available
[   38.380324] platform 41400000.r5f: R5F core may have been powered on by a different host, programmed state (0) != actual state (1)
[   38.394910] platform 41400000.r5f: configured R5F for IPC-only mode
[   38.401941] platform 41400000.r5f: assigned reserved memory node r5f-dma-memory@a1000000
[   38.476997] remoteproc remoteproc19: 41400000.r5f is available
[   38.484661] remoteproc remoteproc19: attaching to 41400000.r5f
[   38.491092] Unable to handle kernel paging request at virtual address ffff80000dffffff
[   38.503704] Mem abort info:
[   38.509760]   ESR = 0x0000000096000007
[   38.514071]   EC = 0x25: DABT (current EL), IL = 32 bits
[   38.519578]   SET = 0, FnV = 0
[   38.523095]   EA = 0, S1PTW = 0
[   38.526355]   FSC = 0x07: level 3 translation fault
[   38.528974] cal 6f03000.cal: Neither port is configured, no point in staying up
[   38.531775] Data abort info:
[   38.541866]   ISV = 0, ISS = 0x00000007
[   38.545765]   CM = 0, WnR = 0
[   38.548814] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000082fdc000
[   38.555831] [ffff80000dffffff] pgd=10000008fffff003, p4d=10000008fffff003, pud=10000008ffffe003, pmd=1000000886609003, pte=0000000000000000
[   38.568623] remoteproc remoteproc18: powering up 41000000.r5f
[   38.569338] Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
[   38.574440] remoteproc remoteproc18: Booting fw image am65x-mcu-r5f0_0-fw, size 932
[   38.580644] Modules linked in: usbserial ti_cal videobuf2_dma_contig ti_k3_r5_remoteproc(+) videobuf2_memops pci_endpoint_test videobuf2_v4l2 rti_wdt watchdog videobuf2_common at24 st_lsm6dsx_i2c(+) optee_rng st_lsm6dsx kfifo_buf pm16d17 rng_core tee_stmm_efi tpm_ftpm_tee fuse dm_mod ip_tables x_tables ipv6
[   38.589862] remoteproc remoteproc18: remote processor 41000000.r5f is now up
[   38.615533] CPU: 1 PID: 283 Comm: (udev-worker) Not tainted 6.1.54-cip6 #1
[   38.615546] Hardware name: SIMATIC IOT2050 Advanced PG2 (DT)
[   38.615553] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   38.641945] pc : rproc_handle_resources.constprop.0+0x8c/0x164
[   38.647788] lr : rproc_boot+0x2fc/0x57c
[   38.651628] sp : ffff800009d53740
[   38.654934] x29: ffff800009d53740 x28: ffff00087f7d77f8 x27: ffff0008048f4c10
[   38.662068] x26: 0000000000000001 x25: ffffffffffffffff x24: ffff80000e000000
[   38.669199] x23: ffff00080227e038 x22: 0000000000000000 x21: ffff8000092bb1b0
[   38.676333] x20: ffff00080227e000 x19: 0000000000000000 x18: 000000000000028e
[   38.683464] x17: 0000000000000000 x16: 000000006d958cac x15: ffffffffffffffff
[   38.690597] x14: ffffffffffffffff x13: ffffffffffffffff x12: ffffffffffffffff
[   38.697728] x11: ffffffffffffffff x10: ffffffffffffffff x9 : ffffffffbfffffff
[   38.704862] x8 : ffffffffffffffff x7 : fffffdffffffffff x6 : ffffffffffdfffff
[   38.711994] x5 : ffff000802be1f00 x4 : ffff80000e000100 x3 : 00000000000000fd
[   38.719127] x2 : 00000000ffffffff x1 : ffff80000e000003 x0 : ffff80000e000000
[   38.726260] Call trace:
[   38.728703]  rproc_handle_resources.constprop.0+0x8c/0x164
[   38.734196]  rproc_boot+0x2fc/0x57c
[   38.737689]  rproc_add+0xcc/0x16c
[   38.741004]  k3_r5_probe+0x44c/0xe14 [ti_k3_r5_remoteproc]
[   38.746501]  platform_probe+0x68/0xc0
[   38.750168]  really_probe+0xbc/0x2dc
[   38.753742]  __driver_probe_device+0x78/0x114
[   38.758099]  driver_probe_device+0xd8/0x15c
[   38.762282]  __driver_attach+0x94/0x19c
[   38.766119]  bus_for_each_dev+0x74/0xd0
[   38.769954]  driver_attach+0x24/0x30
[   38.773529]  bus_add_driver+0x154/0x20c
[   38.777363]  driver_register+0x78/0x130
[   38.781198]  __platform_driver_register+0x28/0x34
[   38.785901]  k3_r5_rproc_driver_init+0x20/0x1000 [ti_k3_r5_remoteproc]
[   38.792437]  do_one_initcall+0x64/0x1dc
[   38.796272]  do_init_module+0x48/0x1d0
[   38.800023]  load_module+0x185c/0x1cc4
[   38.803770]  __do_sys_finit_module+0xa8/0xfc
[   38.808040]  __arm64_sys_finit_module+0x20/0x30
[   38.812571]  invoke_syscall+0x48/0x114
[   38.816320]  el0_svc_common.constprop.0+0xcc/0xec
[   38.821053]  do_el0_svc+0x2c/0xd0
[   38.821077]  el0_svc+0x2c/0x84
[   38.821095]  el0t_64_sync_handler+0xf4/0x120
[   38.831698]  el0t_64_sync+0x18c/0x190

(this crash was with a stable kernel, but same issue with head of tree)

This raises several questions:
 - Is it a hardware property that RTI1 powers up core 1 as well?
 - If so, how can we use both watchdog and remoteproc so that the latter
   loads the firmware for the former? We are currently doing that from 
   U-Boot, but what if that is not desired? Should the watchdog driver 
   take care to not leave core 1 in a different power state behind?
 - Can and should we do more while parsing the resource table to prevent
   such crashes?

Jan
Mathieu Poirier Nov. 2, 2023, 3:43 p.m. UTC | #2
Hi Jan,

On Thu, Nov 02, 2023 at 11:07:45AM +0100, Jan Kiszka wrote:
> On 13.02.22 21:12, Suman Anna wrote:
> > Add support to the K3 R5F remoteproc driver to configure all the R5F
> > cores to be either in IPC-only mode or the traditional remoteproc mode.
> > The IPC-only mode expects that the remote processors are already booted
> > by the bootloader, and only performs the minimum steps required to
> > initialize and deinitialize the virtio IPC transports. The remoteproc
> > mode allows the kernel remoteproc driver to do the regular load and
> > boot and other device management operations for a R5F core.
> > 
> > The IPC-only mode for a R5F core is detected and configured at driver
> > probe time by querying the System Firmware for the R5F power and reset
> > state and/or status and making sure that the R5F core is indeed started
> > by the bootloaders, otherwise the device is configured for remoteproc
> > mode.
> > 
> > Support for IPC-only mode is achieved through .attach(), .detach() and
> > .get_loaded_rsc_table() callback ops and zeroing out the regular rproc
> > ops .prepare(), .unprepare(), .start() and .stop(). The resource table
> > follows a design-by-contract approach and is expected to be at the base
> > of the DDR firmware region reserved for each remoteproc, it is mostly
> > expected to contain only the virtio device and trace resource entries.
> > 
> > NOTE:
> > The driver cannot configure a R5F core for remoteproc mode by any
> > means without rebooting the kernel if that R5F core has been started
> > by a bootloader. This is the current desired behavior and can be
> > enhanced in the future if the feature is needed.
> > 
> 
> This change surfaced some complex issue in the K3 core: Turning on the
> RTI1 watchdog also powers up R5F core 1. And this could happen either in

When writing "... also powers up...", other than R5F core 1, what else is being
powered?

> U-Boot or in the kernel. If the kernel finds the core running, it also
> expects a resource table in the reserved RAM. But as the core is
> supposed to start via remoteproc, there is none, rather often garbage.
> Sometimes, a consistency check catches that, but not always:
>

If I understand correct and strictly addressing the Linux case, the R5F is
configured to operate in split mode and both cores are off.  An RTI1 watchdog
happens, which has the side effect of turning on core 1.  At some later time core
1 is turned on from the sysfs interface, the remoteproc driver recognizes that
it is already powered and as such enacts the "attach" scenario.  That leads to a
crash because the resource table hasn't been loaded into memory.

Is this a proper description?

> [   38.372844] remoteproc remoteproc18: 41000000.r5f is available
> [   38.380324] platform 41400000.r5f: R5F core may have been powered on by a different host, programmed state (0) != actual state (1)
> [   38.394910] platform 41400000.r5f: configured R5F for IPC-only mode
> [   38.401941] platform 41400000.r5f: assigned reserved memory node r5f-dma-memory@a1000000
> [   38.476997] remoteproc remoteproc19: 41400000.r5f is available
> [   38.484661] remoteproc remoteproc19: attaching to 41400000.r5f
> [   38.491092] Unable to handle kernel paging request at virtual address ffff80000dffffff
> [   38.503704] Mem abort info:
> [   38.509760]   ESR = 0x0000000096000007
> [   38.514071]   EC = 0x25: DABT (current EL), IL = 32 bits
> [   38.519578]   SET = 0, FnV = 0
> [   38.523095]   EA = 0, S1PTW = 0
> [   38.526355]   FSC = 0x07: level 3 translation fault
> [   38.528974] cal 6f03000.cal: Neither port is configured, no point in staying up
> [   38.531775] Data abort info:
> [   38.541866]   ISV = 0, ISS = 0x00000007
> [   38.545765]   CM = 0, WnR = 0
> [   38.548814] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000082fdc000
> [   38.555831] [ffff80000dffffff] pgd=10000008fffff003, p4d=10000008fffff003, pud=10000008ffffe003, pmd=1000000886609003, pte=0000000000000000
> [   38.568623] remoteproc remoteproc18: powering up 41000000.r5f
> [   38.569338] Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
> [   38.574440] remoteproc remoteproc18: Booting fw image am65x-mcu-r5f0_0-fw, size 932
> [   38.580644] Modules linked in: usbserial ti_cal videobuf2_dma_contig ti_k3_r5_remoteproc(+) videobuf2_memops pci_endpoint_test videobuf2_v4l2 rti_wdt watchdog videobuf2_common at24 st_lsm6dsx_i2c(+) optee_rng st_lsm6dsx kfifo_buf pm16d17 rng_core tee_stmm_efi tpm_ftpm_tee fuse dm_mod ip_tables x_tables ipv6
> [   38.589862] remoteproc remoteproc18: remote processor 41000000.r5f is now up
> [   38.615533] CPU: 1 PID: 283 Comm: (udev-worker) Not tainted 6.1.54-cip6 #1
> [   38.615546] Hardware name: SIMATIC IOT2050 Advanced PG2 (DT)
> [   38.615553] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   38.641945] pc : rproc_handle_resources.constprop.0+0x8c/0x164
> [   38.647788] lr : rproc_boot+0x2fc/0x57c
> [   38.651628] sp : ffff800009d53740
> [   38.654934] x29: ffff800009d53740 x28: ffff00087f7d77f8 x27: ffff0008048f4c10
> [   38.662068] x26: 0000000000000001 x25: ffffffffffffffff x24: ffff80000e000000
> [   38.669199] x23: ffff00080227e038 x22: 0000000000000000 x21: ffff8000092bb1b0
> [   38.676333] x20: ffff00080227e000 x19: 0000000000000000 x18: 000000000000028e
> [   38.683464] x17: 0000000000000000 x16: 000000006d958cac x15: ffffffffffffffff
> [   38.690597] x14: ffffffffffffffff x13: ffffffffffffffff x12: ffffffffffffffff
> [   38.697728] x11: ffffffffffffffff x10: ffffffffffffffff x9 : ffffffffbfffffff
> [   38.704862] x8 : ffffffffffffffff x7 : fffffdffffffffff x6 : ffffffffffdfffff
> [   38.711994] x5 : ffff000802be1f00 x4 : ffff80000e000100 x3 : 00000000000000fd
> [   38.719127] x2 : 00000000ffffffff x1 : ffff80000e000003 x0 : ffff80000e000000
> [   38.726260] Call trace:
> [   38.728703]  rproc_handle_resources.constprop.0+0x8c/0x164
> [   38.734196]  rproc_boot+0x2fc/0x57c
> [   38.737689]  rproc_add+0xcc/0x16c
> [   38.741004]  k3_r5_probe+0x44c/0xe14 [ti_k3_r5_remoteproc]
> [   38.746501]  platform_probe+0x68/0xc0
> [   38.750168]  really_probe+0xbc/0x2dc
> [   38.753742]  __driver_probe_device+0x78/0x114
> [   38.758099]  driver_probe_device+0xd8/0x15c
> [   38.762282]  __driver_attach+0x94/0x19c
> [   38.766119]  bus_for_each_dev+0x74/0xd0
> [   38.769954]  driver_attach+0x24/0x30
> [   38.773529]  bus_add_driver+0x154/0x20c
> [   38.777363]  driver_register+0x78/0x130
> [   38.781198]  __platform_driver_register+0x28/0x34
> [   38.785901]  k3_r5_rproc_driver_init+0x20/0x1000 [ti_k3_r5_remoteproc]
> [   38.792437]  do_one_initcall+0x64/0x1dc
> [   38.796272]  do_init_module+0x48/0x1d0
> [   38.800023]  load_module+0x185c/0x1cc4
> [   38.803770]  __do_sys_finit_module+0xa8/0xfc
> [   38.808040]  __arm64_sys_finit_module+0x20/0x30
> [   38.812571]  invoke_syscall+0x48/0x114
> [   38.816320]  el0_svc_common.constprop.0+0xcc/0xec
> [   38.821053]  do_el0_svc+0x2c/0xd0
> [   38.821077]  el0_svc+0x2c/0x84
> [   38.821095]  el0t_64_sync_handler+0xf4/0x120
> [   38.831698]  el0t_64_sync+0x18c/0x190
> 
> (this crash was with a stable kernel, but same issue with head of tree)
>

Right, stable or head the result would be the same.

> This raises several questions:
>  - Is it a hardware property that RTI1 powers up core 1 as well?

I will leave that question to the TI guys.

>  - If so, how can we use both watchdog and remoteproc so that the latter
>    loads the firmware for the former? We are currently doing that from 
>    U-Boot, but what if that is not desired? Should the watchdog driver 
>    take care to not leave core 1 in a different power state behind?

Making sure core1 is turned off by the watchdog driver is a solution but based
on how the HW is behaving and when the interrupt service routine runs, there
may be a race condition when core1 is genuinely enabled.

>  - Can and should we do more while parsing the resource table to prevent
>    such crashes?
> 

That's a tricky question.  The kernel's firmware subsystem ensures the validity
of the ELF image by looking at the image's magic number.  But for the attach()
case only the address of the resource table is provided, and that resource table
doesn't have a magic number.  As such I am not sure that is it possible to parse
the resource table that is provided while keeping things generic.  That said,
I'm open to ideas.

Since this is a platform problem I think the checks need to happen in
k3_r5_get_loaded_rsc_table().  I can't advise on what those should be since I do
not have the HW.

Thanks,
Mathieu

> Jan
> 
> -- 
> Siemens AG, Technology
> Linux Expert Center
>
Jan Kiszka Nov. 2, 2023, 4:43 p.m. UTC | #3
On 02.11.23 16:43, Mathieu Poirier wrote:
> Hi Jan,
> 
> On Thu, Nov 02, 2023 at 11:07:45AM +0100, Jan Kiszka wrote:
>> On 13.02.22 21:12, Suman Anna wrote:
>>> Add support to the K3 R5F remoteproc driver to configure all the R5F
>>> cores to be either in IPC-only mode or the traditional remoteproc mode.
>>> The IPC-only mode expects that the remote processors are already booted
>>> by the bootloader, and only performs the minimum steps required to
>>> initialize and deinitialize the virtio IPC transports. The remoteproc
>>> mode allows the kernel remoteproc driver to do the regular load and
>>> boot and other device management operations for a R5F core.
>>>
>>> The IPC-only mode for a R5F core is detected and configured at driver
>>> probe time by querying the System Firmware for the R5F power and reset
>>> state and/or status and making sure that the R5F core is indeed started
>>> by the bootloaders, otherwise the device is configured for remoteproc
>>> mode.
>>>
>>> Support for IPC-only mode is achieved through .attach(), .detach() and
>>> .get_loaded_rsc_table() callback ops and zeroing out the regular rproc
>>> ops .prepare(), .unprepare(), .start() and .stop(). The resource table
>>> follows a design-by-contract approach and is expected to be at the base
>>> of the DDR firmware region reserved for each remoteproc, it is mostly
>>> expected to contain only the virtio device and trace resource entries.
>>>
>>> NOTE:
>>> The driver cannot configure a R5F core for remoteproc mode by any
>>> means without rebooting the kernel if that R5F core has been started
>>> by a bootloader. This is the current desired behavior and can be
>>> enhanced in the future if the feature is needed.
>>>
>>
>> This change surfaced some complex issue in the K3 core: Turning on the
>> RTI1 watchdog also powers up R5F core 1. And this could happen either in
> 
> When writing "... also powers up...", other than R5F core 1, what else is being
> powered?

Would be a question for the SoC vendor - I assumed that only mcu_rti1
[1] goes on when enabling it. But also mcu_r5fss0_core1 is enabled after
that, at least according to the respective TI-SCI query that the is_on
handler is performing. I've tested that under Linux and in U-Boot.

> 
>> U-Boot or in the kernel. If the kernel finds the core running, it also
>> expects a resource table in the reserved RAM. But as the core is
>> supposed to start via remoteproc, there is none, rather often garbage.
>> Sometimes, a consistency check catches that, but not always:
>>
> 
> If I understand correct and strictly addressing the Linux case, the R5F is
> configured to operate in split mode and both cores are off.  An RTI1 watchdog
> happens, which has the side effect of turning on core 1.  At some later time core
> 1 is turned on from the sysfs interface, the remoteproc driver recognizes that
> it is already powered and as such enacts the "attach" scenario.  That leads to a
> crash because the resource table hasn't been loaded into memory.
> 
> Is this a proper description?

Almost: The watchdog device (rti_wdt [2]) is probed before
k3-r5-remoteproc. This comes with powering up rti1, and that turns on R5
core 1 as well. There is no watchdog fired.

After that, the k3-r5 driver probes the available cores, finds the
second one enabled, and goes down the IPC-only road for it.

> 
>> [   38.372844] remoteproc remoteproc18: 41000000.r5f is available
>> [   38.380324] platform 41400000.r5f: R5F core may have been powered on by a different host, programmed state (0) != actual state (1)
>> [   38.394910] platform 41400000.r5f: configured R5F for IPC-only mode
>> [   38.401941] platform 41400000.r5f: assigned reserved memory node r5f-dma-memory@a1000000
>> [   38.476997] remoteproc remoteproc19: 41400000.r5f is available
>> [   38.484661] remoteproc remoteproc19: attaching to 41400000.r5f
>> [   38.491092] Unable to handle kernel paging request at virtual address ffff80000dffffff
>> [   38.503704] Mem abort info:
>> [   38.509760]   ESR = 0x0000000096000007
>> [   38.514071]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [   38.519578]   SET = 0, FnV = 0
>> [   38.523095]   EA = 0, S1PTW = 0
>> [   38.526355]   FSC = 0x07: level 3 translation fault
>> [   38.528974] cal 6f03000.cal: Neither port is configured, no point in staying up
>> [   38.531775] Data abort info:
>> [   38.541866]   ISV = 0, ISS = 0x00000007
>> [   38.545765]   CM = 0, WnR = 0
>> [   38.548814] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000082fdc000
>> [   38.555831] [ffff80000dffffff] pgd=10000008fffff003, p4d=10000008fffff003, pud=10000008ffffe003, pmd=1000000886609003, pte=0000000000000000
>> [   38.568623] remoteproc remoteproc18: powering up 41000000.r5f
>> [   38.569338] Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
>> [   38.574440] remoteproc remoteproc18: Booting fw image am65x-mcu-r5f0_0-fw, size 932
>> [   38.580644] Modules linked in: usbserial ti_cal videobuf2_dma_contig ti_k3_r5_remoteproc(+) videobuf2_memops pci_endpoint_test videobuf2_v4l2 rti_wdt watchdog videobuf2_common at24 st_lsm6dsx_i2c(+) optee_rng st_lsm6dsx kfifo_buf pm16d17 rng_core tee_stmm_efi tpm_ftpm_tee fuse dm_mod ip_tables x_tables ipv6
>> [   38.589862] remoteproc remoteproc18: remote processor 41000000.r5f is now up
>> [   38.615533] CPU: 1 PID: 283 Comm: (udev-worker) Not tainted 6.1.54-cip6 #1
>> [   38.615546] Hardware name: SIMATIC IOT2050 Advanced PG2 (DT)
>> [   38.615553] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [   38.641945] pc : rproc_handle_resources.constprop.0+0x8c/0x164
>> [   38.647788] lr : rproc_boot+0x2fc/0x57c
>> [   38.651628] sp : ffff800009d53740
>> [   38.654934] x29: ffff800009d53740 x28: ffff00087f7d77f8 x27: ffff0008048f4c10
>> [   38.662068] x26: 0000000000000001 x25: ffffffffffffffff x24: ffff80000e000000
>> [   38.669199] x23: ffff00080227e038 x22: 0000000000000000 x21: ffff8000092bb1b0
>> [   38.676333] x20: ffff00080227e000 x19: 0000000000000000 x18: 000000000000028e
>> [   38.683464] x17: 0000000000000000 x16: 000000006d958cac x15: ffffffffffffffff
>> [   38.690597] x14: ffffffffffffffff x13: ffffffffffffffff x12: ffffffffffffffff
>> [   38.697728] x11: ffffffffffffffff x10: ffffffffffffffff x9 : ffffffffbfffffff
>> [   38.704862] x8 : ffffffffffffffff x7 : fffffdffffffffff x6 : ffffffffffdfffff
>> [   38.711994] x5 : ffff000802be1f00 x4 : ffff80000e000100 x3 : 00000000000000fd
>> [   38.719127] x2 : 00000000ffffffff x1 : ffff80000e000003 x0 : ffff80000e000000
>> [   38.726260] Call trace:
>> [   38.728703]  rproc_handle_resources.constprop.0+0x8c/0x164
>> [   38.734196]  rproc_boot+0x2fc/0x57c
>> [   38.737689]  rproc_add+0xcc/0x16c
>> [   38.741004]  k3_r5_probe+0x44c/0xe14 [ti_k3_r5_remoteproc]
>> [   38.746501]  platform_probe+0x68/0xc0
>> [   38.750168]  really_probe+0xbc/0x2dc
>> [   38.753742]  __driver_probe_device+0x78/0x114
>> [   38.758099]  driver_probe_device+0xd8/0x15c
>> [   38.762282]  __driver_attach+0x94/0x19c
>> [   38.766119]  bus_for_each_dev+0x74/0xd0
>> [   38.769954]  driver_attach+0x24/0x30
>> [   38.773529]  bus_add_driver+0x154/0x20c
>> [   38.777363]  driver_register+0x78/0x130
>> [   38.781198]  __platform_driver_register+0x28/0x34
>> [   38.785901]  k3_r5_rproc_driver_init+0x20/0x1000 [ti_k3_r5_remoteproc]
>> [   38.792437]  do_one_initcall+0x64/0x1dc
>> [   38.796272]  do_init_module+0x48/0x1d0
>> [   38.800023]  load_module+0x185c/0x1cc4
>> [   38.803770]  __do_sys_finit_module+0xa8/0xfc
>> [   38.808040]  __arm64_sys_finit_module+0x20/0x30
>> [   38.812571]  invoke_syscall+0x48/0x114
>> [   38.816320]  el0_svc_common.constprop.0+0xcc/0xec
>> [   38.821053]  do_el0_svc+0x2c/0xd0
>> [   38.821077]  el0_svc+0x2c/0x84
>> [   38.821095]  el0t_64_sync_handler+0xf4/0x120
>> [   38.831698]  el0t_64_sync+0x18c/0x190
>>
>> (this crash was with a stable kernel, but same issue with head of tree)
>>
> 
> Right, stable or head the result would be the same.
> 
>> This raises several questions:
>>  - Is it a hardware property that RTI1 powers up core 1 as well?
> 
> I will leave that question to the TI guys.
> 
>>  - If so, how can we use both watchdog and remoteproc so that the latter
>>    loads the firmware for the former? We are currently doing that from 
>>    U-Boot, but what if that is not desired? Should the watchdog driver 
>>    take care to not leave core 1 in a different power state behind?
> 
> Making sure core1 is turned off by the watchdog driver is a solution but based
> on how the HW is behaving and when the interrupt service routine runs, there
> may be a race condition when core1 is genuinely enabled.

Yes, that worries me as well. In Linux, watchdog and the R5 cores have
no explicit dependency, although you need a firmware on the cores so
that the watchdog event is handled (it does not trigger a hw reset
directly, sadly, that's why there is [3] eg.). That could also cause
probing to happen in parallel, in theory.

> 
>>  - Can and should we do more while parsing the resource table to prevent
>>    such crashes?
>>
> 
> That's a tricky question.  The kernel's firmware subsystem ensures the validity
> of the ELF image by looking at the image's magic number.  But for the attach()
> case only the address of the resource table is provided, and that resource table
> doesn't have a magic number.  As such I am not sure that is it possible to parse
> the resource table that is provided while keeping things generic.  That said,
> I'm open to ideas.
> 
> Since this is a platform problem I think the checks need to happen in
> k3_r5_get_loaded_rsc_table().  I can't advise on what those should be since I do
> not have the HW.

Sure, thanks nevertheless.

Jan

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/ti/k3-am65-mcu.dtsi?h=v6.6#n432
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/watchdog/rti_wdt.c?h=v6.6
[3] https://github.com/siemens/k3-rti-wdt
Hari Nagalla Dec. 8, 2023, 11:21 a.m. UTC | #4
On 11/2/23 11:43, Jan Kiszka wrote:
>>> RTI1 watchdog also powers up R5F core 1. And this could happen either in
>> When writing "... also powers up...", other than R5F core 1, what else is being
>> powered?
> Would be a question for the SoC vendor - I assumed that only mcu_rti1
> [1] goes on when enabling it. But also mcu_r5fss0_core1 is enabled after
> that, at least according to the respective TI-SCI query that the is_on
> handler is performing. I've tested that under Linux and in U-Boot.
> 
As described in section 12.5.2.1 of AM64x TRM 
(https://www.ti.com/lit/pdf/SPRUIM2) -There is a RTI for each CPU core. 
And it is not intended to be use RTI provisioned for a particular CPU 
core with a different core.
And also as shown in section (5.2.2.2.1.3.1) the CPU core and 
corresponding RTI share the same power sub module (LPSC), so enabling 
one powers on the other.

As Suman suggested, it seems more appropriate to enable the RTI watchdog 
timers in the remoteproc driver. Legacy omap remoteproc drivers have 
this support and needs to be extended to k3 remoteproc drivers.
Another option could be to DEFER RTI probe until corresponding 
remoteproc driver is probed.

Any other solutions to maintain this order of enabling remote core and 
the corresponding RTI/WDT?
diff mbox series

Patch

diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c
index ff4e1fac1c7f..4840ad906018 100644
--- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
+++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
@@ -428,6 +428,7 @@  static int k3_r5_rproc_request_mbox(struct rproc *rproc)
  * private to each core. Only Core0 needs to be unhalted for running the
  * cluster in this mode. The function uses the same reset logic as LockStep
  * mode for this (though the behavior is agnostic of the reset release order).
+ * This callback is invoked only in remoteproc mode.
  */
 static int k3_r5_rproc_prepare(struct rproc *rproc)
 {
@@ -493,7 +494,8 @@  static int k3_r5_rproc_prepare(struct rproc *rproc)
  * both cores. The access is made possible only with releasing the resets for
  * both cores, but with only Core0 unhalted. This function re-uses the same
  * reset assert logic as LockStep mode for this mode (though the behavior is
- * agnostic of the reset assert order).
+ * agnostic of the reset assert order). This callback is invoked only in
+ * remoteproc mode.
  */
 static int k3_r5_rproc_unprepare(struct rproc *rproc)
 {
@@ -527,7 +529,8 @@  static int k3_r5_rproc_unprepare(struct rproc *rproc)
  *
  * The Single-CPU mode on applicable SoCs (eg: AM64x) only uses Core0 to execute
  * code, so only Core0 needs to be unhalted. The function uses the same logic
- * flow as Split-mode for this.
+ * flow as Split-mode for this. This callback is invoked only in remoteproc
+ * mode.
  */
 static int k3_r5_rproc_start(struct rproc *rproc)
 {
@@ -598,7 +601,8 @@  static int k3_r5_rproc_start(struct rproc *rproc)
  * be done here, but is preferred to be done in the .unprepare() ops - this
  * maintains the symmetric behavior between the .start(), .stop(), .prepare()
  * and .unprepare() ops, and also balances them well between sysfs 'state'
- * flow and device bind/unbind or module removal.
+ * flow and device bind/unbind or module removal. This callback is invoked
+ * only in remoteproc mode.
  */
 static int k3_r5_rproc_stop(struct rproc *rproc)
 {
@@ -635,6 +639,78 @@  static int k3_r5_rproc_stop(struct rproc *rproc)
 	return ret;
 }
 
+/*
+ * Attach to a running R5F remote processor (IPC-only mode)
+ *
+ * The R5F attach callback only needs to request the mailbox, the remote
+ * processor is already booted, so there is no need to issue any TI-SCI
+ * commands to boot the R5F cores in IPC-only mode. This callback is invoked
+ * only in IPC-only mode.
+ */
+static int k3_r5_rproc_attach(struct rproc *rproc)
+{
+	struct k3_r5_rproc *kproc = rproc->priv;
+	struct device *dev = kproc->dev;
+	int ret;
+
+	ret = k3_r5_rproc_request_mbox(rproc);
+	if (ret)
+		return ret;
+
+	dev_info(dev, "R5F core initialized in IPC-only mode\n");
+	return 0;
+}
+
+/*
+ * Detach from a running R5F remote processor (IPC-only mode)
+ *
+ * The R5F detach callback performs the opposite operation to attach callback
+ * and only needs to release the mailbox, the R5F cores are not stopped and
+ * will be left in booted state in IPC-only mode. This callback is invoked
+ * only in IPC-only mode.
+ */
+static int k3_r5_rproc_detach(struct rproc *rproc)
+{
+	struct k3_r5_rproc *kproc = rproc->priv;
+	struct device *dev = kproc->dev;
+
+	mbox_free_channel(kproc->mbox);
+	dev_info(dev, "R5F core deinitialized in IPC-only mode\n");
+	return 0;
+}
+
+/*
+ * This function implements the .get_loaded_rsc_table() callback and is used
+ * to provide the resource table for the booted R5F in IPC-only mode. The K3 R5F
+ * firmwares follow a design-by-contract approach and are expected to have the
+ * resource table at the base of the DDR region reserved for firmware usage.
+ * This provides flexibility for the remote processor to be booted by different
+ * bootloaders that may or may not have the ability to publish the resource table
+ * address and size through a DT property. This callback is invoked only in
+ * IPC-only mode.
+ */
+static struct resource_table *k3_r5_get_loaded_rsc_table(struct rproc *rproc,
+							 size_t *rsc_table_sz)
+{
+	struct k3_r5_rproc *kproc = rproc->priv;
+	struct device *dev = kproc->dev;
+
+	if (!kproc->rmem[0].cpu_addr) {
+		dev_err(dev, "memory-region #1 does not exist, loaded rsc table can't be found");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	/*
+	 * NOTE: The resource table size is currently hard-coded to a maximum
+	 * of 256 bytes. The most common resource table usage for K3 firmwares
+	 * is to only have the vdev resource entry and an optional trace entry.
+	 * The exact size could be computed based on resource table address, but
+	 * the hard-coded value suffices to support the IPC-only mode.
+	 */
+	*rsc_table_sz = 256;
+	return (struct resource_table *)kproc->rmem[0].cpu_addr;
+}
+
 /*
  * Internal Memory translation helper
  *
@@ -1014,6 +1090,116 @@  static void k3_r5_adjust_tcm_sizes(struct k3_r5_rproc *kproc)
 	}
 }
 
+/*
+ * This function checks and configures a R5F core for IPC-only or remoteproc
+ * mode. The driver is configured to be in IPC-only mode for a R5F core when
+ * the core has been loaded and started by a bootloader. The IPC-only mode is
+ * detected by querying the System Firmware for reset, power on and halt status
+ * and ensuring that the core is running. Any incomplete steps at bootloader
+ * are validated and errored out.
+ *
+ * In IPC-only mode, the driver state flags for ATCM, BTCM and LOCZRAMA settings
+ * and cluster mode parsed originally from kernel DT are updated to reflect the
+ * actual values configured by bootloader. The driver internal device memory
+ * addresses for TCMs are also updated.
+ */
+static int k3_r5_rproc_configure_mode(struct k3_r5_rproc *kproc)
+{
+	struct k3_r5_cluster *cluster = kproc->cluster;
+	struct k3_r5_core *core = kproc->core;
+	struct device *cdev = core->dev;
+	bool r_state = false, c_state = false;
+	u32 ctrl = 0, cfg = 0, stat = 0, halted = 0;
+	u64 boot_vec = 0;
+	u32 atcm_enable, btcm_enable, loczrama;
+	struct k3_r5_core *core0;
+	enum cluster_mode mode;
+	int ret;
+
+	core0 = list_first_entry(&cluster->cores, struct k3_r5_core, elem);
+
+	ret = core->ti_sci->ops.dev_ops.is_on(core->ti_sci, core->ti_sci_id,
+					      &r_state, &c_state);
+	if (ret) {
+		dev_err(cdev, "failed to get initial state, mode cannot be determined, ret = %d\n",
+			ret);
+		return ret;
+	}
+	if (r_state != c_state) {
+		dev_warn(cdev, "R5F core may have been powered on by a different host, programmed state (%d) != actual state (%d)\n",
+			 r_state, c_state);
+	}
+
+	ret = reset_control_status(core->reset);
+	if (ret < 0) {
+		dev_err(cdev, "failed to get initial local reset status, ret = %d\n",
+			ret);
+		return ret;
+	}
+
+	ret = ti_sci_proc_get_status(core->tsp, &boot_vec, &cfg, &ctrl,
+				     &stat);
+	if (ret < 0) {
+		dev_err(cdev, "failed to get initial processor status, ret = %d\n",
+			ret);
+		return ret;
+	}
+	atcm_enable = cfg & PROC_BOOT_CFG_FLAG_R5_ATCM_EN ?  1 : 0;
+	btcm_enable = cfg & PROC_BOOT_CFG_FLAG_R5_BTCM_EN ?  1 : 0;
+	loczrama = cfg & PROC_BOOT_CFG_FLAG_R5_TCM_RSTBASE ?  1 : 0;
+	if (cluster->soc_data->single_cpu_mode) {
+		mode = cfg & PROC_BOOT_CFG_FLAG_R5_SINGLE_CORE ?
+				CLUSTER_MODE_SINGLECPU : CLUSTER_MODE_SPLIT;
+	} else {
+		mode = cfg & PROC_BOOT_CFG_FLAG_R5_LOCKSTEP ?
+				CLUSTER_MODE_LOCKSTEP : CLUSTER_MODE_SPLIT;
+	}
+	halted = ctrl & PROC_BOOT_CTRL_FLAG_R5_CORE_HALT;
+
+	/*
+	 * IPC-only mode detection requires both local and module resets to
+	 * be deasserted and R5F core to be unhalted. Local reset status is
+	 * irrelevant if module reset is asserted (POR value has local reset
+	 * deasserted), and is deemed as remoteproc mode
+	 */
+	if (c_state && !ret && !halted) {
+		dev_info(cdev, "configured R5F for IPC-only mode\n");
+		kproc->rproc->state = RPROC_DETACHED;
+		ret = 1;
+		/* override rproc ops with only required IPC-only mode ops */
+		kproc->rproc->ops->prepare = NULL;
+		kproc->rproc->ops->unprepare = NULL;
+		kproc->rproc->ops->start = NULL;
+		kproc->rproc->ops->stop = NULL;
+		kproc->rproc->ops->attach = k3_r5_rproc_attach;
+		kproc->rproc->ops->detach = k3_r5_rproc_detach;
+		kproc->rproc->ops->get_loaded_rsc_table =
+						k3_r5_get_loaded_rsc_table;
+	} else if (!c_state) {
+		dev_info(cdev, "configured R5F for remoteproc mode\n");
+		ret = 0;
+	} else {
+		dev_err(cdev, "mismatched mode: local_reset = %s, module_reset = %s, core_state = %s\n",
+			!ret ? "deasserted" : "asserted",
+			c_state ? "deasserted" : "asserted",
+			halted ? "halted" : "unhalted");
+		ret = -EINVAL;
+	}
+
+	/* fixup TCMs, cluster & core flags to actual values in IPC-only mode */
+	if (ret > 0) {
+		if (core == core0)
+			cluster->mode = mode;
+		core->atcm_enable = atcm_enable;
+		core->btcm_enable = btcm_enable;
+		core->loczrama = loczrama;
+		core->mem[0].dev_addr = loczrama ? 0 : K3_R5_TCM_DEV_ADDR;
+		core->mem[1].dev_addr = loczrama ? K3_R5_TCM_DEV_ADDR : 0;
+	}
+
+	return ret;
+}
+
 static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
 {
 	struct k3_r5_cluster *cluster = platform_get_drvdata(pdev);
@@ -1023,7 +1209,7 @@  static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
 	struct device *cdev;
 	const char *fw_name;
 	struct rproc *rproc;
-	int ret;
+	int ret, ret1;
 
 	core1 = list_last_entry(&cluster->cores, struct k3_r5_core, elem);
 	list_for_each_entry(core, &cluster->cores, elem) {
@@ -1054,6 +1240,12 @@  static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
 		kproc->rproc = rproc;
 		core->rproc = rproc;
 
+		ret = k3_r5_rproc_configure_mode(kproc);
+		if (ret < 0)
+			goto err_config;
+		if (ret)
+			goto init_rmem;
+
 		ret = k3_r5_rproc_configure(kproc);
 		if (ret) {
 			dev_err(dev, "initial configure failed, ret = %d\n",
@@ -1061,6 +1253,7 @@  static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
 			goto err_config;
 		}
 
+init_rmem:
 		k3_r5_adjust_tcm_sizes(kproc);
 
 		ret = k3_r5_reserved_mem_init(kproc);
@@ -1085,6 +1278,15 @@  static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
 	return 0;
 
 err_split:
+	if (rproc->state == RPROC_ATTACHED) {
+		ret1 = rproc_detach(rproc);
+		if (ret1) {
+			dev_err(kproc->dev, "failed to detach rproc, ret = %d\n",
+				ret1);
+			return ret1;
+		}
+	}
+
 	rproc_del(rproc);
 err_add:
 	k3_r5_reserved_mem_exit(kproc);
@@ -1108,6 +1310,7 @@  static void k3_r5_cluster_rproc_exit(void *data)
 	struct k3_r5_rproc *kproc;
 	struct k3_r5_core *core;
 	struct rproc *rproc;
+	int ret;
 
 	/*
 	 * lockstep mode and single-cpu modes have only one rproc associated
@@ -1123,6 +1326,14 @@  static void k3_r5_cluster_rproc_exit(void *data)
 		rproc = core->rproc;
 		kproc = rproc->priv;
 
+		if (rproc->state == RPROC_ATTACHED) {
+			ret = rproc_detach(rproc);
+			if (ret) {
+				dev_err(kproc->dev, "failed to detach rproc, ret = %d\n", ret);
+				return;
+			}
+		}
+
 		rproc_del(rproc);
 
 		k3_r5_reserved_mem_exit(kproc);