[v2,0/5] riscv: add PREEMPT_RT support

Message ID 20220831175920.2806-1-jszhang@kernel.org (mailing list archive)

Message

Jisheng Zhang Aug. 31, 2022, 5:59 p.m. UTC
This series adds PREEMPT_RT support to riscv:
patch1 adds the missing number of signal exits to the vCPU stats
patch2 switches to the generic guest entry infrastructure
patch3 selects HAVE_POSIX_CPU_TIMERS_TASK_WORK, which is a requirement for
RT
patch4 adds lazy preempt support
patch5 allows PREEMPT_RT to be enabled

I assume patch1, patch2 and patch3 can be reviewed and merged for
riscv-next, patch4 and patch5 can be reviewed and maintained in rt tree,
and finally merged once the remaining patches in rt tree are all
mainlined.

Since v1:
  - send to the related mailing lists; I pressed ENTER too quickly when
    sending v1
  - remove the signal_pending() handling because that's covered by
    generic guest entry infrastructure

Jisheng Zhang (5):
  RISC-V: KVM: Record number of signal exits as a vCPU stat
  RISC-V: KVM: Use generic guest entry infrastructure
  riscv: select HAVE_POSIX_CPU_TIMERS_TASK_WORK
  riscv: add lazy preempt support
  riscv: Allow to enable RT

 arch/riscv/Kconfig                   |  3 +++
 arch/riscv/include/asm/kvm_host.h    |  1 +
 arch/riscv/include/asm/thread_info.h |  7 +++++--
 arch/riscv/kernel/asm-offsets.c      |  1 +
 arch/riscv/kernel/entry.S            |  9 +++++++--
 arch/riscv/kvm/Kconfig               |  1 +
 arch/riscv/kvm/vcpu.c                | 18 +++++++-----------
 7 files changed, 25 insertions(+), 15 deletions(-)
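
As a rough illustration of what patch1 and patch2 describe, here is a toy user-space model, not kernel code. It assumes the run loop checks for pending work before every guest entry, and that a pending signal is counted as a signal exit and returned as -EINTR. All names (ToyVcpu, handle_pending_work, run_vcpu) are hypothetical and exist only for this sketch.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class VcpuStats:
    # Hypothetical stand-in for the per-vCPU "signal exits" counter.
    signal_exits: int = 0


@dataclass
class ToyVcpu:
    stats: VcpuStats = field(default_factory=VcpuStats)
    signal_pending: bool = False


def handle_pending_work(vcpu):
    """Return -EINTR and count the exit if a signal is pending, else 0."""
    if vcpu.signal_pending:
        vcpu.stats.signal_exits += 1
        return -errno.EINTR
    return 0


def run_vcpu(vcpu, max_entries):
    """Enter the 'guest' up to max_entries times; stop early on pending work.

    Returns (return code, number of guest entries actually made).
    """
    entries = 0
    for _ in range(max_entries):
        ret = handle_pending_work(vcpu)
        if ret:
            return ret, entries
        entries += 1  # stand-in for one guest entry/exit round trip
    return 0, entries


if __name__ == "__main__":
    vcpu = ToyVcpu()
    print(run_vcpu(vcpu, 3))
    vcpu.signal_pending = True
    print(run_vcpu(vcpu, 3), vcpu.stats.signal_exits)
```

In this model the counter only moves when the loop bails out because of a signal, which is the behaviour the stat in patch1 is meant to expose.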

Comments

Sebastian Andrzej Siewior Sept. 1, 2022, 7:04 a.m. UTC | #1
On 2022-09-01 01:59:15 [+0800], Jisheng Zhang wrote:
> I assume patch1, patch2 and patch3 can be reviewed and merged for
> riscv-next, patch4 and patch5 can be reviewed and maintained in rt tree,
> and finally merged once the remaining patches in rt tree are all
> mainlined.

I would say so, yes.

What about JUMP_LABEL support? Do you halt all CPUs while patching the
code?

Sebastian
Jisheng Zhang Sept. 1, 2022, 1:44 p.m. UTC | #2
On Thu, Sep 01, 2022 at 09:04:05AM +0200, Sebastian Andrzej Siewior wrote:
> On 2022-09-01 01:59:15 [+0800], Jisheng Zhang wrote:
> > I assume patch1, patch2 and patch3 can be reviewed and merged for
> > riscv-next, patch4 and patch5 can be reviewed and maintained in rt tree,
> > and finally merged once the remaining patches in rt tree are all
> > mainlined.
> 
> I would say so, yes.
> 
> What about JUMP_LABEL support? Do you halt all CPUs while patching the
> code?
> 

FWICT, the riscv JUMP_LABEL implementation doesn't rely on stopping all
CPUs while patching text.

Thanks
Conor Dooley Sept. 1, 2022, 4:41 p.m. UTC | #3
On 31/08/2022 18:59, Jisheng Zhang wrote:
> This series is to add PREEMPT_RT support to riscv:
> patch1 adds the missing number of signal exits in vCPU stat
> patch2 switches to the generic guest entry infrastructure
> patch3 select HAVE_POSIX_CPU_TIMERS_TASK_WORK which is a requirement for
> RT
> patch4 adds lazy preempt support
> patch5 allows to enable PREEMPT_RT
> 

What version of the preempt_rt patch did you test this with?

Maybe I am missing something, but I gave this a whirl with
v6.0-rc3 + v6.0-rc3-rt5 & was met by a bunch of complaints.
I am not familiar with the preempt_rt patch, so I am not sure what
level of BUG()s or WARNING()s are to be expected, but I saw a fair
few...

Thanks,
Conor.



> I assume patch1, patch2 and patch3 can be reviewed and merged for
> riscv-next, patch4 and patch5 can be reviewed and maintained in rt tree,
> and finally merged once the remaining patches in rt tree are all
> mainlined.
> 
> Since v1:
>   - send to related maillist, I press ENTER too quickly when sending v1
>   - remove the signal_pending() handling because that's covered by
>     generic guest entry infrastructure
> 
> Jisheng Zhang (5):
>   RISC-V: KVM: Record number of signal exits as a vCPU stat
>   RISC-V: KVM: Use generic guest entry infrastructure
>   riscv: select HAVE_POSIX_CPU_TIMERS_TASK_WORK
>   riscv: add lazy preempt support
>   riscv: Allow to enable RT
> 
>  arch/riscv/Kconfig                   |  3 +++
>  arch/riscv/include/asm/kvm_host.h    |  1 +
>  arch/riscv/include/asm/thread_info.h |  7 +++++--
>  arch/riscv/kernel/asm-offsets.c      |  1 +
>  arch/riscv/kernel/entry.S            |  9 +++++++--
>  arch/riscv/kvm/Kconfig               |  1 +
>  arch/riscv/kvm/vcpu.c                | 18 +++++++-----------
>  7 files changed, 25 insertions(+), 15 deletions(-)
> 
> --
> 2.34.1
> 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Jisheng Zhang Sept. 2, 2022, 1:09 p.m. UTC | #4
On Thu, Sep 01, 2022 at 04:41:52PM +0000, Conor.Dooley@microchip.com wrote:
> On 31/08/2022 18:59, Jisheng Zhang wrote:
> > This series is to add PREEMPT_RT support to riscv:
> > patch1 adds the missing number of signal exits in vCPU stat
> > patch2 switches to the generic guest entry infrastructure
> > patch3 select HAVE_POSIX_CPU_TIMERS_TASK_WORK which is a requirement for
> > RT
> > patch4 adds lazy preempt support
> > patch5 allows to enable PREEMPT_RT
> > 
> 
> What version of the preempt_rt patch did you test this with?

v6.0-rc1 + v6.0-rc1-rt patch

> 
> Maybe I am missing something, but I gave this a whirl with
> v6.0-rc3 + v6.0-rc3-rt5 & was met by a bunch of complaints.
> I am not familiar with the preempt_rt patch, so I am not sure what
> level of BUG()s or WARNING()s are to be expected, but I saw a fair
> few...

Could you please provide the corresponding log? Usually, this means there's
a bug in the related drivers, so it's better to fix them now rather than
wait for the RT patches to be mainlined.

PS: which HW are you using?

Thanks
Conor Dooley Sept. 2, 2022, 1:29 p.m. UTC | #5
On 02/09/2022 14:09, Jisheng Zhang wrote:
> On Thu, Sep 01, 2022 at 04:41:52PM +0000, Conor.Dooley@microchip.com wrote:
>> On 31/08/2022 18:59, Jisheng Zhang wrote:
>>> This series is to add PREEMPT_RT support to riscv:
>>> patch1 adds the missing number of signal exits in vCPU stat
>>> patch2 switches to the generic guest entry infrastructure
>>> patch3 select HAVE_POSIX_CPU_TIMERS_TASK_WORK which is a requirement for
>>> RT
>>> patch4 adds lazy preempt support
>>> patch5 allows to enable PREEMPT_RT
>>>
>>
>> What version of the preempt_rt patch did you test this with?
> 
> v6.0-rc1 + v6.0-rc1-rt patch
> 
>>
>> Maybe I am missing something, but I gave this a whirl with
>> v6.0-rc3 + v6.0-rc3-rt5 & was met by a bunch of complaints.
>> I am not familiar with the preempt_rt patch, so I am not sure what
>> level of BUG()s or WARNING()s are to be expected, but I saw a fair
>> few...
> 
> Could you please provide corresponding log? Usually, this means there's
> a bug in related drivers, so it's better to fix them now rather than
> wait for RT patches mainlined.

I tried it on PolarFire SoC. I know that at least one of the problems
I found is down to drivers - specifically the system controller & hwrng.

The first issue that comes up is in the early SMP setup code: we call out
to update_siblings_masks(), which does an alloc with preemption disabled.
It's the same backtrace as here:

https://lore.kernel.org/all/0abd0acf-70a1-d546-a517-19efe60042d1@microchip.com/

I'll give it a run through tonight or tomorrow & give you a full log
of what I saw. There are some splats all over the place for me, but I
can't tell if that's just knock-on from the other issues.

Thanks,
Conor.
Sebastian Andrzej Siewior Nov. 11, 2022, 2:32 p.m. UTC | #6
On 2022-09-02 13:29:23 [+0000], Conor.Dooley@microchip.com wrote:
> I'll give it a run through tonight or tomorrow & give you a full log
> of what I saw. There's some splats all over the place for me, but I
> can't tell if that's just knock-on from the other issues.

Is there an update to this or the series as a whole?

> Thanks,
> Conor.

Sebastian
Conor Dooley Nov. 11, 2022, 2:34 p.m. UTC | #7
On 11/11/2022 14:32, Sebastian Andrzej Siewior wrote:
> On 2022-09-02 13:29:23 [+0000], Conor.Dooley@microchip.com wrote:
>> I'll give it a run through tonight or tomorrow & give you a full log
>> of what I saw. There's some splats all over the place for me, but I
>> can't tell if that's just knock-on from the other issues.
> 
> Is there an update to this or the series as a whole?

Not from me.. completely forgot about it.
I'll put it back in my todo list, sorry.
Conor Dooley Nov. 12, 2022, 9:40 p.m. UTC | #8
On 11/11/2022 14:34, Conor Dooley - M52691 wrote:
> On 11/11/2022 14:32, Sebastian Andrzej Siewior wrote:
>> On 2022-09-02 13:29:23 [+0000], Conor.Dooley@microchip.com wrote:
>>> I'll give it a run through tonight or tomorrow & give you a full log
>>> of what I saw. There's some splats all over the place for me, but I
>>> can't tell if that's just knock-on from the other issues.
>>
>> Is there an update to this or the series as a whole?
> 
> Not from me.. completely forgot about it.
> I'll put it back in my todo list, sorry.
> 

I tried out v6.0.5-rc5 + this patchset (it doesn't apply to v6.1-rcN)
and rt14, and got the following:
[    4.036667] smp: Bringing up secondary CPUs ...
[    4.069365] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
[    4.069389] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
[    4.069410] preempt_count: 1, expected: 0
[    4.069422] RCU nest depth: 1, expected: 1
[    4.069434] 3 locks held by swapper/1/0:
[    4.069449]  #0: ffffffd82cda3b58 (&pcp->lock){+.+.}-{2:2}, at: get_page_from_freelist+0x220/0x1094
[    4.069537]  #1: ffffffff8129b178 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x2e
[    4.069602]  #2: ffffffff813a3e38 (&zone->lock){+.+.}-{2:2}, at: __rmqueue_pcplist+0x156/0xc28
[    4.069662] irq event stamp: 0
[    4.069670] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[    4.069690] hardirqs last disabled at (0): [<ffffffff8000f0c8>] copy_process+0x50c/0xdaa
[    4.069727] softirqs last  enabled at (0): [<ffffffff8000f0d6>] copy_process+0x51a/0xdaa
[    4.069756] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    4.069776] Preemption disabled at:
[    4.069782] [<ffffffff80041346>] migrate_enable+0x32/0x124
[    4.069819] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.0.5-rt14-00006-g0fda08a972f4-dirty #1
[    4.069848] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
[    4.069861] Call Trace:
[    4.069870] [<ffffffff80006628>] show_stack+0x2c/0x38
[    4.069907] [<ffffffff80900ad4>] dump_stack_lvl+0x64/0x86
[    4.069935] [<ffffffff80900b0a>] dump_stack+0x14/0x1c
[    4.069959] [<ffffffff80047534>] __might_resched+0x1bc/0x1c6
[    4.069995] [<ffffffff80908f7a>] rt_spin_lock+0x42/0xb8
[    4.070033] [<ffffffff801cab7a>] __rmqueue_pcplist+0x156/0xc28
[    4.070061] [<ffffffff801cbade>] get_page_from_freelist+0x24e/0x1094
[    4.070088] [<ffffffff801cb712>] __alloc_pages+0xc6/0x244
[    4.070113] [<ffffffff801ede42>] new_slab+0x8c/0x4a8
[    4.070153] [<ffffffff801e955a>] ___slab_alloc+0x5d4/0x9a4
[    4.070181] [<ffffffff801ea206>] __kmalloc+0xc0/0x1fc
[    4.070209] [<ffffffff80578296>] detect_cache_attributes+0xb4/0x470
[    4.070238] [<ffffffff80590520>] update_siblings_masks+0x2c/0x202
[    4.070270] [<ffffffff80590aa0>] store_cpu_topology+0x30/0x6a
[    4.070296] [<ffffffff80007756>] smp_callin+0x38/0x66
[    4.231582] smp: Brought up 1 node, 4 CPUs

There's other stuff that goes awry later on too:
https://gist.githubusercontent.com/ConchuOD/47fd47dfa1f49eb4b0f2fb2a68852a7c/raw/b109b83eec6caa1d67cbb156c6f3e671c10aefe9/gistfile1.txt

I am also seeing the SDHCI stuff without rt & in v6.1-rc4, so it is
unrelated, but the rest resembles what I saw previously.
I don't know anything about -rt, so if there's something blatant that
I've missed here, please let me know.
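
For readers unfamiliar with this kind of splat, here is a toy user-space model of the check that fires in the trace above. It is a sketch, not kernel code. It assumes only what the log shows: rt_spin_lock() runs a might-sleep style check (__might_resched in the trace) and the check complains when preempt_count is not the expected 0. The class and function names are hypothetical.

```python
class ToyCpu:
    """Tracks a preempt counter and records 'sleeping in atomic' reports."""

    def __init__(self):
        self.preempt_count = 0
        self.splats = []

    def preempt_disable(self):
        self.preempt_count += 1

    def preempt_enable(self):
        self.preempt_count -= 1

    def might_sleep(self, where):
        # A sleeping function is only valid when preempt_count is 0.
        if self.preempt_count != 0:
            self.splats.append(
                "BUG: sleeping function called from invalid context at "
                f"{where} (preempt_count: {self.preempt_count}, expected: 0)"
            )


def toy_rt_spin_lock(cpu):
    # On PREEMPT_RT a spinlock_t is a sleeping lock, so taking it may sleep.
    cpu.might_sleep("toy_rt_spin_lock")


def toy_alloc(cpu):
    # Stand-in for an allocation that has to take a zone lock.
    toy_rt_spin_lock(cpu)
    return object()


if __name__ == "__main__":
    cpu = ToyCpu()
    toy_alloc(cpu)          # fine: preemption is enabled
    cpu.preempt_disable()
    toy_alloc(cpu)          # reported: allocation with preemption disabled
    cpu.preempt_enable()
    print("\n".join(cpu.splats))
```

The same allocation is silent in the first call and reported in the second; the only difference is the preempt counter, which matches the "preempt_count: 1, expected: 0" line in the log.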
Tobias Schaffner March 14, 2023, 1:07 p.m. UTC | #9
On 31/08/2022 18:59, Jisheng Zhang wrote:
 > This series is to add PREEMPT_RT support to riscv:
 > patch1 adds the missing number of signal exits in vCPU stat
 > patch2 switches to the generic guest entry infrastructure
 > patch3 select HAVE_POSIX_CPU_TIMERS_TASK_WORK which is a requirement for
 > RT
 > patch4 adds lazy preempt support
 > patch5 allows to enable PREEMPT_RT
 >
 > I assume patch1, patch2 and patch3 can be reviewed and merged for
 > riscv-next, patch4 and patch5 can be reviewed and maintained in rt tree,
 > and finally merged once the remaining patches in rt tree are all
 > mainlined.

I tested the last two patches on a StarFive VisionFive V2 (DT) board 
with 6.1.12-rt7-gdfa52cc14f3b today and the results looked pretty good 
for a first run.

root@StarFive:~# lscpu
Architecture:          riscv64
   Byte Order:          Little Endian
CPU(s):                4
   On-line CPU(s) list: 0-3

root@StarFive:~# uname -a
Linux StarFive 6.1.12-rt7-gdfa52cc14f3b #1 SMP PREEMPT_RT Thu, 01 Jan 1970 01:00:00 +0000 riscv64 GNU/Linux

root@StarFive:~# cat /proc/cmdline
initrd=\initrd.img-6.1.12-rt7-gdfa52cc14f3b LABEL=Boot root=PARTUUID=7176479f-eeea-46ac-afb6-7ec47ff7c390 console=tty0 console=ttyS0,115200 earlycon rootwait isolcpus=2-3 rcu_nocbs=2-3 nohz_full=2-3 irqaffinity=0-1

root@StarFive:~# cyclictest -m -S -p 90 -i 50 -d 0 -q -D 10m
WARN: stat /dev/cpu_dma_latency failed: No such file or directory
T: 0 (  358) P:90 I:50 C:11999999 Min:     11 Act:   11 Avg:   11 Max:     55
T: 1 (  359) P:90 I:50 C:11999241 Min:     11 Act:   11 Avg:   11 Max:     60
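
As a sanity check on the numbers above: with -i 50 (interval in microseconds) and -D 10m, each thread should complete about 600 s / 50 us = 12,000,000 cycles, which is close to the reported C: values. The sketch below does that arithmetic and parses a summary line, assuming each thread's summary is on a single line. The helper names and the regular expression are my own, not part of cyclictest.

```python
import re

# Matches one per-thread summary line as printed in the run above.
LINE_RE = re.compile(
    r"T:\s*(\d+)\s*\(\s*(\d+)\)\s*P:\s*(\d+)\s*I:\s*(\d+)\s*C:\s*(\d+)"
    r"\s*Min:\s*(\d+)\s*Act:\s*(\d+)\s*Avg:\s*(\d+)\s*Max:\s*(\d+)"
)
KEYS = ("thread", "tid", "prio", "interval_us", "cycles",
        "min_us", "act_us", "avg_us", "max_us")


def parse_line(line):
    """Return the fields of one summary line as a dict of ints, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return dict(zip(KEYS, map(int, m.groups())))


def expected_cycles(duration_s, interval_us):
    """Number of wakeups that fit into the run at the given interval."""
    return duration_s * 1_000_000 // interval_us


if __name__ == "__main__":
    line = ("T: 0 (  358) P:90 I:50 C:11999999 "
            "Min:     11 Act:   11 Avg:   11 Max:     55")
    stats = parse_line(line)
    want = expected_cycles(10 * 60, stats["interval_us"])
    print(stats["cycles"], "of", want, "expected cycles; max latency",
          stats["max_us"], "us")
```

A cycle count far below the expected value would suggest the measurement thread was being starved or delayed, which is not the case here.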

Feel free to reach out for further tests or logs.

Best,
Tobias

 > Since v1:
 >   - send to related maillist, I press ENTER too quickly when sending v1
 >   - remove the signal_pending() handling because that's covered by
 >     generic guest entry infrastructure
 >
 > Jisheng Zhang (5):
 >   RISC-V: KVM: Record number of signal exits as a vCPU stat
 >   RISC-V: KVM: Use generic guest entry infrastructure
 >   riscv: select HAVE_POSIX_CPU_TIMERS_TASK_WORK
 >   riscv: add lazy preempt support
 >   riscv: Allow to enable RT
 >
 >  arch/riscv/Kconfig                   |  3 +++
 >  arch/riscv/include/asm/kvm_host.h    |  1 +
 >  arch/riscv/include/asm/thread_info.h |  7 +++++--
 >  arch/riscv/kernel/asm-offsets.c      |  1 +
 >  arch/riscv/kernel/entry.S            |  9 +++++++--
 >  arch/riscv/kvm/Kconfig               |  1 +
 >  arch/riscv/kvm/vcpu.c                | 18 +++++++-----------
 >  7 files changed, 25 insertions(+), 15 deletions(-)
 >
 > --
 > 2.34.1
 >
 >