[RFC,1/1] riscv: mm: notify remote harts about mmu cache updates

Message ID: 20220829205219.283543-1-geomatsi@gmail.com
State: New, archived

Commit Message

Sergey Matyukevich Aug. 29, 2022, 8:52 p.m. UTC
From: Sergey Matyukevich <sergey.matyukevich@syntacore.com>

The current implementation of the update_mmu_cache function performs a
local TLB flush. It does not take ASID information into account. Nor does
it account for other harts currently running the same mm context, or for
possible migration of the running context to other harts. Meanwhile, a
TLB flush is not performed on every context switch if ASID support
is enabled.

Patch [1] proposed adding ASID support to update_mmu_cache to avoid
flushing the local TLB entirely. This patch also takes into account
other harts currently running the same mm context, as well as possible
migration of this context to other harts.

For this purpose the approach from flush_icache_mm is reused. Remote
harts currently running the same mm context are informed via SBI calls
that they need to flush their local TLBs. All the other harts are marked
as needing a deferred TLB flush when this mm context runs on them.

[1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/

Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
---
 arch/riscv/include/asm/mmu.h      |  2 ++
 arch/riscv/include/asm/pgtable.h  |  2 +-
 arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
 arch/riscv/mm/context.c           | 10 ++++++++++
 arch/riscv/mm/tlbflush.c          | 28 +++++++++++-----------------
 5 files changed, 42 insertions(+), 18 deletions(-)

Comments

Lad, Prabhakar Dec. 22, 2022, 5:50 p.m. UTC | #1
Hi Sergey,

On Mon, Aug 29, 2022 at 9:53 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
>
> From: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
>
> Current implementation of update_mmu_cache function performs local TLB
> flush. It does not take into account ASID information. Besides, it does
> not take into account other harts currently running the same mm context
> or possible migration of the running context to other harts. Meanwhile
> TLB flush is not performed for every context switch if ASID support
> is enabled.
>
> Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> flushing local TLB entirely. This patch takes into account other
> harts currently running the same mm context as well as possible
> migration of this context to other harts.
>
> For this purpose the approach from flush_icache_mm is reused. Remote
> harts currently running the same mm context are informed via SBI calls
> that they need to flush their local TLBs. All the other harts are marked
> as needing a deferred TLB flush when this mm context runs on them.
>
> [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
>
> Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> ---
>  arch/riscv/include/asm/mmu.h      |  2 ++
>  arch/riscv/include/asm/pgtable.h  |  2 +-
>  arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
>  arch/riscv/mm/context.c           | 10 ++++++++++
>  arch/riscv/mm/tlbflush.c          | 28 +++++++++++-----------------
>  5 files changed, 42 insertions(+), 18 deletions(-)
>
I couldn't find your latest patch in my mailbox, so I'm replying to this one.

I merged Palmer's for-next branch, and when running tests on eMMC with
bonnie++ on the Renesas RZ/Five SoC I am seeing the issues below:

root@smarc-rzfive:/lava-testing# ./emmc_t_002.sh

Welcome to fdisk (util-linux 2.35.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

[   40.809677]  mmcblk0: p1

Command (m for help): Created a new DOS disklabel with disk identifier
0xf4682ae9.

Command (m for help): Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): Partition number (1-4, default 1): First sector
(2048-124321791, default 2048): Last sector, +/-sectors or
+/-size{K,M,G,T,P} (2048-124321791, default 124321791):
Created a new partition 1 of type 'Linux' and of size 59.3 GiB.
Partition #1 contains a ext4 signature.

Command (m for help):
The partition table has been altered.
Calling ioctl() to re-read partition table.
[   40.945583]  mmcblk0: p1
Syncing disks.

mke2fs 1.45.7 (28-Jan-2021)
/dev/mmcblk0p1 contains a ext4 file system
        last mounted on /tmp/tmp.PDgTkhohqt/mnt on Fri Dec 16 19:48:34 2022
Discarding device blocks: done
Creating filesystem with 15539968 4k blocks and 3891200 inodes
Filesystem UUID: 6effbf47-2d7a-4eb8-b2dc-1333b848e449
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424

Allocating group tables: done
Writing inode tables: done
Creating journal (65536 blocks): done
Writing superblocks and filesystem accounting information: done

e2fsck 1.45.7 (28-Jan-2021)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mmcblk0p1: 11/3891200 files (0.0% non-contiguous), 323121/15539968 blocks
[   91.521828] EXT4-fs (mmcblk0p1): mounted filesystem 6effbf47-2d7a-4eb8-b2dc-1333b848e449 with ordered data mode. Quota mode: disabled.
Using uid:0, gid:0.
Writing with putc()...[  131.775220] do_trap: 3 callbacks suppressed
[  131.775245] sd-resolve[128]: unhandled signal 11 code 0x1 at 0x0000000000000060 in libpthread-2.28.so[3fa6d80000+13000]
[  131.790382] CPU: 0 PID: 128 Comm: sd-resolve Not tainted 6.1.0-11009-gf4e9a8cdc25b #167
[  131.798386] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[  131.804999] epc : 0000003fa6d8eeac ra : 0000003fa6f4a76c sp : 0000003fa6b8c330
[  131.812214]  gp : 0000002aacc1cb88 tp : 0000003fa6b92810 t0 : 0000000000000022
[  131.819432]  t1 : 0000003fa6e7f0ec t2 : 0000003fa6b8b290 s0 : 0000003fa6b8c850
[  131.826669]  s1 : 0000002aacc1f430 a0 : 000000000000000a a1 : 0000003fa6b8c3b8
[  131.833891]  a2 : 0000000000004000 a3 : 0000000000000000 a4 : 0000000000000020
[  131.841110]  a5 : 0000000000000002 a6 : 0000003fa6b8c360 a7 : 0000000000000007
[  131.848328]  s2 : ffffffffffffb000 s3 : ffffffffffffd3d0 s4 : 0000003fa6fe0918
[  131.855561]  s5 : 0000003fa6b8ec20 s6 : 0000003fa6b8c440 s7 : fffffffffffffffd
[  131.862783]  s8 : 0000003fa6b8c420 s9 : 000000000000000a s10: 0000000000000000
[  131.870001]  s11: 0000003fa6fe2090 t3 : 0000003fa6d8eeaa t4 : 00000009a331f45c
[  131.877219]  t5 : 000000000000003f t6 : 0000000000000000
[  131.882548] status: 8000000200006020 badaddr: 0000000000000060 cause: 000000000000000d
[  131.891349] systemd-journal[87]: unhandled signal 11 code 0x1 at 0x00000000000000c8 in systemd-journald[2abd710000+1b000]
[  131.902382] CPU: 0 PID: 87 Comm: systemd-journal Not tainted 6.1.0-11009-gf4e9a8cdc25b #167
[  131.910731] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[  131.917359] epc : 0000002abd7167e0 ra : 0000002abd7179e4 sp : 0000003fd65416c0
[  131.924578]  gp : 0000002abd72e120 tp : 0000003fbea1f720 t0 : 3534616138333466
[  131.931796]  t1 : ffffffffffffe000 t2 : 000000000000000d s0 : 0000003fd65416c0
[  131.939014]  s1 : 0000003fd65437a8 a0 : 0000000000000009 a1 : 0000003fd65416c0
[  131.946232]  a2 : 0000000000002000 a3 : 0000003fd65437a8 a4 : 0000003fd65436c8
[  131.953450]  a5 : 0000003fd65436d0 a6 : 0000000000000083 a7 : 0000000000000018
[  131.960668]  s2 : 0000003fbee6c918 s3 : 0000003fd6543708 s4 : 0000003fd6543700
[  131.967885]  s5 : 0000002abd724718 s6 : 0000002abd7247d8 s7 : 0000000000000000
[  131.975102]  s8 : ffffffffffffffff s9 : 0000002ad2397120 s10: 0000000000000000
[  131.982319]  s11: 0000003fe5aad418 t3 : 0000003fbed75364 t4 : 00000009a7934adc
[  131.989564]  t5 : 00000000001ea8b0 t6 : 3463396363613637
[  131.994883] status: 0000000200004020 badaddr: 00000000000000c8 cause: 000000000000000d
[  132.003911] audit: type=1701 audit(1671220069.615:11): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=87 comm="systemd-journal" exe="/lib/systemd/systemd-journald" sig=11 res=1
[  132.024142] systemd[1]: unhandled signal 11 code 0x1 at 0xffffffac2b2a2928 in ld-2.28.so[3f83a6e000+17000]
[  132.033946] CPU: 0 PID: 1 Comm: systemd Not tainted 6.1.0-11009-gf4e9a8cdc25b #167
[  132.041563] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[  132.048198] epc : 0000003f83a7a81c ra : 0000003f83a7a992 sp : 0000003fe5aad570
[  132.055419]  gp : 0000002ad234ad28 tp : 0000003f83628e70 t0 : 0000000000a9919e
[  132.062635]  t1 : 0000003f8391f7dc t2 : 0000000000000000 s0 : 0000003f83a664d0
[  132.069852]  s1 : 0000003f83a85918 a0 : 0000000000000001 a1 : 0000003f83a20940
[  132.077068]  a2 : 0000003f83629680 a3 : 0000000000000073 a4 : 0000000000000001
[  132.084284]  a5 : 0000003f83a87090 a6 : 000000000000002f a7 : 0000000000062164
[  132.091501]  s2 : 0000002ad22cdce0 s3 : 0000003f83a85918 s4 : 0000000000000006
[  132.098722]  s5 : 0000000000000002 s6 : 0000003f83a85918 s7 : 0000000000001000
[  132.105939]  s8 : 0000003fe5aad750 s9 : 0000003fe5aad9e0 s10: 0000002ad22fdd60
[  132.113155]  s11: 2f2e2d2c2b2a2928 t3 : 0000003f83a7a9cc t4 : 0000000000000068
[  132.120371]  t5 : 0000000052d19905 t6 : 0000000000d19905
[  132.125686] status: 0000000200004020 badaddr: ffffffac2b2a2928 cause: 000000000000000d
[  132.145321] audit: type=1701 audit(1671220069.747:12): auid=4294967295 uid=995 gid=994 ses=4294967295 pid=126 comm="sd-resolve" exe="/lib/systemd/systemd-timesyncd" sig=11 res=1
[  132.161689] systemd[1]: unhandled signal 11 code 0x1 at 0x0000006c6b6a6968
[  132.168714] CPU: 0 PID: 1 Comm: systemd Not tainted 6.1.0-11009-gf4e9a8cdc25b #167
[  132.176293] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[  132.182906] epc : 0000006c6b6a6968 ra : 0000003f838630b2 sp : 0000003fe5aace60
[  132.190125]  gp : 0000002ad234ad28 tp : 0000003f83628e70 t0 : 0000000000a9919e
[  132.197357]  t1 : 0000003f838595fc t2 : 0000000000000000 s0 : 0000003fe5aad050
[  132.204574]  s1 : 0000000000000000 a0 : 0000003fe5aace70 a1 : 0000003fe5aad058
[  132.211791]  a2 : 0000000000000080 a3 : 0000000000000010 a4 : 0000000000000001
[  132.219007]  a5 : 0000003f839a1784 a6 : 0000000000000000 a7 : 0000000000000000
[  132.226223]  s2 : 0000000000000011 s3 : 000000000000000b s4 : 0000000000000006
[  132.233439]  s5 : 0000000000000002 s6 : 0000003f83a85918 s7 : 0000000000001000
[  132.240655]  s8 : 0000003fe5aad750 s9 : 0000003fe5aad9e0 s10: 0000002ad22fdd60
[  132.247872]  s11: 2f2e2d2c2b2a2928 t3 : 6f6e6d6c6b6a6968 t4 : 0000000000000068
[  132.255088]  t5 : 0000000052d19905 t6 : 0000000000d19905
[  132.260403] status: 0000000200004020 badaddr: 0000006c6b6a6968 cause: 000000000000000c
[  132.269759] systemd[1]: unhandled signal 11 code 0x1 at 0x0000006c6b6a6968
[  132.276708] CPU: 0 PID: 1 Comm: systemd Not tainted 6.1.0-11009-gf4e9a8cdc25b #167
[  132.284283] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[  132.290895] epc : 0000006c6b6a6968 ra : 0000003f838630b2 sp : 0000003fe5aac750
[  132.298113]  gp : 0000002ad234ad28 tp : 0000003f83628e70 t0 : 0000000000a9919e
[  132.305363]  t1 : 0000003f838595fc t2 : 0000000000000000 s0 : 0000003fe5aac940
[  132.312580]  s1 : 0000000000000000 a0 : 0000003fe5aac760 a1 : 0000003fe5aac948
[  132.319796]  a2 : 0000000000000080 a3 : 0000000000000010 a4 : 0000000000000001
[  132.327013]  a5 : 0000003f839a1784 a6 : 0000000000000000 a7 : 0000000000000000
[  132.334229]  s2 : 0000000000000011 s3 : 000000000000000b s4 : 0000000000000006
[  132.341444]  s5 : 0000000000000002 s6 : 0000003f83a85918 s7 : 0000000000001000
[  132.348660]  s8 : 0000003fe5aad750 s9 : 0000003fe5aad9e0 s10: 0000002ad22fdd60
[  132.355877]  s11: 2f2e2d2c2b2a2928 t3 : 6f6e6d6c6b6a6968 t4 : 0000000000000068
[  132.363093]  t5 : 0000000052d19905 t6 : 0000000000d19905
[  132.368408] status: 0000000200004020 badaddr: 0000006c6b6a6968 cause: 000000000000000c
[  132.377123] systemd[1]: unhandled signal 11 code 0x1 at 0x0000006c6b6a6968
[  132.384078] CPU: 0 PID: 1 Comm: systemd Not tainted 6.1.0-11009-gf4e9a8cdc25b #167
[  132.391652] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[  132.398262] epc : 0000006c6b6a6968 ra : 0000003f838630b2 sp : 0000003fe5aac040
[  132.405479]  gp : 0000002ad234ad28 tp : 0000003f83628e70 t0 : 0000000000a9919e
[  132.412745]  t1 : 0000003f838595fc t2 : 0000000000000000 s0 : 0000003fe5aac230
[  132.419967]  s1 : 0000000000000000 a0 : 0000003fe5aac050 a1 : 0000003fe5aac238
[  132.427184]  a2 : 0000000000000080 a3 : 0000000000000010 a4 : 0000000000000001
[  132.434401]  a5 : 0000003f839a1784 a6 : 0000000000000000 a7 : 0000000000000000
[  132.441618]  s2 : 0000000000000011 s3 : 000000000000000b s4 : 0000000000000006
[  132.448833]  s5 : 0000000000000002 s6 : 0000003f83a85918 s7 : 0000000000001000
[  132.456049]  s8 : 0000003fe5aad750 s9 : 0000003fe5aad9e0 s10: 0000002ad22fdd60
[  132.463265]  s11: 2f2e2d2c2b2a2928 t3 : 6f6e6d6c6b6a6968 t4 : 0000000000000068
[  132.470480]  t5 : 0000000052d19905 t6 : 0000000000d19905
[  132.475804] status: 0000000200004020 badaddr: 0000006c6b6a6968 cause: 000000000000000c
[  132.496855] systemd[1]: unhandled signal 11 code 0x1 at 0x0000006c6b6a6968
[  132.503842] CPU: 0 PID: 1 Comm: systemd Not tainted 6.1.0-11009-gf4e9a8cdc25b #167
[  132.511415] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[  132.518027] epc : 0000006c6b6a6968 ra : 0000003f838630b2 sp : 0000003fe5aab930
[  132.525244]  gp : 0000002ad234ad28 tp : 0000003f83628e70 t0 : 0000000000a9919e
[  132.532462]  t1 : 0000003f838595fc t2 : 0000000000000000 s0 : 0000003fe5aabb20
[  132.539678]  s1 : 0000000000000000 a0 : 0000003fe5aab940 a1 : 0000003fe5aabb28
[  132.546939]  a2 : 0000000000000080 a3 : 0000000000000010 a4 : 0000000000000001
[  132.554161]  a5 : 0000003f839a1784 a6 : 0000000000000000 a7 : 0000000000000000
[  132.561378]  s2 : 0000000000000011 s3 : 000000000000000b s4 : 0000000000000006
[  132.568595]  s5 : 0000000000000002 s6 : 0000003f83a85918 s7 : 0000000000001000
[  132.575812]  s8 : 0000003fe5aad750 s9 : 0000003fe5aad9e0 s10: 0000002ad22fdd60
[  132.583029]  s11: 2f2e2d2c2b2a2928 t3 : 6f6e6d6c6b6a6968 t4 : 0000000000000068
[  132.590246]  t5 : 0000000052d19905 t6 : 0000000000d19905
[  132.595561] status: 0000000200004020 badaddr: 0000006c6b6a6968 cause: 000000000000000c
[  132.604448] systemd[1]: unhandled signal 11 code 0x1 at 0x0000006c6b6a6968
[  132.611424] CPU: 0 PID: 1 Comm: systemd Not tainted 6.1.0-11009-gf4e9a8cdc25b #167
[  132.618987] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[  132.625606] epc : 0000006c6b6a6968 ra : 0000003f838630b2 sp : 0000003fe5aab220
[  132.632818]  gp : 0000002ad234ad28 tp : 0000003f83628e70 t0 : 0000000000a9919e
[  132.640035]  t1 : 0000003f838595fc t2 : 0000000000000000 s0 : 0000003fe5aab410
[  132.647252]  s1 : 0000000000000000 a0 : 0000003fe5aab230 a1 : 0000003fe5aab418
[  132.654467]  a2 : 0000000000000080 a3 : 0000000000000010 a4 : 0000000000000001
[  132.661682]  a5 : 0000003f839a1784 a6 : 0000000000000000 a7 : 0000000000000000
[  132.668898]  s2 : 0000000000000011 s3 : 000000000000000b s4 : 0000000000000006
[  132.676113]  s5 : 0000000000000002 s6 : 0000003f83a85918 s7 : 0000000000001000
[  132.683329]  s8 : 0000003fe5aad750 s9 : 0000003fe5aad9e0 s10: 0000002ad22fdd60
[  132.690556]  s11: 2f2e2d2c2b2a2928 t3 : 6f6e6d6c6b6a6968 t4 : 0000000000000068
[  132.697773]  t5 : 0000000052d19905 t6 : 0000000000d19905
[  132.703086] status: 0000000200004020 badaddr: 0000006c6b6a6968 cause: 000000000000000c
[  132.993558] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[  133.001210] CPU: 0 PID: 1 Comm: systemd Not tainted 6.1.0-11009-gf4e9a8cdc25b #167
[  133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[  133.015338] Call Trace:
[  133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
[  133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
[  133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
[  133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
[  133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
[  133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
[  133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
[  133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
[  133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
[  133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
[  133.070249] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

If I revert this patch [0], bonnie++ works as expected.

Any pointers on what could be the issue here?

[0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641

Cheers,
Prabhakar

> diff --git a/arch/riscv/include/asm/mmu.h b/arch/riscv/include/asm/mmu.h
> index cedcf8ea3c76..86670a1b4ffd 100644
> --- a/arch/riscv/include/asm/mmu.h
> +++ b/arch/riscv/include/asm/mmu.h
> @@ -20,6 +20,8 @@ typedef struct {
>  #ifdef CONFIG_SMP
>         /* A local icache flush is needed before user execution can resume. */
>         cpumask_t icache_stale_mask;
> +       /* A local tlb flush is needed before user execution can resume. */
> +       cpumask_t tlb_stale_mask;
>  #endif
>  } mm_context_t;
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 7ec936910a96..330f75fe1278 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -415,7 +415,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>          * Relying on flush_tlb_fix_spurious_fault would suffice, but
>          * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
>          */
> -       local_flush_tlb_page(address);
> +       flush_tlb_page(vma, address);
>  }
>
>  static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> index 801019381dea..907b9efd39a8 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -22,6 +22,24 @@ static inline void local_flush_tlb_page(unsigned long addr)
>  {
>         ALT_FLUSH_TLB_PAGE(__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory"));
>  }
> +
> +static inline void local_flush_tlb_all_asid(unsigned long asid)
> +{
> +       __asm__ __volatile__ ("sfence.vma x0, %0"
> +                       :
> +                       : "r" (asid)
> +                       : "memory");
> +}
> +
> +static inline void local_flush_tlb_page_asid(unsigned long addr,
> +               unsigned long asid)
> +{
> +       __asm__ __volatile__ ("sfence.vma %0, %1"
> +                       :
> +                       : "r" (addr), "r" (asid)
> +                       : "memory");
> +}
> +
>  #else /* CONFIG_MMU */
>  #define local_flush_tlb_all()                  do { } while (0)
>  #define local_flush_tlb_page(addr)             do { } while (0)
> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> index 7acbfbd14557..80ce9caba8d2 100644
> --- a/arch/riscv/mm/context.c
> +++ b/arch/riscv/mm/context.c
> @@ -196,6 +196,16 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
>
>         if (need_flush_tlb)
>                 local_flush_tlb_all();
> +#ifdef CONFIG_SMP
> +       else {
> +               cpumask_t *mask = &mm->context.tlb_stale_mask;
> +
> +               if (cpumask_test_cpu(cpu, mask)) {
> +                       cpumask_clear_cpu(cpu, mask);
> +                       local_flush_tlb_all_asid(cntx & asid_mask);
> +               }
> +       }
> +#endif
>  }
>
>  static void set_mm_noasid(struct mm_struct *mm)
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 37ed760d007c..ce7dfc81bb3f 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -5,23 +5,7 @@
>  #include <linux/sched.h>
>  #include <asm/sbi.h>
>  #include <asm/mmu_context.h>
> -
> -static inline void local_flush_tlb_all_asid(unsigned long asid)
> -{
> -       __asm__ __volatile__ ("sfence.vma x0, %0"
> -                       :
> -                       : "r" (asid)
> -                       : "memory");
> -}
> -
> -static inline void local_flush_tlb_page_asid(unsigned long addr,
> -               unsigned long asid)
> -{
> -       __asm__ __volatile__ ("sfence.vma %0, %1"
> -                       :
> -                       : "r" (addr), "r" (asid)
> -                       : "memory");
> -}
> +#include <asm/tlbflush.h>
>
>  void flush_tlb_all(void)
>  {
> @@ -31,6 +15,7 @@ void flush_tlb_all(void)
>  static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,
>                                   unsigned long size, unsigned long stride)
>  {
> +       struct cpumask *pmask = &mm->context.tlb_stale_mask;
>         struct cpumask *cmask = mm_cpumask(mm);
>         unsigned int cpuid;
>         bool broadcast;
> @@ -44,6 +29,15 @@ static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,
>         if (static_branch_unlikely(&use_asid_allocator)) {
>                 unsigned long asid = atomic_long_read(&mm->context.id);
>
> +               /*
> +                * TLB will be immediately flushed on harts concurrently
> +                * executing this MM context. TLB flush on other harts
> +                * is deferred until this MM context migrates there.
> +                */
> +               cpumask_setall(pmask);
> +               cpumask_clear_cpu(cpuid, pmask);
> +               cpumask_andnot(pmask, pmask, cmask);
> +
>                 if (broadcast) {
>                         sbi_remote_sfence_vma_asid(cmask, start, size, asid);
>                 } else if (size <= stride) {
> --
> 2.37.1
>
Sergey Matyukevich Dec. 22, 2022, 7:54 p.m. UTC | #2
Hi Prabhakar,

> Hi Sergey,
> 
> On Mon, Aug 29, 2022 at 9:53 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
> >
> > From: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> >
> > Current implementation of update_mmu_cache function performs local TLB
> > flush. It does not take into account ASID information. Besides, it does
> > not take into account other harts currently running the same mm context
> > or possible migration of the running context to other harts. Meanwhile
> > TLB flush is not performed for every context switch if ASID support
> > is enabled.
> >
> > Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> > flushing local TLB entirely. This patch takes into account other
> > harts currently running the same mm context as well as possible
> > migration of this context to other harts.
> >
> > For this purpose the approach from flush_icache_mm is reused. Remote
> > harts currently running the same mm context are informed via SBI calls
> > that they need to flush their local TLBs. All the other harts are marked
> > as needing a deferred TLB flush when this mm context runs on them.
> >
> > [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
> >
> > Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > ---
> >  arch/riscv/include/asm/mmu.h      |  2 ++
> >  arch/riscv/include/asm/pgtable.h  |  2 +-
> >  arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
> >  arch/riscv/mm/context.c           | 10 ++++++++++
> >  arch/riscv/mm/tlbflush.c          | 28 +++++++++++-----------------
> >  5 files changed, 42 insertions(+), 18 deletions(-)
> >
> I couldn't find your latest patch in my mailbox so I'm replying to this one.
> 
> I merged Palmer's for-next branch and when running tests on eMMC with
> bonnie++ on the Renesas RZ/Five SoC I am seeing the below issues:
> 
> root@smarc-rzfive:/lava-testing# ./emmc_t_002.sh
> 
> Welcome to fdisk (util-linux 2.35.1).
> Changes will remain in memory only, until you decide to write them.
> Be careful before using the write command.
> 
> [   40.809677]  mmcblk0: p1
> 
> Command (m for help): Created a new DOS disklabel with disk identifier
> 0xf4682ae9.
> 
> Command (m for help): Partition type
>    p   primary (0 primary, 0 extended, 4 free)
>    e   extended (container for logical partitions)
> Select (default p): Partition number (1-4, default 1): First sector
> (2048-124321791, default 2048): Last sector, +/-sectors or
> +/-size{K,M,G,T,P} (2048-124321791, default 124321791):
> Created a new partition 1 of type 'Linux' and of size 59.3 GiB.
> Partition #1 contains a ext4 signature.
> 
> Command (m for help):
> The partition table has been altered.
> Calling ioctl() to re-read partition table.
> [   40.945583]  mmcblk0: p1
> Syncing disks.
> 
> mke2fs 1.45.7 (28-Jan-2021)
> /dev/mmcblk0p1 contains a ext4 file system
>         last mounted on /tmp/tmp.PDgTkhohqt/mnt on Fri Dec 16 19:48:34 2022
> Discarding device blocks: done
> Creating filesystem with 15539968 4k blocks and 3891200 inodes
> Filesystem UUID: 6effbf47-2d7a-4eb8-b2dc-1333b848e449
> Superblock backups stored on blocks:
>         32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
>         4096000, 7962624, 11239424
> 
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (65536 blocks): done
> Writing superblocks and filesystem accounting information: done
> 
> e2fsck 1.45.7 (28-Jan-2021)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/mmcblk0p1: 11/3891200 files (0.0% non-contiguous), 323121/15539968 blocks
> [   91.521828] EXT4-fs (mmcblk0p1): mounted filesystem
> 6effbf47-2d7a-4eb8-b2dc-1333b848e449 with ordered data mode. Quota
> mode: disabled.
> Using uid:0, gid:0.
> Writing with putc()...[  131.775220] do_trap: 3 callbacks suppressed
> [  131.775245] sd-resolve[128]: unhandled signal 11 code 0x1 at
> 0x0000000000000060 in libpthread-2.28.so[3fa6d80000+13000]
> [  131.790382] CPU: 0 PID: 128 Comm: sd-resolve Not tainted
> 6.1.0-11009-gf4e9a8cdc25b #167
> [  131.798386] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> [  131.804999] epc : 0000003fa6d8eeac ra : 0000003fa6f4a76c sp :
> 0000003fa6b8c330
> [  131.812214]  gp : 0000002aacc1cb88 tp : 0000003fa6b92810 t0 :
> 0000000000000022
> [  131.819432]  t1 : 0000003fa6e7f0ec t2 : 0000003fa6b8b290 s0 :
> 0000003fa6b8c850
> [  131.826669]  s1 : 0000002aacc1f430 a0 : 000000000000000a a1 :
> 0000003fa6b8c3b8
> [  131.833891]  a2 : 0000000000004000 a3 : 0000000000000000 a4 :
> 0000000000000020
> [  131.841110]  a5 : 0000000000000002 a6 : 0000003fa6b8c360 a7 :
> 0000000000000007
> [  131.848328]  s2 : ffffffffffffb000 s3 : ffffffffffffd3d0 s4 :
> 0000003fa6fe0918
> [  131.855561]  s5 : 0000003fa6b8ec20 s6 : 0000003fa6b8c440 s7 :
> fffffffffffffffd
> [  131.862783]  s8 : 0000003fa6b8c420 s9 : 000000000000000a s10:
> 0000000000000000
> [  131.870001]  s11: 0000003fa6fe2090 t3 : 0000003fa6d8eeaa t4 :
> 00000009a331f45c
> [  131.877219]  t5 : 000000000000003f t6 : 0000000000000000
> [  131.882548] status: 8000000200006020 badaddr: 0000000000000060
> cause: 000000000000000d
> [  131.891349] systemd-journal[87]: unhandled signal 11 code 0x1 at
> 0x00000000000000c8 in systemd-journald[2abd710000+1b000]
> [  131.902382] CPU: 0 PID: 87 Comm: systemd-journal Not tainted
> 6.1.0-11009-gf4e9a8cdc25b #167
> [  131.910731] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> [  131.917359] epc : 0000002abd7167e0 ra : 0000002abd7179e4 sp :
> 0000003fd65416c0
> [  131.924578]  gp : 0000002abd72e120 tp : 0000003fbea1f720 t0 :
> 3534616138333466
> [  131.931796]  t1 : ffffffffffffe000 t2 : 000000000000000d s0 :
> 0000003fd65416c0
> [  131.939014]  s1 : 0000003fd65437a8 a0 : 0000000000000009 a1 :
> 0000003fd65416c0
> [  131.946232]  a2 : 0000000000002000 a3 : 0000003fd65437a8 a4 :
> 0000003fd65436c8
> [  131.953450]  a5 : 0000003fd65436d0 a6 : 0000000000000083 a7 :
> 0000000000000018
> [  131.960668]  s2 : 0000003fbee6c918 s3 : 0000003fd6543708 s4 :
> 0000003fd6543700
> [  131.967885]  s5 : 0000002abd724718 s6 : 0000002abd7247d8 s7 :
> 0000000000000000
> [  131.975102]  s8 : ffffffffffffffff s9 : 0000002ad2397120 s10:
> 0000000000000000
> [  131.982319]  s11: 0000003fe5aad418 t3 : 0000003fbed75364 t4 :
> 00000009a7934adc
> [  131.989564]  t5 : 00000000001ea8b0 t6 : 3463396363613637
> [  131.994883] status: 0000000200004020 badaddr: 00000000000000c8
> cause: 000000000000000d
> [  132.003911] audit: type=1701 audit(1671220069.615:11):
> auid=4294967295 uid=0 gid=0 ses=4294967295 pid=87
> comm="systemd-journal" exe="/lib/systemd/systemd-journald" sig=11
> res=1
> [  132.024142] systemd[1]: unhandled signal 11 code 0x1 at
> 0xffffffac2b2a2928 in ld-2.28.so[3f83a6e000+17000]
> [  132.033946] CPU: 0 PID: 1 Comm: systemd Not tainted
> 6.1.0-11009-gf4e9a8cdc25b #167
> [  132.041563] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> [  132.048198] epc : 0000003f83a7a81c ra : 0000003f83a7a992 sp :
> 0000003fe5aad570
> [  132.055419]  gp : 0000002ad234ad28 tp : 0000003f83628e70 t0 :
> 0000000000a9919e
> [  132.062635]  t1 : 0000003f8391f7dc t2 : 0000000000000000 s0 :
> 0000003f83a664d0
> [  132.069852]  s1 : 0000003f83a85918 a0 : 0000000000000001 a1 :
> 0000003f83a20940
> [  132.077068]  a2 : 0000003f83629680 a3 : 0000000000000073 a4 :
> 0000000000000001
> [  132.084284]  a5 : 0000003f83a87090 a6 : 000000000000002f a7 :
> 0000000000062164
> [  132.091501]  s2 : 0000002ad22cdce0 s3 : 0000003f83a85918 s4 :
> 0000000000000006
> [  132.098722]  s5 : 0000000000000002 s6 : 0000003f83a85918 s7 :
> 0000000000001000
> [  132.105939]  s8 : 0000003fe5aad750 s9 : 0000003fe5aad9e0 s10:
> 0000002ad22fdd60
> [  132.113155]  s11: 2f2e2d2c2b2a2928 t3 : 0000003f83a7a9cc t4 :
> 0000000000000068
> [  132.120371]  t5 : 0000000052d19905 t6 : 0000000000d19905
> [  132.125686] status: 0000000200004020 badaddr: ffffffac2b2a2928
> cause: 000000000000000d
> [  132.145321] audit: type=1701 audit(1671220069.747:12):
> auid=4294967295 uid=995 gid=994 ses=4294967295 pid=126
> comm="sd-resolve" exe="/lib/systemd/systemd-timesyncd" sig=11 res=1
> [  132.161689] systemd[1]: unhandled signal 11 code 0x1 at 0x0000006c6b6a6968
> [  132.168714] CPU: 0 PID: 1 Comm: systemd Not tainted
> 6.1.0-11009-gf4e9a8cdc25b #167
> [  132.176293] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> [  132.182906] epc : 0000006c6b6a6968 ra : 0000003f838630b2 sp :
> 0000003fe5aace60
> [  132.190125]  gp : 0000002ad234ad28 tp : 0000003f83628e70 t0 :
> 0000000000a9919e
> [  132.197357]  t1 : 0000003f838595fc t2 : 0000000000000000 s0 :
> 0000003fe5aad050
> [  132.204574]  s1 : 0000000000000000 a0 : 0000003fe5aace70 a1 :
> 0000003fe5aad058
> [  132.211791]  a2 : 0000000000000080 a3 : 0000000000000010 a4 :
> 0000000000000001
> [  132.219007]  a5 : 0000003f839a1784 a6 : 0000000000000000 a7 :
> 0000000000000000
> [  132.226223]  s2 : 0000000000000011 s3 : 000000000000000b s4 :
> 0000000000000006
> [  132.233439]  s5 : 0000000000000002 s6 : 0000003f83a85918 s7 :
> 0000000000001000
> [  132.240655]  s8 : 0000003fe5aad750 s9 : 0000003fe5aad9e0 s10:
> 0000002ad22fdd60
> [  132.247872]  s11: 2f2e2d2c2b2a2928 t3 : 6f6e6d6c6b6a6968 t4 :
> 0000000000000068
> [  132.255088]  t5 : 0000000052d19905 t6 : 0000000000d19905
> [  132.260403] status: 0000000200004020 badaddr: 0000006c6b6a6968
> cause: 000000000000000c
> [  132.269759] systemd[1]: unhandled signal 11 code 0x1 at 0x0000006c6b6a6968
> [  132.276708] CPU: 0 PID: 1 Comm: systemd Not tainted
> 6.1.0-11009-gf4e9a8cdc25b #167
> [  132.284283] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> [  132.290895] epc : 0000006c6b6a6968 ra : 0000003f838630b2 sp :
> 0000003fe5aac750
> [  132.298113]  gp : 0000002ad234ad28 tp : 0000003f83628e70 t0 :
> 0000000000a9919e
> [  132.305363]  t1 : 0000003f838595fc t2 : 0000000000000000 s0 :
> 0000003fe5aac940
> [  132.312580]  s1 : 0000000000000000 a0 : 0000003fe5aac760 a1 :
> 0000003fe5aac948
> [  132.319796]  a2 : 0000000000000080 a3 : 0000000000000010 a4 :
> 0000000000000001
> [  132.327013]  a5 : 0000003f839a1784 a6 : 0000000000000000 a7 :
> 0000000000000000
> [  132.334229]  s2 : 0000000000000011 s3 : 000000000000000b s4 :
> 0000000000000006
> [  132.341444]  s5 : 0000000000000002 s6 : 0000003f83a85918 s7 :
> 0000000000001000
> [  132.348660]  s8 : 0000003fe5aad750 s9 : 0000003fe5aad9e0 s10:
> 0000002ad22fdd60
> [  132.355877]  s11: 2f2e2d2c2b2a2928 t3 : 6f6e6d6c6b6a6968 t4 :
> 0000000000000068
> [  132.363093]  t5 : 0000000052d19905 t6 : 0000000000d19905
> [  132.368408] status: 0000000200004020 badaddr: 0000006c6b6a6968
> cause: 000000000000000c
> [  132.377123] systemd[1]: unhandled signal 11 code 0x1 at 0x0000006c6b6a6968
> [  132.384078] CPU: 0 PID: 1 Comm: systemd Not tainted
> 6.1.0-11009-gf4e9a8cdc25b #167
> [  132.391652] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> [  132.398262] epc : 0000006c6b6a6968 ra : 0000003f838630b2 sp :
> 0000003fe5aac040
> [  132.405479]  gp : 0000002ad234ad28 tp : 0000003f83628e70 t0 :
> 0000000000a9919e
> [  132.412745]  t1 : 0000003f838595fc t2 : 0000000000000000 s0 :
> 0000003fe5aac230
> [  132.419967]  s1 : 0000000000000000 a0 : 0000003fe5aac050 a1 :
> 0000003fe5aac238
> [  132.427184]  a2 : 0000000000000080 a3 : 0000000000000010 a4 :
> 0000000000000001
> [  132.434401]  a5 : 0000003f839a1784 a6 : 0000000000000000 a7 :
> 0000000000000000
> [  132.441618]  s2 : 0000000000000011 s3 : 000000000000000b s4 :
> 0000000000000006
> [  132.448833]  s5 : 0000000000000002 s6 : 0000003f83a85918 s7 :
> 0000000000001000
> [  132.456049]  s8 : 0000003fe5aad750 s9 : 0000003fe5aad9e0 s10:
> 0000002ad22fdd60
> [  132.463265]  s11: 2f2e2d2c2b2a2928 t3 : 6f6e6d6c6b6a6968 t4 :
> 0000000000000068
> [  132.470480]  t5 : 0000000052d19905 t6 : 0000000000d19905
> [  132.475804] status: 0000000200004020 badaddr: 0000006c6b6a6968
> cause: 000000000000000c
> [  132.496855] systemd[1]: unhandled signal 11 code 0x1 at 0x0000006c6b6a6968
> [  132.503842] CPU: 0 PID: 1 Comm: systemd Not tainted
> 6.1.0-11009-gf4e9a8cdc25b #167
> [  132.511415] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> [  132.518027] epc : 0000006c6b6a6968 ra : 0000003f838630b2 sp :
> 0000003fe5aab930
> [  132.525244]  gp : 0000002ad234ad28 tp : 0000003f83628e70 t0 :
> 0000000000a9919e
> [  132.532462]  t1 : 0000003f838595fc t2 : 0000000000000000 s0 :
> 0000003fe5aabb20
> [  132.539678]  s1 : 0000000000000000 a0 : 0000003fe5aab940 a1 :
> 0000003fe5aabb28
> [  132.546939]  a2 : 0000000000000080 a3 : 0000000000000010 a4 :
> 0000000000000001
> [  132.554161]  a5 : 0000003f839a1784 a6 : 0000000000000000 a7 :
> 0000000000000000
> [  132.561378]  s2 : 0000000000000011 s3 : 000000000000000b s4 :
> 0000000000000006
> [  132.568595]  s5 : 0000000000000002 s6 : 0000003f83a85918 s7 :
> 0000000000001000
> [  132.575812]  s8 : 0000003fe5aad750 s9 : 0000003fe5aad9e0 s10:
> 0000002ad22fdd60
> [  132.583029]  s11: 2f2e2d2c2b2a2928 t3 : 6f6e6d6c6b6a6968 t4 :
> 0000000000000068
> [  132.590246]  t5 : 0000000052d19905 t6 : 0000000000d19905
> [  132.595561] status: 0000000200004020 badaddr: 0000006c6b6a6968
> cause: 000000000000000c
> [  132.604448] systemd[1]: unhandled signal 11 code 0x1 at 0x0000006c6b6a6968
> [  132.611424] CPU: 0 PID: 1 Comm: systemd Not tainted
> 6.1.0-11009-gf4e9a8cdc25b #167
> [  132.618987] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> [  132.625606] epc : 0000006c6b6a6968 ra : 0000003f838630b2 sp :
> 0000003fe5aab220
> [  132.632818]  gp : 0000002ad234ad28 tp : 0000003f83628e70 t0 :
> 0000000000a9919e
> [  132.640035]  t1 : 0000003f838595fc t2 : 0000000000000000 s0 :
> 0000003fe5aab410
> [  132.647252]  s1 : 0000000000000000 a0 : 0000003fe5aab230 a1 :
> 0000003fe5aab418
> [  132.654467]  a2 : 0000000000000080 a3 : 0000000000000010 a4 :
> 0000000000000001
> [  132.661682]  a5 : 0000003f839a1784 a6 : 0000000000000000 a7 :
> 0000000000000000
> [  132.668898]  s2 : 0000000000000011 s3 : 000000000000000b s4 :
> 0000000000000006
> [  132.676113]  s5 : 0000000000000002 s6 : 0000003f83a85918 s7 :
> 0000000000001000
> [  132.683329]  s8 : 0000003fe5aad750 s9 : 0000003fe5aad9e0 s10:
> 0000002ad22fdd60
> [  132.690556]  s11: 2f2e2d2c2b2a2928 t3 : 6f6e6d6c6b6a6968 t4 :
> 0000000000000068
> [  132.697773]  t5 : 0000000052d19905 t6 : 0000000000d19905
> [  132.703086] status: 0000000200004020 badaddr: 0000006c6b6a6968
> cause: 000000000000000c
> [  132.993558] Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b
> [  133.001210] CPU: 0 PID: 1 Comm: systemd Not tainted
> 6.1.0-11009-gf4e9a8cdc25b #167
> [  133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> [  133.015338] Call Trace:
> [  133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
> [  133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
> [  133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
> [  133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
> [  133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
> [  133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
> [  133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
> [  133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
> [  133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
> [  133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
> [  133.070249] ---[ end Kernel panic - not syncing: Attempted to kill
> init! exitcode=0x0000000b ]---
> 
> If I revert this patch [0] bonnie++ works as expected.
> 
> Any pointers on what could be the issue here?
> 
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
> 
> Cheers,
> Prabhakar

Good catch. Thanks for reporting! Discussion around the issue and
possible ways to fix it can be found in the following email thread:

https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/

Could you please apply the patch from Guo Ren instead of [0] and check
if you have any issues with your test? Besides, could you please share
your kernel configuration and the actual bonnie++ params from the emmc_t_002.sh script?

Regards,
Sergey
Lad, Prabhakar Dec. 22, 2022, 9 p.m. UTC | #3
Hi Sergey,

On Thu, Dec 22, 2022 at 7:54 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
>
>
> Hi Prabhakar,
>
> > Hi Sergey,
> >
> > On Mon, Aug 29, 2022 at 9:53 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
> > >
> > > From: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > >
> > > Current implementation of update_mmu_cache function performs local TLB
> > > flush. It does not take into account ASID information. Besides, it does
> > > not take into account other harts currently running the same mm context
> > > or possible migration of the running context to other harts. Meanwhile
> > > TLB flush is not performed for every context switch if ASID support
> > > is enabled.
> > >
> > > Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> > > flushing local TLB entirely. This patch takes into account other
> > > harts currently running the same mm context as well as possible
> > > migration of this context to other harts.
> > >
> > > For this purpose the approach from flush_icache_mm is reused. Remote
> > > harts currently running the same mm context are informed via SBI calls
> > > that they need to flush their local TLBs. All the other harts are marked
> > > as needing a deferred TLB flush when this mm context runs on them.
> > >
> > > [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
> > >
> > > Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > ---
> > >  arch/riscv/include/asm/mmu.h      |  2 ++
> > >  arch/riscv/include/asm/pgtable.h  |  2 +-
> > >  arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
> > >  arch/riscv/mm/context.c           | 10 ++++++++++
> > >  arch/riscv/mm/tlbflush.c          | 28 +++++++++++-----------------
> > >  5 files changed, 42 insertions(+), 18 deletions(-)
> > >
<snip>
> > [  133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > [  133.015338] Call Trace:
> > [  133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
> > [  133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
> > [  133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
> > [  133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
> > [  133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
> > [  133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
> > [  133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
> > [  133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
> > [  133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
> > [  133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
> > [  133.070249] ---[ end Kernel panic - not syncing: Attempted to kill
> > init! exitcode=0x0000000b ]---
> >
> > If I revert this patch [0] bonnie++ works as expected.
> >
> > Any pointers on what could be the issue here?
> >
> > [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
> >
> > Cheers,
> > Prabhakar
>
> Good catch. Thanks for reporting ! Discussion around the issue and
> possible ways to fix it can be found in the following email thread:
>
> https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
>
> Could you please apply the patch from Guo Ren instead of [0] and check
> if you have any issues with your test ? Besides, could you please share
> your kernel configuration and the actual bonnie++ params from emmc_t_002.sh script ?
>
Thanks for the pointer, I'll undo my changes and test Guo's patch.

I have pasted the script here [0] and attached config.

[0] https://paste.debian.net/hidden/a7a769b5/


Cheers,
Prabhakar
Sergey Matyukevich Dec. 22, 2022, 9:14 p.m. UTC | #4
Hi Prabhakar,

> > > > From: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > >
> > > > Current implementation of update_mmu_cache function performs local TLB
> > > > flush. It does not take into account ASID information. Besides, it does
> > > > not take into account other harts currently running the same mm context
> > > > or possible migration of the running context to other harts. Meanwhile
> > > > TLB flush is not performed for every context switch if ASID support
> > > > is enabled.
> > > >
> > > > Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> > > > flushing local TLB entirely. This patch takes into account other
> > > > harts currently running the same mm context as well as possible
> > > > migration of this context to other harts.
> > > >
> > > > For this purpose the approach from flush_icache_mm is reused. Remote
> > > > harts currently running the same mm context are informed via SBI calls
> > > > that they need to flush their local TLBs. All the other harts are marked
> > > > as needing a deferred TLB flush when this mm context runs on them.
> > > >
> > > > [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
> > > >
> > > > Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > ---
> > > >  arch/riscv/include/asm/mmu.h      |  2 ++
> > > >  arch/riscv/include/asm/pgtable.h  |  2 +-
> > > >  arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
> > > >  arch/riscv/mm/context.c           | 10 ++++++++++
> > > >  arch/riscv/mm/tlbflush.c          | 28 +++++++++++-----------------
> > > >  5 files changed, 42 insertions(+), 18 deletions(-)
> > > >
> <snip>
> > > [  133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > > [  133.015338] Call Trace:
> > > [  133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
> > > [  133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
> > > [  133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
> > > [  133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
> > > [  133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
> > > [  133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
> > > [  133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
> > > [  133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
> > > [  133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
> > > [  133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
> > > [  133.070249] ---[ end Kernel panic - not syncing: Attempted to kill
> > > init! exitcode=0x0000000b ]---
> > >
> > > If I revert this patch [0] bonnie++ works as expected.
> > >
> > > Any pointers on what could be the issue here?
> > >
> > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
> > >
> > > Cheers,
> > > Prabhakar
> >
> > Good catch. Thanks for reporting ! Discussion around the issue and
> > possible ways to fix it can be found in the following email thread:
> >
> > https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
> >
> > Could you please apply the patch from Guo Ren instead of [0] and check
> > if you have any issues with your test ? Besides, could you please share
> > your kernel configuration and the actual bonnie++ params from emmc_t_002.sh script ?
> >
> Thanks for the pointer, I'll undo my changes and test Guo's patch.
> 
> I have pasted the script here [0] and attached config.
> 
> [0] https://paste.debian.net/hidden/a7a769b5/

Thanks for the script and config. Could you please also share the
following information:
- how many cores your system has
- does your system support ASID


Regards,
Sergey
Lad, Prabhakar Dec. 22, 2022, 9:26 p.m. UTC | #5
Hi Sergey,

On Thu, Dec 22, 2022 at 9:14 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
>
> Hi Prabhakar,
>
> > > > > From: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > >
> > > > > Current implementation of update_mmu_cache function performs local TLB
> > > > > flush. It does not take into account ASID information. Besides, it does
> > > > > not take into account other harts currently running the same mm context
> > > > > or possible migration of the running context to other harts. Meanwhile
> > > > > TLB flush is not performed for every context switch if ASID support
> > > > > is enabled.
> > > > >
> > > > > Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> > > > > flushing local TLB entirely. This patch takes into account other
> > > > > harts currently running the same mm context as well as possible
> > > > > migration of this context to other harts.
> > > > >
> > > > > For this purpose the approach from flush_icache_mm is reused. Remote
> > > > > harts currently running the same mm context are informed via SBI calls
> > > > > that they need to flush their local TLBs. All the other harts are marked
> > > > > as needing a deferred TLB flush when this mm context runs on them.
> > > > >
> > > > > [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
> > > > >
> > > > > Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > ---
> > > > >  arch/riscv/include/asm/mmu.h      |  2 ++
> > > > >  arch/riscv/include/asm/pgtable.h  |  2 +-
> > > > >  arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
> > > > >  arch/riscv/mm/context.c           | 10 ++++++++++
> > > > >  arch/riscv/mm/tlbflush.c          | 28 +++++++++++-----------------
> > > > >  5 files changed, 42 insertions(+), 18 deletions(-)
> > > > >
> > <snip>
> > > > [  133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > > > [  133.015338] Call Trace:
> > > > [  133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
> > > > [  133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
> > > > [  133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
> > > > [  133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
> > > > [  133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
> > > > [  133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
> > > > [  133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
> > > > [  133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
> > > > [  133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
> > > > [  133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
> > > > [  133.070249] ---[ end Kernel panic - not syncing: Attempted to kill
> > > > init! exitcode=0x0000000b ]---
> > > >
> > > > If I revert this patch [0] bonnie++ works as expected.
> > > >
> > > > Any pointers on what could be the issue here?
> > > >
> > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
> > > >
> > > > Cheers,
> > > > Prabhakar
> > >
> > > Good catch. Thanks for reporting ! Discussion around the issue and
> > > possible ways to fix it can be found in the following email thread:
> > >
> > > https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
> > >
> > > Could you please apply the patch from Guo Ren instead of [0] and check
> > > if you have any issues with your test ? Besides, could you please share
> > > your kernel configuration and the actual bonnie++ params from emmc_t_002.sh script ?
> > >
> > Thanks for the pointer, I'll undo my changes and test Guo's patch.
> >
> > I have pasted the script here [0] and attached config.
> >
> > [0] https://paste.debian.net/hidden/a7a769b5/
>
> Thanks for the script and config. Could you please also share the
> following information:
> - how many cores your system has
The Renesas RZ/Five SoC has a single Andes AX45MP core.

> - does your system support ASID
>
From a quick look at [0], it does support ASID, unless there is a way
to disable it.

[0] http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf

Cheers,
Prabhakar
Sergey Matyukevich Dec. 22, 2022, 10:20 p.m. UTC | #6
On Thu, Dec 22, 2022 at 09:26:07PM +0000, Lad, Prabhakar wrote:
> Hi Sergey,
> 
> On Thu, Dec 22, 2022 at 9:14 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
> >
> > Hi Prabhakar,
> >
> > > > > > From: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > >
> > > > > > Current implementation of update_mmu_cache function performs local TLB
> > > > > > flush. It does not take into account ASID information. Besides, it does
> > > > > > not take into account other harts currently running the same mm context
> > > > > > or possible migration of the running context to other harts. Meanwhile
> > > > > > TLB flush is not performed for every context switch if ASID support
> > > > > > is enabled.
> > > > > >
> > > > > > Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> > > > > > flushing local TLB entirely. This patch takes into account other
> > > > > > harts currently running the same mm context as well as possible
> > > > > > migration of this context to other harts.
> > > > > >
> > > > > > For this purpose the approach from flush_icache_mm is reused. Remote
> > > > > > harts currently running the same mm context are informed via SBI calls
> > > > > > that they need to flush their local TLBs. All the other harts are marked
> > > > > > as needing a deferred TLB flush when this mm context runs on them.
> > > > > >
> > > > > > [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
> > > > > >
> > > > > > Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > > ---
> > > > > >  arch/riscv/include/asm/mmu.h      |  2 ++
> > > > > >  arch/riscv/include/asm/pgtable.h  |  2 +-
> > > > > >  arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
> > > > > >  arch/riscv/mm/context.c           | 10 ++++++++++
> > > > > >  arch/riscv/mm/tlbflush.c          | 28 +++++++++++-----------------
> > > > > >  5 files changed, 42 insertions(+), 18 deletions(-)
> > > > > >
> > > <snip>
> > > > > [  133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > > > > [  133.015338] Call Trace:
> > > > > [  133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
> > > > > [  133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
> > > > > [  133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
> > > > > [  133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
> > > > > [  133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
> > > > > [  133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
> > > > > [  133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
> > > > > [  133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
> > > > > [  133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
> > > > > [  133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
> > > > > [  133.070249] ---[ end Kernel panic - not syncing: Attempted to kill
> > > > > init! exitcode=0x0000000b ]---
> > > > >
> > > > > If I revert this patch [0] bonnie++ works as expected.
> > > > >
> > > > > Any pointers on what could be the issue here?
> > > > >
> > > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
> > > > >
> > > > > Cheers,
> > > > > Prabhakar
> > > >
> > > > Good catch. Thanks for reporting ! Discussion around the issue and
> > > > possible ways to fix it can be found in the following email thread:
> > > >
> > > > https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
> > > >
> > > > Could you please apply the patch from Guo Ren instead of [0] and check
> > > > if you have any issues with your test ? Besides, could you please share
> > > > your kernel configuration and the actual bonnie++ params from emmc_t_002.sh script ?
> > > >
> > > Thanks for the pointer, I'll undo my changes and test Guo's patch.
> > >
> > > I have pasted the script here [0] and attached config.
> > >
> > > [0] https://paste.debian.net/hidden/a7a769b5/
> >
> > Thanks for the script and config. Could you please also share the
> > following information:
> > - how many cores your system has
> The Renesas RZ/Five SoC has a single Andes AX45MP core.
> 
> > - does your system support ASID
> >
> With a quick look at [0] It does support ASID, unless there is a way
> to disable it.
> 
> [0] http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf

So you have a single-core system, but your kernel configuration enables
CONFIG_SMP. The additional 'deferred TLB flush' logic is dropped at build
time if CONFIG_SMP is disabled. On the other hand, the system should not
fail that way even if SMP is enabled.

Let me double-check whether anything can go wrong when the cpumasks contain
only a single cpu. Another suspect is the change in update_mmu_cache: making
it ASID-specific (and thus more granular) was probably a bad idea.
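
To make the 'deferred TLB flush' idea concrete, here is a minimal userspace
model of it. This is only a sketch under stated assumptions, not the code from
the patch: struct mm_model, MODEL_SMP, local_flush() and the model_* helpers
are hypothetical names, a plain bitmask stands in for a cpumask, and the SBI
notification of harts currently running the mm is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_HARTS  4   /* hypothetical hart count for this model */
#define MODEL_SMP 1   /* stand-in for CONFIG_SMP; 0 compiles the deferred logic out */

/* Hypothetical per-mm state: one bit per hart whose TLB may hold stale entries. */
struct mm_model {
	uint32_t tlb_stale_mask;
};

/* Counts simulated local TLB flushes per hart (a real kernel would issue sfence.vma). */
static unsigned int local_flush_count[NR_HARTS];

static void local_flush(unsigned int hart)
{
	assert(hart < NR_HARTS);
	local_flush_count[hart]++;
}

/* Runs on the hart that changed a mapping: mark all other harts stale, flush locally. */
static void model_update_mmu_cache(struct mm_model *mm, unsigned int hart)
{
	assert(hart < NR_HARTS);
#if MODEL_SMP
	mm->tlb_stale_mask = ((1u << NR_HARTS) - 1) & ~(1u << hart);
#else
	(void)mm;
#endif
	local_flush(hart);
}

/* Runs when the mm is switched in on a hart: flush only if that hart was marked stale. */
static bool model_switch_in(struct mm_model *mm, unsigned int hart)
{
	assert(hart < NR_HARTS);
#if MODEL_SMP
	uint32_t bit = 1u << hart;

	if (mm->tlb_stale_mask & bit) {
		mm->tlb_stale_mask &= ~bit;   /* consume the deferred flush request */
		local_flush(hart);
		return true;
	}
#else
	(void)mm;
#endif
	return false;
}
```

With MODEL_SMP set to 0 the mask is never touched and model_switch_in() never
flushes, which is the build-time behaviour described above. With a single hart
and MODEL_SMP set to 1 the mask simply stays empty after every update.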

Regards,
Sergey
Lad, Prabhakar Dec. 23, 2022, 1:02 p.m. UTC | #7
Hi Sergey,

On Thu, Dec 22, 2022 at 10:20 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
>
> On Thu, Dec 22, 2022 at 09:26:07PM +0000, Lad, Prabhakar wrote:
> > Hi Sergey,
> >
> > On Thu, Dec 22, 2022 at 9:14 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
> > >
> > > Hi Prabhakar,
> > >
> > > > > > > From: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > > >
> > > > > > > Current implementation of update_mmu_cache function performs local TLB
> > > > > > > flush. It does not take into account ASID information. Besides, it does
> > > > > > > not take into account other harts currently running the same mm context
> > > > > > > or possible migration of the running context to other harts. Meanwhile
> > > > > > > TLB flush is not performed for every context switch if ASID support
> > > > > > > is enabled.
> > > > > > >
> > > > > > > Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> > > > > > > flushing local TLB entirely. This patch takes into account other
> > > > > > > harts currently running the same mm context as well as possible
> > > > > > > migration of this context to other harts.
> > > > > > >
> > > > > > > For this purpose the approach from flush_icache_mm is reused. Remote
> > > > > > > harts currently running the same mm context are informed via SBI calls
> > > > > > > that they need to flush their local TLBs. All the other harts are marked
> > > > > > > as needing a deferred TLB flush when this mm context runs on them.
> > > > > > >
> > > > > > > [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
> > > > > > >
> > > > > > > Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > > > ---
> > > > > > >  arch/riscv/include/asm/mmu.h      |  2 ++
> > > > > > >  arch/riscv/include/asm/pgtable.h  |  2 +-
> > > > > > >  arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
> > > > > > >  arch/riscv/mm/context.c           | 10 ++++++++++
> > > > > > >  arch/riscv/mm/tlbflush.c          | 28 +++++++++++-----------------
> > > > > > >  5 files changed, 42 insertions(+), 18 deletions(-)
> > > > > > >
> > > > <snip>
> > > > > > [  133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > > > > > [  133.015338] Call Trace:
> > > > > > [  133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
> > > > > > [  133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
> > > > > > [  133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
> > > > > > [  133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
> > > > > > [  133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
> > > > > > [  133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
> > > > > > [  133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
> > > > > > [  133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
> > > > > > [  133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
> > > > > > [  133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
> > > > > > [  133.070249] ---[ end Kernel panic - not syncing: Attempted to kill
> > > > > > init! exitcode=0x0000000b ]---
> > > > > >
> > > > > > If I revert this patch [0] bonnie++ works as expected.
> > > > > >
> > > > > > Any pointers on what could be the issue here?
> > > > > >
> > > > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
> > > > > >
> > > > > > Cheers,
> > > > > > Prabhakar
> > > > >
> > > > > Good catch. Thanks for reporting ! Discussion around the issue and
> > > > > possible ways to fix it can be found in the following email thread:
> > > > >
> > > > > https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
> > > > >
> > > > > Could you please apply the patch from Guo Ren instead of [0] and check
> > > > > if you have any issues with your test ? Besides, could you please share
> > > > > your kernel configuration and the actual bonnie++ params from emmc_t_002.sh script ?
> > > > >
> > > > Thanks for the pointer, I'll undo my changes and test Guo's patch.
> > > >
> > > > I have pasted the script here [0] and attached config.
> > > >
> > > > [0] https://paste.debian.net/hidden/a7a769b5/
> > >
> > > Thanks for the script and config. Could you please also share the
> > > following information:
> > > - how many cores your system has
> > The Renesas RZ/Five SoC has a single Andes AX45MP core.
> >
> > > - does your system support ASID
> > >
> > With a quick look at [0] It does support ASID, unless there is a way
> > to disable it.
> >
> > [0] http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
>
> So you have a single-core system, but your kernel configuration enables
> CONFIG_SMP. Additional 'deferred TLB flush' logic is dropped at build
> time if CONFIG_SMP is disabled. On the other hand, system should not
> fail that way even if SMP is enabled.
>
I enabled CONFIG_SMP while doing some testing of the PMA code, and indeed
enabling this config should not introduce a failure.

> Let me double-check if anything can go wrong if cpumasks may have only
> a single cpu. Another suspect is a change in update_mmu_cache: probably
> making it asid-specific (and thus more granular) was a bad idea.
>
Thanks.

BTW, I tested the patch [0] that you pointed out, and it fixes the
issues seen earlier.

[0] https://patchwork.kernel.org/project/linux-riscv/patch/20221111075902.798571-1-guoren@kernel.org/

Cheers,
Prabhakar
Sergey Matyukevich Dec. 23, 2022, 5:22 p.m. UTC | #8
Hi Prabhakar,

On Fri, Dec 23, 2022 at 01:02:10PM +0000, Lad, Prabhakar wrote:
> Hi Sergey,
> 
> On Thu, Dec 22, 2022 at 10:20 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
> >
> > On Thu, Dec 22, 2022 at 09:26:07PM +0000, Lad, Prabhakar wrote:
> > > Hi Sergey,
> > >
> > > On Thu, Dec 22, 2022 at 9:14 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
> > > >
> > > > Hi Prabhakar,
> > > >
> > > > > > > > From: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > > > >
> > > > > > > > Current implementation of update_mmu_cache function performs local TLB
> > > > > > > > flush. It does not take into account ASID information. Besides, it does
> > > > > > > > not take into account other harts currently running the same mm context
> > > > > > > > or possible migration of the running context to other harts. Meanwhile
> > > > > > > > TLB flush is not performed for every context switch if ASID support
> > > > > > > > is enabled.
> > > > > > > >
> > > > > > > > Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> > > > > > > > flushing local TLB entirely. This patch takes into account other
> > > > > > > > harts currently running the same mm context as well as possible
> > > > > > > > migration of this context to other harts.
> > > > > > > >
> > > > > > > > For this purpose the approach from flush_icache_mm is reused. Remote
> > > > > > > > harts currently running the same mm context are informed via SBI calls
> > > > > > > > that they need to flush their local TLBs. All the other harts are marked
> > > > > > > > as needing a deferred TLB flush when this mm context runs on them.
> > > > > > > >
> > > > > > > > [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
> > > > > > > >
> > > > > > > > Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > > > > ---
> > > > > > > >  arch/riscv/include/asm/mmu.h      |  2 ++
> > > > > > > >  arch/riscv/include/asm/pgtable.h  |  2 +-
> > > > > > > >  arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
> > > > > > > >  arch/riscv/mm/context.c           | 10 ++++++++++
> > > > > > > >  arch/riscv/mm/tlbflush.c          | 28 +++++++++++-----------------
> > > > > > > >  5 files changed, 42 insertions(+), 18 deletions(-)
> > > > > > > >
> > > > > <snip>
> > > > > > > [  133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > > > > > > [  133.015338] Call Trace:
> > > > > > > [  133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
> > > > > > > [  133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
> > > > > > > [  133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
> > > > > > > [  133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
> > > > > > > [  133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
> > > > > > > [  133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
> > > > > > > [  133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
> > > > > > > [  133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
> > > > > > > [  133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
> > > > > > > [  133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
> > > > > > > [  133.070249] ---[ end Kernel panic - not syncing: Attempted to kill
> > > > > > > init! exitcode=0x0000000b ]---
> > > > > > >
> > > > > > > If I revert this patch [0] bonnie++ works as expected.
> > > > > > >
> > > > > > > Any pointers on what could be the issue here?
> > > > > > >
> > > > > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Prabhakar
> > > > > >
> > > > > > Good catch. Thanks for reporting ! Discussion around the issue and
> > > > > > possible ways to fix it can be found in the following email thread:
> > > > > >
> > > > > > https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
> > > > > >
> > > > > > Could you please apply the patch from Guo Ren instead of [0] and check
> > > > > > if you have any issues with your test ? Besides, could you please share
> > > > > > your kernel configuration and the actual bonnie++ params from emmc_t_002.sh script ?
> > > > > >
> > > > > Thanks for the pointer, I'll undo my changes and test Guo's patch.
> > > > >
> > > > > I have pasted the script here [0] and attached config.
> > > > >
> > > > > [0] https://paste.debian.net/hidden/a7a769b5/
> > > >
> > > > Thanks for the script and config. Could you please also share the
> > > > following information:
> > > > - how many cores your system has
> > > The Renesas RZ/Five SoC has a single Andes AX45MP core.
> > >
> > > > - does your system support ASID
> > > >
> > > With a quick look at [0] It does support ASID, unless there is a way
> > > to disable it.
> > >
> > > [0] http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
> >
> > So you have a single-core system, but your kernel configuration enables
> > CONFIG_SMP. Additional 'deferred TLB flush' logic is dropped at build
> > time if CONFIG_SMP is disabled. On the other hand, system should not
> > fail that way even if SMP is enabled.
> >
> I enabled CONFIG_SMP while doing some testing of PMA code and indeed
> enabling this config should not introduce a failure.
> 
> > Let me double-check if anything can go wrong if cpumasks may have only
> > a single cpu. Another suspect is a change in update_mmu_cache: probably
> > making it asid-specific (and thus more granular) was a bad idea.
> >
> Thanks.
> 
> BTW I tested the patch [0] which you pointed out and that fixes the
> issues seen earlier.
> 
> [0] https://patchwork.kernel.org/project/linux-riscv/patch/20221111075902.798571-1-guoren@kernel.org/

All looks good with cpumasks in the single-core case, so the deferred
TLB flush logic is not even executed on your system. That means the
root cause should be the change in update_mmu_cache.

May I ask you to repeat the original emmc test on your platform from
for-next (i.e. with [0] and without [1]) with the following partial revert:

: diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
: index ec6fb83349ce..92ec2d9d7273 100644
: --- a/arch/riscv/include/asm/pgtable.h
: +++ b/arch/riscv/include/asm/pgtable.h
: @@ -415,7 +415,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
:  	 * Relying on flush_tlb_fix_spurious_fault would suffice, but
:  	 * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
:  	 */
: -	flush_tlb_page(vma, address);
: +	local_flush_tlb_page(address);
:  }
:  
:  static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,

[0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
[1] https://patchwork.kernel.org/project/linux-riscv/patch/20221111075902.798571-1-guoren@kernel.org/
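
For illustration only, here is a user-space toy model (not kernel code)
of why the ASID-specific variant is more granular. It assumes that
local_flush_tlb_page() corresponds to the "sfence.vma addr, x0" form,
which drops the page for every ASID, while the ASID-aware path
corresponds to "sfence.vma addr, asid", which drops it for one ASID
only. All names below (toy_tlb, toy_flush_page_*) are hypothetical, and
global mappings are ignored.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_TLB_SIZE 8

/* One cached translation, tagged with the ASID it was filled under. */
struct toy_tlb_entry {
	unsigned long vpn;	/* virtual page number */
	unsigned int asid;	/* address-space ID */
	bool valid;
};

struct toy_tlb {
	struct toy_tlb_entry e[TOY_TLB_SIZE];
};

/* Install a translation in a given slot (hypothetical helper). */
void toy_tlb_fill(struct toy_tlb *tlb, size_t slot,
		  unsigned long vpn, unsigned int asid)
{
	tlb->e[slot].vpn = vpn;
	tlb->e[slot].asid = asid;
	tlb->e[slot].valid = true;
}

/* Models "sfence.vma addr, x0": drop the page for every ASID. */
void toy_flush_page_all_asids(struct toy_tlb *tlb, unsigned long vpn)
{
	for (size_t i = 0; i < TOY_TLB_SIZE; i++)
		if (tlb->e[i].valid && tlb->e[i].vpn == vpn)
			tlb->e[i].valid = false;
}

/* Models "sfence.vma addr, asid": drop the page for one ASID only. */
void toy_flush_page_asid(struct toy_tlb *tlb, unsigned long vpn,
			 unsigned int asid)
{
	for (size_t i = 0; i < TOY_TLB_SIZE; i++)
		if (tlb->e[i].valid && tlb->e[i].vpn == vpn &&
		    tlb->e[i].asid == asid)
			tlb->e[i].valid = false;
}

/* Count the valid entries that still translate the given page. */
size_t toy_count_page(const struct toy_tlb *tlb, unsigned long vpn)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_TLB_SIZE; i++)
		if (tlb->e[i].valid && tlb->e[i].vpn == vpn)
			n++;
	return n;
}
```

In this model, an entry for the same page that was filled under a
different ASID survives the ASID-specific flush and is only removed by
the all-ASID variant. Whether that is what actually happens on the
AX45MP is not established here.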

Regards,
Sergey
Lad, Prabhakar Dec. 24, 2022, 8:46 a.m. UTC | #9
Hi Sergey,

On Fri, Dec 23, 2022 at 5:22 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
>
> Hi Prabhakar,
>
> On Fri, Dec 23, 2022 at 01:02:10PM +0000, Lad, Prabhakar wrote:
> > Hi Sergey,
> >
> > On Thu, Dec 22, 2022 at 10:20 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
> > >
> > > On Thu, Dec 22, 2022 at 09:26:07PM +0000, Lad, Prabhakar wrote:
> > > > Hi Sergey,
> > > >
> > > > On Thu, Dec 22, 2022 at 9:14 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
> > > > >
> > > > > Hi Prabhakar,
> > > > >
> > > > > > > > > From: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > > > > >
> > > > > > > > > Current implementation of update_mmu_cache function performs local TLB
> > > > > > > > > flush. It does not take into account ASID information. Besides, it does
> > > > > > > > > not take into account other harts currently running the same mm context
> > > > > > > > > or possible migration of the running context to other harts. Meanwhile
> > > > > > > > > TLB flush is not performed for every context switch if ASID support
> > > > > > > > > is enabled.
> > > > > > > > >
> > > > > > > > > Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> > > > > > > > > flushing local TLB entirely. This patch takes into account other
> > > > > > > > > harts currently running the same mm context as well as possible
> > > > > > > > > migration of this context to other harts.
> > > > > > > > >
> > > > > > > > > For this purpose the approach from flush_icache_mm is reused. Remote
> > > > > > > > > harts currently running the same mm context are informed via SBI calls
> > > > > > > > > that they need to flush their local TLBs. All the other harts are marked
> > > > > > > > > as needing a deferred TLB flush when this mm context runs on them.
> > > > > > > > >
> > > > > > > > > [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
> > > > > > > > >
> > > > > > > > > Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > > > > > ---
> > > > > > > > >  arch/riscv/include/asm/mmu.h      |  2 ++
> > > > > > > > >  arch/riscv/include/asm/pgtable.h  |  2 +-
> > > > > > > > >  arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
> > > > > > > > >  arch/riscv/mm/context.c           | 10 ++++++++++
> > > > > > > > >  arch/riscv/mm/tlbflush.c          | 28 +++++++++++-----------------
> > > > > > > > >  5 files changed, 42 insertions(+), 18 deletions(-)
> > > > > > > > >
> > > > > > <snip>
> > > > > > > > [  133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > > > > > > > [  133.015338] Call Trace:
> > > > > > > > [  133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
> > > > > > > > [  133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
> > > > > > > > [  133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
> > > > > > > > [  133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
> > > > > > > > [  133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
> > > > > > > > [  133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
> > > > > > > > [  133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
> > > > > > > > [  133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
> > > > > > > > [  133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
> > > > > > > > [  133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
> > > > > > > > [  133.070249] ---[ end Kernel panic - not syncing: Attempted to kill
> > > > > > > > init! exitcode=0x0000000b ]---
> > > > > > > >
> > > > > > > > If I revert this patch [0] bonnie++ works as expected.
> > > > > > > >
> > > > > > > > Any pointers on what could be the issue here?
> > > > > > > >
> > > > > > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Prabhakar
> > > > > > >
> > > > > > > Good catch. Thanks for reporting ! Discussion around the issue and
> > > > > > > possible ways to fix it can be found in the following email thread:
> > > > > > >
> > > > > > > https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
> > > > > > >
> > > > > > > Could you please apply the patch from Guo Ren instead of [0] and check
> > > > > > > if you have any issues with your test ? Besides, could you please share
> > > > > > > your kernel configuration and the actual bonnie++ params from emmc_t_002.sh script ?
> > > > > > >
> > > > > > Thanks for the pointer, I'll undo my changes and test Guo's patch.
> > > > > >
> > > > > > I have pasted the script here [0] and attached config.
> > > > > >
> > > > > > [0] https://paste.debian.net/hidden/a7a769b5/
> > > > >
> > > > > Thanks for the script and config. Could you please also share the
> > > > > following information:
> > > > > - how many cores your system has
> > > > The Renesas RZ/Five SoC has a single Andes AX45MP core.
> > > >
> > > > > - does your system support ASID
> > > > >
> > > > With a quick look at [0] It does support ASID, unless there is a way
> > > > to disable it.
> > > >
> > > > [0] http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
> > >
> > > So you have a single-core system, but your kernel configuration enables
> > > CONFIG_SMP. Additional 'deferred TLB flush' logic is dropped at build
> > > time if CONFIG_SMP is disabled. On the other hand, system should not
> > > fail that way even if SMP is enabled.
> > >
> > I enabled CONFIG_SMP while doing some testing of PMA code and indeed
> > enabling this config should not introduce a failure.
> >
> > > Let me double-check if anything can go wrong if cpumasks may have only
> > > a single cpu. Another suspect is a change in update_mmu_cache: probably
> > > making it asid-specific (and thus more granular) was a bad idea.
> > >
> > Thanks.
> >
> > BTW I tested the patch [0] which you pointed out and that fixes the
> > issues seen earlier.
> >
> > [0] https://patchwork.kernel.org/project/linux-riscv/patch/20221111075902.798571-1-guoren@kernel.org/
>
> All looks good with cpumasks in the single-core case. So deferred TLB
> flush logic is not even executed in your case. So the root cause
> should be in update_mmu_cache change.
>
> May I ask you to repeat the original emmc test on your platform from
> for-next (i.e. with [0] and without [1]) with the following partial revert:
>
> : diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> : index ec6fb83349ce..92ec2d9d7273 100644
> : --- a/arch/riscv/include/asm/pgtable.h
> : +++ b/arch/riscv/include/asm/pgtable.h
> : @@ -415,7 +415,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> :        * Relying on flush_tlb_fix_spurious_fault would suffice, but
> :        * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
> :        */
> : -     flush_tlb_page(vma, address);
> : +     local_flush_tlb_page(address);
> :  }
> :
> :  static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
>
I tested your above proposed changes and I am no longer seeing an
issue on my platform.

Cheers,
Prabhakar
Sergey Matyukevich Dec. 24, 2022, 11:48 a.m. UTC | #10
Hi Prabhakar,

> > > > > > > > > > Current implementation of update_mmu_cache function performs local TLB
> > > > > > > > > > flush. It does not take into account ASID information. Besides, it does
> > > > > > > > > > not take into account other harts currently running the same mm context
> > > > > > > > > > or possible migration of the running context to other harts. Meanwhile
> > > > > > > > > > TLB flush is not performed for every context switch if ASID support
> > > > > > > > > > is enabled.
> > > > > > > > > >
> > > > > > > > > > Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> > > > > > > > > > flushing local TLB entirely. This patch takes into account other
> > > > > > > > > > harts currently running the same mm context as well as possible
> > > > > > > > > > migration of this context to other harts.
> > > > > > > > > >
> > > > > > > > > > For this purpose the approach from flush_icache_mm is reused. Remote
> > > > > > > > > > harts currently running the same mm context are informed via SBI calls
> > > > > > > > > > that they need to flush their local TLBs. All the other harts are marked
> > > > > > > > > > as needing a deferred TLB flush when this mm context runs on them.
> > > > > > > > > >
> > > > > > > > > > [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > > > > > > ---
> > > > > > > > > >  arch/riscv/include/asm/mmu.h      |  2 ++
> > > > > > > > > >  arch/riscv/include/asm/pgtable.h  |  2 +-
> > > > > > > > > >  arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
> > > > > > > > > >  arch/riscv/mm/context.c           | 10 ++++++++++
> > > > > > > > > >  arch/riscv/mm/tlbflush.c          | 28 +++++++++++-----------------
> > > > > > > > > >  5 files changed, 42 insertions(+), 18 deletions(-)
> > > > > > > > > >
> > > > > > > <snip>
> > > > > > > > > [  133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > > > > > > > > [  133.015338] Call Trace:
> > > > > > > > > [  133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
> > > > > > > > > [  133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
> > > > > > > > > [  133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
> > > > > > > > > [  133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
> > > > > > > > > [  133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
> > > > > > > > > [  133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
> > > > > > > > > [  133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
> > > > > > > > > [  133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
> > > > > > > > > [  133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
> > > > > > > > > [  133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
> > > > > > > > > [  133.070249] ---[ end Kernel panic - not syncing: Attempted to kill
> > > > > > > > > init! exitcode=0x0000000b ]---
> > > > > > > > >
> > > > > > > > > If I revert this patch [0] bonnie++ works as expected.
> > > > > > > > >
> > > > > > > > > Any pointers on what could be the issue here?
> > > > > > > > >
> > > > > > > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Prabhakar
> > > > > > > >
> > > > > > > > Good catch. Thanks for reporting ! Discussion around the issue and
> > > > > > > > possible ways to fix it can be found in the following email thread:
> > > > > > > >
> > > > > > > > https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
> > > > > > > >
> > > > > > > > Could you please apply the patch from Guo Ren instead of [0] and check
> > > > > > > > if you have any issues with your test ? Besides, could you please share
> > > > > > > > your kernel configuration and the actual bonnie++ params from emmc_t_002.sh script ?
> > > > > > > >
> > > > > > > Thanks for the pointer, I'll undo my changes and test Guo's patch.
> > > > > > >
> > > > > > > I have pasted the script here [0] and attached config.
> > > > > > >
> > > > > > > [0] https://paste.debian.net/hidden/a7a769b5/
> > > > > >
> > > > > > Thanks for the script and config. Could you please also share the
> > > > > > following information:
> > > > > > - how many cores your system has
> > > > > The Renesas RZ/Five SoC has a single Andes AX45MP core.
> > > > >
> > > > > > - does your system support ASID
> > > > > >
> > > > > With a quick look at [0] It does support ASID, unless there is a way
> > > > > to disable it.
> > > > >
> > > > > [0] http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
> > > >
> > > > So you have a single-core system, but your kernel configuration enables
> > > > CONFIG_SMP. Additional 'deferred TLB flush' logic is dropped at build
> > > > time if CONFIG_SMP is disabled. On the other hand, system should not
> > > > fail that way even if SMP is enabled.
> > > >
> > > I enabled CONFIG_SMP while doing some testing of PMA code and indeed
> > > enabling this config should not introduce a failure.
> > >
> > > > Let me double-check if anything can go wrong if cpumasks may have only
> > > > a single cpu. Another suspect is a change in update_mmu_cache: probably
> > > > making it asid-specific (and thus more granular) was a bad idea.
> > > >
> > > Thanks.
> > >
> > > BTW I tested the patch [0] which you pointed out and that fixes the
> > > issues seen earlier.
> > >
> > > [0] https://patchwork.kernel.org/project/linux-riscv/patch/20221111075902.798571-1-guoren@kernel.org/
> >
> > All looks good with cpumasks in the single-core case. So deferred TLB
> > flush logic is not even executed in your case. So the root cause
> > should be in update_mmu_cache change.
> >
> > May I ask you to repeat the original emmc test on your platform from
> > for-next (i.e. with [0] and without [1]) with the following partial revert:
> >
> > : diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > : index ec6fb83349ce..92ec2d9d7273 100644
> > : --- a/arch/riscv/include/asm/pgtable.h
> > : +++ b/arch/riscv/include/asm/pgtable.h
> > : @@ -415,7 +415,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> > :        * Relying on flush_tlb_fix_spurious_fault would suffice, but
> > :        * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
> > :        */
> > : -     flush_tlb_page(vma, address);
> > : +     local_flush_tlb_page(address);
> > :  }
> > :
> > :  static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> >
> I tested your above proposed changes and I am no longer seeing an
> issue on my platform.

Great. I will send a fixup after I double-check on several other
hardware platforms. Thanks for testing ! 

Regards,
Sergey
Lad, Prabhakar Dec. 30, 2022, 4:15 p.m. UTC | #11
Hi Sergey,

On Sat, Dec 24, 2022 at 11:48 AM Sergey Matyukevich <geomatsi@gmail.com> wrote:
>
> Hi Prabhakar,
>
> > > > > > > > > > > Current implementation of update_mmu_cache function performs local TLB
> > > > > > > > > > > flush. It does not take into account ASID information. Besides, it does
> > > > > > > > > > > not take into account other harts currently running the same mm context
> > > > > > > > > > > or possible migration of the running context to other harts. Meanwhile
> > > > > > > > > > > TLB flush is not performed for every context switch if ASID support
> > > > > > > > > > > is enabled.
> > > > > > > > > > >
> > > > > > > > > > > Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> > > > > > > > > > > flushing local TLB entirely. This patch takes into account other
> > > > > > > > > > > harts currently running the same mm context as well as possible
> > > > > > > > > > > migration of this context to other harts.
> > > > > > > > > > >
> > > > > > > > > > > For this purpose the approach from flush_icache_mm is reused. Remote
> > > > > > > > > > > harts currently running the same mm context are informed via SBI calls
> > > > > > > > > > > that they need to flush their local TLBs. All the other harts are marked
> > > > > > > > > > > as needing a deferred TLB flush when this mm context runs on them.
> > > > > > > > > > >
> > > > > > > > > > > [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  arch/riscv/include/asm/mmu.h      |  2 ++
> > > > > > > > > > >  arch/riscv/include/asm/pgtable.h  |  2 +-
> > > > > > > > > > >  arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
> > > > > > > > > > >  arch/riscv/mm/context.c           | 10 ++++++++++
> > > > > > > > > > >  arch/riscv/mm/tlbflush.c          | 28 +++++++++++-----------------
> > > > > > > > > > >  5 files changed, 42 insertions(+), 18 deletions(-)
> > > > > > > > > > >
> > > > > > > > <snip>
> > > > > > > > > > [  133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > > > > > > > > > [  133.015338] Call Trace:
> > > > > > > > > > [  133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
> > > > > > > > > > [  133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
> > > > > > > > > > [  133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
> > > > > > > > > > [  133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
> > > > > > > > > > [  133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
> > > > > > > > > > [  133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
> > > > > > > > > > [  133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
> > > > > > > > > > [  133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
> > > > > > > > > > [  133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
> > > > > > > > > > [  133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
> > > > > > > > > > [  133.070249] ---[ end Kernel panic - not syncing: Attempted to kill
> > > > > > > > > > init! exitcode=0x0000000b ]---
> > > > > > > > > >
> > > > > > > > > > If I revert this patch [0] bonnie++ works as expected.
> > > > > > > > > >
> > > > > > > > > > Any pointers on what could be the issue here?
> > > > > > > > > >
> > > > > > > > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Prabhakar
> > > > > > > > >
> > > > > > > > > Good catch. Thanks for reporting ! Discussion around the issue and
> > > > > > > > > possible ways to fix it can be found in the following email thread:
> > > > > > > > >
> > > > > > > > > https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
> > > > > > > > >
> > > > > > > > > Could you please apply the patch from Guo Ren instead of [0] and check
> > > > > > > > > if you have any issues with your test ? Besides, could you please share
> > > > > > > > > your kernel configuration and the actual bonnie++ params from emmc_t_002.sh script ?
> > > > > > > > >
> > > > > > > > Thanks for the pointer, I'll undo my changes and test Guo's patch.
> > > > > > > >
> > > > > > > > I have pasted the script here [0] and attached config.
> > > > > > > >
> > > > > > > > [0] https://paste.debian.net/hidden/a7a769b5/
> > > > > > >
> > > > > > > Thanks for the script and config. Could you please also share the
> > > > > > > following information:
> > > > > > > - how many cores your system has
> > > > > > The Renesas RZ/Five SoC has a single Andes AX45MP core.
> > > > > >
> > > > > > > - does your system support ASID
> > > > > > >
> > > > > > With a quick look at [0] It does support ASID, unless there is a way
> > > > > > to disable it.
> > > > > >
> > > > > > [0] http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
> > > > >
> > > > > So you have a single-core system, but your kernel configuration enables
> > > > > CONFIG_SMP. Additional 'deferred TLB flush' logic is dropped at build
> > > > > time if CONFIG_SMP is disabled. On the other hand, system should not
> > > > > fail that way even if SMP is enabled.
> > > > >
> > > > I enabled CONFIG_SMP while doing some testing of PMA code and indeed
> > > > enabling this config should not introduce a failure.
> > > >
> > > > > Let me double-check if anything can go wrong if cpumasks may have only
> > > > > a single cpu. Another suspect is a change in update_mmu_cache: probably
> > > > > making it asid-specific (and thus more granular) was a bad idea.
> > > > >
> > > > Thanks.
> > > >
> > > > BTW I tested the patch [0] which you pointed out and that fixes the
> > > > issues seen earlier.
> > > >
> > > > [0] https://patchwork.kernel.org/project/linux-riscv/patch/20221111075902.798571-1-guoren@kernel.org/
> > >
> > > All looks good with cpumasks in the single-core case. So deferred TLB
> > > flush logic is not even executed in your case. So the root cause
> > > should be in update_mmu_cache change.
> > >
> > > May I ask you to repeat the original emmc test on your platform from
> > > for-next (i.e. with [0] and without [1]) with the following partial revert:
> > >
> > > : diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > > : index ec6fb83349ce..92ec2d9d7273 100644
> > > : --- a/arch/riscv/include/asm/pgtable.h
> > > : +++ b/arch/riscv/include/asm/pgtable.h
> > > : @@ -415,7 +415,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> > > :        * Relying on flush_tlb_fix_spurious_fault would suffice, but
> > > :        * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
> > > :        */
> > > : -     flush_tlb_page(vma, address);
> > > : +     local_flush_tlb_page(address);
> > > :  }
> > > :
> > > :  static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> > >
> > I tested your above proposed changes and I am no longer seeing an
> > issue on my platform.
>
> Great. I will send a fixup after I double-check on several other
> hardware platforms. Thanks for testing !
>
Actually, I did hit an issue with bonnie++ again, even with your proposed changes!

[ 1873.355279] EXT4-fs (sda1): mounted filesystem 050ad3b9-b571-4b4c-9500-db56feed01ab with ordered data mode. Quota mode: disabled.
Using uid:0, gid:0.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...[ 2682.180114] sd-resolve[126]: unhandled signal 11 code 0x1 at 0x0000000000000000 in libc-2.28.so[3fbe503000+ff000]
[ 2682.190552] CPU: 0 PID: 126 Comm: sd-resolve Not tainted 6.2.0-rc1-00111-g7bcd7d932cf6 #189
[ 2682.198917] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
[ 2682.205536] epc : 0000003fbe568004 ra : 0000003fbe569440 sp : 0000003fbe417b10
[ 2682.212770]  gp : 0000002ab8d18b88 tp : 0000003fbe41e810 t0 : 0000000000000022
[ 2682.219987]  t1 : 0000003fbdbf1e4c t2 : 0000003fbe417290 s0 : 00000000000000e0
[ 2682.227219]  s1 : 0000000000000000 a0 : 0000000000000000 a1 : 8080808080808080
[ 2682.234433]  a2 : 0000003fb80019f0 a3 : 0000003fbe41e8f0 a4 : 0000000000000008
[ 2682.241648]  a5 : 0000000000000000 a6 : fefefefefefefeff a7 : 0000000000000039
[ 2682.248863]  s2 : 0000003fbe417b37 s3 : 0000003fbe417e48 s4 : 0000000000000001
[ 2682.256082]  s5 : 0000003fbe417eb0 s6 : 00000000000000e0 s7 : 0000003fbdbf2f2a
[ 2682.263297]  s8 : ffffffffffffffff s9 : fffffffffffffffd s10: ffffffffffffffff
[ 2682.270511]  s11: fffffffffffffffe t3 : 0000000000000000 t4 : 000000077ce2ea37
[ 2682.277733]  t5 : 000000000000003f t6 : 0000000000000000
[ 2682.283046] status: 0000000200004020 badaddr: 0000000000000000 cause: 000000000000000d
[ 2682.441183] audit: type=1701 audit(1671222620.403:17): auid=4294967295 uid=995 gid=994 ses=4294967295 pid=124 comm="sd-resolve" exe="/lib/systemd/systemd-timesyncd" sig=11 res=1
done
Reading intelligently...done
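
For reference when reading the dump above: the "cause" value is the
RISC-V scause register. A minimal sketch of decoding it follows, using
the synchronous exception codes from the privileged specification. The
helper names (scause_*) are hypothetical and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* On RV64 the top bit of scause distinguishes interrupts from exceptions. */
#define SCAUSE_INTERRUPT_BIT (UINT64_C(1) << 63)

bool scause_is_interrupt(uint64_t scause)
{
	return (scause & SCAUSE_INTERRUPT_BIT) != 0;
}

/* Exception or interrupt code with the interrupt bit masked off. */
uint64_t scause_code(uint64_t scause)
{
	return scause & ~SCAUSE_INTERRUPT_BIT;
}

/* True for instruction (12), load (13) and store/AMO (15) page faults. */
bool scause_is_page_fault(uint64_t scause)
{
	uint64_t code = scause_code(scause);

	return !scause_is_interrupt(scause) &&
	       (code == 12 || code == 13 || code == 15);
}

/* Human-readable name for a synchronous exception code. */
const char *scause_exception_name(uint64_t scause)
{
	if (scause_is_interrupt(scause))
		return "interrupt";

	switch (scause_code(scause)) {
	case 0:  return "instruction address misaligned";
	case 1:  return "instruction access fault";
	case 2:  return "illegal instruction";
	case 3:  return "breakpoint";
	case 4:  return "load address misaligned";
	case 5:  return "load access fault";
	case 6:  return "store/AMO address misaligned";
	case 7:  return "store/AMO access fault";
	case 8:  return "environment call from U-mode";
	case 9:  return "environment call from S-mode";
	case 12: return "instruction page fault";
	case 13: return "load page fault";
	case 15: return "store/AMO page fault";
	default: return "reserved or unknown";
	}
}
```

With cause 0xd this yields "load page fault". Together with badaddr 0
and signal 11 (SIGSEGV) with code 0x1 (SEGV_MAPERR), the dump describes
a user-space load from address 0 in sd-resolve.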

Let me know if you want me to share my branch.

Cheers,
Prabhakar
Sergey Matyukevich Dec. 30, 2022, 4:53 p.m. UTC | #12
Hi Prabhakar,

> Hi Sergey,
> 
> On Sat, Dec 24, 2022 at 11:48 AM Sergey Matyukevich <geomatsi@gmail.com> wrote:
> >
> > Hi Prabhakar,
> >
> > > > > > > > > > > > Current implementation of update_mmu_cache function performs local TLB
> > > > > > > > > > > > flush. It does not take into account ASID information. Besides, it does
> > > > > > > > > > > > not take into account other harts currently running the same mm context
> > > > > > > > > > > > or possible migration of the running context to other harts. Meanwhile
> > > > > > > > > > > > TLB flush is not performed for every context switch if ASID support
> > > > > > > > > > > > is enabled.
> > > > > > > > > > > >
> > > > > > > > > > > > Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> > > > > > > > > > > > flushing local TLB entirely. This patch takes into account other
> > > > > > > > > > > > harts currently running the same mm context as well as possible
> > > > > > > > > > > > migration of this context to other harts.
> > > > > > > > > > > >
> > > > > > > > > > > > For this purpose the approach from flush_icache_mm is reused. Remote
> > > > > > > > > > > > harts currently running the same mm context are informed via SBI calls
> > > > > > > > > > > > that they need to flush their local TLBs. All the other harts are marked
> > > > > > > > > > > > as needing a deferred TLB flush when this mm context runs on them.
> > > > > > > > > > > >
> > > > > > > > > > > > [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  arch/riscv/include/asm/mmu.h      |  2 ++
> > > > > > > > > > > >  arch/riscv/include/asm/pgtable.h  |  2 +-
> > > > > > > > > > > >  arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
> > > > > > > > > > > >  arch/riscv/mm/context.c           | 10 ++++++++++
> > > > > > > > > > > >  arch/riscv/mm/tlbflush.c          | 28 +++++++++++-----------------
> > > > > > > > > > > >  5 files changed, 42 insertions(+), 18 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > <snip>
> > > > > > > > > > > [  133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > > > > > > > > > > [  133.015338] Call Trace:
> > > > > > > > > > > [  133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
> > > > > > > > > > > [  133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
> > > > > > > > > > > [  133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
> > > > > > > > > > > [  133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
> > > > > > > > > > > [  133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
> > > > > > > > > > > [  133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
> > > > > > > > > > > [  133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
> > > > > > > > > > > [  133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
> > > > > > > > > > > [  133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
> > > > > > > > > > > [  133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
> > > > > > > > > > > [  133.070249] ---[ end Kernel panic - not syncing: Attempted to kill
> > > > > > > > > > > init! exitcode=0x0000000b ]---
> > > > > > > > > > >
> > > > > > > > > > > If I revert this patch [0] bonnie++ works as expected.
> > > > > > > > > > >
> > > > > > > > > > > Any pointers on what could be the issue here?
> > > > > > > > > > >
> > > > > > > > > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Prabhakar
> > > > > > > > > >
> > > > > > > > > > Good catch. Thanks for reporting ! Discussion around the issue and
> > > > > > > > > > possible ways to fix it can be found in the following email thread:
> > > > > > > > > >
> > > > > > > > > > https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
> > > > > > > > > >
> > > > > > > > > > Could you please apply the patch from Guo Ren instead of [0] and check
> > > > > > > > > > if you have any issues with your test ? Besides, could you please share
> > > > > > > > > > your kernel configuration and the actual bonnie++ params from emmc_t_002.sh script ?
> > > > > > > > > >
> > > > > > > > > Thanks for the pointer, I'll undo my changes and test Guo's patch.
> > > > > > > > >
> > > > > > > > > I have pasted the script here [0] and attached config.
> > > > > > > > >
> > > > > > > > > [0] https://paste.debian.net/hidden/a7a769b5/
> > > > > > > >
> > > > > > > > Thanks for the script and config. Could you please also share the
> > > > > > > > following information:
> > > > > > > > - how many cores your system has
> > > > > > > The Renesas RZ/Five SoC has a single Andes AX45MP core.
> > > > > > >
> > > > > > > > - does your system support ASID
> > > > > > > >
> > > > > > > With a quick look at [0] It does support ASID, unless there is a way
> > > > > > > to disable it.
> > > > > > >
> > > > > > > [0] http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
> > > > > >
> > > > > > So you have a single-core system, but your kernel configuration enables
> > > > > > CONFIG_SMP. Additional 'deferred TLB flush' logic is dropped at build
> > > > > > time if CONFIG_SMP is disabled. On the other hand, system should not
> > > > > > fail that way even if SMP is enabled.
> > > > > >
> > > > > I enabled CONFIG_SMP while doing some testing of PMA code and indeed
> > > > > enabling this config should not introduce a failure.
> > > > >
> > > > > > Let me double-check if anything can go wrong if cpumasks may have only
> > > > > > a single cpu. Another suspect is a change in update_mmu_cache: probably
> > > > > > making it asid-specific (and thus more granular) was a bad idea.
> > > > > >
> > > > > Thanks.
> > > > >
> > > > > BTW I tested the patch [0] which you pointed out and that fixes the
> > > > > issues seen earlier.
> > > > >
> > > > > [0] https://patchwork.kernel.org/project/linux-riscv/patch/20221111075902.798571-1-guoren@kernel.org/
> > > >
> > > > All looks good with cpumasks in the single-core case. So deferred TLB
> > > > flush logic is not even executed in your case. So the root cause
> > > > should be in update_mmu_cache change.
> > > >
> > > > May I ask you to repeat the original emmc test on your platform from
> > > > for-next (i.e. with [0] and without [1]) with the following partial revert:
> > > >
> > > > : diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > > > : index ec6fb83349ce..92ec2d9d7273 100644
> > > > : --- a/arch/riscv/include/asm/pgtable.h
> > > > : +++ b/arch/riscv/include/asm/pgtable.h
> > > > : @@ -415,7 +415,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> > > > :        * Relying on flush_tlb_fix_spurious_fault would suffice, but
> > > > :        * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
> > > > :        */
> > > > : -     flush_tlb_page(vma, address);
> > > > : +     local_flush_tlb_page(address);
> > > > :  }
> > > > :
> > > > :  static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> > > >
> > > I tested your above proposed changes and I am no longer seeing an
> > > issue on my platform.
> >
> > Great. I will send a fixup after I double-check on several other
> > hardware platforms. Thanks for testing !
> >
> Actually, I did hit an issue now with your proposed changes with bonnie++ again!
> 
> [ 1873.355279] EXT4-fs (sda1): mounted filesystem
> 050ad3b9-b571-4b4c-9500-db56feed01ab with ordered data mode. Quota
> mode: disabled.
> Using uid:0, gid:0.
> Writing with putc()...done
> Writing intelligently...done
> Rewriting...done
> Reading with getc()...[ 2682.180114] sd-resolve[126]: unhandled signal
> 11 code 0x1 at 0x0000000000000000 in libc-2.28.so[3fbe503000+ff000]
> [ 2682.190552] CPU: 0 PID: 126 Comm: sd-resolve Not tainted
> 6.2.0-rc1-00111-g7bcd7d932cf6 #189
> [ 2682.198917] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> [ 2682.205536] epc : 0000003fbe568004 ra : 0000003fbe569440 sp :
> 0000003fbe417b10
> [ 2682.212770]  gp : 0000002ab8d18b88 tp : 0000003fbe41e810 t0 :
> 0000000000000022
> [ 2682.219987]  t1 : 0000003fbdbf1e4c t2 : 0000003fbe417290 s0 :
> 00000000000000e0
> [ 2682.227219]  s1 : 0000000000000000 a0 : 0000000000000000 a1 :
> 8080808080808080
> [ 2682.234433]  a2 : 0000003fb80019f0 a3 : 0000003fbe41e8f0 a4 :
> 0000000000000008
> [ 2682.241648]  a5 : 0000000000000000 a6 : fefefefefefefeff a7 :
> 0000000000000039
> [ 2682.248863]  s2 : 0000003fbe417b37 s3 : 0000003fbe417e48 s4 :
> 0000000000000001
> [ 2682.256082]  s5 : 0000003fbe417eb0 s6 : 00000000000000e0 s7 :
> 0000003fbdbf2f2a
> [ 2682.263297]  s8 : ffffffffffffffff s9 : fffffffffffffffd s10:
> ffffffffffffffff
> [ 2682.270511]  s11: fffffffffffffffe t3 : 0000000000000000 t4 :
> 000000077ce2ea37
> [ 2682.277733]  t5 : 000000000000003f t6 : 0000000000000000
> [ 2682.283046] status: 0000000200004020 badaddr: 0000000000000000
> cause: 000000000000000d
> [ 2682.441183] audit: type=1701 audit(1671222620.403:17):
> auid=4294967295 uid=995 gid=994 ses=4294967295 pid=124
> comm="sd-resolve" exe="/lib/systemd/systemd-timesyncd" sig=11 res=1
> done
> Reading intelligently...done
> 
> Let me know if you want me to share my branch.

Hmmm... I assume you hit the issue after you applied the suggested partial
revert for the update_mmu_cache function. If so, then your issue is not
related to the remaining part (deferred local_flush_tlb_all_asid)
of commit 4bd1d80efb5a. This is because that flush is not executed
on a single-core system, even if CONFIG_SMP is enabled.
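
To illustrate why the deferred flush cannot trigger with one hart, here is
a minimal user-space sketch. It is not the kernel code: struct mm_model,
mark_remote_stale() and switch_needs_flush() are hypothetical names, and a
plain 64-bit word stands in for a cpumask (so it assumes fewer than 64
harts and hart numbers below 64).

```c
#include <stdint.h>

/* Hypothetical model of per-mm "stale TLB" bookkeeping; not kernel code. */
struct mm_model {
	uint64_t stale_mask;	/* bit n set: hart n must flush before running this mm */
};

/* Mask with one bit per possible hart (assumption: nr_harts <= 64). */
static uint64_t all_harts_mask(unsigned int nr_harts)
{
	return nr_harts >= 64 ? UINT64_MAX : (UINT64_C(1) << nr_harts) - 1;
}

/*
 * Called on hart 'self' after it changed a mapping and flushed locally:
 * every other hart is marked as needing a deferred flush.
 */
static void mark_remote_stale(struct mm_model *mm, unsigned int nr_harts,
			      unsigned int self)
{
	mm->stale_mask = all_harts_mask(nr_harts) & ~(UINT64_C(1) << self);
}

/*
 * Called when 'hart' switches to this mm. Returns 1 if a deferred flush
 * is due, and clears the pending bit for that hart.
 */
static int switch_needs_flush(struct mm_model *mm, unsigned int hart)
{
	uint64_t bit = UINT64_C(1) << hart;
	int stale = (mm->stale_mask & bit) != 0;

	mm->stale_mask &= ~bit;
	return stale;
}
```

In this model, with nr_harts == 1 the only hart is always the one that
clears its own bit, so stale_mask stays empty and switch_needs_flush()
never reports a pending flush. With four harts, hart 2 would see exactly
one deferred flush after hart 0 updates a mapping.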

Could you please run your tests for a while on your branch with commit
4bd1d80efb5a ("riscv: mm: notify remote harts about mmu cache updates")
reverted? It looks like the update_mmu_cache change from that commit
increases the probability of a crash, but there is also something else
in the new kernel that bites your system.

Yes, please share your branch. I will take a look and run
some tests on the different boards that I have at hand.

Regards,
Sergey
Lad, Prabhakar Dec. 30, 2022, 5:28 p.m. UTC | #13
Hi Sergey,

On Fri, Dec 30, 2022 at 4:53 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
>
> Hi Prabhakar,
>
> > Hi Sergey,
> >
> > On Sat, Dec 24, 2022 at 11:48 AM Sergey Matyukevich <geomatsi@gmail.com> wrote:
> > >
<snip>
> > > > > > > > > Thanks for the script and config. Could you please also share the
> > > > > > > > > following information:
> > > > > > > > > - how many cores your system has
> > > > > > > > The Renesas RZ/Five SoC has a single Andes AX45MP core.
> > > > > > > >
> > > > > > > > > - does your system support ASID
> > > > > > > > >
> > > > > > > > With a quick look at [0] It does support ASID, unless there is a way
> > > > > > > > to disable it.
> > > > > > > >
> > > > > > > > [0] http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
> > > > > > >
> > > > > > > So you have a single-core system, but your kernel configuration enables
> > > > > > > CONFIG_SMP. Additional 'deferred TLB flush' logic is dropped at build
> > > > > > > time if CONFIG_SMP is disabled. On the other hand, system should not
> > > > > > > fail that way even if SMP is enabled.
> > > > > > >
> > > > > > I enabled CONFIG_SMP while doing some testing of PMA code and indeed
> > > > > > enabling this config should not introduce a failure.
> > > > > >
> > > > > > > Let me double-check if anything can go wrong if cpumasks may have only
> > > > > > > a single cpu. Another suspect is a change in update_mmu_cache: probably
> > > > > > > making it asid-specific (and thus more granular) was a bad idea.
> > > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > BTW I tested the patch [0] which you pointed out and that fixes the
> > > > > > issues seen earlier.
> > > > > >
> > > > > > [0] https://patchwork.kernel.org/project/linux-riscv/patch/20221111075902.798571-1-guoren@kernel.org/
> > > > >
> > > > > All looks good with cpumasks in the single-core case. So deferred TLB
> > > > > flush logic is not even executed in your case. So the root cause
> > > > > should be in update_mmu_cache change.
> > > > >
> > > > > May I ask you to repeat the original emmc test on your platform from
> > > > > for-next (i.e. with [0] and without [1]) with the following partial revert:
> > > > >
> > > > > : diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > > > > : index ec6fb83349ce..92ec2d9d7273 100644
> > > > > : --- a/arch/riscv/include/asm/pgtable.h
> > > > > : +++ b/arch/riscv/include/asm/pgtable.h
> > > > > : @@ -415,7 +415,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> > > > > :        * Relying on flush_tlb_fix_spurious_fault would suffice, but
> > > > > :        * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
> > > > > :        */
> > > > > : -     flush_tlb_page(vma, address);
> > > > > : +     local_flush_tlb_page(address);
> > > > > :  }
> > > > > :
> > > > > :  static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> > > > >
> > > > I tested your above proposed changes and I am no longer seeing an
> > > > issue on my platform.
> > >
> > > Great. I will send a fixup after I double-check on several other
> > > hardware platforms. Thanks for testing !
> > >
> > Actually, I did hit an issue now with your proposed changes with bonnie++ again!
> >
> > [ 1873.355279] EXT4-fs (sda1): mounted filesystem
> > 050ad3b9-b571-4b4c-9500-db56feed01ab with ordered data mode. Quota
> > mode: disabled.
> > Using uid:0, gid:0.
> > Writing with putc()...done
> > Writing intelligently...done
> > Rewriting...done
> > Reading with getc()...[ 2682.180114] sd-resolve[126]: unhandled signal
> > 11 code 0x1 at 0x0000000000000000 in libc-2.28.so[3fbe503000+ff000]
> > [ 2682.190552] CPU: 0 PID: 126 Comm: sd-resolve Not tainted
> > 6.2.0-rc1-00111-g7bcd7d932cf6 #189
> > [ 2682.198917] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > [ 2682.205536] epc : 0000003fbe568004 ra : 0000003fbe569440 sp :
> > 0000003fbe417b10
> > [ 2682.212770]  gp : 0000002ab8d18b88 tp : 0000003fbe41e810 t0 :
> > 0000000000000022
> > [ 2682.219987]  t1 : 0000003fbdbf1e4c t2 : 0000003fbe417290 s0 :
> > 00000000000000e0
> > [ 2682.227219]  s1 : 0000000000000000 a0 : 0000000000000000 a1 :
> > 8080808080808080
> > [ 2682.234433]  a2 : 0000003fb80019f0 a3 : 0000003fbe41e8f0 a4 :
> > 0000000000000008
> > [ 2682.241648]  a5 : 0000000000000000 a6 : fefefefefefefeff a7 :
> > 0000000000000039
> > [ 2682.248863]  s2 : 0000003fbe417b37 s3 : 0000003fbe417e48 s4 :
> > 0000000000000001
> > [ 2682.256082]  s5 : 0000003fbe417eb0 s6 : 00000000000000e0 s7 :
> > 0000003fbdbf2f2a
> > [ 2682.263297]  s8 : ffffffffffffffff s9 : fffffffffffffffd s10:
> > ffffffffffffffff
> > [ 2682.270511]  s11: fffffffffffffffe t3 : 0000000000000000 t4 :
> > 000000077ce2ea37
> > [ 2682.277733]  t5 : 000000000000003f t6 : 0000000000000000
> > [ 2682.283046] status: 0000000200004020 badaddr: 0000000000000000
> > cause: 000000000000000d
> > [ 2682.441183] audit: type=1701 audit(1671222620.403:17):
> > auid=4294967295 uid=995 gid=994 ses=4294967295 pid=124
> > comm="sd-resolve" exe="/lib/systemd/systemd-timesyncd" sig=11 res=1
> > done
> > Reading intelligently...done
> >
> > Let me know if you want me to share my branch.
>
> Hmmm... I assume you hit the issue after you applied suggested partial
> revert for the update_mmu_cache function. If so, then your issue is not
> related to the remaining part (deferred local_flush_tlb_all_asid)
> of the commit 4bd1d80efb5a. This is because that flush is not executed
> on a single-core system, even if CONFIG_SMP is enabled.
>
Yes, this issue was seen after the partial revert [0] as per your suggestion.

> Could you please run your tests for a while on your branch with reverted
> commit 4bd1d80efb5a ("riscv: mm: notify remote harts about mmu cache
> updates"). It looks like the change in the function update_mmu_cache
> from the commit 4bd1d80efb5a increases the probability of crash, but
> there is also something else in the new kernel that bites your system.
>
Sure, I'll re-run the tests with the complete revert of 4bd1d80efb5a
and check the status of it.

> Yes, could you please share your branch. I will take a look and run
> some tests on different boards that I have at hand.
>
Sure, I have pushed a branch [1] for you to have a look.

[0] https://github.com/prabhakarlad/linux/commit/58f0c079b2f839e635a77ded5505de5b7de05dbc
[1] https://github.com/prabhakarlad/linux/commits/rzfive-cmo-bisect

Cheers,
Prabhakar
Lad, Prabhakar Jan. 26, 2023, 5:17 p.m. UTC | #14
Hi Sergey,

On Fri, Dec 30, 2022 at 4:53 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
>
> Hi Prabhakar,
>
> > Hi Sergey,
> >
> > On Sat, Dec 24, 2022 at 11:48 AM Sergey Matyukevich <geomatsi@gmail.com> wrote:
> > >
> > > Hi Prabhakar,
> > >
> > > > > > > > > > > > > Current implementation of update_mmu_cache function performs local TLB
> > > > > > > > > > > > > flush. It does not take into account ASID information. Besides, it does
> > > > > > > > > > > > > not take into account other harts currently running the same mm context
> > > > > > > > > > > > > or possible migration of the running context to other harts. Meanwhile
> > > > > > > > > > > > > TLB flush is not performed for every context switch if ASID support
> > > > > > > > > > > > > is enabled.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> > > > > > > > > > > > > flushing local TLB entirely. This patch takes into account other
> > > > > > > > > > > > > harts currently running the same mm context as well as possible
> > > > > > > > > > > > > migration of this context to other harts.
> > > > > > > > > > > > >
> > > > > > > > > > > > > For this purpose the approach from flush_icache_mm is reused. Remote
> > > > > > > > > > > > > harts currently running the same mm context are informed via SBI calls
> > > > > > > > > > > > > that they need to flush their local TLBs. All the other harts are marked
> > > > > > > > > > > > > as needing a deferred TLB flush when this mm context runs on them.
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  arch/riscv/include/asm/mmu.h      |  2 ++
> > > > > > > > > > > > >  arch/riscv/include/asm/pgtable.h  |  2 +-
> > > > > > > > > > > > >  arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
> > > > > > > > > > > > >  arch/riscv/mm/context.c           | 10 ++++++++++
> > > > > > > > > > > > >  arch/riscv/mm/tlbflush.c          | 28 +++++++++++-----------------
> > > > > > > > > > > > >  5 files changed, 42 insertions(+), 18 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > <snip>
> > > > > > > > > > > > [  133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > > > > > > > > > > > [  133.015338] Call Trace:
> > > > > > > > > > > > [  133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
> > > > > > > > > > > > [  133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
> > > > > > > > > > > > [  133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
> > > > > > > > > > > > [  133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
> > > > > > > > > > > > [  133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
> > > > > > > > > > > > [  133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
> > > > > > > > > > > > [  133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
> > > > > > > > > > > > [  133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
> > > > > > > > > > > > [  133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
> > > > > > > > > > > > [  133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
> > > > > > > > > > > > [  133.070249] ---[ end Kernel panic - not syncing: Attempted to kill
> > > > > > > > > > > > init! exitcode=0x0000000b ]---
> > > > > > > > > > > >
> > > > > > > > > > > > If I revert this patch [0] bonnie++ works as expected.
> > > > > > > > > > > >
> > > > > > > > > > > > Any pointers on what could be the issue here?
> > > > > > > > > > > >
> > > > > > > > > > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Prabhakar
> > > > > > > > > > >
> > > > > > > > > > > Good catch. Thanks for reporting ! Discussion around the issue and
> > > > > > > > > > > possible ways to fix it can be found in the following email thread:
> > > > > > > > > > >
> > > > > > > > > > > https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
> > > > > > > > > > >
> > > > > > > > > > > Could you please apply the patch from Guo Ren instead of [0] and check
> > > > > > > > > > > if you have any issues with your test ? Besides, could you please share
> > > > > > > > > > > your kernel configuration and the actual bonnie++ params from emmc_t_002.sh script ?
> > > > > > > > > > >
> > > > > > > > > > Thanks for the pointer, I'll undo my changes and test Guo's patch.
> > > > > > > > > >
> > > > > > > > > > I have pasted the script here [0] and attached config.
> > > > > > > > > >
> > > > > > > > > > [0] https://paste.debian.net/hidden/a7a769b5/
> > > > > > > > >
> > > > > > > > > Thanks for the script and config. Could you please also share the
> > > > > > > > > following information:
> > > > > > > > > - how many cores your system has
> > > > > > > > The Renesas RZ/Five SoC has a single Andes AX45MP core.
> > > > > > > >
> > > > > > > > > - does your system support ASID
> > > > > > > > >
> > > > > > > > With a quick look at [0] It does support ASID, unless there is a way
> > > > > > > > to disable it.
> > > > > > > >
> > > > > > > > [0] http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
> > > > > > >
> > > > > > > So you have a single-core system, but your kernel configuration enables
> > > > > > > CONFIG_SMP. Additional 'deferred TLB flush' logic is dropped at build
> > > > > > > time if CONFIG_SMP is disabled. On the other hand, system should not
> > > > > > > fail that way even if SMP is enabled.
> > > > > > >
> > > > > > I enabled CONFIG_SMP while doing some testing of PMA code and indeed
> > > > > > enabling this config should not introduce a failure.
> > > > > >
> > > > > > > Let me double-check if anything can go wrong if cpumasks may have only
> > > > > > > a single cpu. Another suspect is a change in update_mmu_cache: probably
> > > > > > > making it asid-specific (and thus more granular) was a bad idea.
> > > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > BTW I tested the patch [0] which you pointed out and that fixes the
> > > > > > issues seen earlier.
> > > > > >
> > > > > > [0] https://patchwork.kernel.org/project/linux-riscv/patch/20221111075902.798571-1-guoren@kernel.org/
> > > > >
> > > > > All looks good with cpumasks in the single-core case. So deferred TLB
> > > > > flush logic is not even executed in your case. So the root cause
> > > > > should be in update_mmu_cache change.
> > > > >
> > > > > May I ask you to repeat the original emmc test on your platform from
> > > > > for-next (i.e. with [0] and without [1]) with the following partial revert:
> > > > >
> > > > > : diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > > > > : index ec6fb83349ce..92ec2d9d7273 100644
> > > > > : --- a/arch/riscv/include/asm/pgtable.h
> > > > > : +++ b/arch/riscv/include/asm/pgtable.h
> > > > > : @@ -415,7 +415,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> > > > > :        * Relying on flush_tlb_fix_spurious_fault would suffice, but
> > > > > :        * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
> > > > > :        */
> > > > > : -     flush_tlb_page(vma, address);
> > > > > : +     local_flush_tlb_page(address);
> > > > > :  }
> > > > > :
> > > > > :  static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> > > > >
> > > > I tested your above proposed changes and I am no longer seeing an
> > > > issue on my platform.
> > >
> > > Great. I will send a fixup after I double-check on several other
> > > hardware platforms. Thanks for testing !
> > >
> > Actually, I did hit an issue now with your proposed changes with bonnie++ again!
> >
> > [ 1873.355279] EXT4-fs (sda1): mounted filesystem
> > 050ad3b9-b571-4b4c-9500-db56feed01ab with ordered data mode. Quota
> > mode: disabled.
> > Using uid:0, gid:0.
> > Writing with putc()...done
> > Writing intelligently...done
> > Rewriting...done
> > Reading with getc()...[ 2682.180114] sd-resolve[126]: unhandled signal
> > 11 code 0x1 at 0x0000000000000000 in libc-2.28.so[3fbe503000+ff000]
> > [ 2682.190552] CPU: 0 PID: 126 Comm: sd-resolve Not tainted
> > 6.2.0-rc1-00111-g7bcd7d932cf6 #189
> > [ 2682.198917] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > [ 2682.205536] epc : 0000003fbe568004 ra : 0000003fbe569440 sp :
> > 0000003fbe417b10
> > [ 2682.212770]  gp : 0000002ab8d18b88 tp : 0000003fbe41e810 t0 :
> > 0000000000000022
> > [ 2682.219987]  t1 : 0000003fbdbf1e4c t2 : 0000003fbe417290 s0 :
> > 00000000000000e0
> > [ 2682.227219]  s1 : 0000000000000000 a0 : 0000000000000000 a1 :
> > 8080808080808080
> > [ 2682.234433]  a2 : 0000003fb80019f0 a3 : 0000003fbe41e8f0 a4 :
> > 0000000000000008
> > [ 2682.241648]  a5 : 0000000000000000 a6 : fefefefefefefeff a7 :
> > 0000000000000039
> > [ 2682.248863]  s2 : 0000003fbe417b37 s3 : 0000003fbe417e48 s4 :
> > 0000000000000001
> > [ 2682.256082]  s5 : 0000003fbe417eb0 s6 : 00000000000000e0 s7 :
> > 0000003fbdbf2f2a
> > [ 2682.263297]  s8 : ffffffffffffffff s9 : fffffffffffffffd s10:
> > ffffffffffffffff
> > [ 2682.270511]  s11: fffffffffffffffe t3 : 0000000000000000 t4 :
> > 000000077ce2ea37
> > [ 2682.277733]  t5 : 000000000000003f t6 : 0000000000000000
> > [ 2682.283046] status: 0000000200004020 badaddr: 0000000000000000
> > cause: 000000000000000d
> > [ 2682.441183] audit: type=1701 audit(1671222620.403:17):
> > auid=4294967295 uid=995 gid=994 ses=4294967295 pid=124
> > comm="sd-resolve" exe="/lib/systemd/systemd-timesyncd" sig=11 res=1
> > done
> > Reading intelligently...done
> >
> > Let me know if you want me to share my branch.
>
> Hmmm... I assume you hit the issue after you applied suggested partial
> revert for the update_mmu_cache function. If so, then your issue is not
> related to the remaining part (deferred local_flush_tlb_all_asid)
> of the commit 4bd1d80efb5a. This is because that flush is not executed
> on a single-core system, even if CONFIG_SMP is enabled.
>
> Could you please run your tests for a while on your branch with reverted
> commit 4bd1d80efb5a ("riscv: mm: notify remote harts about mmu cache
> updates"). It looks like the change in the function update_mmu_cache
> from the commit 4bd1d80efb5a increases the probability of crash, but
> there is also something else in the new kernel that bites your system.
>
> Yes, could you please share your branch. I will take a look and run
> some tests on different boards that I have at hand.
>
Do you have any further updates on this?

Cheers,
Prabhakar
Sergey Matyukevich Jan. 26, 2023, 8:07 p.m. UTC | #15
On Thu, Jan 26, 2023 at 05:17:23PM +0000, Lad, Prabhakar wrote:
> Hi Sergey,
> 
> On Fri, Dec 30, 2022 at 4:53 PM Sergey Matyukevich <geomatsi@gmail.com> wrote:
> >
> > Hi Prabhakar,
> >
> > > Hi Sergey,
> > >
> > > On Sat, Dec 24, 2022 at 11:48 AM Sergey Matyukevich <geomatsi@gmail.com> wrote:
> > > >
> > > > Hi Prabhakar,
> > > >
> > > > > > > > > > > > > > Current implementation of update_mmu_cache function performs local TLB
> > > > > > > > > > > > > > flush. It does not take into account ASID information. Besides, it does
> > > > > > > > > > > > > > not take into account other harts currently running the same mm context
> > > > > > > > > > > > > > or possible migration of the running context to other harts. Meanwhile
> > > > > > > > > > > > > > TLB flush is not performed for every context switch if ASID support
> > > > > > > > > > > > > > is enabled.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Patch [1] proposed to add ASID support to update_mmu_cache to avoid
> > > > > > > > > > > > > > flushing local TLB entirely. This patch takes into account other
> > > > > > > > > > > > > > harts currently running the same mm context as well as possible
> > > > > > > > > > > > > > migration of this context to other harts.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For this purpose the approach from flush_icache_mm is reused. Remote
> > > > > > > > > > > > > > harts currently running the same mm context are informed via SBI calls
> > > > > > > > > > > > > > that they need to flush their local TLBs. All the other harts are marked
> > > > > > > > > > > > > > as needing a deferred TLB flush when this mm context runs on them.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >  arch/riscv/include/asm/mmu.h      |  2 ++
> > > > > > > > > > > > > >  arch/riscv/include/asm/pgtable.h  |  2 +-
> > > > > > > > > > > > > >  arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++
> > > > > > > > > > > > > >  arch/riscv/mm/context.c           | 10 ++++++++++
> > > > > > > > > > > > > >  arch/riscv/mm/tlbflush.c          | 28 +++++++++++-----------------
> > > > > > > > > > > > > >  5 files changed, 42 insertions(+), 18 deletions(-)
> > > > > > > > > > > > > >
> > > > > > > > > > > <snip>
> > > > > > > > > > > > > [  133.008752] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > > > > > > > > > > > > [  133.015338] Call Trace:
> > > > > > > > > > > > > [  133.017778] [<ffffffff800055cc>] dump_backtrace+0x1c/0x24
> > > > > > > > > > > > > [  133.023174] [<ffffffff80776836>] show_stack+0x2c/0x38
> > > > > > > > > > > > > [  133.028214] [<ffffffff80780244>] dump_stack_lvl+0x3c/0x54
> > > > > > > > > > > > > [  133.033597] [<ffffffff80780270>] dump_stack+0x14/0x1c
> > > > > > > > > > > > > [  133.038633] [<ffffffff80776c00>] panic+0x102/0x29a
> > > > > > > > > > > > > [  133.043409] [<ffffffff800137ba>] do_exit+0x704/0x70a
> > > > > > > > > > > > > [  133.048362] [<ffffffff8001390e>] do_group_exit+0x24/0x70
> > > > > > > > > > > > > [  133.053659] [<ffffffff8001de54>] get_signal+0x68a/0x6dc
> > > > > > > > > > > > > [  133.058874] [<ffffffff8000494e>] do_work_pending+0xd6/0x44e
> > > > > > > > > > > > > [  133.064427] [<ffffffff800036c2>] resume_userspace_slow+0x8/0xa
> > > > > > > > > > > > > [  133.070249] ---[ end Kernel panic - not syncing: Attempted to kill
> > > > > > > > > > > > > init! exitcode=0x0000000b ]---
> > > > > > > > > > > > >
> > > > > > > > > > > > > If I revert this patch [0] bonnie++ works as expected.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Any pointers on what could be the issue here?
> > > > > > > > > > > > >
> > > > > > > > > > > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=4bd1d80efb5af640f99157f39b50fb11326ce641
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > Prabhakar
> > > > > > > > > > > >
> > > > > > > > > > > > Good catch. Thanks for reporting ! Discussion around the issue and
> > > > > > > > > > > > possible ways to fix it can be found in the following email thread:
> > > > > > > > > > > >
> > > > > > > > > > > > https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
> > > > > > > > > > > >
> > > > > > > > > > > > Could you please apply the patch from Guo Ren instead of [0] and check
> > > > > > > > > > > > if you have any issues with your test ? Besides, could you please share
> > > > > > > > > > > > your kernel configuration and the actual bonnie++ params from emmc_t_002.sh script ?
> > > > > > > > > > > >
> > > > > > > > > > > Thanks for the pointer, I'll undo my changes and test Guo's patch.
> > > > > > > > > > >
> > > > > > > > > > > I have pasted the script here [0] and attached config.
> > > > > > > > > > >
> > > > > > > > > > > [0] https://paste.debian.net/hidden/a7a769b5/
> > > > > > > > > >
> > > > > > > > > > Thanks for the script and config. Could you please also share the
> > > > > > > > > > following information:
> > > > > > > > > > - how many cores your system has
> > > > > > > > > The Renesas RZ/Five SoC has a single Andes AX45MP core.
> > > > > > > > >
> > > > > > > > > > - does your system support ASID
> > > > > > > > > >
> > > > > > > > > With a quick look at [0] It does support ASID, unless there is a way
> > > > > > > > > to disable it.
> > > > > > > > >
> > > > > > > > > [0] http://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
> > > > > > > >
> > > > > > > > So you have a single-core system, but your kernel configuration enables
> > > > > > > > CONFIG_SMP. Additional 'deferred TLB flush' logic is dropped at build
> > > > > > > > time if CONFIG_SMP is disabled. On the other hand, system should not
> > > > > > > > fail that way even if SMP is enabled.
> > > > > > > >
> > > > > > > I enabled CONFIG_SMP while doing some testing of PMA code and indeed
> > > > > > > enabling this config should not introduce a failure.
> > > > > > >
> > > > > > > > Let me double-check if anything can go wrong if cpumasks may have only
> > > > > > > > a single cpu. Another suspect is a change in update_mmu_cache: probably
> > > > > > > > making it asid-specific (and thus more granular) was a bad idea.
> > > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > BTW I tested the patch [0] which you pointed out and that fixes the
> > > > > > > issues seen earlier.
> > > > > > >
> > > > > > > [0] https://patchwork.kernel.org/project/linux-riscv/patch/20221111075902.798571-1-guoren@kernel.org/
> > > > > >
> > > > > > All looks good with cpumasks in the single-core case. So deferred TLB
> > > > > > flush logic is not even executed in your case. So the root cause
> > > > > > should be in update_mmu_cache change.
> > > > > >
> > > > > > May I ask you to repeat the original emmc test on your platform from
> > > > > > for-next (i.e. with [0] and without [1]) with the following partial revert:
> > > > > >
> > > > > > : diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > > > > > : index ec6fb83349ce..92ec2d9d7273 100644
> > > > > > : --- a/arch/riscv/include/asm/pgtable.h
> > > > > > : +++ b/arch/riscv/include/asm/pgtable.h
> > > > > > : @@ -415,7 +415,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> > > > > > :        * Relying on flush_tlb_fix_spurious_fault would suffice, but
> > > > > > :        * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
> > > > > > :        */
> > > > > > : -     flush_tlb_page(vma, address);
> > > > > > : +     local_flush_tlb_page(address);
> > > > > > :  }
> > > > > > :
> > > > > > :  static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> > > > > >
> > > > > I tested your above proposed changes and I am no longer seeing an
> > > > > issue on my platform.
> > > >
> > > > Great. I will send a fixup after I double-check on several other
> > > > hardware platforms. Thanks for testing !
> > > >
> > > Actually, I did hit an issue now with your proposed changes with bonnie++ again!
> > >
> > > [ 1873.355279] EXT4-fs (sda1): mounted filesystem
> > > 050ad3b9-b571-4b4c-9500-db56feed01ab with ordered data mode. Quota
> > > mode: disabled.
> > > Using uid:0, gid:0.
> > > Writing with putc()...done
> > > Writing intelligently...done
> > > Rewriting...done
> > > Reading with getc()...[ 2682.180114] sd-resolve[126]: unhandled signal
> > > 11 code 0x1 at 0x0000000000000000 in libc-2.28.so[3fbe503000+ff000]
> > > [ 2682.190552] CPU: 0 PID: 126 Comm: sd-resolve Not tainted
> > > 6.2.0-rc1-00111-g7bcd7d932cf6 #189
> > > [ 2682.198917] Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> > > [ 2682.205536] epc : 0000003fbe568004 ra : 0000003fbe569440 sp :
> > > 0000003fbe417b10
> > > [ 2682.212770]  gp : 0000002ab8d18b88 tp : 0000003fbe41e810 t0 :
> > > 0000000000000022
> > > [ 2682.219987]  t1 : 0000003fbdbf1e4c t2 : 0000003fbe417290 s0 :
> > > 00000000000000e0
> > > [ 2682.227219]  s1 : 0000000000000000 a0 : 0000000000000000 a1 :
> > > 8080808080808080
> > > [ 2682.234433]  a2 : 0000003fb80019f0 a3 : 0000003fbe41e8f0 a4 :
> > > 0000000000000008
> > > [ 2682.241648]  a5 : 0000000000000000 a6 : fefefefefefefeff a7 :
> > > 0000000000000039
> > > [ 2682.248863]  s2 : 0000003fbe417b37 s3 : 0000003fbe417e48 s4 :
> > > 0000000000000001
> > > [ 2682.256082]  s5 : 0000003fbe417eb0 s6 : 00000000000000e0 s7 :
> > > 0000003fbdbf2f2a
> > > [ 2682.263297]  s8 : ffffffffffffffff s9 : fffffffffffffffd s10:
> > > ffffffffffffffff
> > > [ 2682.270511]  s11: fffffffffffffffe t3 : 0000000000000000 t4 :
> > > 000000077ce2ea37
> > > [ 2682.277733]  t5 : 000000000000003f t6 : 0000000000000000
> > > [ 2682.283046] status: 0000000200004020 badaddr: 0000000000000000
> > > cause: 000000000000000d
> > > [ 2682.441183] audit: type=1701 audit(1671222620.403:17):
> > > auid=4294967295 uid=995 gid=994 ses=4294967295 pid=124
> > > comm="sd-resolve" exe="/lib/systemd/systemd-timesyncd" sig=11 res=1
> > > done
> > > Reading intelligently...done
> > >
> > > Let me know if you want me to share my branch.
> >
> > Hmmm... I assume you hit the issue after applying the suggested partial
> > revert for the update_mmu_cache function. If so, then your issue is not
> > related to the remaining part (deferred local_flush_tlb_all_asid)
> > of the commit 4bd1d80efb5a. This is because that flush is not executed
> > on a single-core system, even if CONFIG_SMP is enabled.
> >
> > Could you please run your tests for a while on your branch with commit
> > 4bd1d80efb5a ("riscv: mm: notify remote harts about mmu cache updates")
> > reverted? It looks like the change to update_mmu_cache in that commit
> > increases the probability of a crash, but there is also something else
> > in the new kernel that bites your system.
> >
> > And yes, please share your branch. I will take a look and run
> > some tests on the different boards that I have at hand.
> >
> Do you have any further updates on this?

Yes, I reproduced a similar issue on some of my boards. The root
cause was in the same line in update_mmu_cache that I suggested you
revert. I have been troubleshooting it on the hardware available to
me, and I plan to send a partial revert together with an alternative
fix for local_flush_tlb_page in update_mmu_cache.
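
For reference, the deferred-flush bookkeeping from commit 4bd1d80efb5a
can be modelled in userspace roughly as below. This is only a sketch to
illustrate the single-core argument quoted above: hart_mask_t, HART_BIT,
compute_stale_mask and test_and_clear_stale are made-up names, a plain
32-bit word stands in for cpumask_t, and no actual SFENCE.VMA or SBI
call is involved.

```c
#include <stdint.h>

/* Hypothetical model: one bit per hart. Not the kernel cpumask API. */
typedef uint32_t hart_mask_t;

#define HART_BIT(n)	((hart_mask_t)1 << (n))

/*
 * Models the mask update added to __sbi_tlb_flush_range():
 *   cpumask_setall(pmask);
 *   cpumask_clear_cpu(cpuid, pmask);
 *   cpumask_andnot(pmask, pmask, cmask);
 * nr_harts must be between 1 and 31 in this simplified model.
 */
static hart_mask_t compute_stale_mask(unsigned int nr_harts,
				      unsigned int self,
				      hart_mask_t running)
{
	hart_mask_t stale = HART_BIT(nr_harts) - 1;	/* every hart */

	stale &= ~HART_BIT(self);	/* calling hart is flushed right away */
	stale &= ~running;		/* harts running this mm are flushed now */
	return stale;
}

/*
 * Models the check added to set_mm_asid(): returns 1 when the hart
 * switching to the mm has a deferred flush pending, and clears its bit.
 */
static int test_and_clear_stale(hart_mask_t *stale, unsigned int cpu)
{
	if (!(*stale & HART_BIT(cpu)))
		return 0;
	*stale &= ~HART_BIT(cpu);
	return 1;
}
```

With a single hart, compute_stale_mask(1, 0, HART_BIT(0)) returns 0, so
test_and_clear_stale() never reports a pending flush for hart 0. In this
model that is why the deferred local_flush_tlb_all_asid path cannot be
the one firing on a single-core board.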

Btw, did you have a chance to run more tests with commit
4bd1d80efb5a reverted?

Regards,
Sergey

Patch

diff --git a/arch/riscv/include/asm/mmu.h b/arch/riscv/include/asm/mmu.h
index cedcf8ea3c76..86670a1b4ffd 100644
--- a/arch/riscv/include/asm/mmu.h
+++ b/arch/riscv/include/asm/mmu.h
@@ -20,6 +20,8 @@  typedef struct {
 #ifdef CONFIG_SMP
 	/* A local icache flush is needed before user execution can resume. */
 	cpumask_t icache_stale_mask;
+	/* A local tlb flush is needed before user execution can resume. */
+	cpumask_t tlb_stale_mask;
 #endif
 } mm_context_t;
 
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 7ec936910a96..330f75fe1278 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -415,7 +415,7 @@  static inline void update_mmu_cache(struct vm_area_struct *vma,
 	 * Relying on flush_tlb_fix_spurious_fault would suffice, but
 	 * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
 	 */
-	local_flush_tlb_page(address);
+	flush_tlb_page(vma, address);
 }
 
 static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 801019381dea..907b9efd39a8 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -22,6 +22,24 @@  static inline void local_flush_tlb_page(unsigned long addr)
 {
 	ALT_FLUSH_TLB_PAGE(__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory"));
 }
+
+static inline void local_flush_tlb_all_asid(unsigned long asid)
+{
+	__asm__ __volatile__ ("sfence.vma x0, %0"
+			:
+			: "r" (asid)
+			: "memory");
+}
+
+static inline void local_flush_tlb_page_asid(unsigned long addr,
+		unsigned long asid)
+{
+	__asm__ __volatile__ ("sfence.vma %0, %1"
+			:
+			: "r" (addr), "r" (asid)
+			: "memory");
+}
+
 #else /* CONFIG_MMU */
 #define local_flush_tlb_all()			do { } while (0)
 #define local_flush_tlb_page(addr)		do { } while (0)
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 7acbfbd14557..80ce9caba8d2 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -196,6 +196,16 @@  static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
 
 	if (need_flush_tlb)
 		local_flush_tlb_all();
+#ifdef CONFIG_SMP
+	else {
+		cpumask_t *mask = &mm->context.tlb_stale_mask;
+
+		if (cpumask_test_cpu(cpu, mask)) {
+			cpumask_clear_cpu(cpu, mask);
+			local_flush_tlb_all_asid(cntx & asid_mask);
+		}
+	}
+#endif
 }
 
 static void set_mm_noasid(struct mm_struct *mm)
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 37ed760d007c..ce7dfc81bb3f 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -5,23 +5,7 @@ 
 #include <linux/sched.h>
 #include <asm/sbi.h>
 #include <asm/mmu_context.h>
-
-static inline void local_flush_tlb_all_asid(unsigned long asid)
-{
-	__asm__ __volatile__ ("sfence.vma x0, %0"
-			:
-			: "r" (asid)
-			: "memory");
-}
-
-static inline void local_flush_tlb_page_asid(unsigned long addr,
-		unsigned long asid)
-{
-	__asm__ __volatile__ ("sfence.vma %0, %1"
-			:
-			: "r" (addr), "r" (asid)
-			: "memory");
-}
+#include <asm/tlbflush.h>
 
 void flush_tlb_all(void)
 {
@@ -31,6 +15,7 @@  void flush_tlb_all(void)
 static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,
 				  unsigned long size, unsigned long stride)
 {
+	struct cpumask *pmask = &mm->context.tlb_stale_mask;
 	struct cpumask *cmask = mm_cpumask(mm);
 	unsigned int cpuid;
 	bool broadcast;
@@ -44,6 +29,15 @@  static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,
 	if (static_branch_unlikely(&use_asid_allocator)) {
 		unsigned long asid = atomic_long_read(&mm->context.id);
 
+		/*
+		 * TLB will be immediately flushed on harts concurrently
+		 * executing this MM context. TLB flush on other harts
+		 * is deferred until this MM context migrates there.
+		 */
+		cpumask_setall(pmask);
+		cpumask_clear_cpu(cpuid, pmask);
+		cpumask_andnot(pmask, pmask, cmask);
+
 		if (broadcast) {
 			sbi_remote_sfence_vma_asid(cmask, start, size, asid);
 		} else if (size <= stride) {