Message ID | addfcb3-b5f4-976e-e050-a2508e589cfe@google.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | None | expand |
On Thu, Jun 15, 2023 at 04:02:43PM -0700, Hugh Dickins wrote: > To keep balance in future, __update_tlb() remember to pte_unmap() after > pte_offset_map(). This is an odd case, since the caller has already done > pte_offset_map_lock(), then mips forgets the address and recalculates it; > but my two naive attempts to clean that up did more harm than good. > > Tested-by: Nathan Chancellor <nathan@kernel.org> > Signed-off-by: Hugh Dickins <hughd@google.com> FWIW: Tested-by: Yu Zhao <yuzhao@google.com> There is another problem, likely caused by khugepaged, happened multiple times. But I don't think it's related to your series, just FYI. Got mcheck at ffffffff81134ef0 CPU: 3 PID: 36 Comm: khugepaged Not tainted 6.4.0-rc6-00049-g62d8779610bb-dirty #1 $ 0 : 0000000000000000 0000000000000014 40000000011ac004 4000000000000000 $ 4 : c000000000000000 0000000000000045 000000011a80045b 000000011a80045b $ 8 : 8000000080188000 ffffffff81b526c0 0000000000000200 0000000000000000 $12 : 0000000000000028 ffffffff81910cb4 0000000000000000 0000000000000207 $16 : 000000aaab800000 80000000037ee990 ffffffff81b50200 8000000005066ae0 $20 : 0000000000000001 ffffffff80000000 ffffffff81c10000 000000aaab800000 $24 : 0000000000000002 ffffffff812b75f8 $28 : 8000000002310000 8000000002313b00 ffffffff81b50000 ffffffff81134d88 Hi : 000000000000017a Lo : 0000000000000000 epc : ffffffff81134ef0 __update_tlb+0x260/0x2a0 ra : ffffffff81134d88 __update_tlb+0xf8/0x2a0 Status: 14309ce2 KX SX UX KERNEL EXL Cause : 00800060 (ExcCode 18) PrId : 000d9602 (Cavium Octeon III) CPU: 3 PID: 36 Comm: khugepaged Not tainted 6.4.0-rc6-00049-g62d8779610bb-dirty #1 Stack : 0000000000000001 0000000000000000 0000000000000008 8000000002313768 8000000002313768 80000000023138f8 0000000000000000 0000000000000000 a6c8cd76e1667e00 8000000001db4f28 0000000000000001 30302d3663722d30 643236672d393430 0000000000000010 ffffffff81910cc0 0000000000000000 8000000001d96bcc 0000000000000000 0000000000000000 ffffffff81a68ed0 ffffffff81b50000 0000000000000001 ffffffff80000000 ffffffff81c10000 000000aaab800000 0000000000000002 ffffffff815b78c0 ffffffffa184e710 00000000000000c0 8000000002310000 8000000002313760 ffffffff81b50000 ffffffff8111c9cc 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ffffffff8111c9ec 0000000000000000 ... Call Trace: [<ffffffff8111c9ec>] show_stack+0x64/0x158 [<ffffffff81920078>] dump_stack_lvl+0x5c/0x7c [<ffffffff8111e03c>] do_mcheck+0x2c/0x98 [<ffffffff81118608>] handle_mcheck_int+0x38/0x50 Index : 80000000 PageMask : 1fe000 EntryHi : 000000aaab8000bd EntryLo0 : 40000000011a8004 EntryLo1 : 40000000011ac004 Wired : 0 PageGrain: e8000000 Index: 2 pgmask=4kb va=c00000feffff4000 asid=b9 [ri=0 xi=0 pa=000022a7000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=000022af000 c=0 d=1 v=1 g=1] Index: 3 pgmask=4kb va=c00000feffff8000 asid=b9 [ri=0 xi=0 pa=00002380000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00002381000 c=0 d=1 v=1 g=1] Index: 4 pgmask=4kb va=c00000feffffa000 asid=b9 [ri=0 xi=0 pa=000023e9000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=000023ea000 c=0 d=1 v=1 g=1] Index: 5 pgmask=4kb va=c00000feffffe000 asid=b9 [ri=0 xi=0 pa=00002881000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00002882000 c=0 d=1 v=1 g=1] Index: 6 pgmask=4kb va=c00000fefffb0000 asid=b9 [ri=0 xi=0 pa=00002cc2000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00002cc3000 c=0 d=1 v=1 g=1] Index: 7 pgmask=4kb va=c00000feffffc000 asid=b9 [ri=0 xi=0 pa=000023eb000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00002880000 c=0 d=1 v=1 g=1] Index: 8 pgmask=4kb va=c00000feffff6000 asid=b9 [ri=0 xi=0 pa=0000237e000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=0000237f000 c=0 d=1 v=1 g=1] Index: 14 pgmask=4kb va=c00000fefff62000 asid=8e [ri=0 xi=0 pa=00007477000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=0000745e000 c=0 d=1 v=1 g=1] Index: 15 pgmask=4kb va=c00000fefff52000 asid=8e [ri=0 xi=0 pa=0000744c000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=0000616d000 c=0 d=1 v=1 g=1] Index: 16 pgmask=4kb va=c00000fefff42000 asid=8e [ri=0 xi=0 pa=00006334000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=0000616b000 c=0 d=1 v=1 g=1] Index: 19 pgmask=4kb va=c00000fefffb6000 asid=8e [ri=0 xi=0 pa=00005050000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00005051000 c=0 d=1 v=1 g=1] Index: 20 pgmask=4kb va=c00000fefff72000 asid=b9 [ri=0 xi=0 pa=00007504000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00007503000 c=0 d=1 v=1 g=1] Index: 58 pgmask=4kb va=c00000fefffaa000 asid=8e [ri=0 xi=0 pa=00005126000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00005127000 c=0 d=1 v=1 g=1] Index: 59 pgmask=4kb va=c00000fefffba000 asid=8e [ri=0 xi=0 pa=00005129000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=0000512a000 c=0 d=1 v=1 g=1] Index: 79 pgmask=4kb va=c000000000060000 asid=8e [ri=0 xi=0 pa=0000534b000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=000062f9000 c=0 d=1 v=1 g=1] Index: 80 pgmask=4kb va=c00000000005e000 asid=8e [ri=0 xi=0 pa=00000000000 c=0 d=0 v=0 g=1] [ri=0 xi=0 pa=00004013000 c=0 d=1 v=1 g=1] Index: 81 pgmask=4kb va=c0000000003a0000 asid=8e [ri=0 xi=0 pa=000060c6000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=0000340e000 c=0 d=1 v=1 g=1] Index: 82 pgmask=4kb va=c00000000039e000 asid=8e [ri=0 xi=0 pa=00000000000 c=0 d=0 v=0 g=1] [ri=0 xi=0 pa=000060c5000 c=0 d=1 v=1 g=1] Index: 83 pgmask=4kb va=c00000000003e000 asid=8e [ri=0 xi=0 pa=00002bf3000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00002c42000 c=0 d=1 v=1 g=1] Index: 84 pgmask=4kb va=c000000000042000 asid=8e [ri=0 xi=0 pa=00002c45000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00002c46000 c=0 d=1 v=1 g=1] Index: 85 pgmask=4kb va=0aaab820000 asid=bd [ri=0 xi=0 pa=00000000000 c=0 d=0 v=0 g=0] [ri=0 xi=0 pa=00000000000 c=0 d=0 v=0 g=0] Index: 86 pgmask=4kb va=0aaab748000 asid=bd [ri=0 xi=1 pa=0003c959000 c=0 d=1 v=1 g=0] [ri=0 xi=1 pa=0000f7b6000 c=0 d=0 v=1 g=0] Index: 87 pgmask=4kb va=0fff37c4000 asid=bd [ri=0 xi=0 pa=0000bd23000 c=0 d=0 v=1 g=0] [ri=0 xi=0 pa=0000bd24000 c=0 d=0 v=1 g=0] Index: 88 pgmask=4kb va=0fff3992000 asid=bd [ri=0 xi=1 pa=0000bfcd000 c=0 d=0 v=1 g=0] [ri=0 xi=1 pa=0002977b000 c=0 d=0 v=1 g=0] Index: 89 pgmask=4kb va=0fff3288000 asid=bd [ri=0 xi=0 pa=00032b62000 c=0 d=0 v=1 g=0] [ri=0 xi=0 pa=00032b63000 c=0 d=0 v=1 g=0] Index: 90 pgmask=4kb va=0fff3982000 asid=bd [ri=0 xi=1 pa=0002d6a3000 c=0 d=1 v=1 g=0] [ri=0 xi=1 pa=0002a423000 c=0 d=0 v=1 g=0] Index: 91 pgmask=4kb va=0fffbb5e000 asid=bd [ri=0 xi=0 pa=00028949000 c=0 d=1 v=1 g=0] [ri=0 xi=0 pa=00035060000 c=0 d=1 v=1 g=0] Index: 92 pgmask=4kb va=c00000fefffe2000 asid=8e [ri=0 xi=0 pa=000020f0000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=000020ff000 c=0 d=1 v=1 g=1] Index: 93 pgmask=4kb va=c00000fefffd2000 asid=8e [ri=0 xi=0 pa=000020b7000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=000020fe000 c=0 d=1 v=1 g=1] Index: 94 pgmask=4kb va=c00000fefffc2000 asid=8e [ri=0 xi=0 pa=000020b6000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=000020fd000 c=0 d=1 v=1 g=1] Index: 110 pgmask=4kb va=c00000feffff2000 asid=bc [ri=0 xi=0 pa=000020f1000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00002100000 c=0 d=1 v=1 g=1] Index: 125 pgmask=4kb va=c00000fefffbe000 asid=bc [ri=0 xi=0 pa=00005268000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=000053dc000 c=0 d=1 v=1 g=1] Index: 126 pgmask=4kb va=c00000fefffbc000 asid=bc [ri=0 xi=0 pa=00005266000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00005267000 c=0 d=1 v=1 g=1] Index: 188 pgmask=4kb va=c00000fefff76000 asid=bb [ri=0 xi=0 pa=00007576000 c=0 d=1 v=1 g=1] [ri=0 xi=0 pa=00007577000 c=0 d=1 v=1 g=1] Code: 1000ff92 00601025 00000000 <42000006> 1000ffb8 00000000 00000000 8f820018 00021238 Kernel panic - not syncing: Caught Machine Check exception - caused by multiple matching entries in the TLB. ---[ end Kernel panic - not syncing: Caught Machine Check exception - caused by multiple matching entries in the TLB. ]---
On Fri, Jun 16, 2023 at 9:54 PM Yu Zhao <yuzhao@google.com> wrote: > > On Thu, Jun 15, 2023 at 04:02:43PM -0700, Hugh Dickins wrote: > > To keep balance in future, __update_tlb() remember to pte_unmap() after > > pte_offset_map(). This is an odd case, since the caller has already done > > pte_offset_map_lock(), then mips forgets the address and recalculates it; > > but my two naive attempts to clean that up did more harm than good. > > > > Tested-by: Nathan Chancellor <nathan@kernel.org> > > Signed-off-by: Hugh Dickins <hughd@google.com> > > FWIW: Tested-by: Yu Zhao <yuzhao@google.com> > > There is another problem, likely caused by khugepaged, happened multiple times. But I don't think it's related to your series, just FYI. > > Got mcheck at ffffffff81134ef0 > CPU: 3 PID: 36 Comm: khugepaged Not tainted 6.4.0-rc6-00049-g62d8779610bb-dirty #1 ... > Kernel panic - not syncing: Caught Machine Check exception - caused by multiple matching entries in the TLB. In case anyone plans to try to fix this - the problem goes back to at least 5.15 stable. My (educated) guess is that nobody complained about it because all the testing is done in QEMU, which does NOT detect conflicting TLBs. This means the verification of the fix would need to be on a real piece of h/w or an updated QEMU. In target/mips/tcg/sysemu/tlb_helper.c: static void r4k_fill_tlb(CPUMIPSState *env, int idx) { r4k_tlb_t *tlb; uint64_t mask = env->CP0_PageMask >> (TARGET_PAGE_BITS + 1); /* XXX: detect conflicting TLBs and raise a MCHECK exception when needed */ ...
diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c index 1b939abbe4ca..93c2d695588a 100644 --- a/arch/mips/mm/tlb-r4k.c +++ b/arch/mips/mm/tlb-r4k.c @@ -297,7 +297,7 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte) p4d_t *p4dp; pud_t *pudp; pmd_t *pmdp; - pte_t *ptep; + pte_t *ptep, *ptemap = NULL; int idx, pid; /* @@ -344,7 +344,12 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte) } else #endif { - ptep = pte_offset_map(pmdp, address); + ptemap = ptep = pte_offset_map(pmdp, address); + /* + * update_mmu_cache() is called between pte_offset_map_lock() + * and pte_unmap_unlock(), so we can assume that ptep is not + * NULL here: and what should be done below if it were NULL? + */ #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32) #ifdef CONFIG_XPA @@ -373,6 +378,9 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte) tlbw_use_hazard(); htw_start(); flush_micro_tlb_vm(vma); + + if (ptemap) + pte_unmap(ptemap); local_irq_restore(flags); }