Message ID: 20230228213738.272178-19-willy@infradead.org (mailing list archive)
State: Superseded
On 2023-02-28 4:37 p.m., Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
> and flush_icache_pages(). Change the PG_arch_1 (aka PG_dcache_dirty) flag
> from being per-page to per-folio.

I have tested this change on rp3440 at mainline commit e492250d5252635b6c97d52eddf2792ec26f1ec1
and c8000 at mainline commit ee3f96b164688dae21e2466a57f2e806b64e8a37.

So far, I haven't seen any issues on c8000.  On rp3440, I saw the following:

_swap_info_get: Unused swap offset entry 00000320
BUG: Bad page map in process buildd  pte:00032100 pmd:003606c3
addr:0000000000482000 vm_flags:00100077 anon_vma:0000000066f61340 mapping:0000000000000000 index:482
file:(null) fault:0x0 mmap:0x0 read_folio:0x0
CPU: 0 PID: 6813 Comm: buildd Not tainted 6.2.0+ #1
Hardware name: 9000/800/rp3440
Backtrace:
 [<000000004020af50>] show_stack+0x70/0x90
 [<0000000040b7d408>] dump_stack_lvl+0xd8/0x128
 [<0000000040b7d48c>] dump_stack+0x34/0x48
 [<00000000404513a4>] print_bad_pte+0x24c/0x318
 [<00000000404560dc>] zap_pte_range+0x8d4/0x958
 [<0000000040456398>] unmap_page_range+0x1d8/0x490
 [<000000004045681c>] unmap_vmas+0x10c/0x1a8
 [<0000000040466330>] exit_mmap+0x198/0x4a0
 [<0000000040235cbc>] mmput+0x114/0x2a8
 [<0000000040244e90>] do_exit+0x4e0/0xc68
 [<0000000040245938>] do_group_exit+0x68/0x128
 [<000000004025967c>] get_signal+0xae4/0xb60
 [<000000004021a570>] do_signal+0x50/0x228
 [<000000004021ab38>] do_notify_resume+0x68/0x150
 [<00000000402030b4>] intr_check_sig+0x38/0x3c
Disabling lock debugging due to kernel taint

_swap_info_get: Unused swap offset entry 000003a9
BUG: Bad page map in process buildd  pte:0003a940 pmd:003606c3
addr:0000000000523000 vm_flags:00100077 anon_vma:0000000066f61340 mapping:0000000000000000 index:523
file:(null) fault:0x0 mmap:0x0 read_folio:0x0
CPU: 2 PID: 6813 Comm: buildd Tainted: G B 6.2.0+ #1
Hardware name: 9000/800/rp3440
Backtrace:
 [<000000004020af50>] show_stack+0x70/0x90
 [<0000000040b7d408>] dump_stack_lvl+0xd8/0x128
 [<0000000040b7d48c>] dump_stack+0x34/0x48
 [<00000000404513a4>] print_bad_pte+0x24c/0x318
 [<00000000404560dc>] zap_pte_range+0x8d4/0x958
 [<0000000040456398>] unmap_page_range+0x1d8/0x490
 [<000000004045681c>] unmap_vmas+0x10c/0x1a8
 [<0000000040466330>] exit_mmap+0x198/0x4a0
 [<0000000040235cbc>] mmput+0x114/0x2a8
 [<0000000040244e90>] do_exit+0x4e0/0xc68
 [<0000000040245938>] do_group_exit+0x68/0x128
 [<000000004025967c>] get_signal+0xae4/0xb60
 [<000000004021a570>] do_signal+0x50/0x228
 [<000000004021ab38>] do_notify_resume+0x68/0x150
 [<00000000402030b4>] intr_check_sig+0x38/0x3c

[...]

pagefault_out_of_memory: 1158973 callbacks suppressed
Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF

Rebooted rp3440.  Since then, I haven't seen any more problems.

Dave
On 2023-03-02 11:43 a.m., John David Anglin wrote:
> On 2023-02-28 4:37 p.m., Matthew Wilcox (Oracle) wrote:
>> Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
>> and flush_icache_pages(). Change the PG_arch_1 (aka PG_dcache_dirty) flag
>> from being per-page to per-folio.
> I have tested this change on rp3440 at mainline commit e492250d5252635b6c97d52eddf2792ec26f1ec1
> and c8000 at mainline commit ee3f96b164688dae21e2466a57f2e806b64e8a37.

Here's another one:

------------[ cut here ]------------
kernel BUG at mm/memory.c:3865!
CPU: 1 PID: 6972 Comm: sbuild Not tainted 6.2.0+ #1
Hardware name: 9000/800/rp3440

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001101111111100001111 Not tainted
r00-03  000000000806ff0f 000000004fab8d40 00000000404584b0 000000004fab8d40
r04-07  0000000040c2f4c0 0000000047fe60c0 000000004fab8b98 0000000000000953
r08-11  000000004de3de00 0000000000000000 0000000047fe60c0 0000004093ff4660
r12-15  0000000000000001 0000000047fe60c0 0000000040000540 000000022f8e9540
r16-19  0000000000000000 000000004c694c40 000000004fab8860 00000000000003d0
r20-23  0000000007be3a40 0000000000000fff 0000000000000000 000000004109f1a0
r24-27  0000000000000000 0000000000000cc0 0000000046de3a68 0000000040c2f4c0
r28-31  80e00000000a0435 000000004fab8df0 000000004fab8e20 0000000000000001
sr00-03  0000000000207c00 0000000000000000 0000000000000000 0000000002f11c00
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004045908c 0000000040459090
 IIR: 03ffe01f    ISR: 0000000000000000  IOR: 0000000000000000
 CPU:        1   CR30: 0000004095d64c20 CR31: ffffffffffffffff
 ORIG_R28: 000000001c569ad0
 IAOQ[0]: do_swap_page+0x108c/0x1168
 IAOQ[1]: do_swap_page+0x1090/0x1168
 RP(r2): do_swap_page+0x4b0/0x1168
Backtrace:
 [<000000004045a554>] handle_pte_fault+0x244/0x358
 [<000000004045c58c>] __handle_mm_fault+0x104/0x1b8
 [<000000004045c81c>] handle_mm_fault+0x1dc/0x318
 [<000000004044cb38>] faultin_page+0xa8/0x178
 [<000000004044e848>] __get_user_pages+0x328/0x560
 [<0000000040450ac4>] get_dump_page+0x9c/0x128
 [<0000000040596cb8>] dump_user_range+0xc0/0x2d8
 [<000000004058e790>] elf_core_dump+0x5f8/0x708
 [<0000000040596384>] do_coredump+0xc2c/0x14a0
 [<0000000040259040>] get_signal+0x4a8/0xb60
 [<000000004021a570>] do_signal+0x50/0x228
 [<000000004021ab38>] do_notify_resume+0x68/0x150
 [<0000000040203ee0>] syscall_do_signal+0x54/0xa0

CPU: 1 PID: 6972 Comm: sbuild Not tainted 6.2.0+ #1
Hardware name: 9000/800/rp3440
Backtrace:
 [<000000004020af50>] show_stack+0x70/0x90
 [<0000000040b7d408>] dump_stack_lvl+0xd8/0x128
 [<0000000040b7d48c>] dump_stack+0x34/0x48
 [<000000004020b160>] die_if_kernel+0x1d0/0x388
 [<000000004020c1c4>] handle_interruption+0xc34/0xc88
 [<000000004020307c>] intr_check_sig+0x0/0x3c
---[ end trace 0000000000000000 ]---
note: sbuild[6972] exited with preempt_count 1

Dave
On 2023-03-02 3:40 p.m., John David Anglin wrote:
> On 2023-03-02 11:43 a.m., John David Anglin wrote:
>> On 2023-02-28 4:37 p.m., Matthew Wilcox (Oracle) wrote:
>>> Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
>>> and flush_icache_pages(). Change the PG_arch_1 (aka PG_dcache_dirty) flag
>>> from being per-page to per-folio.
>> I have tested this change on rp3440 at mainline commit e492250d5252635b6c97d52eddf2792ec26f1ec1
>> and c8000 at mainline commit ee3f96b164688dae21e2466a57f2e806b64e8a37.
> Here's another one:
>
> ------------[ cut here ]------------
> kernel BUG at mm/memory.c:3865!
> CPU: 1 PID: 6972 Comm: sbuild Not tainted 6.2.0+ #1
> Hardware name: 9000/800/rp3440
>
>      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> PSW: 00001000000001101111111100001111 Not tainted
> r00-03  000000000806ff0f 000000004fab8d40 00000000404584b0 000000004fab8d40
> r04-07  0000000040c2f4c0 0000000047fe60c0 000000004fab8b98 0000000000000953
> r08-11  000000004de3de00 0000000000000000 0000000047fe60c0 0000004093ff4660
> r12-15  0000000000000001 0000000047fe60c0 0000000040000540 000000022f8e9540
> r16-19  0000000000000000 000000004c694c40 000000004fab8860 00000000000003d0
> r20-23  0000000007be3a40 0000000000000fff 0000000000000000 000000004109f1a0
> r24-27  0000000000000000 0000000000000cc0 0000000046de3a68 0000000040c2f4c0
> r28-31  80e00000000a0435 000000004fab8df0 000000004fab8e20 0000000000000001
> sr00-03  0000000000207c00 0000000000000000 0000000000000000 0000000002f11c00
> sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
>
> IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004045908c 0000000040459090
>  IIR: 03ffe01f    ISR: 0000000000000000  IOR: 0000000000000000
>  CPU:        1   CR30: 0000004095d64c20 CR31: ffffffffffffffff
>  ORIG_R28: 000000001c569ad0
>  IAOQ[0]: do_swap_page+0x108c/0x1168
>  IAOQ[1]: do_swap_page+0x1090/0x1168
>  RP(r2): do_swap_page+0x4b0/0x1168
> Backtrace:
>  [<000000004045a554>] handle_pte_fault+0x244/0x358
>  [<000000004045c58c>] __handle_mm_fault+0x104/0x1b8
>  [<000000004045c81c>] handle_mm_fault+0x1dc/0x318
>  [<000000004044cb38>] faultin_page+0xa8/0x178
>  [<000000004044e848>] __get_user_pages+0x328/0x560
>  [<0000000040450ac4>] get_dump_page+0x9c/0x128
>  [<0000000040596cb8>] dump_user_range+0xc0/0x2d8
>  [<000000004058e790>] elf_core_dump+0x5f8/0x708
>  [<0000000040596384>] do_coredump+0xc2c/0x14a0
>  [<0000000040259040>] get_signal+0x4a8/0xb60
>  [<000000004021a570>] do_signal+0x50/0x228
>  [<000000004021ab38>] do_notify_resume+0x68/0x150
>  [<0000000040203ee0>] syscall_do_signal+0x54/0xa0

I removed the new page table API change and still see a swap issue on rp3440,
so these bugs are probably unrelated to the API change.

get_swap_device: Bad swap file entry 600000000014ee20
get_swap_device: Bad swap file entry 600000000014ee20
[...]
get_swap_device: Bad swap file entry 600000000014ee20
_swap_info_get: Bad swap file entry 600000000014ee20
BUG: Bad page map in process sh  pte:14ee2418 pmd:01372913
addr:00000000f8406000 vm_flags:00000075 anon_vma:0000000000000000 mapping:000000007f67e1a8 index:25
file:libc.so.6 fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap [xfs] read_folio:xfs_vm_read_folio [xfs]
CPU: 3 PID: 12702 Comm: sh Not tainted 6.2.0+ #1
Hardware name: 9000/800/rp3440
Backtrace:
 [<000000004020ac50>] show_stack+0x70/0x90
 [<0000000040b7c148>] dump_stack_lvl+0xd8/0x128
 [<0000000040b7c1cc>] dump_stack+0x34/0x48
 [<000000004045020c>] print_bad_pte+0x24c/0x318
 [<0000000040454f78>] zap_pte_range+0x908/0x990
 [<0000000040455238>] unmap_page_range+0x1d8/0x490
 [<00000000404556bc>] unmap_vmas+0x10c/0x1a8
 [<0000000040465278>] exit_mmap+0x198/0x4a0
 [<0000000040234a3c>] mmput+0x114/0x2a8
 [<0000000040243c10>] do_exit+0x4e0/0xc68
 [<00000000402446b8>] do_group_exit+0x68/0x128
 [<00000000402583fc>] get_signal+0xae4/0xb60
 [<0000000040219310>] do_signal+0x50/0x228
 [<00000000402198d8>] do_notify_resume+0x68/0x150
 [<00000000402030b4>] intr_check_sig+0x38/0x3c

Dave
diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index ff07c509e04b..0bf8b69d086b 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -46,16 +46,20 @@ void invalidate_kernel_vmap_range(void *vaddr, int size);
 #define flush_cache_vmap(start, end)		flush_cache_all()
 #define flush_cache_vunmap(start, end)		flush_cache_all()
 
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-void flush_dcache_page(struct page *page);
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
 
 #define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->i_pages)
 #define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->i_pages)
 
-#define flush_icache_page(vma,page)	do {		\
-	flush_kernel_dcache_page_addr(page_address(page)); \
-	flush_kernel_icache_page(page_address(page)); 	\
-} while (0)
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+		unsigned int nr);
+#define flush_icache_page(vma, page)	flush_icache_pages(vma, page, 1)
 
 #define flush_icache_range(s,e)		do { 		\
 	flush_kernel_dcache_range_asm(s,e); 		\
diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index e2950f5db7c9..78ee9816f423 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -73,14 +73,7 @@ extern void __update_cache(pte_t pte);
 		mb();				\
 	} while(0)
 
-#define set_pte_at(mm, addr, pteptr, pteval)		\
-	do {						\
-		if (pte_present(pteval) &&		\
-		    pte_user(pteval))			\
-			__update_cache(pteval);		\
-		*(pteptr) = (pteval);			\
-		purge_tlb_entries(mm, addr);		\
-	} while (0)
+#define set_pte_at(mm, addr, ptep, pte)	set_ptes(mm, addr, ptep, pte, 1)
 
 #endif /* !__ASSEMBLY__ */
 
@@ -391,11 +384,28 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 
 extern void paging_init (void);
 
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	if (pte_present(pte) && pte_user(pte))
+		__update_cache(pte);
+	for (;;) {
+		*ptep = pte;
+		purge_tlb_entries(mm, addr);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += 1 << PFN_PTE_SHIFT;
+		addr += PAGE_SIZE;
+	}
+}
+
 /* Used for deferring calls to flush_dcache_page() */
 
 #define PG_dcache_dirty         PG_arch_1
 
-#define update_mmu_cache(vms,addr,ptep) __update_cache(*ptep)
+#define update_mmu_cache_range(vma, addr, ptep, nr) __update_cache(*ptep)
+#define update_mmu_cache(vma, addr, ptep) __update_cache(*ptep)
 
 /*
  * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index 984d3a1b3828..16057812103b 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -92,11 +92,11 @@ static inline void flush_data_cache(void)
 /* Kernel virtual address of pfn. */
 #define pfn_va(pfn)	__va(PFN_PHYS(pfn))
 
-void
-__update_cache(pte_t pte)
+void __update_cache(pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);
-	struct page *page;
+	struct folio *folio;
+	unsigned int nr;
 
 	/* We don't have pte special.  As a result, we can be called with
 	   an invalid pfn and we don't need to flush the kernel dcache page.
@@ -104,13 +104,17 @@ __update_cache(pte_t pte)
 	if (!pfn_valid(pfn))
 		return;
 
-	page = pfn_to_page(pfn);
-	if (page_mapping_file(page) &&
-	    test_bit(PG_dcache_dirty, &page->flags)) {
-		flush_kernel_dcache_page_addr(pfn_va(pfn));
-		clear_bit(PG_dcache_dirty, &page->flags);
+	folio = page_folio(pfn_to_page(pfn));
+	pfn = folio_pfn(folio);
+	nr = folio_nr_pages(folio);
+	if (folio_flush_mapping(folio) &&
+	    test_bit(PG_dcache_dirty, &folio->flags)) {
+		while (nr--)
+			flush_kernel_dcache_page_addr(pfn_va(pfn + nr));
+		clear_bit(PG_dcache_dirty, &folio->flags);
 	} else if (parisc_requires_coherency())
-		flush_kernel_dcache_page_addr(pfn_va(pfn));
+		while (nr--)
+			flush_kernel_dcache_page_addr(pfn_va(pfn + nr));
 }
 
 void
@@ -365,6 +369,20 @@ static void flush_user_cache_page(struct vm_area_struct *vma, unsigned long vmad
 	preempt_enable();
 }
 
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+		unsigned int nr)
+{
+	void *kaddr = page_address(page);
+
+	for (;;) {
+		flush_kernel_dcache_page_addr(kaddr);
+		flush_kernel_icache_page(kaddr);
+		if (--nr == 0)
+			break;
+		page += PAGE_SIZE;
+	}
+}
+
 static inline pte_t *get_ptep(struct mm_struct *mm, unsigned long addr)
 {
 	pte_t *ptep = NULL;
@@ -393,26 +411,30 @@ static inline bool pte_needs_flush(pte_t pte)
 		== (_PAGE_PRESENT | _PAGE_ACCESSED);
 }
 
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
-	struct address_space *mapping = page_mapping_file(page);
-	struct vm_area_struct *mpnt;
-	unsigned long offset;
+	struct address_space *mapping = folio_flush_mapping(folio);
+	struct vm_area_struct *vma;
 	unsigned long addr, old_addr = 0;
+	void *kaddr;
 	unsigned long count = 0;
+	unsigned long i, nr;
 	pgoff_t pgoff;
 
 	if (mapping && !mapping_mapped(mapping)) {
-		set_bit(PG_dcache_dirty, &page->flags);
+		set_bit(PG_dcache_dirty, &folio->flags);
 		return;
 	}
 
-	flush_kernel_dcache_page_addr(page_address(page));
+	nr = folio_nr_pages(folio);
+	kaddr = folio_address(folio);
+	for (i = 0; i < nr; i++)
+		flush_kernel_dcache_page_addr(kaddr + i * PAGE_SIZE);
 
 	if (!mapping)
 		return;
 
-	pgoff = page->index;
+	pgoff = folio->index;
 
 	/*
	 * We have carefully arranged in arch_get_unmapped_area() that
@@ -422,15 +444,29 @@ void flush_dcache_page(struct page *page)
 	 * on machines that support equivalent aliasing
 	 */
 	flush_dcache_mmap_lock(mapping);
-	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
-		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
-		addr = mpnt->vm_start + offset;
-		if (parisc_requires_coherency()) {
-			pte_t *ptep;
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff + nr - 1) {
+		unsigned long offset = pgoff - vma->vm_pgoff;
+		unsigned long pfn = folio_pfn(folio);
+
+		addr = vma->vm_start;
+		nr = folio_nr_pages(folio);
+		if (offset > -nr) {
+			pfn -= offset;
+			nr += offset;
+		} else {
+			addr += offset * PAGE_SIZE;
+		}
+		if (addr + nr * PAGE_SIZE > vma->vm_end)
+			nr = (vma->vm_end - addr) / PAGE_SIZE;
 
-			ptep = get_ptep(mpnt->vm_mm, addr);
-			if (ptep && pte_needs_flush(*ptep))
-				flush_user_cache_page(mpnt, addr);
+		if (parisc_requires_coherency()) {
+			for (i = 0; i < nr; i++) {
+				pte_t *ptep = get_ptep(vma->vm_mm,
+							addr + i * PAGE_SIZE);
+				if (ptep && pte_needs_flush(*ptep))
+					flush_user_cache_page(vma,
+							addr + i * PAGE_SIZE);
+			}
 		} else {
 			/*
 			 * The TLB is the engine of coherence on parisc:
@@ -443,27 +479,32 @@ void flush_dcache_page(struct page *page)
 			 * in (until the user or kernel specifically
 			 * accesses it, of course)
 			 */
-			flush_tlb_page(mpnt, addr);
+			for (i = 0; i < nr; i++)
+				flush_tlb_page(vma, addr + i * PAGE_SIZE);
 			if (old_addr == 0 || (old_addr & (SHM_COLOUR - 1))
 					!= (addr & (SHM_COLOUR - 1))) {
-				__flush_cache_page(mpnt, addr, page_to_phys(page));
+				for (i = 0; i < nr; i++)
+					__flush_cache_page(vma,
+						addr + i * PAGE_SIZE,
+						(pfn + i) * PAGE_SIZE);
 				/*
 				 * Software is allowed to have any number
 				 * of private mappings to a page.
 				 */
-				if (!(mpnt->vm_flags & VM_SHARED))
+				if (!(vma->vm_flags & VM_SHARED))
 					continue;
 				if (old_addr)
 					pr_err("INEQUIVALENT ALIASES 0x%lx and 0x%lx in file %pD\n",
-						old_addr, addr, mpnt->vm_file);
-				old_addr = addr;
+						old_addr, addr, vma->vm_file);
+				if (nr == folio_nr_pages(folio))
+					old_addr = addr;
 			}
 		}
 		WARN_ON(++count == 4096);
 	}
 	flush_dcache_mmap_unlock(mapping);
 }
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);
 
 /* Defined in arch/parisc/kernel/pacache.S */
 EXPORT_SYMBOL(flush_kernel_dcache_range_asm);
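The subtle part of the flush_dcache_folio() hunk above is the offset/nr arithmetic inside the interval-tree walk: it narrows the flush to just the pages of the folio that a given VMA actually maps, then clamps to the VMA end. Below is a small userspace sketch of that arithmetic with made-up numbers. It is not kernel code, and it uses signed arithmetic for readability where the patch expresses the same "folio starts before the VMA" test as "offset > -nr" on unsigned values.

/*
 * Standalone illustration (not kernel code) of the clamping arithmetic
 * flush_dcache_folio() uses above to pick out the slice of a folio that
 * one particular VMA maps.  All values are invented for the example.
 */
#include <stdio.h>

#define PAGE_SIZE 4096UL

int main(void)
{
	unsigned long folio_pfn   = 0x1000;	/* first pfn of the folio */
	unsigned long folio_nr    = 4;		/* folio_nr_pages() */
	unsigned long folio_pgoff = 100;	/* folio->index */

	/* hypothetical VMA: maps file pages 102.. at this user address */
	unsigned long vm_start = 0x20000, vm_end = 0x2a000, vm_pgoff = 102;

	long offset = (long)folio_pgoff - (long)vm_pgoff;	/* here: -2 */
	unsigned long pfn = folio_pfn, nr = folio_nr, addr = vm_start;

	if (offset < 0) {		/* folio begins before this VMA */
		pfn -= offset;		/* skip the pages below vm_start */
		nr += offset;
	} else {			/* folio begins inside this VMA */
		addr += offset * PAGE_SIZE;
	}
	if (addr + nr * PAGE_SIZE > vm_end)	/* clamp to the VMA end */
		nr = (vm_end - addr) / PAGE_SIZE;

	/* expected: flush 2 page(s) from pfn 0x1002 at addr 0x20000 */
	printf("flush %lu page(s) from pfn %#lx at addr %#lx\n", nr, pfn, addr);
	return 0;
}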
Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
and flush_icache_pages().  Change the PG_arch_1 (aka PG_dcache_dirty)
flag from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
---
 arch/parisc/include/asm/cacheflush.h |  14 ++--
 arch/parisc/include/asm/pgtable.h    |  28 +++++---
 arch/parisc/kernel/cache.c           | 101 +++++++++++++++++++--------
 3 files changed, 99 insertions(+), 44 deletions(-)
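As a rough illustration of the contract the commit message describes for set_ptes(), here is a userspace model, not kernel code: the pte_t typedef, PFN_PTE_SHIFT value and page_table array below are invented for the sketch, and only the looping behaviour mirrors the patch. One call installs nr consecutive PTEs, advancing the encoded PFN by one page per entry, which is what lets a whole folio be mapped in a single call.

#include <stdio.h>

typedef unsigned long pte_t;

#define PFN_PTE_SHIFT	12	/* assumed: pfn lives in the high bits */
#define NR_SLOTS	8

static pte_t page_table[NR_SLOTS];

/* Mirrors only the looping behaviour of the set_ptes() added by the patch. */
static void set_ptes_model(pte_t *ptep, pte_t pte, unsigned int nr)
{
	for (;;) {
		*ptep = pte;			/* install this page's entry */
		if (--nr == 0)
			break;
		ptep++;				/* next page-table slot...   */
		pte += 1UL << PFN_PTE_SHIFT;	/* ...maps the next pfn      */
	}
}

int main(void)
{
	/* map a 4-page folio starting at pfn 0x1000 into slots 2..5 */
	set_ptes_model(&page_table[2], ((pte_t)0x1000 << PFN_PTE_SHIFT) | 0x1, 4);

	for (int i = 0; i < NR_SLOTS; i++)
		printf("pte[%d] = %#lx\n", i, page_table[i]);
	return 0;
}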