Message ID | 20191101140942.51554-1-steven.price@arm.com (mailing list archive) |
---|---|
Series | Generic page walk and ptdump |
On Fri, 2019-11-01 at 14:09 +0000, Steven Price wrote: > Many architectures current have a debugfs file for dumping the kernel > page tables. Currently each architecture has to implement custom > functions for this because the details of walking the page tables used > by the kernel are different between architectures. > > This series extends the capabilities of walk_page_range() so that it can > deal with the page tables of the kernel (which have no VMAs and can > contain larger huge pages than exist for user space). A generic PTDUMP > implementation is the implemented making use of the new functionality of > walk_page_range() and finally arm64 and x86 are switch to using it, > removing the custom table walkers. > > To enable a generic page table walker to walk the unusual mappings of > the kernel we need to implement a set of functions which let us know > when the walker has reached the leaf entry. After a suggestion from Will > Deacon I've chosen the name p?d_leaf() as this (hopefully) describes > the purpose (and is a new name so has no historic baggage). Some > architectures have p?d_large macros but this is easily confused with > "large pages". > > This series ends with a generic PTDUMP implemention for arm64 and x86. > > Mostly this is a clean up and there should be very little functional > change. The exceptions are: > > * arm64 PTDUMP debugfs now displays pages which aren't present (patch 22). > > * arm64 has the ability to efficiently process KASAN pages (which > previously only x86 implemented). This means that the combination of > KASAN and DEBUG_WX is now useable. > > Also available as a git tree: > git://linux-arm.org/linux-sp.git walk_page_range/v15 > > Changes since v14: > https://lore.kernel.org/lkml/20191028135910.33253-1-steven.price@arm.com/ > * Switch walk_page_range() into two functions, the existing > walk_page_range() now still requires VMAs (and treats areas without a > VMA as a 'hole'). 
The new walk_page_range_novma() ignores VMAs and > will report the actual page table layout. This fixes the previous > breakage of /proc/<pid>/pagemap > * New patch at the end of the series which reduces the 'level' numbers > by 1 to simplify the code slightly > * Added tags Does this new version also take care of this boot crash seen with v14? Suppose it is now breaking CONFIG_EFI_PGT_DUMP=y? The full config is, https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config [ 10.550313][ T0] Switched APIC routing to physical flat. [ 10.563899][ T0] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 [ 10.614633][ T0] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1fa6f481074, max_idle_ns: 440795311917 ns [ 10.625979][ T0] Calibrating delay loop (skipped), value calculated using timer frequency.. 4391.73 BogoMIPS (lpj=21958690) [ 10.635990][ T0] pid_max: default: 131072 minimum: 1024 [ 11.259736][ T0] ---[ User Space ]--- [ 11.263737][ T0] 0x0000000000000000- 0x0000000000001000 4K RW x pte [ 11.266028][ T0] 0x0000000000001000- 0x0000000000200000 2044K pte [ 11.275992][ T0] 0x0000000000200000- 0x0000000004000000 62M pmd [ 11.285998][ T0] 0x0000000004000000- 0x0000000004076000 472K pte [ 11.296019][ T0] 0x0000000004076000- 0x0000000004200000 1576K pte [ 11.305997][ T0] 0x0000000004200000- 0x0000000011000000 206M pmd [ 11.316008][ T0] 0x0000000011000000- 0x0000000011100000 1M pte [ 11.326008][ T0] 0x0000000011100000- 0x0000000011200000 1M pte [ 11.335990][ T0] 0x0000000011200000- 0x0000000011800000 6M pmd [ 11.346054][ T0] ================================================================== [ 11.354074][ T0] BUG: KASAN: wild-memory-access in ptdump_pte_entry+0x39/0x60 [ 11.355975][ T0] Read of size 8 at addr 000f887fee5ff000 by task swapper/0/0 [ 11.355975][ T0] [ 11.355975][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc5-mm1+ #1 [ 11.355975][ T0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 [ 
11.355975][ T0] Call Trace: [ 11.355975][ T0] dump_stack+0xa0/0xea [ 11.355975][ T0] __kasan_report.cold.7+0xb0/0xc0 [ 11.355975][ T0] ? note_page+0x7f8/0xa70 [ 11.355975][ T0] ? ptdump_pte_entry+0x39/0x60 [ 11.355975][ T0] ? ptdump_walk_pgd_level_core+0x1b0/0x1b0 [ 11.355975][ T0] kasan_report+0x12/0x20 [ 11.355975][ T0] __asan_load8+0x71/0xa0 [ 11.355975][ T0] ptdump_pte_entry+0x39/0x60 [ 11.355975][ T0] walk_pgd_range+0xb75/0xce0 [ 11.355975][ T0] __walk_page_range+0x206/0x230 [ 11.355975][ T0] ? vmacache_find+0x3a/0x170 [ 11.355975][ T0] walk_page_range+0x136/0x210 [ 11.355975][ T0] ? __walk_page_range+0x230/0x230 [ 11.355975][ T0] ? find_held_lock+0xca/0xf0 [ 11.355975][ T0] ptdump_walk_pgd+0x76/0xd0 [ 11.355975][ T0] ptdump_walk_pgd_level_core+0x13b/0x1b0 [ 11.355975][ T0] ? hugetlb_get_unmapped_area+0x5b0/0x5b0 [ 11.355975][ T0] ? trace_hardirqs_on+0x3a/0x160 [ 11.355975][ T0] ? ptdump_walk_pgd_level_core+0x1b0/0x1b0 [ 11.355975][ T0] ? efi_delete_dummy_variable+0xa9/0xd0 [ 11.355975][ T0] ? __enc_copy+0x90/0x90 [ 11.355975][ T0] ptdump_walk_pgd_level+0x15/0x20 [ 11.355975][ T0] efi_dump_pagetable+0x35/0x37 [ 11.355975][ T0] efi_enter_virtual_mode+0x72a/0x737 [ 11.355975][ T0] start_kernel+0x607/0x6a9 [ 11.355975][ T0] ? thread_stack_cache_init+0xb/0xb [ 11.355975][ T0] ? 
idt_setup_from_table+0xd9/0x130 [ 11.355975][ T0] x86_64_start_reservations+0x24/0x26 [ 11.355975][ T0] x86_64_start_kernel+0xf4/0xfb [ 11.355975][ T0] secondary_startup_64+0xb6/0xc0 [ 11.355975][ T0] ================================================================== [ 11.355975][ T0] Disabling lock debugging due to kernel taint [ 11.355991][ T0] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI [ 11.364049][ T0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G B 5.4.0-rc5-mm1+ #1 [ 11.365975][ T0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 [ 11.365975][ T0] RIP: 0010:ptdump_pte_entry+0x39/0x60 [ 11.365975][ T0] Code: 55 41 54 49 89 fc 48 8d 79 18 53 48 89 cb e8 5e 0e fa ff 48 8b 5b 18 48 89 df e8 52 0e fa ff 4c 89 e7 4c 8b 2b e8 47 0e fa ff <49> 8b 0c 24 4c 89 f6 48 89 df ba 05 00 00 00 e8 03 1d 9b 00 31 c0 [ 11.365975][ T0] RSP: 0000:ffffffffaf8079d0 EFLAGS: 00010282 [ 11.365975][ T0] RAX: 0000000000000000 RBX: ffffffffaf807cf0 RCX: ffffffffae374306 [ 11.365975][ T0] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffffffffafef2bf4 [ 11.365975][ T0] RBP: ffffffffaf8079f0 R08: fffffbfff5fdbb22 R09: fffffbfff5fdbb22 [ 11.365975][ T0] R10: fffffbfff5fdbb21 R11: ffffffffafedd90b R12: 000f887fee5ff000 [ 11.365975][ T0] R13: ffffffffae2aee40 R14: 0000000011a00000 R15: 0000000011a01000 [ 11.365975][ T0] FS: 0000000000000000(0000) GS:ffff888843400000(0000) knlGS:0000000000000000 [ 11.365975][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 11.365975][ T0] CR2: ffff8890779ff000 CR3: 0000000baf412000 CR4: 00000000000406b0 [ 11.365975][ T0] Call Trace: [ 11.365975][ T0] walk_pgd_range+0xb75/0xce0 [ 11.365975][ T0] __walk_page_range+0x206/0x230 [ 11.365975][ T0] ? vmacache_find+0x3a/0x170 [ 11.365975][ T0] walk_page_range+0x136/0x210 [ 11.365975][ T0] ? __walk_page_range+0x230/0x230 [ 11.365975][ T0] ? 
find_held_lock+0xca/0xf0 [ 11.365975][ T0] ptdump_walk_pgd+0x76/0xd0 [ 11.365975][ T0] ptdump_walk_pgd_level_core+0x13b/0x1b0 [ 11.365975][ T0] ? hugetlb_get_unmapped_area+0x5b0/0x5b0 [ 11.365975][ T0] ? trace_hardirqs_on+0x3a/0x160 [ 11.365975][ T0] ? ptdump_walk_pgd_level_core+0x1b0/0x1b0 [ 11.365975][ T0] ? efi_delete_dummy_variable+0xa9/0xd0 [ 11.365975][ T0] ? __enc_copy+0x90/0x90 [ 11.365975][ T0] ptdump_walk_pgd_level+0x15/0x20 [ 11.365975][ T0] efi_dump_pagetable+0x35/0x37 [ 11.365975][ T0] efi_enter_virtual_mode+0x72a/0x737 [ 11.365975][ T0] start_kernel+0x607/0x6a9 [ 11.365975][ T0] ? thread_stack_cache_init+0xb/0xb [ 11.365975][ T0] ? idt_setup_from_table+0xd9/0x130 [ 11.365975][ T0] x86_64_start_reservations+0x24/0x26 [ 11.365975][ T0] x86_64_start_kernel+0xf4/0xfb [ 11.365975][ T0] secondary_startup_64+0xb6/0xc0 [ 11.365975][ T0] Modules linked in: [ 11.365988][ T0] ---[ end trace 8e90dc89e2468d55 ]--- [ 11.375984][ T0] RIP: 0010:ptdump_pte_entry+0x39/0x60 [ 11.381335][ T0] Code: 55 41 54 49 89 fc 48 8d 79 18 53 48 89 cb e8 5e 0e fa ff 48 8b 5b 18 48 89 df e8 52 0e fa ff 4c 89 e7 4c 8b 2b e8 47 0e fa ff <49> 8b 0c 24 4c 89 f6 48 89 df ba 05 00 00 00 e8 03 1d 9b 00 31 c0 [ 11.385982][ T0] RSP: 0000:ffffffffaf8079d0 EFLAGS: 00010282 [ 11.395982][ T0] RAX: 0000000000000000 RBX: ffffffffaf807cf0 RCX: ffffffffae374306 [ 11.403864][ T0] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffffffffafef2bf4 [ 11.405982][ T0] RBP: ffffffffaf8079f0 R08: fffffbfff5fdbb22 R09: fffffbfff5fdbb22 [ 11.415982][ T0] R10: fffffbfff5fdbb21 R11: ffffffffafedd90b R12: 000f887fee5ff000 [ 11.425982][ T0] R13: ffffffffae2aee40 R14: 0000000011a00000 R15: 0000000011a01000 [ 11.435982][ T0] FS: 0000000000000000(0000) GS:ffff888843400000(0000) knlGS:0000000000000000 [ 11.445982][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 11.452466][ T0] CR2: ffff8890779ff000 CR3: 0000000baf412000 CR4: 00000000000406b0 [ 11.455981][ T0] Kernel panic - not syncing: Fatal exception [ 
11.462246][ T0] ---[ end Kernel panic - not syncing: Fatal exception ]--- > > Changes since v13: > https://lore.kernel.org/lkml/20191024093716.49420-1-steven.price@arm.com/ > * Fixed typo in arc definition of pmd_leaf() spotted by the kbuild test > robot > * Added tags > > Changes since v12: > https://lore.kernel.org/lkml/20191018101248.33727-1-steven.price@arm.com/ > * Correct code format in riscv pud_leaf()/pmd_leaf() > * v12 may not have reached everyone because of mail server problems > (which are now hopefully resolved!) > > Changes since v11: > https://lore.kernel.org/lkml/20191007153822.16518-1-steven.price@arm.com/ > * Use "-1" as dummy depth parameter in patch 14. > > Changes since v10: > https://lore.kernel.org/lkml/20190731154603.41797-1-steven.price@arm.com/ > * Rebased to v5.4-rc1 - mainly various updates to deal with the > splitting out of ops from struct mm_walk. > * Deal with PGD_LEVEL_MULT not always being constant on x86. > > Changes since v9: > https://lore.kernel.org/lkml/20190722154210.42799-1-steven.price@arm.com/ > * Moved generic macros to first page in the series and explained the > macro naming in the commit message. > * mips: Moved macros to pgtable.h as they are now valid for both 32 and 64 > bit > * x86: Dropped patch which changed the debugfs output for x86, instead > we have... > * new patch adding 'depth' parameter to pte_hole. This is used to > provide the necessary information to output lines for 'holes' in the > debugfs files > * new patch changing arm64 debugfs output to include holes to match x86 > * generic ptdump KASAN handling has been simplified and now works with > CONFIG_DEBUG_VIRTUAL. > > Changes since v8: > https://lore.kernel.org/lkml/20190403141627.11664-1-steven.price@arm.com/ > * Rename from p?d_large() to p?d_leaf() > * Dropped patches migrating arm64/x86 custom walkers to > walk_page_range() in favour of adding a generic PTDUMP implementation > and migrating arm64/x86 to that instead. 
> * Rebased to v5.3-rc1 > > Steven Price (23): > mm: Add generic p?d_leaf() macros > arc: mm: Add p?d_leaf() definitions > arm: mm: Add p?d_leaf() definitions > arm64: mm: Add p?d_leaf() definitions > mips: mm: Add p?d_leaf() definitions > powerpc: mm: Add p?d_leaf() definitions > riscv: mm: Add p?d_leaf() definitions > s390: mm: Add p?d_leaf() definitions > sparc: mm: Add p?d_leaf() definitions > x86: mm: Add p?d_leaf() definitions > mm: pagewalk: Add p4d_entry() and pgd_entry() > mm: pagewalk: Allow walking without vma > mm: pagewalk: Add test_p?d callbacks > mm: pagewalk: Add 'depth' parameter to pte_hole > x86: mm: Point to struct seq_file from struct pg_state > x86: mm+efi: Convert ptdump_walk_pgd_level() to take a mm_struct > x86: mm: Convert ptdump_walk_pgd_level_debugfs() to take an mm_struct > x86: mm: Convert ptdump_walk_pgd_level_core() to take an mm_struct > mm: Add generic ptdump > x86: mm: Convert dump_pagetables to use walk_page_range > arm64: mm: Convert mm/dump.c to use walk_page_range() > arm64: mm: Display non-present entries in ptdump > mm: ptdump: Reduce level numbers by 1 in note_page() > > arch/arc/include/asm/pgtable.h | 1 + > arch/arm/include/asm/pgtable-2level.h | 1 + > arch/arm/include/asm/pgtable-3level.h | 1 + > arch/arm64/Kconfig | 1 + > arch/arm64/Kconfig.debug | 19 +- > arch/arm64/include/asm/pgtable.h | 2 + > arch/arm64/include/asm/ptdump.h | 8 +- > arch/arm64/mm/Makefile | 4 +- > arch/arm64/mm/dump.c | 148 +++----- > arch/arm64/mm/mmu.c | 4 +- > arch/arm64/mm/ptdump_debugfs.c | 2 +- > arch/mips/include/asm/pgtable.h | 5 + > arch/powerpc/include/asm/book3s/64/pgtable.h | 30 +- > arch/riscv/include/asm/pgtable-64.h | 7 + > arch/riscv/include/asm/pgtable.h | 7 + > arch/s390/include/asm/pgtable.h | 2 + > arch/sparc/include/asm/pgtable_64.h | 2 + > arch/x86/Kconfig | 1 + > arch/x86/Kconfig.debug | 20 +- > arch/x86/include/asm/pgtable.h | 10 +- > arch/x86/mm/Makefile | 4 +- > arch/x86/mm/debug_pagetables.c | 8 +- > 
arch/x86/mm/dump_pagetables.c | 343 +++++-------------- > arch/x86/platform/efi/efi_32.c | 2 +- > arch/x86/platform/efi/efi_64.c | 4 +- > drivers/firmware/efi/arm-runtime.c | 2 +- > fs/proc/task_mmu.c | 4 +- > include/asm-generic/pgtable.h | 20 ++ > include/linux/pagewalk.h | 42 ++- > include/linux/ptdump.h | 22 ++ > mm/Kconfig.debug | 21 ++ > mm/Makefile | 1 + > mm/hmm.c | 8 +- > mm/migrate.c | 5 +- > mm/mincore.c | 1 + > mm/pagewalk.c | 126 +++++-- > mm/ptdump.c | 151 ++++++++ > 37 files changed, 586 insertions(+), 453 deletions(-) > create mode 100644 include/linux/ptdump.h > create mode 100644 mm/ptdump.c >
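The cover letter quoted above describes p?d_leaf() as the way a generic walker learns that it has reached a leaf entry and must not descend further. The following is a minimal sketch of that idea on a toy two-level table. It is not the kernel's implementation: every type, flag and function name here (`toy_pmd`, `TOY_LEAF`, `toy_walk`, and so on) is hypothetical, and the "depth" printed for a hole only mirrors the general idea of the pte_hole 'depth' parameter mentioned in the changelog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: these types and flags are hypothetical, not the kernel's. */
#define TOY_PRESENT  0x1u     /* entry is valid */
#define TOY_LEAF     0x2u     /* entry maps memory itself instead of pointing to a table */
#define TOY_PTRS     4        /* entries per table, kept tiny for readability */
#define TOY_PAGE     0x1000UL
#define TOY_PMD_SPAN (TOY_PAGE * TOY_PTRS)

struct toy_pmd {
    unsigned int flags;
    const unsigned int *pte_table;  /* meaningful only when present and not a leaf */
};

/* Analogue of pmd_leaf(): true when the walk must stop at this level. */
static int toy_pmd_leaf(const struct toy_pmd *pmd)
{
    return (pmd->flags & TOY_PRESENT) && (pmd->flags & TOY_LEAF);
}

/* Walk n upper-level entries and return how many leaf mappings were found. */
static int toy_walk(const struct toy_pmd *pmd, size_t n)
{
    int leaves = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long addr = i * TOY_PMD_SPAN;

        if (!(pmd[i].flags & TOY_PRESENT)) {
            /* Nothing mapped here: report a hole at the upper level. */
            printf("0x%08lx hole (depth 1)\n", addr);
            continue;
        }
        if (toy_pmd_leaf(&pmd[i])) {
            /* A large mapping: there is no lower table to descend into. */
            printf("0x%08lx pmd leaf\n", addr);
            leaves++;
            continue;
        }
        /* Present and not a leaf: the entry points at a lower-level table. */
        assert(pmd[i].pte_table != NULL);
        for (size_t j = 0; j < TOY_PTRS; j++) {
            if (pmd[i].pte_table[j] & TOY_PRESENT) {
                printf("0x%08lx pte\n", addr + j * TOY_PAGE);
                leaves++;
            }
        }
    }
    return leaves;
}
```

Calling `toy_walk()` on a table with one leaf entry, one entry pointing at a lower table, and one empty entry reports the leaf once, each present lower-level entry once, and the empty slot as a hole.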
> On Nov 4, 2019, at 2:35 PM, Qian Cai <cai@lca.pw> wrote: > > On Fri, 2019-11-01 at 14:09 +0000, Steven Price wrote: >> Many architectures current have a debugfs file for dumping the kernel >> page tables. Currently each architecture has to implement custom >> functions for this because the details of walking the page tables used >> by the kernel are different between architectures. >> >> This series extends the capabilities of walk_page_range() so that it can >> deal with the page tables of the kernel (which have no VMAs and can >> contain larger huge pages than exist for user space). A generic PTDUMP >> implementation is the implemented making use of the new functionality of >> walk_page_range() and finally arm64 and x86 are switch to using it, >> removing the custom table walkers. >> >> To enable a generic page table walker to walk the unusual mappings of >> the kernel we need to implement a set of functions which let us know >> when the walker has reached the leaf entry. After a suggestion from Will >> Deacon I've chosen the name p?d_leaf() as this (hopefully) describes >> the purpose (and is a new name so has no historic baggage). Some >> architectures have p?d_large macros but this is easily confused with >> "large pages". >> >> This series ends with a generic PTDUMP implemention for arm64 and x86. >> >> Mostly this is a clean up and there should be very little functional >> change. The exceptions are: >> >> * arm64 PTDUMP debugfs now displays pages which aren't present (patch 22). >> >> * arm64 has the ability to efficiently process KASAN pages (which >> previously only x86 implemented). This means that the combination of >> KASAN and DEBUG_WX is now useable. 
>> >> Also available as a git tree: >> git://linux-arm.org/linux-sp.git walk_page_range/v15 >> >> Changes since v14: >> https://lore.kernel.org/lkml/20191028135910.33253-1-steven.price@arm.com/ >> * Switch walk_page_range() into two functions, the existing >> walk_page_range() now still requires VMAs (and treats areas without a >> VMA as a 'hole'). The new walk_page_range_novma() ignores VMAs and >> will report the actual page table layout. This fixes the previous >> breakage of /proc/<pid>/pagemap >> * New patch at the end of the series which reduces the 'level' numbers >> by 1 to simplify the code slightly >> * Added tags > > Does this new version also take care of this boot crash seen with v14? Suppose > it is now breaking CONFIG_EFI_PGT_DUMP=y? The full config is, > > https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > V15 is indeed DOA here. [ 10.957006][ T0] pid_max: default: 131072 minimum: 1024 [ 11.543186][ T0] ---[ User Space ]--- [ 11.547009][ T0] 0x0000000000000000-0x0000000000001000 4K RW x pte [ 11.556612][ T0] 0x0000000000001000-0x0000000000200000 2044K pte [ 11.557008][ T0] 0x0000000000200000-0x0000000004000000 62M pmd [ 11.567014][ T0] 0x0000000004000000-0x0000000004076000 472K pte [ 11.577033][ T0] 0x0000000004076000-0x0000000004200000 1576K pte [ 11.587013][ T0] 0x0000000004200000-0x0000000011000000 206M pmd [ 11.597023][ T0] 0x0000000011000000-0x0000000011100000 1M pte [ 11.607023][ T0] 0x0000000011100000-0x0000000011200000 1M pte [ 11.617006][ T0] 0x0000000011200000-0x0000000011800000 6M pmd [ 11.627068][ T0] ================================================================== [ 11.635087][ T0] BUG: KASAN: wild-memory-access in ptdump_pte_entry+0x39/0x60 [ 11.636992][ T0] Read of size 8 at addr 000f887fee5ff000 by task swapper/0/0 [ 11.636992][ T0] [ 11.636992][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc6-next-20191106+ #6 [ 11.636992][ T0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 
07/10/2019 [ 11.636992][ T0] Call Trace: [ 11.636992][ T0] dump_stack+0xa0/0xea [ 11.636992][ T0] __kasan_report.cold.7+0xb0/0xc0 [ 11.636992][ T0] ? note_page+0x6a9/0xa70 [ 11.636992][ T0] ? ptdump_pte_entry+0x39/0x60 [ 11.636992][ T0] ? ptdump_walk_pgd_level_core+0x1e0/0x1e0 [ 11.636992][ T0] kasan_report+0x12/0x20 [ 11.636992][ T0] __asan_load8+0x71/0xa0 [ 11.636992][ T0] ptdump_pte_entry+0x39/0x60 [ 11.636992][ T0] walk_pgd_range+0x9e5/0xdb0 [ 11.636992][ T0] __walk_page_range+0x206/0x230 [ 11.636992][ T0] walk_page_range_novma+0xc5/0x130 [ 11.636992][ T0] ? walk_page_range+0x220/0x220 [ 11.636992][ T0] ptdump_walk_pgd+0x76/0xd0 [ 11.636992][ T0] ptdump_walk_pgd_level_core+0x169/0x1e0 [ 11.636992][ T0] ? hugetlb_get_unmapped_area+0x5b0/0x5b0 [ 11.636992][ T0] ? trace_hardirqs_on+0x3a/0x160 [ 11.636992][ T0] ? ptdump_walk_pgd_level_core+0x1e0/0x1e0 [ 11.636992][ T0] ? efi_delete_dummy_variable+0xa9/0xd0 [ 11.636992][ T0] ? __enc_copy+0x90/0x90 [ 11.636992][ T0] ptdump_walk_pgd_level+0x15/0x20 [ 11.636992][ T0] efi_dump_pagetable+0x35/0x37 [ 11.636992][ T0] efi_enter_virtual_mode+0x72a/0x737 [ 11.636992][ T0] start_kernel+0x607/0x6a9 [ 11.636992][ T0] ? thread_stack_cache_init+0xb/0xb [ 11.636992][ T0] ? 
idt_setup_from_table+0xd9/0x130 [ 11.636992][ T0] x86_64_start_reservations+0x24/0x26 [ 11.636992][ T0] x86_64_start_kernel+0xf4/0xfb [ 11.636992][ T0] secondary_startup_64+0xb6/0xc0 [ 11.636992][ T0] ================================================================== [ 11.636992][ T0] Disabling lock debugging due to kernel taint [ 11.637009][ T0] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI [ 11.645067][ T0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G B 5.4.0-rc6-next-20191106+ #6 [ 11.646992][ T0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 [ 11.646992][ T0] RIP: 0010:ptdump_pte_entry+0x39/0x60 [ 11.646992][ T0] Code: 55 41 54 49 89 fc 48 8d 79 20 53 48 89 cb e8 8e 9d fa ff 48 8b 5b 20 48 89 df e8 82 9d fa ff 4c 89 e7 4c 8b 2b e8 77 9d fa ff <49> 8b 0c 24 4c 89 f6 48 89 df ba 04 00 00 00 e8 f3 8d 9b 00 31 c0 [ 11.646992][ T0] RSP: 0000:ffffffff8a2079f0 EFLAGS: 00010286 [ 11.646992][ T0] RAX: 0000000000000000 RBX: ffffffff8a207cf0 RCX: ffffffff88d74576 [ 11.646992][ T0] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffffffff8a8f53d4 [ 11.646992][ T0] RBP: ffffffff8a207a10 R08: fffffbfff151c01a R09: fffffbfff151c01a [ 11.646992][ T0] R10: fffffbfff151c019 R11: ffffffff8a8e00cb R12: 000f887fee5ff000 [ 11.646992][ T0] R13: ffffffff88caf040 R14: 0000000011a00000 R15: ffffffff89cfdcc0 [ 11.646992][ T0] FS: 0000000000000000(0000) GS:ffff888843400000(0000) knlGS:0000000000000000 [ 11.646992][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 11.646992][ T0] CR2: ffff8890779ff000 CR3: 0000000c54a12000 CR4: 00000000000406b0 [ 11.646992][ T0] Call Trace: [ 11.646992][ T0] walk_pgd_range+0x9e5/0xdb0 [ 11.646992][ T0] __walk_page_range+0x206/0x230 [ 11.646992][ T0] walk_page_range_novma+0xc5/0x130 [ 11.646992][ T0] ? walk_page_range+0x220/0x220 [ 11.646992][ T0] ptdump_walk_pgd+0x76/0xd0 [ 11.646992][ T0] ptdump_walk_pgd_level_core+0x169/0x1e0 [ 11.646992][ T0] ? 
hugetlb_get_unmapped_area+0x5b0/0x5b0 [ 11.646992][ T0] ? trace_hardirqs_on+0x3a/0x160 [ 11.646992][ T0] ? ptdump_walk_pgd_level_core+0x1e0/0x1e0 [ 11.646992][ T0] ? efi_delete_dummy_variable+0xa9/0xd0 [ 11.646992][ T0] ? __enc_copy+0x90/0x90 [ 11.646992][ T0] ptdump_walk_pgd_level+0x15/0x20 [ 11.646992][ T0] efi_dump_pagetable+0x35/0x37 [ 11.646992][ T0] efi_enter_virtual_mode+0x72a/0x737 [ 11.646992][ T0] start_kernel+0x607/0x6a9 [ 11.646992][ T0] ? thread_stack_cache_init+0xb/0xb [ 11.646992][ T0] ? idt_setup_from_table+0xd9/0x130 [ 11.646992][ T0] x86_64_start_reservations+0x24/0x26 [ 11.646992][ T0] x86_64_start_kernel+0xf4/0xfb [ 11.646992][ T0] secondary_startup_64+0xb6/0xc0 [ 11.646992][ T0] Modules linked in: [ 11.647003][ T0] ---[ end trace 751e8882de194a93 ]--- [ 11.652355][ T0] RIP: 0010:ptdump_pte_entry+0x39/0x60 [ 11.657001][ T0] Code: 55 41 54 49 89 fc 48 8d 79 20 53 48 89 cb e8 8e 9d fa ff 48 8b 5b 20 48 89 df e8 82 9d fa ff 4c 89 e7 4c 8b 2b e8 77 9d fa ff <49> 8b 0c 24 4c 89 f6 48 89 df ba 04 00 00 00 e8 f3 8d 9b 00 31 c0 [ 11.666998][ T0] RSP: 0000:ffffffff8a2079f0 EFLAGS: 00010286 [ 11.672961][ T0] RAX: 0000000000000000 RBX: ffffffff8a207cf0 RCX: ffffffff88d74576 [ 11.676998][ T0] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffffffff8a8f53d4 [ 11.686998][ T0] RBP: ffffffff8a207a10 R08: fffffbfff151c01a R09: fffffbfff151c01a [ 11.696998][ T0] R10: fffffbfff151c019 R11: ffffffff8a8e00cb R12: 000f887fee5ff000 [ 11.704882][ T0] R13: ffffffff88caf040 R14: 0000000011a00000 R15: ffffffff89cfdcc0 [ 11.706999][ T0] FS: 0000000000000000(0000) GS:ffff888843400000(0000) knlGS:0000000000000000 [ 11.716998][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 11.726998][ T0] CR2: ffff8890779ff000 CR3: 0000000c54a12000 CR4: 00000000000406b0 [ 11.736998][ T0] Kernel panic - not syncing: Fatal exception [ 11.743272][ T0] ---[ end Kernel panic - not syncing: Fatal exception ]--- >> >> Changes since v13: >> 
https://lore.kernel.org/lkml/20191024093716.49420-1-steven.price@arm.com/ >> * Fixed typo in arc definition of pmd_leaf() spotted by the kbuild test >> robot >> * Added tags >> >> Changes since v12: >> https://lore.kernel.org/lkml/20191018101248.33727-1-steven.price@arm.com/ >> * Correct code format in riscv pud_leaf()/pmd_leaf() >> * v12 may not have reached everyone because of mail server problems >> (which are now hopefully resolved!) >> >> Changes since v11: >> https://lore.kernel.org/lkml/20191007153822.16518-1-steven.price@arm.com/ >> * Use "-1" as dummy depth parameter in patch 14. >> >> Changes since v10: >> https://lore.kernel.org/lkml/20190731154603.41797-1-steven.price@arm.com/ >> * Rebased to v5.4-rc1 - mainly various updates to deal with the >> splitting out of ops from struct mm_walk. >> * Deal with PGD_LEVEL_MULT not always being constant on x86. >> >> Changes since v9: >> https://lore.kernel.org/lkml/20190722154210.42799-1-steven.price@arm.com/ >> * Moved generic macros to first page in the series and explained the >> macro naming in the commit message. >> * mips: Moved macros to pgtable.h as they are now valid for both 32 and 64 >> bit >> * x86: Dropped patch which changed the debugfs output for x86, instead >> we have... >> * new patch adding 'depth' parameter to pte_hole. This is used to >> provide the necessary information to output lines for 'holes' in the >> debugfs files >> * new patch changing arm64 debugfs output to include holes to match x86 >> * generic ptdump KASAN handling has been simplified and now works with >> CONFIG_DEBUG_VIRTUAL. >> >> Changes since v8: >> https://lore.kernel.org/lkml/20190403141627.11664-1-steven.price@arm.com/ >> * Rename from p?d_large() to p?d_leaf() >> * Dropped patches migrating arm64/x86 custom walkers to >> walk_page_range() in favour of adding a generic PTDUMP implementation >> and migrating arm64/x86 to that instead. 
>> * Rebased to v5.3-rc1 >> >> Steven Price (23): >> mm: Add generic p?d_leaf() macros >> arc: mm: Add p?d_leaf() definitions >> arm: mm: Add p?d_leaf() definitions >> arm64: mm: Add p?d_leaf() definitions >> mips: mm: Add p?d_leaf() definitions >> powerpc: mm: Add p?d_leaf() definitions >> riscv: mm: Add p?d_leaf() definitions >> s390: mm: Add p?d_leaf() definitions >> sparc: mm: Add p?d_leaf() definitions >> x86: mm: Add p?d_leaf() definitions >> mm: pagewalk: Add p4d_entry() and pgd_entry() >> mm: pagewalk: Allow walking without vma >> mm: pagewalk: Add test_p?d callbacks >> mm: pagewalk: Add 'depth' parameter to pte_hole >> x86: mm: Point to struct seq_file from struct pg_state >> x86: mm+efi: Convert ptdump_walk_pgd_level() to take a mm_struct >> x86: mm: Convert ptdump_walk_pgd_level_debugfs() to take an mm_struct >> x86: mm: Convert ptdump_walk_pgd_level_core() to take an mm_struct >> mm: Add generic ptdump >> x86: mm: Convert dump_pagetables to use walk_page_range >> arm64: mm: Convert mm/dump.c to use walk_page_range() >> arm64: mm: Display non-present entries in ptdump >> mm: ptdump: Reduce level numbers by 1 in note_page() >> >> arch/arc/include/asm/pgtable.h | 1 + >> arch/arm/include/asm/pgtable-2level.h | 1 + >> arch/arm/include/asm/pgtable-3level.h | 1 + >> arch/arm64/Kconfig | 1 + >> arch/arm64/Kconfig.debug | 19 +- >> arch/arm64/include/asm/pgtable.h | 2 + >> arch/arm64/include/asm/ptdump.h | 8 +- >> arch/arm64/mm/Makefile | 4 +- >> arch/arm64/mm/dump.c | 148 +++----- >> arch/arm64/mm/mmu.c | 4 +- >> arch/arm64/mm/ptdump_debugfs.c | 2 +- >> arch/mips/include/asm/pgtable.h | 5 + >> arch/powerpc/include/asm/book3s/64/pgtable.h | 30 +- >> arch/riscv/include/asm/pgtable-64.h | 7 + >> arch/riscv/include/asm/pgtable.h | 7 + >> arch/s390/include/asm/pgtable.h | 2 + >> arch/sparc/include/asm/pgtable_64.h | 2 + >> arch/x86/Kconfig | 1 + >> arch/x86/Kconfig.debug | 20 +- >> arch/x86/include/asm/pgtable.h | 10 +- >> arch/x86/mm/Makefile | 4 +- >> 
arch/x86/mm/debug_pagetables.c | 8 +- >> arch/x86/mm/dump_pagetables.c | 343 +++++-------------- >> arch/x86/platform/efi/efi_32.c | 2 +- >> arch/x86/platform/efi/efi_64.c | 4 +- >> drivers/firmware/efi/arm-runtime.c | 2 +- >> fs/proc/task_mmu.c | 4 +- >> include/asm-generic/pgtable.h | 20 ++ >> include/linux/pagewalk.h | 42 ++- >> include/linux/ptdump.h | 22 ++ >> mm/Kconfig.debug | 21 ++ >> mm/Makefile | 1 + >> mm/hmm.c | 8 +- >> mm/migrate.c | 5 +- >> mm/mincore.c | 1 + >> mm/pagewalk.c | 126 +++++-- >> mm/ptdump.c | 151 ++++++++ >> 37 files changed, 586 insertions(+), 453 deletions(-) >> create mode 100644 include/linux/ptdump.h >> create mode 100644 mm/ptdump.c >>
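One way to read the two reports above: the KASAN "wild-memory-access" address and the following general protection fault are both consistent with a load from a non-canonical address. The sketch below checks this under the assumption of 4-level paging (48-bit virtual addresses), which is an inference from the logged register values rather than something stated in the thread. The helper name `is_canonical_48` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * With 48-bit virtual addresses, an x86-64 address is canonical when
 * bits 63..47 are all zeros or all ones. Assumption: 4-level paging.
 */
static int is_canonical_48(uint64_t addr)
{
    uint64_t upper = addr >> 47;    /* the 17 bits 63..47 */

    /* A real caller would use this result; the assert documents the range. */
    assert(upper <= 0x1ffff);
    return upper == 0 || upper == 0x1ffff;
}
```

Applied to values taken from the log, `is_canonical_48(0x000f887fee5ff000)` (the faulting address, also held in R12) is false, while the CR2 value `0xffff8890779ff000` and the walk address `0x0000000011a00000` in R14 both pass the check.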
On 06/11/2019 13:31, Qian Cai wrote:
>
>
>> On Nov 4, 2019, at 2:35 PM, Qian Cai <cai@lca.pw> wrote:
>>
>> On Fri, 2019-11-01 at 14:09 +0000, Steven Price wrote:
[...]
>>> Changes since v14:
>>> https://lore.kernel.org/lkml/20191028135910.33253-1-steven.price@arm.com/
>>> * Switch walk_page_range() into two functions, the existing
>>>   walk_page_range() now still requires VMAs (and treats areas without a
>>>   VMA as a 'hole'). The new walk_page_range_novma() ignores VMAs and
>>>   will report the actual page table layout. This fixes the previous
>>>   breakage of /proc/<pid>/pagemap
>>> * New patch at the end of the series which reduces the 'level' numbers
>>>   by 1 to simplify the code slightly
>>> * Added tags
>>
>> Does this new version also take care of this boot crash seen with v14? Suppose
>> it is now breaking CONFIG_EFI_PGT_DUMP=y? The full config is,
>>
>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
>>
>
> V15 is indeed DOA here.

Thanks for finding this, it looks like EFI causes issues here. The below fixes
this for me (booting in QEMU).

Andrew: do you want me to send out the entire series again for this fix, or
can you squash this into mm-pagewalk-allow-walking-without-vma.patch?

Thanks,

Steve

---8<---
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index c7529dc4f82b..70dcaa23598f 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -90,7 +90,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 			split_huge_pmd(walk->vma, pmd, addr);
 			if (pmd_trans_unstable(pmd))
 				goto again;
-		} else if (pmd_leaf(*pmd)) {
+		} else if (pmd_leaf(*pmd) || !pmd_present(*pmd)) {
 			continue;
 		}

@@ -141,7 +141,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 			split_huge_pud(walk->vma, pud, addr);
 			if (pud_none(*pud))
 				goto again;
-		} else if (pud_leaf(*pud)) {
+		} else if (pud_leaf(*pud) || !pud_present(*pud)) {
 			continue;
 		}
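The change in the diff above adds a "not present" test to the condition that makes the no-VMA walk skip an entry. A rough sketch of the difference follows, on a toy entry encoding. The flag layout and every function name are hypothetical, and the claim that the crash came from an entry that is non-empty but not present is an inference from the patch, not something the thread states outright.

```c
#include <assert.h>
#include <stdint.h>

/* Toy entry encoding (hypothetical): bit 0 = present, bit 1 = leaf. */
#define TOY_PRESENT 0x1u
#define TOY_LEAF    0x2u

static int toy_none(uint64_t e)    { return e == 0; }
static int toy_present(uint64_t e) { return (e & TOY_PRESENT) != 0; }
static int toy_leaf(uint64_t e)    { return (e & TOY_LEAF) != 0; }

/* Before the fix: only empty and leaf entries stop the descent. */
static int toy_descend_before(uint64_t e)
{
    if (toy_none(e))
        return 0;
    return !toy_leaf(e);
}

/* After the fix: a non-present entry is skipped as well. */
static int toy_descend_after(uint64_t e)
{
    if (toy_none(e))
        return 0;
    return !(toy_leaf(e) || !toy_present(e));
}
```

For an entry such as `0x12345000`, which is non-zero but has the present bit clear, `toy_descend_before()` returns 1 and the walker would treat the remaining bits as the address of a lower-level table, whereas `toy_descend_after()` returns 0. Present table pointers and leaf entries are handled identically by both versions.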
On 06.11.19 16:05, Steven Price wrote:
> On 06/11/2019 13:31, Qian Cai wrote:
>>
>>
>>> On Nov 4, 2019, at 2:35 PM, Qian Cai <cai@lca.pw> wrote:
>>>
>>> On Fri, 2019-11-01 at 14:09 +0000, Steven Price wrote:
> [...]
>>>> Changes since v14:
>>>> https://lore.kernel.org/lkml/20191028135910.33253-1-steven.price@arm.com/
>>>> * Switch walk_page_range() into two functions, the existing
>>>>   walk_page_range() now still requires VMAs (and treats areas without a
>>>>   VMA as a 'hole'). The new walk_page_range_novma() ignores VMAs and
>>>>   will report the actual page table layout. This fixes the previous
>>>>   breakage of /proc/<pid>/pagemap
>>>> * New patch at the end of the series which reduces the 'level' numbers
>>>>   by 1 to simplify the code slightly
>>>> * Added tags
>>>
>>> Does this new version also take care of this boot crash seen with v14? Suppose
>>> it is now breaking CONFIG_EFI_PGT_DUMP=y? The full config is,
>>>
>>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
>>>
>>
>> V15 is indeed DOA here.
>
> Thanks for finding this, it looks like EFI causes issues here. The below fixes
> this for me (booting in QEMU).
>
> Andrew: do you want me to send out the entire series again for this fix, or
> can you squash this into mm-pagewalk-allow-walking-without-vma.patch?
>
> Thanks,
>
> Steve
>
> ---8<---
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index c7529dc4f82b..70dcaa23598f 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -90,7 +90,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
>  			split_huge_pmd(walk->vma, pmd, addr);
>  			if (pmd_trans_unstable(pmd))
>  				goto again;
> -		} else if (pmd_leaf(*pmd)) {
> +		} else if (pmd_leaf(*pmd) || !pmd_present(*pmd)) {
>  			continue;
>  		}
>
> @@ -141,7 +141,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
>  			split_huge_pud(walk->vma, pud, addr);
>  			if (pud_none(*pud))
>  				goto again;
> -		} else if (pud_leaf(*pud)) {
> +		} else if (pud_leaf(*pud) || !pud_present(*pud)) {
>  			continue;
>  		}
>

Even with this fix, booting for me under QEMU fails. See

https://lore.kernel.org/linux-mm/b7ce62f2-9a48-6e48-6685-003431e521aa@redhat.com/
> On Dec 3, 2019, at 6:02 AM, David Hildenbrand <david@redhat.com> wrote:
>
> On 06.11.19 16:05, Steven Price wrote:
>> On 06/11/2019 13:31, Qian Cai wrote:
>>>
>>>
>>>> On Nov 4, 2019, at 2:35 PM, Qian Cai <cai@lca.pw> wrote:
>>>>
>>>> On Fri, 2019-11-01 at 14:09 +0000, Steven Price wrote:
>> [...]
>>>>> Changes since v14:
>>>>> https://lore.kernel.org/lkml/20191028135910.33253-1-steven.price@arm.com/
>>>>> * Switch walk_page_range() into two functions, the existing
>>>>>   walk_page_range() now still requires VMAs (and treats areas without a
>>>>>   VMA as a 'hole'). The new walk_page_range_novma() ignores VMAs and
>>>>>   will report the actual page table layout. This fixes the previous
>>>>>   breakage of /proc/<pid>/pagemap
>>>>> * New patch at the end of the series which reduces the 'level' numbers
>>>>>   by 1 to simplify the code slightly
>>>>> * Added tags
>>>>
>>>> Does this new version also take care of this boot crash seen with v14? Suppose
>>>> it is now breaking CONFIG_EFI_PGT_DUMP=y? The full config is,
>>>>
>>>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
>>>>
>>>
>>> V15 is indeed DOA here.
>>
>> Thanks for finding this, it looks like EFI causes issues here. The below fixes
>> this for me (booting in QEMU).
>>
>> Andrew: do you want me to send out the entire series again for this fix, or
>> can you squash this into mm-pagewalk-allow-walking-without-vma.patch?
>>
>> Thanks,
>>
>> Steve
>>
>> ---8<---
>> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
>> index c7529dc4f82b..70dcaa23598f 100644
>> --- a/mm/pagewalk.c
>> +++ b/mm/pagewalk.c
>> @@ -90,7 +90,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
>>  			split_huge_pmd(walk->vma, pmd, addr);
>>  			if (pmd_trans_unstable(pmd))
>>  				goto again;
>> -		} else if (pmd_leaf(*pmd)) {
>> +		} else if (pmd_leaf(*pmd) || !pmd_present(*pmd)) {
>>  			continue;
>>  		}
>>
>> @@ -141,7 +141,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
>>  			split_huge_pud(walk->vma, pud, addr);
>>  			if (pud_none(*pud))
>>  				goto again;
>> -		} else if (pud_leaf(*pud)) {
>> +		} else if (pud_leaf(*pud) || !pud_present(*pud)) {
>>  			continue;
>>  		}
>>
>>
>
> Even with this fix, booting for me under QEMU fails. See
>
> https://lore.kernel.org/linux-mm/b7ce62f2-9a48-6e48-6685-003431e521aa@redhat.com/
>

Yes, for some reasons, this starts to crash on almost all arches here, so it
might be worth for Andrew to revert those in the meantime while allowing Steven
to rework.
On 04.12.19 15:54, Qian Cai wrote: > > >> On Dec 3, 2019, at 6:02 AM, David Hildenbrand <david@redhat.com> wrote: >> >> On 06.11.19 16:05, Steven Price wrote: >>> On 06/11/2019 13:31, Qian Cai wrote: >>>> >>>> >>>>> On Nov 4, 2019, at 2:35 PM, Qian Cai <cai@lca.pw> wrote: >>>>> >>>>> On Fri, 2019-11-01 at 14:09 +0000, Steven Price wrote: >>> [...] >>>>>> Changes since v14: >>>>>> https://lore.kernel.org/lkml/20191028135910.33253-1-steven.price@arm.com/ >>>>>> * Switch walk_page_range() into two functions, the existing >>>>>> walk_page_range() now still requires VMAs (and treats areas without a >>>>>> VMA as a 'hole'). The new walk_page_range_novma() ignores VMAs and >>>>>> will report the actual page table layout. This fixes the previous >>>>>> breakage of /proc/<pid>/pagemap >>>>>> * New patch at the end of the series which reduces the 'level' numbers >>>>>> by 1 to simplify the code slightly >>>>>> * Added tags >>>>> >>>>> Does this new version also take care of this boot crash seen with v14? Suppose >>>>> it is now breaking CONFIG_EFI_PGT_DUMP=y? The full config is, >>>>> >>>>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config >>>>> >>>> >>>> V15 is indeed DOA here. >>> >>> Thanks for finding this, it looks like EFI causes issues here. The below fixes >>> this for me (booting in QEMU). >>> >>> Andrew: do you want me to send out the entire series again for this fix, or >>> can you squash this into mm-pagewalk-allow-walking-without-vma.patch? 
>>> >>> Thanks, >>> >>> Steve >>> >>> ---8<--- >>> diff --git a/mm/pagewalk.c b/mm/pagewalk.c >>> index c7529dc4f82b..70dcaa23598f 100644 >>> --- a/mm/pagewalk.c >>> +++ b/mm/pagewalk.c >>> @@ -90,7 +90,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, >>> split_huge_pmd(walk->vma, pmd, addr); >>> if (pmd_trans_unstable(pmd)) >>> goto again; >>> - } else if (pmd_leaf(*pmd)) { >>> + } else if (pmd_leaf(*pmd) || !pmd_present(*pmd)) { >>> continue; >>> } >>> >>> @@ -141,7 +141,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end, >>> split_huge_pud(walk->vma, pud, addr); >>> if (pud_none(*pud)) >>> goto again; >>> - } else if (pud_leaf(*pud)) { >>> + } else if (pud_leaf(*pud) || !pud_present(*pud)) { >>> continue; >>> } >>> >>> >> >> Even with this fix, booting for me under QEMU fails. See >> >> https://lore.kernel.org/linux-mm/b7ce62f2-9a48-6e48-6685-003431e521aa@redhat.com/ >> > > Yes, for some reasons, this starts to crash on almost all arches here, so it might be worth > for Andrew to revert those in the meantime while allowing Steven to rework. I agree, this produces too much noise.
On Wed, Dec 04, 2019 at 02:56:58PM +0000, David Hildenbrand wrote: > On 04.12.19 15:54, Qian Cai wrote: > > > > > >> On Dec 3, 2019, at 6:02 AM, David Hildenbrand <david@redhat.com> wrote: > >> > >> On 06.11.19 16:05, Steven Price wrote: > >>> On 06/11/2019 13:31, Qian Cai wrote: > >>>> > >>>> > >>>>> On Nov 4, 2019, at 2:35 PM, Qian Cai <cai@lca.pw> wrote: > >>>>> > >>>>> On Fri, 2019-11-01 at 14:09 +0000, Steven Price wrote: > >>> [...] > >>>>>> Changes since v14: > >>>>>> https://lore.kernel.org/lkml/20191028135910.33253-1-steven.price@arm.com/ > >>>>>> * Switch walk_page_range() into two functions, the existing > >>>>>> walk_page_range() now still requires VMAs (and treats areas without a > >>>>>> VMA as a 'hole'). The new walk_page_range_novma() ignores VMAs and > >>>>>> will report the actual page table layout. This fixes the previous > >>>>>> breakage of /proc/<pid>/pagemap > >>>>>> * New patch at the end of the series which reduces the 'level' numbers > >>>>>> by 1 to simplify the code slightly > >>>>>> * Added tags > >>>>> > >>>>> Does this new version also take care of this boot crash seen with v14? Suppose > >>>>> it is now breaking CONFIG_EFI_PGT_DUMP=y? The full config is, > >>>>> > >>>>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > >>>>> > >>>> > >>>> V15 is indeed DOA here. > >>> > >>> Thanks for finding this, it looks like EFI causes issues here. The below fixes > >>> this for me (booting in QEMU). > >>> > >>> Andrew: do you want me to send out the entire series again for this fix, or > >>> can you squash this into mm-pagewalk-allow-walking-without-vma.patch? 
> >>> > >>> Thanks, > >>> > >>> Steve > >>> > >>> ---8<--- > >>> diff --git a/mm/pagewalk.c b/mm/pagewalk.c > >>> index c7529dc4f82b..70dcaa23598f 100644 > >>> --- a/mm/pagewalk.c > >>> +++ b/mm/pagewalk.c > >>> @@ -90,7 +90,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, > >>> split_huge_pmd(walk->vma, pmd, addr); > >>> if (pmd_trans_unstable(pmd)) > >>> goto again; > >>> - } else if (pmd_leaf(*pmd)) { > >>> + } else if (pmd_leaf(*pmd) || !pmd_present(*pmd)) { > >>> continue; > >>> } > >>> > >>> @@ -141,7 +141,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end, > >>> split_huge_pud(walk->vma, pud, addr); > >>> if (pud_none(*pud)) > >>> goto again; > >>> - } else if (pud_leaf(*pud)) { > >>> + } else if (pud_leaf(*pud) || !pud_present(*pud)) { > >>> continue; > >>> } > >>> > >>> > >> > >> Even with this fix, booting for me under QEMU fails. See > >> > >> https://lore.kernel.org/linux-mm/b7ce62f2-9a48-6e48-6685-003431e521aa@redhat.com/ > >> > > > > Yes, for some reasons, this starts to crash on almost all arches here, so it might be worth > > for Andrew to revert those in the meantime while allowing Steven to rework. > > I agree, this produces too much noise. I've bisected this problem and it's a merge conflict with: ace88f1018b8 ("mm: pagewalk: Take the pagetable lock in walk_pte_range()") Reverting that commit "fixes" the problem. That commit adds a call to pte_offset_map_lock(), however that isn't necessarily safe when considering an "unusual" mapping in the kernel. Combined with my patch set this leads to the BUG when walking the kernel's page tables. At this stage I think it's best if Andrew drops my series and I'll try to rework it on top -rc1 fixing up this conflict and the other x86 32-bit issue that has cropped up. Steve
On 12/4/19 5:32 PM, Steven Price wrote: > On Wed, Dec 04, 2019 at 02:56:58PM +0000, David Hildenbrand wrote: >> On 04.12.19 15:54, Qian Cai wrote: >>> >>>> On Dec 3, 2019, at 6:02 AM, David Hildenbrand <david@redhat.com> wrote: >>>> >>>> On 06.11.19 16:05, Steven Price wrote: >>>>> On 06/11/2019 13:31, Qian Cai wrote: >>>>>> >>>>>>> On Nov 4, 2019, at 2:35 PM, Qian Cai <cai@lca.pw> wrote: >>>>>>> >>>>>>> On Fri, 2019-11-01 at 14:09 +0000, Steven Price wrote: >>>>> [...] >>>>>>>> Changes since v14: >>>>>>>> https://lore.kernel.org/lkml/20191028135910.33253-1-steven.price@arm.com/ >>>>>>>> * Switch walk_page_range() into two functions, the existing >>>>>>>> walk_page_range() now still requires VMAs (and treats areas without a >>>>>>>> VMA as a 'hole'). The new walk_page_range_novma() ignores VMAs and >>>>>>>> will report the actual page table layout. This fixes the previous >>>>>>>> breakage of /proc/<pid>/pagemap >>>>>>>> * New patch at the end of the series which reduces the 'level' numbers >>>>>>>> by 1 to simplify the code slightly >>>>>>>> * Added tags >>>>>>> Does this new version also take care of this boot crash seen with v14? Suppose >>>>>>> it is now breaking CONFIG_EFI_PGT_DUMP=y? The full config is, >>>>>>> >>>>>>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config >>>>>>> >>>>>> V15 is indeed DOA here. >>>>> Thanks for finding this, it looks like EFI causes issues here. 
The below fixes >>>>> this for me (booting in QEMU). >>>>> >>>>> Andrew: do you want me to send out the entire series again for this fix, or >>>>> can you squash this into mm-pagewalk-allow-walking-without-vma.patch? >>>>> >>>>> Thanks, >>>>> >>>>> Steve >>>>> >>>>> ---8<--- >>>>> diff --git a/mm/pagewalk.c b/mm/pagewalk.c >>>>> index c7529dc4f82b..70dcaa23598f 100644 >>>>> --- a/mm/pagewalk.c >>>>> +++ b/mm/pagewalk.c >>>>> @@ -90,7 +90,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, >>>>> split_huge_pmd(walk->vma, pmd, addr); >>>>> if (pmd_trans_unstable(pmd)) >>>>> goto again; >>>>> - } else if (pmd_leaf(*pmd)) { >>>>> + } else if (pmd_leaf(*pmd) || !pmd_present(*pmd)) { >>>>> continue; >>>>> } >>>>> >>>>> @@ -141,7 +141,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end, >>>>> split_huge_pud(walk->vma, pud, addr); >>>>> if (pud_none(*pud)) >>>>> goto again; >>>>> - } else if (pud_leaf(*pud)) { >>>>> + } else if (pud_leaf(*pud) || !pud_present(*pud)) { >>>>> continue; >>>>> } >>>>> >>>>> >>>> Even with this fix, booting for me under QEMU fails. See >>>> >>>> https://lore.kernel.org/linux-mm/b7ce62f2-9a48-6e48-6685-003431e521aa@redhat.com/ >>>> >>> Yes, for some reasons, this starts to crash on almost all arches here, so it might be worth >>> for Andrew to revert those in the meantime while allowing Steven to rework. >> I agree, this produces too much noise. > I've bisected this problem and it's a merge conflict with: > > ace88f1018b8 ("mm: pagewalk: Take the pagetable lock in walk_pte_range()") > > Reverting that commit "fixes" the problem. 
That commit adds a call to > pte_offset_map_lock(), however that isn't necessarily safe when > considering an "unusual" mapping in the kernel. Combined with my patch > set this leads to the BUG when walking the kernel's page tables. > > At this stage I think it's best if Andrew drops my series and I'll try > to rework it on top -rc1 fixing up this conflict and the other x86 > 32-bit issue that has cropped up. Hi, Unfortunately I wasn't aware of that conflict. Perhaps something similar to this https://elixir.bootlin.com/linux/v5.4/source/mm/memory.c#L2012 would fix at least this particular issue? /Thomas > > Steve >
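The suggestion above can be sketched in userspace C. As far as I can tell, the linked v5.4 line sits in apply_to_pte_range(), which picks a lock-free helper when the mm is init_mm and a locking helper otherwise; that reading is an assumption, and everything in the block below (struct mm_model, kernel_mm, walk_ptes) is a hypothetical model rather than kernel code. It only illustrates the shape of the idea: take the page table lock for user address spaces and skip it when walking the kernel's own tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct plus its page table lock. */
struct mm_model {
	int lock_depth;		/* > 0 while the modelled lock is held */
};

/* Plays the role of init_mm in this model. */
static struct mm_model kernel_mm;

/*
 * Visits the non-empty entries of one modelled PTE table and returns how
 * many were visited. The lock is taken only for non-kernel address spaces;
 * *lock_was_held reports whether it was held during the visit.
 */
static size_t walk_ptes(struct mm_model *mm, const int *pte_table, size_t n,
			bool *lock_was_held)
{
	bool locked = (mm != &kernel_mm);
	size_t visited = 0;

	if (locked)
		mm->lock_depth++;	/* models the locking map helper */

	*lock_was_held = mm->lock_depth > 0;
	for (size_t i = 0; i < n; i++) {
		if (pte_table[i] != 0)
			visited++;
	}

	if (locked)
		mm->lock_depth--;	/* models the matching unlock */
	return visited;
}
```

Both paths visit the same entries; the only difference in the model is whether the lock is taken. Whether the real walker could safely skip the lock for kernel mappings is exactly the question raised in the thread, so this should be read as an illustration of the proposal and not as a fix.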
> On Dec 4, 2019, at 11:32 AM, Steven Price <Steven.Price@arm.com> wrote: > > I've bisected this problem and it's a merge conflict with: > > ace88f1018b8 ("mm: pagewalk: Take the pagetable lock in walk_pte_range()") Sigh, how does that commit end up merging in the mainline without going through Andrew’s tree and missed all the linux-next testing? It was merged into the mainline Oct 4th? > Reverting that commit "fixes" the problem. That commit adds a call to > pte_offset_map_lock(), however that isn't necessarily safe when > considering an "unusual" mapping in the kernel. Combined with my patch > set this leads to the BUG when walking the kernel's page tables. > > At this stage I think it's best if Andrew drops my series and I'll try > to rework it on top -rc1 fixing up this conflict and the other x86 > 32-bit issue that has cropped up.
On Thu, 2019-12-05 at 08:15 -0500, Qian Cai wrote: > > On Dec 4, 2019, at 11:32 AM, Steven Price <Steven.Price@arm.com> > > wrote: > > > > I've bisected this problem and it's a merge conflict with: > > > > ace88f1018b8 ("mm: pagewalk: Take the pagetable lock in > > walk_pte_range()") > > Sigh, how does that commit end up merging in the mainline without > going through Andrew’s tree and missed all the linux-next testing? It > was merged into the mainline Oct 4th? It was acked by Andrew to be merged through a drm tree, since it was part of a graphics driver functionality. It was preceded by a fairly lengthy discussion on linux-mm / linux-kernel. It was merged into drm-next on 19-11-28, I think that's when it normally is seen by linux-next. Merged into mainline 19-11-30. Andrew's tree got merged 19-12-05. linux-next signaled a merge conflict from one of the patches in this series (not this one) resolved manually with the akpm tree on 19-12-02. Thomas
> On Dec 5, 2019, at 9:32 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote: > > On Thu, 2019-12-05 at 08:15 -0500, Qian Cai wrote: >>> On Dec 4, 2019, at 11:32 AM, Steven Price <Steven.Price@arm.com> >>> wrote: >>> >>> I've bisected this problem and it's a merge conflict with: >>> >>> ace88f1018b8 ("mm: pagewalk: Take the pagetable lock in >>> walk_pte_range()") >> >> Sigh, how does that commit end up merging in the mainline without >> going through Andrew’s tree and missed all the linux-next testing? It >> was merged into the mainline Oct 4th? > > It was acked by Andrew to be merged through a drm tree, since it was > part of a graphics driver functionality. It was preceded by a fairly > lengthy discussion on linux-mm / linux-kernel. > > It was merged into drm-next on 19-11-28, I think that's when it > normally is seen by linux-next. Merged into mainline 19-11-30. Andrew's > tree got merged 19-12-05. Ah, that was the problem. It was merged into the mainline only a day or two after it showed up in linux-next. There isn’t enough time for integration testing. > > linux-next signaled a merge conflict from one of the patches in this > series (not this one) resolved manually with the akpm tree on 19-12-02. > > Thomas