Message ID | 20150119174317.GK20386@saruman (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Felipe Balbi wrote: > Hi, > > On Mon, Jan 19, 2015 at 10:42:04AM -0600, Nishanth Menon wrote: > > Most platforms seem broken intoday's next tag. > > > > https://github.com/nmenon/kernel-test-logs/tree/next-20150119 > > (defconfig: omap2plus_defconfig) > > > > > [ 7.166600] ------------[ cut here ]------------ > > > [ 7.171676] WARNING: CPU: 0 PID: 54 at mm/mmap.c:2859 exit_mmap+0x1a8/0x21c() > > > [ 7.179194] Modules linked in: > > > [ 7.182479] CPU: 0 PID: 54 Comm: init Not tainted 3.19.0-rc5-next-20150119-00002-gfdefcded1272 #1 > > > [ 7.191863] Hardware name: Generic AM33XX (Flattened Device Tree) > > > [ 7.198318] [<c00153f0>] (unwind_backtrace) from [<c0011a74>] (show_stack+0x10/0x14) > > > [ 7.206528] [<c0011a74>] (show_stack) from [<c0580150>] (dump_stack+0x78/0x94) > > > [ 7.214191] [<c0580150>] (dump_stack) from [<c003d4d0>] (warn_slowpath_common+0x7c/0xb4) > > > [ 7.222751] [<c003d4d0>] (warn_slowpath_common) from [<c003d524>] (warn_slowpath_null+0x1c/0x24) > > > [ 7.232038] [<c003d524>] (warn_slowpath_null) from [<c012de64>] (exit_mmap+0x1a8/0x21c) > > > [ 7.240536] [<c012de64>] (exit_mmap) from [<c003abb8>] (mmput+0x44/0xec) > > > [ 7.247612] [<c003abb8>] (mmput) from [<c0151368>] (flush_old_exec+0x300/0x5a4) > > > [ 7.255357] [<c0151368>] (flush_old_exec) from [<c0195c10>] (load_elf_binary+0x2ec/0x1144) > > > [ 7.264111] [<c0195c10>] (load_elf_binary) from [<c0150ea0>] (search_binary_handler+0x88/0x1ac) > > > [ 7.273311] [<c0150ea0>] (search_binary_handler) from [<c019554c>] (load_script+0x260/0x280) > > > [ 7.282232] [<c019554c>] (load_script) from [<c0150ea0>] (search_binary_handler+0x88/0x1ac) > > > [ 7.291066] [<c0150ea0>] (search_binary_handler) from [<c0151f0c>] (do_execveat_common+0x538/0x6c4) > > > [ 7.300628] [<c0151f0c>] (do_execveat_common) from [<c01520c4>] (do_execve+0x2c/0x34) > > > [ 7.308881] [<c01520c4>] (do_execve) from [<c000e5e0>] (ret_fast_syscall+0x0/0x4c) > > > [ 7.316881] ---[ end trace 3b8a46b1b280f423 ]--- > > seems like it's caused by: > > b316feb3c37ff19cddcaf1f6b5056c633193257d is the first bad commit > > Adding Kiryl to the loop. > > git bisect start > # good: [ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc] Linux 3.19-rc5 > git bisect good ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc > # bad: [a0d4287f787889e59db0fd295853a0f1f55d0699] Add linux-next specific files for 20150119 > git bisect bad a0d4287f787889e59db0fd295853a0f1f55d0699 > # good: [1c2f70b77b8ca77f10c59d479d009e07359d00d2] Merge remote-tracking branch 'drm/drm-next' > git bisect good 1c2f70b77b8ca77f10c59d479d009e07359d00d2 > # good: [73c1390843223d8bfc85795c560c36b3d0ffee40] Merge remote-tracking branch 'leds/for-next' > git bisect good 73c1390843223d8bfc85795c560c36b3d0ffee40 > # good: [7bc6bef35d48e91ad796b6eead7304998842c782] Merge remote-tracking branch 'pinctrl/for-next' > git bisect good 7bc6bef35d48e91ad796b6eead7304998842c782 > # bad: [45e1eaa38732ffa3de0d18fe95d2d2b960a7c777] lib: bitmap: change bitmap_shift_right to take unsigned parameters > git bisect bad 45e1eaa38732ffa3de0d18fe95d2d2b960a7c777 > # good: [c82a73a0369a7dd6dcfaf9e6bd572a4e5deda223] mm, page_alloc: reduce number of alloc_pages* functions' parameters > git bisect good c82a73a0369a7dd6dcfaf9e6bd572a4e5deda223 > # bad: [0b1c810fbc4bbff7e314dd6ff91c2b4af499199d] mm: don't split THP page when syscall is called > git bisect bad 0b1c810fbc4bbff7e314dd6ff91c2b4af499199d > # good: [54faa439355a9ae476a446429967e9e38f04363e] oom, PM: make OOM detection in the freezer path raceless > git bisect good 54faa439355a9ae476a446429967e9e38f04363e > # bad: [b6c9f11c6b6993303067f7c04a73258226a6e77e] mm/compaction: add tracepoint to observe behaviour of compaction defer > git bisect bad b6c9f11c6b6993303067f7c04a73258226a6e77e > # good: [9ce5d3fb13a80f28db450de4ecf2727893e99c93] mm: pagemap_read: limit scan to virtual region being asked > git bisect good 9ce5d3fb13a80f28db450de4ecf2727893e99c93 > # bad: [1a7a376546ca56e7750987c15d0c7541c17a512c] mm/compaction: change tracepoint format from decimal to hexadecimal > git bisect bad 1a7a376546ca56e7750987c15d0c7541c17a512c > # bad: [4081187ff19cf2186010c003939c17d70d0bbb27] page_writeback: put account_page_redirty() after set_page_dirty() > git bisect bad 4081187ff19cf2186010c003939c17d70d0bbb27 > # bad: [b316feb3c37ff19cddcaf1f6b5056c633193257d] mm: account pmd page tables to the process > git bisect bad b316feb3c37ff19cddcaf1f6b5056c633193257d > # first bad commit: [b316feb3c37ff19cddcaf1f6b5056c633193257d] mm: account pmd page tables to the process > > I've added a dump_mm() call when the bug happens followed by a > while (true) loop (to avoid constant reprinting of the same thing), > here's what I get: > > [ 7.235903] ------------[ cut here ]------------ > [ 7.240881] WARNING: CPU: 0 PID: 58 at mm/mmap.c:2859 exit_mmap+0x1b4/0x218() > [ 7.248369] Modules linked in: ipv6 autofs4 > [ 7.252792] CPU: 0 PID: 58 Comm: systemd Not tainted 3.19.0-rc5-next-20150119-dirty #888 > [ 7.261274] Hardware name: Generic AM43 (Flattened Device Tree) > [ 7.267512] [<c0015afc>] (unwind_backtrace) from [<c001221c>] (show_stack+0x10/0x14) > [ 7.275651] [<c001221c>] (show_stack) from [<c058972c>] (dump_stack+0x84/0x9c) > [ 7.283249] [<c058972c>] (dump_stack) from [<c003def0>] (warn_slowpath_common+0x78/0xb4) > [ 7.291750] [<c003def0>] (warn_slowpath_common) from [<c003dfc8>] (warn_slowpath_null+0x1c/0x24) > [ 7.300977] [<c003dfc8>] (warn_slowpath_null) from [<c0133410>] (exit_mmap+0x1b4/0x218) > [ 7.309376] [<c0133410>] (exit_mmap) from [<c003b5f0>] (mmput+0x44/0xec) > [ 7.316385] [<c003b5f0>] (mmput) from [<c0157e68>] (flush_old_exec+0x264/0x5d4) > [ 7.324061] [<c0157e68>] (flush_old_exec) from [<c019f180>] (load_elf_binary+0x288/0x1234) > [ 7.332727] [<c019f180>] (load_elf_binary) from [<c0158304>] (search_binary_handler+0x84/0x1e8) > [ 7.341857] [<c0158304>] (search_binary_handler) from [<c0158c84>] (do_execveat_common+0x53c/0x6b8) > [ 7.351346] [<c0158c84>] (do_execveat_common) from [<c0158e24>] (do_execve+0x24/0x2c) > [ 7.359561] [<c0158e24>] (do_execve) from [<c000e6c0>] (ret_fast_syscall+0x0/0x4c) > [ 7.367485] ---[ end trace 633a89eb76b1d46e ]--- > [ 7.372360] mm ed29fa00 mmap ed29b6b8 seqnum 0 task_size 3204448256 > [ 7.372360] get_unmapped_area c001cfc0 > [ 7.372360] mmap_base 3069620224 mmap_legacy_base 0 highest_vm_end 3202711552 > [ 7.372360] pgd ed184000 mm_users 0 mm_count 1 nr_ptes 1 nr_pmds 4294967292 map_count 59 > [ 7.372360] hiwater_rss 37 hiwater_vm 37e total_vm 37e locked_vm 0 > [ 7.372360] pinned_vm 0 shared_vm 324 exec_vm 254 stack_vm 22 > [ 7.372360] start_code 10000 end_code c9d48 start_data da1b8 end_data ea1a4 > [ 7.372360] start_brk eb000 brk 10c000 start_stack becd5f10 > [ 7.372360] arg_start becd5fd4 arg_end becd5fdf env_start becd5fdf env_end becd5ff1 > [ 7.372360] binfmt c08b3158 flags cd core_state (null) > [ 7.372360] ioctx_table (null) > [ 7.372360] owner (null) exe_file ee49d040 > [ 7.372360] tlb_flush_pending 0 > [ 7.448908] flags: 0x0() > > Looking at nr_pmds, that's basically (unsigned long) -4, which tells me > we are decrementing mm->nr_pmds without incrementing first. In, when I > add: > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index ba5f3bcca55d..8425fb419eab 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1440,11 +1440,15 @@ static inline unsigned long mm_nr_pmds(struct mm_struct *mm) > static inline void mm_inc_nr_pmds(struct mm_struct *mm) > { > atomic_long_inc(&mm->nr_pmds); > + printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__, > + atomic_long_read(&mm->nr_pmds)); > } > > static inline void mm_dec_nr_pmds(struct mm_struct *mm) > { > atomic_long_dec(&mm->nr_pmds); > + printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__, > + atomic_long_read(&mm->nr_pmds)); > } > #endif > > I start getting: > > [...] > > [ 5.935390] ===> mm_dec_nr_pmds nr_pmds -1 > [ 6.236832] random: systemd urandom read with 34 bits of entropy > available > [ 6.276340] systemd[1]: systemd 215 running in system mode. (+PAM > +AUDIT +SELINUX +IMA +SYSVINIT +LIBC > RYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR) > [ 6.291380] systemd[1]: Detected architecture 'arm'. > > Welcome to Debian GNU/Linux 8 (jessie)! > > [ 6.434013] systemd[1]: Inserted module 'autofs4' > [ 7.152037] NET: Registered protocol family 10 > [ 7.165770] systemd[1]: Inserted module 'ipv6' > [ 7.170646] ===> mm_dec_nr_pmds nr_pmds -2 > [ 7.174932] ===> mm_dec_nr_pmds nr_pmds -3 > [ 7.179427] ===> mm_dec_nr_pmds nr_pmds -4 > [ 7.258496] ===> mm_dec_nr_pmds nr_pmds -1 > [ 7.262809] ===> mm_dec_nr_pmds nr_pmds -2 > [ 7.267206] ===> mm_dec_nr_pmds nr_pmds -3 > [ 7.271486] ===> mm_dec_nr_pmds nr_pmds -4 > [ 7.275884] ------------[ cut here ]------------ > [ 7.280773] WARNING: CPU: 0 PID: 58 at mm/mmap.c:2859 exit_mmap+0x1b4/0x218() > > [...] > > Which confirms my suspicion. So we never increment nr_pmds, but we > decrement it. The simplest "fix" is to make mm_nr_pmds() return a signed > long (see below) and cast roundup()'s return to (signed long), but > that's not what we really want in this case because it's clear our PMD > accounting is bogus. > > Kiryl, any better idea on how to balance mm_inc_nr_pmds() and > mm_dec_nr_pmds() ? I assume it's on !LPAE kernel, right? I did a quick look. ARM has folded PMD level in case of 2-level pages tables, but it doesn't use standard approach -- pgtable-nopmd.h. As result ARM doesn't have __PAGETABLE_PMD_FOLDED defined. I will look further tomorrow, but I wounder if we can just define __PAGETABLE_PMD_FOLDED in arch/arm/include/asm/pgtable-2level.h ? This way we would also get rid of dead code -- __pmd_alloc() is never called in this configuration. And fix the accounting issue: mm_*_nr_pmd() helpers will become nop. Better option would be converting 2-lvl ARM configuration to <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.
On Tue, Jan 20, 2015 at 02:16:43AM +0200, Kirill A. Shutemov wrote: > Better option would be converting 2-lvl ARM configuration to > <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible. Well, IMHO the folded approach in asm-generic was done the wrong way which barred ARM from ever using it. By that, I mean that the asm-generic stuff encapsulates a pgd into a pud, and a pud into a pmd: typedef struct { pgd_t pgd; } pud_t; typedef struct { pud_t pud; } pmd_t; This, I assert, is the wrong way around. Think about it when you have a real 4 level page table structure - a single pgd points to a set of puds. So, one pgd encapsulates via a pointer a set of puds. One pud does not encapsulate a set of pgds. What we have on ARM is slightly different: because of the sizes of page tables, we have a pgd entry which is physically two page table pointers. However, there are cases where we want to access these as two separate pointers. So, we define pgd_t to be an array of two u32's, and a pmd_t to be a single entry. This works fine, we set the masks, shifts and sizes appropriately so that the pmd code is optimised away, but leaves us with the ability to go down to the individual pgd_t entries when we need to (eg, for section mappings, writing the pgd pointers for page tables, etc.) I think I also ran into problems with: #define pmd_val(x) (pud_val((x).pud)) #define __pmd(x) ((pmd_t) { __pud(x) } ) too - but it's been a very long time since the nopmd.h stuff was introduced, and I last looked at it. In any case, what we have today is what has worked for well over a decade (and pre-dates nopmd.h), and I'm really not interested today in trying to rework tonnes of code to make use of nopmd.h - especially as it will most likely require nopmd.h to be rewritten too, and we now have real 3 level page table support (which I have no way to test.)
diff --git a/include/linux/mm.h b/include/linux/mm.h index ba5f3bcca55d..8425fb419eab 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1440,11 +1440,15 @@ static inline unsigned long mm_nr_pmds(struct mm_struct *mm) static inline void mm_inc_nr_pmds(struct mm_struct *mm) { atomic_long_inc(&mm->nr_pmds); + printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__, + atomic_long_read(&mm->nr_pmds)); } static inline void mm_dec_nr_pmds(struct mm_struct *mm) { atomic_long_dec(&mm->nr_pmds); + printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__, + atomic_long_read(&mm->nr_pmds)); } #endif I start getting: [...] [ 5.935390] ===> mm_dec_nr_pmds nr_pmds -1 [ 6.236832] random: systemd urandom read with 34 bits of entropy available [ 6.276340] systemd[1]: systemd 215 running in system mode. (+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBC RYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR) [ 6.291380] systemd[1]: Detected architecture 'arm'. Welcome to Debian GNU/Linux 8 (jessie)! [ 6.434013] systemd[1]: Inserted module 'autofs4' [ 7.152037] NET: Registered protocol family 10 [ 7.165770] systemd[1]: Inserted module 'ipv6' [ 7.170646] ===> mm_dec_nr_pmds nr_pmds -2 [ 7.174932] ===> mm_dec_nr_pmds nr_pmds -3 [ 7.179427] ===> mm_dec_nr_pmds nr_pmds -4 [ 7.258496] ===> mm_dec_nr_pmds nr_pmds -1 [ 7.262809] ===> mm_dec_nr_pmds nr_pmds -2 [ 7.267206] ===> mm_dec_nr_pmds nr_pmds -3 [ 7.271486] ===> mm_dec_nr_pmds nr_pmds -4 [ 7.275884] ------------[ cut here ]------------ [ 7.280773] WARNING: CPU: 0 PID: 58 at mm/mmap.c:2859 exit_mmap+0x1b4/0x218() [...] Which confirms my suspicion. So we never increment nr_pmds, but we decrement it. The simplest "fix" is to make mm_nr_pmds() return a signed long (see below) and cast roundup()'s return to (signed long), but that's not what we really want in this case because it's clear our PMD accounting is bogus. Kiryl, any better idea on how to balance mm_inc_nr_pmds() and mm_dec_nr_pmds() ? diff --git a/include/linux/mm.h b/include/linux/mm.h index ba5f3bcca55d..c0ce2e8a9d45 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1432,7 +1432,7 @@ static inline void mm_dec_nr_pmds(struct mm_struct *mm) {} #else int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address); -static inline unsigned long mm_nr_pmds(struct mm_struct *mm) +static inline long mm_nr_pmds(struct mm_struct *mm) { return atomic_long_read(&mm->nr_pmds); } diff --git a/mm/mmap.c b/mm/mmap.c index 25271805ab39..703474d3336b 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2856,7 +2856,7 @@ void exit_mmap(struct mm_struct *mm) round_up(FIRST_USER_ADDRESS, PMD_SIZE) >> PMD_SHIFT); #ifdef PUD_SHIFT WARN_ON(mm_nr_pmds(mm) > - round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT); + (signed long) round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT); #endif }