| Message ID | 20180312090054.mqu56pju7nijjufh@node.shutemov.name |
|---|---|
| State | New, archived |
Mikhail Gavrilov wrote:
> On 12 March 2018 at 14:00, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > On Sun, Mar 11, 2018 at 11:11:52PM +0500, Mikhail Gavrilov wrote:
> >> $ uname -a
> >> Linux localhost.localdomain 4.15.7-300.fc27.x86_64+debug #1 SMP Wed
> >> Feb 28 17:32:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> >>
> >>
> >> How reproduce:
> >> 1. start virtual machine
> >> 2. open https://oom.sy24.ru/ in Firefox which will helps occurred OOM.
> >> Sorry I can't attach here html page because my message will rejected
> >> as message would contained HTML subpart.
> >>
> >> Actual result virtual machine hang and even couldn't be force off.
> >>
> >> Expected result virtual machine continue work.
> >>
> >> [ 2335.903277] INFO: task CPU 0/KVM:7450 blocked for more than 120 seconds.
> >> [ 2335.903284] Not tainted 4.15.7-300.fc27.x86_64+debug #1
> >> [ 2335.903287] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >> disables this message.
> >> [ 2335.903291] CPU 0/KVM D10648 7450 1 0x00000000
> >> [ 2335.903298] Call Trace:
> >> [ 2335.903308]  ? __schedule+0x2e9/0xbb0
> >> [ 2335.903318]  ? __lock_page+0xad/0x180
> >> [ 2335.903322]  schedule+0x2f/0x90
> >> [ 2335.903327]  io_schedule+0x12/0x40
> >> [ 2335.903331]  __lock_page+0xed/0x180
> >> [ 2335.903338]  ? page_cache_tree_insert+0x130/0x130
> >> [ 2335.903347]  deferred_split_scan+0x318/0x340
> >
> > I guess it's bad idea to wait the page to be unlocked in the relaim path.
> > Could you check if this makes a difference:
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 87ab9b8f56b5..529cf36b7edb 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2783,11 +2783,13 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> >
> >  	list_for_each_safe(pos, next, &list) {
> >  		page = list_entry((void *)pos, struct page, mapping);
> > -		lock_page(page);
> > +		if (!trylock_page(page))
> > +			goto next;
> >  		/* split_huge_page() removes page from list on success */
> >  		if (!split_huge_page(page))
> >  			split++;
> >  		unlock_page(page);
> > +next:
> >  		put_page(page);
> >  	}
> >
>
> Kiril, thanks for pay attention to the problem.
> But your patch couldn't help. Virtual machine was hang after OOM.
> New dmesg is attached.
>

Indeed, but the location of the hang seems to be different. dmesg.txt was
hanging at io_schedule() waiting for lock_page(), and dmesg2.txt was hanging
at down_write(&mm->mmap_sem)/down_read(&mm->mmap_sem). But dmesg3.txt was not
hanging at io_schedule() waiting for lock_page().

What activities are performed between lock_page() and unlock_page()? Do those
activities (directly or indirectly) depend on __GFP_DIRECT_RECLAIM memory
allocation requests (e.g. GFP_NOFS/GFP_NOIO)? If yes, it would be unsafe to
call lock_page() unconditionally (i.e. without checking the GFP context in
which the shrinker function was called), wouldn't it?
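For illustration, the following is roughly the kind of GFP-context check being hinted at here. It is a minimal sketch of a hypothetical shrinker callback, not the kernel's actual deferred_split_scan(): the example_*() helpers are made up, while struct shrink_control, its gfp_mask field, SHRINK_STOP, trylock_page() and unlock_page() are real kernel interfaces.

```c
#include <linux/shrinker.h>
#include <linux/pagemap.h>
#include <linux/mm.h>

/* Hypothetical helpers, declared only so the sketch is self-contained. */
extern struct page *example_next_candidate(void);
extern unsigned long example_try_to_free(struct page *page);

static unsigned long example_scan(struct shrinker *shrink,
				  struct shrink_control *sc)
{
	unsigned long freed = 0;
	struct page *page;

	/*
	 * If the allocation that kicked off reclaim cannot tolerate the
	 * work done under the page lock (e.g. GFP_NOFS/GFP_NOIO callers),
	 * back off instead of recursing into a context we must not enter.
	 */
	if (!(sc->gfp_mask & __GFP_FS))
		return SHRINK_STOP;

	while ((page = example_next_candidate()) != NULL) {
		/*
		 * Never sleep waiting for the page lock here: the current
		 * lock holder may itself be blocked waiting on reclaim.
		 */
		if (!trylock_page(page))
			continue;
		freed += example_try_to_free(page);
		unlock_page(page);
	}
	return freed;
}
```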
On Mon 12-03-18 23:56:05, Mikhail Gavrilov wrote:
[...]
> [ 355.531533] ============================================
> [ 355.531534] WARNING: possible recursive locking detected
> [ 355.531536] 4.16.0-rc1-amd-vega+ #6 Not tainted
> [ 355.531537] --------------------------------------------
> [ 355.531539] CPU 0/KVM/4034 is trying to acquire lock:
> [ 355.531540]  (&mm->mmap_sem){++++}, at: [<0000000026cd8acd>] get_user_pages_unlocked+0xe0/0x1d0
> [ 355.531549]
> but task is already holding lock:
> [ 355.531550]  (&mm->mmap_sem){++++}, at: [<000000000690373b>] get_user_pages_unlocked+0x5e/0x1d0
> [ 355.531554]
> other info that might help us debug this:
> [ 355.531555]  Possible unsafe locking scenario:
>
> [ 355.531556]        CPU0
> [ 355.531557]        ----
> [ 355.531558]   lock(&mm->mmap_sem);
> [ 355.531559]   lock(&mm->mmap_sem);
> [ 355.531561]
>  *** DEADLOCK ***
>
> [ 355.531562]  May be due to missing lock nesting notation
>
> [ 355.531564] 3 locks held by CPU 0/KVM/4034:
> [ 355.531564]  #0:  (&vcpu->mutex){+.+.}, at: [<00000000230699e6>] kvm_vcpu_ioctl+0x81/0x6b0 [kvm]
> [ 355.531586]  #1:  (&kvm->srcu){....}, at: [<000000000a3cc9a1>] kvm_arch_vcpu_ioctl_run+0x752/0x1b90 [kvm]
> [ 355.531602]  #2:  (&mm->mmap_sem){++++}, at: [<000000000690373b>] get_user_pages_unlocked+0x5e/0x1d0

Is this lockdep report real or a false positive? Because many tasks are
failing to take a mmap_sem. Maybe the same one?

[...]
> [ 615.239436] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 615.239437] pidof           D11816  5076   5071 0x00000000
> [ 615.239441] Call Trace:
> [ 615.239443]  ? __schedule+0x2e9/0xba0
> [ 615.239446]  ? rwsem_down_read_failed+0x147/0x190
> [ 615.239449]  schedule+0x2f/0x90
> [ 615.239450]  rwsem_down_read_failed+0x118/0x190
> [ 615.239455]  ? call_rwsem_down_read_failed+0x14/0x30
> [ 615.239457]  call_rwsem_down_read_failed+0x14/0x30
> [ 615.239460]  ? proc_pid_cmdline_read+0xd2/0x4a0
> [ 615.239462]  down_read+0x97/0xa0
> [ 615.239464]  proc_pid_cmdline_read+0xd2/0x4a0
> [ 615.239467]  ? lock_acquire+0x9f/0x200
> [ 615.239469]  ? debug_check_no_obj_freed+0xda/0x244
> [ 615.239473]  ? __vfs_read+0x36/0x170
> [ 615.239475]  __vfs_read+0x36/0x170
> [ 615.239479]  vfs_read+0x9e/0x150
> [ 615.239482]  SyS_read+0x55/0xc0
> [ 615.239484]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> [ 615.239487]  do_syscall_64+0x7a/0x220
> [ 615.239489]  entry_SYSCALL_64_after_hwframe+0x26/0x9b
> [ 615.239491] RIP: 0033:0x7f4d5da16701
> [ 615.239492] RSP: 002b:00007ffd5200b188 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [ 615.239494] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4d5da16701
> [ 615.239496] RDX: 00000000000007ff RSI: 00007ffd5200b1a0 RDI: 0000000000000004
> [ 615.239497] RBP: 0000000000000042 R08: 0000000000000007 R09: 00007f4d5e1825f4
> [ 615.239498] R10: 0000000000000000 R11: 0000000000000246 R12: 000055abb07f5980
> [ 615.239500] R13: 00007ffd5200b1a0 R14: 0000000000000004 R15: 0000000000000000
> [ 615.239505] INFO: lockdep is turned off.
> [ 639.815026] kworker/dying (149) used greatest stack depth: 10520 bytes left
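As background on why lockdep complains even though both acquisitions are read locks: kernel rwsems queue waiting writers, so a nested down_read() on the same semaphore within one task can deadlock. The sketch below is purely illustrative and not code from the report; example_nested_read() is a made-up function.

```c
#include <linux/mm_types.h>
#include <linux/rwsem.h>

/* Hypothetical illustration of the recursive mmap_sem read-lock hazard. */
static void example_nested_read(struct mm_struct *mm)
{
	down_read(&mm->mmap_sem);	/* first hold, e.g. in get_user_pages_unlocked() */

	/*
	 * If another task calls down_write(&mm->mmap_sem) at this point,
	 * it blocks and is queued.  The nested down_read() below then
	 * queues behind that waiting writer, but the writer cannot run
	 * until we release the first read hold: nobody makes progress.
	 * That is the "possible recursive locking" scenario above.
	 */
	down_read(&mm->mmap_sem);	/* nested read: may block forever */

	up_read(&mm->mmap_sem);
	up_read(&mm->mmap_sem);
}
```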
On Mon 12-03-18 12:00:54, Kirill A. Shutemov wrote:
> On Sun, Mar 11, 2018 at 11:11:52PM +0500, Mikhail Gavrilov wrote:
> > $ uname -a
> > Linux localhost.localdomain 4.15.7-300.fc27.x86_64+debug #1 SMP Wed
> > Feb 28 17:32:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> >
> >
> > How reproduce:
> > 1. start virtual machine
> > 2. open https://oom.sy24.ru/ in Firefox which will helps occurred OOM.
> > Sorry I can't attach here html page because my message will rejected
> > as message would contained HTML subpart.
> >
> > Actual result virtual machine hang and even couldn't be force off.
> >
> > Expected result virtual machine continue work.
> >
> > [ 2335.903277] INFO: task CPU 0/KVM:7450 blocked for more than 120 seconds.
> > [ 2335.903284] Not tainted 4.15.7-300.fc27.x86_64+debug #1
> > [ 2335.903287] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [ 2335.903291] CPU 0/KVM D10648 7450 1 0x00000000
> > [ 2335.903298] Call Trace:
> > [ 2335.903308]  ? __schedule+0x2e9/0xbb0
> > [ 2335.903318]  ? __lock_page+0xad/0x180
> > [ 2335.903322]  schedule+0x2f/0x90
> > [ 2335.903327]  io_schedule+0x12/0x40
> > [ 2335.903331]  __lock_page+0xed/0x180
> > [ 2335.903338]  ? page_cache_tree_insert+0x130/0x130
> > [ 2335.903347]  deferred_split_scan+0x318/0x340
>
> I guess it's bad idea to wait the page to be unlocked in the relaim path.
> Could you check if this makes a difference:
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 87ab9b8f56b5..529cf36b7edb 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2783,11 +2783,13 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>
>  	list_for_each_safe(pos, next, &list) {
>  		page = list_entry((void *)pos, struct page, mapping);
> -		lock_page(page);
> +		if (!trylock_page(page))
> +			goto next;
>  		/* split_huge_page() removes page from list on success */
>  		if (!split_huge_page(page))
>  			split++;
>  		unlock_page(page);
> +next:
>  		put_page(page);
>  	}

Absolutely! Can you send a proper patch please?
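To make the patch easier to follow, the before/after pattern below contrasts the two locking primitives involved. This is an illustrative sketch, not code from mm/huge_memory.c; the before()/after() wrappers are made up, while lock_page(), trylock_page() and unlock_page() are the real page-lock interfaces.

```c
#include <linux/pagemap.h>

/* Sketch of the conversion the patch performs, in isolation. */

static void before(struct page *page)
{
	lock_page(page);		/* sleeps (via io_schedule()) until the lock is free */
	/* ... work on the locked page ... */
	unlock_page(page);
}

static void after(struct page *page)
{
	if (!trylock_page(page))	/* never sleeps; fails if the page is already locked */
		return;			/* skip this page, retry on a later shrinker scan */
	/* ... work on the locked page ... */
	unlock_page(page);
}
```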
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 87ab9b8f56b5..529cf36b7edb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2783,11 +2783,13 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 
 	list_for_each_safe(pos, next, &list) {
 		page = list_entry((void *)pos, struct page, mapping);
-		lock_page(page);
+		if (!trylock_page(page))
+			goto next;
 		/* split_huge_page() removes page from list on success */
 		if (!split_huge_page(page))
 			split++;
 		unlock_page(page);
+next:
 		put_page(page);
 	}
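For readers who prefer the result over the diff: with the hunk applied, the loop in deferred_split_scan() reads as shown below. One clarifying comment has been added by the editor; everything else is the hunk above applied to its own context lines.

```c
	list_for_each_safe(pos, next, &list) {
		page = list_entry((void *)pos, struct page, mapping);
		/* Do not sleep on the page lock in the reclaim path:
		 * the holder may itself be waiting on reclaim. */
		if (!trylock_page(page))
			goto next;
		/* split_huge_page() removes page from list on success */
		if (!split_huge_page(page))
			split++;
		unlock_page(page);
next:
		put_page(page);
	}
```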