
KVM hang after OOM

Message ID 20180312090054.mqu56pju7nijjufh@node.shutemov.name (mailing list archive)
State New, archived

Commit Message

Kirill A. Shutemov March 12, 2018, 9 a.m. UTC
On Sun, Mar 11, 2018 at 11:11:52PM +0500, Mikhail Gavrilov wrote:
> $ uname -a
> Linux localhost.localdomain 4.15.7-300.fc27.x86_64+debug #1 SMP Wed
> Feb 28 17:32:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> 
> 
> How to reproduce:
> 1. start the virtual machine
> 2. open https://oom.sy24.ru/ in Firefox, which helps trigger an OOM.
> Sorry, I can't attach the HTML page here because my message would be
> rejected for containing an HTML subpart.
> 
> Actual result: the virtual machine hangs and can't even be forced off.
> 
> Expected result: the virtual machine keeps working.
> 
> [ 2335.903277] INFO: task CPU 0/KVM:7450 blocked for more than 120 seconds.
> [ 2335.903284]       Not tainted 4.15.7-300.fc27.x86_64+debug #1
> [ 2335.903287] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 2335.903291] CPU 0/KVM       D10648  7450      1 0x00000000
> [ 2335.903298] Call Trace:
> [ 2335.903308]  ? __schedule+0x2e9/0xbb0
> [ 2335.903318]  ? __lock_page+0xad/0x180
> [ 2335.903322]  schedule+0x2f/0x90
> [ 2335.903327]  io_schedule+0x12/0x40
> [ 2335.903331]  __lock_page+0xed/0x180
> [ 2335.903338]  ? page_cache_tree_insert+0x130/0x130
> [ 2335.903347]  deferred_split_scan+0x318/0x340

I guess it's a bad idea to wait for the page to be unlocked in the reclaim path.
Could you check if this makes a difference:
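
For readability, here is a sketch of what that loop in deferred_split_scan()
looks like with the change applied (the full diff is quoted in the comments
below and in the Patch section at the end); only the touched hunk is shown:

	list_for_each_safe(pos, next, &list) {
		page = list_entry((void *)pos, struct page, mapping);
		/* Skip pages that are already locked instead of sleeping
		 * on them from the reclaim path. */
		if (!trylock_page(page))
			goto next;
		/* split_huge_page() removes page from list on success */
		if (!split_huge_page(page))
			split++;
		unlock_page(page);
next:
		put_page(page);
	}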

Comments

Tetsuo Handa March 13, 2018, 12:22 p.m. UTC | #1
Mikhail Gavrilov wrote:
> On 12 March 2018 at 14:00, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > On Sun, Mar 11, 2018 at 11:11:52PM +0500, Mikhail Gavrilov wrote:
> >> $ uname -a
> >> Linux localhost.localdomain 4.15.7-300.fc27.x86_64+debug #1 SMP Wed
> >> Feb 28 17:32:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> >>
> >>
> >> How to reproduce:
> >> 1. start the virtual machine
> >> 2. open https://oom.sy24.ru/ in Firefox, which helps trigger an OOM.
> >> Sorry, I can't attach the HTML page here because my message would be
> >> rejected for containing an HTML subpart.
> >>
> >> Actual result: the virtual machine hangs and can't even be forced off.
> >>
> >> Expected result: the virtual machine keeps working.
> >>
> >> [ 2335.903277] INFO: task CPU 0/KVM:7450 blocked for more than 120 seconds.
> >> [ 2335.903284]       Not tainted 4.15.7-300.fc27.x86_64+debug #1
> >> [ 2335.903287] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >> disables this message.
> >> [ 2335.903291] CPU 0/KVM       D10648  7450      1 0x00000000
> >> [ 2335.903298] Call Trace:
> >> [ 2335.903308]  ? __schedule+0x2e9/0xbb0
> >> [ 2335.903318]  ? __lock_page+0xad/0x180
> >> [ 2335.903322]  schedule+0x2f/0x90
> >> [ 2335.903327]  io_schedule+0x12/0x40
> >> [ 2335.903331]  __lock_page+0xed/0x180
> >> [ 2335.903338]  ? page_cache_tree_insert+0x130/0x130
> >> [ 2335.903347]  deferred_split_scan+0x318/0x340
> >
> > I guess it's a bad idea to wait for the page to be unlocked in the reclaim path.
> > Could you check if this makes a difference:
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 87ab9b8f56b5..529cf36b7edb 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2783,11 +2783,13 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> >
> >         list_for_each_safe(pos, next, &list) {
> >                 page = list_entry((void *)pos, struct page, mapping);
> > -               lock_page(page);
> > +               if (!trylock_page(page))
> > +                       goto next;
> >                 /* split_huge_page() removes page from list on success */
> >                 if (!split_huge_page(page))
> >                         split++;
> >                 unlock_page(page);
> > +next:
> >                 put_page(page);
> >         }
> >
>
> Kirill, thanks for paying attention to the problem.
> But your patch didn't help. The virtual machine still hung after OOM.
> The new dmesg is attached.
>

Indeed, but the location of the hang seems to be different. dmesg.txt was
hanging at io_schedule() waiting for lock_page(), and dmesg2.txt was
hanging at down_write(&mm->mmap_sem)/down_read(&mm->mmap_sem). But
dmesg3.txt was not hanging at io_schedule() waiting for lock_page().

What activities are performed between lock_page() and unlock_page()?
Do those activities (directly or indirectly) depend on __GFP_DIRECT_RECLAIM
memory allocation requests (e.g. GFP_NOFS/GFP_NOIO)? If so, it would be
unsafe to call lock_page() unconditionally (i.e. without checking the GFP
context in which the shrinker function was called), wouldn't it?
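
A minimal sketch of the kind of GFP-context check being alluded to here,
assuming one wanted to keep a blocking lock_page() only when the reclaim
context allows it; the helper names are made up for illustration and this is
not part of any posted patch:

	#include <linux/gfp.h>
	#include <linux/pagemap.h>
	#include <linux/shrinker.h>

	/*
	 * Illustration only: decide from the GFP context the shrinker was
	 * invoked with (sc->gfp_mask) whether sleeping on a page lock is
	 * acceptable, and fall back to a non-blocking trylock otherwise.
	 */
	static bool shrinker_may_sleep_on_page_lock(struct shrink_control *sc)
	{
		return (sc->gfp_mask & __GFP_FS) && (sc->gfp_mask & __GFP_IO);
	}

	static bool example_lock_page(struct page *page, struct shrink_control *sc)
	{
		if (shrinker_may_sleep_on_page_lock(sc)) {
			lock_page(page);
			return true;
		}
		return trylock_page(page);
	}

The patch posted in this thread sidesteps the question by always using
trylock_page(), which never sleeps regardless of the caller's GFP context.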
Michal Hocko March 15, 2018, 12:34 p.m. UTC | #2
On Mon 12-03-18 23:56:05, Mikhail Gavrilov wrote:
[...]
> [  355.531533] ============================================
> [  355.531534] WARNING: possible recursive locking detected
> [  355.531536] 4.16.0-rc1-amd-vega+ #6 Not tainted
> [  355.531537] --------------------------------------------
> [  355.531539] CPU 0/KVM/4034 is trying to acquire lock:
> [  355.531540]  (&mm->mmap_sem){++++}, at: [<0000000026cd8acd>] get_user_pages_unlocked+0xe0/0x1d0
> [  355.531549] 
>                but task is already holding lock:
> [  355.531550]  (&mm->mmap_sem){++++}, at: [<000000000690373b>] get_user_pages_unlocked+0x5e/0x1d0
> [  355.531554] 
>                other info that might help us debug this:
> [  355.531555]  Possible unsafe locking scenario:
> 
> [  355.531556]        CPU0
> [  355.531557]        ----
> [  355.531558]   lock(&mm->mmap_sem);
> [  355.531559]   lock(&mm->mmap_sem);
> [  355.531561] 
>                 *** DEADLOCK ***
> 
> [  355.531562]  May be due to missing lock nesting notation
> 
> [  355.531564] 3 locks held by CPU 0/KVM/4034:
> [  355.531564]  #0:  (&vcpu->mutex){+.+.}, at: [<00000000230699e6>] kvm_vcpu_ioctl+0x81/0x6b0 [kvm]
> [  355.531586]  #1:  (&kvm->srcu){....}, at: [<000000000a3cc9a1>] kvm_arch_vcpu_ioctl_run+0x752/0x1b90 [kvm]
> [  355.531602]  #2:  (&mm->mmap_sem){++++}, at: [<000000000690373b>] get_user_pages_unlocked+0x5e/0x1d0

Is this lockdep report real or a false positive? I'm asking because many
tasks are failing to take the mmap_sem. Maybe the same one?
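
For context, the pattern lockdep is flagging can be a real deadlock rather
than noise: a second down_read() on the same rwsem deadlocks as soon as a
writer queues up between the two read acquisitions, because rwsems block new
readers behind a waiting writer. A rough illustration of the interleaving,
assuming the recursion reported above is genuine:

	/*
	 * Task A (CPU 0/KVM)              Task B (any writer)
	 * ------------------              -------------------
	 * down_read(&mm->mmap_sem)
	 *                                 down_write(&mm->mmap_sem)  queues and sleeps
	 * down_read(&mm->mmap_sem)        second read blocks behind the queued writer
	 *
	 * Task A now waits for Task B while Task B waits for Task A, so neither
	 * makes progress.  Whether the get_user_pages_unlocked() path really
	 * re-enters like this here is exactly the open question.
	 */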

[...]
> [  615.239436] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  615.239437] pidof           D11816  5076   5071 0x00000000
> [  615.239441] Call Trace:
> [  615.239443]  ? __schedule+0x2e9/0xba0
> [  615.239446]  ? rwsem_down_read_failed+0x147/0x190
> [  615.239449]  schedule+0x2f/0x90
> [  615.239450]  rwsem_down_read_failed+0x118/0x190
> [  615.239455]  ? call_rwsem_down_read_failed+0x14/0x30
> [  615.239457]  call_rwsem_down_read_failed+0x14/0x30
> [  615.239460]  ? proc_pid_cmdline_read+0xd2/0x4a0
> [  615.239462]  down_read+0x97/0xa0
> [  615.239464]  proc_pid_cmdline_read+0xd2/0x4a0
> [  615.239467]  ? lock_acquire+0x9f/0x200
> [  615.239469]  ? debug_check_no_obj_freed+0xda/0x244
> [  615.239473]  ? __vfs_read+0x36/0x170
> [  615.239475]  __vfs_read+0x36/0x170
> [  615.239479]  vfs_read+0x9e/0x150
> [  615.239482]  SyS_read+0x55/0xc0
> [  615.239484]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> [  615.239487]  do_syscall_64+0x7a/0x220
> [  615.239489]  entry_SYSCALL_64_after_hwframe+0x26/0x9b
> [  615.239491] RIP: 0033:0x7f4d5da16701
> [  615.239492] RSP: 002b:00007ffd5200b188 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [  615.239494] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4d5da16701
> [  615.239496] RDX: 00000000000007ff RSI: 00007ffd5200b1a0 RDI: 0000000000000004
> [  615.239497] RBP: 0000000000000042 R08: 0000000000000007 R09: 00007f4d5e1825f4
> [  615.239498] R10: 0000000000000000 R11: 0000000000000246 R12: 000055abb07f5980
> [  615.239500] R13: 00007ffd5200b1a0 R14: 0000000000000004 R15: 0000000000000000
> [  615.239505] INFO: lockdep is turned off.
> [  639.815026] kworker/dying (149) used greatest stack depth: 10520 bytes left
Michal Hocko March 15, 2018, 12:37 p.m. UTC | #3
On Mon 12-03-18 12:00:54, Kirill A. Shutemov wrote:
> On Sun, Mar 11, 2018 at 11:11:52PM +0500, Mikhail Gavrilov wrote:
> > $ uname -a
> > Linux localhost.localdomain 4.15.7-300.fc27.x86_64+debug #1 SMP Wed
> > Feb 28 17:32:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> > 
> > 
> > How to reproduce:
> > 1. start the virtual machine
> > 2. open https://oom.sy24.ru/ in Firefox, which helps trigger an OOM.
> > Sorry, I can't attach the HTML page here because my message would be
> > rejected for containing an HTML subpart.
> > 
> > Actual result: the virtual machine hangs and can't even be forced off.
> > 
> > Expected result: the virtual machine keeps working.
> > 
> > [ 2335.903277] INFO: task CPU 0/KVM:7450 blocked for more than 120 seconds.
> > [ 2335.903284]       Not tainted 4.15.7-300.fc27.x86_64+debug #1
> > [ 2335.903287] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [ 2335.903291] CPU 0/KVM       D10648  7450      1 0x00000000
> > [ 2335.903298] Call Trace:
> > [ 2335.903308]  ? __schedule+0x2e9/0xbb0
> > [ 2335.903318]  ? __lock_page+0xad/0x180
> > [ 2335.903322]  schedule+0x2f/0x90
> > [ 2335.903327]  io_schedule+0x12/0x40
> > [ 2335.903331]  __lock_page+0xed/0x180
> > [ 2335.903338]  ? page_cache_tree_insert+0x130/0x130
> > [ 2335.903347]  deferred_split_scan+0x318/0x340
> 
> I guess it's a bad idea to wait for the page to be unlocked in the reclaim path.
> Could you check if this makes a difference:
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 87ab9b8f56b5..529cf36b7edb 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2783,11 +2783,13 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>  
>  	list_for_each_safe(pos, next, &list) {
>  		page = list_entry((void *)pos, struct page, mapping);
> -		lock_page(page);
> +		if (!trylock_page(page))
> +			goto next;
>  		/* split_huge_page() removes page from list on success */
>  		if (!split_huge_page(page))
>  			split++;
>  		unlock_page(page);
> +next:
>  		put_page(page);
>  	}

Absolutely! Can you send a proper patch please?

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 87ab9b8f56b5..529cf36b7edb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2783,11 +2783,13 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 
 	list_for_each_safe(pos, next, &list) {
 		page = list_entry((void *)pos, struct page, mapping);
-		lock_page(page);
+		if (!trylock_page(page))
+			goto next;
 		/* split_huge_page() removes page from list on success */
 		if (!split_huge_page(page))
 			split++;
 		unlock_page(page);
+next:
 		put_page(page);
 	}