Patchwork [1/2] Revert "vmalloc: back off when the current task is killed"

Submitter Johannes Weiner
Date Oct. 4, 2017, 6:59 p.m.
Message ID <20171004185906.GB2136@cmpxchg.org>
Permalink /patch/9985373/
State New

Comments

Johannes Weiner - Oct. 4, 2017, 6:59 p.m.
This reverts commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb and
commit 171012f561274784160f666f8398af8b42216e1f.

5d17a73a2ebe ("vmalloc: back off when the current task is killed")
made all vmalloc allocations from a signal-killed task fail. We have
seen crashes in the tty driver from this, where a killed task exiting
tries to switch back to N_TTY, fails n_tty_open because of the vmalloc
failing, and later crashes when dereferencing tty->disc_data.
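
For illustration, the failing step boils down to something like this (a
simplified sketch of n_tty_open() around v4.13, see drivers/tty/n_tty.c;
the names are real, the body is trimmed):

	static int n_tty_open(struct tty_struct *tty)
	{
		struct n_tty_data *ldata;

		/* The ldisc buffers make this allocation too large for
		 * kmalloc(); with 5d17a73a2ebe applied, vzalloc() fails
		 * outright once the task has a fatal signal pending. */
		ldata = vzalloc(sizeof(*ldata));
		if (!ldata)
			return -ENOMEM;

		tty->disc_data = ldata;
		return 0;
	}

When this fails on the exit path, tty->disc_data is left NULL and the
next path that dereferences it crashes.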

Arguably, relying on a vmalloc() call to succeed in order to properly
exit a task is not the most robust way of doing things. There will be
a follow-up patch to the tty code to fall back to the N_NULL ldisc.
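
As a rough sketch of the shape such a fallback could take in the hangup
path (hypothetical code, not the actual follow-up patch):

	/* Hypothetical: if neither the configured ldisc nor N_TTY can
	 * be brought up during hangup, fall back to the dataless N_NULL
	 * ldisc rather than leaving the tty without a working ldisc. */
	if (tty_ldisc_reinit(tty, tty->termios.c_line) < 0 &&
	    tty_ldisc_reinit(tty, N_TTY) < 0)
		WARN_ON(tty_ldisc_reinit(tty, N_NULL) < 0);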

But the justification to make that vmalloc() call fail like this isn't
convincing, either. The patch mentions an OOM victim exhausting the
memory reserves and thus deadlocking the machine. But the OOM killer
is only one, improbable source of fatal signals. It doesn't make sense
to fail allocations preemptively with plenty of memory in most cases.

The patch doesn't mention real-life instances where vmalloc sites
would exhaust memory, which makes it sound more like a theoretical
issue to begin with. But just in case, the OOM access to memory
reserves has been restricted on the allocator side in cd04ae1e2dc8
("mm, oom: do not rely on TIF_MEMDIE for memory reserves access"),
which should take care of any theoretical concerns on that front.

Revert this patch, and the follow-up that suppresses the allocation
warnings when we fail the allocations due to a signal.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/vmalloc.c | 6 ------
 1 file changed, 6 deletions(-)
Tetsuo Handa - Oct. 4, 2017, 8:49 p.m.
On 2017/10/05 3:59, Johannes Weiner wrote:
> But the justification to make that vmalloc() call fail like this isn't
> convincing, either. The patch mentions an OOM victim exhausting the
> memory reserves and thus deadlocking the machine. But the OOM killer
> is only one, improbable source of fatal signals. It doesn't make sense
> to fail allocations preemptively with plenty of memory in most cases.

By the time the current thread reaches do_exit(), fatal_signal_pending(current)
should become false. As far as I can guess, the source of fatal signal will be
tty_signal_session_leader(tty, exit_session) which is called just before
tty_ldisc_hangup(tty, cons_filp != NULL) rather than the OOM killer. I don't
know whether it is possible to make fatal_signal_pending(current) true inside
do_exit() though...
Johannes Weiner - Oct. 4, 2017, 9 p.m.
On Thu, Oct 05, 2017 at 05:49:43AM +0900, Tetsuo Handa wrote:
> On 2017/10/05 3:59, Johannes Weiner wrote:
> > But the justification to make that vmalloc() call fail like this isn't
> > convincing, either. The patch mentions an OOM victim exhausting the
> > memory reserves and thus deadlocking the machine. But the OOM killer
> > is only one, improbable source of fatal signals. It doesn't make sense
> > to fail allocations preemptively with plenty of memory in most cases.
> 
> By the time the current thread reaches do_exit(), fatal_signal_pending(current)
> should become false. As far as I can guess, the source of fatal signal will be
> tty_signal_session_leader(tty, exit_session) which is called just before
> tty_ldisc_hangup(tty, cons_filp != NULL) rather than the OOM killer. I don't
> know whether it is possible to make fatal_signal_pending(current) true inside
> do_exit() though...

It's definitely not the OOM killer, the memory situation looks fine
when this happens. I didn't look closer where the signal comes from.

That said, we trigger this issue fairly easily. We tested the revert
overnight on a couple thousand machines, and it fixed the issue
(whereas the control group still saw the crashes).
Tetsuo Handa - Oct. 4, 2017, 9:42 p.m.
Johannes Weiner wrote:
> On Thu, Oct 05, 2017 at 05:49:43AM +0900, Tetsuo Handa wrote:
> > On 2017/10/05 3:59, Johannes Weiner wrote:
> > > But the justification to make that vmalloc() call fail like this isn't
> > > convincing, either. The patch mentions an OOM victim exhausting the
> > > memory reserves and thus deadlocking the machine. But the OOM killer
> > > is only one, improbable source of fatal signals. It doesn't make sense
> > > to fail allocations preemptively with plenty of memory in most cases.
> > 
> > By the time the current thread reaches do_exit(), fatal_signal_pending(current)
> > should become false. As far as I can guess, the source of fatal signal will be
> > tty_signal_session_leader(tty, exit_session) which is called just before
> > tty_ldisc_hangup(tty, cons_filp != NULL) rather than the OOM killer. I don't
> > know whether it is possible to make fatal_signal_pending(current) true inside
> > do_exit() though...
> 
> It's definitely not the OOM killer, the memory situation looks fine
> when this happens. I didn't look closer where the signal comes from.
> 

Then, we could check tsk_is_oom_victim() instead of fatal_signal_pending().
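
Against the hunk being reverted, that substitution would look like this
(a sketch of the idea, not a posted patch):

		/* Back off only for genuine OOM victims rather than for
		 * any task with a fatal signal pending;
		 * tsk_is_oom_victim() comes from <linux/oom.h>. */
		if (tsk_is_oom_victim(current)) {
			area->nr_pages = i;
			goto fail_no_warn;
		}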

> That said, we trigger this issue fairly easily. We tested the revert
> overnight on a couple thousand machines, and it fixed the issue
> (whereas the control group still saw the crashes).
>
Andrew Morton - Oct. 4, 2017, 10:32 p.m.
On Wed, 4 Oct 2017 14:59:06 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> This reverts commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb and
> commit 171012f561274784160f666f8398af8b42216e1f.
> 
> 5d17a73a2ebe ("vmalloc: back off when the current task is killed")
> made all vmalloc allocations from a signal-killed task fail. We have
> seen crashes in the tty driver from this, where a killed task exiting
> tries to switch back to N_TTY, fails n_tty_open because of the vmalloc
> failing, and later crashes when dereferencing tty->disc_data.
> 
> Arguably, relying on a vmalloc() call to succeed in order to properly
> exit a task is not the most robust way of doing things. There will be
> a follow-up patch to the tty code to fall back to the N_NULL ldisc.
> 
> But the justification to make that vmalloc() call fail like this isn't
> convincing, either. The patch mentions an OOM victim exhausting the
> memory reserves and thus deadlocking the machine. But the OOM killer
> is only one, improbable source of fatal signals. It doesn't make sense
> to fail allocations preemptively with plenty of memory in most cases.
> 
> The patch doesn't mention real-life instances where vmalloc sites
> would exhaust memory, which makes it sound more like a theoretical
> issue to begin with. But just in case, the OOM access to memory
> reserves has been restricted on the allocator side in cd04ae1e2dc8
> ("mm, oom: do not rely on TIF_MEMDIE for memory reserves access"),
> which should take care of any theoretical concerns on that front.
> 
> Revert this patch, and the follow-up that suppresses the allocation
> warnings when we fail the allocations due to a signal.

You don't think they should be backported into -stables?
Johannes Weiner - Oct. 4, 2017, 11:18 p.m.
On Wed, Oct 04, 2017 at 03:32:45PM -0700, Andrew Morton wrote:
> On Wed, 4 Oct 2017 14:59:06 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > This reverts commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb and
> > commit 171012f561274784160f666f8398af8b42216e1f.
> > 
> > 5d17a73a2ebe ("vmalloc: back off when the current task is killed")
> > made all vmalloc allocations from a signal-killed task fail. We have
> > seen crashes in the tty driver from this, where a killed task exiting
> > tries to switch back to N_TTY, fails n_tty_open because of the vmalloc
> > failing, and later crashes when dereferencing tty->disc_data.
> > 
> > Arguably, relying on a vmalloc() call to succeed in order to properly
> > exit a task is not the most robust way of doing things. There will be
> > a follow-up patch to the tty code to fall back to the N_NULL ldisc.
> > 
> > But the justification to make that vmalloc() call fail like this isn't
> > convincing, either. The patch mentions an OOM victim exhausting the
> > memory reserves and thus deadlocking the machine. But the OOM killer
> > is only one, improbable source of fatal signals. It doesn't make sense
> > to fail allocations preemptively with plenty of memory in most cases.
> > 
> > The patch doesn't mention real-life instances where vmalloc sites
> > would exhaust memory, which makes it sound more like a theoretical
> > issue to begin with. But just in case, the OOM access to memory
> > reserves has been restricted on the allocator side in cd04ae1e2dc8
> > ("mm, oom: do not rely on TIF_MEMDIE for memory reserves access"),
> > which should take care of any theoretical concerns on that front.
> > 
> > Revert this patch, and the follow-up that suppresses the allocation
> > warnings when we fail the allocations due to a signal.
> 
> You don't think they should be backported into -stables?

Good point. For this one, it makes sense to CC stable, for 4.11 and
up. The second patch is more of a fortification against potential
future issues, and probably shouldn't go into stable.
Johannes Weiner - Oct. 4, 2017, 11:21 p.m.
On Thu, Oct 05, 2017 at 06:42:38AM +0900, Tetsuo Handa wrote:
> Johannes Weiner wrote:
> > On Thu, Oct 05, 2017 at 05:49:43AM +0900, Tetsuo Handa wrote:
> > > On 2017/10/05 3:59, Johannes Weiner wrote:
> > > > But the justification to make that vmalloc() call fail like this isn't
> > > > convincing, either. The patch mentions an OOM victim exhausting the
> > > > memory reserves and thus deadlocking the machine. But the OOM killer
> > > > is only one, improbable source of fatal signals. It doesn't make sense
> > > > to fail allocations preemptively with plenty of memory in most cases.
> > > 
> > > By the time the current thread reaches do_exit(), fatal_signal_pending(current)
> > > should become false. As far as I can guess, the source of fatal signal will be
> > > tty_signal_session_leader(tty, exit_session) which is called just before
> > > tty_ldisc_hangup(tty, cons_filp != NULL) rather than the OOM killer. I don't
> > > know whether it is possible to make fatal_signal_pending(current) true inside
> > > do_exit() though...
> > 
> > It's definitely not the OOM killer, the memory situation looks fine
> > when this happens. I didn't look closer where the signal comes from.
> > 
> 
> Then, we could check tsk_is_oom_victim() instead of fatal_signal_pending().

The case for this patch didn't seem very strong to begin with, and
since it's causing problems a simple revert makes more sense than an
attempt to fine-tune it.

Generally, we should leave it to the page allocator to handle memory
reserves, not annotate random alloc_page() callsites.
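
For reference, the allocator-side handling from cd04ae1e2dc8 looks
roughly like this (a condensed sketch of mm/page_alloc.c; unrelated
branches elided):

	/* OOM victims no longer get unrestricted ALLOC_NO_WATERMARKS
	 * access via TIF_MEMDIE; they get the weaker ALLOC_OOM level,
	 * which taps only part of the reserves and therefore cannot
	 * deplete them completely. */
	static inline int __gfp_pfmemalloc_flags(gfp_t gfp_mask)
	{
		if (gfp_mask & __GFP_MEMALLOC)
			return ALLOC_NO_WATERMARKS;
		/* ... PF_MEMALLOC special cases elided ... */
		if (!in_interrupt() && oom_reserves_allowed(current))
			return ALLOC_OOM;

		return 0;
	}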
Vlastimil Babka - Oct. 5, 2017, 6:49 a.m.
On 10/04/2017 08:59 PM, Johannes Weiner wrote:
> This reverts commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb and
> commit 171012f561274784160f666f8398af8b42216e1f.
> 
> 5d17a73a2ebe ("vmalloc: back off when the current task is killed")
> made all vmalloc allocations from a signal-killed task fail. We have
> seen crashes in the tty driver from this, where a killed task exiting
> tries to switch back to N_TTY, fails n_tty_open because of the vmalloc
> failing, and later crashes when dereferencing tty->disc_data.
> 
> Arguably, relying on a vmalloc() call to succeed in order to properly
> exit a task is not the most robust way of doing things. There will be
> a follow-up patch to the tty code to fall back to the N_NULL ldisc.
> 
> But the justification to make that vmalloc() call fail like this isn't
> convincing, either. The patch mentions an OOM victim exhausting the
> memory reserves and thus deadlocking the machine. But the OOM killer
> is only one, improbable source of fatal signals. It doesn't make sense
> to fail allocations preemptively with plenty of memory in most cases.
> 
> The patch doesn't mention real-life instances where vmalloc sites
> would exhaust memory, which makes it sound more like a theoretical
> issue to begin with. But just in case, the OOM access to memory
> reserves has been restricted on the allocator side in cd04ae1e2dc8
> ("mm, oom: do not rely on TIF_MEMDIE for memory reserves access"),
> which should take care of any theoretical concerns on that front.
> 
> Revert this patch, and the follow-up that suppresses the allocation
> warnings when we fail the allocations due to a signal.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/vmalloc.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 8a43db6284eb..673942094328 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1695,11 +1695,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	for (i = 0; i < area->nr_pages; i++) {
>  		struct page *page;
>  
> -		if (fatal_signal_pending(current)) {
> -			area->nr_pages = i;
> -			goto fail_no_warn;
> -		}
> -
>  		if (node == NUMA_NO_NODE)
>  			page = alloc_page(alloc_mask|highmem_mask);
>  		else
> @@ -1723,7 +1718,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	warn_alloc(gfp_mask, NULL,
>  			  "vmalloc: allocation failure, allocated %ld of %ld bytes",
>  			  (area->nr_pages*PAGE_SIZE), area->size);
> -fail_no_warn:
>  	vfree(area->addr);
>  	return NULL;
>  }
>
Michal Hocko - Oct. 5, 2017, 7:54 a.m.
On Wed 04-10-17 14:59:06, Johannes Weiner wrote:
> This reverts commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb and
> commit 171012f561274784160f666f8398af8b42216e1f.
> 
> 5d17a73a2ebe ("vmalloc: back off when the current task is killed")
> made all vmalloc allocations from a signal-killed task fail. We have
> seen crashes in the tty driver from this, where a killed task exiting
> tries to switch back to N_TTY, fails n_tty_open because of the vmalloc
> failing, and later crashes when dereferencing tty->disc_data.
> 
> Arguably, relying on a vmalloc() call to succeed in order to properly
> exit a task is not the most robust way of doing things. There will be
> a follow-up patch to the tty code to fall back to the N_NULL ldisc.
> 
> But the justification to make that vmalloc() call fail like this isn't
> convincing, either. The patch mentions an OOM victim exhausting the
> memory reserves and thus deadlocking the machine. But the OOM killer
> is only one, improbable source of fatal signals. It doesn't make sense
> to fail allocations preemptively with plenty of memory in most cases.
> 
> The patch doesn't mention real-life instances where vmalloc sites
> would exhaust memory, which makes it sound more like a theoretical
> issue to begin with. But just in case, the OOM access to memory
> reserves has been restricted on the allocator side in cd04ae1e2dc8
> ("mm, oom: do not rely on TIF_MEMDIE for memory reserves access"),
> which should take care of any theoretical concerns on that front.
> 
> Revert this patch, and the follow-up that suppresses the allocation
> warnings when we fail the allocations due to a signal.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/vmalloc.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 8a43db6284eb..673942094328 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1695,11 +1695,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	for (i = 0; i < area->nr_pages; i++) {
>  		struct page *page;
>  
> -		if (fatal_signal_pending(current)) {
> -			area->nr_pages = i;
> -			goto fail_no_warn;
> -		}
> -
>  		if (node == NUMA_NO_NODE)
>  			page = alloc_page(alloc_mask|highmem_mask);
>  		else
> @@ -1723,7 +1718,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	warn_alloc(gfp_mask, NULL,
>  			  "vmalloc: allocation failure, allocated %ld of %ld bytes",
>  			  (area->nr_pages*PAGE_SIZE), area->size);
> -fail_no_warn:
>  	vfree(area->addr);
>  	return NULL;
>  }
> -- 
> 2.14.1
Michal Hocko - Oct. 5, 2017, 7:57 a.m.
On Wed 04-10-17 19:18:21, Johannes Weiner wrote:
> On Wed, Oct 04, 2017 at 03:32:45PM -0700, Andrew Morton wrote:
[...]
> > You don't think they should be backported into -stables?
> 
> Good point. For this one, it makes sense to CC stable, for 4.11 and
> up. The second patch is more of a fortification against potential
> future issues, and probably shouldn't go into stable.

I am not against. It is true that the memory reserves depletion fix was
theoretical because I haven't seen any real life bug. I would argue that
the more robust allocation failure behavior is a stable candidate as
well, though, because the allocation can fail regardless of the vmalloc
revert. It is less likely but still possible.
Tetsuo Handa - Oct. 5, 2017, 10:36 a.m.
On 2017/10/05 16:57, Michal Hocko wrote:
> On Wed 04-10-17 19:18:21, Johannes Weiner wrote:
>> On Wed, Oct 04, 2017 at 03:32:45PM -0700, Andrew Morton wrote:
> [...]
>>> You don't think they should be backported into -stables?
>>
>> Good point. For this one, it makes sense to CC stable, for 4.11 and
>> up. The second patch is more of a fortification against potential
>> future issues, and probably shouldn't go into stable.
> 
> I am not against. It is true that the memory reserves depletion fix was
> theoretical because I haven't seen any real life bug. I would argue that
> the more robust allocation failure behavior is a stable candidate as
> well, though, because the allocation can fail regardless of the vmalloc
> revert. It is less likely but still possible.
> 

I don't want this patch backported. If you want to backport,
"s/fatal_signal_pending/tsk_is_oom_victim/" is the safer way.

On 2017/10/04 17:33, Michal Hocko wrote:
> Now that we have cd04ae1e2dc8 ("mm, oom: do not rely on TIF_MEMDIE for
> memory reserves access") the risk of the memory depletion is much
> smaller so reverting the above commit should be acceptable. 

Are you aware that stable kernels do not have cd04ae1e2dc8?

We added the fatal_signal_pending() check inside the read()/write() loop
because a single read()/write() request could consume 2GB of kernel memory.
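
The pattern being referenced looks roughly like this (a generic sketch
with made-up names, not any specific driver):

	static ssize_t copy_loop(char __user *ubuf, size_t remaining)
	{
		ssize_t copied = 0;

		while (remaining > 0) {
			size_t chunk = min_t(size_t, remaining, PAGE_SIZE);

			/* Bail out as soon as the caller has been killed,
			 * instead of allocating kernel memory for the
			 * remainder of a multi-gigabyte request. */
			if (fatal_signal_pending(current))
				return copied ? copied : -EINTR;

			/* ... allocate a chunk, fill it, copy_to_user() ... */
			ubuf += chunk;
			remaining -= chunk;
			copied += chunk;
		}
		return copied;
	}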

What if there is a kernel module which uses vmalloc(1GB) from some
ioctl() for a legitimate reason? You are going to allow such vmalloc()
calls to deplete the memory reserves completely.

On 2017/10/05 8:21, Johannes Weiner wrote:
> Generally, we should leave it to the page allocator to handle memory
> reserves, not annotate random alloc_page() callsites.

I disagree. Interrupting the loop as soon as possible is preferable.

Since we don't have __GFP_KILLABLE, we had to do the fatal_signal_pending()
check inside the read()/write() loop. Since vmalloc() resembles read()/write()
in the sense that it can consume gigabytes of memory, it is pointless to expect
the caller of vmalloc() to check tsk_is_oom_victim().

Again, checking tsk_is_oom_victim() inside the vmalloc() loop is the better option.
Michal Hocko - Oct. 5, 2017, 10:49 a.m.
On Thu 05-10-17 19:36:17, Tetsuo Handa wrote:
> On 2017/10/05 16:57, Michal Hocko wrote:
> > On Wed 04-10-17 19:18:21, Johannes Weiner wrote:
> >> On Wed, Oct 04, 2017 at 03:32:45PM -0700, Andrew Morton wrote:
> > [...]
> >>> You don't think they should be backported into -stables?
> >>
> >> Good point. For this one, it makes sense to CC stable, for 4.11 and
> >> up. The second patch is more of a fortification against potential
> >> future issues, and probably shouldn't go into stable.
> > 
> > I am not against. It is true that the memory reserves depletion fix was
> > theoretical because I haven't seen any real life bug. I would argue that
> > the more robust allocation failure behavior is a stable candidate as
> > well, though, because the allocation can fail regardless of the vmalloc
> > revert. It is less likely but still possible.
> > 
> 
> I don't want this patch backported. If you want to backport,
> "s/fatal_signal_pending/tsk_is_oom_victim/" is the safer way.
> 
> On 2017/10/04 17:33, Michal Hocko wrote:
> > Now that we have cd04ae1e2dc8 ("mm, oom: do not rely on TIF_MEMDIE for
> > memory reserves access") the risk of the memory depletion is much
> > smaller so reverting the above commit should be acceptable. 
> 
> Are you aware that stable kernels do not have cd04ae1e2dc8?

yes

> We added the fatal_signal_pending() check inside the read()/write() loop
> because a single read()/write() request could consume 2GB of kernel memory.

yes, because this is easily triggerable by userspace.

> What if there is a kernel module which uses vmalloc(1GB) from some
> ioctl() for a legitimate reason? You are going to allow such vmalloc()
> calls to deplete the memory reserves completely.

Do you have any specific example in mind? If yes we can handle it.
Tetsuo Handa - Oct. 7, 2017, 2:21 a.m.
On 2017/10/05 19:36, Tetsuo Handa wrote:
> I don't want this patch backported. If you want to backport,
> "s/fatal_signal_pending/tsk_is_oom_victim/" is the safer way.

If you backport this patch, you will see "complete depletion of memory reserves"
and "extra OOM kills due to depletion of memory reserves" with the reproducer below.

----------
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/oom.h>

static char *buffer;

static int __init test_init(void)
{
	set_current_oom_origin();
	buffer = vmalloc((1UL << 32) - 480 * 1048576);
	clear_current_oom_origin();
	return buffer ? 0 : -ENOMEM;
}

static void test_exit(void)
{
	vfree(buffer);
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
----------

----------
CentOS Linux 7 (Core)
Kernel 4.13.5+ on an x86_64

ccsecurity login: [   53.637666] test: loading out-of-tree module taints kernel.
[   53.856166] insmod invoked oom-killer: gfp_mask=0x14002c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NOWARN), nodemask=(null),  order=0, oom_score_adj=0
[   53.858754] insmod cpuset=/ mems_allowed=0
[   53.859713] CPU: 1 PID: 2763 Comm: insmod Tainted: G           O    4.13.5+ #10
[   53.861134] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   53.863072] Call Trace:
[   53.863548]  dump_stack+0x4d/0x6f
[   53.864172]  dump_header+0x92/0x22a
[   53.864869]  ? has_ns_capability_noaudit+0x30/0x40
[   53.865887]  oom_kill_process+0x250/0x440
[   53.866644]  out_of_memory+0x10d/0x480
[   53.867343]  __alloc_pages_nodemask+0x1087/0x1140
[   53.868216]  alloc_pages_current+0x65/0xd0
[   53.869086]  __vmalloc_node_range+0x129/0x230
[   53.869895]  vmalloc+0x39/0x40
[   53.870472]  ? test_init+0x26/0x1000 [test]
[   53.871248]  test_init+0x26/0x1000 [test]
[   53.871993]  ? 0xffffffffa00fa000
[   53.872609]  do_one_initcall+0x4d/0x190
[   53.873301]  do_init_module+0x5a/0x1f7
[   53.873999]  load_module+0x2022/0x2960
[   53.874678]  ? vfs_read+0x116/0x130
[   53.875312]  SyS_finit_module+0xe1/0xf0
[   53.876074]  ? SyS_finit_module+0xe1/0xf0
[   53.876806]  do_syscall_64+0x5c/0x140
[   53.877488]  entry_SYSCALL64_slow_path+0x25/0x25
[   53.878316] RIP: 0033:0x7f1b27c877f9
[   53.878964] RSP: 002b:00007ffff552e718 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[   53.880620] RAX: ffffffffffffffda RBX: 0000000000a2d210 RCX: 00007f1b27c877f9
[   53.881883] RDX: 0000000000000000 RSI: 000000000041a678 RDI: 0000000000000003
[   53.883167] RBP: 000000000041a678 R08: 0000000000000000 R09: 00007ffff552e8b8
[   53.884685] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[   53.885949] R13: 0000000000a2d1e0 R14: 0000000000000000 R15: 0000000000000000
[   53.887392] Mem-Info:
[   53.887909] active_anon:14248 inactive_anon:2088 isolated_anon:0
[   53.887909]  active_file:4 inactive_file:2 isolated_file:2
[   53.887909]  unevictable:0 dirty:3 writeback:2 unstable:0
[   53.887909]  slab_reclaimable:2818 slab_unreclaimable:4420
[   53.887909]  mapped:453 shmem:2162 pagetables:1676 bounce:0
[   53.887909]  free:21418 free_pcp:0 free_cma:0
[   53.895172] Node 0 active_anon:56992kB inactive_anon:8352kB active_file:12kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):8kB mapped:1812kB dirty:12kB writeback:8kB shmem:8648kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 6144kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[   53.901844] Node 0 DMA free:14932kB min:284kB low:352kB high:420kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   53.907765] lowmem_reserve[]: 0 2703 3662 3662
[   53.909333] Node 0 DMA32 free:53424kB min:49684kB low:62104kB high:74524kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2790292kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   53.915597] lowmem_reserve[]: 0 0 958 958
[   53.916992] Node 0 Normal free:17192kB min:17608kB low:22008kB high:26408kB active_anon:56992kB inactive_anon:8352kB active_file:12kB inactive_file:12kB unevictable:0kB writepending:20kB present:1048576kB managed:981384kB mlocked:0kB kernel_stack:3648kB pagetables:6704kB bounce:0kB free_pcp:112kB local_pcp:0kB free_cma:0kB
[   53.924610] lowmem_reserve[]: 0 0 0 0
[   53.926131] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 1*64kB (U) 0*128kB 0*256kB 1*512kB (U) 0*1024kB 1*2048kB (M) 3*4096kB (M) = 14932kB
[   53.929273] Node 0 DMA32: 4*4kB (UM) 2*8kB (UM) 5*16kB (UM) 4*32kB (M) 3*64kB (M) 4*128kB (M) 5*256kB (UM) 4*512kB (M) 4*1024kB (UM) 2*2048kB (UM) 10*4096kB (M) = 53424kB
[   53.934010] Node 0 Normal: 896*4kB (ME) 466*8kB (UME) 288*16kB (UME) 128*32kB (UME) 23*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17488kB
[   53.937833] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[   53.940769] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   53.943250] 2166 total pagecache pages
[   53.944788] 0 pages in swap cache
[   53.946249] Swap cache stats: add 0, delete 0, find 0/0
[   53.948075] Free swap  = 0kB
[   53.949419] Total swap = 0kB
[   53.950873] 1048445 pages RAM
[   53.952238] 0 pages HighMem/MovableOnly
[   53.953768] 101550 pages reserved
[   53.955555] 0 pages hwpoisoned
[   53.956923] Out of memory: Kill process 2763 (insmod) score 3621739297 or sacrifice child
[   53.959298] Killed process 2763 (insmod) total-vm:13084kB, anon-rss:132kB, file-rss:0kB, shmem-rss:0kB
[   53.962059] oom_reaper: reaped process 2763 (insmod), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[   53.968054] insmod invoked oom-killer: gfp_mask=0x14002c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NOWARN), nodemask=(null),  order=0, oom_score_adj=0
[   53.971406] insmod cpuset=/ mems_allowed=0
[   53.973066] CPU: 1 PID: 2763 Comm: insmod Tainted: G           O    4.13.5+ #10
[   53.975339] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   53.978388] Call Trace:
[   53.979714]  dump_stack+0x4d/0x6f
[   53.981176]  dump_header+0x92/0x22a
[   53.982747]  ? has_ns_capability_noaudit+0x30/0x40
[   53.984481]  oom_kill_process+0x250/0x440
[   53.986133]  out_of_memory+0x10d/0x480
[   53.987667]  __alloc_pages_nodemask+0x1087/0x1140
[   53.989431]  alloc_pages_current+0x65/0xd0
[   53.991037]  __vmalloc_node_range+0x129/0x230
[   53.992775]  vmalloc+0x39/0x40
[   53.994421]  ? test_init+0x26/0x1000 [test]
[   53.996063]  test_init+0x26/0x1000 [test]
[   53.997825]  ? 0xffffffffa00fa000
[   53.999280]  do_one_initcall+0x4d/0x190
[   54.000786]  do_init_module+0x5a/0x1f7
[   54.002351]  load_module+0x2022/0x2960
[   54.003789]  ? vfs_read+0x116/0x130
[   54.005299]  SyS_finit_module+0xe1/0xf0
[   54.006872]  ? SyS_finit_module+0xe1/0xf0
[   54.008300]  do_syscall_64+0x5c/0x140
[   54.009912]  entry_SYSCALL64_slow_path+0x25/0x25
[   54.011464] RIP: 0033:0x7f1b27c877f9
[   54.012816] RSP: 002b:00007ffff552e718 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[   54.014958] RAX: ffffffffffffffda RBX: 0000000000a2d210 RCX: 00007f1b27c877f9
[   54.017062] RDX: 0000000000000000 RSI: 000000000041a678 RDI: 0000000000000003
[   54.019065] RBP: 000000000041a678 R08: 0000000000000000 R09: 00007ffff552e8b8
[   54.020951] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[   54.022738] R13: 0000000000a2d1e0 R14: 0000000000000000 R15: 0000000000000000
[   54.024673] Mem-Info:
[   54.025767] active_anon:14220 inactive_anon:2088 isolated_anon:0
[   54.025767]  active_file:3 inactive_file:0 isolated_file:0
[   54.025767]  unevictable:0 dirty:1 writeback:2 unstable:0
[   54.025767]  slab_reclaimable:2774 slab_unreclaimable:4420
[   54.025767]  mapped:453 shmem:2162 pagetables:1676 bounce:0
[   54.025767]  free:72 free_pcp:0 free_cma:0
[   54.034925] Node 0 active_anon:56880kB inactive_anon:8352kB active_file:12kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1812kB dirty:4kB writeback:8kB shmem:8648kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 6144kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[   54.041176] Node 0 DMA free:12kB min:284kB low:352kB high:420kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   54.047349] lowmem_reserve[]: 0 2703 3662 3662
[   54.048922] Node 0 DMA32 free:104kB min:49684kB low:62104kB high:74524kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2790292kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   54.055698] lowmem_reserve[]: 0 0 958 958
[   54.057182] Node 0 Normal free:188kB min:17608kB low:22008kB high:26408kB active_anon:56880kB inactive_anon:8352kB active_file:12kB inactive_file:0kB unevictable:0kB writepending:12kB present:1048576kB managed:981384kB mlocked:0kB kernel_stack:3648kB pagetables:6704kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   54.065665] lowmem_reserve[]: 0 0 0 0
[   54.067279] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   54.069949] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   54.072630] Node 0 Normal: 31*4kB (UM) 5*8kB (UM) 1*16kB (E) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 180kB
[   54.075624] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[   54.078142] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   54.080509] 2165 total pagecache pages
[   54.081931] 0 pages in swap cache
[   54.083381] Swap cache stats: add 0, delete 0, find 0/0
[   54.085051] Free swap  = 0kB
[   54.086305] Total swap = 0kB
[   54.087931] 1048445 pages RAM
[   54.089296] 0 pages HighMem/MovableOnly
[   54.090731] 101550 pages reserved
[   54.092161] 0 pages hwpoisoned
[   54.093738] Out of memory: Kill process 2458 (tuned) score 3 or sacrifice child
[   54.095910] Killed process 2458 (tuned) total-vm:562424kB, anon-rss:12764kB, file-rss:0kB, shmem-rss:0kB
[   54.098531] insmod: vmalloc: allocation failure, allocated 3725393920 of 3791654912 bytes, mode:0x14000c0(GFP_KERNEL), nodemask=(null)
[   54.101771] insmod cpuset=/ mems_allowed=0
[   54.103661] oom_reaper: reaped process 2458 (tuned), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[   54.103807] tuned invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
[   54.103809] tuned cpuset=/ mems_allowed=0
[   54.103815] CPU: 2 PID: 2712 Comm: tuned Tainted: G           O    4.13.5+ #10
[   54.103815] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   54.103816] Call Trace:
[   54.103825]  dump_stack+0x4d/0x6f
[   54.103827]  dump_header+0x92/0x22a
[   54.103830]  ? has_ns_capability_noaudit+0x30/0x40
[   54.103834]  oom_kill_process+0x250/0x440
[   54.103835]  out_of_memory+0x10d/0x480
[   54.103836]  __alloc_pages_nodemask+0x1087/0x1140
[   54.103840]  alloc_pages_current+0x65/0xd0
[   54.103843]  pte_alloc_one+0x12/0x40
[   54.103845]  do_huge_pmd_anonymous_page+0xfd/0x620
[   54.103847]  __handle_mm_fault+0x9a7/0x1040
[   54.103848]  ? _lookup_address_cpa.isra.7+0x38/0x40
[   54.103849]  handle_mm_fault+0xd1/0x1c0
[   54.103852]  __do_page_fault+0x28b/0x4f0
[   54.103854]  do_page_fault+0x20/0x70
[   54.103857]  page_fault+0x22/0x30
[   54.103859] RIP: 0010:__get_user_8+0x1b/0x25
[   54.103860] RSP: 0000:ffffc90002703c38 EFLAGS: 00010287
[   54.103860] RAX: 00007fc407fff9e7 RBX: ffff880136cbc740 RCX: 00000000000002b0
[   54.103861] RDX: ffff880133c98e00 RSI: ffff880136cbc740 RDI: ffff880133c98e00
[   54.103861] RBP: ffffc90002703c80 R08: 0000000000000001 R09: 0000000000000000
[   54.103862] R10: ffffc90002703c48 R11: 00000000000003f6 R12: ffff880133c98e00
[   54.103862] R13: ffff880133c98e00 R14: 00007fc407fff9e0 R15: 0000000001399fc8
[   54.103866]  ? exit_robust_list+0x2e/0x110
[   54.103868]  mm_release+0x100/0x140
[   54.103869]  do_exit+0x14b/0xb50
[   54.103871]  ? pick_next_task_fair+0x17d/0x4d0
[   54.103874]  ? put_prev_entity+0x26/0x340
[   54.103875]  do_group_exit+0x36/0xb0
[   54.103878]  get_signal+0x263/0x5f0
[   54.103881]  do_signal+0x32/0x630
[   54.103884]  ? __audit_syscall_exit+0x21a/0x2b0
[   54.103886]  ? syscall_slow_exit_work+0x15c/0x1a0
[   54.103888]  ? getnstimeofday64+0x9/0x20
[   54.103890]  ? wake_up_q+0x80/0x80
[   54.103891]  exit_to_usermode_loop+0x76/0x90
[   54.103892]  do_syscall_64+0x12e/0x140
[   54.103893]  entry_SYSCALL64_slow_path+0x25/0x25
[   54.103895] RIP: 0033:0x7fc42486e923
[   54.103895] RSP: 002b:00007fc407ffe360 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
[   54.103896] RAX: fffffffffffffffc RBX: 00007fc4259b7828 RCX: 00007fc42486e923
[   54.103896] RDX: 00000000000003ff RSI: 00007fc400001980 RDI: 000000000000000a
[   54.103897] RBP: 00000000ffffffff R08: 00007fc41a1558e0 R09: 0000000000002ff4
[   54.103897] R10: 00000000ffffffff R11: 0000000000000293 R12: 00007fc40c010140
[   54.103898] R13: 00007fc400001980 R14: 00007fc400001790 R15: 0000000001399fc8
[   54.103899] Mem-Info:
[   54.103902] active_anon:11004 inactive_anon:2088 isolated_anon:0
[   54.103902]  active_file:6 inactive_file:0 isolated_file:0
[   54.103902]  unevictable:0 dirty:1 writeback:2 unstable:0
[   54.103902]  slab_reclaimable:2770 slab_unreclaimable:4420
[   54.103902]  mapped:453 shmem:2162 pagetables:1676 bounce:0
[   54.103902]  free:3117 free_pcp:158 free_cma:0
[   54.103904] Node 0 active_anon:44016kB inactive_anon:8352kB active_file:24kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1812kB dirty:4kB writeback:8kB shmem:8648kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 6144kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[   54.103905] Node 0 DMA free:12kB min:284kB low:352kB high:420kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   54.103908] lowmem_reserve[]: 0 2703 3662 3662
[   54.103909] Node 0 DMA32 free:104kB min:49684kB low:62104kB high:74524kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2790292kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   54.103911] lowmem_reserve[]: 0 0 958 958
[   54.103912] Node 0 Normal free:12352kB min:17608kB low:22008kB high:26408kB active_anon:44068kB inactive_anon:8352kB active_file:24kB inactive_file:0kB unevictable:0kB writepending:12kB present:1048576kB managed:981384kB mlocked:0kB kernel_stack:3616kB pagetables:6704kB bounce:0kB free_pcp:632kB local_pcp:632kB free_cma:0kB
[   54.103914] lowmem_reserve[]: 0 0 0 0
[   54.103915] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   54.103918] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   54.103921] Node 0 Normal: 536*4kB (UM) 281*8kB (UM) 124*16kB (UME) 76*32kB (UM) 12*64kB (U) 3*128kB (U) 2*256kB (U) 0*512kB 0*1024kB 1*2048kB (M) 0*4096kB = 12520kB
[   54.103926] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[   54.103926] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   54.103927] 2165 total pagecache pages
[   54.103928] 0 pages in swap cache
[   54.103929] Swap cache stats: add 0, delete 0, find 0/0
[   54.103929] Free swap  = 0kB
[   54.103929] Total swap = 0kB
[   54.103929] 1048445 pages RAM
[   54.103930] 0 pages HighMem/MovableOnly
[   54.103930] 101550 pages reserved
[   54.103930] 0 pages hwpoisoned
[   54.103931] Out of memory: Kill process 2353 (dhclient) score 3 or sacrifice child
[   54.103984] Killed process 2353 (dhclient) total-vm:113384kB, anon-rss:12488kB, file-rss:0kB, shmem-rss:0kB
[   54.262237] CPU: 1 PID: 2763 Comm: insmod Tainted: G           O    4.13.5+ #10
[   54.264476] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   54.267326] Call Trace:
[   54.268614]  dump_stack+0x4d/0x6f
[   54.270200]  warn_alloc+0x10f/0x1a0
[   54.271723]  __vmalloc_node_range+0x14e/0x230
[   54.273359]  vmalloc+0x39/0x40
[   54.274778]  ? test_init+0x26/0x1000 [test]
[   54.276480]  test_init+0x26/0x1000 [test]
[   54.278081]  ? 0xffffffffa00fa000
[   54.279576]  do_one_initcall+0x4d/0x190
[   54.281089]  do_init_module+0x5a/0x1f7
[   54.282637]  load_module+0x2022/0x2960
[   54.284221]  ? vfs_read+0x116/0x130
[   54.285674]  SyS_finit_module+0xe1/0xf0
[   54.287216]  ? SyS_finit_module+0xe1/0xf0
[   54.288737]  do_syscall_64+0x5c/0x140
[   54.290285]  entry_SYSCALL64_slow_path+0x25/0x25
[   54.291930] RIP: 0033:0x7f1b27c877f9
[   54.293557] RSP: 002b:00007ffff552e718 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[   54.295810] RAX: ffffffffffffffda RBX: 0000000000a2d210 RCX: 00007f1b27c877f9
[   54.297875] RDX: 0000000000000000 RSI: 000000000041a678 RDI: 0000000000000003
[   54.299904] RBP: 000000000041a678 R08: 0000000000000000 R09: 00007ffff552e8b8
[   54.301935] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[   54.303884] R13: 0000000000a2d1e0 R14: 0000000000000000 R15: 0000000000000000
[   54.305896] Mem-Info:
[   54.307238] active_anon:7863 inactive_anon:2088 isolated_anon:0
[   54.307238]  active_file:3 inactive_file:431 isolated_file:0
[   54.307238]  unevictable:0 dirty:1 writeback:2 unstable:0
[   54.307238]  slab_reclaimable:2767 slab_unreclaimable:4413
[   54.307238]  mapped:660 shmem:2162 pagetables:1529 bounce:0
[   54.307238]  free:5315 free_pcp:291 free_cma:0
[   54.317589] Node 0 active_anon:31452kB inactive_anon:8352kB active_file:12kB inactive_file:1836kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2700kB dirty:4kB writeback:8kB shmem:8648kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 4096kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[   54.324325] Node 0 DMA free:12kB min:284kB low:352kB high:420kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   54.330628] lowmem_reserve[]: 0 2703 3662 3662
[   54.332163] Node 0 DMA32 free:104kB min:49684kB low:62104kB high:74524kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2790292kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   54.338996] lowmem_reserve[]: 0 0 958 958
[   54.340615] Node 0 Normal free:20648kB min:17608kB low:22008kB high:26408kB active_anon:31452kB inactive_anon:8352kB active_file:12kB inactive_file:2360kB unevictable:0kB writepending:12kB present:1048576kB managed:981384kB mlocked:0kB kernel_stack:3584kB pagetables:6116kB bounce:0kB free_pcp:1192kB local_pcp:8kB free_cma:0kB
[   54.348671] lowmem_reserve[]: 0 0 0 0
[   54.350205] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   54.353027] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[   54.355895] Node 0 Normal: 580*4kB (UE) 329*8kB (U) 129*16kB (UE) 70*32kB (U) 16*64kB (U) 5*128kB (UM) 5*256kB (UM) 2*512kB (M) 1*1024kB (M) 3*2048kB (M) 0*4096kB = 20392kB
[   54.360581] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[   54.363080] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   54.365507] 2864 total pagecache pages
[   54.366963] 0 pages in swap cache
[   54.368390] Swap cache stats: add 0, delete 0, find 0/0
[   54.370124] Free swap  = 0kB
[   54.371431] Total swap = 0kB
[   54.372770] 1048445 pages RAM
[   54.374085] 0 pages HighMem/MovableOnly
[   54.376827] 101550 pages reserved
[   54.378635] 0 pages hwpoisoned
----------

On the other hand, if you do "s/fatal_signal_pending/tsk_is_oom_victim/", there
is no "depletion of memory reserves" and no "extra OOM kills due to depletion of
memory reserves".

----------
CentOS Linux 7 (Core)
Kernel 4.13.5+ on an x86_64

ccsecurity login: [   54.746704] test: loading out-of-tree module taints kernel.
[   54.896608] insmod invoked oom-killer: gfp_mask=0x14002c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NOWARN), nodemask=(null),  order=0, oom_score_adj=0
[   54.900107] insmod cpuset=/ mems_allowed=0
[   54.902235] CPU: 3 PID: 2749 Comm: insmod Tainted: G           O    4.13.5+ #11
[   54.906886] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   54.909943] Call Trace:
[   54.911433]  dump_stack+0x4d/0x6f
[   54.912957]  dump_header+0x92/0x22a
[   54.914426]  ? has_ns_capability_noaudit+0x30/0x40
[   54.916242]  oom_kill_process+0x250/0x440
[   54.917912]  out_of_memory+0x10d/0x480
[   54.919426]  __alloc_pages_nodemask+0x1087/0x1140
[   54.921365]  ? vmap_page_range_noflush+0x280/0x320
[   54.923232]  alloc_pages_current+0x65/0xd0
[   54.924784]  __vmalloc_node_range+0x16a/0x280
[   54.926386]  vmalloc+0x39/0x40
[   54.927686]  ? test_init+0x26/0x1000 [test]
[   54.929258]  test_init+0x26/0x1000 [test]
[   54.930793]  ? 0xffffffffa00a0000
[   54.932167]  do_one_initcall+0x4d/0x190
[   54.933586]  ? kfree+0x16f/0x180
[   54.934992]  ? kfree+0x16f/0x180
[   54.936393]  do_init_module+0x5a/0x1f7
[   54.937807]  load_module+0x2022/0x2960
[   54.939344]  ? vfs_read+0x116/0x130
[   54.940901]  SyS_finit_module+0xe1/0xf0
[   54.942386]  ? SyS_finit_module+0xe1/0xf0
[   54.943955]  do_syscall_64+0x5c/0x140
[   54.945991]  entry_SYSCALL64_slow_path+0x25/0x25
[   54.947802] RIP: 0033:0x7fd1655057f9
[   54.949220] RSP: 002b:00007fff9d59fdf8 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
[   54.951317] RAX: ffffffffffffffda RBX: 000000000085e210 RCX: 00007fd1655057f9
[   54.953379] RDX: 0000000000000000 RSI: 000000000041a678 RDI: 0000000000000003
[   54.955837] RBP: 000000000041a678 R08: 0000000000000000 R09: 00007fff9d59ff98
[   54.959966] R10: 0000000000000003 R11: 0000000000000202 R12: 0000000000000000
[   54.962171] R13: 000000000085e1e0 R14: 0000000000000000 R15: 0000000000000000
[   54.978917] Mem-Info:
[   54.980118] active_anon:13936 inactive_anon:2088 isolated_anon:0
[   54.980118]  active_file:32 inactive_file:6 isolated_file:0
[   54.980118]  unevictable:0 dirty:10 writeback:0 unstable:0
[   54.980118]  slab_reclaimable:2812 slab_unreclaimable:4414
[   54.980118]  mapped:456 shmem:2162 pagetables:1681 bounce:0
[   54.980118]  free:21335 free_pcp:0 free_cma:0
[   54.990120] Node 0 active_anon:55744kB inactive_anon:8352kB active_file:128kB inactive_file:24kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1824kB dirty:40kB writeback:0kB shmem:8648kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 10240kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[   54.996847] Node 0 DMA free:14932kB min:284kB low:352kB high:420kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   55.003426] lowmem_reserve[]: 0 2703 3662 3662
[   55.004962] Node 0 DMA32 free:53056kB min:49684kB low:62104kB high:74524kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2790292kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   55.011598] lowmem_reserve[]: 0 0 958 958
[   55.013852] Node 0 Normal free:17352kB min:17608kB low:22008kB high:26408kB active_anon:55696kB inactive_anon:8352kB active_file:364kB inactive_file:180kB unevictable:0kB writepending:36kB present:1048576kB managed:981384kB mlocked:0kB kernel_stack:3600kB pagetables:6724kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB
[   55.021929] lowmem_reserve[]: 0 0 0 0
[   55.023636] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 1*64kB (U) 0*128kB 0*256kB 1*512kB (U) 0*1024kB 1*2048kB (M) 3*4096kB (M) = 14932kB
[   55.026942] Node 0 DMA32: 4*4kB (UM) 2*8kB (UM) 5*16kB (UM) 4*32kB (M) 3*64kB (M) 5*128kB (UM) 4*256kB (M) 4*512kB (M) 4*1024kB (UM) 2*2048kB (UM) 10*4096kB (M) = 53296kB
[   55.031534] Node 0 Normal: 974*4kB (UME) 560*8kB (UME) 288*16kB (ME) 96*32kB (ME) 24*64kB (UM) 0*128kB 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17848kB
[   55.036126] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[   55.038841] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   55.041431] 2197 total pagecache pages
[   55.043071] 0 pages in swap cache
[   55.044597] Swap cache stats: add 0, delete 0, find 0/0
[   55.046509] Free swap  = 0kB
[   55.047977] Total swap = 0kB
[   55.049548] 1048445 pages RAM
[   55.051143] 0 pages HighMem/MovableOnly
[   55.052799] 101550 pages reserved
[   55.054319] 0 pages hwpoisoned
[   55.055906] Out of memory: Kill process 2749 (insmod) score 3621739297 or sacrifice child
[   55.058429] Killed process 2749 (insmod) total-vm:13084kB, anon-rss:132kB, file-rss:0kB, shmem-rss:0kB
[   55.061278] oom_reaper: reaped process 2749 (insmod), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
----------

Therefore, I throw

Nacked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Johannes Weiner - Oct. 7, 2017, 2:51 a.m.
On Sat, Oct 07, 2017 at 11:21:26AM +0900, Tetsuo Handa wrote:
> On 2017/10/05 19:36, Tetsuo Handa wrote:
> > I don't want this patch backported. If you want to backport,
> > "s/fatal_signal_pending/tsk_is_oom_victim/" is the safer way.
> 
> If you backport this patch, you will see "complete depletion of memory reserves"
> and "extra OOM kills due to depletion of memory reserves" with the reproducer below.
> 
> ----------
> #include <linux/module.h>
> #include <linux/slab.h>
> #include <linux/oom.h>
> 
> static char *buffer;
> 
> static int __init test_init(void)
> {
> 	set_current_oom_origin();
> 	buffer = vmalloc((1UL << 32) - 480 * 1048576);

That's not a reproducer, that's a kernel module. It's not hard to
crash the kernel from within the kernel.
Tetsuo Handa - Oct. 7, 2017, 4:05 a.m.
Johannes Weiner wrote:
> On Sat, Oct 07, 2017 at 11:21:26AM +0900, Tetsuo Handa wrote:
> > On 2017/10/05 19:36, Tetsuo Handa wrote:
> > > I don't want this patch backported. If you want to backport,
> > > "s/fatal_signal_pending/tsk_is_oom_victim/" is the safer way.
> > 
> > If you backport this patch, you will see "complete depletion of memory reserves"
> > and "extra OOM kills due to depletion of memory reserves" with the reproducer below.
> > 
> > ----------
> > #include <linux/module.h>
> > #include <linux/slab.h>
> > #include <linux/oom.h>
> > 
> > static char *buffer;
> > 
> > static int __init test_init(void)
> > {
> > 	set_current_oom_origin();
> > 	buffer = vmalloc((1UL << 32) - 480 * 1048576);
> 
> That's not a reproducer, that's a kernel module. It's not hard to
> crash the kernel from within the kernel.
> 

When did we agree that a "reproducer" must be a "userspace program"?
A "reproducer" is a program that triggers the intended behavior.

Year after year, people spend effort on kernel hardening.
It is silly to say that "it's not hard to crash the kernel from
within the kernel" when we can easily mitigate the problem.

Even with cd04ae1e2dc8, there is no point in triggering extra
OOM kills by needlessly consuming memory reserves.
Michal Hocko - Oct. 7, 2017, 7:59 a.m.
On Sat 07-10-17 13:05:24, Tetsuo Handa wrote:
> Johannes Weiner wrote:
> > On Sat, Oct 07, 2017 at 11:21:26AM +0900, Tetsuo Handa wrote:
> > > On 2017/10/05 19:36, Tetsuo Handa wrote:
> > > > I don't want this patch backported. If you want to backport,
> > > > "s/fatal_signal_pending/tsk_is_oom_victim/" is the safer way.
> > > 
> > > If you backport this patch, you will see "complete depletion of memory reserves"
> > > and "extra OOM kills due to depletion of memory reserves" with the reproducer below.
> > > 
> > > ----------
> > > #include <linux/module.h>
> > > #include <linux/slab.h>
> > > #include <linux/oom.h>
> > > 
> > > static char *buffer;
> > > 
> > > static int __init test_init(void)
> > > {
> > > 	set_current_oom_origin();
> > > 	buffer = vmalloc((1UL << 32) - 480 * 1048576);
> > 
> > That's not a reproducer, that's a kernel module. It's not hard to
> > crash the kernel from within the kernel.
> > 
> 
> When did we agree that a "reproducer" must be a "userspace program"?
> A "reproducer" is a program that triggers the intended behavior.

This way of argumentation is just ridiculous. I can construct whatever
code I like to bring the kernel to its knees, and there is no way around it.

The patch in question was supposed to mitigate a theoretical problem
while it caused a real issue seen out there. That is a reason to
revert the patch. Especially when a better mitigation has been put
in place. You are right that replacing fatal_signal_pending by
tsk_is_oom_victim would keep the original mitigation in pre-cd04ae1e2dc8
kernels but I would only agree to do that if the mitigated problem was
real. And this doesn't seem to be the case. If any of the stable kernels
regresses due to the revert I am willing to put a mitigation in place.
 
> Year after year, people spend effort on kernel hardening.
> It is silly to say that "it's not hard to crash the kernel from
> within the kernel" when we can easily mitigate the problem.

This is true but we do not spread random hacks around for problems that
are not real and there are better ways to address them. In this
particular case cd04ae1e2dc8 was a better way to address the problem in
general without spreading tsk_is_oom_victim all over the place.
 
> Even with cd04ae1e2dc8, there is no point in triggering extra
> OOM kills by needlessly consuming memory reserves.

Yet again you are making unfounded claims and I am really fed up
with discussing this any further.
Tetsuo Handa - Oct. 7, 2017, 9:57 a.m.
Michal Hocko wrote:
> On Sat 07-10-17 13:05:24, Tetsuo Handa wrote:
> > Johannes Weiner wrote:
> > > On Sat, Oct 07, 2017 at 11:21:26AM +0900, Tetsuo Handa wrote:
> > > > On 2017/10/05 19:36, Tetsuo Handa wrote:
> > > > > I don't want this patch backported. If you want to backport,
> > > > > "s/fatal_signal_pending/tsk_is_oom_victim/" is the safer way.
> > > > 
> > > > If you backport this patch, you will see "complete depletion of memory reserves"
> > > > and "extra OOM kills due to depletion of memory reserves" with the reproducer below.
> > > > 
> > > > ----------
> > > > #include <linux/module.h>
> > > > #include <linux/slab.h>
> > > > #include <linux/oom.h>
> > > > 
> > > > static char *buffer;
> > > > 
> > > > static int __init test_init(void)
> > > > {
> > > > 	set_current_oom_origin();
> > > > 	buffer = vmalloc((1UL << 32) - 480 * 1048576);
> > > 
> > > That's not a reproducer, that's a kernel module. It's not hard to
> > > crash the kernel from within the kernel.
> > > 
> > 
> > When did we agree that a "reproducer" must be a "userspace program"?
> > A "reproducer" is a program that triggers the intended behavior.
> 
> This way of argumentation is just ridiculous. I can construct whatever
> code I like to bring the kernel to its knees, and there is no way around it.

But you don't distinguish between a kernel module and a userspace program.
What you distinguish is "real" versus "theoretical". And the more you dismiss
things as "ridiculous" or "theoretical", the more strongly I resist.

> 
> The patch in question was supposed to mitigate a theoretical problem
> while it caused a real issue seen out there. That is a reason to
> revert the patch. Especially when a better mitigation has been put
> in place. You are right that replacing fatal_signal_pending by
> tsk_is_oom_victim would keep the original mitigation in pre-cd04ae1e2dc8
> kernels but I would only agree to do that if the mitigated problem was
> real. And this doesn't seem to be the case. If any of the stable kernels
> regresses due to the revert I am willing to put a mitigation in place.

The real issue here is that the caller of vmalloc() was not ready to handle
allocation failure. We addressed the kmem_zalloc_greedy() case
( https://marc.info/?l=linux-mm&m=148844910724880 ) with 08b005f1333154ae
rather than by reverting the fatal_signal_pending() check. Removing
fatal_signal_pending() in order to hide real issues is a random hack.

>  
> > Year after year, people spend effort on kernel hardening.
> > It is silly to say that "it's not hard to crash the kernel from
> > within the kernel" when we can easily mitigate the problem.
> 
> This is true but we do not spread random hacks around for problems that
> are not real and there are better ways to address them. In this
> particular case cd04ae1e2dc8 was a better way to address the problem in
> general without spreading tsk_is_oom_victim all over the place.

Using tsk_is_oom_victim() is reasonable for vmalloc() because it is a
memory allocation function which belongs to the memory management subsystem.

>  
> > Even with cd04ae1e2dc8, there is no point in triggering extra
> > OOM kills by needlessly consuming memory reserves.
> 
> Yet again you are making unfounded claims and I am really fed up
> with discussing this any further.

Kernel hardening changes mostly address "theoretical" issues,
but we don't call them "ridiculous".

Patch

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 8a43db6284eb..673942094328 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1695,11 +1695,6 @@  static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	for (i = 0; i < area->nr_pages; i++) {
 		struct page *page;
 
-		if (fatal_signal_pending(current)) {
-			area->nr_pages = i;
-			goto fail_no_warn;
-		}
-
 		if (node == NUMA_NO_NODE)
 			page = alloc_page(alloc_mask|highmem_mask);
 		else
@@ -1723,7 +1718,6 @@  static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	warn_alloc(gfp_mask, NULL,
 			  "vmalloc: allocation failure, allocated %ld of %ld bytes",
 			  (area->nr_pages*PAGE_SIZE), area->size);
-fail_no_warn:
 	vfree(area->addr);
 	return NULL;
 }