diff mbox

[v3] fork: free vmapped stacks in cache when cpus are offline

Message ID 1486715554-12772-1-git-send-email-hoeun.ryu@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Hoeun Ryu Feb. 10, 2017, 8:32 a.m. UTC
Using virtually mapped stack, kernel stacks are allocated via vmalloc.
In the current implementation, two stacks per cpu can be cached when
tasks are freed and the cached stacks are used again in task duplications.
but the cached stacks may remain unfreed even when cpu are offline.
 By adding a cpu hotplug callback to free the cached stacks when a cpu
goes offline, the pages of the cached stacks are not wasted.

Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com>
---
Changes in v3:
 fix misuse of per-cpu api
 fix location of function definition within CONFIG_VMAP_STACK

Changes in v2:
 remove cpuhp callback for `startup`, only `teardown` callback is installed.

 kernel/fork.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Comments

Michal Hocko Feb. 10, 2017, 12:05 p.m. UTC | #1
On Fri 10-02-17 17:32:07, Hoeun Ryu wrote:
[...]
> +static int free_vm_stack_cache(unsigned int cpu)
> +{
> +	struct vm_struct **cached_vm_stacks = per_cpu_ptr(cached_stacks, cpu);
> +	int i;
> +
> +	for (i = 0; i < NR_CACHED_STACKS; i++) {
> +		struct vm_struct **vm_stack = &cached_vm_stacks[i];
> +
> +		if (*vm_stack == NULL)
> +			continue;
> +
> +		vfree((*vm_stack)->addr);
> +		*vm_stack = NULL;

this seems more obscure than necessary. Probably a matter of taste but I
would find the following easier to read
		struct vm_struct *vm_stack = cached_vm_stacks[i];

		if (!vm_stack)
			continue;

		vfree(vm_stack);
		cached_vm_stacks[i] = NULL;

> +	}
> +
> +	return 0;
> +}
>  #endif
>  
>  static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
> @@ -456,6 +474,11 @@ void __init fork_init(void)
>  	for (i = 0; i < UCOUNT_COUNTS; i++) {
>  		init_user_ns.ucount_max[i] = max_threads/2;
>  	}
> +
> +#ifdef CONFIG_VMAP_STACK
> +	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "vm_stack_cache",
> +			  NULL, free_vm_stack_cache);
> +#endif

I am not familiar the new hotplug infrastructure so I might be missing
something. CPUHP_AP_ONLINE_DYN will allocate a state which is has only
30 slots available. The name also suggests this will be called on an
online event. Why doesn't this have its own state like other users. The
name should also reflect offline event CPUHP_STACK_CACHE_DEAD or
something like that.
Hoeun Ryu Feb. 10, 2017, 2:31 p.m. UTC | #2
On Fri, Feb 10, 2017 at 9:05 PM, Michal Hocko <mhocko@kernel.org> wrote:
> On Fri 10-02-17 17:32:07, Hoeun Ryu wrote:
> [...]
>> +static int free_vm_stack_cache(unsigned int cpu)
>> +{
>> +     struct vm_struct **cached_vm_stacks = per_cpu_ptr(cached_stacks, cpu);
>> +     int i;
>> +
>> +     for (i = 0; i < NR_CACHED_STACKS; i++) {
>> +             struct vm_struct **vm_stack = &cached_vm_stacks[i];
>> +
>> +             if (*vm_stack == NULL)
>> +                     continue;
>> +
>> +             vfree((*vm_stack)->addr);
>> +             *vm_stack = NULL;
>
> this seems more obscure than necessary. Probably a matter of taste but I
> would find the following easier to read
>                 struct vm_struct *vm_stack = cached_vm_stacks[i];
>
>                 if (!vm_stack)
>                         continue;
>
>                 vfree(vm_stack);
>                 cached_vm_stacks[i] = NULL;
>

OK, it's easier to read, I'll fix it.

>> +     }
>> +
>> +     return 0;
>> +}
>>  #endif
>>
>>  static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
>> @@ -456,6 +474,11 @@ void __init fork_init(void)
>>       for (i = 0; i < UCOUNT_COUNTS; i++) {
>>               init_user_ns.ucount_max[i] = max_threads/2;
>>       }
>> +
>> +#ifdef CONFIG_VMAP_STACK
>> +     cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "vm_stack_cache",
>> +                       NULL, free_vm_stack_cache);
>> +#endif
>
> I am not familiar the new hotplug infrastructure so I might be missing
> something. CPUHP_AP_ONLINE_DYN will allocate a state which is has only
> 30 slots available. The name also suggests this will be called on an
> online event. Why doesn't this have its own state like other users. The
> name should also reflect offline event CPUHP_STACK_CACHE_DEAD or
> something like that.

I'll define CPUHP_VMSTACK_CACHE_DEAD before CPUHP_BP_PREPARE_DYN in
cpuhotplug.h.
Do you think the change is made in a separate patch or not ?

Thank you for your review anyway.

>
> --
> Michal Hocko
> SUSE Labs
Michal Hocko Feb. 10, 2017, 2:41 p.m. UTC | #3
On Fri 10-02-17 23:31:41, Hoeun Ryu wrote:
> On Fri, Feb 10, 2017 at 9:05 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Fri 10-02-17 17:32:07, Hoeun Ryu wrote:
[...]
> >>  static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
> >> @@ -456,6 +474,11 @@ void __init fork_init(void)
> >>       for (i = 0; i < UCOUNT_COUNTS; i++) {
> >>               init_user_ns.ucount_max[i] = max_threads/2;
> >>       }
> >> +
> >> +#ifdef CONFIG_VMAP_STACK
> >> +     cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "vm_stack_cache",
> >> +                       NULL, free_vm_stack_cache);
> >> +#endif
> >
> > I am not familiar the new hotplug infrastructure so I might be missing
> > something. CPUHP_AP_ONLINE_DYN will allocate a state which is has only
> > 30 slots available. The name also suggests this will be called on an
> > online event. Why doesn't this have its own state like other users. The
> > name should also reflect offline event CPUHP_STACK_CACHE_DEAD or
> > something like that.
> 
> I'll define CPUHP_VMSTACK_CACHE_DEAD before CPUHP_BP_PREPARE_DYN in
> cpuhotplug.h.
> Do you think the change is made in a separate patch or not ?

I think it should be in a single patch. I am not sure what are the rules
to define a new state though. Let's CC Thomas.
Hoeun Ryu Feb. 10, 2017, 3 p.m. UTC | #4
On Fri, Feb 10, 2017 at 11:41 PM, Michal Hocko <mhocko@kernel.org> wrote:
> On Fri 10-02-17 23:31:41, Hoeun Ryu wrote:
>> On Fri, Feb 10, 2017 at 9:05 PM, Michal Hocko <mhocko@kernel.org> wrote:
>> > On Fri 10-02-17 17:32:07, Hoeun Ryu wrote:
> [...]
>> >>  static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
>> >> @@ -456,6 +474,11 @@ void __init fork_init(void)
>> >>       for (i = 0; i < UCOUNT_COUNTS; i++) {
>> >>               init_user_ns.ucount_max[i] = max_threads/2;
>> >>       }
>> >> +
>> >> +#ifdef CONFIG_VMAP_STACK
>> >> +     cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "vm_stack_cache",
>> >> +                       NULL, free_vm_stack_cache);
>> >> +#endif
>> >
>> > I am not familiar the new hotplug infrastructure so I might be missing
>> > something. CPUHP_AP_ONLINE_DYN will allocate a state which is has only
>> > 30 slots available. The name also suggests this will be called on an
>> > online event. Why doesn't this have its own state like other users. The
>> > name should also reflect offline event CPUHP_STACK_CACHE_DEAD or
>> > something like that.
>>
>> I'll define CPUHP_VMSTACK_CACHE_DEAD before CPUHP_BP_PREPARE_DYN in
>> cpuhotplug.h.
>> Do you think the change is made in a separate patch or not ?
>
> I think it should be in a single patch. I am not sure what are the rules
> to define a new state though. Let's CC Thomas.

OK, I will.

> --
> Michal Hocko
> SUSE Labs
Thomas Gleixner Feb. 10, 2017, 3:32 p.m. UTC | #5
On Fri, 10 Feb 2017, Michal Hocko wrote:
> On Fri 10-02-17 23:31:41, Hoeun Ryu wrote:
> > On Fri, Feb 10, 2017 at 9:05 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > > On Fri 10-02-17 17:32:07, Hoeun Ryu wrote:
> [...]
> > >>  static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
> > >> @@ -456,6 +474,11 @@ void __init fork_init(void)
> > >>       for (i = 0; i < UCOUNT_COUNTS; i++) {
> > >>               init_user_ns.ucount_max[i] = max_threads/2;
> > >>       }
> > >> +
> > >> +#ifdef CONFIG_VMAP_STACK
> > >> +     cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "vm_stack_cache",
> > >> +                       NULL, free_vm_stack_cache);
> > >> +#endif
> > >
> > > I am not familiar the new hotplug infrastructure so I might be missing
> > > something. CPUHP_AP_ONLINE_DYN will allocate a state which is has only
> > > 30 slots available. The name also suggests this will be called on an
> > > online event. Why doesn't this have its own state like other users. The
> > > name should also reflect offline event CPUHP_STACK_CACHE_DEAD or
> > > something like that.
> > 
> > I'll define CPUHP_VMSTACK_CACHE_DEAD before CPUHP_BP_PREPARE_DYN in
> > cpuhotplug.h.
> > Do you think the change is made in a separate patch or not ?
> 
> I think it should be in a single patch. I am not sure what are the rules
> to define a new state though. Let's CC Thomas.

So the first question is where do you want that to be called? i.e. in which
section:

CPU up		CPU down

PREPARE		DEAD		<- called on some other CPU
ONLINE		DOWN		<- called on the hotplugged CPU

And then the next question is whether you have ordering constraints,
i.e. it must be called before or after some other callback. Only in that
case you want to have an explicit state. If not, just use a dynamically
allocated one.

Thanks,

	tglx
Thomas Gleixner Feb. 10, 2017, 3:34 p.m. UTC | #6
On Sat, 11 Feb 2017, Hoeun Ryu wrote:

> On Fri, Feb 10, 2017 at 11:41 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Fri 10-02-17 23:31:41, Hoeun Ryu wrote:
> >> On Fri, Feb 10, 2017 at 9:05 PM, Michal Hocko <mhocko@kernel.org> wrote:
> >> > On Fri 10-02-17 17:32:07, Hoeun Ryu wrote:
> > [...]
> >> >>  static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
> >> >> @@ -456,6 +474,11 @@ void __init fork_init(void)
> >> >>       for (i = 0; i < UCOUNT_COUNTS; i++) {
> >> >>               init_user_ns.ucount_max[i] = max_threads/2;
> >> >>       }
> >> >> +
> >> >> +#ifdef CONFIG_VMAP_STACK
> >> >> +     cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "vm_stack_cache",
> >> >> +                       NULL, free_vm_stack_cache);
> >> >> +#endif
> >> >
> >> > I am not familiar the new hotplug infrastructure so I might be missing
> >> > something. CPUHP_AP_ONLINE_DYN will allocate a state which is has only
> >> > 30 slots available.

That's enough and when we hit that limit we just up it.

> >> > The name also suggests this will be called on an online event.

The states are symmetric.
Hoeun Ryu Feb. 10, 2017, 4:42 p.m. UTC | #7
On Sat, Feb 11, 2017 at 12:32 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 10 Feb 2017, Michal Hocko wrote:
>> On Fri 10-02-17 23:31:41, Hoeun Ryu wrote:
>> > On Fri, Feb 10, 2017 at 9:05 PM, Michal Hocko <mhocko@kernel.org> wrote:
>> > > On Fri 10-02-17 17:32:07, Hoeun Ryu wrote:
>> [...]
>> > >>  static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
>> > >> @@ -456,6 +474,11 @@ void __init fork_init(void)
>> > >>       for (i = 0; i < UCOUNT_COUNTS; i++) {
>> > >>               init_user_ns.ucount_max[i] = max_threads/2;
>> > >>       }
>> > >> +
>> > >> +#ifdef CONFIG_VMAP_STACK
>> > >> +     cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "vm_stack_cache",
>> > >> +                       NULL, free_vm_stack_cache);
>> > >> +#endif
>> > >
>> > > I am not familiar the new hotplug infrastructure so I might be missing
>> > > something. CPUHP_AP_ONLINE_DYN will allocate a state which is has only
>> > > 30 slots available. The name also suggests this will be called on an
>> > > online event. Why doesn't this have its own state like other users. The
>> > > name should also reflect offline event CPUHP_STACK_CACHE_DEAD or
>> > > something like that.
>> >
>> > I'll define CPUHP_VMSTACK_CACHE_DEAD before CPUHP_BP_PREPARE_DYN in
>> > cpuhotplug.h.
>> > Do you think the change is made in a separate patch or not ?
>>
>> I think it should be in a single patch. I am not sure what are the rules
>> to define a new state though. Let's CC Thomas.
>
> So the first question is where do you want that to be called? i.e. in which
> section:
>
> CPU up          CPU down
>
> PREPARE         DEAD            <- called on some other CPU
> ONLINE          DOWN            <- called on the hotplugged CPU
>

It doesn't matter whether the callback is called on the hotplugged CPU
or other CPUs.

> And then the next question is whether you have ordering constraints,
> i.e. it must be called before or after some other callback. Only in that
> case you want to have an explicit state. If not, just use a dynamically
> allocated one.

The cache is for virtually mapped kernel stacks. so I think the
callback should be called after all tasks are migrated to other CPUs.
It must reside before CPUHP_AP_SCHED_STARTING or CPUHP_BRINGUP_CPU.

>
> Thanks,
>
>         tglx
Thomas Gleixner Feb. 10, 2017, 5:51 p.m. UTC | #8
On Sat, 11 Feb 2017, Hoeun Ryu wrote:
> On Sat, Feb 11, 2017 at 12:32 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Fri, 10 Feb 2017, Michal Hocko wrote:
> >> On Fri 10-02-17 23:31:41, Hoeun Ryu wrote:
> >> > On Fri, Feb 10, 2017 at 9:05 PM, Michal Hocko <mhocko@kernel.org> wrote:
> >> > > On Fri 10-02-17 17:32:07, Hoeun Ryu wrote:
> >> [...]
> >> > >>  static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
> >> > >> @@ -456,6 +474,11 @@ void __init fork_init(void)
> >> > >>       for (i = 0; i < UCOUNT_COUNTS; i++) {
> >> > >>               init_user_ns.ucount_max[i] = max_threads/2;
> >> > >>       }
> >> > >> +
> >> > >> +#ifdef CONFIG_VMAP_STACK
> >> > >> +     cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "vm_stack_cache",
> >> > >> +                       NULL, free_vm_stack_cache);
> >> > >> +#endif
> >> > >
> >> > > I am not familiar the new hotplug infrastructure so I might be missing
> >> > > something. CPUHP_AP_ONLINE_DYN will allocate a state which is has only
> >> > > 30 slots available. The name also suggests this will be called on an
> >> > > online event. Why doesn't this have its own state like other users. The
> >> > > name should also reflect offline event CPUHP_STACK_CACHE_DEAD or
> >> > > something like that.
> >> >
> >> > I'll define CPUHP_VMSTACK_CACHE_DEAD before CPUHP_BP_PREPARE_DYN in
> >> > cpuhotplug.h.
> >> > Do you think the change is made in a separate patch or not ?
> >>
> >> I think it should be in a single patch. I am not sure what are the rules
> >> to define a new state though. Let's CC Thomas.
> >
> > So the first question is where do you want that to be called? i.e. in which
> > section:
> >
> > CPU up          CPU down
> >
> > PREPARE         DEAD            <- called on some other CPU
> > ONLINE          DOWN            <- called on the hotplugged CPU
> >
> 
> It doesn't matter whether the callback is called on the hotplugged CPU
> or other CPUs.
>
> > And then the next question is whether you have ordering constraints,
> > i.e. it must be called before or after some other callback. Only in that
> > case you want to have an explicit state. If not, just use a dynamically
> > allocated one.
> 
> The cache is for virtually mapped kernel stacks. so I think the
> callback should be called after all tasks are migrated to other CPUs.
> It must reside before CPUHP_AP_SCHED_STARTING or CPUHP_BRINGUP_CPU.

Between AP_OFFLINE and AP_SCHED_STARTING does not work because those states
path are called on the hotplugged CPU with interrupts disabled and after
the CPU has been taken out from the scheduler.

So the proper place is the dynamic states CPUHP_BP_PREPARE_DYN. You do not
have a prepare callback, but then it's called on an still online CPU
_AFTER_ the hotplugged CPU vanished from the system.

Thanks,

	tglx
Hoeun Ryu Feb. 10, 2017, 11:33 p.m. UTC | #9
On Sat, Feb 11, 2017 at 2:51 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sat, 11 Feb 2017, Hoeun Ryu wrote:
>> On Sat, Feb 11, 2017 at 12:32 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > On Fri, 10 Feb 2017, Michal Hocko wrote:
>> >> On Fri 10-02-17 23:31:41, Hoeun Ryu wrote:
>> >> > On Fri, Feb 10, 2017 at 9:05 PM, Michal Hocko <mhocko@kernel.org> wrote:
>> >> > > On Fri 10-02-17 17:32:07, Hoeun Ryu wrote:
>> >> [...]
>> >> > >>  static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
>> >> > >> @@ -456,6 +474,11 @@ void __init fork_init(void)
>> >> > >>       for (i = 0; i < UCOUNT_COUNTS; i++) {
>> >> > >>               init_user_ns.ucount_max[i] = max_threads/2;
>> >> > >>       }
>> >> > >> +
>> >> > >> +#ifdef CONFIG_VMAP_STACK
>> >> > >> +     cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "vm_stack_cache",
>> >> > >> +                       NULL, free_vm_stack_cache);
>> >> > >> +#endif
>> >> > >
>> >> > > I am not familiar the new hotplug infrastructure so I might be missing
>> >> > > something. CPUHP_AP_ONLINE_DYN will allocate a state which is has only
>> >> > > 30 slots available. The name also suggests this will be called on an
>> >> > > online event. Why doesn't this have its own state like other users. The
>> >> > > name should also reflect offline event CPUHP_STACK_CACHE_DEAD or
>> >> > > something like that.
>> >> >
>> >> > I'll define CPUHP_VMSTACK_CACHE_DEAD before CPUHP_BP_PREPARE_DYN in
>> >> > cpuhotplug.h.
>> >> > Do you think the change is made in a separate patch or not ?
>> >>
>> >> I think it should be in a single patch. I am not sure what are the rules
>> >> to define a new state though. Let's CC Thomas.
>> >
>> > So the first question is where do you want that to be called? i.e. in which
>> > section:
>> >
>> > CPU up          CPU down
>> >
>> > PREPARE         DEAD            <- called on some other CPU
>> > ONLINE          DOWN            <- called on the hotplugged CPU
>> >
>>
>> It doesn't matter whether the callback is called on the hotplugged CPU
>> or other CPUs.
>>
>> > And then the next question is whether you have ordering constraints,
>> > i.e. it must be called before or after some other callback. Only in that
>> > case you want to have an explicit state. If not, just use a dynamically
>> > allocated one.
>>
>> The cache is for virtually mapped kernel stacks. so I think the
>> callback should be called after all tasks are migrated to other CPUs.
>> It must reside before CPUHP_AP_SCHED_STARTING or CPUHP_BRINGUP_CPU.
>
> Between AP_OFFLINE and AP_SCHED_STARTING does not work because those states
> path are called on the hotplugged CPU with interrupts disabled and after
> the CPU has been taken out from the scheduler.
>
> So the proper place is the dynamic states CPUHP_BP_PREPARE_DYN. You do not
> have a prepare callback, but then it's called on an still online CPU
> _AFTER_ the hotplugged CPU vanished from the system.
>

OK, I agree.

> Thanks,
>
>         tglx
diff mbox

Patch

diff --git a/kernel/fork.c b/kernel/fork.c
index 937ba59..8fad87ba 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -168,6 +168,24 @@  void __weak arch_release_thread_stack(unsigned long *stack)
  */
 #define NR_CACHED_STACKS 2
 static DEFINE_PER_CPU(struct vm_struct *, cached_stacks[NR_CACHED_STACKS]);
+
+static int free_vm_stack_cache(unsigned int cpu)
+{
+	struct vm_struct **cached_vm_stacks = per_cpu_ptr(cached_stacks, cpu);
+	int i;
+
+	for (i = 0; i < NR_CACHED_STACKS; i++) {
+		struct vm_struct **vm_stack = &cached_vm_stacks[i];
+
+		if (*vm_stack == NULL)
+			continue;
+
+		vfree((*vm_stack)->addr);
+		*vm_stack = NULL;
+	}
+
+	return 0;
+}
 #endif
 
 static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
@@ -456,6 +474,11 @@  void __init fork_init(void)
 	for (i = 0; i < UCOUNT_COUNTS; i++) {
 		init_user_ns.ucount_max[i] = max_threads/2;
 	}
+
+#ifdef CONFIG_VMAP_STACK
+	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "vm_stack_cache",
+			  NULL, free_vm_stack_cache);
+#endif
 }
 
 int __weak arch_dup_task_struct(struct task_struct *dst,