
[v3] mm: memcontrol: fix kernel stack account

Message ID 20210303151843.81156-1-songmuchun@bytedance.com
State New, archived
Series [v3] mm: memcontrol: fix kernel stack account

Commit Message

Muchun Song March 3, 2021, 3:18 p.m. UTC
For simplicity, commit 991e7673859e ("mm: memcontrol: account kernel stack
per node") changed the accounting of vmalloc-backed stack pages from per
zone to per node, charging the whole stack to the node of its first page.
By doing that we lost some precision, because those pages may live on
different NUMA nodes. As a result, NR_KERNEL_STACK_KB exported to userspace
may be overestimated on some nodes and underestimated on others. This is
not a problem observed in the real world, only one found by reading the
code, so there is no data showing how much it affects users.

This does not affect the correctness of kernel behavior, as the counter
is not used for any internal processing, but it can confuse userspace.

Address the problem by accounting each vmalloc backing page to its own
node.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
Changelog in v3:
 - Remove BUG_ON().
 - Update commit log.

Changelog in v2:
 - Rework commit log as suggested by Michal.

 Thanks to Michal and Shakeel for the review.

 kernel/fork.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

Roman Gushchin March 3, 2021, 6:01 p.m. UTC | #1
On Wed, Mar 03, 2021 at 11:18:43PM +0800, Muchun Song wrote:
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Roman Gushchin <guro@fb.com>

Thanks!


Patch

diff --git a/kernel/fork.c b/kernel/fork.c
index d66cd1014211..242fdad6972b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -379,14 +379,17 @@  static void account_kernel_stack(struct task_struct *tsk, int account)
 	void *stack = task_stack_page(tsk);
 	struct vm_struct *vm = task_stack_vm_area(tsk);
 
+	if (vm) {
+		int i;
 
-	/* All stack pages are in the same node. */
-	if (vm)
-		mod_lruvec_page_state(vm->pages[0], NR_KERNEL_STACK_KB,
-				      account * (THREAD_SIZE / 1024));
-	else
+		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++)
+			mod_lruvec_page_state(vm->pages[i], NR_KERNEL_STACK_KB,
+					      account * (PAGE_SIZE / 1024));
+	} else {
+		/* All stack pages are in the same node. */
 		mod_lruvec_kmem_state(stack, NR_KERNEL_STACK_KB,
 				      account * (THREAD_SIZE / 1024));
+	}
 }
 
 static int memcg_charge_kernel_stack(struct task_struct *tsk)