[04/13] mm: Track NR_KERNEL_STACK in pages instead of number of stacks
diff mbox

Message ID 24279d4009c821de64109055665429fad2a7bff7.1466036668.git.luto@kernel.org
State New
Headers show

Commit Message

Andy Lutomirski June 16, 2016, 12:28 a.m. UTC
Currently, NR_KERNEL_STACK tracks the number of kernel stacks in a
zone.  This only makes sense if each kernel stack exists entirely in
one zone, and allowing vmapped stacks could break this assumption.

It turns out that the code for tracking kernel stack allocations in
units of pages is slightly simpler, so just switch to counting
pages.

Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 fs/proc/meminfo.c | 2 +-
 kernel/fork.c     | 3 ++-
 mm/page_alloc.c   | 3 +--
 3 files changed, 4 insertions(+), 4 deletions(-)

Comments

Vladimir Davydov June 16, 2016, 11:10 a.m. UTC | #1
On Wed, Jun 15, 2016 at 05:28:26PM -0700, Andy Lutomirski wrote:
...
> @@ -225,7 +225,8 @@ static void account_kernel_stack(struct thread_info *ti, int account)
>  {
>  	struct zone *zone = page_zone(virt_to_page(ti));
>  
> -	mod_zone_page_state(zone, NR_KERNEL_STACK, account);
> +	mod_zone_page_state(zone, NR_KERNEL_STACK,
> +			    THREAD_SIZE / PAGE_SIZE * account);

It won't work if THREAD_SIZE < PAGE_SIZE. Is there an arch with such a
thread size, anyway? If no, we should probably drop thread_info_cache.
Josh Poimboeuf June 16, 2016, 3:33 p.m. UTC | #2
On Wed, Jun 15, 2016 at 05:28:26PM -0700, Andy Lutomirski wrote:
> Currently, NR_KERNEL_STACK tracks the number of kernel stacks in a
> zone.  This only makes sense if each kernel stack exists entirely in
> one zone, and allowing vmapped stacks could break this assumption.
> 
> It turns out that the code for tracking kernel stack allocations in
> units of pages is slightly simpler, so just switch to counting
> pages.
> 
> Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: linux-mm@kvack.org
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  fs/proc/meminfo.c | 2 +-
>  kernel/fork.c     | 3 ++-
>  mm/page_alloc.c   | 3 +--
>  3 files changed, 4 insertions(+), 4 deletions(-)

You missed another usage of NR_KERNEL_STACK in drivers/base/node.c.
Andy Lutomirski June 16, 2016, 5:21 p.m. UTC | #3
On Thu, Jun 16, 2016 at 4:10 AM, Vladimir Davydov
<vdavydov@virtuozzo.com> wrote:
> On Wed, Jun 15, 2016 at 05:28:26PM -0700, Andy Lutomirski wrote:
> ...
>> @@ -225,7 +225,8 @@ static void account_kernel_stack(struct thread_info *ti, int account)
>>  {
>>       struct zone *zone = page_zone(virt_to_page(ti));
>>
>> -     mod_zone_page_state(zone, NR_KERNEL_STACK, account);
>> +     mod_zone_page_state(zone, NR_KERNEL_STACK,
>> +                         THREAD_SIZE / PAGE_SIZE * account);
>
> It won't work if THREAD_SIZE < PAGE_SIZE. Is there an arch with such a
> thread size, anyway? If no, we should probably drop thread_info_cache.

On a quick grep, I can't find any.  I'll add a BUILD_BUG_ON for now.

--Andy
Andy Lutomirski June 16, 2016, 5:39 p.m. UTC | #4
On Thu, Jun 16, 2016 at 8:33 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, Jun 15, 2016 at 05:28:26PM -0700, Andy Lutomirski wrote:
>> Currently, NR_KERNEL_STACK tracks the number of kernel stacks in a
>> zone.  This only makes sense if each kernel stack exists entirely in
>> one zone, and allowing vmapped stacks could break this assumption.
>>
>> It turns out that the code for tracking kernel stack allocations in
>> units of pages is slightly simpler, so just switch to counting
>> pages.
>>
>> Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: linux-mm@kvack.org
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>>  fs/proc/meminfo.c | 2 +-
>>  kernel/fork.c     | 3 ++-
>>  mm/page_alloc.c   | 3 +--
>>  3 files changed, 4 insertions(+), 4 deletions(-)
>
> You missed another usage of NR_KERNEL_STACK in drivers/base/node.c.

Thanks.

The real reason I cc'd you was so you could look at
rewind_stack_do_exit and the sneaky trick I did in no_context in the
last patch, though.  :)  Both survive objtool, but I figured I'd check
with objtool's author as well.  If there was a taint bit I could set
saying "kernel is hosed -- don't try to apply live patches any more",
I'd have extra confidence.

--Andy
Andy Lutomirski June 16, 2016, 7:20 p.m. UTC | #5
On Thu, Jun 16, 2016 at 10:21 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Jun 16, 2016 at 4:10 AM, Vladimir Davydov
> <vdavydov@virtuozzo.com> wrote:
>> On Wed, Jun 15, 2016 at 05:28:26PM -0700, Andy Lutomirski wrote:
>> ...
>>> @@ -225,7 +225,8 @@ static void account_kernel_stack(struct thread_info *ti, int account)
>>>  {
>>>       struct zone *zone = page_zone(virt_to_page(ti));
>>>
>>> -     mod_zone_page_state(zone, NR_KERNEL_STACK, account);
>>> +     mod_zone_page_state(zone, NR_KERNEL_STACK,
>>> +                         THREAD_SIZE / PAGE_SIZE * account);
>>
>> It won't work if THREAD_SIZE < PAGE_SIZE. Is there an arch with such a
>> thread size, anyway? If no, we should probably drop thread_info_cache.
>
> On a quick grep, I can't find any.  I'll add a BUILD_BUG_ON for now.

Correction.  frv has 16KiB pages and 8KiB stacks.  I'll rework it to
count NR_KERNEL_STACK in KiB.  Sigh.

--Andy

>
> --Andy
Josh Poimboeuf June 16, 2016, 7:39 p.m. UTC | #6
On Thu, Jun 16, 2016 at 10:39:43AM -0700, Andy Lutomirski wrote:
> On Thu, Jun 16, 2016 at 8:33 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Wed, Jun 15, 2016 at 05:28:26PM -0700, Andy Lutomirski wrote:
> >> Currently, NR_KERNEL_STACK tracks the number of kernel stacks in a
> >> zone.  This only makes sense if each kernel stack exists entirely in
> >> one zone, and allowing vmapped stacks could break this assumption.
> >>
> >> It turns out that the code for tracking kernel stack allocations in
> >> units of pages is slightly simpler, so just switch to counting
> >> pages.
> >>
> >> Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
> >> Cc: Johannes Weiner <hannes@cmpxchg.org>
> >> Cc: Michal Hocko <mhocko@kernel.org>
> >> Cc: linux-mm@kvack.org
> >> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> >> ---
> >>  fs/proc/meminfo.c | 2 +-
> >>  kernel/fork.c     | 3 ++-
> >>  mm/page_alloc.c   | 3 +--
> >>  3 files changed, 4 insertions(+), 4 deletions(-)
> >
> > You missed another usage of NR_KERNEL_STACK in drivers/base/node.c.
> 
> Thanks.
> 
> The real reason I cc'd you was so you could look at
> rewind_stack_do_exit and the sneaky trick I did in no_context in the
> last patch, though.  :)  Both survive objtool, but I figured I'd check
> with objtool's author as well.  If there was a taint bit I could set
> saying "kernel is hosed -- don't try to apply live patches any more",
> I'd have extra confidence.

I think it all looks fine from an objtool and a live patching
standpoint.  Other than my previous comment about setting the stack
pointer correctly before calling do_exit(), I didn't see anything else
which would mess up the stack of a sleeping task, which is all I really
care about.

Patch
diff mbox

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 83720460c5bc..8338c0569a8d 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -145,7 +145,7 @@  static int meminfo_proc_show(struct seq_file *m, void *v)
 				global_page_state(NR_SLAB_UNRECLAIMABLE)),
 		K(global_page_state(NR_SLAB_RECLAIMABLE)),
 		K(global_page_state(NR_SLAB_UNRECLAIMABLE)),
-		global_page_state(NR_KERNEL_STACK) * THREAD_SIZE / 1024,
+		K(global_page_state(NR_KERNEL_STACK)),
 		K(global_page_state(NR_PAGETABLE)),
 #ifdef CONFIG_QUICKLIST
 		K(quicklist_total_size()),
diff --git a/kernel/fork.c b/kernel/fork.c
index 5c2c355aa97f..95bebde59d79 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -225,7 +225,8 @@  static void account_kernel_stack(struct thread_info *ti, int account)
 {
 	struct zone *zone = page_zone(virt_to_page(ti));
 
-	mod_zone_page_state(zone, NR_KERNEL_STACK, account);
+	mod_zone_page_state(zone, NR_KERNEL_STACK,
+			    THREAD_SIZE / PAGE_SIZE * account);
 }
 
 void free_task(struct task_struct *tsk)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6903b695ebae..2b0203b3a976 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4457,8 +4457,7 @@  void show_free_areas(unsigned int filter)
 			K(zone_page_state(zone, NR_SHMEM)),
 			K(zone_page_state(zone, NR_SLAB_RECLAIMABLE)),
 			K(zone_page_state(zone, NR_SLAB_UNRECLAIMABLE)),
-			zone_page_state(zone, NR_KERNEL_STACK) *
-				THREAD_SIZE / 1024,
+			K(zone_page_state(zone, NR_KERNEL_STACK)),
 			K(zone_page_state(zone, NR_PAGETABLE)),
 			K(zone_page_state(zone, NR_UNSTABLE_NFS)),
 			K(zone_page_state(zone, NR_BOUNCE)),