mm: memcontrol: add the mempolicy interface for cgroup v2.

Message ID 20220524103638.473-1-hezhongkun.hzk@bytedance.com (mailing list archive)
State New

Commit Message

Zhongkun He May 24, 2022, 10:36 a.m. UTC
From: Hezhongkun <hezhongkun.hzk@bytedance.com>

Mempolicy is difficult to use because it has to be set from inside
the process via a system call. We want to make mempolicy easier to
use through cgroups, so that low-priority cgroups can be made to
allocate memory from specified nodes. This patch therefore adds a
mempolicy interface to the memory controller.

The mempolicy of a memcgroup takes priority over that of a task.
Policies are looked up in this order: memcgroup->policy, then
task->policy or the vma policy, then the default policy. A
memcgroup's policy belongs to that memcgroup alone, so descendants
do not inherit it.

Signed-off-by: Hezhongkun <hezhongkun.hzk@bytedance.com>
---
 include/linux/memcontrol.h |  1 +
 mm/memcontrol.c            | 42 ++++++++++++++++++++++++++++++++++++++
 mm/mempolicy.c             | 30 ++++++++++++++++++++++-----
 3 files changed, 68 insertions(+), 5 deletions(-)
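
The lookup order described in the commit message can be modelled in plain
userspace C. This is only a sketch under stated assumptions: every model_*
name is hypothetical and none of these are kernel types. It folds the
patch's two helpers (get_cgrp_or_task_policy and get_cgrp_or_vma_policy)
into one function, where passing a NULL vma stands for the path that has no
VMA to consult.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
enum model_mode { MODEL_DEFAULT, MODEL_BIND, MODEL_INTERLEAVE };

struct model_policy {
	enum model_mode mode;
};

struct model_memcg {
	const struct model_policy *policy;	/* NULL when the cgroup sets none */
};

struct model_task {
	const struct model_policy *policy;	/* NULL when the task sets none */
	const struct model_memcg *memcg;	/* cgroup the task belongs to */
};

struct model_vma {
	const struct model_policy *policy;	/* NULL when the VMA sets none */
};

static const struct model_policy model_default_policy = { MODEL_DEFAULT };

/*
 * Resolve the policy in the order given in the commit message:
 * cgroup policy first, then the VMA policy, then the task policy,
 * and finally the default policy.
 */
static const struct model_policy *
model_effective_policy(const struct model_task *task,
		       const struct model_vma *vma)
{
	assert(task != NULL);

	if (task->memcg && task->memcg->policy)
		return task->memcg->policy;
	if (vma && vma->policy)
		return vma->policy;
	if (task->policy)
		return task->policy;
	return &model_default_policy;
}
```

In this model a task whose cgroup sets a policy always gets the cgroup's
policy, whatever the task or the VMA asked for. That override of a policy
the application chose for itself is the behaviour the patch introduces.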

Comments

Michal Hocko May 24, 2022, 10:47 a.m. UTC | #1
On Tue 24-05-22 18:36:38, hezhongkun wrote:
> From: Hezhongkun <hezhongkun.hzk@bytedance.com>
> 
> Mempolicy is difficult to use because it is set in-process
> via a system call. We want to make it easier to use mempolicy
> in cgroups, so that we can control low-priority cgroups to
> allocate memory in specified nodes. So this patch want to
> adds the mempolicy interface.
> 
> the mempolicy priority of memcgroup is higher than the priority
> of task. The order of getting the policy is,
> memcgroup->policy,task->policy or vma policy, default policy.
> memcgroup's policy is owned by itself, so descendants will
> not inherit it.

Why can't you use the cpuset cgroup?
Zhongkun He May 24, 2022, 11:46 a.m. UTC | #2
Hi Michal, thanks for your reply.
mempolicy has two functions: choosing which nodes to use, and deciding how
to use those nodes. cpuset can only do the first; it is equivalent to the
'bind' mempolicy. If cgroups support mempolicy, we can go on to develop more
policy types, for example allocating memory according to node weight.
We would like more precise control over memory allocation on NUMA
servers.
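
To make the "which nodes" versus "how to use them" distinction concrete,
here is a small userspace model. It is a sketch only: the sketch_* names
are hypothetical, the node set is a plain bitmask, and the 'bind' branch is
a deliberate simplification (use the local node if it is allowed, otherwise
the lowest allowed node) rather than the kernel's zonelist walk.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_NODES 8

enum sketch_mode { SKETCH_BIND, SKETCH_INTERLEAVE };

struct sketch_policy {
	enum sketch_mode mode;
	unsigned int nodes;	/* bit n set: node n is allowed */
	int prev;		/* last node handed out by interleave, -1 at start */
};

/* Lowest allowed node, or -1 if the mask is empty. */
static int sketch_first_node(unsigned int nodes)
{
	for (int n = 0; n < SKETCH_MAX_NODES; n++)
		if (nodes & (1u << n))
			return n;
	return -1;
}

/* Next allowed node after prev, wrapping around; -1 if the mask is empty. */
static int sketch_next_node(unsigned int nodes, int prev)
{
	for (int step = 1; step <= SKETCH_MAX_NODES; step++) {
		int n = (prev + step) % SKETCH_MAX_NODES;

		if (nodes & (1u << n))
			return n;
	}
	return -1;
}

/* Same node mask, different placement depending on the mode. */
static int sketch_pick_node(struct sketch_policy *pol, int local_node)
{
	assert(pol != NULL && pol->nodes != 0);
	assert(local_node >= 0 && local_node < SKETCH_MAX_NODES);

	if (pol->mode == SKETCH_INTERLEAVE) {
		pol->prev = sketch_next_node(pol->nodes, pol->prev);
		return pol->prev;
	}
	/* Simplified bind: stay local when allowed, else lowest allowed node. */
	if (pol->nodes & (1u << local_node))
		return local_node;
	return sketch_first_node(pol->nodes);
}
```

With the mask restricted to nodes 1 and 2, the bind mode keeps returning
the same node for a given local node, while interleave alternates between
1 and 2 on successive calls. A node-weight policy would be a third mode
over the same mask.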

Michal Hocko May 24, 2022, 12:04 p.m. UTC | #3
On Tue 24-05-22 19:46:38, 贺中坤 wrote:
> Hi Michal, thanks for your reply.
> mempolicy has two functions, which nodes to choose and how to use these
> nodes. cpuset can only decide the first one,it equal to 'bind' mempolicy.
> If cgroups support mempolicy, we can continue to develop more policy
> types. For example, allocate memory according to node weight, etc.
> We would like to have more precise control over memory allocation in NUMA
> server.

Why can't the cpuset controller be extended instead?
kernel test robot May 24, 2022, 1:10 p.m. UTC | #4
Hi hezhongkun,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.18 next-20220524]
[cannot apply to akpm-mm/mm-everything]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/hezhongkun/mm-memcontrol-add-the-mempolicy-interface-for-cgroup-v2/20220524-183922
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 143a6252e1b8ab424b4b293512a97cca7295c182
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20220524/202205242108.pqUxw2OF-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-1) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/6adb0a02c27c8811bee9783451ee25155baf490e
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review hezhongkun/mm-memcontrol-add-the-mempolicy-interface-for-cgroup-v2/20220524-183922
        git checkout 6adb0a02c27c8811bee9783451ee25155baf490e
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

>> mm/mempolicy.c:179:19: warning: no previous prototype for 'get_cgrp_or_task_policy' [-Wmissing-prototypes]
     179 | struct mempolicy *get_cgrp_or_task_policy(struct task_struct *p)
         |                   ^~~~~~~~~~~~~~~~~~~~~~~
   mm/mempolicy.c: In function 'get_cgrp_or_task_policy':
>> mm/mempolicy.c:182:36: error: implicit declaration of function 'mem_cgroup_from_task'; did you mean 'perf_cgroup_from_task'? [-Werror=implicit-function-declaration]
     182 |         struct mem_cgroup *memcg = mem_cgroup_from_task(p);
         |                                    ^~~~~~~~~~~~~~~~~~~~
         |                                    perf_cgroup_from_task
>> mm/mempolicy.c:182:36: warning: initialization of 'struct mem_cgroup *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
>> mm/mempolicy.c:184:30: error: invalid use of undefined type 'struct mem_cgroup'
     184 |         pol = (memcg && memcg->mempolicy) ? memcg->mempolicy : get_task_policy(p);
         |                              ^~
   mm/mempolicy.c:184:50: error: invalid use of undefined type 'struct mem_cgroup'
     184 |         pol = (memcg && memcg->mempolicy) ? memcg->mempolicy : get_task_policy(p);
         |                                                  ^~
   mm/mempolicy.c: In function 'get_cgrp_or_vma_policy':
   mm/mempolicy.c:1799:36: warning: initialization of 'struct mem_cgroup *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
    1799 |         struct mem_cgroup *memcg = mem_cgroup_from_task(current);
         |                                    ^~~~~~~~~~~~~~~~~~~~
   mm/mempolicy.c:1801:30: error: invalid use of undefined type 'struct mem_cgroup'
    1801 |         pol = (memcg && memcg->mempolicy) ? memcg->mempolicy : get_vma_policy(vma, addr);
         |                              ^~
   mm/mempolicy.c:1801:50: error: invalid use of undefined type 'struct mem_cgroup'
    1801 |         pol = (memcg && memcg->mempolicy) ? memcg->mempolicy : get_vma_policy(vma, addr);
         |                                                  ^~
   cc1: some warnings being treated as errors


vim +182 mm/mempolicy.c

   178	
 > 179	struct mempolicy *get_cgrp_or_task_policy(struct task_struct *p)
   180	{
   181		struct mempolicy *pol;
 > 182		struct mem_cgroup *memcg = mem_cgroup_from_task(p);
   183	
 > 184		pol = (memcg && memcg->mempolicy) ? memcg->mempolicy : get_task_policy(p);
   185		return pol;
   186	}
   187
kernel test robot May 24, 2022, 3:02 p.m. UTC | #5
Hi hezhongkun,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.18 next-20220524]
[cannot apply to akpm-mm/mm-everything]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/hezhongkun/mm-memcontrol-add-the-mempolicy-interface-for-cgroup-v2/20220524-183922
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 143a6252e1b8ab424b4b293512a97cca7295c182
config: x86_64-randconfig-a016 (https://download.01.org/0day-ci/archive/20220524/202205242200.VGAUIGvw-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 10c9ecce9f6096e18222a331c5e7d085bd813f75)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/6adb0a02c27c8811bee9783451ee25155baf490e
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review hezhongkun/mm-memcontrol-add-the-mempolicy-interface-for-cgroup-v2/20220524-183922
        git checkout 6adb0a02c27c8811bee9783451ee25155baf490e
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

>> mm/mempolicy.c:182:29: error: call to undeclared function 'mem_cgroup_from_task'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           struct mem_cgroup *memcg = mem_cgroup_from_task(p);
                                      ^
   mm/mempolicy.c:182:29: note: did you mean 'mem_cgroup_from_css'?
   include/linux/memcontrol.h:1267:20: note: 'mem_cgroup_from_css' declared here
   struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css)
                      ^
>> mm/mempolicy.c:182:21: warning: incompatible integer to pointer conversion initializing 'struct mem_cgroup *' with an expression of type 'int' [-Wint-conversion]
           struct mem_cgroup *memcg = mem_cgroup_from_task(p);
                              ^       ~~~~~~~~~~~~~~~~~~~~~~~
>> mm/mempolicy.c:184:23: error: incomplete definition of type 'struct mem_cgroup'
           pol = (memcg && memcg->mempolicy) ? memcg->mempolicy : get_task_policy(p);
                           ~~~~~^
   include/linux/mm_types.h:31:8: note: forward declaration of 'struct mem_cgroup'
   struct mem_cgroup;
          ^
   mm/mempolicy.c:184:43: error: incomplete definition of type 'struct mem_cgroup'
           pol = (memcg && memcg->mempolicy) ? memcg->mempolicy : get_task_policy(p);
                                               ~~~~~^
   include/linux/mm_types.h:31:8: note: forward declaration of 'struct mem_cgroup'
   struct mem_cgroup;
          ^
   mm/mempolicy.c:179:19: warning: no previous prototype for function 'get_cgrp_or_task_policy' [-Wmissing-prototypes]
   struct mempolicy *get_cgrp_or_task_policy(struct task_struct *p)
                     ^
   mm/mempolicy.c:179:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   struct mempolicy *get_cgrp_or_task_policy(struct task_struct *p)
   ^
   static 
   mm/mempolicy.c:1799:29: error: call to undeclared function 'mem_cgroup_from_task'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           struct mem_cgroup *memcg = mem_cgroup_from_task(current);
                                      ^
   mm/mempolicy.c:1799:21: warning: incompatible integer to pointer conversion initializing 'struct mem_cgroup *' with an expression of type 'int' [-Wint-conversion]
           struct mem_cgroup *memcg = mem_cgroup_from_task(current);
                              ^       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   mm/mempolicy.c:1801:23: error: incomplete definition of type 'struct mem_cgroup'
           pol = (memcg && memcg->mempolicy) ? memcg->mempolicy : get_vma_policy(vma, addr);
                           ~~~~~^
   include/linux/mm_types.h:31:8: note: forward declaration of 'struct mem_cgroup'
   struct mem_cgroup;
          ^
   mm/mempolicy.c:1801:43: error: incomplete definition of type 'struct mem_cgroup'
           pol = (memcg && memcg->mempolicy) ? memcg->mempolicy : get_vma_policy(vma, addr);
                                               ~~~~~^
   include/linux/mm_types.h:31:8: note: forward declaration of 'struct mem_cgroup'
   struct mem_cgroup;
          ^
   3 warnings and 6 errors generated.


vim +/mem_cgroup_from_task +182 mm/mempolicy.c

   178	
   179	struct mempolicy *get_cgrp_or_task_policy(struct task_struct *p)
   180	{
   181		struct mempolicy *pol;
 > 182		struct mem_cgroup *memcg = mem_cgroup_from_task(p);
   183	
 > 184		pol = (memcg && memcg->mempolicy) ? memcg->mempolicy : get_task_policy(p);
   185		return pol;
   186	}
   187
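
Both build reports above fail at the same place. In the clang log,
mem_cgroup_from_css is found in include/linux/memcontrol.h while struct
mem_cgroup is only forward-declared, which suggests that these configs
build without CONFIG_MEMCG. That is an inference from the log, not
something the robot states. One common way to keep such code building in
both configurations is to reach the struct only through an accessor that
has a stub variant. The sketch below models that pattern in userspace C.
All demo_* names and the DEMO_CONFIG_MEMCG switch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct demo_policy {
	int mode;
};

/* Define DEMO_CONFIG_MEMCG at build time to model a MEMCG-enabled config. */
#ifdef DEMO_CONFIG_MEMCG
struct demo_memcg {
	struct demo_policy *policy;
};

static inline struct demo_policy *
demo_memcg_policy(const struct demo_memcg *memcg)
{
	return memcg ? memcg->policy : NULL;
}
#else
struct demo_memcg;	/* forward declaration only: members are not visible */

/* Stub: without the feature there is never a cgroup policy. */
static inline struct demo_policy *
demo_memcg_policy(const struct demo_memcg *memcg)
{
	(void)memcg;
	return NULL;
}
#endif

/* static: used only in this file, so no separate prototype is required. */
static struct demo_policy *demo_pick_policy(const struct demo_memcg *memcg,
					    struct demo_policy *fallback)
{
	struct demo_policy *pol = demo_memcg_policy(memcg);

	assert(fallback != NULL);
	return pol ? pol : fallback;
}
```

The caller never dereferences the struct itself, so it compiles whether or
not the full definition is visible. Marking the helper static also avoids
the -Wmissing-prototypes warning reported for get_cgrp_or_task_policy.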
kernel test robot May 24, 2022, 3:12 p.m. UTC | #6
Hi hezhongkun,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.18 next-20220524]
[cannot apply to akpm-mm/mm-everything]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/hezhongkun/mm-memcontrol-add-the-mempolicy-interface-for-cgroup-v2/20220524-183922
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 143a6252e1b8ab424b4b293512a97cca7295c182
config: x86_64-randconfig-a014 (https://download.01.org/0day-ci/archive/20220524/202205242316.8f8rvh3s-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 10c9ecce9f6096e18222a331c5e7d085bd813f75)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/6adb0a02c27c8811bee9783451ee25155baf490e
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review hezhongkun/mm-memcontrol-add-the-mempolicy-interface-for-cgroup-v2/20220524-183922
        git checkout 6adb0a02c27c8811bee9783451ee25155baf490e
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> mm/mempolicy.c:179:19: warning: no previous prototype for function 'get_cgrp_or_task_policy' [-Wmissing-prototypes]
   struct mempolicy *get_cgrp_or_task_policy(struct task_struct *p)
                     ^
   mm/mempolicy.c:179:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   struct mempolicy *get_cgrp_or_task_policy(struct task_struct *p)
   ^
   static 
   1 warning generated.


vim +/get_cgrp_or_task_policy +179 mm/mempolicy.c

   178	
 > 179	struct mempolicy *get_cgrp_or_task_policy(struct task_struct *p)
   180	{
   181		struct mempolicy *pol;
   182		struct mem_cgroup *memcg = mem_cgroup_from_task(p);
   183	
   184		pol = (memcg && memcg->mempolicy) ? memcg->mempolicy : get_task_policy(p);
   185		return pol;
   186	}
   187
kernel test robot May 25, 2022, 7:56 a.m. UTC | #7
Greeting,

FYI, we noticed the following commit (built with gcc-11):

commit: 6adb0a02c27c8811bee9783451ee25155baf490e ("[PATCH] mm: memcontrol: add the mempolicy interface for cgroup v2.")
url: https://github.com/intel-lab-lkp/linux/commits/hezhongkun/mm-memcontrol-add-the-mempolicy-interface-for-cgroup-v2/20220524-183922
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 143a6252e1b8ab424b4b293512a97cca7295c182
patch link: https://lore.kernel.org/lkml/20220524103638.473-1-hezhongkun.hzk@bytedance.com

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <oliver.sang@intel.com>


[    1.775514][    T2] WARNING: suspicious RCU usage
[    1.776115][    T2] 5.18.0-01158-g6adb0a02c27c #10 Not tainted
[    1.776513][    T2] -----------------------------
[    1.777133][    T2] include/linux/cgroup.h:495 suspicious rcu_dereference_check() usage!
[    1.777513][    T2]
[    1.777513][    T2] other info that might help us debug this:
[    1.777513][    T2]
[    1.778513][    T2]
[    1.778513][    T2] rcu_scheduler_active = 1, debug_locks = 1
[    1.779493][    T2] no locks held by kthreadd/2.
[    1.779514][    T2]
[    1.779514][    T2] stack backtrace:
[    1.780272][    T2] CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.18.0-01158-g6adb0a02c27c #10
[    1.780509][    T2] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[    1.780509][    T2] Call Trace:
[    1.780509][    T2]  <TASK>
[ 1.780509][ T2] dump_stack_lvl (kbuild/src/x86_64-2/lib/dump_stack.c:107 (discriminator 4)) 
[ 1.780509][ T2] mem_cgroup_from_task (kbuild/src/x86_64-2/include/linux/cgroup.h:495 kbuild/src/x86_64-2/mm/memcontrol.c:909) 
[ 1.780509][ T2] get_cgrp_or_task_policy (kbuild/src/x86_64-2/mm/mempolicy.c:184) 
[ 1.780509][ T2] alloc_pages (kbuild/src/x86_64-2/mm/mempolicy.c:2280) 
[ 1.780509][ T2] allocate_slab (kbuild/src/x86_64-2/mm/slub.c:1799 kbuild/src/x86_64-2/mm/slub.c:1944) 
[ 1.780509][ T2] ___slab_alloc (kbuild/src/x86_64-2/mm/slub.c:3005) 
[ 1.780509][ T2] ? dup_task_struct (kbuild/src/x86_64-2/kernel/fork.c:172 kbuild/src/x86_64-2/kernel/fork.c:971) 
[ 1.780509][ T2] kmem_cache_alloc_node (kbuild/src/x86_64-2/mm/slub.c:3092 kbuild/src/x86_64-2/mm/slub.c:3183 kbuild/src/x86_64-2/mm/slub.c:3267) 
[ 1.780509][ T2] dup_task_struct (kbuild/src/x86_64-2/kernel/fork.c:172 kbuild/src/x86_64-2/kernel/fork.c:971) 
[ 1.780509][ T2] ? trace_hardirqs_on (kbuild/src/x86_64-2/kernel/trace/trace_preemptirq.c:50 (discriminator 22)) 
[ 1.780509][ T2] copy_process (kbuild/src/x86_64-2/kernel/fork.c:2073) 
[ 1.780509][ T2] ? alloc_chain_hlocks (kbuild/src/x86_64-2/kernel/locking/lockdep.c:3455) 
[ 1.780509][ T2] ? add_chain_cache (kbuild/src/x86_64-2/kernel/locking/lockdep.c:3664) 
[ 1.780509][ T2] ? __lock_acquire (kbuild/src/x86_64-2/kernel/locking/lockdep.c:5029) 
[ 1.780509][ T2] ? __cleanup_sighand (kbuild/src/x86_64-2/kernel/fork.c:1982) 
[ 1.780509][ T2] ? finish_task_switch+0x20f/0x900 
[ 1.780509][ T2] ? check_prev_add (kbuild/src/x86_64-2/kernel/locking/lockdep.c:3759) 
[ 1.780509][ T2] ? __lock_release (kbuild/src/x86_64-2/kernel/locking/lockdep.c:5317) 
[ 1.780509][ T2] kernel_clone (kbuild/src/x86_64-2/kernel/fork.c:2644) 
[ 1.780509][ T2] ? create_io_thread (kbuild/src/x86_64-2/kernel/fork.c:2604) 
[ 1.780509][ T2] ? __lock_acquire (kbuild/src/x86_64-2/kernel/locking/lockdep.c:5029) 
[ 1.780509][ T2] ? finish_task_switch+0x214/0x900 
[ 1.780509][ T2] ? find_held_lock (kbuild/src/x86_64-2/kernel/locking/lockdep.c:5132) 
[ 1.780509][ T2] kernel_thread (kbuild/src/x86_64-2/kernel/fork.c:2687) 
[ 1.780509][ T2] ? __ia32_sys_clone3 (kbuild/src/x86_64-2/kernel/fork.c:2687) 
[ 1.780509][ T2] ? lock_downgrade (kbuild/src/x86_64-2/kernel/locking/lockdep.c:5293) 
[ 1.780509][ T2] ? kthread_complete_and_exit (kbuild/src/x86_64-2/kernel/kthread.c:331) 
[ 1.780509][ T2] ? kthreadd (kbuild/src/x86_64-2/kernel/kthread.c:396 kbuild/src/x86_64-2/kernel/kthread.c:745) 
[ 1.780509][ T2] ? do_raw_spin_unlock (kbuild/src/x86_64-2/arch/x86/include/asm/atomic.h:29 kbuild/src/x86_64-2/include/linux/atomic/atomic-instrumented.h:28 kbuild/src/x86_64-2/include/asm-generic/qspinlock.h:28 kbuild/src/x86_64-2/kernel/locking/spinlock_debug.c:100 kbuild/src/x86_64-2/kernel/locking/spinlock_debug.c:140) 
[ 1.780509][ T2] kthreadd (kbuild/src/x86_64-2/kernel/kthread.c:400 kbuild/src/x86_64-2/kernel/kthread.c:745) 
[ 1.780509][ T2] ? kthread_is_per_cpu (kbuild/src/x86_64-2/kernel/kthread.c:718) 
[ 1.780509][ T2] ret_from_fork (kbuild/src/x86_64-2/arch/x86/entry/entry_64.S:308) 
[    1.780509][    T2]  </TASK>
[    1.781590][    T1] cblist_init_generic: Setting adjustable number of callback queues.
[    1.782518][    T1] cblist_init_generic: Setting shift to 1 and lim to 1.
[    1.783730][    T1] cblist_init_generic: Setting shift to 1 and lim to 1.
[    1.784646][    T1] Running RCU-tasks wait API self tests
[    1.785657][    T1] Performance Events: unsupported p6 CPU model 42 no PMU driver, software events only.
[    1.787556][    T1] rcu: Hierarchical SRCU implementation.
[    1.791308][    T1] NMI watchdog: Perf NMI watchdog permanently disabled
[    1.792109][    T1] smp: Bringing up secondary CPUs ...
[    1.793634][    T1] x86: Booting SMP configuration:
[    1.794300][    T1] .... node  #0, CPUs:      #1
[    0.090644][    T0] masked ExtINT on CPU#1
[    1.797615][    T1] smp: Brought up 1 node, 2 CPUs
[    1.798527][    T1] smpboot: Max logical packages: 1
[    1.799519][    T1] smpboot: Total of 2 processors activated (8380.31 BogoMIPS)
[    1.802552][   T11] Callback from call_rcu_tasks_trace() invoked.
[    1.898728][   T10] Callback from call_rcu_tasks_rude() invoked.
[    1.998585][   T22] node 0 deferred pages initialised in 196ms
[    2.099652][    T1] allocated 268435456 bytes of page_ext
[    2.100769][    T1] Node 0, zone      DMA: page owner found early allocated 0 pages
[    2.106388][    T1] Node 0, zone    DMA32: page owner found early allocated 0 pages
[    2.143231][    T1] Node 0, zone   Normal: page owner found early allocated 66872 pages
[    2.145828][    T1] devtmpfs: initialized
[    2.147626][    T1] x86/mm: Memory block size: 128MB
[    2.195988][    T1] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[    2.197567][    T1] futex hash table entries: 512 (order: 4, 65536 bytes, linear)
[    2.199426][    T1] pinctrl core: initialized pinctrl subsystem
[    2.212984][    T1] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    2.217521][    T1] audit: initializing netlink subsys (disabled)
[    2.219652][   T27] audit: type=2000 audit(1653397015.364:1): state=initialized audit_enabled=0 res=1
[    2.222174][    T1] thermal_sys: Registered thermal governor 'fair_share'
[    2.222184][    T1] thermal_sys: Registered thermal governor 'bang_bang'
[    2.222529][    T1] thermal_sys: Registered thermal governor 'step_wise'
[    2.223522][    T1] thermal_sys: Registered thermal governor 'user_space'
[    2.224756][    T1] cpuidle: using governor menu
[    2.227738][    T1] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    2.229834][    T1] PCI: Using configuration type 1 for base access
[    2.279233][    T1] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
[    2.281673][    T1] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    2.285568][    T1] cryptd: max_cpu_qlen set to 1000
[    2.291676][    T1] ACPI: Added _OSI(Module Device)
[    2.292523][    T1] ACPI: Added _OSI(Processor Device)
[    2.293523][    T1] ACPI: Added _OSI(3.0 _SCP Extensions)
[    2.294523][    T1] ACPI: Added _OSI(Processor Aggregator Device)
[    2.295547][    T1] ACPI: Added _OSI(Linux-Dell-Video)
[    2.296535][    T1] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[    2.297539][    T1] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[    2.347738][    T1] ACPI: 1 ACPI AML tables successfully acquired and loaded
[    2.363724][    T1] ACPI: Interpreter enabled
[    2.364811][    T1] ACPI: PM: (supports S0 S3 S4 S5)
[    2.365567][    T1] ACPI: Using IOAPIC for interrupt routing
[    2.366799][    T1] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    2.370916][    T1] ACPI: Enabled 2 GPEs in block 00 to 0F


To reproduce:

        # build kernel
	cd linux
	cp config-5.18.0-01158-g6adb0a02c27c .config
	make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
	make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
	cd <mod-install-dir>
	find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

        # if come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.

Patch

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 89b14729d59f..2261eeb6100c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -343,6 +343,7 @@  struct mem_cgroup {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	struct deferred_split deferred_split_queue;
 #endif
+	struct mempolicy *mempolicy;
 
 	struct mem_cgroup_per_node *nodeinfo[];
 };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 598fece89e2b..38108fd4df64 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6332,6 +6332,42 @@  static int memory_numa_stat_show(struct seq_file *m, void *v)
 
 	return 0;
 }
+
+static int memory_policy_show(struct seq_file *m, void *v)
+{
+	char buffer[64];
+	struct mempolicy *mpol = mem_cgroup_from_seq(m)->mempolicy;
+
+	memset(buffer, 0, sizeof(buffer));
+
+	if (!mpol || mpol->mode == MPOL_DEFAULT)
+		return 0;
+
+	mpol_to_str(buffer, sizeof(buffer), mpol);
+	seq_puts(m, buffer);
+	seq_putc(m, '\n');
+	return 0;
+}
+
+static ssize_t memory_policy_write(struct kernfs_open_file *of,
+				char *buf, size_t nbytes, loff_t off)
+{
+	int err = 1;
+	struct mempolicy *mpol, *old;
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+
+	old = memcg->mempolicy;
+	buf = strstrip(buf);
+	err = mpol_parse_str(buf, &mpol);
+
+	if (err)
+		goto out;
+	mpol_put(old);
+	memcg->mempolicy = mpol;
+out:
+	return err ? -EINVAL : nbytes;
+}
+
 #endif
 
 static int memory_oom_group_show(struct seq_file *m, void *v)
@@ -6416,6 +6452,12 @@  static struct cftype memory_files[] = {
 		.name = "numa_stat",
 		.seq_show = memory_numa_stat_show,
 	},
+	{
+		.name = "policy",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = memory_policy_show,
+		.write = memory_policy_write,
+	},
 #endif
 	{
 		.name = "oom.group",
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 8c74107a2b15..5153b046f8c3 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -176,6 +176,16 @@  struct mempolicy *get_task_policy(struct task_struct *p)
 	return &default_policy;
 }
 
+struct mempolicy *get_cgrp_or_task_policy(struct task_struct *p)
+{
+	struct mempolicy *pol;
+	struct mem_cgroup *memcg = mem_cgroup_from_task(p);
+
+	pol = (memcg && memcg->mempolicy) ? memcg->mempolicy : get_task_policy(p);
+	return pol;
+}
+
+
 static const struct mempolicy_operations {
 	int (*create)(struct mempolicy *pol, const nodemask_t *nodes);
 	void (*rebind)(struct mempolicy *pol, const nodemask_t *nodes);
@@ -1782,6 +1792,16 @@  static struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
 	return pol;
 }
 
+static struct mempolicy *get_cgrp_or_vma_policy(struct vm_area_struct *vma,
+						unsigned long addr)
+{
+	struct mempolicy *pol;
+	struct mem_cgroup *memcg = mem_cgroup_from_task(current);
+
+	pol = (memcg && memcg->mempolicy) ? memcg->mempolicy : get_vma_policy(vma, addr);
+	return pol;
+}
+
 bool vma_policy_mof(struct vm_area_struct *vma)
 {
 	struct mempolicy *pol;
@@ -1896,7 +1916,7 @@  unsigned int mempolicy_slab_node(void)
 	if (!in_task())
 		return node;
 
-	policy = current->mempolicy;
+	policy = get_cgrp_or_task_policy(current);
 	if (!policy)
 		return node;
 
@@ -2005,7 +2025,7 @@  int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags,
 	int nid;
 	int mode;
 
-	*mpol = get_vma_policy(vma, addr);
+	*mpol = get_cgrp_or_vma_policy(vma, addr);
 	*nodemask = NULL;
 	mode = (*mpol)->mode;
 
@@ -2158,7 +2178,7 @@  struct page *alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 	int preferred_nid;
 	nodemask_t *nmask;
 
-	pol = get_vma_policy(vma, addr);
+	pol = get_cgrp_or_vma_policy(vma, addr);
 
 	if (pol->mode == MPOL_INTERLEAVE) {
 		unsigned nid;
@@ -2257,7 +2277,7 @@  struct page *alloc_pages(gfp_t gfp, unsigned order)
 	struct page *page;
 
 	if (!in_interrupt() && !(gfp & __GFP_THISNODE))
-		pol = get_task_policy(current);
+		pol = get_cgrp_or_task_policy(current);
 
 	/*
 	 * No reference counting needed for current->mempolicy
@@ -2562,7 +2582,7 @@  int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 	int polnid = NUMA_NO_NODE;
 	int ret = NUMA_NO_NODE;
 
-	pol = get_vma_policy(vma, addr);
+	pol = get_cgrp_or_vma_policy(vma, addr);
 	if (!(pol->flags & MPOL_F_MOF))
 		goto out;