diff mbox series

mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task

Message ID 20210813164053.51481-1-yanghui.def@bytedance.com (mailing list archive)
State New
Headers show
Series mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task | expand

Commit Message

yanghui Aug. 13, 2021, 4:40 p.m. UTC
Servers happened below panic:
Kernel version:5.4.56
BUG: unable to handle page fault for address: 0000000000002c48
RIP: 0010:__next_zones_zonelist+0x1d/0x40
[264003.977696] RAX: 0000000000002c40 RBX: 0000000000100dca RCX: 0000000000000014
[264003.977872] Call Trace:
[264003.977888]  __alloc_pages_nodemask+0x277/0x310
[264003.977908]  alloc_page_interleave+0x13/0x70
[264003.977926]  handle_mm_fault+0xf99/0x1390
[264003.977951]  __do_page_fault+0x288/0x500
[264003.977979]  ? schedule+0x39/0xa0
[264003.977994]  do_page_fault+0x30/0x110
[264003.978010]  page_fault+0x3e/0x50

The reason of panic is that MAX_NUMNODES is passd in the third parameter
in function __alloc_pages_nodemask(preferred_nid). So if to access
zonelist->zoneref->zone_idx in __next_zones_zonelist the panic will happen.

In offset_il_node(), first_node() return nid from pol->v.nodes, after
this other threads may changed pol->v.nodes before next_node().
This race condition will let next_node return MAX_NUMNODES.So put
pol->nodes in a local variable.

The race condition is between offset_il_node and cpuset_change_task_nodemask:
CPU0:                                     CPU1:
alloc_pages_vma()
  interleave_nid(pol,)
    offset_il_node(pol,)
      first_node(pol->v.nodes)            cpuset_change_task_nodemask
                      //nodes==0xc          mpol_rebind_task
                                              mpol_rebind_policy
                                                mpol_rebind_nodemask(pol,nodes)
                      //nodes==0x3
      next_node(nid, pol->v.nodes)//return MAX_NUMNODES

Signed-off-by: yanghui <yanghui.def@bytedance.com>
---
 mm/mempolicy.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

Comments

Matthew Wilcox Aug. 13, 2021, 4:57 p.m. UTC | #1
On Sat, Aug 14, 2021 at 12:40:53AM +0800, yanghui wrote:
> +++ b/mm/mempolicy.c
> @@ -193,7 +193,7 @@ static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes)
>  {
>  	if (nodes_empty(*nodes))
>  		return -EINVAL;
> -	pol->nodes = *nodes;
> +	WRITE_ONCE(pol->nodes, *nodes);

typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } nodemask_t;

If MAX_NUMNODES is large enough, is WRITE_ONCE going to work?  It could
be 128 bits, and few architectures have an atomic 128-bit store
instruction.
kernel test robot Aug. 13, 2021, 11:16 p.m. UTC | #2
Hi yanghui,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v5.14-rc5]
[cannot apply to hnaz-linux-mm/master next-20210813]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/yanghui/mm-mempolicy-fix-a-race-between-offset_il_node-and-mpol_rebind_task/20210814-004451
base:    36a21d51725af2ce0700c6ebcb6b9594aac658a6
config: x86_64-randconfig-a003-20210813 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 62df4df41c939205b2dc0a2a3bfb75b8c1ed74fa)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/99d8d888eb92ea46a5f4883773f3edaee5ccd28e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review yanghui/mm-mempolicy-fix-a-race-between-offset_il_node-and-mpol_rebind_task/20210814-004451
        git checkout 99d8d888eb92ea46a5f4883773f3edaee5ccd28e
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> ld.lld: error: undefined symbol: __compiletime_assert_467
   >>> referenced by mempolicy.c:1968 (mm/mempolicy.c:1968)
   >>> mempolicy.o:(offset_il_node) in archive mm/built-in.a

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Nathan Chancellor Aug. 14, 2021, 1:35 a.m. UTC | #3
On Sat, Aug 14, 2021 at 07:16:24AM +0800, kernel test robot wrote:
> Hi yanghui,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on v5.14-rc5]
> [cannot apply to hnaz-linux-mm/master next-20210813]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/0day-ci/linux/commits/yanghui/mm-mempolicy-fix-a-race-between-offset_il_node-and-mpol_rebind_task/20210814-004451
> base:    36a21d51725af2ce0700c6ebcb6b9594aac658a6
> config: x86_64-randconfig-a003-20210813 (attached as .config)
> compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 62df4df41c939205b2dc0a2a3bfb75b8c1ed74fa)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # https://github.com/0day-ci/linux/commit/99d8d888eb92ea46a5f4883773f3edaee5ccd28e
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review yanghui/mm-mempolicy-fix-a-race-between-offset_il_node-and-mpol_rebind_task/20210814-004451
>         git checkout 99d8d888eb92ea46a5f4883773f3edaee5ccd28e
>         # save the attached .config to linux build tree
>         mkdir build_dir
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir ARCH=x86_64 SHELL=/bin/bash
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):
> 
> >> ld.lld: error: undefined symbol: __compiletime_assert_467
>    >>> referenced by mempolicy.c:1968 (mm/mempolicy.c:1968)
>    >>> mempolicy.o:(offset_il_node) in archive mm/built-in.a

Grabbing Nick's patch [1] to support the error function attribute in
clang and the kernel side patch [2] turns this into:

mm/mempolicy.c:1968:24: error: call to '__compiletime_assert_467' declared with 'error' attribute: Unsupported access size for {READ,WRITE}_ONCE().
        nodemask_t nodemask = READ_ONCE(pol->nodes);
                              ^
./include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
        compiletime_assert_rwonce_type(x);                              \
        ^
./include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
        compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
        ^
./include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
        ^
./include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
        __compiletime_assert(condition, msg, prefix, suffix)
        ^
./include/linux/compiler_types.h:303:4: note: expanded from macro '__compiletime_assert'
                        prefix ## suffix();                             \
                        ^
<scratch space>:246:1: note: expanded from here
__compiletime_assert_467
^
mm/mempolicy.c:214:2: error: call to '__compiletime_assert_448' declared with 'error' attribute: Unsupported access size for {READ,WRITE}_ONCE().
        WRITE_ONCE(pol->nodes, *nodes);
        ^
./include/asm-generic/rwonce.h:60:2: note: expanded from macro 'WRITE_ONCE'
        compiletime_assert_rwonce_type(x);                              \
        ^
./include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
        compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
        ^
./include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
        ^
./include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
        __compiletime_assert(condition, msg, prefix, suffix)
        ^
./include/linux/compiler_types.h:303:4: note: expanded from macro '__compiletime_assert'
                        prefix ## suffix();                             \
                        ^
<scratch space>:16:1: note: expanded from here
__compiletime_assert_448
^
mm/mempolicy.c:337:2: error: call to '__compiletime_assert_452' declared with 'error' attribute: Unsupported access size for {READ,WRITE}_ONCE().
        WRITE_ONCE(pol->nodes, tmp);
        ^
./include/asm-generic/rwonce.h:60:2: note: expanded from macro 'WRITE_ONCE'
        compiletime_assert_rwonce_type(x);                              \
        ^
./include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
        compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
        ^
./include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
        ^
./include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
        __compiletime_assert(condition, msg, prefix, suffix)
        ^
./include/linux/compiler_types.h:303:4: note: expanded from macro '__compiletime_assert'
                        prefix ## suffix();                             \
                        ^
<scratch space>:38:1: note: expanded from here
__compiletime_assert_452
^
mm/mempolicy.c:196:2: error: call to '__compiletime_assert_447' declared with 'error' attribute: Unsupported access size for {READ,WRITE}_ONCE().
        WRITE_ONCE(pol->nodes, *nodes);
        ^
./include/asm-generic/rwonce.h:60:2: note: expanded from macro 'WRITE_ONCE'
        compiletime_assert_rwonce_type(x);                              \
        ^
./include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
        compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
        ^
./include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
        ^
./include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
        __compiletime_assert(condition, msg, prefix, suffix)
        ^
./include/linux/compiler_types.h:303:4: note: expanded from macro '__compiletime_assert'
                        prefix ## suffix();                             \
                        ^
<scratch space>:13:1: note: expanded from here
__compiletime_assert_447
^
4 errors generated.

Which appears to be what Matthew mentioned, as this config has
CONFIG_NODES_SHIFT=10 so nodemask_t is 128 bytes large.

[1]: https://reviews.llvm.org/D106030
[2]: https://lore.kernel.org/r/20210802202326.1817503-1-ndesaulniers@google.com/

Cheers,
Nathan
kernel test robot Aug. 14, 2021, 1:40 a.m. UTC | #4
Hi yanghui,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v5.14-rc5]
[cannot apply to hnaz-linux-mm/master next-20210813]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/yanghui/mm-mempolicy-fix-a-race-between-offset_il_node-and-mpol_rebind_task/20210814-004451
base:    36a21d51725af2ce0700c6ebcb6b9594aac658a6
config: ia64-randconfig-c024-20210813 (attached as .config)
compiler: ia64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/99d8d888eb92ea46a5f4883773f3edaee5ccd28e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review yanghui/mm-mempolicy-fix-a-race-between-offset_il_node-and-mpol_rebind_task/20210814-004451
        git checkout 99d8d888eb92ea46a5f4883773f3edaee5ccd28e
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=ia64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from arch/ia64/include/asm/pgtable.h:153,
                    from include/linux/pgtable.h:6,
                    from arch/ia64/include/asm/uaccess.h:40,
                    from include/linux/uaccess.h:11,
                    from include/linux/sched/task.h:11,
                    from include/linux/sched/signal.h:9,
                    from include/linux/rcuwait.h:6,
                    from include/linux/percpu-rwsem.h:7,
                    from include/linux/fs.h:33,
                    from include/linux/dax.h:5,
                    from include/linux/mempolicy.h:11,
                    from mm/mempolicy.c:70:
   arch/ia64/include/asm/mmu_context.h: In function 'reload_context':
   arch/ia64/include/asm/mmu_context.h:127:48: warning: variable 'old_rr4' set but not used [-Wunused-but-set-variable]
     127 |         unsigned long rr0, rr1, rr2, rr3, rr4, old_rr4;
         |                                                ^~~~~~~
   In file included from <command-line>:
   mm/mempolicy.c: In function 'mpol_new_interleave':
>> include/linux/compiler_types.h:328:45: error: call to '__compiletime_assert_319' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
     328 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |                                             ^
   include/linux/compiler_types.h:309:25: note: in definition of macro '__compiletime_assert'
     309 |                         prefix ## suffix();                             \
         |                         ^~~~~~
   include/linux/compiler_types.h:328:9: note: in expansion of macro '_compiletime_assert'
     328 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^~~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:36:9: note: in expansion of macro 'compiletime_assert'
      36 |         compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
         |         ^~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:60:9: note: in expansion of macro 'compiletime_assert_rwonce_type'
      60 |         compiletime_assert_rwonce_type(x);                              \
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   mm/mempolicy.c:196:9: note: in expansion of macro 'WRITE_ONCE'
     196 |         WRITE_ONCE(pol->nodes, *nodes);
         |         ^~~~~~~~~~
   mm/mempolicy.c: In function 'mpol_new_bind':
   include/linux/compiler_types.h:328:45: error: call to '__compiletime_assert_320' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
     328 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |                                             ^
   include/linux/compiler_types.h:309:25: note: in definition of macro '__compiletime_assert'
     309 |                         prefix ## suffix();                             \
         |                         ^~~~~~
   include/linux/compiler_types.h:328:9: note: in expansion of macro '_compiletime_assert'
     328 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^~~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:36:9: note: in expansion of macro 'compiletime_assert'
      36 |         compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
         |         ^~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:60:9: note: in expansion of macro 'compiletime_assert_rwonce_type'
      60 |         compiletime_assert_rwonce_type(x);                              \
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   mm/mempolicy.c:214:9: note: in expansion of macro 'WRITE_ONCE'
     214 |         WRITE_ONCE(pol->nodes, *nodes);
         |         ^~~~~~~~~~
   mm/mempolicy.c: In function 'offset_il_node':
   include/linux/compiler_types.h:328:45: error: call to '__compiletime_assert_326' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
     328 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |                                             ^
   include/linux/compiler_types.h:309:25: note: in definition of macro '__compiletime_assert'
     309 |                         prefix ## suffix();                             \
         |                         ^~~~~~
   include/linux/compiler_types.h:328:9: note: in expansion of macro '_compiletime_assert'
     328 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^~~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:36:9: note: in expansion of macro 'compiletime_assert'
      36 |         compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
         |         ^~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:49:9: note: in expansion of macro 'compiletime_assert_rwonce_type'
      49 |         compiletime_assert_rwonce_type(x);                              \
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   mm/mempolicy.c:1968:31: note: in expansion of macro 'READ_ONCE'
    1968 |         nodemask_t nodemask = READ_ONCE(pol->nodes);
         |                               ^~~~~~~~~
   mm/mempolicy.c: In function 'mpol_rebind_nodemask':
   include/linux/compiler_types.h:328:45: error: call to '__compiletime_assert_321' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
     328 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |                                             ^
   include/linux/compiler_types.h:309:25: note: in definition of macro '__compiletime_assert'
     309 |                         prefix ## suffix();                             \
         |                         ^~~~~~
   include/linux/compiler_types.h:328:9: note: in expansion of macro '_compiletime_assert'
     328 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^~~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:36:9: note: in expansion of macro 'compiletime_assert'
      36 |         compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
         |         ^~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:60:9: note: in expansion of macro 'compiletime_assert_rwonce_type'
      60 |         compiletime_assert_rwonce_type(x);                              \
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   mm/mempolicy.c:337:9: note: in expansion of macro 'WRITE_ONCE'
     337 |         WRITE_ONCE(pol->nodes, tmp);
         |         ^~~~~~~~~~


vim +/__compiletime_assert_319 +328 include/linux/compiler_types.h

eb5c2d4b45e3d2d Will Deacon 2020-07-21  314  
eb5c2d4b45e3d2d Will Deacon 2020-07-21  315  #define _compiletime_assert(condition, msg, prefix, suffix) \
eb5c2d4b45e3d2d Will Deacon 2020-07-21  316  	__compiletime_assert(condition, msg, prefix, suffix)
eb5c2d4b45e3d2d Will Deacon 2020-07-21  317  
eb5c2d4b45e3d2d Will Deacon 2020-07-21  318  /**
eb5c2d4b45e3d2d Will Deacon 2020-07-21  319   * compiletime_assert - break build and emit msg if condition is false
eb5c2d4b45e3d2d Will Deacon 2020-07-21  320   * @condition: a compile-time constant condition to check
eb5c2d4b45e3d2d Will Deacon 2020-07-21  321   * @msg:       a message to emit if condition is false
eb5c2d4b45e3d2d Will Deacon 2020-07-21  322   *
eb5c2d4b45e3d2d Will Deacon 2020-07-21  323   * In tradition of POSIX assert, this macro will break the build if the
eb5c2d4b45e3d2d Will Deacon 2020-07-21  324   * supplied condition is *false*, emitting the supplied error message if the
eb5c2d4b45e3d2d Will Deacon 2020-07-21  325   * compiler has support to do so.
eb5c2d4b45e3d2d Will Deacon 2020-07-21  326   */
eb5c2d4b45e3d2d Will Deacon 2020-07-21  327  #define compiletime_assert(condition, msg) \
eb5c2d4b45e3d2d Will Deacon 2020-07-21 @328  	_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
eb5c2d4b45e3d2d Will Deacon 2020-07-21  329  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Muchun Song Aug. 14, 2021, 2:37 a.m. UTC | #5
On Sat, Aug 14, 2021 at 12:58 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sat, Aug 14, 2021 at 12:40:53AM +0800, yanghui wrote:
> > +++ b/mm/mempolicy.c
> > @@ -193,7 +193,7 @@ static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes)
> >  {
> >       if (nodes_empty(*nodes))
> >               return -EINVAL;
> > -     pol->nodes = *nodes;
> > +     WRITE_ONCE(pol->nodes, *nodes);
>
> typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } nodemask_t;
>
> If MAX_NUMNODES is large enough, is WRITE_ONCE going to work?  It could
> be 128 bits, and few architectures have an atomic 128-bit store
> instruction.
>

Hi Matthew,

In my memory, the WRITE_ONCE will become memcpy if
the size is greater than sizeof(long). But I found that it didn't
support the size over sizeof(long) nowadays. So you are
right. It doesn't work when MAX_NUMNODES is large enough.

If we don't use WRITE_ONCE/READ_ONCE here, it seems
like no problem. At least we can sure offset_il_node returns
a valid node id. What is your opinion about this fix?

Thanks,
Muchun
Matthew Wilcox Aug. 14, 2021, 3:11 a.m. UTC | #6
On Sat, Aug 14, 2021 at 10:37:23AM +0800, Muchun Song wrote:
> On Sat, Aug 14, 2021 at 12:58 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Sat, Aug 14, 2021 at 12:40:53AM +0800, yanghui wrote:
> > > +++ b/mm/mempolicy.c
> > > @@ -193,7 +193,7 @@ static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes)
> > >  {
> > >       if (nodes_empty(*nodes))
> > >               return -EINVAL;
> > > -     pol->nodes = *nodes;
> > > +     WRITE_ONCE(pol->nodes, *nodes);
> >
> > typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } nodemask_t;
> >
> > If MAX_NUMNODES is large enough, is WRITE_ONCE going to work?  It could
> > be 128 bits, and few architectures have an atomic 128-bit store
> > instruction.
> >
> 
> Hi Matthew,
> 
> In my memory, the WRITE_ONCE will become memcpy if
> the size is greater than sizeof(long). But I found that it didn't
> support the size over sizeof(long) nowadays. So you are
> right. It doesn't work when MAX_NUMNODES is large enough.
> 
> If we don't use WRITE_ONCE/READ_ONCE here, it seems
> like no problem. At least we can sure offset_il_node returns
> a valid node id. What is your opinion about this fix?

Yes, I think that's a reasonable fix.  Include a comment about how
copying pol->nodes to the stack might race with a change to pol->nodes,
but that doesn't matter as long as we don't look at pol->nodes later?
diff mbox series

Patch

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index e32360e90274..9c3c168af3e2 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -193,7 +193,7 @@  static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes)
 {
 	if (nodes_empty(*nodes))
 		return -EINVAL;
-	pol->nodes = *nodes;
+	WRITE_ONCE(pol->nodes, *nodes);
 	return 0;
 }
 
@@ -211,7 +211,7 @@  static int mpol_new_bind(struct mempolicy *pol, const nodemask_t *nodes)
 {
 	if (nodes_empty(*nodes))
 		return -EINVAL;
-	pol->nodes = *nodes;
+	WRITE_ONCE(pol->nodes, *nodes);
 	return 0;
 }
 
@@ -334,7 +334,7 @@  static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes)
 	if (nodes_empty(tmp))
 		tmp = *nodes;
 
-	pol->nodes = tmp;
+	WRITE_ONCE(pol->nodes, tmp);
 }
 
 static void mpol_rebind_preferred(struct mempolicy *pol,
@@ -1965,7 +1965,8 @@  unsigned int mempolicy_slab_node(void)
  */
 static unsigned offset_il_node(struct mempolicy *pol, unsigned long n)
 {
-	unsigned nnodes = nodes_weight(pol->nodes);
+	nodemask_t nodemask = READ_ONCE(pol->nodes);
+	unsigned nnodes = nodes_weight(nodemask);
 	unsigned target;
 	int i;
 	int nid;
@@ -1973,9 +1974,9 @@  static unsigned offset_il_node(struct mempolicy *pol, unsigned long n)
 	if (!nnodes)
 		return numa_node_id();
 	target = (unsigned int)n % nnodes;
-	nid = first_node(pol->nodes);
+	nid = first_node(nodemask);
 	for (i = 0; i < target; i++)
-		nid = next_node(nid, pol->nodes);
+		nid = next_node(nid, nodemask);
 	return nid;
 }