Message ID | 20250227-pageblock-lockdep-v1-1-3701efb331bb@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm/page_alloc: Add lockdep assertion for pageblock type change | expand |
On Thu, 27 Feb 2025 16:15:47 +0000 Brendan Jackman <jackmanb@google.com> wrote: > Since the migratetype hygiene patches [0], the locking here is > a bit more formalised, so write it down with an assert. > > ... > > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -417,6 +417,10 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags, > > void set_pageblock_migratetype(struct page *page, int migratetype) > { > + lockdep_assert_once(system_state == SYSTEM_BOOTING || > + in_mem_hotplug() || > + lockdep_is_held(&page_zone(page)->lock)); > + > if (unlikely(page_group_by_mobility_disabled && > migratetype < MIGRATE_PCPTYPES)) > migratetype = MIGRATE_UNMOVABLE; > We could add such assertions all over the place. Why this place in particular?
On Thu, Feb 27, 2025 at 02:33:02PM -0800, Andrew Morton wrote: > On Thu, 27 Feb 2025 16:15:47 +0000 Brendan Jackman <jackmanb@google.com> wrote: > > > Since the migratetype hygiene patches [0], the locking here is > > a bit more formalised, so write it down with an assert. > > > > ... > > > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -417,6 +417,10 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags, > > > > void set_pageblock_migratetype(struct page *page, int migratetype) > > { > > + lockdep_assert_once(system_state == SYSTEM_BOOTING || > > + in_mem_hotplug() || > > + lockdep_is_held(&page_zone(page)->lock)); > > + > > if (unlikely(page_group_by_mobility_disabled && > > migratetype < MIGRATE_PCPTYPES)) > > migratetype = MIGRATE_UNMOVABLE; > > > > We could add such assertions all over the place. Why this place in > particular? For the other stuff, it's pretty obvious that it would be protected by the zone lock (or, I don't know about it!). But it didn't seem totally self-evident to me that it should protect the pageblock type. So it seems particularly helpful to have it written in the code. I may be heavily biased about this though, because of the code I'm working on for [0]. I use the pageblock type to remember whether it's mapped in the ASI restricted address space. [0] https://lore.kernel.org/linux-mm/CA+i-1C1gOBLxRxE5YFGzeayYWBYyE_X6oH4D=9eVePt4=ehTig@mail.gmail.com/T/
On 2/28/25 10:20, Brendan Jackman wrote: > On Thu, Feb 27, 2025 at 02:33:02PM -0800, Andrew Morton wrote: >> On Thu, 27 Feb 2025 16:15:47 +0000 Brendan Jackman <jackmanb@google.com> wrote: >> >> > Since the migratetype hygiene patches [0], the locking here is >> > a bit more formalised, so write it down with an assert. >> > >> > ... >> > >> > --- a/mm/page_alloc.c >> > +++ b/mm/page_alloc.c >> > @@ -417,6 +417,10 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags, >> > >> > void set_pageblock_migratetype(struct page *page, int migratetype) >> > { >> > + lockdep_assert_once(system_state == SYSTEM_BOOTING || >> > + in_mem_hotplug() || >> > + lockdep_is_held(&page_zone(page)->lock)); >> > + >> > if (unlikely(page_group_by_mobility_disabled && >> > migratetype < MIGRATE_PCPTYPES)) >> > migratetype = MIGRATE_UNMOVABLE; >> > >> >> We could add such assertions all over the place. Why this place in >> particular? > > For the other stuff, it's pretty obvious that it would be protected by > the zone lock (or, I don't know about it!). But it didn't seem totally > self-evident to me that it should protect the pageblock type. So it > seems particularly helpful to have it written in the code. I guess that's a good addition to the changelog. Acked-by: Vlastimil Babka <vbabka@suse.cz> > > I may be heavily biased about this though, because of the code I'm > working on for [0]. I use the pageblock type to remember whether it's > mapped in the ASI restricted address space. > > [0] https://lore.kernel.org/linux-mm/CA+i-1C1gOBLxRxE5YFGzeayYWBYyE_X6oH4D=9eVePt4=ehTig@mail.gmail.com/T/ >
Hi Brendan, kernel test robot noticed the following build errors: [auto build test ERROR on d58172d128acbafa2295aa17cc96e28260da9a86] url: https://github.com/intel-lab-lkp/linux/commits/Brendan-Jackman/mm-page_alloc-Add-lockdep-assertion-for-pageblock-type-change/20250228-002107 base: d58172d128acbafa2295aa17cc96e28260da9a86 patch link: https://lore.kernel.org/r/20250227-pageblock-lockdep-v1-1-3701efb331bb%40google.com patch subject: [PATCH] mm/page_alloc: Add lockdep assertion for pageblock type change config: x86_64-buildonly-randconfig-002-20250228 (https://download.01.org/0day-ci/archive/20250301/202503010129.rJvGqZN1-lkp@intel.com/config) compiler: clang version 19.1.7 (https://github.com/llvm/llvm-project cd708029e0b2869e80abe31ddb175f7c35361f90) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250301/202503010129.rJvGqZN1-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202503010129.rJvGqZN1-lkp@intel.com/ All errors (new ones prefixed by >>): In file included from mm/page_alloc.c:19: In file included from include/linux/mm.h:2302: include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 518 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_" | ~~~~~~~~~~~ ^ ~~~ In file included from mm/page_alloc.c:44: include/linux/mm_inline.h:47:41: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 47 | __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages); | ~~~~~~~~~~~ ^ ~~~ include/linux/mm_inline.h:49:22: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 49 | NR_ZONE_LRU_BASE + lru, nr_pages); | ~~~~~~~~~~~~~~~~ ^ ~~~ >> mm/page_alloc.c:421:3: error: call to undeclared function 'in_mem_hotplug'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] 421 | in_mem_hotplug() || | ^ mm/page_alloc.c:2857:2: warning: arithmetic between different enumeration types ('enum vm_event_item' and 'enum zone_type') [-Wenum-enum-conversion] 2857 | __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/vmstat.h:139:34: note: expanded from macro '__count_zid_vm_events' 139 | __count_vm_events(item##_NORMAL - ZONE_NORMAL + zid, delta) | ~~~~~~~~~~~~~ ^ ~~~~~~~~~~~ mm/page_alloc.c:2974:3: warning: arithmetic between different enumeration types ('enum vm_event_item' and 'enum zone_type') [-Wenum-enum-conversion] 2974 | __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/vmstat.h:139:34: note: expanded from macro '__count_zid_vm_events' 139 | __count_vm_events(item##_NORMAL - ZONE_NORMAL + zid, delta) | ~~~~~~~~~~~~~ ^ ~~~~~~~~~~~ mm/page_alloc.c:4745:2: warning: arithmetic between different enumeration types ('enum vm_event_item' and 'enum zone_type') [-Wenum-enum-conversion] 4745 | __count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/vmstat.h:139:34: note: expanded from macro '__count_zid_vm_events' 139 | __count_vm_events(item##_NORMAL - ZONE_NORMAL + zid, delta) | ~~~~~~~~~~~~~ ^ ~~~~~~~~~~~ 6 warnings and 1 error generated. vim +/in_mem_hotplug +421 mm/page_alloc.c 417 418 void set_pageblock_migratetype(struct page *page, int migratetype) 419 { 420 lockdep_assert_once(system_state == SYSTEM_BOOTING || > 421 in_mem_hotplug() || 422 lockdep_is_held(&page_zone(page)->lock)); 423 424 if (unlikely(page_group_by_mobility_disabled && 425 migratetype < MIGRATE_PCPTYPES)) 426 migratetype = MIGRATE_UNMOVABLE; 427 428 set_pfnblock_flags_mask(page, (unsigned long)migratetype, 429 page_to_pfn(page), MIGRATETYPE_MASK); 430 } 431
On Sat, Mar 01, 2025 at 01:31:30AM +0800, kernel test robot wrote: > Hi Brendan, > > kernel test robot noticed the following build errors: > > [auto build test ERROR on d58172d128acbafa2295aa17cc96e28260da9a86] > > url: https://github.com/intel-lab-lkp/linux/commits/Brendan-Jackman/mm-page_alloc-Add-lockdep-assertion-for-pageblock-type-change/20250228-002107 > base: d58172d128acbafa2295aa17cc96e28260da9a86 > patch link: https://lore.kernel.org/r/20250227-pageblock-lockdep-v1-1-3701efb331bb%40google.com > patch subject: [PATCH] mm/page_alloc: Add lockdep assertion for pageblock type change > config: x86_64-buildonly-randconfig-002-20250228 (https://download.01.org/0day-ci/archive/20250301/202503010129.rJvGqZN1-lkp@intel.com/config) > compiler: clang version 19.1.7 (https://github.com/llvm/llvm-project cd708029e0b2869e80abe31ddb175f7c35361f90) > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250301/202503010129.rJvGqZN1-lkp@intel.com/reproduce) > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > the same patch/commit), kindly add following tags > | Reported-by: kernel test robot <lkp@intel.com> > | Closes: https://lore.kernel.org/oe-kbuild-all/202503010129.rJvGqZN1-lkp@intel.com/ > > All errors (new ones prefixed by >>): > > In file included from mm/page_alloc.c:19: > In file included from include/linux/mm.h:2302: > include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] > 518 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_" > | ~~~~~~~~~~~ ^ ~~~ > In file included from mm/page_alloc.c:44: > include/linux/mm_inline.h:47:41: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] > 47 | __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages); > | ~~~~~~~~~~~ ^ ~~~ > include/linux/mm_inline.h:49:22: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] > 49 | NR_ZONE_LRU_BASE + lru, nr_pages); > | ~~~~~~~~~~~~~~~~ ^ ~~~ > >> mm/page_alloc.c:421:3: error: call to undeclared function 'in_mem_hotplug'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] > 421 | in_mem_hotplug() || The patch is missing a dummy in_mem_hotplug() in the !CONFIG_MEMORY_HOTPLUG section of <linux/memory_hotplug.h>.
Hi Brendan, kernel test robot noticed the following build errors: [auto build test ERROR on d58172d128acbafa2295aa17cc96e28260da9a86] url: https://github.com/intel-lab-lkp/linux/commits/Brendan-Jackman/mm-page_alloc-Add-lockdep-assertion-for-pageblock-type-change/20250228-002107 base: d58172d128acbafa2295aa17cc96e28260da9a86 patch link: https://lore.kernel.org/r/20250227-pageblock-lockdep-v1-1-3701efb331bb%40google.com patch subject: [PATCH] mm/page_alloc: Add lockdep assertion for pageblock type change config: i386-buildonly-randconfig-005-20250301 (https://download.01.org/0day-ci/archive/20250301/202503010632.CEkonAAQ-lkp@intel.com/config) compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250301/202503010632.CEkonAAQ-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202503010632.CEkonAAQ-lkp@intel.com/ All errors (new ones prefixed by >>): In file included from arch/x86/include/asm/bug.h:99, from include/linux/bug.h:5, from include/linux/mmdebug.h:5, from include/linux/mm.h:6, from mm/page_alloc.c:19: mm/page_alloc.c: In function 'set_pageblock_migratetype': >> mm/page_alloc.c:421:17: error: implicit declaration of function 'in_mem_hotplug' [-Werror=implicit-function-declaration] 421 | in_mem_hotplug() || | ^~~~~~~~~~~~~~ include/asm-generic/bug.h:171:32: note: in definition of macro 'WARN_ON' 171 | int __ret_warn_on = !!(condition); \ | ^~~~~~~~~ include/linux/lockdep.h:282:14: note: in expansion of macro 'WARN_ON_ONCE' 282 | do { WARN_ON_ONCE(debug_locks && !(cond)); } while (0) | ^~~~~~~~~~~~ mm/page_alloc.c:420:9: note: in expansion of macro 'lockdep_assert_once' 420 | lockdep_assert_once(system_state == SYSTEM_BOOTING || | ^~~~~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors vim +/in_mem_hotplug +421 mm/page_alloc.c 417 418 void set_pageblock_migratetype(struct page *page, int migratetype) 419 { 420 lockdep_assert_once(system_state == SYSTEM_BOOTING || > 421 in_mem_hotplug() || 422 lockdep_is_held(&page_zone(page)->lock)); 423 424 if (unlikely(page_group_by_mobility_disabled && 425 migratetype < MIGRATE_PCPTYPES)) 426 migratetype = MIGRATE_UNMOVABLE; 427 428 set_pfnblock_flags_mask(page, (unsigned long)migratetype, 429 page_to_pfn(page), MIGRATETYPE_MASK); 430 } 431
On Fri, Feb 28, 2025 at 01:28:04PM -0500, Johannes Weiner wrote: > On Sat, Mar 01, 2025 at 01:31:30AM +0800, kernel test robot wrote: > The patch is missing a dummy in_mem_hotplug() in the > !CONFIG_MEMORY_HOTPLUG section of <linux/memory_hotplug.h>. +1, I just stumbled over and this is not fixed in today's Linux Next. I'm wondering how this was missed during merge into Linux Next. Stephen?
Hi Andy, On Mon, 3 Mar 2025 13:18:42 +0200 Andy Shevchenko <andriy.shevchenko@intel.com> wrote: > > On Fri, Feb 28, 2025 at 01:28:04PM -0500, Johannes Weiner wrote: > > On Sat, Mar 01, 2025 at 01:31:30AM +0800, kernel test robot wrote: > > > The patch is missing a dummy in_mem_hotplug() in the > > !CONFIG_MEMORY_HOTPLUG section of <linux/memory_hotplug.h>. > > +1, I just stumbled over and this is not fixed in today's Linux Next. I'm > wondering how this was missed during merge into Linux Next. Stephen? I just get what people put in their trees. There are no conflicts around this and none of my builds failed, so I didn't see the problem. Has someone sent a fix patch to Andrew? If so, if you forward it to me, I will add it to linux-next today.
On Tue, Mar 04, 2025 at 10:13:57AM +1100, Stephen Rothwell wrote: > Hi Andy, > > On Mon, 3 Mar 2025 13:18:42 +0200 Andy Shevchenko <andriy.shevchenko@intel.com> wrote: > > > > On Fri, Feb 28, 2025 at 01:28:04PM -0500, Johannes Weiner wrote: > > > On Sat, Mar 01, 2025 at 01:31:30AM +0800, kernel test robot wrote: > > > > > The patch is missing a dummy in_mem_hotplug() in the > > > !CONFIG_MEMORY_HOTPLUG section of <linux/memory_hotplug.h>. > > > > +1, I just stumbled over and this is not fixed in today's Linux Next. I'm > > wondering how this was missed during merge into Linux Next. Stephen? > > I just get what people put in their trees. There are no conflicts > around this and none of my builds failed, so I didn't see the problem. > Has someone sent a fix patch to Andrew? If so, if you forward it to > me, I will add it to linux-next today. Andrew has backed it out of mm-unstable now. There's a v2 [0] which still has runtime issues but AFAIK it's not in any tree yet. [0] https://lore.kernel.org/linux-mm/20250303-pageblock-lockdep-v2-1-3fc0c37e9532@google.com/ In case it helps calibrate expectations: I think this particular patch had been reviewed but in general some patches get into mm-unstable without any review being recorded at all. My understanding is that Andrew squints at it and goes "that looks like it will probably eventually get merged" and puts it in so that people get a view of likely upcoming changes. So if an issue like this reaching linux-next is a big problem then I think the solution is not to merge mm-unstable. I'm not sure how high the bar is supposed to be for feeding into linux-next.
On 04.03.25 11:18, Brendan Jackman wrote: > On Tue, Mar 04, 2025 at 10:13:57AM +1100, Stephen Rothwell wrote: >> Hi Andy, >> >> On Mon, 3 Mar 2025 13:18:42 +0200 Andy Shevchenko <andriy.shevchenko@intel.com> wrote: >>> >>> On Fri, Feb 28, 2025 at 01:28:04PM -0500, Johannes Weiner wrote: >>>> On Sat, Mar 01, 2025 at 01:31:30AM +0800, kernel test robot wrote: >>> >>>> The patch is missing a dummy in_mem_hotplug() in the >>>> !CONFIG_MEMORY_HOTPLUG section of <linux/memory_hotplug.h>. >>> >>> +1, I just stumbled over and this is not fixed in today's Linux Next. I'm >>> wondering how this was missed during merge into Linux Next. Stephen? >> >> I just get what people put in their trees. There are no conflicts >> around this and none of my builds failed, so I didn't see the problem. >> Has someone sent a fix patch to Andrew? If so, if you forward it to >> me, I will add it to linux-next today. > > Andrew has backed it out of mm-unstable now. There's a v2 [0] which > still has runtime issues but AFAIK it's not in any tree yet. > > [0] https://lore.kernel.org/linux-mm/20250303-pageblock-lockdep-v2-1-3fc0c37e9532@google.com/ > > In case it helps calibrate expectations: I think this particular patch > had been reviewed but in general some patches get into mm-unstable > without any review being recorded at all. My understanding is that > Andrew squints at it and goes "that looks like it will probably > eventually get merged" and puts it in so that people get a view of > likely upcoming changes. > > So if an issue like this reaching linux-next is a big problem then I > think the solution is not to merge mm-unstable. I'm not sure how > high the bar is supposed to be for feeding into linux-next. Last year at LSF/MM I raised that maybe we should have something in-between mm-unstable and mm-stable that would get merged into -next. ( "mm-almost-stable" / "mm-for-next" ;) ) Alternatively, we could add something like "mm-staging" before "mm-unstable" and "mm-stable", whereby only "mm-unstable" would get merged into -next. Ideally, we'd have at least build-bots and some basic sanity checks going on on such a staging environment. After a couple of days (not more) of survival in such an environment, it could be moved to the tree that gets exposed to -next/ I guess the downside is that it's "yet another branch" and "yet more build+testing" effort; in particular, if build+testing doesn't happen it will be worthless. And likely, a lot of build+testing is happening on linux-next only. I have pretty good results with build bots going crazy on my branches, so most stuff that I send (after letting is rest for a couple of days on a public branch) doesn't really result in build issues.
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index eaac5ae8c05c8ee2f2868d5bc1b04d1f68235b3f..ff9511d9b9e6f775feeb4f897754878ac4c8cdb0 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -175,6 +175,7 @@ void put_online_mems(void); void mem_hotplug_begin(void); void mem_hotplug_done(void); +bool in_mem_hotplug(void); /* See kswapd_is_running() */ static inline void pgdat_kswapd_lock(pg_data_t *pgdat) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 16cf9e17077e359b98a69dc4bca48f4575b9a28c..2b9d1a6267c0872c433fafe8684dd4e2aa821f5f 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -267,6 +267,11 @@ void mem_hotplug_done(void) cpus_read_unlock(); } +bool in_mem_hotplug(void) +{ + return percpu_is_write_locked(&mem_hotplug_lock); +} + u64 max_mem_size = U64_MAX; /* add this memory to iomem resource */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 299c9f10bbcd2aa1d35ac6eb5131a2471da825d5..e0516d70835e20e54ff8e43f3aa884c7e36d929a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -417,6 +417,10 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags, void set_pageblock_migratetype(struct page *page, int migratetype) { + lockdep_assert_once(system_state == SYSTEM_BOOTING || + in_mem_hotplug() || + lockdep_is_held(&page_zone(page)->lock)); + if (unlikely(page_group_by_mobility_disabled && migratetype < MIGRATE_PCPTYPES)) migratetype = MIGRATE_UNMOVABLE;
Since the migratetype hygiene patches [0], the locking here is a bit more formalised, so write it down with an assert. [0] https://lore.kernel.org/lkml/20240320180429.678181-3-hannes@cmpxchg.org/T/ Signed-off-by: Brendan Jackman <jackmanb@google.com> --- include/linux/memory_hotplug.h | 1 + mm/memory_hotplug.c | 5 +++++ mm/page_alloc.c | 4 ++++ 3 files changed, 10 insertions(+) --- base-commit: d58172d128acbafa2295aa17cc96e28260da9a86 change-id: 20250227-pageblock-lockdep-9628c48d7e08 Best regards,