| Message ID | 20241210193035.2667005-1-willy@infradead.org |
| --- | --- |
| State | New |
| Series | vmalloc: Move memcg logic into memcg code |
On Tue, Dec 10, 2024 at 07:30:33PM +0000, Matthew Wilcox (Oracle) wrote:
> Fixes: b944afc9d64d (mm: add a VM_MAP_PUT_PAGES flag for vmap)
Hm, no, this is necessary, but not enough as nr_vmalloc_pages has the
same problem. Will fix that and resend.
Hi Matthew,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master v6.13-rc2 next-20241211]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/vmalloc-Move-memcg-logic-into-memcg-code/20241211-033214
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20241210193035.2667005-1-willy%40infradead.org
patch subject: [PATCH] vmalloc: Move memcg logic into memcg code
config: m68k-randconfig-r112-20241211 (https://download.01.org/0day-ci/archive/20241211/202412111725.koF7jNeD-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 14.2.0
reproduce: (https://download.01.org/0day-ci/archive/20241211/202412111725.koF7jNeD-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412111725.koF7jNeD-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> mm/vmalloc.c:3675:55: sparse: sparse: incorrect type in argument 2 (different base types) @@ expected unsigned int nr_pages @@ got restricted gfp_t [assigned] [usertype] gfp_mask @@
   mm/vmalloc.c:3675:55: sparse:     expected unsigned int nr_pages
   mm/vmalloc.c:3675:55: sparse:     got restricted gfp_t [assigned] [usertype] gfp_mask
>> mm/vmalloc.c:3675:69: sparse: sparse: incorrect type in argument 3 (different base types) @@ expected restricted gfp_t [usertype] gfp @@ got unsigned int nr_pages @@
   mm/vmalloc.c:3675:69: sparse:     expected restricted gfp_t [usertype] gfp
   mm/vmalloc.c:3675:69: sparse:     got unsigned int nr_pages
   mm/vmalloc.c:180:74: sparse: sparse: self-comparison always evaluates to false
   mm/vmalloc.c:231:74: sparse: sparse: self-comparison always evaluates to false
   mm/vmalloc.c:282:74: sparse: sparse: self-comparison always evaluates to false
   mm/vmalloc.c:384:46: sparse: sparse: self-comparison always evaluates to false
   mm/vmalloc.c:407:46: sparse: sparse: self-comparison always evaluates to false
   mm/vmalloc.c:427:46: sparse: sparse: self-comparison always evaluates to false
   mm/vmalloc.c:531:46: sparse: sparse: self-comparison always evaluates to false
   mm/vmalloc.c:549:46: sparse: sparse: self-comparison always evaluates to false
   mm/vmalloc.c:567:46: sparse: sparse: self-comparison always evaluates to false
   mm/vmalloc.c:1054:25: sparse: sparse: context imbalance in 'find_vmap_area_exceed_addr_lock' - wrong count at exit
   mm/vmalloc.c:4426:28: sparse: sparse: context imbalance in 'vread_iter' - unexpected unlock

vim +3675 mm/vmalloc.c

  3623
  3624  static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
  3625                                   pgprot_t prot, unsigned int page_shift,
  3626                                   int node)
  3627  {
  3628          const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
  3629          bool nofail = gfp_mask & __GFP_NOFAIL;
  3630          unsigned long addr = (unsigned long)area->addr;
  3631          unsigned long size = get_vm_area_size(area);
  3632          unsigned long array_size;
  3633          unsigned int nr_small_pages = size >> PAGE_SHIFT;
  3634          unsigned int page_order;
  3635          unsigned int flags;
  3636          int ret;
  3637
  3638          array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
  3639
  3640          if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
  3641                  gfp_mask |= __GFP_HIGHMEM;
  3642
  3643          /* Please note that the recursion is strictly bounded. */
  3644          if (array_size > PAGE_SIZE) {
  3645                  area->pages = __vmalloc_node_noprof(array_size, 1, nested_gfp, node,
  3646                                          area->caller);
  3647          } else {
  3648                  area->pages = kmalloc_node_noprof(array_size, nested_gfp, node);
  3649          }
  3650
  3651          if (!area->pages) {
  3652                  warn_alloc(gfp_mask, NULL,
  3653                          "vmalloc error: size %lu, failed to allocated page array size %lu",
  3654                          nr_small_pages * PAGE_SIZE, array_size);
  3655                  free_vm_area(area);
  3656                  return NULL;
  3657          }
  3658
  3659          set_vm_area_page_order(area, page_shift - PAGE_SHIFT);
  3660          page_order = vm_area_page_order(area);
  3661
  3662          /*
  3663           * High-order nofail allocations are really expensive and
  3664           * potentially dangerous (pre-mature OOM, disruptive reclaim
  3665           * and compaction etc.
  3666           *
  3667           * Please note, the __vmalloc_node_range_noprof() falls-back
  3668           * to order-0 pages if high-order attempt is unsuccessful.
  3669           */
  3670          area->nr_pages = vm_area_alloc_pages((page_order ?
  3671                  gfp_mask & ~__GFP_NOFAIL : gfp_mask) | __GFP_NOWARN,
  3672                  node, page_order, nr_small_pages, area->pages);
  3673
  3674          atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
> 3675          ret = obj_cgroup_charge_vmalloc(&area->objcg, gfp_mask, area->nr_pages);
  3676          if (ret)
  3677                  goto fail;
  3678
  3679          /*
  3680           * If not enough pages were obtained to accomplish an
  3681           * allocation request, free them via vfree() if any.
  3682           */
  3683          if (area->nr_pages != nr_small_pages) {
  3684                  /*
  3685                   * vm_area_alloc_pages() can fail due to insufficient memory but
  3686                   * also:-
  3687                   *
  3688                   * - a pending fatal signal
  3689                   * - insufficient huge page-order pages
  3690                   *
  3691                   * Since we always retry allocations at order-0 in the huge page
  3692                   * case a warning for either is spurious.
  3693                   */
  3694                  if (!fatal_signal_pending(current) && page_order == 0)
  3695                          warn_alloc(gfp_mask, NULL,
  3696                                  "vmalloc error: size %lu, failed to allocate pages",
  3697                                  area->nr_pages * PAGE_SIZE);
  3698                  goto fail;
  3699          }
  3700
  3701          /*
  3702           * page tables allocations ignore external gfp mask, enforce it
  3703           * by the scope API
  3704           */
  3705          if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
  3706                  flags = memalloc_nofs_save();
  3707          else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)
  3708                  flags = memalloc_noio_save();
  3709
  3710          do {
  3711                  ret = vmap_pages_range(addr, addr + size, prot, area->pages,
  3712                          page_shift);
  3713                  if (nofail && (ret < 0))
  3714                          schedule_timeout_uninterruptible(1);
  3715          } while (nofail && (ret < 0));
  3716
  3717          if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
  3718                  memalloc_nofs_restore(flags);
  3719          else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)
  3720                  memalloc_noio_restore(flags);
  3721
  3722          if (ret < 0) {
  3723                  warn_alloc(gfp_mask, NULL,
  3724                          "vmalloc error: size %lu, failed to map pages",
  3725                          area->nr_pages * PAGE_SIZE);
  3726                  goto fail;
  3727          }
  3728
  3729          return area->addr;
  3730
  3731  fail:
  3732          vfree(area->addr);
  3733          return NULL;
  3734  }
  3735
On 12/11/24 06:30, Matthew Wilcox (Oracle) wrote:
> Today we account each page individually to the memcg, which works well
> enough, if a little inefficiently (N atomic operations per page instead
> of N per allocation).  Unfortunately, the stats can get out of sync when
> i915 calls vmap() with VM_MAP_PUT_PAGES.  The pages being passed were not
> allocated by vmalloc, so the MEMCG_VMALLOC counter was never incremented.
> But it is decremented when the pages are freed with vfree().
>
> Solve all of this by tracking the memcg at the vm_struct level.
> This logic has to live in the memcontrol file as it calls several
> functions which are currently static.
>
> Fixes: b944afc9d64d (mm: add a VM_MAP_PUT_PAGES flag for vmap)
> Cc: stable@vger.kernel.org
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/memcontrol.h |  7 ++++++
>  include/linux/vmalloc.h    |  3 +++
>  mm/memcontrol.c            | 46 ++++++++++++++++++++++++++++++++++++++
>  mm/vmalloc.c               | 14 ++++++------
>  4 files changed, 63 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 5502aa8e138e..83ebcadebba6 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1676,6 +1676,10 @@ static inline struct obj_cgroup *get_obj_cgroup_from_current(void)
>
>  int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size);
>  void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size);
> +int obj_cgroup_charge_vmalloc(struct obj_cgroup **objcgp,
> +		unsigned int nr_pages, gfp_t gfp);
> +void obj_cgroup_uncharge_vmalloc(struct obj_cgroup *objcgp,
> +		unsigned int nr_pages);
>
>  extern struct static_key_false memcg_bpf_enabled_key;
>  static inline bool memcg_bpf_enabled(void)
> @@ -1756,6 +1760,9 @@ static inline void __memcg_kmem_uncharge_page(struct page *page, int order)
>  {
>  }
>
> +/* Must be macros to avoid dereferencing objcg in vm_struct */
> +#define obj_cgroup_charge_vmalloc(objcgp, nr_pages, gfp)	0
> +#define obj_cgroup_uncharge_vmalloc(objcg, nr_pages)	do { } while (0)
>  static inline struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio)
>  {
>  	return NULL;
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 31e9ffd936e3..ec7c2d607382 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -60,6 +60,9 @@ struct vm_struct {
>  #endif
>  	unsigned int		nr_pages;
>  	phys_addr_t		phys_addr;
> +#ifdef CONFIG_MEMCG
> +	struct obj_cgroup	*objcg;
> +#endif
>  	const void		*caller;
>  };
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7b3503d12aaf..629bffc3e26d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5472,4 +5472,50 @@ static int __init mem_cgroup_swap_init(void)
>  }
>  subsys_initcall(mem_cgroup_swap_init);
>
> +/**
> + * obj_cgroup_charge_vmalloc - Charge vmalloc memory
> + * @objcgp: Pointer to an object cgroup
> + * @nr_pages: Number of pages
> + * @gfp: Memory allocation flags
> + *
> + * Return: 0 on success, negative errno on failure.
> + */
> +int obj_cgroup_charge_vmalloc(struct obj_cgroup **objcgp,
> +		unsigned int nr_pages, gfp_t gfp)
> +{
> +	struct obj_cgroup *objcg;
> +	int err;

Are we also responsible for setting *objcgp to NULL for return conditions
(when we return before setting it)?

> +
> +	if (mem_cgroup_disabled() || !(gfp & __GFP_ACCOUNT))
> +		return 0;

Don't we want !memcg_kmem_online() instead of mem_cgroup_disabled()?

> +
> +	objcg = current_obj_cgroup();
> +	if (!objcg)
> +		return 0;
> +
> +	err = obj_cgroup_charge_pages(objcg, gfp, nr_pages);
> +	if (err)
> +		return err;
> +	obj_cgroup_get(objcg);
> +	mod_memcg_state(obj_cgroup_memcg(objcg), MEMCG_VMALLOC, nr_pages);
> +	*objcgp = objcg;
> +
> +	return 0;
> +}

<snip>

Balbir Singh
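[Editorial aside: to make the first question above concrete, here is a small
userspace sketch, with toy names rather than the kernel code path, of why an
early return 0 that never writes through the out-pointer is only safe when the
owning structure starts out zeroed. The kernel-side version of the question is
whether the caller's vm_struct is guaranteed to begin with ->objcg == NULL.]

#include <stdio.h>
#include <stdlib.h>

struct objcg { int id; };

/* Toy stand-in for obj_cgroup_charge_vmalloc(): the "nothing to charge"
 * paths return 0 without ever writing through objcgp. */
static int charge(struct objcg **objcgp, int accounted)
{
	if (!accounted)
		return 0;		/* early return: *objcgp untouched */
	*objcgp = malloc(sizeof(**objcgp));
	return *objcgp ? 0 : -1;
}

int main(void)
{
	/* Zeroing allocator (calloc here, kzalloc in the kernel): the slot
	 * is already NULL, so a later "if (objcg)" check stays safe after
	 * the early return. */
	struct objcg **zeroed = calloc(1, sizeof(*zeroed));
	charge(zeroed, 0);
	printf("zeroed slot after early return: %p\n", (void *)*zeroed);

	/* Non-zeroing allocator: after the same early return the slot is
	 * indeterminate, which is exactly what the question is about. */
	struct objcg **raw = malloc(sizeof(*raw));
	charge(raw, 0);
	/* reading *raw here would be undefined behaviour */

	free(*zeroed);	/* NULL, so a no-op */
	free(zeroed);
	free(raw);
	return 0;
}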
Hi Matthew,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.13-rc2 next-20241211]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/vmalloc-Move-memcg-logic-into-memcg-code/20241211-033214
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20241210193035.2667005-1-willy%40infradead.org
patch subject: [PATCH] vmalloc: Move memcg logic into memcg code
config: arm-randconfig-001-20241211 (https://download.01.org/0day-ci/archive/20241212/202412120251.VSLmEgIe-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241212/202412120251.VSLmEgIe-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412120251.VSLmEgIe-lkp@intel.com/

All errors (new ones prefixed by >>):

>> ld.lld: error: undefined symbol: obj_cgroup_uncharge_vmalloc
   >>> referenced by vmalloc.c
   >>>               mm/vmalloc.o:(vfree) in archive vmlinux.a
--
>> ld.lld: error: undefined symbol: obj_cgroup_charge_vmalloc
   >>> referenced by vmalloc.c
   >>>               mm/vmalloc.o:(__vmalloc_node_range_noprof) in archive vmlinux.a
Hi Matthew,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.13-rc2 next-20241211]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/vmalloc-Move-memcg-logic-into-memcg-code/20241211-033214
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20241210193035.2667005-1-willy%40infradead.org
patch subject: [PATCH] vmalloc: Move memcg logic into memcg code
config: alpha-randconfig-r063-20241211 (https://download.01.org/0day-ci/archive/20241212/202412120752.71x951px-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241212/202412120752.71x951px-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412120752.71x951px-lkp@intel.com/

All errors (new ones prefixed by >>):

   alpha-linux-ld: mm/vmalloc.o: in function `vfree':
>> (.text+0x66b0): undefined reference to `obj_cgroup_uncharge_vmalloc'
>> alpha-linux-ld: (.text+0x66b4): undefined reference to `obj_cgroup_uncharge_vmalloc'
   alpha-linux-ld: mm/vmalloc.o: in function `__vmalloc_area_node.constprop.0':
>> (.text+0x6fa0): undefined reference to `obj_cgroup_charge_vmalloc'
>> alpha-linux-ld: (.text+0x6fac): undefined reference to `obj_cgroup_charge_vmalloc'
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5502aa8e138e..83ebcadebba6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1676,6 +1676,10 @@ static inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 
 int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size);
 void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size);
+int obj_cgroup_charge_vmalloc(struct obj_cgroup **objcgp,
+		unsigned int nr_pages, gfp_t gfp);
+void obj_cgroup_uncharge_vmalloc(struct obj_cgroup *objcgp,
+		unsigned int nr_pages);
 
 extern struct static_key_false memcg_bpf_enabled_key;
 static inline bool memcg_bpf_enabled(void)
@@ -1756,6 +1760,9 @@ static inline void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
 }
 
+/* Must be macros to avoid dereferencing objcg in vm_struct */
+#define obj_cgroup_charge_vmalloc(objcgp, nr_pages, gfp)	0
+#define obj_cgroup_uncharge_vmalloc(objcg, nr_pages)	do { } while (0)
 static inline struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio)
 {
 	return NULL;
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 31e9ffd936e3..ec7c2d607382 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -60,6 +60,9 @@ struct vm_struct {
 #endif
 	unsigned int		nr_pages;
 	phys_addr_t		phys_addr;
+#ifdef CONFIG_MEMCG
+	struct obj_cgroup	*objcg;
+#endif
 	const void		*caller;
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7b3503d12aaf..629bffc3e26d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5472,4 +5472,50 @@ static int __init mem_cgroup_swap_init(void)
 }
 subsys_initcall(mem_cgroup_swap_init);
 
+/**
+ * obj_cgroup_charge_vmalloc - Charge vmalloc memory
+ * @objcgp: Pointer to an object cgroup
+ * @nr_pages: Number of pages
+ * @gfp: Memory allocation flags
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int obj_cgroup_charge_vmalloc(struct obj_cgroup **objcgp,
+		unsigned int nr_pages, gfp_t gfp)
+{
+	struct obj_cgroup *objcg;
+	int err;
+
+	if (mem_cgroup_disabled() || !(gfp & __GFP_ACCOUNT))
+		return 0;
+
+	objcg = current_obj_cgroup();
+	if (!objcg)
+		return 0;
+
+	err = obj_cgroup_charge_pages(objcg, gfp, nr_pages);
+	if (err)
+		return err;
+	obj_cgroup_get(objcg);
+	mod_memcg_state(obj_cgroup_memcg(objcg), MEMCG_VMALLOC, nr_pages);
+	*objcgp = objcg;
+
+	return 0;
+}
+
+/**
+ * obj_cgroup_uncharge_vmalloc - Uncharge vmalloc memory
+ * @objcg: The object cgroup
+ * @nr_pages: Number of pages
+ */
+void obj_cgroup_uncharge_vmalloc(struct obj_cgroup *objcg,
+		unsigned int nr_pages)
+{
+	if (!objcg)
+		return;
+	mod_memcg_state(objcg->memcg, MEMCG_VMALLOC, 0L - nr_pages);
+	obj_cgroup_uncharge_pages(objcg, nr_pages);
+	obj_cgroup_put(objcg);
+}
+
 #endif /* CONFIG_SWAP */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f009b21705c1..438995d2f9f8 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3374,7 +3374,6 @@ void vfree(const void *addr)
 		struct page *page = vm->pages[i];
 
 		BUG_ON(!page);
-		mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
 		/*
 		 * High-order allocs for huge vmallocs are split, so
 		 * can be freed as an array of order-0 allocations
@@ -3383,6 +3382,7 @@ void vfree(const void *addr)
 		cond_resched();
 	}
 	atomic_long_sub(vm->nr_pages, &nr_vmalloc_pages);
+	obj_cgroup_uncharge_vmalloc(vm->objcg, vm->nr_pages);
 	kvfree(vm->pages);
 	kfree(vm);
 }
@@ -3536,6 +3536,9 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 	struct page *page;
 	int i;
 
+	/* Accounting handled in caller */
+	gfp &= ~__GFP_ACCOUNT;
+
 	/*
 	 * For order-0 pages we make use of bulk allocator, if
 	 * the page array is partly or not at all populated due
@@ -3669,12 +3672,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 			node, page_order, nr_small_pages, area->pages);
 
 	atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
-	if (gfp_mask & __GFP_ACCOUNT) {
-		int i;
-
-		for (i = 0; i < area->nr_pages; i++)
-			mod_memcg_page_state(area->pages[i], MEMCG_VMALLOC, 1);
-	}
+	ret = obj_cgroup_charge_vmalloc(&area->objcg, gfp_mask, area->nr_pages);
+	if (ret)
+		goto fail;
 
 	/*
 	 * If not enough pages were obtained to accomplish an
Today we account each page individually to the memcg, which works well
enough, if a little inefficiently (N atomic operations per page instead
of N per allocation).  Unfortunately, the stats can get out of sync when
i915 calls vmap() with VM_MAP_PUT_PAGES.  The pages being passed were not
allocated by vmalloc, so the MEMCG_VMALLOC counter was never incremented.
But it is decremented when the pages are freed with vfree().

Solve all of this by tracking the memcg at the vm_struct level.
This logic has to live in the memcontrol file as it calls several
functions which are currently static.

Fixes: b944afc9d64d (mm: add a VM_MAP_PUT_PAGES flag for vmap)
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/memcontrol.h |  7 ++++++
 include/linux/vmalloc.h    |  3 +++
 mm/memcontrol.c            | 46 ++++++++++++++++++++++++++++++++++++++
 mm/vmalloc.c               | 14 ++++++------
 4 files changed, 63 insertions(+), 7 deletions(-)
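[Editorial aside: the accounting model this commit message describes can be
summarised with a minimal userspace sketch, using hypothetical names rather
than the kernel API. The charge handle lives in the per-allocation
descriptor, is taken once when the pages are allocated and dropped once when
they are freed, so a mapping built from caller-supplied pages, the
VM_MAP_PUT_PAGES case, never touches the counter.]

#include <stdio.h>
#include <stdlib.h>

struct objcg { long charged_pages; };

struct vm_area_model {
	unsigned int nr_pages;
	struct objcg *objcg;	/* NULL when the allocation was not charged */
};

static struct objcg root_cg;

static int charge_vmalloc(struct objcg **objcgp, unsigned int nr_pages,
			  int accounted)
{
	if (!accounted)
		return 0;	/* no __GFP_ACCOUNT analogue: *objcgp stays NULL */
	root_cg.charged_pages += nr_pages;
	*objcgp = &root_cg;
	return 0;
}

static void uncharge_vmalloc(struct objcg *objcg, unsigned int nr_pages)
{
	if (!objcg)
		return;		/* these pages were never charged here */
	objcg->charged_pages -= nr_pages;
}

int main(void)
{
	/* "vmalloc" path: charge at allocation, uncharge at free. */
	struct vm_area_model *vm = calloc(1, sizeof(*vm));
	vm->nr_pages = 8;
	charge_vmalloc(&vm->objcg, vm->nr_pages, 1);
	uncharge_vmalloc(vm->objcg, vm->nr_pages);
	printf("balance after vmalloc/vfree: %ld\n", root_cg.charged_pages);

	/* "vmap(VM_MAP_PUT_PAGES)" path: nothing was charged, so the paired
	 * uncharge is a no-op and the counter stays balanced. */
	struct vm_area_model *mapped = calloc(1, sizeof(*mapped));
	mapped->nr_pages = 8;
	uncharge_vmalloc(mapped->objcg, mapped->nr_pages);
	printf("balance after vmap/vfree:    %ld\n", root_cg.charged_pages);

	free(vm);
	free(mapped);
	return 0;
}

Both prints report a balance of 0, which is the invariant the commit message
says the per-page scheme lost for vmap() with VM_MAP_PUT_PAGES.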