Message ID | 20230317134448.11082-3-mhocko@kernel.org (mailing list archive)
---|---
State | New
Series | memcg, cpuisol: do not interfere pcp cache charges draining with cpuisol workloads
On Fri, Mar 17, 2023 at 6:44 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> From: Michal Hocko <mhocko@suse.com>
>
> Leonardo Bras has noticed that pcp charge cache draining might be
> disruptive on workloads relying on 'isolated cpus', a feature commonly
> used by workloads that are sensitive to interruption and context
> switching, such as vRAN and Industrial Control Systems.
>
> There are essentially two ways to approach the issue. We can either
> allow the pcp cache to be drained on a different cpu rather than the
> local one, or avoid remote flushing on isolated cpus.
>
> The current pcp charge cache is really optimized for high performance
> and it always sticks with its cpu. That means it only requires a
> local_lock (preempt_disable on !RT) and draining is handed over to the
> pcp WQ to drain locally again.
>
> The former solution (remote draining) would require adding additional
> locking to prevent local charges from racing with the draining. This
> adds an atomic operation to the otherwise simple arithmetic fast path
> in try_charge. Another concern is that remote draining can cause lock
> contention for the isolated workloads and therefore interfere with
> them indirectly via user space interfaces.
>
> The other option is to avoid scheduling draining on isolated cpus
> altogether. That means that those remote cpus would keep their charges
> even after drain_all_stock returns. This is certainly not optimal
> either, but it shouldn't really cause any major problems. In the worst
> case (many isolated cpus with charges, each of them holding
> MEMCG_CHARGE_BATCH, i.e. 64 pages) the memory consumption of a memcg
> would be artificially higher than what can be immediately used from
> other cpus.
>
> Theoretically a memcg OOM killer could be triggered prematurely.
> Currently it is not really clear whether this is a practical problem,
> though. A tight memcg limit would be really counterproductive for cpu
> isolated workloads pretty much by definition, because any memory
> reclaim induced by the memcg limit could break user space timing
> expectations, as those workloads usually expect to execute in user
> space most of the time.
>
> Also, charges could be left behind on memcg removal. Any future charge
> on those isolated cpus will drain that pcp cache, so this won't be a
> permanent leak.
>
> Considering the pros and cons of both approaches, this patch
> implements the second option and simply does not schedule remote
> draining if the target cpu is isolated. This solution is much simpler.
> It doesn't add any new locking and it is more predictable from the
> user space POV. Should the premature memcg OOM become a real-life
> problem, we can revisit this decision.
>
> Cc: Leonardo Brás <leobras@redhat.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Reported-by: Leonardo Bras <leobras@redhat.com>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
> Suggested-by: Roman Gushchin <roman.gushchin@linux.dev>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Shakeel Butt <shakeelb@google.com>
Hi Michal,

I love your patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.3-rc2 next-20230317]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230317134448.11082-3-mhocko%40kernel.org
patch subject: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus
config: x86_64-randconfig-a014 (https://download.01.org/0day-ci/archive/20230318/202303180520.5lKAJwrg-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/b376cfcf40a43276e1280950bb926fdf3521940a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621
        git checkout b376cfcf40a43276e1280950bb926fdf3521940a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303180520.5lKAJwrg-lkp@intel.com/

All errors (new ones prefixed by >>):

>> mm/memcontrol.c:2369:14: error: implicit declaration of function 'cpu_is_isolated' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
                   else if (!cpu_is_isolated(cpu))
                             ^
   1 error generated.


vim +/cpu_is_isolated +2369 mm/memcontrol.c

  2331
  2332	/*
  2333	 * Drains all per-CPU charge caches for given root_memcg resp. subtree
  2334	 * of the hierarchy under it.
  2335	 */
  2336	static void drain_all_stock(struct mem_cgroup *root_memcg)
  2337	{
  2338		int cpu, curcpu;
  2339
  2340		/* If someone's already draining, avoid adding running more workers. */
  2341		if (!mutex_trylock(&percpu_charge_mutex))
  2342			return;
  2343		/*
  2344		 * Notify other cpus that system-wide "drain" is running
  2345		 * We do not care about races with the cpu hotplug because cpu down
  2346		 * as well as workers from this path always operate on the local
  2347		 * per-cpu data. CPU up doesn't touch memcg_stock at all.
  2348		 */
  2349		migrate_disable();
  2350		curcpu = smp_processor_id();
  2351		for_each_online_cpu(cpu) {
  2352			struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
  2353			struct mem_cgroup *memcg;
  2354			bool flush = false;
  2355
  2356			rcu_read_lock();
  2357			memcg = stock->cached;
  2358			if (memcg && stock->nr_pages &&
  2359			    mem_cgroup_is_descendant(memcg, root_memcg))
  2360				flush = true;
  2361			else if (obj_stock_flush_required(stock, root_memcg))
  2362				flush = true;
  2363			rcu_read_unlock();
  2364
  2365			if (flush &&
  2366			    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
  2367				if (cpu == curcpu)
  2368					drain_local_stock(&stock->work);
> 2369				else if (!cpu_is_isolated(cpu))
  2370					schedule_work_on(cpu, &stock->work);
  2371			}
  2372		}
  2373		migrate_enable();
  2374		mutex_unlock(&percpu_charge_mutex);
  2375	}
  2376
Hi Michal,

I love your patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.3-rc2 next-20230317]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230317134448.11082-3-mhocko%40kernel.org
patch subject: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus
config: x86_64-randconfig-a013 (https://download.01.org/0day-ci/archive/20230318/202303180617.7E3aIlHf-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/b376cfcf40a43276e1280950bb926fdf3521940a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621
        git checkout b376cfcf40a43276e1280950bb926fdf3521940a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 olddefconfig
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303180617.7E3aIlHf-lkp@intel.com/

All errors (new ones prefixed by >>):

   mm/memcontrol.c: In function 'drain_all_stock':
>> mm/memcontrol.c:2369:35: error: implicit declaration of function 'cpu_is_isolated' [-Werror=implicit-function-declaration]
    2369 |                 else if (!cpu_is_isolated(cpu))
         |                           ^~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/cpu_is_isolated +2369 mm/memcontrol.c

[annotated drain_all_stock() listing identical to the clang report above snipped]
On Sat, 18 Mar 2023 06:22:40 +0800 kernel test robot <lkp@intel.com> wrote:

> mm/memcontrol.c: In function 'drain_all_stock':
> >> mm/memcontrol.c:2369:35: error: implicit declaration of function 'cpu_is_isolated' [-Werror=implicit-function-declaration]
>  2369 |                 else if (!cpu_is_isolated(cpu))
>       |                           ^~~~~~~~~~~~~~~

Thanks.

--- a/mm/memcontrol.c~memcg-do-not-drain-charge-pcp-caches-on-remote-isolated-cpus-fix
+++ a/mm/memcontrol.c
@@ -63,6 +63,7 @@
 #include <linux/resume_user_mode.h>
 #include <linux/psi.h>
 #include <linux/seq_buf.h>
+#include <linux/sched/isolation.h>
 #include "internal.h"
 #include <net/sock.h>
 #include <net/ip.h>
On 17 Mar 2023 14:44:48 +0100 Michal Hocko <mhocko@suse.com> wrote:
> Leonardo Bras has noticed that pcp charge cache draining might be
> disruptive on workloads relying on 'isolated cpus', a feature commonly
> used on workloads that are sensitive to interruption and context
> switching such as vRAN and Industrial Control Systems.

[remainder of changelog snipped]

> Considering cons and pros of both approaches this patch is implementing
> the second option and simply do not schedule remote draining if the
> target cpu is isolated. This solution is much more simpler. It doesn't
> add any new locking and it is more more predictable from the user space
> POV. Should the pre-mature memcg OOM become a real life problem, we can
> revisit this decision.

JFYI feel free to take a look at the non-housekeeping CPUs [1].

[1] https://lore.kernel.org/lkml/20230223150624.GA29739@lst.de/
On Fri 17-03-23 16:32:29, Andrew Morton wrote:
> On Sat, 18 Mar 2023 06:22:40 +0800 kernel test robot <lkp@intel.com> wrote:
>
> > mm/memcontrol.c: In function 'drain_all_stock':
> > >> mm/memcontrol.c:2369:35: error: implicit declaration of function 'cpu_is_isolated' [-Werror=implicit-function-declaration]
> >  2369 |                 else if (!cpu_is_isolated(cpu))
> >       |                           ^~~~~~~~~~~~~~~
>
> Thanks.
>
> --- a/mm/memcontrol.c~memcg-do-not-drain-charge-pcp-caches-on-remote-isolated-cpus-fix
> +++ a/mm/memcontrol.c
> @@ -63,6 +63,7 @@
>  #include <linux/resume_user_mode.h>
>  #include <linux/psi.h>
>  #include <linux/seq_buf.h>
> +#include <linux/sched/isolation.h>
>  #include "internal.h"
>  #include <net/sock.h>
>  #include <net/ip.h>

Thanks a lot Andrew!
On Sat 18-03-23 11:23:50, Hillf Danton wrote:
> On 17 Mar 2023 14:44:48 +0100 Michal Hocko <mhocko@suse.com>
> > Leonardo Bras has noticed that pcp charge cache draining might be
> > disruptive on workloads relying on 'isolated cpus', a feature commonly
> > used on workloads that are sensitive to interruption and context
> > switching such as vRAN and Industrial Control Systems.

[quoted changelog snipped]

> JFYI feel free to take a look at the non-housekeeping CPUs [1].
>
> [1] https://lore.kernel.org/lkml/20230223150624.GA29739@lst.de/

Such an approach would require remote draining and I hope I have
explained why that is not a preferred way in this case. Other than that
I do agree with Christoph that a generic approach would be really nice.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0524add35cae..12559c08d976 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2366,7 +2366,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 		    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
 			if (cpu == curcpu)
 				drain_local_stock(&stock->work);
-			else
+			else if (!cpu_is_isolated(cpu))
 				schedule_work_on(cpu, &stock->work);
 		}
 	}