Message ID | 9123bca3-23bb-1361-c48f-e468c81ad4f6@virtuozzo.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v5,01/16] memcg: enable accounting for net_device and Tx/Rx queues | expand |
Hi Vasily, On Mon, 19 Jul 2021 at 11:45, Vasily Averin <vvs@virtuozzo.com> wrote: [..] > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index ae1f5d0..1bbf239 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -968,7 +968,7 @@ static __always_inline bool memcg_kmem_bypass(void) > return false; > > /* Memcg to charge can't be determined. */ > - if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD)) > + if (!in_task() || !current->mm || (current->flags & PF_KTHREAD)) > return true; This seems to do two separate things in one patch. Probably, it's better to separate them. (I may miss how route changes are related to more generic __alloc_pages() change) Thanks, Dmitry
On Mon, Jul 19, 2021 at 7:00 AM Dmitry Safonov <0x7f454c46@gmail.com> wrote: > > Hi Vasily, > > On Mon, 19 Jul 2021 at 11:45, Vasily Averin <vvs@virtuozzo.com> wrote: > [..] > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index ae1f5d0..1bbf239 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -968,7 +968,7 @@ static __always_inline bool memcg_kmem_bypass(void) > > return false; > > > > /* Memcg to charge can't be determined. */ > > - if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD)) > > + if (!in_task() || !current->mm || (current->flags & PF_KTHREAD)) > > return true; > > This seems to do two separate things in one patch. > Probably, it's better to separate them. > (I may miss how route changes are related to more generic > __alloc_pages() change) > It was requested to squash them together in some previous versions. https://lore.kernel.org/linux-mm/YEiUIf0old+AZssa@dhcp22.suse.cz/
On 7/19/21 3:22 PM, Shakeel Butt wrote: > On Mon, Jul 19, 2021 at 7:00 AM Dmitry Safonov <0x7f454c46@gmail.com> wrote: >> >> Hi Vasily, >> >> On Mon, 19 Jul 2021 at 11:45, Vasily Averin <vvs@virtuozzo.com> wrote: >> [..] >>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >>> index ae1f5d0..1bbf239 100644 >>> --- a/mm/memcontrol.c >>> +++ b/mm/memcontrol.c >>> @@ -968,7 +968,7 @@ static __always_inline bool memcg_kmem_bypass(void) >>> return false; >>> >>> /* Memcg to charge can't be determined. */ >>> - if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD)) >>> + if (!in_task() || !current->mm || (current->flags & PF_KTHREAD)) >>> return true; >> >> This seems to do two separate things in one patch. >> Probably, it's better to separate them. >> (I may miss how route changes are related to more generic >> __alloc_pages() change) >> > > It was requested to squash them together in some previous versions. > https://lore.kernel.org/linux-mm/YEiUIf0old+AZssa@dhcp22.suse.cz/ > Ah, alright, never mind than. Thanks, Dmitry
On Mon, Jul 19, 2021 at 3:44 AM Vasily Averin <vvs@virtuozzo.com> wrote: > > An netadmin inside container can use 'ip a a' and 'ip r a' > to assign a large number of ipv4/ipv6 addresses and routing entries > and force kernel to allocate megabytes of unaccounted memory > for long-lived per-netdevice related kernel objects: > 'struct in_ifaddr', 'struct inet6_ifaddr', 'struct fib6_node', > 'struct rt6_info', 'struct fib_rules' and ip_fib caches. > > These objects can be manually removed, though usually they lives > in memory till destroy of its net namespace. > > It makes sense to account for them to restrict the host's memory > consumption from inside the memcg-limited container. > > One of such objects is the 'struct fib6_node' mostly allocated in > net/ipv6/route.c::__ip6_ins_rt() inside the lock_bh()/unlock_bh() section: > > write_lock_bh(&table->tb6_lock); > err = fib6_add(&table->tb6_root, rt, info, mxc); > write_unlock_bh(&table->tb6_lock); > > In this case it is not enough to simply add SLAB_ACCOUNT to corresponding > kmem cache. The proper memory cgroup still cannot be found due to the > incorrect 'in_interrupt()' check used in memcg_kmem_bypass(). > > Obsoleted in_interrupt() does not describe real execution context properly. > From include/linux/preempt.h: > > The following macros are deprecated and should not be used in new code: > in_interrupt() - We're in NMI,IRQ,SoftIRQ context or have BH disabled > > To verify the current execution context new macro should be used instead: > in_task() - We're in task context > > Signed-off-by: Vasily Averin <vvs@virtuozzo.com> > --- > mm/memcontrol.c | 2 +- > net/core/fib_rules.c | 4 ++-- > net/ipv4/devinet.c | 2 +- > net/ipv4/fib_trie.c | 4 ++-- > net/ipv6/addrconf.c | 2 +- > net/ipv6/ip6_fib.c | 4 ++-- > net/ipv6/route.c | 2 +- > 7 files changed, 10 insertions(+), 10 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index ae1f5d0..1bbf239 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -968,7 +968,7 @@ static __always_inline bool memcg_kmem_bypass(void) > return false; > > /* Memcg to charge can't be determined. */ > - if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD)) > + if (!in_task() || !current->mm || (current->flags & PF_KTHREAD)) > return true; > > return false; Can you please also change in_interrupt() in active_memcg() as well? There are other unrelated in_interrupt() in that file but the one in active_memcg() should be coupled with this change.
On 7/20/21 10:26 PM, Shakeel Butt wrote: > On Mon, Jul 19, 2021 at 3:44 AM Vasily Averin <vvs@virtuozzo.com> wrote: >> >> An netadmin inside container can use 'ip a a' and 'ip r a' >> to assign a large number of ipv4/ipv6 addresses and routing entries >> and force kernel to allocate megabytes of unaccounted memory >> for long-lived per-netdevice related kernel objects: >> 'struct in_ifaddr', 'struct inet6_ifaddr', 'struct fib6_node', >> 'struct rt6_info', 'struct fib_rules' and ip_fib caches. >> >> These objects can be manually removed, though usually they lives >> in memory till destroy of its net namespace. >> >> It makes sense to account for them to restrict the host's memory >> consumption from inside the memcg-limited container. >> >> One of such objects is the 'struct fib6_node' mostly allocated in >> net/ipv6/route.c::__ip6_ins_rt() inside the lock_bh()/unlock_bh() section: >> >> write_lock_bh(&table->tb6_lock); >> err = fib6_add(&table->tb6_root, rt, info, mxc); >> write_unlock_bh(&table->tb6_lock); >> >> In this case it is not enough to simply add SLAB_ACCOUNT to corresponding >> kmem cache. The proper memory cgroup still cannot be found due to the >> incorrect 'in_interrupt()' check used in memcg_kmem_bypass(). >> >> Obsoleted in_interrupt() does not describe real execution context properly. >> From include/linux/preempt.h: >> >> The following macros are deprecated and should not be used in new code: >> in_interrupt() - We're in NMI,IRQ,SoftIRQ context or have BH disabled >> >> To verify the current execution context new macro should be used instead: >> in_task() - We're in task context >> >> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> >> --- >> mm/memcontrol.c | 2 +- >> net/core/fib_rules.c | 4 ++-- >> net/ipv4/devinet.c | 2 +- >> net/ipv4/fib_trie.c | 4 ++-- >> net/ipv6/addrconf.c | 2 +- >> net/ipv6/ip6_fib.c | 4 ++-- >> net/ipv6/route.c | 2 +- >> 7 files changed, 10 insertions(+), 10 deletions(-) >> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> index ae1f5d0..1bbf239 100644 >> --- a/mm/memcontrol.c >> +++ b/mm/memcontrol.c >> @@ -968,7 +968,7 @@ static __always_inline bool memcg_kmem_bypass(void) >> return false; >> >> /* Memcg to charge can't be determined. */ >> - if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD)) >> + if (!in_task() || !current->mm || (current->flags & PF_KTHREAD)) >> return true; >> >> return false; > > Can you please also change in_interrupt() in active_memcg() as well? > There are other unrelated in_interrupt() in that file but the one in > active_memcg() should be coupled with this change. Could you please elaborate? From my point of view active_memcg is paired with set_active_memcg() and is not related to this case. active_memcg uses memcg that was set by set_active_memcg(), either from int_active_memcg per-cpu pointer or from current->active_memcg pointer. I'm agree, it in case of disabled BH it is incorrect to use int_active_memcg, we still can use current->active_memcg. However it isn't a problem, memcg will be properly provided in both cases. I think it's better to fix set_active_memcg/active_memcg by separate patch. Am I missed something perhaps? Thank you, Vasily Averin
On Mon, Jul 26, 2021 at 3:23 AM Vasily Averin <vvs@virtuozzo.com> wrote: > [...] > > > > Can you please also change in_interrupt() in active_memcg() as well? > > There are other unrelated in_interrupt() in that file but the one in > > active_memcg() should be coupled with this change. > > Could you please elaborate? > From my point of view active_memcg is paired with set_active_memcg() and is not related to this case. > active_memcg uses memcg that was set by set_active_memcg(), either from int_active_memcg per-cpu pointer > or from current->active_memcg pointer. > I'm agree, it in case of disabled BH it is incorrect to use int_active_memcg, > we still can use current->active_memcg. However it isn't a problem, > memcg will be properly provided in both cases. > > I think it's better to fix set_active_memcg/active_memcg by separate patch. > > Am I missed something perhaps? > No you are right. That should be a separate patch.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ae1f5d0..1bbf239 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -968,7 +968,7 @@ static __always_inline bool memcg_kmem_bypass(void) return false; /* Memcg to charge can't be determined. */ - if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD)) + if (!in_task() || !current->mm || (current->flags & PF_KTHREAD)) return true; return false; diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c index a9f9379..79df7cd 100644 --- a/net/core/fib_rules.c +++ b/net/core/fib_rules.c @@ -57,7 +57,7 @@ int fib_default_rule_add(struct fib_rules_ops *ops, { struct fib_rule *r; - r = kzalloc(ops->rule_size, GFP_KERNEL); + r = kzalloc(ops->rule_size, GFP_KERNEL_ACCOUNT); if (r == NULL) return -ENOMEM; @@ -541,7 +541,7 @@ static int fib_nl2rule(struct sk_buff *skb, struct nlmsghdr *nlh, goto errout; } - nlrule = kzalloc(ops->rule_size, GFP_KERNEL); + nlrule = kzalloc(ops->rule_size, GFP_KERNEL_ACCOUNT); if (!nlrule) { err = -ENOMEM; goto errout; diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 73721a4..d38124b 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -215,7 +215,7 @@ static void devinet_sysctl_unregister(struct in_device *idev) static struct in_ifaddr *inet_alloc_ifa(void) { - return kzalloc(sizeof(struct in_ifaddr), GFP_KERNEL); + return kzalloc(sizeof(struct in_ifaddr), GFP_KERNEL_ACCOUNT); } static void inet_rcu_free_ifa(struct rcu_head *head) diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 25cf387..8060524 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -2380,11 +2380,11 @@ void __init fib_trie_init(void) { fn_alias_kmem = kmem_cache_create("ip_fib_alias", sizeof(struct fib_alias), - 0, SLAB_PANIC, NULL); + 0, SLAB_PANIC | SLAB_ACCOUNT, NULL); trie_leaf_kmem = kmem_cache_create("ip_fib_trie", LEAF_SIZE, - 0, SLAB_PANIC, NULL); + 0, SLAB_PANIC | SLAB_ACCOUNT, NULL); } struct fib_table *fib_trie_table(u32 id, struct fib_table *alias) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 3bf685f..8eaeade 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -1080,7 +1080,7 @@ static int ipv6_add_addr_hash(struct net_device *dev, struct inet6_ifaddr *ifa) goto out; } - ifa = kzalloc(sizeof(*ifa), gfp_flags); + ifa = kzalloc(sizeof(*ifa), gfp_flags | __GFP_ACCOUNT); if (!ifa) { err = -ENOBUFS; goto out; diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 2d650dc..a8f118e 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -2449,8 +2449,8 @@ int __init fib6_init(void) int ret = -ENOMEM; fib6_node_kmem = kmem_cache_create("fib6_nodes", - sizeof(struct fib6_node), - 0, SLAB_HWCACHE_ALIGN, + sizeof(struct fib6_node), 0, + SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, NULL); if (!fib6_node_kmem) goto out; diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 7b756a7..5f7286a 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -6638,7 +6638,7 @@ int __init ip6_route_init(void) ret = -ENOMEM; ip6_dst_ops_template.kmem_cachep = kmem_cache_create("ip6_dst_cache", sizeof(struct rt6_info), 0, - SLAB_HWCACHE_ALIGN, NULL); + SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, NULL); if (!ip6_dst_ops_template.kmem_cachep) goto out;