Message ID: 5affff71-e503-9fb9-50cb-f6d48286dd52@virtuozzo.com (mailing list archive)
Series: memcg accounting from OpenVZ
On Tue, Mar 9, 2021 at 12:04 AM Vasily Averin <vvs@virtuozzo.com> wrote:
>
> OpenVZ many years accounted memory of few kernel objects,
> this helps us to prevent host memory abuse from inside memcg-limited container.

The text is cryptic but I am assuming you wanted to say that OpenVZ has remained on a kernel which was still on opt-out kmem accounting, i.e. <4.5. Now OpenVZ wants to move to a newer kernel and thus these patches are needed, right?
On 3/10/21 12:12 AM, Shakeel Butt wrote:
> On Tue, Mar 9, 2021 at 12:04 AM Vasily Averin <vvs@virtuozzo.com> wrote:
>>
>> OpenVZ many years accounted memory of few kernel objects,
>> this helps us to prevent host memory abuse from inside memcg-limited container.
>
> The text is cryptic but I am assuming you wanted to say that OpenVZ
> has remained on a kernel which was still on opt-out kmem accounting,
> i.e. <4.5. Now OpenVZ wants to move to a newer kernel and thus these
> patches are needed, right?

Something like this. Frankly speaking, I don't fully understand which arguments I should provide to get accounting for some new kind of objects accepted upstream.

OpenVZ has used its own accounting subsystem since 2001 (i.e. since the v2.2.x Linux kernels), and we have accounted all required kernel objects by using our own patches. When memcg was added upstream, Vladimir Davydov added accounting for some of these objects, but skipped others. Now OpenVZ uses RHEL7-based kernels with cgroup v1 in production, and we still account the "skipped" objects via our own patches, simply because we accounted such objects before. We're working on a rebase to newer kernels, and we would prefer to push our old patches upstream.

Thank you,
	Vasily Averin
On Wed 10-03-21 13:17:19, Vasily Averin wrote:
> On 3/10/21 12:12 AM, Shakeel Butt wrote:
[...]
> Something like this. Frankly speaking, I don't fully understand which
> arguments I should provide to get accounting for some new kind of objects
> accepted upstream.
>
> OpenVZ has used its own accounting subsystem since 2001 (i.e. since the
> v2.2.x Linux kernels), and we have accounted all required kernel objects
> by using our own patches. When memcg was added upstream, Vladimir Davydov
> added accounting for some of these objects, but skipped others. Now OpenVZ
> uses RHEL7-based kernels with cgroup v1 in production, and we still account
> the "skipped" objects via our own patches, simply because we accounted such
> objects before. We're working on a rebase to newer kernels, and we would
> prefer to push our old patches upstream.

That is certainly interesting information. But for a changelog it would be more appropriate to provide information about how much memory a user can induce and whether there is any way to limit that memory by other means, how practical those other means are, and which use cases will benefit from the containment.
On 3/10/21 1:41 PM, Michal Hocko wrote:
> On Wed 10-03-21 13:17:19, Vasily Averin wrote:
[...]
> That is certainly interesting information. But for a changelog it
> would be more appropriate to provide information about how much memory
> a user can induce and whether there is any way to limit that memory by
> other means, how practical those other means are, and which use cases
> will benefit from the containment.

Right now I would like to understand how I should justify my requests to account new kinds of objects.

Which description is enough to enable object accounting? Could you please specify some ground rules? Should I push such patches through this list? Or is it better to send them to the mailing lists of the corresponding subsystems? Should I at least notify them somehow?

"An untrusted netadmin inside a memcg-limited container can create an unlimited number of routing entries and trigger an OOM on the host, which will be unable to find the real cause of the memory shortage and will kill huge processes."

"Each mount inside a memcg-limited container creates a non-accounted mount object, and creating a new mount namespace consumes a huge amount of non-accounted memory for the cloned mounts."

"An unprivileged user inside a memcg-limited container can create non-accounted multi-page per-thread kernel objects for the LDT."

"Non-accounted multi-page tty objects can be created from inside a memcg-limited container."

"An unprivileged user inside a memcg-limited container can trigger the creation of a huge number of non-accounted fasync_struct objects."

Thank you,
	Vasily Averin
On Thu 11-03-21 10:00:17, Vasily Averin wrote:
> On 3/10/21 1:41 PM, Michal Hocko wrote:
[...]
> > That is certainly interesting information. But for a changelog it
> > would be more appropriate to provide information about how much memory
> > a user can induce and whether there is any way to limit that memory by
> > other means, how practical those other means are, and which use cases
> > will benefit from the containment.
>
> Right now I would like to understand how I should justify my requests
> to account new kinds of objects.
>
> Which description is enough to enable object accounting?

Doesn't the above paragraph give you a hint?

> Could you please specify some ground rules?

There are no strong rules AFAIK. I would say that what is important is that the user can trigger a large or unbounded number of objects.

> Should I push such patches through this list?

Yes, linux-mm with the memcg maintainers CCed is the proper way. It would be great to CC the maintainers of the affected subsystem as well.

> Or is it better to send them to the mailing lists of the corresponding subsystems?
> Should I at least notify them somehow?
>
> "An untrusted netadmin inside a memcg-limited container can create an
> unlimited number of routing entries and trigger an OOM on the host, which
> will be unable to find the real cause of the memory shortage and will
> kill huge processes."
>
> "Each mount inside a memcg-limited container creates a non-accounted mount
> object, and creating a new mount namespace consumes a huge amount of
> non-accounted memory for the cloned mounts."
>
> "An unprivileged user inside a memcg-limited container can create
> non-accounted multi-page per-thread kernel objects for the LDT."
>
> "Non-accounted multi-page tty objects can be created from inside a
> memcg-limited container."
>
> "An unprivileged user inside a memcg-limited container can trigger the
> creation of a huge number of non-accounted fasync_struct objects."

OK, that sounds better to me. It would also be great if you could mention whether there are any other means to limit those objects, if there are any.

Thanks!
On Wed, Mar 10, 2021 at 11:00 PM Vasily Averin <vvs@virtuozzo.com> wrote:
> On 3/10/21 1:41 PM, Michal Hocko wrote:
[...]
> Right now I would like to understand how I should justify my requests
> to account new kinds of objects.
>
> Which description is enough to enable object accounting?
> Could you please specify some ground rules?
> Should I push such patches through this list?
> Or is it better to send them to the mailing lists of the corresponding subsystems?
> Should I at least notify them somehow?
>
> "An untrusted netadmin inside a memcg-limited container can create an
> unlimited number of routing entries and trigger an OOM on the host, which
> will be unable to find the real cause of the memory shortage and will
> kill huge processes."
>
> "Each mount inside a memcg-limited container creates a non-accounted mount
> object, and creating a new mount namespace consumes a huge amount of
> non-accounted memory for the cloned mounts."
>
> "An unprivileged user inside a memcg-limited container can create
> non-accounted multi-page per-thread kernel objects for the LDT."
>
> "Non-accounted multi-page tty objects can be created from inside a
> memcg-limited container."
>
> "An unprivileged user inside a memcg-limited container can trigger the
> creation of a huge number of non-accounted fasync_struct objects."

I think the above reasoning is good enough. Just resend your patches with the corresponding details.
On Mon, Mar 15, 2021 at 03:24:01PM +0300, Vasily Averin wrote:
> Unprivileged user inside memcg-limited container can create
> non-accounted multi-page per-thread kernel objects for LDT

I have a hard time parsing this commit message.

And I'm CCed only on patch 8 of what looks like a patchset. And that patchset is not on lkml, so I can't find the rest to read about it; perhaps it is on linux-mm.

/me goes and finds it on lore

I can see some bits and pieces, this, for example:

https://lore.kernel.org/linux-mm/05c448c7-d992-8d80-b423-b80bf5446d7c@virtuozzo.com/

( Btw, that version has your SOB and this patch doesn't even have a Signed-off-by. Next time, run your whole set through checkpatch please before sending. )

Now, this URL above talks about OOM; ok, that gets me close to the "why" of this patch.

From a quick look at the ldt.c code, we allow a single LDT struct per mm. The man page says so too:

DESCRIPTION
       modify_ldt() reads or writes the local descriptor table (LDT) for a process.
       The LDT is an array of segment descriptors that can be referenced by user code.
       Linux allows processes to configure a per-process (actually per-mm) LDT.

We allow

	/* Maximum number of LDT entries supported. */
	#define LDT_ENTRIES 8192

so there's an upper limit per mm.

Now, please explain: what is this accounting for?

Thx.
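The allocation in question can be forced to its maximum size from unprivileged userspace. A minimal hypothetical reproducer sketch (x86-only, not part of the patch series) that should make the kernel allocate a full-size LDT for the calling mm:

	/*
	 * Hypothetical sketch: writing the last LDT slot via modify_ldt(2)
	 * makes the kernel size the per-mm table for all LDT_ENTRIES
	 * descriptors, i.e. 8192 * 8 bytes = 64 KiB of kernel memory.
	 * Spawning many such processes multiplies the unaccounted memory.
	 */
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <asm/ldt.h>		/* struct user_desc, LDT_ENTRIES; x86 only */

	int main(void)
	{
		struct user_desc desc;

		memset(&desc, 0, sizeof(desc));
		desc.seg_32bit = 1;
		desc.limit = 0xfff;
		desc.entry_number = LDT_ENTRIES - 1;	/* 8191: forces a full-size table */

		if (syscall(SYS_modify_ldt, 1, &desc, sizeof(desc)) != 0) {
			perror("modify_ldt");
			return 1;
		}
		pause();	/* keep the mm (and its LDT) alive */
		return 0;
	}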
On 3/15/21 6:23 AM, Vasily Averin wrote:
> An untrusted netadmin inside a memcg-limited container can create a
> huge number of routing entries. Currently, the allocated kernel objects
> are not accounted to the proper memcg, so this can lead to a global
> memory shortage on the host and cause a lot of OOM kills.
>
> One such object is 'struct fib6_node', mostly allocated in
> net/ipv6/route.c::__ip6_ins_rt() inside the lock_bh()/unlock_bh() section:
>
>	write_lock_bh(&table->tb6_lock);
>	err = fib6_add(&table->tb6_root, rt, info, mxc);
>	write_unlock_bh(&table->tb6_lock);
>
> In this case it is not enough to simply add SLAB_ACCOUNT to the
> corresponding kmem cache. The proper memory cgroup still cannot be found
> due to the incorrect 'in_interrupt()' check used in memcg_kmem_bypass().
> To reliably detect that the caller is not executing in process context,
> the '!in_task()' check should be used instead.
> ---
>  mm/memcontrol.c    | 2 +-
>  net/ipv6/ip6_fib.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

Acked-by: David Ahern <dsahern@kernel.org>
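For a sense of scale, a hypothetical flood sketch of the netadmin abuse described above (illustrative only, not from the series; assumes CAP_NET_ADMIN inside the container and a "dummy0" test interface that is up, and shells out to ip(8) for brevity where a real reproducer might speak rtnetlink directly):

	/*
	 * Each /128 route inserted below allocates fib6_node/rt6_info
	 * objects in the kernel, none of which were charged to the
	 * container's memcg before this series.
	 */
	#include <stdio.h>
	#include <stdlib.h>

	int main(void)
	{
		char cmd[128];
		unsigned int i;

		for (i = 0; i < 1000000; i++) {
			snprintf(cmd, sizeof(cmd),
				 "ip -6 route add 2001:db8:%x:%x::1/128 dev dummy0",
				 i >> 16, i & 0xffff);
			if (system(cmd) != 0)
				break;		/* table full, OOM, or no permission */
		}
		return 0;
	}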
On 3/15/21 6:23 AM, Vasily Averin wrote:
> An untrusted netadmin inside a memcg-limited container can create a
> huge number of routing entries. Currently, the allocated kernel objects
> are not accounted to the proper memcg, so this can lead to a global
> memory shortage on the host and cause a lot of OOM kills.
>
> This patch enables accounting for 'struct rt6_info' allocations.
> ---
>  net/ipv6/route.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Acked-by: David Ahern <dsahern@kernel.org>

On 3/15/21 6:23 AM, Vasily Averin wrote:
> An untrusted netadmin inside a memcg-limited container can create a
> huge number of routing entries. Currently, the allocated kernel objects
> are not accounted to the proper memcg, so this can lead to a global
> memory shortage on the host and cause a lot of OOM kills.
>
> This patch enables accounting for 'struct fib_rules' allocations.
> ---
>  net/core/fib_rules.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Acked-by: David Ahern <dsahern@kernel.org>

On 3/15/21 6:23 AM, Vasily Averin wrote:
> An untrusted netadmin inside a memcg-limited container can create a
> huge number of routing entries. Currently, the allocated kernel objects
> are not accounted to the proper memcg, so this can lead to a global
> memory shortage on the host and cause a lot of OOM kills.
>
> This patch enables accounting for the ip_fib_alias and ip_fib_trie caches.
> ---
>  net/ipv4/fib_trie.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Acked-by: David Ahern <dsahern@kernel.org>
On Mon, Mar 15, 2021 at 5:23 AM Vasily Averin <vvs@virtuozzo.com> wrote:
>
> An untrusted netadmin inside a memcg-limited container can create a
> huge number of routing entries. Currently, allocated kernel objects
> are not accounted to proper memcg, so this can lead to global memory
> shortage on the host and cause lot of OOM kiils.
>
> One such object is the 'struct fib6_node' mostly allocated in
> net/ipv6/route.c::__ip6_ins_rt() inside the lock_bh()/unlock_bh() section:
>
>	write_lock_bh(&table->tb6_lock);
>	err = fib6_add(&table->tb6_root, rt, info, mxc);
>	write_unlock_bh(&table->tb6_lock);
>
> It this case is

'In this case it is'

> not enough to simply add SLAB_ACCOUNT to corresponding
> kmem cache. The proper memory cgroup still cannot be found due to the
> incorrect 'in_interrupt()' check used in memcg_kmem_bypass().
> To be sure that caller is not executed in process contxt

'context'

> '!in_task()' check should be used instead

You missed the signoff, and it seems like the whole series is missing it as well. Please run scripts/checkpatch.pl on the patches before sending again.

> ---
>  mm/memcontrol.c    | 2 +-
>  net/ipv6/ip6_fib.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 845eec0..568f2cb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1076,7 +1076,7 @@ static __always_inline bool memcg_kmem_bypass(void)
>  		return false;
>
>  	/* Memcg to charge can't be determined. */
> -	if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD))
> +	if (!in_task() || !current->mm || (current->flags & PF_KTHREAD))

Can you please also add some explanation in the commit message on the differences between in_interrupt() and in_task()? Why is in_interrupt() not correct here but !in_task() is? What about kernels with or without PREEMPT_COUNT?

>  		return true;
>
>  	return false;
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index ef9d022..fa92ed1 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -2445,7 +2445,7 @@ int __init fib6_init(void)
>
>  	fib6_node_kmem = kmem_cache_create("fib6_nodes",
>  					   sizeof(struct fib6_node),
> -					   0, SLAB_HWCACHE_ALIGN,
> +					   0, SLAB_HWCACHE_ALIGN|SLAB_ACCOUNT,
>  					   NULL);
>  	if (!fib6_node_kmem)
>  		goto out;
> --
> 1.8.3.1
>
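The difference between the two checks comes down to which preempt_count() bits they test. A simplified sketch, based loosely on include/linux/preempt.h of kernels from this era (not a verbatim copy):

	#define in_interrupt()	(preempt_count() & (NMI_MASK | HARDIRQ_MASK \
							| SOFTIRQ_MASK))
	#define in_task()	(!(preempt_count() & (NMI_MASK | HARDIRQ_MASK \
							| SOFTIRQ_OFFSET)))

	/*
	 * SOFTIRQ_MASK covers the whole softirq byte of preempt_count(),
	 * which write_lock_bh()/spin_lock_bh() bump via
	 * SOFTIRQ_DISABLE_OFFSET, so in_interrupt() is true inside any
	 * BH-disabled section even when the caller is an ordinary task.
	 * SOFTIRQ_OFFSET is only set while actually serving a softirq, so
	 * !in_task() stays false for a task that merely holds a _bh lock,
	 * which is why the fib6_add() path above was wrongly bypassed by
	 * the in_interrupt() check.
	 */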
On Mon, Mar 15, 2021 at 6:27 AM Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Mar 15, 2021 at 03:24:01PM +0300, Vasily Averin wrote:
[...]
> We allow
>
>	/* Maximum number of LDT entries supported. */
>	#define LDT_ENTRIES 8192
>
> so there's an upper limit per mm.
>
> Now, please explain: what is this accounting for?

Let me try to provide the reasoning, at least from my perspective. There are legitimate workloads with hundreds of processes, and there can be hundreds of workloads running on large machines. The unaccounted memory can cause isolation issues between the workloads, particularly on highly utilized machines.
On Mon, Mar 15, 2021 at 5:23 AM Vasily Averin <vvs@virtuozzo.com> wrote:
>
> An unprivileged user inside a memcg-limited container can trigger the
> creation of a huge number of non-accounted fasync_struct objects.

You need to make each patch of this series self-contained by including the motivation behind the series (just one or two sentences). For example, for this patch, include the potential impact of these huge numbers of unaccounted fasync_struct objects.

> ---
>  fs/fcntl.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index dfc72f1..7941559 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -1049,7 +1049,8 @@ static int __init fcntl_init(void)
>  			__FMODE_EXEC | __FMODE_NONOTIFY));
>
>  	fasync_cache = kmem_cache_create("fasync_cache",
> -		sizeof(struct fasync_struct), 0, SLAB_PANIC, NULL);
> +		sizeof(struct fasync_struct), 0,
> +		SLAB_PANIC | SLAB_ACCOUNT, NULL);
>  	return 0;
>  }
>
> --
> 1.8.3.1
>
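As a concrete illustration of the missing motivation, a hypothetical reproducer sketch (not from the series): each fd put on an fasync list gets a kernel-allocated fasync_struct, so an unprivileged process can mint one object per fd it is allowed to open.

	#include <stdio.h>
	#include <unistd.h>
	#include <fcntl.h>

	int main(void)
	{
		int fds[2];

		/* raise RLIMIT_NOFILE first, or run many such processes,
		 * for a larger effect */
		for (;;) {
			if (pipe(fds) < 0) {
				perror("pipe");	/* hit the fd limit */
				break;
			}
			/* F_SETOWN + O_ASYNC attach a fasync_struct to the fd */
			if (fcntl(fds[0], F_SETOWN, getpid()) < 0 ||
			    fcntl(fds[0], F_SETFL, O_ASYNC) < 0) {
				perror("fcntl");
				break;
			}
		}
		pause();	/* hold the objects */
		return 0;
	}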
On Mon 15-03-21 08:48:26, Shakeel Butt wrote:
> On Mon, Mar 15, 2021 at 6:27 AM Borislav Petkov <bp@alien8.de> wrote:
[...]
> > Now, please explain: what is this accounting for?
>
> Let me try to provide the reasoning, at least from my perspective.
> There are legitimate workloads with hundreds of processes, and there
> can be hundreds of workloads running on large machines. The
> unaccounted memory can cause isolation issues between the workloads,
> particularly on highly utilized machines.

It would be better to be explicit: 8192 entries * 8 bytes = 64kB per mm, multiplied by the number of tasks, so realistically this is in the range of lower megabytes. Is this worth the memcg accounting overhead? Maybe yes, but what kind of workloads really care?
On Mon, Mar 15, 2021 at 08:48:26AM -0700, Shakeel Butt wrote:
> Let me try to provide the reasoning, at least from my perspective.
> There are legitimate workloads with hundreds of processes, and there
> can be hundreds of workloads running on large machines. The
> unaccounted memory can cause isolation issues between the workloads,
> particularly on highly utilized machines.

Good enough for me, as long as that is part of the commit message.

Thx.
On Mon, 15 Mar 2021 15:23:00 +0300 Vasily Averin wrote:
> An untrusted netadmin inside a memcg-limited container can create a
> huge number of routing entries. Currently, the allocated kernel objects
> are not accounted to the proper memcg, so this can lead to a global
> memory shortage on the host and cause a lot of OOM kills.
>
> One such object is 'struct fib6_node', mostly allocated in
> net/ipv6/route.c::__ip6_ins_rt() inside the lock_bh()/unlock_bh() section:
>
>	write_lock_bh(&table->tb6_lock);
>	err = fib6_add(&table->tb6_root, rt, info, mxc);
>	write_unlock_bh(&table->tb6_lock);
>
> In this case it is not enough to simply add SLAB_ACCOUNT to the
> corresponding kmem cache. The proper memory cgroup still cannot be found
> due to the incorrect 'in_interrupt()' check used in memcg_kmem_bypass().
> To reliably detect that the caller is not executing in process context,
> the '!in_task()' check should be used instead.

Sorry for a random question, I didn't get the cover letter.

What's the overhead of adding SLAB_ACCOUNT?

Please make sure you CC netdev on series which may impact networking.
On Mon, Mar 15, 2021 at 10:09 AM Jakub Kicinski <kuba@kernel.org> wrote:
> On Mon, 15 Mar 2021 15:23:00 +0300 Vasily Averin wrote:
[...]
> Sorry for a random question, I didn't get the cover letter.
>
> What's the overhead of adding SLAB_ACCOUNT?

The potential overhead is for MEMCG users, where we need to charge/account each allocation from SLAB_ACCOUNT kmem caches. However, charging is done in batches, so the cost is amortized. If there is a concern about a specific workload, then it would be good to see the impact of this patch for that workload.

> Please make sure you CC netdev on series which may impact networking.
On Mon, Mar 15, 2021 at 12:24:31PM -0700, Shakeel Butt wrote:
> On Mon, Mar 15, 2021 at 10:09 AM Jakub Kicinski <kuba@kernel.org> wrote:
[...]
> > What's the overhead of adding SLAB_ACCOUNT?
>
> The potential overhead is for MEMCG users, where we need to
> charge/account each allocation from SLAB_ACCOUNT kmem caches. However,
> charging is done in batches, so the cost is amortized. If there is a
> concern about a specific workload, then it would be good to see the
> impact of this patch for that workload.
>
> > Please make sure you CC netdev on series which may impact networking.

In general the overhead is not that big, so I don't think we should argue too much about every new case where we want to enable the accounting, and should rather focus on those few examples (if any?) where it actually hurts performance in a meaningful way.

Thanks!
On Mon, 15 Mar 2021 12:32:07 -0700 Roman Gushchin wrote:
> On Mon, Mar 15, 2021 at 12:24:31PM -0700, Shakeel Butt wrote:
[...]
> In general the overhead is not that big, so I don't think we should argue
> too much about every new case where we want to enable the accounting, and
> should rather focus on those few examples (if any?) where it actually
> hurts performance in a meaningful way.

Ack, no serious concerns about this particular case. I was expecting you'd have micro-benchmark numbers handy, so I was curious to learn what they are, but that appears not to be the case.
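A rough sketch of the kind of micro-benchmark being asked about (illustrative only; assumes a "dummy0" test interface that is up, and the interesting comparison is the same binary run against kernels with and without SLAB_ACCOUNT on fib6_nodes, inside a kmem-limited memcg):

	/*
	 * Note: the exec overhead of ip(8) dominates here; a serious
	 * measurement would batch route inserts via 'ip -batch' or speak
	 * rtnetlink directly.
	 */
	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>

	#define N_ROUTES 10000

	static double now(void)
	{
		struct timespec ts;

		clock_gettime(CLOCK_MONOTONIC, &ts);
		return ts.tv_sec + ts.tv_nsec / 1e9;
	}

	int main(void)
	{
		char cmd[128];
		double t0 = now();
		unsigned int i;

		for (i = 0; i < N_ROUTES; i++) {
			snprintf(cmd, sizeof(cmd),
				 "ip -6 route add 2001:db8:0:%x::1/128 dev dummy0", i);
			if (system(cmd) != 0)
				return 1;
		}
		printf("%u route inserts in %.3f s\n", N_ROUTES, now() - t0);
		return 0;
	}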