
[v6,07/10] mm: synchronize access to kmem_cache dying flag using a spinlock

Message ID: 20190605024454.1393507-8-guro@fb.com (mailing list archive)
State: New, archived
Series: mm: reparent slab memory on cgroup removal

Commit Message

Roman Gushchin June 5, 2019, 2:44 a.m. UTC
Currently the memcg_params.dying flag and the corresponding
workqueue used for the asynchronous deactivation of kmem_caches
are synchronized using the slab_mutex.

This makes it impossible to check the flag from irq context,
which will be required in order to implement asynchronous release
of kmem_caches.

So let's switch over to an irq-safe flavor of spinlock-based
synchronization.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/slab_common.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)
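
The constraint being worked around, illustrated (a minimal sketch, not
code from this series; set_dying() is a hypothetical helper, while
memcg_kmem_wq_lock is the lock the patch adds):

	static DEFINE_SPINLOCK(memcg_kmem_wq_lock);

	/* Hypothetical helper: safe to pair with readers in irq context. */
	static void set_dying(struct kmem_cache *s)
	{
		/*
		 * An irq-disabling spinlock can be taken from any context
		 * (irq handlers would use spin_lock_irqsave()), while
		 * mutex_lock() may sleep and is forbidden in irq context.
		 */
		spin_lock_irq(&memcg_kmem_wq_lock);
		s->memcg_params.dying = true;
		spin_unlock_irq(&memcg_kmem_wq_lock);
	}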

Comments

Johannes Weiner June 5, 2019, 4:56 p.m. UTC | #1
On Tue, Jun 04, 2019 at 07:44:51PM -0700, Roman Gushchin wrote:
> [...]
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 09b26673b63f..2914a8f0aa85 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -130,6 +130,7 @@ int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t nr,
>  #ifdef CONFIG_MEMCG_KMEM
>  
>  LIST_HEAD(slab_root_caches);
> +static DEFINE_SPINLOCK(memcg_kmem_wq_lock);
>  
>  void slab_init_memcg_params(struct kmem_cache *s)
>  {
> @@ -629,6 +630,7 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg,
>  	struct memcg_cache_array *arr;
>  	struct kmem_cache *s = NULL;
>  	char *cache_name;
> +	bool dying;
>  	int idx;
>  
>  	get_online_cpus();
> @@ -640,7 +642,13 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg,
>  	 * The memory cgroup could have been offlined while the cache
>  	 * creation work was pending.
>  	 */
> -	if (memcg->kmem_state != KMEM_ONLINE || root_cache->memcg_params.dying)
> +	if (memcg->kmem_state != KMEM_ONLINE)
> +		goto out_unlock;
> +
> +	spin_lock_irq(&memcg_kmem_wq_lock);
> +	dying = root_cache->memcg_params.dying;
> +	spin_unlock_irq(&memcg_kmem_wq_lock);
> +	if (dying)
>  		goto out_unlock;

What does this lock protect? The dying flag could get set right after
the unlock.
Roman Gushchin June 5, 2019, 10:02 p.m. UTC | #2
On Wed, Jun 05, 2019 at 12:56:16PM -0400, Johannes Weiner wrote:
> On Tue, Jun 04, 2019 at 07:44:51PM -0700, Roman Gushchin wrote:
> > [...]
> 
> What does this lock protect? The dying flag could get set right after
> the unlock.
>

Hi Johannes!

Here is my logic:

1) flush_memcg_workqueue() must guarantee that no new memcg kmem_caches
will be created and that no queued works will touch the root kmem_cache,
so that it can be released
2) to that end it sets the dying flag, waits for an rcu grace period and
flushes the workqueue (i.e. waits for all in-flight works)
3) the dying flag is checked in kmemcg_cache_shutdown() and
kmemcg_cache_deactivate(), so that if it is set, no new works/rcu tasks
will be queued; the corresponding queue_work()/call_rcu() calls are all
made under memcg_kmem_wq_lock
4) memcg_schedule_kmem_cache_create() doesn't check the dying flag
(probably to avoid taking locks on a hot path), but
memcg_create_kmem_cache(), the scheduled work itself, does, and it
checks it at the very beginning, so even if new kmem_cache creations
are scheduled, the root kmem_cache won't be touched.

Previously the flag was checked under the slab_mutex, but now we set it
under memcg_kmem_wq_lock. So I'm not sure we can read it without taking
this lock.

If the flag gets set right after the unlock, that's fine: it means the
work has already been scheduled, and flush_workqueue() in
flush_memcg_workqueue() will wait for it. The only real problem would be
not seeing the flag after flush_workqueue() has been called, and I don't
see how that's possible.
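
Roughly, the ordering this relies on (a condensed sketch of the patched
code, simplified from the diff below; memcg_kmem_cache_wq is assumed to
be the workqueue used elsewhere in mm):

	/* destruction side: flush_memcg_workqueue(), simplified */
	spin_lock_irq(&memcg_kmem_wq_lock);
	s->memcg_params.dying = true;	/* no new works/rcu tasks after this */
	spin_unlock_irq(&memcg_kmem_wq_lock);

	rcu_barrier();			/* wait for already queued rcu callbacks */
	flush_workqueue(memcg_kmem_cache_wq);	/* wait for in-flight works */

	/* deactivation side: kmemcg_cache_deactivate(), simplified */
	spin_lock_irq(&memcg_kmem_wq_lock);
	if (!s->memcg_params.root_cache->memcg_params.dying)
		call_rcu(&s->memcg_params.rcu_head, kmemcg_rcufn);
	spin_unlock_irq(&memcg_kmem_wq_lock);

Since both the write and the check happen under memcg_kmem_wq_lock, any
call_rcu()/queue_work() issued while the flag is clear is ordered before
the flag is set, and is therefore waited for by the subsequent
rcu_barrier()/flush_workqueue().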

Does this make sense? I'm sure there are ways to make it more obvious.
Please let me know if you have any ideas.

Thank you!
Roman Gushchin June 6, 2019, 12:48 a.m. UTC | #3
On Wed, Jun 05, 2019 at 03:02:03PM -0700, Roman Gushchin wrote:
> [...]
> 
> Does this make sense? I'm sure there are ways to make it more obvious.
> Please let me know if you have any ideas.

Hm, after some thought, I've realized that the problem is that we check
the dying flag of the root cache. But the existing code has the same issue.

So currently (without my patches):
1) we set the dying flag under the slab_mutex
2) wait for the workqueue to flush
3) grab the slab_mutex and release the root kmem_cache

A concurrent memcg_kmem_cache_create_func() can be scheduled after 2),
grab the slab_mutex after 3), and then check the
kmem_cache->memcg_params.dying flag of an already released kmem_cache.
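
The interleaving, spelled out (a hypothetical timeline, for
illustration):

	CPU0: kmem_cache_destroy(root)       CPU1: creation work
	------------------------------       -------------------
	1) set root->memcg_params.dying
	2) flush_workqueue()
	                                     memcg_kmem_cache_create_func()
	                                     gets scheduled
	3) mutex_lock(&slab_mutex)
	   release the root kmem_cache
	   mutex_unlock(&slab_mutex)
	                                     mutex_lock(&slab_mutex)
	                                     reads root->memcg_params.dying
	                                     of the freed cache: use-after-free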

The reason this isn't a real problem is that users are expected not to
allocate from a kmem_cache after calling kmem_cache_destroy(). That means
no new memcg kmem_cache creation will be scheduled, so we can avoid
checking the dying flag here at all.

Does this make sense?

Thanks!
Vladimir Davydov June 9, 2019, 2:31 p.m. UTC | #4
On Tue, Jun 04, 2019 at 07:44:51PM -0700, Roman Gushchin wrote:
> [...]
> @@ -640,7 +642,13 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg,
>  	 * The memory cgroup could have been offlined while the cache
>  	 * creation work was pending.
>  	 */
> -	if (memcg->kmem_state != KMEM_ONLINE || root_cache->memcg_params.dying)
> +	if (memcg->kmem_state != KMEM_ONLINE)
> +		goto out_unlock;
> +
> +	spin_lock_irq(&memcg_kmem_wq_lock);
> +	dying = root_cache->memcg_params.dying;
> +	spin_unlock_irq(&memcg_kmem_wq_lock);
> +	if (dying)
>  		goto out_unlock;

I do understand why we need to sync setting the dying flag for a kmem
cache about to be destroyed in flush_memcg_workqueue vs checking the flag
in kmemcg_cache_deactivate: this is needed so that we don't schedule a
new deactivation work after we flush RCU/workqueue. However, I don't
think it's necessary to check the dying flag here, in
memcg_create_kmem_cache: we can't schedule a new cache creation work
after kmem_cache_destroy has started, because one mustn't allocate from a
dead kmem cache; since we flush the queue before getting to the actual
destruction, no cache creation work can be pending. Yeah, it might happen
that a cache creation work starts execution while flush_memcg_workqueue
is in progress, but I don't see any point in optimizing this case - after
all, cache destruction is a very cold path. Since checking the flag in
memcg_create_kmem_cache raises questions, I suggest simply dropping this
check.

Anyway, it would be nice to see a comment in the code explaining why we
check the dying flag under a spin lock in kmemcg_cache_deactivate.
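
Something along these lines (a suggested comment only, not part of the
posted patch) above the check in kmemcg_cache_deactivate(), quoted just
below, would do:

	/*
	 * memcg_kmem_wq_lock orders this check against the setting of
	 * the dying flag in flush_memcg_workqueue(): either we observe
	 * the flag set and bail out, or our call_rcu() below is issued
	 * under the lock before the flag is set, in which case the
	 * subsequent rcu_barrier()/flush_workqueue() will wait for it.
	 */
	spin_lock_irq(&memcg_kmem_wq_lock);
	if (s->memcg_params.root_cache->memcg_params.dying)
		goto unlock;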

>  
>  	idx = memcg_cache_id(memcg);
> @@ -735,14 +743,17 @@ static void kmemcg_cache_deactivate(struct kmem_cache *s)
>  
>  	__kmemcg_cache_deactivate(s);
>  
> +	spin_lock_irq(&memcg_kmem_wq_lock);
>  	if (s->memcg_params.root_cache->memcg_params.dying)
> -		return;
> +		goto unlock;
>  
>  	/* pin memcg so that @s doesn't get destroyed in the middle */
>  	css_get(&s->memcg_params.memcg->css);
>  
>  	s->memcg_params.work_fn = __kmemcg_cache_deactivate_after_rcu;
>  	call_rcu(&s->memcg_params.rcu_head, kmemcg_rcufn);
> +unlock:
> +	spin_unlock_irq(&memcg_kmem_wq_lock);
>  }
>  
>  void memcg_deactivate_kmem_caches(struct mem_cgroup *memcg)
> @@ -852,9 +863,9 @@ static int shutdown_memcg_caches(struct kmem_cache *s)
>  
>  static void flush_memcg_workqueue(struct kmem_cache *s)
>  {
> -	mutex_lock(&slab_mutex);
> +	spin_lock_irq(&memcg_kmem_wq_lock);
>  	s->memcg_params.dying = true;
> -	mutex_unlock(&slab_mutex);
> +	spin_unlock_irq(&memcg_kmem_wq_lock);
>  
>  	/*
>  	 * SLAB and SLUB deactivate the kmem_caches through call_rcu. Make
Roman Gushchin June 10, 2019, 8:46 p.m. UTC | #5
On Sun, Jun 09, 2019 at 05:31:32PM +0300, Vladimir Davydov wrote:
> [...] Since checking the flag in memcg_create_kmem_cache
> raises questions, I suggest simply dropping this check.

Yeah, I came to the same conclusion (in a thread with Johannes) that this
check is not required. I'll drop it in a separate patch.

> 
> Anyway, it would be nice to see a comment in the code explaining why we
> check the dying flag under a spin lock in kmemcg_cache_deactivate.

Sure, will add one.

Btw, thank you very much for reviewing the series!

Patch

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 09b26673b63f..2914a8f0aa85 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -130,6 +130,7 @@ int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t nr,
 #ifdef CONFIG_MEMCG_KMEM
 
 LIST_HEAD(slab_root_caches);
+static DEFINE_SPINLOCK(memcg_kmem_wq_lock);
 
 void slab_init_memcg_params(struct kmem_cache *s)
 {
@@ -629,6 +630,7 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg,
 	struct memcg_cache_array *arr;
 	struct kmem_cache *s = NULL;
 	char *cache_name;
+	bool dying;
 	int idx;
 
 	get_online_cpus();
@@ -640,7 +642,13 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg,
 	 * The memory cgroup could have been offlined while the cache
 	 * creation work was pending.
 	 */
-	if (memcg->kmem_state != KMEM_ONLINE || root_cache->memcg_params.dying)
+	if (memcg->kmem_state != KMEM_ONLINE)
+		goto out_unlock;
+
+	spin_lock_irq(&memcg_kmem_wq_lock);
+	dying = root_cache->memcg_params.dying;
+	spin_unlock_irq(&memcg_kmem_wq_lock);
+	if (dying)
 		goto out_unlock;
 
 	idx = memcg_cache_id(memcg);
@@ -735,14 +743,17 @@ static void kmemcg_cache_deactivate(struct kmem_cache *s)
 
 	__kmemcg_cache_deactivate(s);
 
+	spin_lock_irq(&memcg_kmem_wq_lock);
 	if (s->memcg_params.root_cache->memcg_params.dying)
-		return;
+		goto unlock;
 
 	/* pin memcg so that @s doesn't get destroyed in the middle */
 	css_get(&s->memcg_params.memcg->css);
 
 	s->memcg_params.work_fn = __kmemcg_cache_deactivate_after_rcu;
 	call_rcu(&s->memcg_params.rcu_head, kmemcg_rcufn);
+unlock:
+	spin_unlock_irq(&memcg_kmem_wq_lock);
 }
 
 void memcg_deactivate_kmem_caches(struct mem_cgroup *memcg)
@@ -852,9 +863,9 @@ static int shutdown_memcg_caches(struct kmem_cache *s)
 
 static void flush_memcg_workqueue(struct kmem_cache *s)
 {
-	mutex_lock(&slab_mutex);
+	spin_lock_irq(&memcg_kmem_wq_lock);
 	s->memcg_params.dying = true;
-	mutex_unlock(&slab_mutex);
+	spin_unlock_irq(&memcg_kmem_wq_lock);
 
 	/*
 	 * SLAB and SLUB deactivate the kmem_caches through call_rcu. Make