
[v2,1/1] mm/vmalloc: remove preempt_disable/enable when doing preloading

Message ID 20191010223318.28115-1-urezki@gmail.com (mailing list archive)
State New, archived

Commit Message

Uladzislau Rezki Oct. 10, 2019, 10:33 p.m. UTC
Get rid of preempt_disable() and preempt_enable() when the
preload is done for splitting purposes. The reason is that
calling spin_lock() with preemption disabled is forbidden in
a CONFIG_PREEMPT_RT kernel.

Therefore, we no longer guarantee that a CPU is preloaded;
instead, with this change we minimize the cases when it is not.

For example, I ran a special test case that follows the preload
pattern and path. 20 "unbind" threads ran it, each doing
1000000 allocations. On average, a CPU was not preloaded only
3.5 times per 1000000 allocations. So it can happen, but the
number is negligible.
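
A condensed view of the resulting preload path, simplified from
the diff below (error handling and unrelated code omitted):

	pva = NULL;

	/*
	 * Preload in a non-atomic context, so a permissive mask
	 * (GFP_KERNEL) can be used and preemption stays enabled
	 * around the allocation.
	 */
	if (!this_cpu_read(ne_fit_preload_node))
		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);

	spin_lock(&vmap_area_lock);

	/*
	 * Hand the extra object to whatever CPU we ended up on. If
	 * that CPU already holds a preloaded object (for example the
	 * task migrated after the read above), free the spare one.
	 */
	if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
		kmem_cache_free(vmap_area_cachep, pva);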

V1 -> V2:
  - move the __this_cpu_cmpxchg() check to after spin_lock() is taken,
    as proposed by Andrew Morton
  - add more explanation regarding preloading
  - adjust and move some comments

Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for split purpose")
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 50 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 17 deletions(-)

Comments

Sebastian Andrzej Siewior Oct. 11, 2019, 6:15 p.m. UTC | #1
On 2019-10-11 00:33:18 [+0200], Uladzislau Rezki (Sony) wrote:
> Get rid of preempt_disable() and preempt_enable() when the
> preload is done for splitting purpose. The reason is that
> calling spin_lock() with disabled preemtion is forbidden in
> CONFIG_PREEMPT_RT kernel.
> 
> Therefore, we do not guarantee that a CPU is preloaded, instead
> we minimize the case when it is not with this change.
> 
> For example i run the special test case that follows the preload
> pattern and path. 20 "unbind" threads run it and each does
> 1000000 allocations. Only 3.5 times among 1000000 a CPU was
> not preloaded. So it can happen but the number is negligible.
> 
> V1 -> V2:
>   - move __this_cpu_cmpxchg check when spin_lock is taken,
>     as proposed by Andrew Morton
>   - add more explanation in regard of preloading
>   - adjust and move some comments
> 
> Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for split purpose")
> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Thank you.

Sebastian
Daniel Wagner Oct. 11, 2019, 9:56 p.m. UTC | #2
On 10/11/19 8:15 PM, Sebastian Andrzej Siewior wrote:
> On 2019-10-11 00:33:18 [+0200], Uladzislau Rezki (Sony) wrote:
>> Get rid of preempt_disable() and preempt_enable() when the
>> preload is done for splitting purpose. The reason is that
>> calling spin_lock() with disabled preemtion is forbidden in
>> CONFIG_PREEMPT_RT kernel.
>>
>> Therefore, we do not guarantee that a CPU is preloaded, instead
>> we minimize the case when it is not with this change.
>>
>> For example i run the special test case that follows the preload
>> pattern and path. 20 "unbind" threads run it and each does
>> 1000000 allocations. Only 3.5 times among 1000000 a CPU was
>> not preloaded. So it can happen but the number is negligible.
>>
>> V1 -> V2:
>>    - move __this_cpu_cmpxchg check when spin_lock is taken,
>>      as proposed by Andrew Morton
>>    - add more explanation in regard of preloading
>>    - adjust and move some comments
>>
>> Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for split purpose")
>> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
>> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> 
> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Acked-by: Daniel Wagner <dwagner@suse.de>
Michal Hocko Oct. 14, 2019, 1:13 p.m. UTC | #3
On Fri 11-10-19 00:33:18, Uladzislau Rezki (Sony) wrote:
> Get rid of preempt_disable() and preempt_enable() when the
> preload is done for splitting purpose. The reason is that
> calling spin_lock() with disabled preemtion is forbidden in
> CONFIG_PREEMPT_RT kernel.

I think it would be really helpful to describe why the preemption was
disabled in that path. Some of that is explained in the comment but the
changelog should mention that explicitly.
 
> Therefore, we do not guarantee that a CPU is preloaded, instead
> we minimize the case when it is not with this change.
> 
> For example i run the special test case that follows the preload
> pattern and path. 20 "unbind" threads run it and each does
> 1000000 allocations. Only 3.5 times among 1000000 a CPU was
> not preloaded. So it can happen but the number is negligible.
> 
> V1 -> V2:
>   - move __this_cpu_cmpxchg check when spin_lock is taken,
>     as proposed by Andrew Morton
>   - add more explanation in regard of preloading
>   - adjust and move some comments
> 
> Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for split purpose")
> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  mm/vmalloc.c | 50 +++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 33 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e92ff5f7dd8b..f48cd0711478 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -969,6 +969,19 @@ adjust_va_to_fit_type(struct vmap_area *va,
>  			 * There are a few exceptions though, as an example it is
>  			 * a first allocation (early boot up) when we have "one"
>  			 * big free space that has to be split.
> +			 *
> +			 * Also we can hit this path in case of regular "vmap"
> +			 * allocations, if "this" current CPU was not preloaded.
> +			 * See the comment in alloc_vmap_area() why. If so, then
> +			 * GFP_NOWAIT is used instead to get an extra object for
> +			 * split purpose. That is rare and most time does not
> +			 * occur.
> +			 *
> +			 * What happens if an allocation gets failed. Basically,
> +			 * an "overflow" path is triggered to purge lazily freed
> +			 * areas to free some memory, then, the "retry" path is
> +			 * triggered to repeat one more time. See more details
> +			 * in alloc_vmap_area() function.
>  			 */
>  			lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);

This doesn't seem to have anything to do with the patch. Have you
considered making it a patch on its own? Btw, I find this comment very
useful!

>  			if (!lva)
> @@ -1078,31 +1091,34 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  
>  retry:
>  	/*
> -	 * Preload this CPU with one extra vmap_area object to ensure
> -	 * that we have it available when fit type of free area is
> -	 * NE_FIT_TYPE.
> +	 * Preload this CPU with one extra vmap_area object. It is used
> +	 * when fit type of free area is NE_FIT_TYPE. Please note, it
> +	 * does not guarantee that an allocation occurs on a CPU that
> +	 * is preloaded, instead we minimize the case when it is not.
> +	 * It can happen because of migration, because there is a race
> +	 * until the below spinlock is taken.

s@migration@cpu migration@ because migration on its own is quite
ambiguous, especially in the MM code where it usually refers to memory.

>  	 *
>  	 * The preload is done in non-atomic context, thus it allows us
>  	 * to use more permissive allocation masks to be more stable under
> -	 * low memory condition and high memory pressure.
> +	 * low memory condition and high memory pressure. In rare case,
> +	 * if not preloaded, GFP_NOWAIT is used.
>  	 *
> -	 * Even if it fails we do not really care about that. Just proceed
> -	 * as it is. "overflow" path will refill the cache we allocate from.
> +	 * Set "pva" to NULL here, because of "retry" path.
>  	 */
> -	preempt_disable();
> -	if (!__this_cpu_read(ne_fit_preload_node)) {
> -		preempt_enable();
> -		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
> -		preempt_disable();
> +	pva = NULL;
>  
> -		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
> -			if (pva)
> -				kmem_cache_free(vmap_area_cachep, pva);
> -		}
> -	}
> +	if (!this_cpu_read(ne_fit_preload_node))
> +		/*
> +		 * Even if it fails we do not really care about that.
> +		 * Just proceed as it is. If needed "overflow" path
> +		 * will refill the cache we allocate from.
> +		 */
> +		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
>  
>  	spin_lock(&vmap_area_lock);
> -	preempt_enable();
> +
> +	if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
> +		kmem_cache_free(vmap_area_cachep, pva);
>  
>  	/*
>  	 * If an allocation fails, the "vend" address is
> -- 
> 2.20.1
Uladzislau Rezki Oct. 14, 2019, 2:30 p.m. UTC | #4
On Mon, Oct 14, 2019 at 03:13:08PM +0200, Michal Hocko wrote:
> On Fri 11-10-19 00:33:18, Uladzislau Rezki (Sony) wrote:
> > Get rid of preempt_disable() and preempt_enable() when the
> > preload is done for splitting purpose. The reason is that
> > calling spin_lock() with disabled preemtion is forbidden in
> > CONFIG_PREEMPT_RT kernel.
> 
> I think it would be really helpful to describe why the preemption was
> disabled in that path. Some of that is explained in the comment but the
> changelog should mention that explicitly.
>  
Will do that, makes sense.
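
For reference, the code being removed looked roughly like the
sketch below (condensed from the hunk further down). Preemption
was disabled to guarantee that the CPU whose per-cpu slot gets
checked and preloaded is the same CPU that later consumes the
object under vmap_area_lock:

	preempt_disable();
	if (!__this_cpu_read(ne_fit_preload_node)) {
		/* GFP_KERNEL may sleep, so preemption must be re-enabled */
		preempt_enable();
		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
		preempt_disable();

		/* drop the extra object if this CPU got preloaded meanwhile */
		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
			if (pva)
				kmem_cache_free(vmap_area_cachep, pva);
		}
	}

	/* on PREEMPT_RT this is a sleeping lock, taken with preemption off */
	spin_lock(&vmap_area_lock);
	preempt_enable();

With the patch, the same hand-off is done by __this_cpu_cmpxchg()
under vmap_area_lock instead, so no preemption toggling is needed.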

> > Therefore, we do not guarantee that a CPU is preloaded, instead
> > we minimize the case when it is not with this change.
> > 
> > For example i run the special test case that follows the preload
> > pattern and path. 20 "unbind" threads run it and each does
> > 1000000 allocations. Only 3.5 times among 1000000 a CPU was
> > not preloaded. So it can happen but the number is negligible.
> > 
> > V1 -> V2:
> >   - move __this_cpu_cmpxchg check when spin_lock is taken,
> >     as proposed by Andrew Morton
> >   - add more explanation in regard of preloading
> >   - adjust and move some comments
> > 
> > Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for split purpose")
> > Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> >  mm/vmalloc.c | 50 +++++++++++++++++++++++++++++++++-----------------
> >  1 file changed, 33 insertions(+), 17 deletions(-)
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index e92ff5f7dd8b..f48cd0711478 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -969,6 +969,19 @@ adjust_va_to_fit_type(struct vmap_area *va,
> >  			 * There are a few exceptions though, as an example it is
> >  			 * a first allocation (early boot up) when we have "one"
> >  			 * big free space that has to be split.
> > +			 *
> > +			 * Also we can hit this path in case of regular "vmap"
> > +			 * allocations, if "this" current CPU was not preloaded.
> > +			 * See the comment in alloc_vmap_area() why. If so, then
> > +			 * GFP_NOWAIT is used instead to get an extra object for
> > +			 * split purpose. That is rare and most time does not
> > +			 * occur.
> > +			 *
> > +			 * What happens if an allocation gets failed. Basically,
> > +			 * an "overflow" path is triggered to purge lazily freed
> > +			 * areas to free some memory, then, the "retry" path is
> > +			 * triggered to repeat one more time. See more details
> > +			 * in alloc_vmap_area() function.
> >  			 */
> >  			lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
> 
> This doesn't seem to have anything to do with the patch. Have you
> considered to make it a patch on its own? Btw. I find this comment very
> useful!
> 
Makes sense, I will make it a separate patch.

> >  			if (!lva)
> > @@ -1078,31 +1091,34 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> >  
> >  retry:
> >  	/*
> > -	 * Preload this CPU with one extra vmap_area object to ensure
> > -	 * that we have it available when fit type of free area is
> > -	 * NE_FIT_TYPE.
> > +	 * Preload this CPU with one extra vmap_area object. It is used
> > +	 * when fit type of free area is NE_FIT_TYPE. Please note, it
> > +	 * does not guarantee that an allocation occurs on a CPU that
> > +	 * is preloaded, instead we minimize the case when it is not.
> > +	 * It can happen because of migration, because there is a race
> > +	 * until the below spinlock is taken.
> 
> s@migration@cpu migration@ because migration without on its own is quite
> ambiguous, especially in the MM code where it usually refers to memory.
> 
Thanks, will update it.


Thank you for the comments!

--
Vlad Rezki

Patch

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e92ff5f7dd8b..f48cd0711478 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -969,6 +969,19 @@  adjust_va_to_fit_type(struct vmap_area *va,
 			 * There are a few exceptions though, as an example it is
 			 * a first allocation (early boot up) when we have "one"
 			 * big free space that has to be split.
+			 *
+			 * Also we can hit this path in case of regular "vmap"
+			 * allocations, if "this" current CPU was not preloaded.
+			 * See the comment in alloc_vmap_area() why. If so, then
+			 * GFP_NOWAIT is used instead to get an extra object for
+			 * split purpose. That is rare and most time does not
+			 * occur.
+			 *
+			 * What happens if an allocation gets failed. Basically,
+			 * an "overflow" path is triggered to purge lazily freed
+			 * areas to free some memory, then, the "retry" path is
+			 * triggered to repeat one more time. See more details
+			 * in alloc_vmap_area() function.
 			 */
 			lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
 			if (!lva)
@@ -1078,31 +1091,34 @@  static struct vmap_area *alloc_vmap_area(unsigned long size,
 
 retry:
 	/*
-	 * Preload this CPU with one extra vmap_area object to ensure
-	 * that we have it available when fit type of free area is
-	 * NE_FIT_TYPE.
+	 * Preload this CPU with one extra vmap_area object. It is used
+	 * when fit type of free area is NE_FIT_TYPE. Please note, it
+	 * does not guarantee that an allocation occurs on a CPU that
+	 * is preloaded, instead we minimize the case when it is not.
+	 * It can happen because of migration, because there is a race
+	 * until the below spinlock is taken.
 	 *
 	 * The preload is done in non-atomic context, thus it allows us
 	 * to use more permissive allocation masks to be more stable under
-	 * low memory condition and high memory pressure.
+	 * low memory condition and high memory pressure. In rare case,
+	 * if not preloaded, GFP_NOWAIT is used.
 	 *
-	 * Even if it fails we do not really care about that. Just proceed
-	 * as it is. "overflow" path will refill the cache we allocate from.
+	 * Set "pva" to NULL here, because of "retry" path.
 	 */
-	preempt_disable();
-	if (!__this_cpu_read(ne_fit_preload_node)) {
-		preempt_enable();
-		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
-		preempt_disable();
+	pva = NULL;
 
-		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
-			if (pva)
-				kmem_cache_free(vmap_area_cachep, pva);
-		}
-	}
+	if (!this_cpu_read(ne_fit_preload_node))
+		/*
+		 * Even if it fails we do not really care about that.
+		 * Just proceed as it is. If needed "overflow" path
+		 * will refill the cache we allocate from.
+		 */
+		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
 
 	spin_lock(&vmap_area_lock);
-	preempt_enable();
+
+	if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
+		kmem_cache_free(vmap_area_cachep, pva);
 
 	/*
 	 * If an allocation fails, the "vend" address is