
mm: vmalloc: Use the vmap_area_lock to protect ne_fit_preload_node

Message ID 20191003090906.1261-1-dwagner@suse.de (mailing list archive)
State New, archived
Series mm: vmalloc: Use the vmap_area_lock to protect ne_fit_preload_node

Commit Message

Daniel Wagner Oct. 3, 2019, 9:09 a.m. UTC
Replace the preempt_disable()/preempt_enable() pair with the
vmap_area_lock spin_lock. Calling spin_lock() with preemption disabled
is illegal for -rt. Furthermore, enabling preemption inside the
spin_lock()-protected section doesn't really make sense.

Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for split purpose")
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 mm/vmalloc.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

Comments

Uladzislau Rezki Oct. 3, 2019, 11:55 a.m. UTC | #1
On Thu, Oct 03, 2019 at 11:09:06AM +0200, Daniel Wagner wrote:
> Replace preempt_enable() and preempt_disable() with the vmap_area_lock
> spin_lock instead. Calling spin_lock() with preempt disabled is
> illegal for -rt. Furthermore, enabling preemption inside the
> spin_lock() doesn't really make sense.
> 
> Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for
> split purpose")
> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  mm/vmalloc.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 08c134aa7ff3..0d1175673583 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1091,11 +1091,11 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	 * Even if it fails we do not really care about that. Just proceed
>  	 * as it is. "overflow" path will refill the cache we allocate from.
>  	 */
> -	preempt_disable();
> +	spin_lock(&vmap_area_lock);
>  	if (!__this_cpu_read(ne_fit_preload_node)) {
> -		preempt_enable();
> +		spin_unlock(&vmap_area_lock);
>  		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
> -		preempt_disable();
> +		spin_lock(&vmap_area_lock);
>  
>  		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
>  			if (pva)
> @@ -1103,9 +1103,6 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  		}
>  	}
>  
> -	spin_lock(&vmap_area_lock);
> -	preempt_enable();
> -
>  	/*
>  	 * If an allocation fails, the "vend" address is
>  	 * returned. Therefore trigger the overflow path.
> -- 
> 2.16.4
> 
Some background. The idea was to avoid touching the vmap_area_lock
several times, therefore preempt_disable()/preempt_enable() are used
instead, in order to stay on the same CPU.

When it comes to PREEMPT_RT it is a problem, so

Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Vlad Rezki
Sebastian Andrzej Siewior Oct. 4, 2019, 3:37 p.m. UTC | #2
If you post something that is related to PREEMPT_RT please keep tglx and
me in Cc.

On 2019-10-03 11:09:06 [+0200], Daniel Wagner wrote:
> Replace preempt_enable() and preempt_disable() with the vmap_area_lock
> spin_lock instead. Calling spin_lock() with preempt disabled is
> illegal for -rt. Furthermore, enabling preemption inside the
> spin_lock() doesn't really make sense.

This spin_lock will cause all CPUs to block on it, while
preempt_disable() does not have this limitation.
I added a migrate_disable() in my -next tree. Looking at it again, I
have reasonable doubt that this preempt_disable() is needed.

> Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for
> split purpose")
> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  mm/vmalloc.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 08c134aa7ff3..0d1175673583 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1091,11 +1091,11 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	 * Even if it fails we do not really care about that. Just proceed
>  	 * as it is. "overflow" path will refill the cache we allocate from.
>  	 */
> -	preempt_disable();
> +	spin_lock(&vmap_area_lock);
>  	if (!__this_cpu_read(ne_fit_preload_node)) {
> -		preempt_enable();
> +		spin_unlock(&vmap_area_lock);
>  		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
> -		preempt_disable();
> +		spin_lock(&vmap_area_lock);
>  
>  		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
>  			if (pva)
> @@ -1103,9 +1103,6 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  		}
>  	}
>  
> -	spin_lock(&vmap_area_lock);
> -	preempt_enable();
> -
>  	/*
>  	 * If an allocation fails, the "vend" address is
>  	 * returned. Therefore trigger the overflow path.
> -- 

Sebastian
Uladzislau Rezki Oct. 4, 2019, 4:20 p.m. UTC | #3
On Fri, Oct 04, 2019 at 05:37:28PM +0200, Sebastian Andrzej Siewior wrote:
> If you post something that is related to PREEMPT_RT please keep tglx and
> me in Cc.
> 
> On 2019-10-03 11:09:06 [+0200], Daniel Wagner wrote:
> > Replace preempt_enable() and preempt_disable() with the vmap_area_lock
> > spin_lock instead. Calling spin_lock() with preempt disabled is
> > illegal for -rt. Furthermore, enabling preemption inside the
> 
> Looking at it again, I have reasonable doubt that this
> preempt_disable() is needed.
> 
The intention was to preload the current CPU with one extra object in
non-atomic context, so that the permissive GFP_KERNEL parameters can be
used. I.e. that allows us to avoid any allocation (if we stay on the
same CPU) when we are in atomic context doing the split.

If we have migrate_disable/enable, then i think preempt_disable/enable
should be replaced by it, and not in the way it has been proposed
in the patch.
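
Just as a sketch (not a tested patch), such a replacement in
alloc_vmap_area() could look like:

<snip>
    /*
     * Stay on this CPU without disabling preemption, so the GFP_KERNEL
     * allocation below may still sleep on PREEMPT_RT.
     */
    migrate_disable();
    if (!__this_cpu_read(ne_fit_preload_node)) {
        migrate_enable();
        pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
        migrate_disable();

        if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
            if (pva)
                kmem_cache_free(vmap_area_cachep, pva);
        }
    }

    spin_lock(&vmap_area_lock);
    migrate_enable();
<snip>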

--
Vlad Rezki
Sebastian Andrzej Siewior Oct. 4, 2019, 4:30 p.m. UTC | #4
On 2019-10-04 18:20:41 [+0200], Uladzislau Rezki wrote:
> On Fri, Oct 04, 2019 at 05:37:28PM +0200, Sebastian Andrzej Siewior wrote:
> > If you post something that is related to PREEMPT_RT please keep tglx and
> > me in Cc.
> > 
> > On 2019-10-03 11:09:06 [+0200], Daniel Wagner wrote:
> > > Replace preempt_enable() and preempt_disable() with the vmap_area_lock
> > > spin_lock instead. Calling spin_lock() with preempt disabled is
> > > illegal for -rt. Furthermore, enabling preemption inside the
> > 
> > Looking at it again, I have reasonable doubt that this
> > preempt_disable() is needed.
> > 
> The intention was to preload a current CPU with one extra object in
> non-atomic context, thus to use GFP_KERNEL permissive parameters. I.e.
> that allows us to avoid any allocation(if we stay on the same CPU)
> when we are in atomic context do splitting.

You could have been migrated to another CPU before the first
preempt_disable(). You could have been migrated to another CPU while
the memory was being allocated.
I don't really see the point of that preempt_disable() besides keeping
debug code quiet.

> If we have migrate_disable/enable, then, i think preempt_enable/disable
> should be replaced by it and not the way how it has been proposed
> in the patch.

I don't think this patch is appropriate for upstream.

Sebastian
Uladzislau Rezki Oct. 4, 2019, 5:04 p.m. UTC | #5
>
> You could have been migrated to another CPU while
> memory has been allocated.
> 
It is true that we can migrate, since we allow preemption while
allocating. But it does not really matter on which CPU the
allocation occurs and whether we migrate or not.

If we land on another CPU or still stay on the same one, we will
check one more time anyway whether it (the other/same CPU) is preloaded
or not:

<snip>
    preempt_disable();
    if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
<snip>

If it is another CPU and it already has one, we can free the object
allocated in the previous step. If the other CPU does not have one, we
save it in its ne_fit_preload_node for later reuse. Further on
we cannot migrate because of:

<snip>
    spin_lock(&vmap_area_lock);
    preempt_enable();
<snip>
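
For clarity, the complete mainline sequence being discussed is quoted
below; the comments are added here only to spell out the reasoning and
are not part of the source:

<snip>
    /* Preload this CPU with one extra vmap_area object. */
    preempt_disable();
    if (!__this_cpu_read(ne_fit_preload_node)) {
        /* GFP_KERNEL may sleep, so allow preemption (and migration). */
        preempt_enable();
        pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
        preempt_disable();

        /*
         * We may have landed on another CPU. If that CPU is already
         * preloaded, free the now redundant object.
         */
        if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
            if (pva)
                kmem_cache_free(vmap_area_cachep, pva);
        }
    }

    /*
     * Taking the lock disables preemption anyway, so after this point
     * we cannot migrate away from the CPU that has just been preloaded.
     */
    spin_lock(&vmap_area_lock);
    preempt_enable();
<snip>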

--
Vlad Rezki
Sebastian Andrzej Siewior Oct. 4, 2019, 5:45 p.m. UTC | #6
On 2019-10-04 19:04:11 [+0200], Uladzislau Rezki wrote:
> if another, we can free the object allocated on previous step if
> it already has it. If another CPU does not have it, save it in 
> ne_fit_preload_node for another current CPU to reuse later. Further
> we can not migrate because of:
> 
> <snip>
>     spin_lock(&vmap_area_lock);
>     preempt_enable();
> <snip>

Ah right. So you keep the lock later on; I somehow missed that part.
That way it actually makes sense.

Sebastian
Daniel Wagner Oct. 7, 2019, 8:27 a.m. UTC | #7
On Fri, Oct 04, 2019 at 05:37:28PM +0200, Sebastian Andrzej Siewior wrote:
> If you post something that is related to PREEMPT_RT please keep tglx and
> me in Cc.

Sure, just forgot to add it this time. My email setup needs a bit more
love. Sorry.
Daniel Wagner Oct. 7, 2019, 8:30 a.m. UTC | #8
Hi,

On Fri, Oct 04, 2019 at 06:30:42PM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-10-04 18:20:41 [+0200], Uladzislau Rezki wrote:
> > If we have migrate_disable/enable, then, i think preempt_enable/disable
> > should be replaced by it and not the way how it has been proposed
> > in the patch.
> 
> I don't think this patch is appropriate for upstream.

Yes, I agree. The discussion made this clear, this is only for -rt
trees. Initially I thought this should be in mainline too.

Thanks,
Daniel
Sebastian Andrzej Siewior Oct. 7, 2019, 10:56 a.m. UTC | #9
On 2019-10-07 10:30:37 [+0200], Daniel Wagner wrote:
> Hi,
Hi Daniel,

> On Fri, Oct 04, 2019 at 06:30:42PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2019-10-04 18:20:41 [+0200], Uladzislau Rezki wrote:
> > > If we have migrate_disable/enable, then, i think preempt_enable/disable
> > > should be replaced by it and not the way how it has been proposed
> > > in the patch.
> > 
> > I don't think this patch is appropriate for upstream.
> 
> Yes, I agree. The discussion made this clear, this is only for -rt
> trees. Initially I though this should be in mainline too.

Sorry, this was _before_ Uladzislau pointed out that you *just* moved
the lock that was there from the beginning. I missed that while looking
over the patch. Based on that, I no longer think that this patch is
inappropriate for upstream.

> Thanks,
> Daniel

Sebastian
Uladzislau Rezki Oct. 7, 2019, 4:23 p.m. UTC | #10
Hello, Daniel, Sebastian.

> > On Fri, Oct 04, 2019 at 06:30:42PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2019-10-04 18:20:41 [+0200], Uladzislau Rezki wrote:
> > > > If we have migrate_disable/enable, then, i think preempt_enable/disable
> > > > should be replaced by it and not the way how it has been proposed
> > > > in the patch.
> > > 
> > > I don't think this patch is appropriate for upstream.
> > 
> > Yes, I agree. The discussion made this clear, this is only for -rt
> > trees. Initially I though this should be in mainline too.
> 
> Sorry, this was _before_ Uladzislau pointed out that you *just* moved
> the lock that was there from the beginning. I missed that while looking
> over the patch. Based on that I don't think that this patch is not
> appropriate for upstream.
> 
Yes, that is a bit messy :) Then i do not see what that patch fixes in
mainline. Instead it will just add extra blocking; i did not want that,
therefore i used preempt_disable/enable. But when i saw this patch i took it
as a preparation for the PREEMPT_RT merging work.

--
Vlad Rezki
Daniel Wagner Oct. 7, 2019, 4:34 p.m. UTC | #11
On Mon, Oct 07, 2019 at 06:23:30PM +0200, Uladzislau Rezki wrote:
> Hello, Daniel, Sebastian.
> 
> > > On Fri, Oct 04, 2019 at 06:30:42PM +0200, Sebastian Andrzej Siewior wrote:
> > > > On 2019-10-04 18:20:41 [+0200], Uladzislau Rezki wrote:
> > > > > If we have migrate_disable/enable, then, i think preempt_enable/disable
> > > > > should be replaced by it and not the way how it has been proposed
> > > > > in the patch.
> > > > 
> > > > I don't think this patch is appropriate for upstream.
> > > 
> > > Yes, I agree. The discussion made this clear, this is only for -rt
> > > trees. Initially I though this should be in mainline too.
> > 
> > Sorry, this was _before_ Uladzislau pointed out that you *just* moved
> > the lock that was there from the beginning. I missed that while looking
> > over the patch. Based on that I don't think that this patch is not
> > appropriate for upstream.
> > 
> Yes that is a bit messy :) Then i do not see what that patch fixes in
> mainline? Instead it will just add an extra blocking, i did not want that
> therefore used preempt_enable/disable. But, when i saw this patch i got it
> as a preparation of PREEMPT_RT merging work.

Maybe I should add some background info here as well. Currently, I am
creating an -rt tree on v5.3 for which I need this patch (or a
migrate_disable() version of it). So this is slightly independent of
the work Sebastian is doing. Though the mainline effort of PREEMPT_RT
will hit this problem as well.

I understood Sebastian wrong above. I thought he suggested using the
migrate_disable() approach even for mainline.

I suppose one thing which would help in this discussion is: what do
you gain by using preempt_disable() instead of moving the lock up?
Do you have performance numbers which could justify the code?
Uladzislau Rezki Oct. 7, 2019, 4:56 p.m. UTC | #12
On Mon, Oct 07, 2019 at 06:34:43PM +0200, Daniel Wagner wrote:
> On Mon, Oct 07, 2019 at 06:23:30PM +0200, Uladzislau Rezki wrote:
> > Hello, Daniel, Sebastian.
> > 
> > > > On Fri, Oct 04, 2019 at 06:30:42PM +0200, Sebastian Andrzej Siewior wrote:
> > > > > On 2019-10-04 18:20:41 [+0200], Uladzislau Rezki wrote:
> > > > > > If we have migrate_disable/enable, then, i think preempt_enable/disable
> > > > > > should be replaced by it and not the way how it has been proposed
> > > > > > in the patch.
> > > > > 
> > > > > I don't think this patch is appropriate for upstream.
> > > > 
> > > > Yes, I agree. The discussion made this clear, this is only for -rt
> > > > trees. Initially I though this should be in mainline too.
> > > 
> > > Sorry, this was _before_ Uladzislau pointed out that you *just* moved
> > > the lock that was there from the beginning. I missed that while looking
> > > over the patch. Based on that I don't think that this patch is not
> > > appropriate for upstream.
> > > 
> > Yes that is a bit messy :) Then i do not see what that patch fixes in
> > mainline? Instead it will just add an extra blocking, i did not want that
> > therefore used preempt_enable/disable. But, when i saw this patch i got it
> > as a preparation of PREEMPT_RT merging work.
> 
> Maybe I should add some background info here as well. Currently, I am
> creating an -rt tree on v5.3 for which I need this patch (or a
> migrate_disable() version of it). So this is slightly independent of
> the work Sebiastian is doing. Though the mainline effort of PREEMPT_RT
> will hit this problem as well.
> 
> I understood Sebiastian wrong above. I thought he suggest to use the
> migrate_disable() approach even for mainline. 
> 
> I supppose, one thing which would help in this discussion, is what do
> you gain by using preempt_disable() instead of moving the lock up?
> Do you have performance numbers which could justify the code?
>
Actually there is high lock contention on the vmap_area_lock, because it
is still global. You can have a look at the last slide:

https://linuxplumbersconf.org/event/4/contributions/547/attachments/287/479/Reworking_of_KVA_allocator_in_Linux_kernel.pdf

so this change will make it a bit higher. On the other hand i agree
that for -rt it should be fixed; probably it could be done like:

ifdef PREEMPT_RT
    migrate_disable()
#else
    preempt_disable()
...

but i am not sure it is good either.
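
Spelled out as real code, that idea would be something like the
following (the helper names here are made up, purely for illustration):

<snip>
/* Hypothetical helpers, only to illustrate the #ifdef idea above. */
#ifdef CONFIG_PREEMPT_RT
static inline void ne_fit_preload_start(void) { migrate_disable(); }
static inline void ne_fit_preload_end(void)   { migrate_enable(); }
#else
static inline void ne_fit_preload_start(void) { preempt_disable(); }
static inline void ne_fit_preload_end(void)   { preempt_enable(); }
#endif
<snip>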

--
Vlad Rezki
Daniel Wagner Oct. 7, 2019, 5:22 p.m. UTC | #13
On Mon, Oct 07, 2019 at 06:56:11PM +0200, Uladzislau Rezki wrote:
> On Mon, Oct 07, 2019 at 06:34:43PM +0200, Daniel Wagner wrote:
> > I supppose, one thing which would help in this discussion, is what do
> > you gain by using preempt_disable() instead of moving the lock up?
> > Do you have performance numbers which could justify the code?
> >
> Actually there is a high lock contention on vmap_area_lock, because it
> is still global. You can have a look at last slide:
> 
> https://linuxplumbersconf.org/event/4/contributions/547/attachments/287/479/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> 
> so this change will make it a bit higher.

Thanks! I suspected something like this :(

On the todo-list page you state that the vmap_area_lock could be
split and the contention thereby reduced. If you could avoid those
preempt_disable() tricks and just use plain spin_lock()s to protect it,
that would be really helpful.

> From the other hand i agree
> that for rt it should be fixed, probably it could be done like:
> 
> ifdef PREEMPT_RT
>     migrate_disable()
> #else
>     preempt_disable()
> ...
> 
> but i am not sure it is good either.

I don't think this is the way to go. I guess Sebastian and Thomas have a
better idea of how to address this for PREEMPT_RT.
Sebastian Andrzej Siewior Oct. 7, 2019, 5:36 p.m. UTC | #14
On 2019-10-07 18:56:11 [+0200], Uladzislau Rezki wrote:
> Actually there is a high lock contention on vmap_area_lock, because it
> is still global. You can have a look at last slide:
> 
> https://linuxplumbersconf.org/event/4/contributions/547/attachments/287/479/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> 
> so this change will make it a bit higher. From the other hand i agree
> that for rt it should be fixed, probably it could be done like:
> 
> ifdef PREEMPT_RT
>     migrate_disable()
> #else
>     preempt_disable()
> ...
> 
> but i am not sure it is good either.

What is to be expected on average? Is the lock acquired and then
released again because the slot is empty and memory needs to be
allocated or can it be assumed that this hardly happens? 

Sebastian
Uladzislau Rezki Oct. 7, 2019, 9:44 p.m. UTC | #15
On Mon, Oct 07, 2019 at 07:36:44PM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-10-07 18:56:11 [+0200], Uladzislau Rezki wrote:
> > Actually there is a high lock contention on vmap_area_lock, because it
> > is still global. You can have a look at last slide:
> > 
> > https://linuxplumbersconf.org/event/4/contributions/547/attachments/287/479/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> > 
> > so this change will make it a bit higher. From the other hand i agree
> > that for rt it should be fixed, probably it could be done like:
> > 
> > ifdef PREEMPT_RT
> >     migrate_disable()
> > #else
> >     preempt_disable()
> > ...
> > 
> > but i am not sure it is good either.
> 
> What is to be expected on average? Is the lock acquired and then
> released again because the slot is empty and memory needs to be
> allocated or can it be assumed that this hardly happens? 
> 
The lock is not released (we are not allowed to), instead we just try
to allocate with the GFP_NOWAIT flag. That can happen if the preallocation
with the GFP_KERNEL flag failed earlier:

<snip>
...
 } else if (type == NE_FIT_TYPE) {
  /*
   * Split no edge of fit VA.
   *
   *     |       |
   *   L V  NVA  V R
   * |---|-------|---|
   */
  lva = __this_cpu_xchg(ne_fit_preload_node, NULL);
  if (unlikely(!lva)) {
      ...
      lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
      ...
  }
...
<snip>

How often do we need an extra object for split purposes? The answer
is that it depends. For example the fork() path falls into that pattern.

I think we can assume that migration can hardly ever happen and that it
should be considered a rare case. Thus we can do the preloading
without worrying much if it occurs:

<snip>
urezki@pc636:~/data/ssd/coding/linux-stable$ git diff
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e92ff5f7dd8b..bc782edcd1fd 100644 
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1089,20 +1089,16 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
         * Even if it fails we do not really care about that. Just proceed
         * as it is. "overflow" path will refill the cache we allocate from.
         */
-       preempt_disable();
-       if (!__this_cpu_read(ne_fit_preload_node)) {
-               preempt_enable();
+       if (!this_cpu_read(ne_fit_preload_node)) {
                pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
-               preempt_disable();

-               if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
+               if (this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
                        if (pva)
                                kmem_cache_free(vmap_area_cachep, pva);
                }
        }
 
        spin_lock(&vmap_area_lock);
-       preempt_enable();

        /*
         * If an allocation fails, the "vend" address is
urezki@pc636:~/data/ssd/coding/linux-stable$
<snip>

So, we do not guarantee, instead we minimize the number of allocations
done with the GFP_NOWAIT flag. For example on my 4xCPU system i am not
able to even trigger the case when a CPU is not preloaded.

I can test it tomorrow on my 12xCPU system to see its behavior there.

--
Vlad Rezki
Uladzislau Rezki Oct. 8, 2019, 4:04 p.m. UTC | #16
On Mon, Oct 07, 2019 at 11:44:20PM +0200, Uladzislau Rezki wrote:
> On Mon, Oct 07, 2019 at 07:36:44PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2019-10-07 18:56:11 [+0200], Uladzislau Rezki wrote:
> > > Actually there is a high lock contention on vmap_area_lock, because it
> > > is still global. You can have a look at last slide:
> > > 
> > > https://linuxplumbersconf.org/event/4/contributions/547/attachments/287/479/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> > > 
> > > so this change will make it a bit higher. From the other hand i agree
> > > that for rt it should be fixed, probably it could be done like:
> > > 
> > > ifdef PREEMPT_RT
> > >     migrate_disable()
> > > #else
> > >     preempt_disable()
> > > ...
> > > 
> > > but i am not sure it is good either.
> > 
> > What is to be expected on average? Is the lock acquired and then
> > released again because the slot is empty and memory needs to be
> > allocated or can it be assumed that this hardly happens? 
> > 
> The lock is not released(we are not allowed), instead we just try
> to allocate with GFP_NOWAIT flag. It can happen if preallocation
> has been failed with GFP_KERNEL flag earlier:
> 
> <snip>
> ...
>  } else if (type == NE_FIT_TYPE) {
>   /*
>    * Split no edge of fit VA.
>    *
>    *     |       |
>    *   L V  NVA  V R
>    * |---|-------|---|
>    */
>   lva = __this_cpu_xchg(ne_fit_preload_node, NULL);
>   if (unlikely(!lva)) {
>       ...
>       lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
>       ...
>   }
> ...
> <snip>
> 
> How often we need an extra object for split purpose, the answer
> is it depends on. For example fork() path falls to that pattern.
> 
> I think we can assume that migration can hardly ever happen and
> that should be considered as rare case. Thus we can do a prealoading
> without worrying much if a it occurs:
> 
> <snip>
> urezki@pc636:~/data/ssd/coding/linux-stable$ git diff
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e92ff5f7dd8b..bc782edcd1fd 100644 
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1089,20 +1089,16 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>          * Even if it fails we do not really care about that. Just proceed
>          * as it is. "overflow" path will refill the cache we allocate from.
>          */
> -       preempt_disable();
> -       if (!__this_cpu_read(ne_fit_preload_node)) {
> -               preempt_enable();
> +       if (!this_cpu_read(ne_fit_preload_node)) {
>                 pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
> -               preempt_disable();
> 
> -               if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
> +               if (this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
>                         if (pva)
>                                 kmem_cache_free(vmap_area_cachep, pva);
>                 }
>         }
>  
>         spin_lock(&vmap_area_lock);
> -       preempt_enable();
> 
>         /*
>          * If an allocation fails, the "vend" address is
> urezki@pc636:~/data/ssd/coding/linux-stable$
> <snip>
> 
> so, we do not guarantee, instead we minimize number of allocations
> with GFP_NOWAIT flag. For example on my 4xCPUs i am not able to
> even trigger the case when CPU is not preloaded.
> 
> I can test it tomorrow on my 12xCPUs to see its behavior there.
> 
Tested it on different systems. For example on my 8xCPU system that
runs a PREEMPT kernel i see only a few GFP_NOWAIT allocations, i.e. they
happen when we land on another CPU that was not preloaded.

I ran a special test case that follows the preload pattern and path.
20 "unbind" threads run it and each does 1000000 allocations. As a
result, on average only 3.5 times out of 1000000 the CPU was not
preloaded during splitting, thus GFP_NOWAIT was used to obtain an
extra object.

It is obvious that the slightly modified approach still minimizes
allocations in atomic context, so it can happen, but the number is
negligible and can be ignored, i think.
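
A trivial debug-only counter in the NE_FIT split path quoted earlier
would be one way to observe this (just an illustration, the counter
name is made up):

<snip>
    /* Debug only: count how often the preload misses. */
    static atomic_t ne_fit_nowait_fallbacks = ATOMIC_INIT(0);

    lva = __this_cpu_xchg(ne_fit_preload_node, NULL);
    if (unlikely(!lva)) {
        atomic_inc(&ne_fit_nowait_fallbacks);
        lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
    }
<snip>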

--
Vlad Rezki
Daniel Wagner Oct. 9, 2019, 6:05 a.m. UTC | #17
On Tue, Oct 08, 2019 at 06:04:59PM +0200, Uladzislau Rezki wrote:
> > so, we do not guarantee, instead we minimize number of allocations
> > with GFP_NOWAIT flag. For example on my 4xCPUs i am not able to
> > even trigger the case when CPU is not preloaded.
> > 
> > I can test it tomorrow on my 12xCPUs to see its behavior there.
> > 
> Tested it on different systems. For example on my 8xCPUs system that
> runs PREEMPT kernel i see only few GFP_NOWAIT allocations, i.e. it
> happens when we land to another CPU that was not preloaded.
> 
> I run the special test case that follows the preload pattern and path.
> So 20 "unbind" threads run it and each does 1000000 allocations. As a
> result only 3.5 times among 1000000, during splitting, CPU was not
> preloaded thus, GFP_NOWAIT was used to obtain an extra object.
> 
> It is obvious that slightly modified approach still minimizes allocations
> in atomic context, so it can happen but the number is negligible and can
> be ignored, i think.

Thanks for doing the tests. In this case I would suggest getting rid of
the preempt_disable() micro-optimization, since there is almost no
gain from it. Will you send a patch? :)

Thanks,
Daniel
Uladzislau Rezki Oct. 9, 2019, 9:47 a.m. UTC | #18
Hello, Daniel.

On Wed, Oct 09, 2019 at 08:05:39AM +0200, Daniel Wagner wrote:
> On Tue, Oct 08, 2019 at 06:04:59PM +0200, Uladzislau Rezki wrote:
> > > so, we do not guarantee, instead we minimize number of allocations
> > > with GFP_NOWAIT flag. For example on my 4xCPUs i am not able to
> > > even trigger the case when CPU is not preloaded.
> > > 
> > > I can test it tomorrow on my 12xCPUs to see its behavior there.
> > > 
> > Tested it on different systems. For example on my 8xCPUs system that
> > runs PREEMPT kernel i see only few GFP_NOWAIT allocations, i.e. it
> > happens when we land to another CPU that was not preloaded.
> > 
> > I run the special test case that follows the preload pattern and path.
> > So 20 "unbind" threads run it and each does 1000000 allocations. As a
> > result only 3.5 times among 1000000, during splitting, CPU was not
> > preloaded thus, GFP_NOWAIT was used to obtain an extra object.
> > 
> > It is obvious that slightly modified approach still minimizes allocations
> > in atomic context, so it can happen but the number is negligible and can
> > be ignored, i think.
> 
> Thanks for doing the tests. In this case I would suggest to get rid of
> the preempt_disable() micro optimization, since there is almost no
> gain in doing so. Do you send a patch? :)
> 
I can do it for sure, if you do not mind, since it was your
initiative. Otherwise you can send a v2.

--
Vlad Rezki

Patch

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 08c134aa7ff3..0d1175673583 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1091,11 +1091,11 @@  static struct vmap_area *alloc_vmap_area(unsigned long size,
 	 * Even if it fails we do not really care about that. Just proceed
 	 * as it is. "overflow" path will refill the cache we allocate from.
 	 */
-	preempt_disable();
+	spin_lock(&vmap_area_lock);
 	if (!__this_cpu_read(ne_fit_preload_node)) {
-		preempt_enable();
+		spin_unlock(&vmap_area_lock);
 		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
-		preempt_disable();
+		spin_lock(&vmap_area_lock);
 
 		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
 			if (pva)
@@ -1103,9 +1103,6 @@  static struct vmap_area *alloc_vmap_area(unsigned long size,
 		}
 	}
 
-	spin_lock(&vmap_area_lock);
-	preempt_enable();
-
 	/*
 	 * If an allocation fails, the "vend" address is
 	 * returned. Therefore trigger the overflow path.