
mm/vmalloc: Fix unlock order in s_stop()

Message ID 20201213180843.16938-1-longman@redhat.com (mailing list archive)
State New, archived
Series mm/vmalloc: Fix unlock order in s_stop()

Commit Message

Waiman Long Dec. 13, 2020, 6:08 p.m. UTC
When multiple locks are acquired, they should be released in reverse
order. For s_start() and s_stop() in mm/vmalloc.c, that is not the
case.

  s_start: mutex_lock(&vmap_purge_lock); spin_lock(&vmap_area_lock);
  s_stop : mutex_unlock(&vmap_purge_lock); spin_unlock(&vmap_area_lock);

This unlock sequence, though allowed, is not optimal. If a waiter is
present, mutex_unlock() will need to go through the slowpath of waking
up the waiter with preemption disabled. Fix that by releasing the
spinlock first before the mutex.

Fixes: e36176be1c39 ("mm/vmalloc: rework vmap_area_lock")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/vmalloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
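
For context, s_start() takes the two locks in the opposite order, so with
this patch the release order in s_stop() mirrors the acquire order. A rough
sketch of the acquire side as of commit e36176be1c39 (paraphrased, not a
verbatim copy of the function):

  static void *s_start(struct seq_file *m, loff_t *pos)
          __acquires(&vmap_purge_lock)
          __acquires(&vmap_area_lock)
  {
          /* Outer lock (mutex) first, inner lock (spinlock) second. */
          mutex_lock(&vmap_purge_lock);
          spin_lock(&vmap_area_lock);

          return seq_list_start(&vmap_area_list, *pos);
  }

With the corrected s_stop(), the spinlock taken last is released first, so
a contended mutex_unlock() slowpath runs with preemption enabled again.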

Comments

Uladzislau Rezki Dec. 13, 2020, 6:39 p.m. UTC | #1
On Sun, Dec 13, 2020 at 01:08:43PM -0500, Waiman Long wrote:
> When multiple locks are acquired, they should be released in reverse
> order. For s_start() and s_stop() in mm/vmalloc.c, that is not the
> case.
> 
>   s_start: mutex_lock(&vmap_purge_lock); spin_lock(&vmap_area_lock);
>   s_stop : mutex_unlock(&vmap_purge_lock); spin_unlock(&vmap_area_lock);
> 
> This unlock sequence, though allowed, is not optimal. If a waiter is
> present, mutex_unlock() will need to go through the slowpath of waking
> up the waiter with preemption disabled. Fix that by releasing the
> spinlock first before the mutex.
> 
> Fixes: e36176be1c39 ("mm/vmalloc: rework vmap_area_lock")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  mm/vmalloc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 6ae491a8b210..75913f685c71 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3448,11 +3448,11 @@ static void *s_next(struct seq_file *m, void *p, loff_t *pos)
>  }
>  
>  static void s_stop(struct seq_file *m, void *p)
> -	__releases(&vmap_purge_lock)
>  	__releases(&vmap_area_lock)
> +	__releases(&vmap_purge_lock)
>  {
> -	mutex_unlock(&vmap_purge_lock);
>  	spin_unlock(&vmap_area_lock);
> +	mutex_unlock(&vmap_purge_lock);
>  }
>  
>  static void show_numa_info(struct seq_file *m, struct vm_struct *v)
BTW, if navigation over both lists is an issue, for example when there
are multiple heavy readers of /proc/vmallocinfo, I think it makes sense
to implement RCU-safe list iteration and get rid of both locks.
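
A minimal sketch of such an RCU-protected walk (hypothetical, not part of
this patch; the function name is made up, and it assumes vmap_area objects
are only freed after an RCU grace period, e.g. via kfree_rcu()):

  static void show_all_areas_rcu(struct seq_file *m)
  {
          struct vmap_area *va;

          rcu_read_lock();
          list_for_each_entry_rcu(va, &vmap_area_list, list) {
                  /* report va->va_start .. va->va_end, no vmap_area_lock held */
          }
          rcu_read_unlock();
  }

Writers would then pair list_add_rcu()/list_del_rcu() under their existing
lock and defer the actual freeing past the grace period.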

As for the patch: Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Thanks!

--
Vlad Rezki
Waiman Long Dec. 13, 2020, 7:42 p.m. UTC | #2
On 12/13/20 1:39 PM, Uladzislau Rezki wrote:
> On Sun, Dec 13, 2020 at 01:08:43PM -0500, Waiman Long wrote:
>> When multiple locks are acquired, they should be released in reverse
>> order. For s_start() and s_stop() in mm/vmalloc.c, that is not the
>> case.
>>
>>    s_start: mutex_lock(&vmap_purge_lock); spin_lock(&vmap_area_lock);
>>    s_stop : mutex_unlock(&vmap_purge_lock); spin_unlock(&vmap_area_lock);
>>
>> This unlock sequence, though allowed, is not optimal. If a waiter is
>> present, mutex_unlock() will need to go through the slowpath of waking
>> up the waiter with preemption disabled. Fix that by releasing the
>> spinlock first before the mutex.
>>
>> Fixes: e36176be1c39 ("mm/vmalloc: rework vmap_area_lock")
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   mm/vmalloc.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 6ae491a8b210..75913f685c71 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -3448,11 +3448,11 @@ static void *s_next(struct seq_file *m, void *p, loff_t *pos)
>>   }
>>   
>>   static void s_stop(struct seq_file *m, void *p)
>> -	__releases(&vmap_purge_lock)
>>   	__releases(&vmap_area_lock)
>> +	__releases(&vmap_purge_lock)
>>   {
>> -	mutex_unlock(&vmap_purge_lock);
>>   	spin_unlock(&vmap_area_lock);
>> +	mutex_unlock(&vmap_purge_lock);
>>   }
>>   
>>   static void show_numa_info(struct seq_file *m, struct vm_struct *v)
> BTW, if navigation over both lists is an issue, for example when there
> are multiple heavy readers of /proc/vmallocinfo, I think it makes sense
> to implement RCU-safe list iteration and get rid of both locks.

Making it lockless is certainly better, but doing lockless the right way
is tricky. I will probably keep it as is unless there is a significant
advantage to changing it.

Cheers,
Longman

>
> As for the patch: Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
>
> Thanks!
>
> --
> Vlad Rezki
>
Matthew Wilcox Dec. 13, 2020, 9:51 p.m. UTC | #3
On Sun, Dec 13, 2020 at 07:39:36PM +0100, Uladzislau Rezki wrote:
> On Sun, Dec 13, 2020 at 01:08:43PM -0500, Waiman Long wrote:
> > When multiple locks are acquired, they should be released in reverse
> > order. For s_start() and s_stop() in mm/vmalloc.c, that is not the
> > case.
> > 
> >   s_start: mutex_lock(&vmap_purge_lock); spin_lock(&vmap_area_lock);
> >   s_stop : mutex_unlock(&vmap_purge_lock); spin_unlock(&vmap_area_lock);
> > 
> > This unlock sequence, though allowed, is not optimal. If a waiter is
> > present, mutex_unlock() will need to go through the slowpath of waking
> > up the waiter with preemption disabled. Fix that by releasing the
> > spinlock first before the mutex.
> > 
> > Fixes: e36176be1c39 ("mm/vmalloc: rework vmap_area_lock")
> > Signed-off-by: Waiman Long <longman@redhat.com>
> > ---
> >  mm/vmalloc.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 6ae491a8b210..75913f685c71 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3448,11 +3448,11 @@ static void *s_next(struct seq_file *m, void *p, loff_t *pos)
> >  }
> >  
> >  static void s_stop(struct seq_file *m, void *p)
> > -	__releases(&vmap_purge_lock)
> >  	__releases(&vmap_area_lock)
> > +	__releases(&vmap_purge_lock)
> >  {
> > -	mutex_unlock(&vmap_purge_lock);
> >  	spin_unlock(&vmap_area_lock);
> > +	mutex_unlock(&vmap_purge_lock);
> >  }
> >  
> >  static void show_numa_info(struct seq_file *m, struct vm_struct *v)
> BTW, if navigation over both lists is an issue, for example when there
> are multiple heavy readers of /proc/vmallocinfo, I think it makes sense
> to implement RCU-safe list iteration and get rid of both locks.

If we need to iterate the list efficiently, I'd suggest getting rid of
the list and using an xarray instead. Maybe a maple tree, once that code
is better exercised.
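
As a rough illustration of that idea (hypothetical names such as
vmap_area_xa, track_vmap_area() and untrack_vmap_area(); not an actual
conversion), the busy areas could be keyed by their start address:

  static DEFINE_XARRAY(vmap_area_xa);     /* hypothetical replacement for the list */

  static int track_vmap_area(struct vmap_area *va)
  {
          /* Keyed by start address; fails with -EBUSY if already present. */
          return xa_insert(&vmap_area_xa, va->va_start, va, GFP_KERNEL);
  }

  static void untrack_vmap_area(struct vmap_area *va)
  {
          xa_erase(&vmap_area_xa, va->va_start);
  }

Point lookups would become xa_load(&vmap_area_xa, addr), and readers could
walk the whole set with xa_for_each() without taking vmap_area_lock.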
David Hildenbrand Dec. 14, 2020, 9:39 a.m. UTC | #4
On 13.12.20 19:08, Waiman Long wrote:
> When multiple locks are acquired, they should be released in reverse
> order. For s_start() and s_stop() in mm/vmalloc.c, that is not the
> case.
> 
>   s_start: mutex_lock(&vmap_purge_lock); spin_lock(&vmap_area_lock);
>   s_stop : mutex_unlock(&vmap_purge_lock); spin_unlock(&vmap_area_lock);
> 
> This unlock sequence, though allowed, is not optimal. If a waiter is
> present, mutex_unlock() will need to go through the slowpath of waking
> up the waiter with preemption disabled. Fix that by releasing the
> spinlock first before the mutex.
> 
> Fixes: e36176be1c39 ("mm/vmalloc: rework vmap_area_lock")

I'm not sure this qualifies as a "Fixes". As you correctly state, it
"is not optimal". But yeah, releasing a spinlock after releasing a
mutex already looks weird.

> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  mm/vmalloc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 6ae491a8b210..75913f685c71 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3448,11 +3448,11 @@ static void *s_next(struct seq_file *m, void *p, loff_t *pos)
>  }
>  
>  static void s_stop(struct seq_file *m, void *p)
> -	__releases(&vmap_purge_lock)
>  	__releases(&vmap_area_lock)
> +	__releases(&vmap_purge_lock)
>  {
> -	mutex_unlock(&vmap_purge_lock);
>  	spin_unlock(&vmap_area_lock);
> +	mutex_unlock(&vmap_purge_lock);
>  }
>  
>  static void show_numa_info(struct seq_file *m, struct vm_struct *v)
> 

Reviewed-by: David Hildenbrand <david@redhat.com>
Waiman Long Dec. 14, 2020, 3:05 p.m. UTC | #5
On 12/14/20 4:39 AM, David Hildenbrand wrote:
> On 13.12.20 19:08, Waiman Long wrote:
>> When multiple locks are acquired, they should be released in reverse
>> order. For s_start() and s_stop() in mm/vmalloc.c, that is not the
>> case.
>>
>>    s_start: mutex_lock(&vmap_purge_lock); spin_lock(&vmap_area_lock);
>>    s_stop : mutex_unlock(&vmap_purge_lock); spin_unlock(&vmap_area_lock);
>>
>> This unlock sequence, though allowed, is not optimal. If a waiter is
>> present, mutex_unlock() will need to go through the slowpath of waking
>> up the waiter with preemption disabled. Fix that by releasing the
>> spinlock first before the mutex.
>>
>> Fixes: e36176be1c39 ("mm/vmalloc: rework vmap_area_lock")
> I'm not sure this qualifies as a "Fixes". As you correctly state, it
> "is not optimal". But yeah, releasing a spinlock after releasing a
> mutex already looks weird.
>
Yes, it may not technically be a real bug fix. However, the order just
doesn't look right, which is why I sent out a patch to address it.

Cheers,
Longman
Matthew Wilcox Dec. 14, 2020, 3:37 p.m. UTC | #6
On Mon, Dec 14, 2020 at 04:11:28PM +0100, Uladzislau Rezki wrote:
> On Sun, Dec 13, 2020 at 09:51:34PM +0000, Matthew Wilcox wrote:
> > If we need to iterate the list efficiently, I'd suggest getting rid of
> > the list and using an xarray instead. Maybe a maple tree, once that code
> > is better exercised.
>
> > Not really efficiently. We just need a full scan of it, propagating the
> > information about mapped and un-purged areas to user-space applications.
> > 
> > For example, an RCU-safe list is what we need, IMHO. On the other hand, I
> > am not sure whether an xarray is RCU safe in the context of concurrently
> > removing/adding an element (xa_remove()/xa_insert()) while scanning with
> > xa_for_each_XXX().

It's as RCU safe as an RCU-safe list.  Specifically, it guarantees:

 - If an element is present at all times between the start and the
   end of the iteration, it will appear in the iteration.
 - No element will appear more than once.
 - No element will appear in the iteration that was never present.
 - The iteration will terminate.

If an element is added or removed between the start and end of the
iteration, it may or may not appear.  Causality is not guaranteed (eg
if modification A is made before modification B, modification B may
be reflected in the iteration while modification A is not).
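
A sketch of what those guarantees would mean for the /proc/vmallocinfo
walk (hypothetical: it assumes the areas lived in an xarray named
vmap_area_xa keyed by va_start, were freed only after an RCU grace
period, and uses a made-up function name):

  static void show_all_areas_xa(struct seq_file *m)
  {
          unsigned long index;
          struct vmap_area *va;

          rcu_read_lock();
          xa_for_each(&vmap_area_xa, index, va) {
                  /* va is valid here; areas added or removed concurrently
                   * may or may not be seen, per the rules above. */
          }
          rcu_read_unlock();
  }

Concurrent updaters only need to serialize against each other;
xa_insert()/xa_erase() already take the internal xa_lock.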
Uladzislau Rezki Dec. 14, 2020, 5:56 p.m. UTC | #7
On Mon, Dec 14, 2020 at 03:37:46PM +0000, Matthew Wilcox wrote:
> On Mon, Dec 14, 2020 at 04:11:28PM +0100, Uladzislau Rezki wrote:
> > On Sun, Dec 13, 2020 at 09:51:34PM +0000, Matthew Wilcox wrote:
> > > If we need to iterate the list efficiently, I'd suggest getting rid of
> > > the list and using an xarray instead. Maybe a maple tree, once that code
> > > is better exercised.
> >
> > Not really efficiently. We just need a full scan of it, propagating the
> > information about mapped and un-purged areas to user-space applications.
> > 
> > For example, an RCU-safe list is what we need, IMHO. On the other hand, I
> > am not sure whether an xarray is RCU safe in the context of concurrently
> > removing/adding an element (xa_remove()/xa_insert()) while scanning with
> > xa_for_each_XXX().
> 
> It's as RCU safe as an RCU-safe list.  Specifically, it guarantees:
> 
>  - If an element is present at all times between the start and the
>    end of the iteration, it will appear in the iteration.
>  - No element will appear more than once.
>  - No element will appear in the iteration that was never present.
>  - The iteration will terminate.
> 
> If an element is added or removed between the start and end of the
> iteration, it may or may not appear.  Causality is not guaranteed (eg
> if modification A is made before modification B, modification B may
> be reflected in the iteration while modification A is not).
>
Thank you for the information! Making use of an xarray would require a
migration from our current vmap_area_root RB-tree to the xarray. It probably
makes sense if there are performance benefits from such a migration.
Apparently, running the vmalloc benchmark shows quite a big degradation:

# X-array
urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=31 single_cpu_test=1
Run the test with following parameters: run_test_mask=31 single_cpu_test=1
Done.
Check the kernel ring buffer to see the summary.

real    0m18.928s
user    0m0.017s
sys     0m0.004s
urezki@pc638:~$
[   90.103768] Summary: fix_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 1275773 usec
[   90.103771] Summary: full_fit_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 1439371 usec
[   90.103772] Summary: long_busy_list_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 9138051 usec
[   90.103773] Summary: random_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 4821400 usec
[   90.103774] Summary: fix_align_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 2181207 usec
[   90.103775] All test took CPU0=69774784667 cycles

# RB-tree
urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=31 single_cpu_test=1
Run the test with following parameters: run_test_mask=31 single_cpu_test=1
Done.
Check the kernel ring buffer to see the summary.

real    0m13.975s
user    0m0.013s
sys     0m0.010s
urezki@pc638:~$ 
[   26.633372] Summary: fix_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 429836 usec
[   26.633375] Summary: full_fit_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 566042 usec
[   26.633377] Summary: long_busy_list_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 7663974 usec
[   26.633378] Summary: random_size_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 3853388 usec
[   26.633379] Summary: fix_align_alloc_test passed: 1 failed: 0 repeat: 1 loops: 1000000 avg: 1370097 usec
[   26.633380] All test took CPU0=51370095742 cycles

I suspect xa_load() does provide O(log(n)) search time?

--
Vlad Rezki

Patch

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6ae491a8b210..75913f685c71 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3448,11 +3448,11 @@  static void *s_next(struct seq_file *m, void *p, loff_t *pos)
 }
 
 static void s_stop(struct seq_file *m, void *p)
-	__releases(&vmap_purge_lock)
 	__releases(&vmap_area_lock)
+	__releases(&vmap_purge_lock)
 {
-	mutex_unlock(&vmap_purge_lock);
 	spin_unlock(&vmap_area_lock);
+	mutex_unlock(&vmap_purge_lock);
 }
 
 static void show_numa_info(struct seq_file *m, struct vm_struct *v)