diff mbox series

[RFC,3/3] exit: Check for MMF_OOM_SKIP in exit_mmap

Message ID 20220421190533.1601879-4-npache@redhat.com (mailing list archive)
State New
Headers show
Series Slight improvements for OOM/Futex | expand

Commit Message

Nico Pache April 21, 2022, 7:05 p.m. UTC
The MMF_OOM_SKIP bit is used to indicate weather a mm_struct can not be
invalided or has already been invalided. exit_mmap currently calls
__oom_reap_task_mm unconditionally despite the fact that the oom reaper
may have already called this.

Add a check for the MMF_OOM_SKIP bit being set in exit_mmap to avoid
unnessary calls to the invalidate code.

A slight race can occur on the MMF_OOM_SKIP bit that will still allow
this to run twice. My testing has shown an ~66% decrease in double calls
to _oom_reap_task_mm.

Fixes: 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/mmap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Michal Hocko April 22, 2022, 3:38 p.m. UTC | #1
On Thu 21-04-22 15:05:33, Nico Pache wrote:
> The MMF_OOM_SKIP bit is used to indicate weather a mm_struct can not be
> invalided or has already been invalided. exit_mmap currently calls
> __oom_reap_task_mm unconditionally despite the fact that the oom reaper
> may have already called this.
> 
> Add a check for the MMF_OOM_SKIP bit being set in exit_mmap to avoid
> unnessary calls to the invalidate code.

Why do we care about this?
 
> A slight race can occur on the MMF_OOM_SKIP bit that will still allow
> this to run twice. My testing has shown an ~66% decrease in double calls
> to _oom_reap_task_mm.
> 
> Fixes: 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")

I do not see this would be fixing anything.

> Cc: David Rientjes <rientjes@google.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>  mm/mmap.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index a2968669fd4e..b867f408dacd 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3113,7 +3113,8 @@ void exit_mmap(struct mm_struct *mm)
>  	/* mm's last user has gone, and its about to be pulled down */
>  	mmu_notifier_release(mm);
>  
> -	if (unlikely(mm_is_oom_victim(mm))) {
> +	if (unlikely(mm_is_oom_victim(mm)) &&
> +			!test_bit(MMF_OOM_SKIP, &mm->flags)) {
>  		/*
>  		 * Manually reap the mm to free as much memory as possible.
>  		 * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
> -- 
> 2.35.1
Nico Pache April 25, 2022, 7 p.m. UTC | #2
On 4/22/22 11:38, Michal Hocko wrote:
> On Thu 21-04-22 15:05:33, Nico Pache wrote:
>> The MMF_OOM_SKIP bit is used to indicate weather a mm_struct can not be
>> invalided or has already been invalided. exit_mmap currently calls
>> __oom_reap_task_mm unconditionally despite the fact that the oom reaper
>> may have already called this.
>>
>> Add a check for the MMF_OOM_SKIP bit being set in exit_mmap to avoid
>> unnessary calls to the invalidate code.
> 
> Why do we care about this?
Is there no cost to the MMU/TLB invalidation? The MMU notifier contains a lock
too so perhaps we can also avoids some unnecessary MMU notifier lock contention.
>  
>> A slight race can occur on the MMF_OOM_SKIP bit that will still allow
>> this to run twice. My testing has shown an ~66% decrease in double calls
>> to _oom_reap_task_mm.
>>
>> Fixes: 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
> 
> I do not see this would be fixing anything.
Ok im just trying to make sure we are keeping an eye on what introduced this
double call. Davids commit above is what introduced the oom_reap_task_mm in the
exit_mmap code. It goes along with some other changes that I dont fully
understand without more studying, so that's why I was hoping he could provide
some input around that CVE (the main thing im concerned about re-introducing).
> 
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Signed-off-by: Nico Pache <npache@redhat.com>
>> ---
>>  mm/mmap.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index a2968669fd4e..b867f408dacd 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -3113,7 +3113,8 @@ void exit_mmap(struct mm_struct *mm)
>>  	/* mm's last user has gone, and its about to be pulled down */
>>  	mmu_notifier_release(mm);
>>  
>> -	if (unlikely(mm_is_oom_victim(mm))) {
>> +	if (unlikely(mm_is_oom_victim(mm)) &&
>> +			!test_bit(MMF_OOM_SKIP, &mm->flags)) {
>>  		/*
>>  		 * Manually reap the mm to free as much memory as possible.
>>  		 * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
>> -- 
>> 2.35.1
>
Michal Hocko April 26, 2022, 6:59 a.m. UTC | #3
On Mon 25-04-22 15:00:24, Nico Pache wrote:
> 
> 
> On 4/22/22 11:38, Michal Hocko wrote:
> > On Thu 21-04-22 15:05:33, Nico Pache wrote:
> >> The MMF_OOM_SKIP bit is used to indicate weather a mm_struct can not be
> >> invalided or has already been invalided. exit_mmap currently calls
> >> __oom_reap_task_mm unconditionally despite the fact that the oom reaper
> >> may have already called this.
> >>
> >> Add a check for the MMF_OOM_SKIP bit being set in exit_mmap to avoid
> >> unnessary calls to the invalidate code.
> > 
> > Why do we care about this?
> Is there no cost to the MMU/TLB invalidation? The MMU notifier contains a lock
> too so perhaps we can also avoids some unnecessary MMU notifier lock contention.

I am pretty sure that this area can be micro optimized but I do not
really see a strong reason for that. OOM victims are/should be really
rare so I do not think that any performance optimization would be really
visible.

If you want to improve the code then I think a much better plan would be
to get rid of the whole oom special case altogether. This might be much
closer than ever after Hugh's recent m{un}lock changes. I didn't have
time to think that through though. I believe Suren Baghdasaryan has been
looking into that as well.
diff mbox series

Patch

diff --git a/mm/mmap.c b/mm/mmap.c
index a2968669fd4e..b867f408dacd 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3113,7 +3113,8 @@  void exit_mmap(struct mm_struct *mm)
 	/* mm's last user has gone, and its about to be pulled down */
 	mmu_notifier_release(mm);
 
-	if (unlikely(mm_is_oom_victim(mm))) {
+	if (unlikely(mm_is_oom_victim(mm)) &&
+			!test_bit(MMF_OOM_SKIP, &mm->flags)) {
 		/*
 		 * Manually reap the mm to free as much memory as possible.
 		 * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard