
[04/12] mm: oom_kill: remove unnecessary locking in exit_oom_victim()

Message ID 1427264236-17249-5-git-send-email-hannes@cmpxchg.org (mailing list archive)
State New, archived
Headers show

Commit Message

Johannes Weiner March 25, 2015, 6:17 a.m. UTC
Disabling the OOM killer needs to exclude allocators from entering,
not existing victims from exiting.

Right now the only waiter is suspend code, which achieves quiescence
by disabling the OOM killer.  But later on we want to add waits that
hold the lock instead to stop new victims from showing up.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/oom_kill.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Michal Hocko March 26, 2015, 12:53 p.m. UTC | #1
On Wed 25-03-15 02:17:08, Johannes Weiner wrote:
> Disabling the OOM killer needs to exclude allocators from entering,
> not existing victims from exiting.

The idea was that exit_oom_victim doesn't miss a waiter.

exit_oom_victim is doing
	if (!atomic_dec_return(&oom_victims) && oom_killer_disabled)

so there is a full (implicit) memory barrier before the
oom_killer_disabled check. The other part is trickier. oom_killer_disable
does:
	oom_killer_disabled = true;
	up_write(&oom_sem);

	wait_event(oom_victims_wait, !atomic_read(&oom_victims));

up_write doesn't guarantee a full memory barrier AFAICS in
Documentation/memory-barriers.txt (although the generic and x86
implementations seem to implement it as a full barrier), but wait_event
implies a full memory barrier (prepare_to_wait_event does spin
lock&unlock) before checking the condition in the slow path. This should
be sufficient and documented...

	/*
	 * We do not need to hold oom_sem here because oom_killer_disable
	 * guarantees that the oom_killer_disabled change is visible before
	 * the waiter is put to sleep (prepare_to_wait_event), so
	 * we cannot miss a wake up.
	 */

in unmark_oom_victim()

> Right now the only waiter is suspend code, which achieves quiescence
> by disabling the OOM killer.  But later on we want to add waits that
> hold the lock instead to stop new victims from showing up.

It is not entirely clear what you mean by this from the current context.
exit_oom_victim is not called from any context which would be locked by
any OOM internals, so it should be safe to use the locking.

> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

I have nothing against the change as it seems correct but it would be
good to get a better clarification and also document the implicit memory
barriers.

Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/oom_kill.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 4b9547be9170..88aa9ba40fa5 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -437,14 +437,12 @@ void exit_oom_victim(void)
>  {
>  	clear_thread_flag(TIF_MEMDIE);
>  
> -	down_read(&oom_sem);
>  	/*
>  	 * There is no need to signal the last oom_victim if there
>  	 * is nobody who cares.
>  	 */
>  	if (!atomic_dec_return(&oom_victims) && oom_killer_disabled)
>  		wake_up_all(&oom_victims_wait);
> -	up_read(&oom_sem);
>  }
>  
>  /**
> -- 
> 2.3.3
>
Michal Hocko March 26, 2015, 1:01 p.m. UTC | #2
On Thu 26-03-15 13:53:48, Michal Hocko wrote:
> On Wed 25-03-15 02:17:08, Johannes Weiner wrote:
> > Disabling the OOM killer needs to exclude allocators from entering,
> > not existing victims from exiting.
> 
> The idea was that exit_oom_victim doesn't miss a waiter.
> 
> exit_oom_victim is doing
> 	if (!atomic_dec_return(&oom_victims) && oom_killer_disabled)
> 
> so there is a full (implicit) memory barrier before the
> oom_killer_disabled check. The other part is trickier. oom_killer_disable
> does:
> 	oom_killer_disabled = true;
> 	up_write(&oom_sem);
> 
> 	wait_event(oom_victims_wait, !atomic_read(&oom_victims));
> 
> up_write doesn't guarantee a full memory barrier AFAICS in
> Documentation/memory-barriers.txt (although the generic and x86
> implementations seem to implement it as a full barrier), but wait_event
> implies a full memory barrier (prepare_to_wait_event does spin
> lock&unlock) before checking the condition in the slow path. This should
> be sufficient and documented...
> 
> 	/*
> 	 * We do not need to hold oom_sem here because oom_killer_disable
> 	 * guarantees that the oom_killer_disabled change is visible before
> 	 * the waiter is put to sleep (prepare_to_wait_event), so
> 	 * we cannot miss a wake up.
> 	 */
> 
> in unmark_oom_victim()

OK, I can see that the next patch removes oom_killer_disabled
completely. The dependency won't be there, and neither will the
concerns about the memory barriers.

Is there any reason why the patches are ordered this way? The reverse
order would seem more logical to me.
Johannes Weiner March 26, 2015, 3:04 p.m. UTC | #3
On Thu, Mar 26, 2015 at 01:53:48PM +0100, Michal Hocko wrote:
> On Wed 25-03-15 02:17:08, Johannes Weiner wrote:
> > Right now the only waiter is suspend code, which achieves quiescence
> > by disabling the OOM killer.  But later on we want to add waits that
> > hold the lock instead to stop new victims from showing up.
> 
> It is not entirely clear what you mean by this from the current context.
> exit_oom_victim is not called from any context which would be locked by
> any OOM internals so it should be safe to use the locking.

A later patch will add another wait_event() to wait for oom victims to
drop to zero.  But that new consumer won't be disabling the OOM killer
to prevent new victims from showing up, it will just hold the lock to
exclude OOM kills.  So the exiting victims shouldn't get stuck on that
lock while the party waiting for them is holding it.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Weiner March 26, 2015, 3:10 p.m. UTC | #4
On Thu, Mar 26, 2015 at 02:01:06PM +0100, Michal Hocko wrote:
> On Thu 26-03-15 13:53:48, Michal Hocko wrote:
> > On Wed 25-03-15 02:17:08, Johannes Weiner wrote:
> > > Disabling the OOM killer needs to exclude allocators from entering,
> > > not existing victims from exiting.
> > 
> > The idea was that exit_oom_victim doesn't miss a waiter.
> > 
> > exit_oom_victim is doing
> > 	if (!atomic_dec_return(&oom_victims) && oom_killer_disabled)
> > 
> > so there is a full (implicit) memory barrier before the
> > oom_killer_disabled check. The other part is trickier. oom_killer_disable
> > does:
> > 	oom_killer_disabled = true;
> > 	up_write(&oom_sem);
> > 
> > 	wait_event(oom_victims_wait, !atomic_read(&oom_victims));
> > 
> > up_write doesn't guarantee a full memory barrier AFAICS in
> > Documentation/memory-barriers.txt (although the generic and x86
> > implementations seem to implement it as a full barrier), but wait_event
> > implies a full memory barrier (prepare_to_wait_event does spin
> > lock&unlock) before checking the condition in the slow path. This should
> > be sufficient and documented...
> > 
> > 	/*
> > 	 * We do not need to hold oom_sem here because oom_killer_disable
> > 	 * guarantees that the oom_killer_disabled change is visible before
> > 	 * the waiter is put to sleep (prepare_to_wait_event), so
> > 	 * we cannot miss a wake up.
> > 	 */
> > 
> > in unmark_oom_victim()
> 
> OK, I can see that the next patch removes oom_killer_disabled
> completely. The dependency won't be there, and neither will the
> concerns about the memory barriers.
> 
> Is there any reason why the patches are ordered this way? The reverse
> order would seem more logical to me.

I honestly didn't even think about the dependency between the lock and
this check.  They both looked unnecessary to me and I stopped putting
any more thought into it once I had convinced myself that they are.

The order was chosen because the waitqueue generalization seemed like
a bigger deal.  One is just an unnecessary lock, but this extra check
cost me quite some time debugging and seems like a much more harmful
piece of code to fix.  It's no problem to reorder the patches, though.

Patch

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 4b9547be9170..88aa9ba40fa5 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -437,14 +437,12 @@  void exit_oom_victim(void)
 {
 	clear_thread_flag(TIF_MEMDIE);
 
-	down_read(&oom_sem);
 	/*
 	 * There is no need to signal the last oom_victim if there
 	 * is nobody who cares.
 	 */
 	if (!atomic_dec_return(&oom_victims) && oom_killer_disabled)
 		wake_up_all(&oom_victims_wait);
-	up_read(&oom_sem);
 }
 
 /**