diff mbox series

mm: Mark the OOM reaper thread as freezable

Message ID 20210918233920.9174-1-sultan@kerneltoast.com (mailing list archive)
State New
Headers show
Series mm: Mark the OOM reaper thread as freezable | expand

Commit Message

Sultan Alsawaf (unemployed) Sept. 18, 2021, 11:39 p.m. UTC
From: Sultan Alsawaf <sultan@kerneltoast.com>

The OOM reaper thread uses wait_event_freezable() without actually being
marked as freezable. Fix it by adding a set_freezable() call.

Fixes: aac453635549 ("mm, oom: introduce oom reaper")
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
---
 mm/oom_kill.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Andrew Morton Sept. 19, 2021, 10:40 p.m. UTC | #1
On Sat, 18 Sep 2021 16:39:20 -0700 Sultan Alsawaf <sultan@kerneltoast.com> wrote:

> The OOM reaper thread uses wait_event_freezable() without actually being
> marked as freezable. Fix it by adding a set_freezable() call.

Oh.

What are the runtime effects of this defect?
Michal Hocko Sept. 20, 2021, 12:40 p.m. UTC | #2
On Sat 18-09-21 16:39:20, Sultan Alsawaf wrote:
> From: Sultan Alsawaf <sultan@kerneltoast.com>
> 
> The OOM reaper thread uses wait_event_freezable() without actually being
> marked as freezable. Fix it by adding a set_freezable() call.

What is the actual problem you are trying to solve here. Freezer details
are hairy and I have to re-learn them each time again and again but from
what I remember wait_event_freezable doesn't really depend on tyask
being freezable. It tells the freezer that the task is OK to exclude
while it is sleeping and that should be just the case for the oom
reaper. Or am I missing something?
 
> Fixes: aac453635549 ("mm, oom: introduce oom reaper")
> Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
> ---
>  mm/oom_kill.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 831340e7ad8b..46a742b57735 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -641,6 +641,8 @@ static void oom_reap_task(struct task_struct *tsk)
>  
>  static int oom_reaper(void *unused)
>  {
> +	set_freezable();
> +
>  	while (true) {
>  		struct task_struct *tsk = NULL;
>  
> -- 
> 2.33.0
Sultan Alsawaf (unemployed) Sept. 20, 2021, 3:55 p.m. UTC | #3
On Mon, Sep 20, 2021 at 02:40:37PM +0200, Michal Hocko wrote:
> What is the actual problem you are trying to solve here.

There isn't any specific problem I'm trying to solve here; simply that, it
appeared as though you intended for the reaper thread to be freezable when it
actually isn't. The OOM killer is disabled after processes are frozen though so
I guess it could be considered a matter of consistency to freeze the reaper
thread too.

Do you remember why you used wait_event_freezable()?

> Freezer details are hairy and I have to re-learn them each time again and
> again but from what I remember wait_event_freezable doesn't really depend on
> tyask being freezable. It tells the freezer that the task is OK to exclude
> while it is sleeping and that should be just the case for the oom reaper. Or
> am I missing something?

The task indeed doesn't need to be freezable, but the rest of what you remember
isn't quite true. It tells the freezer to exclude the task only because the task
will handle entering the freezer on its own. When a task sleeps on
wait_event_freezable(), it will be woken up when system-wide freezing starts,
and then it will try to freeze itself (see freezable_schedule() and
freezer_count()).

If the freezer bits here are undesired then I think wait_event_interruptible()
should be used instead.

Sultan
Michal Hocko Sept. 20, 2021, 5:34 p.m. UTC | #4
On Mon 20-09-21 08:55:15, Sultan Alsawaf wrote:
> On Mon, Sep 20, 2021 at 02:40:37PM +0200, Michal Hocko wrote:
> > What is the actual problem you are trying to solve here.
> 
> There isn't any specific problem I'm trying to solve here; simply that, it
> appeared as though you intended for the reaper thread to be freezable when it
> actually isn't. The OOM killer is disabled after processes are frozen though so
> I guess it could be considered a matter of consistency to freeze the reaper
> thread too.

The intention and the scope of the patch should be in the changelog.
Your Fixes tag suggests there is a problem to fixed.
 
> Do you remember why you used wait_event_freezable()?

My memory has faded but I suspect it was to make sure that the oom
reaper is not blocking the system wide freezing. The operation mode of
the thread is to wait for oom victims and then do the unmapping without
any blocking. While it can be frozen during the operation I do not
remember that causing any problems and the waiting is exactly the point
when that is obviously safe - hence wait_event_freezable which I believe
is the proper API to use.

> > Freezer details are hairy and I have to re-learn them each time again and
> > again but from what I remember wait_event_freezable doesn't really depend on
> > tyask being freezable. It tells the freezer that the task is OK to exclude
> > while it is sleeping and that should be just the case for the oom reaper. Or
> > am I missing something?
> 
> The task indeed doesn't need to be freezable, but the rest of what you remember
> isn't quite true. It tells the freezer to exclude the task only because the task
> will handle entering the freezer on its own. When a task sleeps on
> wait_event_freezable(), it will be woken up when system-wide freezing starts,
> and then it will try to freeze itself (see freezable_schedule() and
> freezer_count()).

Thanks for the clarification.

> If the freezer bits here are undesired then I think wait_event_interruptible()
> should be used instead.

I am not saying it is undesired. The crux is to be clear in the
reasoning.
Sultan Alsawaf (unemployed) Sept. 20, 2021, 7:16 p.m. UTC | #5
On Mon, Sep 20, 2021 at 07:34:26PM +0200, Michal Hocko wrote:
> The intention and the scope of the patch should be in the changelog.
> Your Fixes tag suggests there is a problem to fixed.

I guess References would be more appropriate here? I'm not familiar with every
subsystem's way of doing things, so I just rolled with Fixes to leave a
breadcrumb trail to the original commit implicated in my change. What would you
suggest in a case like this for mm patches?

> My memory has faded but I suspect it was to make sure that the oom
> reaper is not blocking the system wide freezing. The operation mode of
> the thread is to wait for oom victims and then do the unmapping without
> any blocking. While it can be frozen during the operation I do not
> remember that causing any problems and the waiting is exactly the point
> when that is obviously safe - hence wait_event_freezable which I believe
> is the proper API to use.

This isn't clear to me. Kthreads come with PF_NOFREEZE set by default, so the
system-wide freezing will already ignore the reaper thread as-is, although it
will make that determination from inside freeze_task() and thus
freezing_slow_path(), which involves acquiring a lock. You could set
PF_FREEZER_SKIP to skip the slowpath evaluation entirely.

Furthermore, the use of wait_event_freezable() will make the reaper thread enter
try_to_freeze() every time it's woken up. This seems wasteful considering that
the reaper thread will never actually freeze.

Sultan
Michal Hocko Sept. 20, 2021, 8:30 p.m. UTC | #6
On Mon 20-09-21 12:16:39, Sultan Alsawaf wrote:
> On Mon, Sep 20, 2021 at 07:34:26PM +0200, Michal Hocko wrote:
> > The intention and the scope of the patch should be in the changelog.
> > Your Fixes tag suggests there is a problem to fixed.
> 
> I guess References would be more appropriate here? I'm not familiar with every
> subsystem's way of doing things, so I just rolled with Fixes to leave a
> breadcrumb trail to the original commit implicated in my change. What would you
> suggest in a case like this for mm patches?

We usually tend to provide Fixes where there has been something fixed.
It seems just confusing if it is used for non functional changes,
cleanups etc. Thera are gray zones of course.

> > My memory has faded but I suspect it was to make sure that the oom
> > reaper is not blocking the system wide freezing. The operation mode of
> > the thread is to wait for oom victims and then do the unmapping without
> > any blocking. While it can be frozen during the operation I do not
> > remember that causing any problems and the waiting is exactly the point
> > when that is obviously safe - hence wait_event_freezable which I believe
> > is the proper API to use.
> 
> This isn't clear to me. Kthreads come with PF_NOFREEZE set by default, so the
> system-wide freezing will already ignore the reaper thread as-is, although it
> will make that determination from inside freeze_task() and thus
> freezing_slow_path(), which involves acquiring a lock. You could set
> PF_FREEZER_SKIP to skip the slowpath evaluation entirely.

I am not sure I follow. My understanding is that we need to make sure
oom_reaper is not running after the quiescent state as it is changing
user space address space. For that I believe we need to freeze the
kthread at a proper moment. That is currently the entry point and maybe
we can extend that even to the reaping loop but I haven't really
explored that. PF_FREEZER_SKIP would skip over the reaper and that could
result in it racing with the snapshotting no?

> Furthermore, the use of wait_event_freezable() will make the reaper thread enter
> try_to_freeze() every time it's woken up. This seems wasteful considering that
> the reaper thread will never actually freeze.

Is this something to really worry about?
Sultan Alsawaf (unemployed) Sept. 20, 2021, 10:29 p.m. UTC | #7
On Mon, Sep 20, 2021 at 10:30:12PM +0200, Michal Hocko wrote:
> We usually tend to provide Fixes where there has been something fixed.
> It seems just confusing if it is used for non functional changes,
> cleanups etc. Thera are gray zones of course.

Got it, thanks. So no tag would be used in such a case?

> I am not sure I follow. My understanding is that we need to make sure
> oom_reaper is not running after the quiescent state as it is changing
> user space address space. For that I believe we need to freeze the
> kthread at a proper moment. That is currently the entry point and maybe
> we can extend that even to the reaping loop but I haven't really
> explored that. PF_FREEZER_SKIP would skip over the reaper and that could
> result in it racing with the snapshotting no?

Kthreads cannot be implicitly frozen; it's not like PREEMPT. From what I've read
in the freezer code, two things must occur for a kthread to freeze: the kthread
must have PF_NOFREEZE unset (via set_freezable(), as is done in the patch I've
submitted here), and the kthread must have an explicit call into the freezer,
such as via wait_event_freezable().

Right now, oom_reaper is totally ignored by the freezer because PF_NOFREEZE is
set by default in all kthreads. As such, oom_reaper can keep running while
system-wide freezing occurs. If you think this can mangle snapshots, then
perhaps there is a real bug here after all.

It sounds like you don't want oom_reaper to slow down system-wide freezing, but
at the same time, you want oom_reaper to participate in system-wide freezing?
I'm not sure how you could achieve that, aside from maybe inserting a call into
the freezer while iterating through each vma, akin to adding a cond_resched().

My PF_FREEZER_SKIP suggestion was just to emphasize that oom_reaper is currently
skipping the freezer anyway due to PF_NOFREEZE, and that you could set
PF_FREEZER_SKIP to make it skip the freezer a little faster if you wanted.

> Is this something to really worry about?

I'm trying to emphasize that the current usage of wait_event_freezable() in
oom_repear behaves *exactly* like wait_event_interruptible() but with some extra
overhead.

Sultan
Michal Hocko Sept. 21, 2021, 7:50 a.m. UTC | #8
On Mon 20-09-21 15:29:46, Sultan Alsawaf wrote:
> On Mon, Sep 20, 2021 at 10:30:12PM +0200, Michal Hocko wrote:
> > We usually tend to provide Fixes where there has been something fixed.
> > It seems just confusing if it is used for non functional changes,
> > cleanups etc. Thera are gray zones of course.
> 
> Got it, thanks. So no tag would be used in such a case?
> 
> > I am not sure I follow. My understanding is that we need to make sure
> > oom_reaper is not running after the quiescent state as it is changing
> > user space address space. For that I believe we need to freeze the
> > kthread at a proper moment. That is currently the entry point and maybe
> > we can extend that even to the reaping loop but I haven't really
> > explored that. PF_FREEZER_SKIP would skip over the reaper and that could
> > result in it racing with the snapshotting no?
> 
> Kthreads cannot be implicitly frozen; it's not like PREEMPT. From what I've read
> in the freezer code, two things must occur for a kthread to freeze: the kthread
> must have PF_NOFREEZE unset (via set_freezable(), as is done in the patch I've
> submitted here), and the kthread must have an explicit call into the freezer,
> such as via wait_event_freezable().
> 
> Right now, oom_reaper is totally ignored by the freezer because PF_NOFREEZE is
> set by default in all kthreads. As such, oom_reaper can keep running while
> system-wide freezing occurs. If you think this can mangle snapshots, then
> perhaps there is a real bug here after all.

OK, now I am getting your point. Sorry for being dense here. Process
freezing has always been kinda muddy to me as I've said earlier. I have
completely misunderstood your earlier PF_NOFREEZE remark.
Michal Hocko Sept. 21, 2021, 10:48 a.m. UTC | #9
On Sat 18-09-21 16:39:20, Sultan Alsawaf wrote:
> From: Sultan Alsawaf <sultan@kerneltoast.com>
> 
> The OOM reaper thread uses wait_event_freezable() without actually being
> marked as freezable. Fix it by adding a set_freezable() call.

After the follow up discussion it is clear what the patch does and why
it is needed. The changelog really begs for some clarification. I would
propose something like

"
The OOM reaper kthread uses wait_event_freezable while it is waiting for
any work. It is safe to freeze it while waiting. We, however, need to
prevent any activity after global freezer quiescent state because the
oom reaping is altering address space and this might alter the snapshot
theoretically (please note that this is mostly a theoretical concern not
being observed in practice so far) so the freezer has to wait for an
explicit freezing.

The current implementation doesn't work that way though because all
kernel threads are created with PF_NOFREEZE flag so they are
automatically excluded from freezing operation. This means that
oom_reaper can race with the system snapshoting if it was processing
while the system is being frozen. Fix that by set_freezable which will
make the oom_reaper visible to the freezer.
"

> 
> Fixes: aac453635549 ("mm, oom: introduce oom reaper")
> Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>

With the above or otherwise improved changelog feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/oom_kill.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 831340e7ad8b..46a742b57735 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -641,6 +641,8 @@ static void oom_reap_task(struct task_struct *tsk)
>  
>  static int oom_reaper(void *unused)
>  {
> +	set_freezable();
> +
>  	while (true) {
>  		struct task_struct *tsk = NULL;
>  
> -- 
> 2.33.0
diff mbox series

Patch

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 831340e7ad8b..46a742b57735 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -641,6 +641,8 @@  static void oom_reap_task(struct task_struct *tsk)
 
 static int oom_reaper(void *unused)
 {
+	set_freezable();
+
 	while (true) {
 		struct task_struct *tsk = NULL;