Message ID | 20210918233920.9174-1-sultan@kerneltoast.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: Mark the OOM reaper thread as freezable | expand |
On Sat, 18 Sep 2021 16:39:20 -0700 Sultan Alsawaf <sultan@kerneltoast.com> wrote: > The OOM reaper thread uses wait_event_freezable() without actually being > marked as freezable. Fix it by adding a set_freezable() call. Oh. What are the runtime effects of this defect?
On Sat 18-09-21 16:39:20, Sultan Alsawaf wrote: > From: Sultan Alsawaf <sultan@kerneltoast.com> > > The OOM reaper thread uses wait_event_freezable() without actually being > marked as freezable. Fix it by adding a set_freezable() call. What is the actual problem you are trying to solve here. Freezer details are hairy and I have to re-learn them each time again and again but from what I remember wait_event_freezable doesn't really depend on tyask being freezable. It tells the freezer that the task is OK to exclude while it is sleeping and that should be just the case for the oom reaper. Or am I missing something? > Fixes: aac453635549 ("mm, oom: introduce oom reaper") > Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> > --- > mm/oom_kill.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 831340e7ad8b..46a742b57735 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -641,6 +641,8 @@ static void oom_reap_task(struct task_struct *tsk) > > static int oom_reaper(void *unused) > { > + set_freezable(); > + > while (true) { > struct task_struct *tsk = NULL; > > -- > 2.33.0
On Mon, Sep 20, 2021 at 02:40:37PM +0200, Michal Hocko wrote: > What is the actual problem you are trying to solve here. There isn't any specific problem I'm trying to solve here; simply that, it appeared as though you intended for the reaper thread to be freezable when it actually isn't. The OOM killer is disabled after processes are frozen though so I guess it could be considered a matter of consistency to freeze the reaper thread too. Do you remember why you used wait_event_freezable()? > Freezer details are hairy and I have to re-learn them each time again and > again but from what I remember wait_event_freezable doesn't really depend on > tyask being freezable. It tells the freezer that the task is OK to exclude > while it is sleeping and that should be just the case for the oom reaper. Or > am I missing something? The task indeed doesn't need to be freezable, but the rest of what you remember isn't quite true. It tells the freezer to exclude the task only because the task will handle entering the freezer on its own. When a task sleeps on wait_event_freezable(), it will be woken up when system-wide freezing starts, and then it will try to freeze itself (see freezable_schedule() and freezer_count()). If the freezer bits here are undesired then I think wait_event_interruptible() should be used instead. Sultan
On Mon 20-09-21 08:55:15, Sultan Alsawaf wrote: > On Mon, Sep 20, 2021 at 02:40:37PM +0200, Michal Hocko wrote: > > What is the actual problem you are trying to solve here. > > There isn't any specific problem I'm trying to solve here; simply that, it > appeared as though you intended for the reaper thread to be freezable when it > actually isn't. The OOM killer is disabled after processes are frozen though so > I guess it could be considered a matter of consistency to freeze the reaper > thread too. The intention and the scope of the patch should be in the changelog. Your Fixes tag suggests there is a problem to fixed. > Do you remember why you used wait_event_freezable()? My memory has faded but I suspect it was to make sure that the oom reaper is not blocking the system wide freezing. The operation mode of the thread is to wait for oom victims and then do the unmapping without any blocking. While it can be frozen during the operation I do not remember that causing any problems and the waiting is exactly the point when that is obviously safe - hence wait_event_freezable which I believe is the proper API to use. > > Freezer details are hairy and I have to re-learn them each time again and > > again but from what I remember wait_event_freezable doesn't really depend on > > tyask being freezable. It tells the freezer that the task is OK to exclude > > while it is sleeping and that should be just the case for the oom reaper. Or > > am I missing something? > > The task indeed doesn't need to be freezable, but the rest of what you remember > isn't quite true. It tells the freezer to exclude the task only because the task > will handle entering the freezer on its own. When a task sleeps on > wait_event_freezable(), it will be woken up when system-wide freezing starts, > and then it will try to freeze itself (see freezable_schedule() and > freezer_count()). Thanks for the clarification. > If the freezer bits here are undesired then I think wait_event_interruptible() > should be used instead. I am not saying it is undesired. The crux is to be clear in the reasoning.
On Mon, Sep 20, 2021 at 07:34:26PM +0200, Michal Hocko wrote: > The intention and the scope of the patch should be in the changelog. > Your Fixes tag suggests there is a problem to fixed. I guess References would be more appropriate here? I'm not familiar with every subsystem's way of doing things, so I just rolled with Fixes to leave a breadcrumb trail to the original commit implicated in my change. What would you suggest in a case like this for mm patches? > My memory has faded but I suspect it was to make sure that the oom > reaper is not blocking the system wide freezing. The operation mode of > the thread is to wait for oom victims and then do the unmapping without > any blocking. While it can be frozen during the operation I do not > remember that causing any problems and the waiting is exactly the point > when that is obviously safe - hence wait_event_freezable which I believe > is the proper API to use. This isn't clear to me. Kthreads come with PF_NOFREEZE set by default, so the system-wide freezing will already ignore the reaper thread as-is, although it will make that determination from inside freeze_task() and thus freezing_slow_path(), which involves acquiring a lock. You could set PF_FREEZER_SKIP to skip the slowpath evaluation entirely. Furthermore, the use of wait_event_freezable() will make the reaper thread enter try_to_freeze() every time it's woken up. This seems wasteful considering that the reaper thread will never actually freeze. Sultan
On Mon 20-09-21 12:16:39, Sultan Alsawaf wrote: > On Mon, Sep 20, 2021 at 07:34:26PM +0200, Michal Hocko wrote: > > The intention and the scope of the patch should be in the changelog. > > Your Fixes tag suggests there is a problem to fixed. > > I guess References would be more appropriate here? I'm not familiar with every > subsystem's way of doing things, so I just rolled with Fixes to leave a > breadcrumb trail to the original commit implicated in my change. What would you > suggest in a case like this for mm patches? We usually tend to provide Fixes where there has been something fixed. It seems just confusing if it is used for non functional changes, cleanups etc. Thera are gray zones of course. > > My memory has faded but I suspect it was to make sure that the oom > > reaper is not blocking the system wide freezing. The operation mode of > > the thread is to wait for oom victims and then do the unmapping without > > any blocking. While it can be frozen during the operation I do not > > remember that causing any problems and the waiting is exactly the point > > when that is obviously safe - hence wait_event_freezable which I believe > > is the proper API to use. > > This isn't clear to me. Kthreads come with PF_NOFREEZE set by default, so the > system-wide freezing will already ignore the reaper thread as-is, although it > will make that determination from inside freeze_task() and thus > freezing_slow_path(), which involves acquiring a lock. You could set > PF_FREEZER_SKIP to skip the slowpath evaluation entirely. I am not sure I follow. My understanding is that we need to make sure oom_reaper is not running after the quiescent state as it is changing user space address space. For that I believe we need to freeze the kthread at a proper moment. That is currently the entry point and maybe we can extend that even to the reaping loop but I haven't really explored that. PF_FREEZER_SKIP would skip over the reaper and that could result in it racing with the snapshotting no? > Furthermore, the use of wait_event_freezable() will make the reaper thread enter > try_to_freeze() every time it's woken up. This seems wasteful considering that > the reaper thread will never actually freeze. Is this something to really worry about?
On Mon, Sep 20, 2021 at 10:30:12PM +0200, Michal Hocko wrote: > We usually tend to provide Fixes where there has been something fixed. > It seems just confusing if it is used for non functional changes, > cleanups etc. Thera are gray zones of course. Got it, thanks. So no tag would be used in such a case? > I am not sure I follow. My understanding is that we need to make sure > oom_reaper is not running after the quiescent state as it is changing > user space address space. For that I believe we need to freeze the > kthread at a proper moment. That is currently the entry point and maybe > we can extend that even to the reaping loop but I haven't really > explored that. PF_FREEZER_SKIP would skip over the reaper and that could > result in it racing with the snapshotting no? Kthreads cannot be implicitly frozen; it's not like PREEMPT. From what I've read in the freezer code, two things must occur for a kthread to freeze: the kthread must have PF_NOFREEZE unset (via set_freezable(), as is done in the patch I've submitted here), and the kthread must have an explicit call into the freezer, such as via wait_event_freezable(). Right now, oom_reaper is totally ignored by the freezer because PF_NOFREEZE is set by default in all kthreads. As such, oom_reaper can keep running while system-wide freezing occurs. If you think this can mangle snapshots, then perhaps there is a real bug here after all. It sounds like you don't want oom_reaper to slow down system-wide freezing, but at the same time, you want oom_reaper to participate in system-wide freezing? I'm not sure how you could achieve that, aside from maybe inserting a call into the freezer while iterating through each vma, akin to adding a cond_resched(). My PF_FREEZER_SKIP suggestion was just to emphasize that oom_reaper is currently skipping the freezer anyway due to PF_NOFREEZE, and that you could set PF_FREEZER_SKIP to make it skip the freezer a little faster if you wanted. > Is this something to really worry about? I'm trying to emphasize that the current usage of wait_event_freezable() in oom_repear behaves *exactly* like wait_event_interruptible() but with some extra overhead. Sultan
On Mon 20-09-21 15:29:46, Sultan Alsawaf wrote: > On Mon, Sep 20, 2021 at 10:30:12PM +0200, Michal Hocko wrote: > > We usually tend to provide Fixes where there has been something fixed. > > It seems just confusing if it is used for non functional changes, > > cleanups etc. Thera are gray zones of course. > > Got it, thanks. So no tag would be used in such a case? > > > I am not sure I follow. My understanding is that we need to make sure > > oom_reaper is not running after the quiescent state as it is changing > > user space address space. For that I believe we need to freeze the > > kthread at a proper moment. That is currently the entry point and maybe > > we can extend that even to the reaping loop but I haven't really > > explored that. PF_FREEZER_SKIP would skip over the reaper and that could > > result in it racing with the snapshotting no? > > Kthreads cannot be implicitly frozen; it's not like PREEMPT. From what I've read > in the freezer code, two things must occur for a kthread to freeze: the kthread > must have PF_NOFREEZE unset (via set_freezable(), as is done in the patch I've > submitted here), and the kthread must have an explicit call into the freezer, > such as via wait_event_freezable(). > > Right now, oom_reaper is totally ignored by the freezer because PF_NOFREEZE is > set by default in all kthreads. As such, oom_reaper can keep running while > system-wide freezing occurs. If you think this can mangle snapshots, then > perhaps there is a real bug here after all. OK, now I am getting your point. Sorry for being dense here. Process freezing has always been kinda muddy to me as I've said earlier. I have completely misunderstood your earlier PF_NOFREEZE remark.
On Sat 18-09-21 16:39:20, Sultan Alsawaf wrote: > From: Sultan Alsawaf <sultan@kerneltoast.com> > > The OOM reaper thread uses wait_event_freezable() without actually being > marked as freezable. Fix it by adding a set_freezable() call. After the follow up discussion it is clear what the patch does and why it is needed. The changelog really begs for some clarification. I would propose something like " The OOM reaper kthread uses wait_event_freezable while it is waiting for any work. It is safe to freeze it while waiting. We, however, need to prevent any activity after global freezer quiescent state because the oom reaping is altering address space and this might alter the snapshot theoretically (please note that this is mostly a theoretical concern not being observed in practice so far) so the freezer has to wait for an explicit freezing. The current implementation doesn't work that way though because all kernel threads are created with PF_NOFREEZE flag so they are automatically excluded from freezing operation. This means that oom_reaper can race with the system snapshoting if it was processing while the system is being frozen. Fix that by set_freezable which will make the oom_reaper visible to the freezer. " > > Fixes: aac453635549 ("mm, oom: introduce oom reaper") > Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> With the above or otherwise improved changelog feel free to add Acked-by: Michal Hocko <mhocko@suse.com> Thanks! > --- > mm/oom_kill.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 831340e7ad8b..46a742b57735 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -641,6 +641,8 @@ static void oom_reap_task(struct task_struct *tsk) > > static int oom_reaper(void *unused) > { > + set_freezable(); > + > while (true) { > struct task_struct *tsk = NULL; > > -- > 2.33.0
diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 831340e7ad8b..46a742b57735 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -641,6 +641,8 @@ static void oom_reap_task(struct task_struct *tsk) static int oom_reaper(void *unused) { + set_freezable(); + while (true) { struct task_struct *tsk = NULL;