Message ID | 20211208212211.2860249-3-surenb@google.com |
---|---|
State | New |
Series | [v4,1/3] mm: protect free_pgtables with mmap_lock write lock in exit_mmap |
On Wed 08-12-21 13:22:11, Suren Baghdasaryan wrote:
> With exit_mmap holding mmap_write_lock during free_pgtables call,
> process_mrelease does not need to elevate mm->mm_users in order to
> prevent exit_mmap from destroying pagetables while __oom_reap_task_mm
> is walking the VMA tree. The change prevents process_mrelease from
> calling the last mmput, which can lead to waiting for IO completion
> in exit_aio.
>
> Fixes: 337546e83fc7 ("mm/oom_kill.c: prevent a race between process_mrelease and exit_mmap")

I am not sure I have brought this up already but I do not think Fixes
tag is a good fit. 337546e83fc7 is a correct way to handle the race. It
is just slightly less optimal than this fix.

> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/oom_kill.c | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 1ddabefcfb5a..67780386f478 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -1169,15 +1169,15 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
>  		goto put_task;
>  	}
>
> -	if (mmget_not_zero(p->mm)) {
> -		mm = p->mm;
> -		if (task_will_free_mem(p))
> -			reap = true;
> -		else {
> -			/* Error only if the work has not been done already */
> -			if (!test_bit(MMF_OOM_SKIP, &mm->flags))
> -				ret = -EINVAL;
> -		}
> +	mm = p->mm;
> +	mmgrab(mm);
> +
> +	if (task_will_free_mem(p))
> +		reap = true;
> +	else {
> +		/* Error only if the work has not been done already */
> +		if (!test_bit(MMF_OOM_SKIP, &mm->flags))
> +			ret = -EINVAL;
>  	}
>  	task_unlock(p);
>
> @@ -1188,13 +1188,16 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
>  		ret = -EINTR;
>  		goto drop_mm;
>  	}
> -	if (!__oom_reap_task_mm(mm))
> +	/*
> +	 * Check MMF_OOM_SKIP again under mmap_read_lock protection to ensure
> +	 * possible change in exit_mmap is seen
> +	 */
> +	if (!test_bit(MMF_OOM_SKIP, &mm->flags) && !__oom_reap_task_mm(mm))
>  		ret = -EAGAIN;
>  	mmap_read_unlock(mm);
>
>  drop_mm:
> -	if (mm)
> -		mmput(mm);
> +	mmdrop(mm);
>  put_task:
>  	put_task_struct(task);
>  	return ret;
> --
> 2.34.1.400.ga245620fadb-goog
On Thu, Dec 9, 2021 at 12:59 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 08-12-21 13:22:11, Suren Baghdasaryan wrote:
> > With exit_mmap holding mmap_write_lock during free_pgtables call,
> > process_mrelease does not need to elevate mm->mm_users in order to
> > prevent exit_mmap from destroying pagetables while __oom_reap_task_mm
> > is walking the VMA tree. The change prevents process_mrelease from
> > calling the last mmput, which can lead to waiting for IO completion
> > in exit_aio.
> >
> > Fixes: 337546e83fc7 ("mm/oom_kill.c: prevent a race between process_mrelease and exit_mmap")
>
> I am not sure I have brought this up already but I do not think Fixes
> tag is a good fit. 337546e83fc7 is a correct way to handle the race. It
> is just slightly less optimal than this fix.

Will post v5 without it. Thanks!

>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> Thanks!
>
> > ---
> >  mm/oom_kill.c | 27 +++++++++++++++------------
> >  1 file changed, 15 insertions(+), 12 deletions(-)
> >
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 1ddabefcfb5a..67780386f478 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -1169,15 +1169,15 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
> >  		goto put_task;
> >  	}
> >
> > -	if (mmget_not_zero(p->mm)) {
> > -		mm = p->mm;
> > -		if (task_will_free_mem(p))
> > -			reap = true;
> > -		else {
> > -			/* Error only if the work has not been done already */
> > -			if (!test_bit(MMF_OOM_SKIP, &mm->flags))
> > -				ret = -EINVAL;
> > -		}
> > +	mm = p->mm;
> > +	mmgrab(mm);
> > +
> > +	if (task_will_free_mem(p))
> > +		reap = true;
> > +	else {
> > +		/* Error only if the work has not been done already */
> > +		if (!test_bit(MMF_OOM_SKIP, &mm->flags))
> > +			ret = -EINVAL;
> >  	}
> >  	task_unlock(p);
> >
> > @@ -1188,13 +1188,16 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
> >  		ret = -EINTR;
> >  		goto drop_mm;
> >  	}
> > -	if (!__oom_reap_task_mm(mm))
> > +	/*
> > +	 * Check MMF_OOM_SKIP again under mmap_read_lock protection to ensure
> > +	 * possible change in exit_mmap is seen
> > +	 */
> > +	if (!test_bit(MMF_OOM_SKIP, &mm->flags) && !__oom_reap_task_mm(mm))
> >  		ret = -EAGAIN;
> >  	mmap_read_unlock(mm);
> >
> >  drop_mm:
> > -	if (mm)
> > -		mmput(mm);
> > +	mmdrop(mm);
> >  put_task:
> >  	put_task_struct(task);
> >  	return ret;
> > --
> > 2.34.1.400.ga245620fadb-goog
>
> --
> Michal Hocko
> SUSE Labs
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1ddabefcfb5a..67780386f478 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1169,15 +1169,15 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 		goto put_task;
 	}
 
-	if (mmget_not_zero(p->mm)) {
-		mm = p->mm;
-		if (task_will_free_mem(p))
-			reap = true;
-		else {
-			/* Error only if the work has not been done already */
-			if (!test_bit(MMF_OOM_SKIP, &mm->flags))
-				ret = -EINVAL;
-		}
+	mm = p->mm;
+	mmgrab(mm);
+
+	if (task_will_free_mem(p))
+		reap = true;
+	else {
+		/* Error only if the work has not been done already */
+		if (!test_bit(MMF_OOM_SKIP, &mm->flags))
+			ret = -EINVAL;
 	}
 	task_unlock(p);
 
@@ -1188,13 +1188,16 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 		ret = -EINTR;
 		goto drop_mm;
 	}
-	if (!__oom_reap_task_mm(mm))
+	/*
+	 * Check MMF_OOM_SKIP again under mmap_read_lock protection to ensure
+	 * possible change in exit_mmap is seen
+	 */
+	if (!test_bit(MMF_OOM_SKIP, &mm->flags) && !__oom_reap_task_mm(mm))
 		ret = -EAGAIN;
 	mmap_read_unlock(mm);
 
 drop_mm:
-	if (mm)
-		mmput(mm);
+	mmdrop(mm);
 put_task:
 	put_task_struct(task);
 	return ret;
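The second hunk above repeats the MMF_OOM_SKIP test after taking mmap_read_lock so that a change made by exit_mmap while it holds the lock for writing cannot be missed. The following is only a minimal userspace sketch of that check-then-recheck-under-lock pattern, not kernel code; the names (space_lock, already_torn_down, teardown, try_reap) are hypothetical, and a pthread trylock stands in for the kernel's killable read lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Plays the role of mmap_lock. */
static pthread_rwlock_t space_lock = PTHREAD_RWLOCK_INITIALIZER;
/* Plays the role of the MMF_OOM_SKIP bit. */
static bool already_torn_down;

/* Writer side: rough analogue of exit_mmap() marking the mm as done
 * while it holds the lock for writing. */
static void teardown(void)
{
	pthread_rwlock_wrlock(&space_lock);
	/* ... release the address-space resources ... */
	already_torn_down = true;
	pthread_rwlock_unlock(&space_lock);
}

/* Reader side: rough analogue of the process_mrelease() path above. */
static int try_reap(void)
{
	if (pthread_rwlock_tryrdlock(&space_lock) != 0)
		return -1;	/* stand-in for the -EINTR bail-out */

	/*
	 * Recheck under the lock: if teardown() already ran, it set the
	 * flag before dropping the write lock, so the flag is guaranteed
	 * to be visible here and the work can be skipped.
	 */
	if (!already_torn_down)
		printf("reaping...\n");	/* stand-in for __oom_reap_task_mm() */

	pthread_rwlock_unlock(&space_lock);
	return 0;
}

int main(void)
{
	teardown();
	return try_reap();	/* reaping is skipped: the flag is already set */
}
```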
With exit_mmap holding mmap_write_lock during free_pgtables call,
process_mrelease does not need to elevate mm->mm_users in order to
prevent exit_mmap from destroying pagetables while __oom_reap_task_mm
is walking the VMA tree. The change prevents process_mrelease from
calling the last mmput, which can lead to waiting for IO completion
in exit_aio.

Fixes: 337546e83fc7 ("mm/oom_kill.c: prevent a race between process_mrelease and exit_mmap")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/oom_kill.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)
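As background for the switch from mmget_not_zero()/mmput() to mmgrab()/mmdrop() described above: mm_users keeps the whole address space alive and the last mmput() runs the expensive teardown (including exit_aio), while mm_count only keeps the mm_struct itself allocated. The toy userspace model below sketches that split under those assumptions; fake_mm, fake_mmput() and fake_mmdrop() are made-up names, not kernel APIs.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for struct mm_struct with its two reference counts. */
struct fake_mm {
	atomic_int users;	/* like mm_users: pins the address space  */
	atomic_int count;	/* like mm_count: pins only the descriptor */
};

/* Stand-in for __mmput(): AIO completion, unmapping, freeing page
 * tables... the step that can block and that the reaper now avoids. */
static void heavy_teardown(struct fake_mm *mm)
{
	(void)mm;
	printf("address space torn down (may wait, e.g. on AIO)\n");
}

static void fake_mmput(struct fake_mm *mm)
{
	/* atomic_fetch_sub() returns the old value: 1 means we were last. */
	if (atomic_fetch_sub(&mm->users, 1) == 1)
		heavy_teardown(mm);
}

static void fake_mmdrop(struct fake_mm *mm)
{
	if (atomic_fetch_sub(&mm->count, 1) == 1)
		free(mm);	/* only the descriptor itself is freed */
}

int main(void)
{
	struct fake_mm *mm = malloc(sizeof(*mm));

	if (!mm)
		return 1;
	atomic_init(&mm->users, 1);	/* held by the exiting task           */
	atomic_init(&mm->count, 2);	/* exiting task + reaper (mmgrab())   */

	fake_mmput(mm);		/* exiting task: pays for the heavy teardown */
	fake_mmdrop(mm);	/* exiting task drops its descriptor pin      */
	fake_mmdrop(mm);	/* reaper's mmdrop(): cheap, never blocks     */
	return 0;
}
```

The point of the change is visible in the last three calls: because the reaper holds only the cheap kind of reference, it can never be the caller that ends up paying for the blocking teardown.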