From patchwork Fri Nov 13 17:34:48 2020
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 11904351
Date: Fri, 13 Nov 2020 09:34:48 -0800
Message-Id: <20201113173448.1863419-1-surenb@google.com>
Subject: [PATCH 1/1] RFC: add pidfd_send_signal flag to reclaim mm while killing a process
From: Suren Baghdasaryan
To: surenb@google.com
Cc: akpm@linux-foundation.org, mhocko@kernel.org, rientjes@google.com, willy@infradead.org, hannes@cmpxchg.org, guro@fb.com, riel@surriel.com, minchan@kernel.org, christian@brauner.io, oleg@redhat.com, timmurray@google.com, linux-api@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com

When a process is being killed, it might be in an uninterruptible sleep, which leads to an unpredictable delay in reclaiming its memory. In low-memory situations, when it is important to free up memory quickly, such a delay is problematic.
The kernel solves this problem with the oom-reaper thread, which performs memory reclaim even when the victim process is not runnable. Userspace currently lacks such a mechanism; the need for one and potential solutions have been discussed before (see links below).

This patch provides a mechanism to perform memory reclaim in the context of the process that sends the SIGKILL signal. The new SYNC_REAP_MM flag for the pidfd_send_signal syscall can be used only when sending SIGKILL, and it causes the caller to synchronously reclaim the victim's memory that can be easily reclaimed.

1. https://patchwork.kernel.org/cover/10894999
2. https://lwn.net/Articles/787217
3. https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com

Signed-off-by: Suren Baghdasaryan
---
 include/linux/oom.h    |  2 ++
 include/linux/signal.h |  7 ++++
 kernel/signal.c        | 73 ++++++++++++++++++++++++++++++++++++++++--
 mm/oom_kill.c          |  2 +-
 4 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 2db9a1432511..9a8dcabdfdf1 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -111,6 +111,8 @@ bool __oom_reap_task_mm(struct mm_struct *mm);
 long oom_badness(struct task_struct *p,
 		unsigned long totalpages);
 
+extern bool task_will_free_mem(struct task_struct *task);
+
 extern bool out_of_memory(struct oom_control *oc);
 
 extern void exit_oom_victim(void);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index b256f9c65661..5deafc99035d 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -449,6 +449,13 @@ extern bool unhandled_signal(struct task_struct *tsk, int sig);
 	(!siginmask(signr, SIG_KERNEL_IGNORE_MASK|SIG_KERNEL_STOP_MASK) && \
 	 (t)->sighand->action[(signr)-1].sa.sa_handler == SIG_DFL)
 
+/*
+ * Flag values used in pidfd_send_signal:
+ *
+ * SYNC_REAP_MM indicates request to reclaim mm after SIGKILL.
+ */
+#define SYNC_REAP_MM 0x1
+
 void signals_init(void);
 
 int restore_altstack(const stack_t __user *);
diff --git a/kernel/signal.c b/kernel/signal.c
index ef8f2a28d37c..15d4be5600a3 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -46,6 +46,7 @@
 #include
 #include
 #include
+#include
 
 #define CREATE_TRACE_POINTS
 #include
@@ -3711,6 +3712,63 @@ static struct pid *pidfd_to_pid(const struct file *file)
 	return tgid_pidfd_to_pid(file);
 }
 
+static int reap_mm(struct pid *pid)
+{
+	struct task_struct *task;
+	struct mm_struct *mm;
+	int ret = 0;
+
+	/* Get the task_struct */
+	task = get_pid_task(pid, PIDTYPE_PID);
+	if (!task) {
+		ret = -ESRCH;
+		goto out;
+	}
+
+	task_lock(task);
+
+	/* Check if memory can be easily reclaimed */
+	if (!task_will_free_mem(task)) {
+		task_unlock(task);
+		ret = -EBUSY;
+		goto release_task;
+	}
+
+	/* Get mm to prevent exit_mmap */
+	mm = task->mm;
+	mmget(mm);
+
+	/* Ensure no competition with OOM-killer to prevent contention */
+	if (unlikely(mm_is_oom_victim(mm)) ||
+	    unlikely(test_bit(MMF_OOM_SKIP, &mm->flags))) {
+		/* Already being reclaimed */
+		task_unlock(task);
+		goto drop_mm;
+	}
+	/*
+	 * Prevent OOM-killer or other pidfd_send_signal from considering
+	 * this task
+	 */
+	set_bit(MMF_OOM_SKIP, &mm->flags);
+
+	task_unlock(task);
+
+	mmap_read_lock(mm);
+	if (!__oom_reap_task_mm(mm)) {
+		/* Failed to reap part of the address space. User can retry */
+		ret = -EAGAIN;
+		clear_bit(MMF_OOM_SKIP, &mm->flags);
+	}
+	mmap_read_unlock(mm);
+
+drop_mm:
+	mmput(mm);
+release_task:
+	put_task_struct(task);
+out:
+	return ret;
+}
+
 /**
  * sys_pidfd_send_signal - Signal a process through a pidfd
  * @pidfd: file descriptor of the process
@@ -3737,10 +3795,16 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 	struct pid *pid;
 	kernel_siginfo_t kinfo;
 
-	/* Enforce flags be set to 0 until we add an extension. */
-	if (flags)
+	/* Enforce only valid flags. */
+	if (flags) {
+		/* Allow SYNC_REAP_MM only with SIGKILL. */
+		if (flags == SYNC_REAP_MM && sig == SIGKILL)
+			goto valid;
+
 		return -EINVAL;
+	}
 
+valid:
 	f = fdget(pidfd);
 	if (!f.file)
 		return -EBADF;
@@ -3775,6 +3839,11 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 	}
 
 	ret = kill_pid_info(sig, &kinfo, pid);
+	if (unlikely(ret))
+		goto err;
+
+	if (flags & SYNC_REAP_MM)
+		ret = reap_mm(pid);
 
 err:
 	fdput(f);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8b84661a6410..66c90bca25bc 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -808,7 +808,7 @@ static inline bool __task_will_free_mem(struct task_struct *task)
  * Caller has to make sure that task->mm is stable (hold task_lock or
  * it operates on the current).
  */
-static bool task_will_free_mem(struct task_struct *task)
+bool task_will_free_mem(struct task_struct *task)
 {
 	struct mm_struct *mm = task->mm;
 	struct task_struct *p;
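To make the flag check added to pidfd_send_signal() concrete, here is a minimal userspace model of it as a pure function. The helper name check_pidfd_signal_flags() is hypothetical, and SYNC_REAP_MM is redefined locally with the value 0x1 from this RFC rather than assumed to exist in any installed header. Zero flags keep the old behavior, SYNC_REAP_MM is accepted only together with SIGKILL, and any other combination yields -EINVAL.

```c
#include <errno.h>
#include <signal.h>

/* Value taken from this RFC; assumed not to be in any released uapi header. */
#define SYNC_REAP_MM 0x1

/*
 * Hypothetical model of the validation the patch adds to
 * pidfd_send_signal(): returns 0 if the (flags, sig) pair is accepted,
 * -EINVAL otherwise.
 */
static int check_pidfd_signal_flags(unsigned int flags, int sig)
{
	/* No flags: behaves exactly as before the patch. */
	if (flags == 0)
		return 0;

	/* The patch compares with ==, so SYNC_REAP_MM must be the only flag. */
	if (flags == SYNC_REAP_MM && sig == SIGKILL)
		return 0;

	return -EINVAL;
}
```

For example, check_pidfd_signal_flags(SYNC_REAP_MM, SIGTERM) returns -EINVAL, matching the rule that reclaim may only be requested alongside SIGKILL. Because the patch tests flags with == rather than as a bit mask, combining SYNC_REAP_MM with any other bit is rejected as well.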
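The MMF_OOM_SKIP handling in reap_mm() acts as a claim on the victim's mm: the caller that sets the bit does the reaping, a caller that finds it already set returns 0, and on partial failure the bit is cleared so userspace can retry after -EAGAIN. The sketch below is a hypothetical userspace analogue of that handshake, not kernel code. struct fake_mm, fake_mm_init() and model_reap_mm() are illustrative names; a C11 atomic_bool stands in for the MMF_OOM_SKIP bit, and the will_free_mem and reap_ok parameters stand in for the results of task_will_free_mem() and __oom_reap_task_mm(). Unlike the patch, which calls test_bit() and set_bit() separately while holding task_lock(), the sketch uses atomic_exchange() so that the test and the set happen in one step.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of mm_struct that reap_mm() inspects. */
struct fake_mm {
	atomic_bool oom_skip; /* plays the role of MMF_OOM_SKIP in mm->flags */
	bool oom_victim;      /* plays the role of mm_is_oom_victim(mm) */
};

static void fake_mm_init(struct fake_mm *mm, bool oom_victim)
{
	atomic_init(&mm->oom_skip, false);
	mm->oom_victim = oom_victim;
}

/*
 * will_free_mem: result of the task_will_free_mem() check.
 * reap_ok:       result of __oom_reap_task_mm(); false means part of the
 *                address space could not be reaped.
 */
static int model_reap_mm(struct fake_mm *mm, bool will_free_mem, bool reap_ok)
{
	/* Memory cannot be easily reclaimed: refuse, as the patch does. */
	if (!will_free_mem)
		return -EBUSY;

	/* Already an OOM victim: the kernel's OOM reaper owns this mm. */
	if (mm->oom_victim)
		return 0;

	/* Claim the mm; if the flag was already set, someone else is reaping. */
	if (atomic_exchange(&mm->oom_skip, true))
		return 0;

	if (!reap_ok) {
		/* Release the claim so a later call can retry. */
		atomic_store(&mm->oom_skip, false);
		return -EAGAIN;
	}
	return 0;
}
```

As in the patch, a caller that finds the mm already claimed, or already chosen as an OOM victim, gets 0 rather than an error, since that memory is already being reclaimed by someone else.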