From patchwork Wed Feb 8 09:49:05 2023
X-Patchwork-Submitter: liuq <liuq131@chinatelecom.cn>
X-Patchwork-Id: 13132825
From: liuq <liuq131@chinatelecom.cn>
To: akpm@linux-foundation.org
Cc: agruenba@redhat.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	huyd12@chinatelecom.cn, liuq <liuq131@chinatelecom.cn>
Subject: [PATCH] pid: add handling of too many zombie processes
Date: Wed, 8 Feb 2023 17:49:05 +0800
Message-Id: <20230208094905.373-1-liuq131@chinatelecom.cn>
X-Mailer: git-send-email 2.27.0

There is a common situation where a parent process forks many child
processes to execute tasks but never calls wait/waitpid when the
children exit, so a large number of child processes become zombies.
If the number of processes in the system then reaches kernel.pid_max,
every new fork syscall fails and the system can no longer execute any
command (unless an old process exits), e.g.:

[root@lq-workstation ~]# ls
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: Resource temporarily unavailable
[root@lq-workstation ~]# reboot
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: Resource temporarily unavailable

Handle this situation in alloc_pid(): when pid allocation fails with
-ENOSPC, find the process that has the most zombie children, and if it
has more than 10 of them (or some other reasonable threshold?), kill
that process to release pid resources.

Signed-off-by: liuq <liuq131@chinatelecom.cn>
---
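Note: a minimal userspace sketch of the failure mode described above,
for illustration only (it is not part of the patch and the program is
hypothetical); it forks children that exit immediately and never reaps
them, until fork() starts failing as in the shell output above:

/*
 * Hypothetical reproducer: accumulate zombie children until pid
 * allocation fails. Illustration only, not part of this patch.
 */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
	for (;;) {
		pid_t pid = fork();

		if (pid == 0)		/* child: exit at once, becomes a zombie */
			_exit(0);
		if (pid < 0) {		/* pid space exhausted: fork() now fails */
			perror("fork");
			sleep(1);	/* keep running, keep holding the zombies */
		}
		/* parent intentionally never calls wait()/waitpid() */
	}
}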
 include/linux/mm.h |  2 ++
 kernel/pid.c       |  6 +++-
 mm/oom_kill.c      | 70 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8f857163ac89..afcff08a3878 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1940,6 +1940,8 @@ static inline void clear_page_pfmemalloc(struct page *page)
  * Can be called by the pagefault handler when it gets a VM_FAULT_OOM.
  */
 extern void pagefault_out_of_memory(void);
+extern void pid_max_oom_check(struct pid_namespace *ns);
+
 #define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)
 #define offset_in_thp(page, p)	((unsigned long)(p) & (thp_size(page) - 1))

diff --git a/kernel/pid.c b/kernel/pid.c
index 3fbc5e46b721..1a9a60e19ab6 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -237,7 +237,11 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 		idr_preload_end();
 
 		if (nr < 0) {
-			retval = (nr == -ENOSPC) ? -EAGAIN : nr;
+			retval = nr;
+			if (nr == -ENOSPC) {
+				retval = -EAGAIN;
+				pid_max_oom_check(tmp);
+			}
 			goto out_free;
 		}

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1276e49b31b0..18d05d706f48 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1260,3 +1260,73 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 	return -ENOSYS;
 #endif /* CONFIG_MMU */
 }
+
+static void oom_pid_evaluate_task(struct task_struct *p,
+		struct task_struct **max_zombie_task, int *max_zombie_num)
+{
+	struct task_struct *child;
+	int zombie_num = 0;
+
+	list_for_each_entry(child, &p->children, sibling) {
+		if (child->exit_state == EXIT_ZOMBIE)
+			zombie_num++;
+	}
+	if (zombie_num > *max_zombie_num) {
+		*max_zombie_num = zombie_num;
+		*max_zombie_task = p;
+	}
+}
+#define MAX_ZOMBIE_NUM 10
+struct task_struct *pid_max_bad_process(struct pid_namespace *ns)
+{
+	int max_zombie_num = 0;
+	struct task_struct *max_zombie_task = &init_task;
+	struct task_struct *p;
+
+	rcu_read_lock();
+	for_each_process(p)
+		oom_pid_evaluate_task(p, &max_zombie_task, &max_zombie_num);
+	rcu_read_unlock();
+
+	if (max_zombie_num > MAX_ZOMBIE_NUM) {
+		pr_info("process %d has %d zombie child\n",
+			task_pid_nr_ns(max_zombie_task, ns), max_zombie_num);
+		return max_zombie_task;
+	}
+
+	return NULL;
+}
+
+void pid_max_oom_kill_process(struct task_struct *task)
+{
+	struct oom_control oc = {
+		.zonelist = NULL,
+		.nodemask = NULL,
+		.memcg = NULL,
+		.gfp_mask = 0,
+		.order = 0,
+	};
+
+	get_task_struct(task);
+	oc.chosen = task;
+
+	if (mem_cgroup_oom_synchronize(true))
+		return;
+
+	if (!mutex_trylock(&oom_lock))
+		return;
+
+	oom_kill_process(&oc, "Out of pid max(oom_kill_allocating_task)");
+	mutex_unlock(&oom_lock);
+}
+
+void pid_max_oom_check(struct pid_namespace *ns)
+{
+	struct task_struct *p;
+
+	p = pid_max_bad_process(ns);
+	if (p) {
+		pr_info("oom_kill process %d\n", task_pid_nr_ns(p, ns));
+		pid_max_oom_kill_process(p);
+	}
+}