From patchwork Thu Aug 10 08:13:15 2023
X-Patchwork-Submitter: Chuyi Zhou
X-Patchwork-Id: 13348954
X-Patchwork-State: RFC
X-Mailing-List: bpf@vger.kernel.org
From: Chuyi Zhou
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 muchun.song@linux.dev
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
 wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou,
 Michal Hocko
Subject: [RFC PATCH v2 1/5] mm, oom: Introduce bpf_oom_evaluate_task
Date: Thu, 10 Aug 2023 16:13:15 +0800
Message-Id: <20230810081319.65668-2-zhouchuyi@bytedance.com>
In-Reply-To: <20230810081319.65668-1-zhouchuyi@bytedance.com>
References: <20230810081319.65668-1-zhouchuyi@bytedance.com>

This patch adds a new hook bpf_oom_evaluate_task in oom_evaluate_task.
It takes oc and the task currently being iterated as parameters and
returns a result indicating which one should be selected. We can use it
to bypass the current logic of oom_evaluate_task and implement customized
OOM policies in the attached BPF programs.

Suggested-by: Michal Hocko
Signed-off-by: Chuyi Zhou
---
 mm/oom_kill.c | 59 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 50 insertions(+), 9 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 612b5597d3af..255c9ef1d808 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -18,6 +18,7 @@
  * kernel subsystems and hints as to where to find out what things do.
  */
 
+#include
 #include
 #include
 #include
@@ -305,6 +306,27 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc)
 	return CONSTRAINT_NONE;
 }
 
+enum {
+	NO_BPF_POLICY,
+	BPF_EVAL_ABORT,
+	BPF_EVAL_NEXT,
+	BPF_EVAL_SELECT,
+};
+
+__weak noinline int bpf_oom_evaluate_task(struct task_struct *task, struct oom_control *oc)
+{
+	return NO_BPF_POLICY;
+}
+
+BTF_SET8_START(oom_bpf_fmodret_ids)
+BTF_ID_FLAGS(func, bpf_oom_evaluate_task)
+BTF_SET8_END(oom_bpf_fmodret_ids)
+
+static const struct btf_kfunc_id_set oom_bpf_fmodret_set = {
+	.owner = THIS_MODULE,
+	.set = &oom_bpf_fmodret_ids,
+};
+
 static int oom_evaluate_task(struct task_struct *task, void *arg)
 {
 	struct oom_control *oc = arg;
@@ -317,6 +339,26 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 	if (!is_memcg_oom(oc) && !oom_cpuset_eligible(task, oc))
 		goto next;
 
+	/*
+	 * If task is allocating a lot of memory and has been marked to be
+	 * killed first if it triggers an oom, then select it.
+	 */
+	if (oom_task_origin(task)) {
+		points = LONG_MAX;
+		goto select;
+	}
+
+	switch (bpf_oom_evaluate_task(task, oc)) {
+	case BPF_EVAL_ABORT:
+		goto abort; /* abort search process */
+	case BPF_EVAL_NEXT:
+		goto next; /* ignore the task */
+	case BPF_EVAL_SELECT:
+		goto select; /* select the task */
+	default:
+		break; /* No BPF policy */
+	}
+
 	/*
 	 * This task already has access to memory reserves and is being killed.
 	 * Don't allow any other task to have access to the reserves unless
@@ -329,15 +371,6 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 		goto abort;
 	}
 
-	/*
-	 * If task is allocating a lot of memory and has been marked to be
-	 * killed first if it triggers an oom, then select it.
-	 */
-	if (oom_task_origin(task)) {
-		points = LONG_MAX;
-		goto select;
-	}
-
 	points = oom_badness(task, oc->totalpages);
 	if (points == LONG_MIN || points < oc->chosen_points)
 		goto next;
@@ -732,10 +765,18 @@ static struct ctl_table vm_oom_kill_table[] = {
 
 static int __init oom_init(void)
 {
+	int err;
 	oom_reaper_th = kthread_run(oom_reaper, NULL, "oom_reaper");
 #ifdef CONFIG_SYSCTL
 	register_sysctl_init("vm", vm_oom_kill_table);
 #endif
+
+#ifdef CONFIG_BPF_SYSCALL
+	err = register_btf_fmodret_id_set(&oom_bpf_fmodret_set);
+	if (err)
+		pr_warn("error while registering oom fmodret entrypoints: %d", err);
+#endif
+
 	return 0;
 }
 subsys_initcall(oom_init)

From patchwork Thu Aug 10 08:13:16 2023
X-Patchwork-Submitter: Chuyi Zhou
X-Patchwork-Id: 13348955
X-Patchwork-State: RFC
X-Mailing-List: bpf@vger.kernel.org
From: Chuyi Zhou
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 muchun.song@linux.dev
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
 wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou
Subject: [RFC PATCH v2 2/5] mm: Add policy_name to identify OOM policies
Date: Thu, 10 Aug 2023 16:13:16 +0800
Message-Id: <20230810081319.65668-3-zhouchuyi@bytedance.com>
In-Reply-To: <20230810081319.65668-1-zhouchuyi@bytedance.com>
References: <20230810081319.65668-1-zhouchuyi@bytedance.com>

This patch adds a new metadata field, policy_name, to oom_control and
reports it in dump_header(), so we can tell which selection policy was
used. In a BPF program, we can call the kfunc set_oom_policy_name to set
the current user-defined policy name. The in-kernel policy_name is
"default".
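
The copy semantics described above can be sketched in userspace as follows. This is only an illustration, not the kernel code: `struct policy_info` and `policy_name_copy` are hypothetical stand-ins for `struct oom_control` and `set_oom_policy_name`. The sketch assumes the same 16-byte `POLICY_NAME_LEN` and, unlike the function as posted (which copies up to `POLICY_NAME_LEN` bytes and so can fill the buffer without a terminating NUL), it reserves the last byte so the name can always be printed with `%s`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POLICY_NAME_LEN 16

/* Hypothetical stand-in for the policy_name field of struct oom_control. */
struct policy_info {
	char policy_name[POLICY_NAME_LEN];
};

/*
 * Copy at most POLICY_NAME_LEN - 1 bytes of src into the name buffer.
 * The buffer is zeroed first, so the result is always NUL-terminated.
 * Reserving the last byte is an assumption of this sketch.
 */
static void policy_name_copy(struct policy_info *pi, const char *src, size_t sz)
{
	assert(pi != NULL && src != NULL);

	memset(pi->policy_name, 0, sizeof(pi->policy_name));

	if (sz > POLICY_NAME_LEN - 1)
		sz = POLICY_NAME_LEN - 1;

	memcpy(pi->policy_name, src, sz);
}
```

Calling `policy_name_copy(&pi, "default", sizeof("default"))` stores the full name, while a longer name is silently truncated to 15 characters.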

Signed-off-by: Chuyi Zhou
---
 include/linux/oom.h |  7 +++++++
 mm/oom_kill.c       | 42 +++++++++++++++++++++++++++++++++++++++---
 2 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 7d0c9c48a0c5..69d0f2ec6ea6 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -22,6 +22,10 @@ enum oom_constraint {
 	CONSTRAINT_MEMCG,
 };
 
+enum {
+	POLICY_NAME_LEN = 16,
+};
+
 /*
  * Details of the page allocation that triggered the oom killer that are used to
  * determine what should be killed.
@@ -52,6 +56,9 @@ struct oom_control {
 
 	/* Used to print the constraint info. */
 	enum oom_constraint constraint;
+
+	/* Used to report the policy info. */
+	char policy_name[POLICY_NAME_LEN];
 };
 
 extern struct mutex oom_lock;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 255c9ef1d808..3239dcdba4d7 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -443,6 +443,35 @@ static int dump_task(struct task_struct *p, void *arg)
 	return 0;
 }
 
+__bpf_kfunc void set_oom_policy_name(struct oom_control *oc, const char *src, size_t sz)
+{
+	memset(oc->policy_name, 0, sizeof(oc->policy_name));
+
+	if (sz > POLICY_NAME_LEN)
+		sz = POLICY_NAME_LEN;
+
+	memcpy(oc->policy_name, src, sz);
+}
+
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "kfuncs which will be used in BPF programs");
+
+__weak noinline void bpf_set_policy_name(struct oom_control *oc)
+{
+}
+
+__diag_pop();
+
+BTF_SET8_START(bpf_oom_policy_kfunc_ids)
+BTF_ID_FLAGS(func, set_oom_policy_name)
+BTF_SET8_END(bpf_oom_policy_kfunc_ids)
+
+static const struct btf_kfunc_id_set bpf_oom_policy_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set = &bpf_oom_policy_kfunc_ids,
+};
+
 /**
  * dump_tasks - dump current memory state of all system tasks
  * @oc: pointer to struct oom_control
@@ -484,8 +513,8 @@ static void dump_oom_summary(struct oom_control *oc, struct task_struct *victim)
 
 static void dump_header(struct oom_control *oc, struct task_struct *p)
 {
-	pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), order=%d, oom_score_adj=%hd\n",
-		current->comm, oc->gfp_mask, &oc->gfp_mask, oc->order,
+	pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), order=%d, policy_name=%s, oom_score_adj=%hd\n",
+		current->comm, oc->gfp_mask, &oc->gfp_mask, oc->order, oc->policy_name,
 			current->signal->oom_score_adj);
 	if (!IS_ENABLED(CONFIG_COMPACTION) && oc->order)
 		pr_warn("COMPACTION is disabled!!!\n");
@@ -775,8 +804,11 @@ static int __init oom_init(void)
 	err = register_btf_fmodret_id_set(&oom_bpf_fmodret_set);
 	if (err)
 		pr_warn("error while registering oom fmodret entrypoints: %d", err);
+	err = register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
+					&bpf_oom_policy_kfunc_set);
+	if (err)
+		pr_warn("error while registering oom kfunc entrypoints: %d", err);
 #endif
-
 	return 0;
 }
 subsys_initcall(oom_init)
@@ -1196,6 +1228,10 @@ bool out_of_memory(struct oom_control *oc)
 		return true;
 	}
 
+	set_oom_policy_name(oc, "default", sizeof("default"));
+
+	bpf_set_policy_name(oc);
+
 	select_bad_process(oc);
 	/* Found nothing?!?! */
 	if (!oc->chosen) {

From patchwork Thu Aug 10 08:13:17 2023
X-Patchwork-Submitter: Chuyi Zhou
X-Patchwork-Id: 13348956
X-Patchwork-State: RFC
X-Mailing-List: bpf@vger.kernel.org
From: Chuyi Zhou
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 muchun.song@linux.dev
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
 wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou,
 Alan Maguire
Subject: [RFC PATCH v2 3/5] mm: Add a tracepoint when OOM victim selection is failed
Date: Thu, 10 Aug 2023 16:13:17 +0800
Message-Id: <20230810081319.65668-4-zhouchuyi@bytedance.com>
In-Reply-To: <20230810081319.65668-1-zhouchuyi@bytedance.com>
References: <20230810081319.65668-1-zhouchuyi@bytedance.com>

This patch adds a tracepoint to mark the scenario where nothing was
chosen for the OOM
killer. This would allow BPF programs to catch the fact that the BPF OOM policy didn't work well. Suggested-by: Alan Maguire Signed-off-by: Chuyi Zhou --- include/trace/events/oom.h | 18 ++++++++++++++++++ mm/oom_kill.c | 1 + 2 files changed, 19 insertions(+) diff --git a/include/trace/events/oom.h b/include/trace/events/oom.h index 26a11e4a2c36..b6ae1134229c 100644 --- a/include/trace/events/oom.h +++ b/include/trace/events/oom.h @@ -6,6 +6,7 @@ #define _TRACE_OOM_H #include #include +#include TRACE_EVENT(oom_score_adj_update, @@ -151,6 +152,23 @@ TRACE_EVENT(skip_task_reaping, TP_printk("pid=%d", __entry->pid) ); +TRACE_EVENT(select_bad_process_end, + + TP_PROTO(struct oom_control *oc), + + TP_ARGS(oc), + + TP_STRUCT__entry( + __array(char, policy_name, POLICY_NAME_LEN) + ), + + TP_fast_assign( + memcpy(__entry->policy_name, oc->policy_name, POLICY_NAME_LEN); + ), + + TP_printk("policy_name=%s", __entry->policy_name) +); + #ifdef CONFIG_COMPACTION TRACE_EVENT(compact_retry, diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 3239dcdba4d7..af40a1b750fa 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -1235,6 +1235,7 @@ bool out_of_memory(struct oom_control *oc) select_bad_process(oc); /* Found nothing?!?! 
*/ if (!oc->chosen) { + trace_select_bad_process_end(oc); dump_header(oc, NULL); pr_warn("Out of memory and no killable processes...\n"); /* From patchwork Thu Aug 10 08:13:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuyi Zhou X-Patchwork-Id: 13348957 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A25B71ADE8 for ; Thu, 10 Aug 2023 08:13:47 +0000 (UTC) Received: from mail-pl1-x62d.google.com (mail-pl1-x62d.google.com [IPv6:2607:f8b0:4864:20::62d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C422DE7E for ; Thu, 10 Aug 2023 01:13:45 -0700 (PDT) Received: by mail-pl1-x62d.google.com with SMTP id d9443c01a7336-1bd9b4f8e0eso3451635ad.1 for ; Thu, 10 Aug 2023 01:13:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691655225; x=1692260025; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=LEsujYzeHDJksabxHXYH7r+I+WQ8F1nGq024d8cm1KU=; b=VuTFtoYEBUXlmZ/xn9ftnduPgMunFsfNJN8FNbD7NPo+XLvQDZC82CdMZileZgf7Ju J6N3h2CD+mAIE6hKRtUvuxSpadHUagPDYdUrTa4ay0ncTKS2MyjfNBe6eVPT14fr1HD5 kq6marMYBOiAeqDY6CTWVjuC3XM5YtbqNlCNH/aIP8g8SlkuknQocQyNqJtsiB6YzuhB uIJNNN8uyjXlvs3eP+8s4hLCB1yjUKioQZcsW+p7dsPom0psuNhy8Tf73mLptX1KoVFy 9AQav14P/kk92N8V/4fpdXgdt5oHI3z4UP1dFui92Em4UReElSPkLCYpVCWza+wM+d8m Fu1Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691655225; x=1692260025; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LEsujYzeHDJksabxHXYH7r+I+WQ8F1nGq024d8cm1KU=; 
b=hjJR9IMUqJOM+BZor4VqR21E/FhFtxOIEc7sh21bZc7oV1ieWxk3NXGILTRmlym60/ ypuUk8X1UD9cE+FtBrtQdMheOYoW8MdPTByz/GoTnbsPz5GzpZNjZBAXaEWa9TSmi5Z0 QL2VtiMBZYWiqaNyfIYaILN21Oj53w1N5NP+SKbkw2/4qxuRrjMGayc2aCRTs/HcyqkB MB8DV/P4Khvg3LbdGUPE4FyYzO6cMTsWjNq3P7pSwO6loIh3ZyYbYnzZ4ZlweEOCsn33 JZM3luCunAh8Jze4kWFQb3X6PQSr8xEThBB7QjXfYqrGKQ/OtRYoafw5kwiAqVDYWO84 SqSA== X-Gm-Message-State: AOJu0YwLdMYNAc0DV/qR7yg7FG2fKu/zCT8C8cLtYuM+vC0XFHNN4kyI jHGsoZbi2MlLoZ2j3HPn1I2HUgsPc/MsyrQ6BnY= X-Google-Smtp-Source: AGHT+IGBzcA4i2z/gDQv/UTkeNvJUtjNcqDbRAxVuixYlcfF4sIbOkxh+w7RKS0ZxncRcZaqWF8dTg== X-Received: by 2002:a17:903:11c8:b0:1b6:4bbd:c3a7 with SMTP id q8-20020a17090311c800b001b64bbdc3a7mr1431227plh.66.1691655225352; Thu, 10 Aug 2023 01:13:45 -0700 (PDT) Received: from n37-019-243.byted.org ([180.184.51.40]) by smtp.gmail.com with ESMTPSA id x12-20020a170902ec8c00b001b1a2c14a4asm1019036plg.38.2023.08.10.01.13.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Aug 2023 01:13:45 -0700 (PDT) From: Chuyi Zhou To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, muchun.song@linux.dev Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou Subject: [RFC PATCH v2 4/5] bpf: Add a OOM policy test Date: Thu, 10 Aug 2023 16:13:18 +0800 Message-Id: <20230810081319.65668-5-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20230810081319.65668-1-zhouchuyi@bytedance.com> References: <20230810081319.65668-1-zhouchuyi@bytedance.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net 
X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC This patch adds a test which implements a priority-based policy through bpf_oom_evaluate_task. The BPF program, oom_policy.c, compares the cgroup priority of two tasks and select the lower one. The userspace program test_oom_policy.c maintains a priority map by using cgroup id as the keys and priority as the values. We could protect certain cgroups from oom-killer by setting higher priority. Signed-off-by: Chuyi Zhou --- .../bpf/prog_tests/test_oom_policy.c | 140 ++++++++++++++++++ .../testing/selftests/bpf/progs/oom_policy.c | 104 +++++++++++++ 2 files changed, 244 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/test_oom_policy.c create mode 100644 tools/testing/selftests/bpf/progs/oom_policy.c diff --git a/tools/testing/selftests/bpf/prog_tests/test_oom_policy.c b/tools/testing/selftests/bpf/prog_tests/test_oom_policy.c new file mode 100644 index 000000000000..bea61ff22603 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/test_oom_policy.c @@ -0,0 +1,140 @@ +// SPDX-License-Identifier: GPL-2.0-only +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "cgroup_helpers.h" +#include "oom_policy.skel.h" + +static int map_fd; +static int cg_nr; +struct { + const char *path; + int fd; + unsigned long long id; +} cgs[] = { + { "/cg1" }, + { "/cg2" }, +}; + + +static struct oom_policy *open_load_oom_policy_skel(void) +{ + struct oom_policy *skel; + int err; + + skel = oom_policy__open(); + if (!ASSERT_OK_PTR(skel, "skel_open")) + return NULL; + + err = oom_policy__load(skel); + if (!ASSERT_OK(err, "skel_load")) + goto cleanup; + + return skel; + +cleanup: + oom_policy__destroy(skel); + return NULL; +} + +static void run_memory_consume(unsigned long long consume_size, int idx) +{ + char *buf; + + join_parent_cgroup(cgs[idx].path); + buf = malloc(consume_size); + memset(buf, 0, consume_size); + sleep(2); + 
exit(0); +} + +static int set_cgroup_prio(unsigned long long cg_id, int prio) +{ + int err; + + err = bpf_map_update_elem(map_fd, &cg_id, &prio, BPF_ANY); + ASSERT_EQ(err, 0, "update_map"); + return err; +} + +static int prepare_cgroup_environment(void) +{ + int err; + + err = setup_cgroup_environment(); + if (err) + goto clean_cg_env; + for (int i = 0; i < cg_nr; i++) { + err = cgs[i].fd = create_and_get_cgroup(cgs[i].path); + if (!ASSERT_GE(cgs[i].fd, 0, "cg_create")) + goto clean_cg_env; + cgs[i].id = get_cgroup_id(cgs[i].path); + } + return 0; +clean_cg_env: + cleanup_cgroup_environment(); + return err; +} + +void test_oom_policy(void) +{ + struct oom_policy *skel; + struct bpf_link *link; + int err; + int victim_pid; + unsigned long long victim_cg_id; + + link = NULL; + cg_nr = ARRAY_SIZE(cgs); + + skel = open_load_oom_policy_skel(); + err = oom_policy__attach(skel); + if (!ASSERT_OK(err, "oom_policy__attach")) + goto cleanup; + + map_fd = bpf_object__find_map_fd_by_name(skel->obj, "cg_map"); + if (!ASSERT_GE(map_fd, 0, "find map")) + goto cleanup; + + err = prepare_cgroup_environment(); + if (!ASSERT_EQ(err, 0, "prepare cgroup env")) + goto cleanup; + + write_cgroup_file("/", "memory.max", "10M"); + + /* + * Set higher priority to cg2 and lower to cg1, so we would select + * task under cg1 as victim.(see oom_policy.c) + */ + set_cgroup_prio(cgs[0].id, 10); + set_cgroup_prio(cgs[1].id, 50); + + victim_cg_id = cgs[0].id; + victim_pid = fork(); + + if (victim_pid == 0) + run_memory_consume(1024 * 1024 * 4, 0); + + if (fork() == 0) + run_memory_consume(1024 * 1024 * 8, 1); + + while (wait(NULL) > 0) + ; + + ASSERT_EQ(skel->bss->victim_pid, victim_pid, "victim_pid"); + ASSERT_EQ(skel->bss->victim_cg_id, victim_cg_id, "victim_cgid"); + ASSERT_EQ(skel->bss->failed_cnt, 1, "failed_cnt"); +cleanup: + bpf_link__destroy(link); + oom_policy__destroy(skel); + cleanup_cgroup_environment(); +} diff --git a/tools/testing/selftests/bpf/progs/oom_policy.c 
b/tools/testing/selftests/bpf/progs/oom_policy.c new file mode 100644 index 000000000000..fc9efc93914e --- /dev/null +++ b/tools/testing/selftests/bpf/progs/oom_policy.c @@ -0,0 +1,104 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include + +char _license[] SEC("license") = "GPL"; + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __type(key, int); + __type(value, int); + __uint(max_entries, 24); +} cg_map SEC(".maps"); + +unsigned int victim_pid; +u64 victim_cg_id; +int failed_cnt; + +#define EOPNOTSUPP 95 + +enum { + NO_BPF_POLICY, + BPF_EVAL_ABORT, + BPF_EVAL_NEXT, + BPF_EVAL_SELECT, +}; + +extern void set_oom_policy_name(struct oom_control *oc, const char *buf, size_t sz) __ksym; + +static __always_inline u64 task_cgroup_id(struct task_struct *task) +{ + struct kernfs_node *node; + struct task_group *tg; + + if (!task) + return 0; + + tg = task->sched_task_group; + node = tg->css.cgroup->kn; + + return node->id; +} + +SEC("fentry/oom_kill_process") +int BPF_PROG(oom_kill_process_k, struct oom_control *oc, const char *message) +{ + struct task_struct *victim = oc->chosen; + + if (victim) { + victim_cg_id = task_cgroup_id(victim); + victim_pid = victim->pid; + } + + return 0; +} + +SEC("fentry/bpf_set_policy_name") +int BPF_PROG(set_police_name_k, struct oom_control *oc) +{ + char name[] = "cg_prio"; + set_oom_policy_name(oc, name, sizeof(name)); + return 0; +} + +SEC("tp_btf/select_bad_process_end") +int BPF_PROG(record_failed, struct oom_control *oc) +{ + failed_cnt += 1; + return 0; +} + +SEC("fmod_ret/bpf_oom_evaluate_task") +int BPF_PROG(bpf_oom_evaluate_task, struct task_struct *task, struct oom_control *oc) +{ + int chosen_cg_prio, task_cg_prio; + u64 chosen_cg_id, task_cg_id; + struct task_struct *chosen; + int *val; + + if (!failed_cnt) + return BPF_EVAL_NEXT; + + chosen = oc->chosen; + if (!chosen) + return BPF_EVAL_SELECT; + + chosen_cg_id = task_cgroup_id(chosen); + task_cg_id = task_cgroup_id(task); + chosen_cg_prio = task_cg_prio 
= 0; + val = bpf_map_lookup_elem(&cg_map, &chosen_cg_id); + if (val) + chosen_cg_prio = *val; + val = bpf_map_lookup_elem(&cg_map, &task_cg_id); + if (val) + task_cg_prio = *val; + + if (chosen_cg_prio > task_cg_prio) + return BPF_EVAL_SELECT; + if (chosen_cg_prio < task_cg_prio) + return BPF_EVAL_NEXT; + + return NO_BPF_POLICY; +} + From patchwork Thu Aug 10 08:13:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuyi Zhou X-Patchwork-Id: 13348958 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 689EB1ADE8 for ; Thu, 10 Aug 2023 08:13:52 +0000 (UTC) Received: from mail-pg1-x535.google.com (mail-pg1-x535.google.com [IPv6:2607:f8b0:4864:20::535]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5EA28E56 for ; Thu, 10 Aug 2023 01:13:51 -0700 (PDT) Received: by mail-pg1-x535.google.com with SMTP id 41be03b00d2f7-55b0e7efb1cso442840a12.1 for ; Thu, 10 Aug 2023 01:13:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691655231; x=1692260031; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=YS6ue8t2dd5iDW6jb431VKpPx5tAdDiMNn4sU9SZ4eM=; b=KNJbzwTxxK3KzgAfGpr2FeHEj85XtDGujL07/8Dd3z3V3jSF8Rh+LYfOCIp5iXK0w7 YcNv7ogCQvXKM3pEP24qCmrjpg/JDP9h1bPZJUTufYhvRvW+AfvjaCJjSAIiDVPy4YlB FvMnZgpH//K7t4jlj1KtZkT8Tjr1dVTBmA31ymCxgtTRo28F/xNhPrvKRxrHZK39ygQZ TPSiOnZgnkDk1zDo7yeScGo2f+KIT1WWFlda15vKkcDADn6i8rF0kiPkSV8N9pjikKK2 NsnxwPD8zv745JOrycqHn98sjc4x3oYJA0Stmu2pcb6Xa6lTRk6/qVJaIsczS2Rfi1mM 78hA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691655231; x=1692260031; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=YS6ue8t2dd5iDW6jb431VKpPx5tAdDiMNn4sU9SZ4eM=; b=h+YwW7kWbilbrQ6P6L+D0nXgVcuc1PDIiQfGWnbp+K9t0fjWCvbw3LeMEc+uEqRl77 wdjALWw/w4SFrKDvvLhZGnNRN171lPaTwpZEwoM4UQyee+l8xEB2vMzt39fzhJ/0D4IP Ni+IJ21pLXkowZ1fZerfeZ4CvxW6gHK9dSTx8brgG6MH7w0PqBnYBWupAfj/VB5y4P9i ET3tEPNXR0XdrWJaOMnAlUSm1Gh09LpdI1x/HeVb2ouQD32wAQVbujtEp+it0X5obkbv Q+TVskNYNJBzYXCnFf7w0DYO8dJDQAtUKMQjNh1AHo8cwclXVTRs2x7SkUyQMj+hxggk wlKg== X-Gm-Message-State: AOJu0YzCEopx0HRbTt5jwgwAghkvriSOODWVFxJLUzRejHQoQaVT8PpB Rp1ygWEcVAl33gedDlGx/Dy3XA== X-Google-Smtp-Source: AGHT+IHuFmD3Cyae7WIdtzmgxlWYjNuOj281DztPhX4TApif8F0U2Be9/d7VjjGRzVC64hKOa/1OAA== X-Received: by 2002:a17:902:e548:b0:1ac:63ac:10a7 with SMTP id n8-20020a170902e54800b001ac63ac10a7mr1519133plf.68.1691655230885; Thu, 10 Aug 2023 01:13:50 -0700 (PDT) Received: from n37-019-243.byted.org ([180.184.51.40]) by smtp.gmail.com with ESMTPSA id x12-20020a170902ec8c00b001b1a2c14a4asm1019036plg.38.2023.08.10.01.13.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Aug 2023 01:13:50 -0700 (PDT) From: Chuyi Zhou To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, muchun.song@linux.dev Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou Subject: [RFC PATCH v2 5/5] bpf: Add a BPF OOM policy Doc Date: Thu, 10 Aug 2023 16:13:19 +0800 Message-Id: <20230810081319.65668-6-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20230810081319.65668-1-zhouchuyi@bytedance.com> References: <20230810081319.65668-1-zhouchuyi@bytedance.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, 
        DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED,
        SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
X-Patchwork-Delegate: bpf@iogearbox.net
X-Patchwork-State: RFC

This patch adds a new doc Documentation/bpf/oom.rst to describe how BPF
OOM policy is supposed to work.

Signed-off-by: Chuyi Zhou
---
 Documentation/bpf/oom.rst | 70 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100644 Documentation/bpf/oom.rst

diff --git a/Documentation/bpf/oom.rst b/Documentation/bpf/oom.rst
new file mode 100644
index 000000000000..9bad1fd30d4a
--- /dev/null
+++ b/Documentation/bpf/oom.rst
@@ -0,0 +1,70 @@
+==============
+BPF OOM Policy
+==============
+
+The Out Of Memory Killer (aka OOM Killer) is invoked when the system is
+critically low on memory. The in-kernel implementation iterates over all
+tasks in the specific OOM domain (all tasks for a global OOM, and all members
+of the memcg tree for a hard-limit OOM) and selects a victim to kill based on
+some heuristic policy.
+
+Specifically:
+
+1. Begin iterating over tasks using ``oom_evaluate_task()``. When a valid
+   (killable) victim is found in iteration N, select it.
+
+2. In iterations N + 1, N + 2, ..., compare the current task with the
+   previously selected task; if the current task is more suitable, select it.
+
+3. Finally, we get a victim to kill.
+
+However, this does not meet the needs of users in some special scenarios. Using
+eBPF, we can implement customized OOM policies to meet those needs.
+
+Developer API:
+==================
+
+bpf_oom_evaluate_task
+----------------------
+
+``bpf_oom_evaluate_task`` is a new interface hooking into ``oom_evaluate_task()``
+which can be used to bypass the in-kernel selection logic. Users can customize
+their victim selection policy through BPF programs attached to it.
+::
+
+    int bpf_oom_evaluate_task(struct task_struct *task,
+                              struct oom_control *oc);
+
+Return value::
+
+    NO_BPF_POLICY      no BPF policy; fall back to the in-kernel selection
+    BPF_EVAL_ABORT     abort the selection (exit from the current selection loop)
+    BPF_EVAL_NEXT      ignore the task
+    BPF_EVAL_SELECT    select the current task
+
+Suppose we want to select a victim by a specified pid when OOM is invoked.
+We can use the following BPF program::
+
+    SEC("fmod_ret/bpf_oom_evaluate_task")
+    int BPF_PROG(bpf_oom_evaluate_task, struct task_struct *task, struct oom_control *oc)
+    {
+        if (task->pid == target_pid)
+            return BPF_EVAL_SELECT;
+        return BPF_EVAL_NEXT;
+    }
+
+bpf_set_policy_name
+---------------------
+
+``bpf_set_policy_name`` is an interface called before the start of victim selection. We can
+set the policy's name in the attached program, so that dump_header() can identify different
+policies when reporting messages. The name is set through the kfunc ``set_oom_policy_name``
+::
+
+    SEC("fentry/bpf_set_policy_name")
+    int BPF_PROG(set_police_name_k, struct oom_control *oc)
+    {
+        char name[] = "my_policy";
+        set_oom_policy_name(oc, name, sizeof(name));
+        return 0;
+    }
\ No newline at end of file
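
Note for reviewers: to make the interaction between the four return values
and the selection loop easier to follow, here is a minimal userspace sketch
in plain C. It is not kernel code and not part of this patch. `struct task`,
its `badness` field, `policy_fn`, `pid_policy()`, `no_policy()` and
`select_victim()` are hypothetical stand-ins, the numeric enum values are
assumed, and returning NULL on abort is a simplification. The fallback
comparison only approximates the in-kernel heuristic.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes named as in the doc; the numeric values are assumed. */
enum bpf_eval { NO_BPF_POLICY, BPF_EVAL_ABORT, BPF_EVAL_NEXT, BPF_EVAL_SELECT };

/* Hypothetical stand-in for struct task_struct. */
struct task {
	int pid;
	long badness;	/* stand-in for the in-kernel heuristic score */
};

/* Hypothetical policy callback, playing the role of the attached BPF program. */
typedef enum bpf_eval (*policy_fn)(const struct task *task,
				   const struct task *chosen);

static int target_pid = 42;

/* Mirrors the pid-based example: pick target_pid, ignore every other task. */
static enum bpf_eval pid_policy(const struct task *task,
				const struct task *chosen)
{
	(void)chosen;
	return task->pid == target_pid ? BPF_EVAL_SELECT : BPF_EVAL_NEXT;
}

/* A policy that never decides, so the fallback comparison always runs. */
static enum bpf_eval no_policy(const struct task *task,
			       const struct task *chosen)
{
	(void)task;
	(void)chosen;
	return NO_BPF_POLICY;
}

/* Walk all tasks once; return the victim, or NULL if none was chosen. */
static const struct task *select_victim(const struct task *tasks, size_t n,
					policy_fn policy)
{
	const struct task *chosen = NULL;

	assert(tasks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct task *t = &tasks[i];

		switch (policy(t, chosen)) {
		case BPF_EVAL_ABORT:
			return NULL;	/* assumed: abort yields no victim */
		case BPF_EVAL_NEXT:
			continue;	/* ignore this task */
		case BPF_EVAL_SELECT:
			chosen = t;	/* policy overrides the heuristic */
			continue;
		case NO_BPF_POLICY:
		default:
			/* Fallback: keep the task with the highest score. */
			if (!chosen || t->badness > chosen->badness)
				chosen = t;
		}
	}
	return chosen;
}
```

Calling `select_victim(tasks, n, pid_policy)` picks the task whose pid equals
`target_pid` regardless of its score, while `select_victim(tasks, n, no_policy)`
falls through to the score comparison for every task.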