[RFC,v2,5/5] bpf: Add a BPF OOM policy Doc

Message ID	20230810081319.65668-6-zhouchuyi@bytedance.com (mailing list archive)
State	RFC
Delegated to:	BPF
Headers	Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 689EB1ADE8 for <bpf@vger.kernel.org>; Thu, 10 Aug 2023 08:13:52 +0000 (UTC) From: Chuyi Zhou <zhouchuyi@bytedance.com> To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, muchun.song@linux.dev Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou <zhouchuyi@bytedance.com> Subject: [RFC PATCH v2 5/5] bpf: Add a BPF OOM policy Doc Date: Thu, 10 Aug 2023 16:13:19 +0800 Message-Id: <20230810081319.65668-6-zhouchuyi@bytedance.com> In-Reply-To: <20230810081319.65668-1-zhouchuyi@bytedance.com> References: <20230810081319.65668-1-zhouchuyi@bytedance.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	mm: Select victim using bpf_oom_evaluate_task [RFC,v2,0/5] mm: Select victim using bpf_oom_evaluate_task [RFC,v2,1/5] mm, oom: Introduce bpf_oom_evaluate_task [RFC,v2,2/5] mm: Add policy_name to identify OOM policies [RFC,v2,3/5] mm: Add a tracepoint when OOM victim selection is failed [RFC,v2,4/5] bpf: Add a OOM policy test [RFC,v2,5/5] bpf: Add a BPF OOM policy Doc

Checks

Context	Check	Description
bpf/vmtest-bpf-next-PR	success	PR summary
bpf/vmtest-bpf-next-VM_Test-1	success	Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain_full }}
bpf/vmtest-bpf-next-VM_Test-2	success	Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3	fail	Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4	fail	Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-5	fail	Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6	fail	Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7	success	Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-8	success	Logs for veristat
netdev/tree_selection	success	Not a local patch, async

Commit Message

Chuyi Zhou Aug. 10, 2023, 8:13 a.m. UTC

Add a new document, Documentation/bpf/oom.rst, that describes how the
BPF OOM policy is supposed to work.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 Documentation/bpf/oom.rst | 70 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100644 Documentation/bpf/oom.rst

diff --git a/Documentation/bpf/oom.rst b/Documentation/bpf/oom.rst
new file mode 100644
index 000000000000..9bad1fd30d4a
--- /dev/null
+++ b/Documentation/bpf/oom.rst
@@ -0,0 +1,70 @@ 
+==============
+BPF OOM Policy
+==============
+
+The Out Of Memory Killer (aka OOM Killer) is invoked when the system is
+critically low on memory. The in-kernel implementation iterates over all
+tasks in the relevant OOM domain (all tasks for a global OOM, or all members
+of the memcg tree for a hard-limit OOM) and selects a victim to kill based
+on a heuristic policy.
+
+Specifically:
+
+1. Iterate over tasks using ``oom_evaluate_task()``. When a valid (killable)
+   task is found in iteration N, select it.
+
+2. In iterations N + 1, N + 2, ..., compare the current task with the
+   previously selected task; if the current task is more suitable, select it.
+
+3. Finally, the selected task is the victim to kill.
+
+However, this does not meet the needs of users in some special scenarios. Using
+eBPF, we can implement customized OOM policies to meet those needs.
+
+Developer API:
+==================
+
+bpf_oom_evaluate_task
+----------------------
+
+``bpf_oom_evaluate_task`` is a new interface hooked into ``oom_evaluate_task()``
+that can be used to bypass the in-kernel selection logic. Users can customize
+their victim selection policy through BPF programs attached to it.
+::
+
+    int bpf_oom_evaluate_task(struct task_struct *task,
+                                struct oom_control *oc);
+
+Return value::
+
+    NO_BPF_POLICY     no BPF policy is attached; fall back to the in-kernel selection
+    BPF_EVAL_ABORT    abort the selection (exit from the current selection loop)
+    BPF_EVAL_NEXT     ignore the task
+    BPF_EVAL_SELECT   select the current task
+
+To select a victim by a specified pid when the OOM killer is invoked, we can
+use the following BPF program::
+
+    SEC("fmod_ret/bpf_oom_evaluate_task")
+    int BPF_PROG(bpf_oom_evaluate_task, struct task_struct *task, struct oom_control *oc)
+    {
+        if (task->pid == target_pid)
+            return BPF_EVAL_SELECT;
+        return BPF_EVAL_NEXT;
+    }
+
+bpf_set_policy_name
+---------------------
+
+``bpf_set_policy_name`` is an interface hooked in before victim selection starts. The
+attached program can set the policy's name so that dump_header() can identify different
+policies when reporting messages. The name is set through the kfunc ``set_oom_policy_name``
+::
+
+    SEC("fentry/bpf_set_policy_name")
+    int BPF_PROG(set_policy_name_k, struct oom_control *oc)
+    {
+        char name[] = "my_policy";
+        set_oom_policy_name(oc, name, sizeof(name));
+        return 0;
+    }
\ No newline at end of file
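
As a rough illustration of the selection steps and return values described in the document above, the following userspace C sketch simulates how a policy callback could steer victim selection. It is not kernel code: `struct fake_task`, `pid_policy()` and `select_victim()` are hypothetical names, the enum is an assumed stand-in for the documented return codes, the "largest badness wins" fallback is only a placeholder for the in-kernel heuristic, and the handling of an abort (drop any selection) is an assumption.

```c
#include <stddef.h>

/* Assumed stand-ins for the return codes listed in the document. */
enum eval_ret { NO_BPF_POLICY, BPF_EVAL_ABORT, BPF_EVAL_NEXT, BPF_EVAL_SELECT };

/* Hypothetical, simplified task; not the kernel's struct task_struct. */
struct fake_task {
    int pid;
    long badness; /* placeholder score used only by the fallback path */
};

typedef enum eval_ret (*policy_fn)(const struct fake_task *task, void *ctx);

/* Mirrors the pid example: select one pid, ignore every other task. */
static enum eval_ret pid_policy(const struct fake_task *task, void *ctx)
{
    int target_pid = *(const int *)ctx;

    return task->pid == target_pid ? BPF_EVAL_SELECT : BPF_EVAL_NEXT;
}

/* Walk all tasks once and return the chosen victim, or NULL if there is none. */
static const struct fake_task *select_victim(const struct fake_task *tasks,
                                             size_t n, policy_fn policy,
                                             void *ctx)
{
    const struct fake_task *chosen = NULL;

    for (size_t i = 0; i < n; i++) {
        enum eval_ret ret = policy ? policy(&tasks[i], ctx) : NO_BPF_POLICY;

        switch (ret) {
        case BPF_EVAL_ABORT:
            return NULL;        /* assumption: abort drops any selection */
        case BPF_EVAL_NEXT:
            continue;           /* ignore this task */
        case BPF_EVAL_SELECT:
            chosen = &tasks[i]; /* the policy picked this task */
            continue;
        default:                /* NO_BPF_POLICY: placeholder heuristic */
            if (!chosen || tasks[i].badness > chosen->badness)
                chosen = &tasks[i];
        }
    }
    return chosen;
}
```

Calling `select_victim()` with a NULL policy exercises the fallback path, while passing `pid_policy` with a pointer to the target pid models the BPF program shown in the document.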