[RFC,v2,0/5] mm: Select victim using bpf_oom_evaluate_task

Message ID 20230810081319.65668-1-zhouchuyi@bytedance.com (mailing list archive)

Message

Chuyi Zhou Aug. 10, 2023, 8:13 a.m. UTC
Changes
-------

This is v2 of the BPF OOM policy patchset.
v1 : https://lore.kernel.org/lkml/20230804093804.47039-1-zhouchuyi@bytedance.com/
v1 -> v2 changes:

- rename bpf_select_task to bpf_oom_evaluate_task and bypass the
tsk_is_oom_victim (and MMF_OOM_SKIP) logic. (Michal)

- add a new hook to set the policy's name, so dump_header() can report
which selection policy was used. (Michal)

- add a tracepoint for when select_bad_process() finds nothing. (Alan)

- add a doc to describe how it is all supposed to work. (Alan)

================

This patchset adds a new interface and uses it to select the victim when
the OOM killer is invoked. The main motivation is the need for customizable
OOM victim selection.

The new interface is a BPF hook plugged into oom_evaluate_task. It takes oc
and the current task as parameters and returns a result indicating which
one the attached BPF program selects.
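
As a rough illustration of the shape of such a hook, here is a userspace
sketch in plain C. It is not the code from this series: the verdict names
(EVAL_NEXT, EVAL_SELECT, EVAL_ABORT), the struct layouts and the helpers
select_victim() and pick_largest_rss() are all hypothetical stand-ins,
and an ordinary function pointer plays the role of the attached BPF
program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts a policy returns for one candidate task. */
enum eval_result {
	EVAL_NEXT,	/* keep the currently chosen victim, move on */
	EVAL_SELECT,	/* the candidate replaces the chosen victim */
	EVAL_ABORT,	/* stop scanning, select nothing */
};

/* Simplified stand-ins for task_struct and oom_control. */
struct task {
	int pid;
	long rss_pages;
};

struct oom_ctl {
	const struct task *chosen;
};

/* Stand-in for the attached BPF program. */
typedef enum eval_result (*policy_fn)(const struct oom_ctl *oc,
				      const struct task *candidate);

/* Example policy: prefer the task with the larger resident set. */
static enum eval_result pick_largest_rss(const struct oom_ctl *oc,
					 const struct task *candidate)
{
	if (oc->chosen == NULL ||
	    candidate->rss_pages > oc->chosen->rss_pages)
		return EVAL_SELECT;
	return EVAL_NEXT;
}

/* Walk the OOM scope once and let the policy judge each task. */
static const struct task *select_victim(const struct task *tasks, size_t n,
					 policy_fn policy)
{
	struct oom_ctl oc = { .chosen = NULL };

	assert(policy != NULL);
	for (size_t i = 0; i < n; i++) {
		switch (policy(&oc, &tasks[i])) {
		case EVAL_SELECT:
			oc.chosen = &tasks[i];
			break;
		case EVAL_ABORT:
			return NULL;
		case EVAL_NEXT:
			break;
		}
	}
	return oc.chosen;
}
```

Calling select_victim() with an array of tasks and pick_largest_rss
returns a pointer to the task with the largest rss_pages. The iteration
stays in the caller, and the policy only compares the candidate with the
task chosen so far.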

Several concerns, suggested by Michal, shaped the design of this
interface:

1. Hooking into oom_evaluate_task keeps the global and memcg OOM
interfaces consistent. It also seems the least disruptive to the existing
OOM killer implementation.

2. Userspace can handle a lot on its own and provide the input the BPF
program needs to make a decision. Since the OOM scope iteration is already
implemented in the kernel, all the BPF program has to do is rank processes
or memcgs.

3. The new interface should bypass the current heuristic rules (e.g.,
tsk_is_oom_victim and MMF_OOM_SKIP) so that it can serve an arbitrary OOM
policy.
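
The division of labour in point 2 could look roughly like the following
userspace sketch in plain C. It is an assumption-laden illustration, not
code from this series: the prio_entry table stands in for a BPF map that
userspace fills in, and the names lookup_priority(),
candidate_ranks_worse() and DEFAULT_PRIORITY are hypothetical. Here a
lower priority value means "kill first".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical map entry written by userspace. */
struct prio_entry {
	int pid;
	int priority;	/* lower value: killed first */
};

/* Simplified stand-in for a task seen during OOM scope iteration. */
struct task_info {
	int pid;
};

/* Priority assumed for tasks that userspace did not list. */
#define DEFAULT_PRIORITY 100

/* Linear lookup standing in for a BPF map lookup. */
static int lookup_priority(const struct prio_entry *map, size_t n, int pid)
{
	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (map[i].pid == pid)
			return map[i].priority;
	return DEFAULT_PRIORITY;
}

/*
 * Ranking step: return 1 if the candidate should replace the chosen
 * task as the victim, 0 otherwise. A NULL chosen task means nothing
 * has been selected yet, so the candidate is always taken.
 */
static int candidate_ranks_worse(const struct prio_entry *map, size_t n,
				 const struct task_info *chosen,
				 const struct task_info *candidate)
{
	if (chosen == NULL)
		return 1;
	return lookup_priority(map, n, candidate->pid) <
	       lookup_priority(map, n, chosen->pid);
}
```

The ranking function only consults the table supplied by userspace, so
the policy can be changed by updating the table and leaving the ranking
code as it is.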

Chuyi Zhou (5):
  mm, oom: Introduce bpf_oom_evaluate_task
  mm: Add policy_name to identify OOM policies
  mm: Add a tracepoint when OOM victim selection is failed
  bpf: Add a OOM policy test
  bpf: Add a BPF OOM policy Doc

 Documentation/bpf/oom.rst                     |  70 +++++++++
 include/linux/oom.h                           |   7 +
 include/trace/events/oom.h                    |  18 +++
 mm/oom_kill.c                                 | 100 +++++++++++--
 .../bpf/prog_tests/test_oom_policy.c          | 140 ++++++++++++++++++
 .../testing/selftests/bpf/progs/oom_policy.c  | 104 +++++++++++++
 6 files changed, 428 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/bpf/oom.rst
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_oom_policy.c
 create mode 100644 tools/testing/selftests/bpf/progs/oom_policy.c

Comments

Yosry Ahmed Aug. 16, 2023, 3:49 p.m. UTC | #1
Can we Cc linux-mm on such changes? I almost missed this series :)