Message ID | 20221209010947.3130477-1-houtao@huaweicloud.com (mailing list archive)
---|---
Series | Misc optimizations for bpf mem allocator
Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Fri, 9 Dec 2022 09:09:45 +0800 you wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> Hi,
>
> The patchset is just misc optimizations for bpf mem allocator. Patch 1
> fixes the OOM problem found during running hash-table update benchmark
> from qp-trie patchset [0]. The benchmark will add htab elements in
> batch and then delete elements in batch, so freed objects will stack on
> free_by_rcu and wait for the expiration of RCU grace period. There can
> be tens of thousands of freed objects and these objects are not
> available for new allocation, so adding htab element will continue to do
> new allocation.
>
> [...]

Here is the summary with links:
  - [bpf-next,v2,1/2] bpf: Reuse freed element in free_by_rcu during allocation
    https://git.kernel.org/bpf/bpf-next/c/0893d6007db5
  - [bpf-next,v2,2/2] bpf: Skip rcu_barrier() if rcu_trace_implies_rcu_gp() is true
    https://git.kernel.org/bpf/bpf-next/c/822ed78fab13

You are awesome, thank you!
From: Hou Tao <houtao1@huawei.com>

Hi,

The patchset is just a couple of miscellaneous optimizations for the
bpf memory allocator. Patch 1 fixes an OOM problem found while running
the hash-table update benchmark from the qp-trie patchset [0]. The
benchmark adds htab elements in batches and then deletes them in
batches, so freed objects pile up on free_by_rcu while waiting for an
RCU grace period to expire. There can be tens of thousands of such
freed objects, and because they are unavailable for new allocation,
adding htab elements keeps triggering new allocations.

For the benchmark command "./bench -w3 -d10 -a htab-update -p 16",
even though the htab has at most 16384 entries, with a key_size of 255
and a value_size of 4, peak memory usage reaches 14GB or more.
Increasing rcupdate.rcu_task_enqueue_lim lowers the peak to 860MB, but
that is still too much. Although the above case is contrived, it is
better to fix it, and the fix is simple: just reuse the freed objects
on free_by_rcu during allocation. After the fix, peak memory usage
drops to 26MB.

Besides the above case, memory blow-up is also possible when
allocation and freeing happen on totally different CPUs. I'm trying to
fix that blow-up by using a global per-cpu work to free the objects on
free_by_rcu in a timely manner, but it doesn't work very well yet and
I am still digging into it.

Patch 2 is a leftover patch from the rcu_trace_implies_rcu_gp()
patchset [1]. After discussing with Paul [2], I think it is also safe
to skip rcu_barrier() when rcu_trace_implies_rcu_gp() returns true.

Comments are always welcome.

Change Log:
v2:
  * Patch 1: rephrase the commit message (suggested by Yonghong & Alexei)
  * Add Acked-by for both patch 1 and patch 2

v1: https://lore.kernel.org/bpf/20221206042946.686847-1-houtao@huaweicloud.com

[0]: https://lore.kernel.org/bpf/20220924133620.4147153-13-houtao@huaweicloud.com/
[1]: https://lore.kernel.org/bpf/20221014113946.965131-1-houtao@huaweicloud.com/
[2]: https://lore.kernel.org/bpf/20221021185002.GP5600@paulmck-ThinkPad-P17-Gen-1/

Hou Tao (2):
  bpf: Reuse freed element in free_by_rcu during allocation
  bpf: Skip rcu_barrier() if rcu_trace_implies_rcu_gp() is true

 kernel/bpf/memalloc.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)
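For patch 1, the gist of the fix is roughly the sketch below, sitting
in alloc_bulk() of kernel/bpf/memalloc.c. This is a simplified
illustration rather than the applied diff: add_obj_to_free_list() is a
hypothetical helper standing in for the real function's inline push
onto c->free_llist, and memcg accounting is omitted. The authoritative
code is in commit 0893d6007db5 linked above.

    /* Hypothetical helper: push obj onto c->free_llist and bump
     * c->free_cnt (the real code does this inline, with IRQs disabled
     * on PREEMPT_RT).
     */
    static void add_obj_to_free_list(struct bpf_mem_cache *c, void *obj);

    /* Sketch only: reuse elements parked on free_by_rcu before
     * falling back to new allocations.
     */
    static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
    {
            struct llist_node *obj;
            int i;

            /* free_by_rcu is only manipulated by the irq work running
             * on this CPU, so the lockless __llist_del_first() is safe
             * here. The elements were freed locally and are merely
             * waiting for an RCU grace period, so they can be handed
             * out again instead of allocating brand-new objects.
             */
            for (i = 0; i < cnt; i++) {
                    obj = __llist_del_first(&c->free_by_rcu);
                    if (!obj)
                            break;
                    add_obj_to_free_list(c, obj);
            }
            cnt -= i;

            /* Allocate new elements only for what is still missing. */
            for (i = 0; i < cnt; i++) {
                    obj = __alloc(c, node);
                    if (!obj)
                            break;
                    add_obj_to_free_list(c, obj);
            }
    }

With this in place, the objects parked on free_by_rcu become the first
source for refills instead of dead weight, which is what brings the
benchmark's peak memory usage from 14GB down to 26MB.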
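For patch 2, the destroy path ends up with the barrier logic sketched
below. wait_mem_alloc_callbacks() is a hypothetical wrapper name used
here for illustration; in the tree the two barriers sit directly in
the bpf_mem_alloc_destroy() path (see commit 822ed78fab13 linked
above).

    /* Sketch only: barrier handling when tearing down a bpf_mem_alloc. */
    static void wait_mem_alloc_callbacks(void)
    {
            /* Wait for in-flight call_rcu_tasks_trace() callbacks that
             * free the per-cpu caches.
             */
            rcu_barrier_tasks_trace();

            /* When rcu_trace_implies_rcu_gp() is true, the free path
             * calls __free_rcu() directly from __free_rcu_tasks_trace()
             * instead of chaining it through call_rcu(), so there is no
             * pending RCU callback left and rcu_barrier() can be
             * skipped.
             */
            if (!rcu_trace_implies_rcu_gp())
                    rcu_barrier();
    }

The reasoning mirrors the free-path change from [1]: skipping the
extra call_rcu() and skipping the matching rcu_barrier() are two
sides of the same rcu_trace_implies_rcu_gp() guarantee.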