From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 00/15] bpf: BPF specific memory allocator.
Date: Thu, 25 Aug 2022 19:44:15 -0700
Message-Id: <20220826024430.84565-1-alexei.starovoitov@gmail.com>

Introduce an any-context, BPF-specific memory allocator.

Tracing BPF programs can attach to kprobe and fentry. Hence they run in an
unknown context where calling plain kmalloc() might not be safe.

Front-end kmalloc() with a per-cpu cache of free elements. Refill this
cache asynchronously from irq_work.

Major achievements enabled by bpf_mem_alloc:
- Dynamically allocated hash maps used to be 10 times slower than fully
  preallocated ones. With bpf_mem_alloc and subsequent optimizations, the
  speed of dynamic maps is equal to full prealloc.
- Tracing bpf programs can use dynamically allocated hash maps,
  potentially saving lots of memory, since a typical hash map is sparsely
  populated.
- Sleepable bpf programs can use dynamically allocated hash maps.

v3->v4:
- Fix build issue due to missing local.h on 32-bit archs.
- Add Kumar's ack.
- Proposal for next steps from Delyan:
  https://lore.kernel.org/bpf/d3f76b27f4e55ec9e400ae8dcaecbb702a4932e8.camel@fb.com/

v2->v3:
- Rewrote the free_list algorithm based on discussions with Kumar. Patch 1.
- Allowed sleepable bpf progs to use dynamically allocated maps. Patches 13 and 14.
- Added a sysctl to force bpf_mem_alloc in hash map even if pre-alloc is
  requested, to reduce memory consumption. Patch 15.
- Fix: zero-fill percpu allocation.
- Single rcu_barrier at the end instead of one per cpu during bpf_mem_alloc
  destruction.

v2 thread:
https://lore.kernel.org/bpf/20220817210419.95560-1-alexei.starovoitov@gmail.com/

v1->v2:
- Moved unsafe direct call_rcu() from hash map into a safe place inside
  bpf_mem_alloc. Patches 7 and 9.
- Optimized atomic_inc/dec in hash map with percpu_counter. Patch 6.
- Tuned watermarks per allocation size. Patch 8.
- Adapted this approach to per-cpu allocation. Patch 10.
- Fully converted hash map to bpf_mem_alloc. Patch 11.
- Removed tracing prog restriction on map types. Combination of all patches
  and final patch 12.

v1 thread:
https://lore.kernel.org/bpf/20220623003230.37497-1-alexei.starovoitov@gmail.com/

LWN article:
https://lwn.net/Articles/899274/

Future work:
- expose bpf_mem_alloc as uapi FD to be used in dynptr_alloc, kptr_alloc
- convert lru map to bpf_mem_alloc

Alexei Starovoitov (15):
  bpf: Introduce any context BPF specific memory allocator.
  bpf: Convert hash map to bpf_mem_alloc.
  selftests/bpf: Improve test coverage of test_maps
  samples/bpf: Reduce syscall overhead in map_perf_test.
  bpf: Relax the requirement to use preallocated hash maps in tracing progs.
  bpf: Optimize element count in non-preallocated hash map.
  bpf: Optimize call_rcu in non-preallocated hash map.
  bpf: Adjust low/high watermarks in bpf_mem_cache
  bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.
  bpf: Add percpu allocation support to bpf_mem_alloc.
  bpf: Convert percpu hash map to per-cpu bpf_mem_alloc.
  bpf: Remove tracing program restriction on map types
  bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
  bpf: Remove prealloc-only restriction for sleepable bpf programs.
  bpf: Introduce sysctl kernel.bpf_force_dyn_alloc.

 include/linux/bpf_mem_alloc.h             |  26 +
 include/linux/filter.h                    |   2 +
 kernel/bpf/Makefile                       |   2 +-
 kernel/bpf/core.c                         |   2 +
 kernel/bpf/hashtab.c                      | 132 +++--
 kernel/bpf/memalloc.c                     | 602 ++++++++++++++++++++++
 kernel/bpf/syscall.c                      |  14 +-
 kernel/bpf/verifier.c                     |  52 --
 samples/bpf/map_perf_test_kern.c          |  44 +-
 samples/bpf/map_perf_test_user.c          |   2 +-
 tools/testing/selftests/bpf/progs/timer.c |  11 -
 tools/testing/selftests/bpf/test_maps.c   |  38 +-
 12 files changed, 796 insertions(+), 131 deletions(-)
 create mode 100644 include/linux/bpf_mem_alloc.h
 create mode 100644 kernel/bpf/memalloc.c

Acked-by: Andrii Nakryiko
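
The core idea of the series is a per-cpu cache of free elements in front of kmalloc(), refilled asynchronously from irq_work. That idea can be modelled in plain user-space C. This is a minimal sketch under stated assumptions, not the code in kernel/bpf/memalloc.c. One struct cpu_cache stands in for one CPU. malloc()/free() stand in for kmalloc()/kfree(). The work_pending flag stands in for queuing an irq_work. The names cache_alloc(), cache_free() and cache_run_deferred_work(), the watermark values, and the "trim to midpoint" policy are all hypothetical. The sketch does not reproduce the lock-free list handling, RCU batching, or per-size watermark tuning of the actual series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical thresholds; the series tunes watermarks per allocation size. */
#define LOW_WATERMARK  4
#define HIGH_WATERMARK 12
#define REFILL_BATCH   8

struct free_node {
	struct free_node *next;
};

/* One instance per CPU in the real design; modelled here as plain data. */
struct cpu_cache {
	struct free_node *free_list; /* LIFO list of free elements */
	int count;                   /* elements currently on free_list */
	bool work_pending;           /* stands in for a queued irq_work */
	size_t elem_size;
};

static void push(struct cpu_cache *c, void *p)
{
	struct free_node *n = p;
	n->next = c->free_list;
	c->free_list = n;
	c->count++;
}

static void *pop(struct cpu_cache *c)
{
	struct free_node *n = c->free_list;
	if (!n)
		return NULL;
	c->free_list = n->next;
	c->count--;
	return n;
}

/* Deferred refill/trim: the only place that calls malloc()/free(). */
static void cache_run_deferred_work(struct cpu_cache *c)
{
	c->work_pending = false;
	if (c->count < LOW_WATERMARK) {
		for (int i = 0; i < REFILL_BATCH; i++) {
			void *p = malloc(c->elem_size);
			if (!p)
				break;
			push(c, p);
		}
	} else if (c->count > HIGH_WATERMARK) {
		/* Hypothetical policy: trim back to the midpoint of the watermarks. */
		while (c->count > (LOW_WATERMARK + HIGH_WATERMARK) / 2)
			free(pop(c));
	}
}

static void cache_init(struct cpu_cache *c, size_t elem_size)
{
	assert(elem_size >= sizeof(struct free_node));
	c->free_list = NULL;
	c->count = 0;
	c->work_pending = false;
	c->elem_size = elem_size;
	cache_run_deferred_work(c); /* prefill from a context known to be safe */
}

/* Fast path: only touches the local list, never calls malloc(). */
static void *cache_alloc(struct cpu_cache *c)
{
	void *p = pop(c);
	if (c->count < LOW_WATERMARK)
		c->work_pending = true; /* in the kernel: irq_work_queue() */
	return p;
}

/* Fast path: push back onto the local list, never calls free(). */
static void cache_free(struct cpu_cache *c, void *p)
{
	push(c, p);
	if (c->count > HIGH_WATERMARK)
		c->work_pending = true;
}

static void cache_destroy(struct cpu_cache *c)
{
	void *p;
	while ((p = pop(c)))
		free(p);
}
```

The point of the split is that cache_alloc() and cache_free() only manipulate the local free list. That keeps them usable in contexts, such as kprobe or fentry handlers, where calling the underlying allocator might not be safe. All real allocator calls are pushed into the deferred step, which runs later from a context where they are allowed.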