From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 bpf-next 00/16] bpf: BPF specific memory allocator.
Date: Fri, 2 Sep 2022 14:10:42 -0700
Message-Id: <20220902211058.60789-1-alexei.starovoitov@gmail.com>

Introduce any context BPF specific
memory allocator.

Tracing BPF programs can attach to kprobe and fentry, so they run in an
unknown context where calling plain kmalloc() might not be safe.
Front-end kmalloc() with a per-cpu cache of free elements, and refill
this cache asynchronously from irq_work.

Major achievements enabled by bpf_mem_alloc:
- Dynamically allocated hash maps used to be 10 times slower than fully
  preallocated ones. With bpf_mem_alloc and subsequent optimizations,
  dynamic maps are as fast as fully preallocated ones.
- Tracing bpf programs can use dynamically allocated hash maps,
  potentially saving a lot of memory, since a typical hash map is
  sparsely populated.
- Sleepable bpf programs can use dynamically allocated hash maps.

v5->v6:
- Debugged why selftests/bpf/test_maps was OOMing in the small VM that
  BPF CI uses. Added patch 16, which optimizes the usage of rcu_barrier-s
  between bpf_mem_alloc and hash map. It drastically improved the speed
  of htab destruction.

v4->v5:
- Fixed missing migrate_disable in hash tab free path (Daniel)
- Replaced impossible "memory leak" with WARN_ON_ONCE (Martin)
- Dropped sysctl kernel.bpf_force_dyn_alloc patch (Daniel)
- Added Andrii's ack
- Added new patch 15 that removes kmem_cache usage from bpf_mem_alloc.
  It saves memory and speeds up map create/destroy operations while
  maintaining hash map update/delete performance.

v3->v4:
- fix build issue due to missing local.h on 32-bit arch
- add Kumar's ack
- proposal for next steps from Delyan:
  https://lore.kernel.org/bpf/d3f76b27f4e55ec9e400ae8dcaecbb702a4932e8.camel@fb.com/

v2->v3:
- Rewrote the free_list algorithm based on discussions with Kumar. Patch 1.
- Allowed sleepable bpf progs to use dynamically allocated maps.
  Patches 13 and 14.
- Added a sysctl that forces bpf_mem_alloc in hash map, even when
  pre-alloc is requested, to reduce memory consumption. Patch 15.
- Fix: zero-fill percpu allocation
- Use a single rcu_barrier at the end of bpf_mem_alloc destruction
  instead of one per cpu

v2 thread:
https://lore.kernel.org/bpf/20220817210419.95560-1-alexei.starovoitov@gmail.com/

v1->v2:
- Moved unsafe direct call_rcu() from hash map into a safe place inside
  bpf_mem_alloc. Patches 7 and 9.
- Optimized atomic_inc/dec in hash map with percpu_counter. Patch 6.
- Tuned watermarks per allocation size. Patch 8.
- Applied this approach to per-cpu allocation. Patch 10.
- Fully converted hash map to bpf_mem_alloc. Patch 11.
- Removed tracing prog restriction on map types. Combination of all
  patches and final patch 12.

v1 thread:
https://lore.kernel.org/bpf/20220623003230.37497-1-alexei.starovoitov@gmail.com/

LWN article:
https://lwn.net/Articles/899274/

Future work:
- expose bpf_mem_alloc as uapi FD to be used in dynptr_alloc, kptr_alloc
- convert lru map to bpf_mem_alloc
- further clean up htab code. Example: htab_use_raw_lock can be removed.

Alexei Starovoitov (16):
  bpf: Introduce any context BPF specific memory allocator.
  bpf: Convert hash map to bpf_mem_alloc.
  selftests/bpf: Improve test coverage of test_maps
  samples/bpf: Reduce syscall overhead in map_perf_test.
  bpf: Relax the requirement to use preallocated hash maps in tracing progs.
  bpf: Optimize element count in non-preallocated hash map.
  bpf: Optimize call_rcu in non-preallocated hash map.
  bpf: Adjust low/high watermarks in bpf_mem_cache
  bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.
  bpf: Add percpu allocation support to bpf_mem_alloc.
  bpf: Convert percpu hash map to per-cpu bpf_mem_alloc.
  bpf: Remove tracing program restriction on map types
  bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
  bpf: Remove prealloc-only restriction for sleepable bpf programs.
  bpf: Remove usage of kmem_cache from bpf_mem_cache.
  bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc.
 include/linux/bpf_mem_alloc.h             |  28 +
 kernel/bpf/Makefile                       |   2 +-
 kernel/bpf/hashtab.c                      | 138 +++--
 kernel/bpf/memalloc.c                     | 634 ++++++++++++++++++++++
 kernel/bpf/syscall.c                      |   5 +-
 kernel/bpf/verifier.c                     |  52 --
 samples/bpf/map_perf_test_kern.c          |  44 +-
 samples/bpf/map_perf_test_user.c          |   2 +-
 tools/testing/selftests/bpf/progs/timer.c |  11 -
 tools/testing/selftests/bpf/test_maps.c   |  38 +-
 10 files changed, 820 insertions(+), 134 deletions(-)
 create mode 100644 include/linux/bpf_mem_alloc.h
 create mode 100644 kernel/bpf/memalloc.c
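The core idea at the top of this letter (front-end kmalloc() with a
per-cpu cache of free elements and refill that cache asynchronously
from irq_work) can be illustrated with a minimal userspace C sketch.
It is not the kernel's bpf_mem_alloc code or API: the names elem_cache,
cache_alloc(), cache_free() and cache_run_deferred(), the element size
and the LOW_WM/HIGH_WM watermark values are all hypothetical. There is
a single cache instead of one per CPU, no concurrency, and the irq_work
is modeled as a work_pending flag plus an explicit call to
cache_run_deferred().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define UNIT_SIZE 64   /* hypothetical fixed element size */
#define LOW_WM     4   /* hypothetical: below this, request a refill */
#define HIGH_WM   12   /* hypothetical: above this, request a drain */

/* A free element reuses its own storage as the free-list link. */
struct free_elem {
	struct free_elem *next;
};

/* A real allocator keeps one of these per CPU; this sketch has one. */
struct elem_cache {
	struct free_elem *free_list;
	int free_cnt;
	bool work_pending;	/* stands in for a queued irq_work */
};

static void push(struct elem_cache *c, void *obj)
{
	struct free_elem *e = obj;

	e->next = c->free_list;
	c->free_list = e;
	c->free_cnt++;
}

static void *pop(struct elem_cache *c)
{
	struct free_elem *e = c->free_list;

	if (!e)
		return NULL;
	c->free_list = e->next;
	c->free_cnt--;
	return e;
}

/* Fast path: never calls malloc(), only touches the local list. */
void *cache_alloc(struct elem_cache *c)
{
	void *obj = pop(c);

	if (c->free_cnt < LOW_WM)
		c->work_pending = true;
	return obj;	/* NULL if the cache ran dry */
}

/* Fast path: never calls free(), only touches the local list. */
void cache_free(struct elem_cache *c, void *obj)
{
	assert(obj != NULL);
	push(c, obj);
	if (c->free_cnt > HIGH_WM)
		c->work_pending = true;
}

/* Deferred path: runs later, where malloc()/free() are safe to call.
 * Refills or drains the list to the midpoint between the watermarks. */
void cache_run_deferred(struct elem_cache *c)
{
	const int target = (LOW_WM + HIGH_WM) / 2;

	c->work_pending = false;
	while (c->free_cnt < target) {
		void *obj = malloc(UNIT_SIZE);

		if (!obj)
			break;
		push(c, obj);
	}
	while (c->free_cnt > target)
		free(pop(c));
}
```

The point of the split is that cache_alloc() and cache_free() only
touch a local list, so they can run where calling the underlying
allocator is not safe; the slow calls are postponed to
cache_run_deferred(). Refilling to the watermark midpoint is a choice
of this sketch, not necessarily what bpf_mem_alloc does.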
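The v1->v2 note about replacing atomic_inc/dec of the hash map element
count with percpu_counter (patch 6) relies on batching: each CPU
accumulates a local delta and folds it into the shared count only when
that delta reaches a batch size, so a limit check can usually avoid
summing every CPU. A minimal userspace C sketch of that idea follows,
under stated assumptions: batched_counter, counter_add(), counter_sum(),
counter_reached(), NR_CPUS and BATCH are hypothetical names, CPUs are
simulated as array slots chosen by the caller, and there is no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4	/* hypothetical simulated CPU count */
#define BATCH   8	/* hypothetical fold threshold */

struct batched_counter {
	int64_t global;			/* approximate total */
	int32_t local[NR_CPUS];		/* per-CPU deltas not yet folded */
};

/* Cheap update: only the caller's slot changes until it reaches BATCH. */
static void counter_add(struct batched_counter *c, int cpu, int32_t delta)
{
	int32_t v = c->local[cpu] + delta;

	assert(cpu >= 0 && cpu < NR_CPUS);
	if (v >= BATCH || v <= -BATCH) {
		/* the kernel's percpu_counter takes a spinlock for this fold */
		c->global += v;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v;
	}
}

/* Exact value: walks every slot, so this is the slow path. */
static int64_t counter_sum(const struct batched_counter *c)
{
	int64_t sum = c->global;

	for (int i = 0; i < NR_CPUS; i++)
		sum += c->local[i];
	return sum;
}

/*
 * Is the count >= limit? Each slot holds at most BATCH - 1 in absolute
 * value, so global is off by at most NR_CPUS * (BATCH - 1). Only sum the
 * slots when global falls inside that window around the limit.
 */
static bool counter_reached(const struct batched_counter *c, int64_t limit)
{
	const int64_t slack = (int64_t)NR_CPUS * (BATCH - 1);

	if (c->global - limit > slack)
		return true;
	if (limit - c->global > slack)
		return false;
	return counter_sum(c) >= limit;
}
```

In a hash map, an insert would call counter_add(c, cpu, 1), a delete
counter_add(c, cpu, -1), and the insert path would refuse new elements
once counter_reached(c, max_entries) is true. The full sum is only paid
for when the approximate count is close to the limit.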