From patchwork Fri Aug 26 02:44:16 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12955493
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
 delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 01/15] bpf: Introduce any context BPF specific
 memory allocator.
Date: Thu, 25 Aug 2022 19:44:16 -0700
Message-Id: <20220826024430.84565-2-alexei.starovoitov@gmail.com>
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Tracing BPF programs can attach to kprobe and fentry. Hence they
run in unknown context where calling plain kmalloc() might not be safe.

Front-end kmalloc() with minimal per-cpu cache of free elements.
Refill this cache asynchronously from irq_work.

BPF programs always run with migration disabled.
It's safe to allocate from cache of the current cpu with irqs disabled.
Free-ing is always done into bucket of the current cpu as well.
irq_work trims extra free elements from buckets with kfree
and refills them with kmalloc, so global kmalloc logic takes care
of freeing objects allocated by one cpu and freed on another.

struct bpf_mem_alloc supports two modes:
- When size != 0 create kmem_cache and bpf_mem_cache for each cpu.
  This is typical bpf hash map use case when all elements have equal size.
- When size == 0 allocate 11 bpf_mem_cache-s for each cpu, then rely on
  kmalloc/kfree. Max allocation size is 4096 in this case.
  This is bpf_dynptr and bpf_kptr use case.

bpf_mem_alloc/bpf_mem_free are bpf specific 'wrappers' of kmalloc/kfree.
bpf_mem_cache_alloc/bpf_mem_cache_free are 'wrappers' of
kmem_cache_alloc/kmem_cache_free.

The allocators are NMI-safe from bpf programs only. They are not
NMI-safe in general.

Acked-by: Kumar Kartikeya Dwivedi
Signed-off-by: Alexei Starovoitov
---
 include/linux/bpf_mem_alloc.h |  26 ++
 kernel/bpf/Makefile           |   2 +-
 kernel/bpf/memalloc.c         | 476 ++++++++++++++++++++++++++++++++++
 3 files changed, 503 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/bpf_mem_alloc.h
 create mode 100644 kernel/bpf/memalloc.c

diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
new file mode 100644
index 000000000000..804733070f8d
--- /dev/null
+++ b/include/linux/bpf_mem_alloc.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+#ifndef _BPF_MEM_ALLOC_H
+#define _BPF_MEM_ALLOC_H
+#include <linux/compiler_types.h>
+
+struct bpf_mem_cache;
+struct bpf_mem_caches;
+
+struct bpf_mem_alloc {
+        struct bpf_mem_caches __percpu *caches;
+        struct bpf_mem_cache __percpu *cache;
+};
+
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size);
+void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
+
+/* kmalloc/kfree equivalent: */
+void *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size);
+void bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr);
+
+/* kmem_cache_alloc/free equivalent: */
+void *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma);
+void bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr);
+
+#endif /* _BPF_MEM_ALLOC_H */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 00e05b69a4df..341c94f208f4 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -13,7 +13,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
 obj-$(CONFIG_BPF_JIT) += trampoline.o
-obj-$(CONFIG_BPF_SYSCALL) += btf.o
+obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
 obj-$(CONFIG_BPF_JIT) += dispatcher.o
 ifeq ($(CONFIG_NET),y)
 obj-$(CONFIG_BPF_SYSCALL) += devmap.o
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
new file mode 100644
index 000000000000..29f340016e9e
--- /dev/null
+++ b/kernel/bpf/memalloc.c
@@ -0,0 +1,476 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#include <linux/mm.h>
+#include <linux/llist.h>
+#include <linux/bpf.h>
+#include <linux/irq_work.h>
+#include <linux/bpf_mem_alloc.h>
+#include <linux/memcontrol.h>
+#include <asm/local.h>
+
+/* Any context (including NMI) BPF specific memory allocator.
+ *
+ * Tracing BPF programs can attach to kprobe and fentry. Hence they
+ * run in unknown context where calling plain kmalloc() might not be safe.
+ *
+ * Front-end kmalloc() with per-cpu per-bucket cache of free elements.
+ * Refill this cache asynchronously from irq_work.
+ *
+ *   CPU_0 buckets
+ *   16 32 64 96 128 196 256 512 1024 2048 4096
+ *   ...
+ *   CPU_N buckets
+ *   16 32 64 96 128 196 256 512 1024 2048 4096
+ *
+ * The buckets are prefilled at the start.
+ * BPF programs always run with migration disabled.
+ * It's safe to allocate from cache of the current cpu with irqs disabled.
+ * Free-ing is always done into bucket of the current cpu as well.
+ * irq_work trims extra free elements from buckets with kfree
+ * and refills them with kmalloc, so global kmalloc logic takes care
+ * of freeing objects allocated by one cpu and freed on another.
+ *
+ * Every allocated object is padded with extra 8 bytes that contains
+ * struct llist_node.
+ */
+#define LLIST_NODE_SZ sizeof(struct llist_node)
+
+/* similar to kmalloc, but sizeof == 8 bucket is gone */
+static u8 size_index[24] __ro_after_init = {
+        3,      /* 8 */
+        3,      /* 16 */
+        4,      /* 24 */
+        4,      /* 32 */
+        5,      /* 40 */
+        5,      /* 48 */
+        5,      /* 56 */
+        5,      /* 64 */
+        1,      /* 72 */
+        1,      /* 80 */
+        1,      /* 88 */
+        1,      /* 96 */
+        6,      /* 104 */
+        6,      /* 112 */
+        6,      /* 120 */
+        6,      /* 128 */
+        2,      /* 136 */
+        2,      /* 144 */
+        2,      /* 152 */
+        2,      /* 160 */
+        2,      /* 168 */
+        2,      /* 176 */
+        2,      /* 184 */
+        2       /* 192 */
+};
+
+static int bpf_mem_cache_idx(size_t size)
+{
+        if (!size || size > 4096)
+                return -1;
+
+        if (size <= 192)
+                return size_index[(size - 1) / 8] - 1;
+
+        return fls(size - 1) - 2;
+}
+
+#define NUM_CACHES 11
+
+struct bpf_mem_cache {
+        /* per-cpu list of free objects of size 'unit_size'.
+         * All accesses are done with interrupts disabled and 'active' counter
+         * protection with __llist_add() and __llist_del_first().
+         */
+        struct llist_head free_llist;
+        local_t active;
+
+        /* Operations on the free_list from unit_alloc/unit_free/bpf_mem_refill
+         * are sequenced by per-cpu 'active' counter. But unit_free() cannot
+         * fail. When 'active' is busy the unit_free() will add an object to
+         * free_llist_extra.
+         */
+        struct llist_head free_llist_extra;
+
+        /* kmem_cache != NULL when bpf_mem_alloc was created for specific
+         * element size.
+         */
+        struct kmem_cache *kmem_cache;
+        struct irq_work refill_work;
+        struct obj_cgroup *objcg;
+        int unit_size;
+        /* count of objects in free_llist */
+        int free_cnt;
+};
+
+struct bpf_mem_caches {
+        struct bpf_mem_cache cache[NUM_CACHES];
+};
+
+static struct llist_node notrace *__llist_del_first(struct llist_head *head)
+{
+        struct llist_node *entry, *next;
+
+        entry = head->first;
+        if (!entry)
+                return NULL;
+        next = entry->next;
+        head->first = next;
+        return entry;
+}
+
+#define BATCH 48
+#define LOW_WATERMARK 32
+#define HIGH_WATERMARK 96
+/* Assuming the average number of elements per bucket is 64, when all buckets
+ * are used the total memory will be: 64*16*32 + 64*32*32 + 64*64*32 + ... +
+ * 64*4096*32 ~ 20Mbyte
+ */
+
+static void *__alloc(struct bpf_mem_cache *c, int node)
+{
+        /* Allocate, but don't deplete atomic reserves that typical
+         * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
+         * will allocate from the current numa node which is what we
+         * want here.
+         */
+        gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT;
+
+        if (c->kmem_cache)
+                return kmem_cache_alloc_node(c->kmem_cache, flags, node);
+
+        return kmalloc_node(c->unit_size, flags, node);
+}
+
+static struct mem_cgroup *get_memcg(const struct bpf_mem_cache *c)
+{
+#ifdef CONFIG_MEMCG_KMEM
+        if (c->objcg)
+                return get_mem_cgroup_from_objcg(c->objcg);
+#endif
+
+#ifdef CONFIG_MEMCG
+        return root_mem_cgroup;
+#else
+        return NULL;
+#endif
+}
+
+/* Mostly runs from irq_work except __init phase.
*/ +static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node) +{ + struct mem_cgroup *memcg = NULL, *old_memcg; + unsigned long flags; + void *obj; + int i; + + memcg = get_memcg(c); + old_memcg = set_active_memcg(memcg); + for (i = 0; i < cnt; i++) { + obj = __alloc(c, node); + if (!obj) + break; + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + /* In RT irq_work runs in per-cpu kthread, so disable + * interrupts to avoid preemption and interrupts and + * reduce the chance of bpf prog executing on this cpu + * when active counter is busy. + */ + local_irq_save(flags); + if (local_inc_return(&c->active) == 1) { + __llist_add(obj, &c->free_llist); + c->free_cnt++; + } + local_dec(&c->active); + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + local_irq_restore(flags); + } + set_active_memcg(old_memcg); + mem_cgroup_put(memcg); +} + +static void free_one(struct bpf_mem_cache *c, void *obj) +{ + if (c->kmem_cache) + kmem_cache_free(c->kmem_cache, obj); + else + kfree(obj); +} + +static void free_bulk(struct bpf_mem_cache *c) +{ + struct llist_node *llnode, *t; + unsigned long flags; + int cnt; + + do { + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + local_irq_save(flags); + if (local_inc_return(&c->active) == 1) { + llnode = __llist_del_first(&c->free_llist); + if (llnode) + cnt = --c->free_cnt; + else + cnt = 0; + } + local_dec(&c->active); + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + local_irq_restore(flags); + free_one(c, llnode); + } while (cnt > (HIGH_WATERMARK + LOW_WATERMARK) / 2); + + /* and drain free_llist_extra */ + llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra)) + free_one(c, llnode); +} + +static void bpf_mem_refill(struct irq_work *work) +{ + struct bpf_mem_cache *c = container_of(work, struct bpf_mem_cache, refill_work); + int cnt; + + /* Racy access to free_cnt. It doesn't need to be 100% accurate */ + cnt = c->free_cnt; + if (cnt < LOW_WATERMARK) + /* irq_work runs on this cpu and kmalloc will allocate + * from the current numa node which is what we want here. 
+                 */
+                alloc_bulk(c, BATCH, NUMA_NO_NODE);
+        else if (cnt > HIGH_WATERMARK)
+                free_bulk(c);
+}
+
+static void notrace irq_work_raise(struct bpf_mem_cache *c)
+{
+        irq_work_queue(&c->refill_work);
+}
+
+static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
+{
+        init_irq_work(&c->refill_work, bpf_mem_refill);
+        /* To avoid consuming memory assume that 1st run of bpf
+         * prog won't be doing more than 4 map_update_elem from
+         * irq disabled region
+         */
+        alloc_bulk(c, c->unit_size <= 256 ? 4 : 1, cpu_to_node(cpu));
+}
+
+/* When size != 0 create kmem_cache and bpf_mem_cache for each cpu.
+ * This is typical bpf hash map use case when all elements have equal size.
+ *
+ * When size == 0 allocate 11 bpf_mem_cache-s for each cpu, then rely on
+ * kmalloc/kfree. Max allocation size is 4096 in this case.
+ * This is bpf_dynptr and bpf_kptr use case.
+ */
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
+{
+        static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};
+        struct bpf_mem_caches *cc, __percpu *pcc;
+        struct bpf_mem_cache *c, __percpu *pc;
+        struct kmem_cache *kmem_cache;
+        struct obj_cgroup *objcg = NULL;
+        char buf[32];
+        int cpu, i;
+
+        if (size) {
+                pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
+                if (!pc)
+                        return -ENOMEM;
+                size += LLIST_NODE_SZ; /* room for llist_node */
+                snprintf(buf, sizeof(buf), "bpf-%u", size);
+                kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL);
+                if (!kmem_cache) {
+                        free_percpu(pc);
+                        return -ENOMEM;
+                }
+#ifdef CONFIG_MEMCG_KMEM
+                objcg = get_obj_cgroup_from_current();
+#endif
+                for_each_possible_cpu(cpu) {
+                        c = per_cpu_ptr(pc, cpu);
+                        c->kmem_cache = kmem_cache;
+                        c->unit_size = size;
+                        c->objcg = objcg;
+                        prefill_mem_cache(c, cpu);
+                }
+                ma->cache = pc;
+                return 0;
+        }
+
+        pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL);
+        if (!pcc)
+                return -ENOMEM;
+#ifdef CONFIG_MEMCG_KMEM
+        objcg = get_obj_cgroup_from_current();
+#endif
+        for_each_possible_cpu(cpu) {
+                cc = per_cpu_ptr(pcc, cpu);
+                for (i = 0; i < NUM_CACHES; i++) {
+                        c = &cc->cache[i];
+                        c->unit_size = sizes[i];
+                        c->objcg = objcg;
+                        prefill_mem_cache(c, cpu);
+                }
+        }
+        ma->caches = pcc;
+        return 0;
+}
+
+static void drain_mem_cache(struct bpf_mem_cache *c)
+{
+        struct llist_node *llnode, *t;
+
+        llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist))
+                free_one(c, llnode);
+        llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra))
+                free_one(c, llnode);
+}
+
+void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
+{
+        struct bpf_mem_caches *cc;
+        struct bpf_mem_cache *c;
+        int cpu, i;
+
+        if (ma->cache) {
+                for_each_possible_cpu(cpu) {
+                        c = per_cpu_ptr(ma->cache, cpu);
+                        drain_mem_cache(c);
+                }
+                /* kmem_cache and memcg are the same across cpus */
+                kmem_cache_destroy(c->kmem_cache);
+                if (c->objcg)
+                        obj_cgroup_put(c->objcg);
+                free_percpu(ma->cache);
+                ma->cache = NULL;
+        }
+        if (ma->caches) {
+                for_each_possible_cpu(cpu) {
+                        cc = per_cpu_ptr(ma->caches, cpu);
+                        for (i = 0; i < NUM_CACHES; i++) {
+                                c = &cc->cache[i];
+                                drain_mem_cache(c);
+                        }
+                }
+                if (c->objcg)
+                        obj_cgroup_put(c->objcg);
+                free_percpu(ma->caches);
+                ma->caches = NULL;
+        }
+}
+
+/* notrace is necessary here and in other functions to make sure
+ * bpf programs cannot attach to them and cause llist corruptions.
+ */
+static void notrace *unit_alloc(struct bpf_mem_cache *c)
+{
+        struct llist_node *llnode = NULL;
+        unsigned long flags;
+        int cnt = 0;
+
+        /* Disable irqs to prevent the following race for majority of prog types:
+         * prog_A
+         *   bpf_mem_alloc
+         *      preemption or irq -> prog_B
+         *        bpf_mem_alloc
+         *
+         * but prog_B could be a perf_event NMI prog.
+         * Use per-cpu 'active' counter to order free_list access between
+         * unit_alloc/unit_free/bpf_mem_refill.
+         */
+        local_irq_save(flags);
+        if (local_inc_return(&c->active) == 1) {
+                llnode = __llist_del_first(&c->free_llist);
+                if (llnode)
+                        cnt = --c->free_cnt;
+        }
+        local_dec(&c->active);
+        local_irq_restore(flags);
+
+        WARN_ON(cnt < 0);
+
+        if (cnt < LOW_WATERMARK)
+                irq_work_raise(c);
+        return llnode;
+}
+
+/* Though 'ptr' object could have been allocated on a different cpu
+ * add it to the free_llist of the current cpu.
+ * Let kfree() logic deal with it when it's later called from irq_work.
+ */
+static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
+{
+        struct llist_node *llnode = ptr - LLIST_NODE_SZ;
+        unsigned long flags;
+        int cnt = 0;
+
+        BUILD_BUG_ON(LLIST_NODE_SZ > 8);
+
+        local_irq_save(flags);
+        if (local_inc_return(&c->active) == 1) {
+                __llist_add(llnode, &c->free_llist);
+                cnt = ++c->free_cnt;
+        } else {
+                /* unit_free() cannot fail. Therefore add an object to atomic
+                 * llist. free_bulk() will drain it. Though free_llist_extra is
+                 * a per-cpu list we have to use atomic llist_add here, since
+                 * it also can be interrupted by bpf nmi prog that does another
+                 * unit_free() into the same free_llist_extra.
+                 */
+                llist_add(llnode, &c->free_llist_extra);
+        }
+        local_dec(&c->active);
+        local_irq_restore(flags);
+
+        if (cnt > HIGH_WATERMARK)
+                /* free few objects from current cpu into global kmalloc pool */
+                irq_work_raise(c);
+}
+
+/* Called from BPF program or from sys_bpf syscall.
+ * In both cases migration is disabled.
+ */
+void notrace *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size)
+{
+        int idx;
+        void *ret;
+
+        if (!size)
+                return ZERO_SIZE_PTR;
+
+        idx = bpf_mem_cache_idx(size + LLIST_NODE_SZ);
+        if (idx < 0)
+                return NULL;
+
+        ret = unit_alloc(this_cpu_ptr(ma->caches)->cache + idx);
+        return !ret ? NULL : ret + LLIST_NODE_SZ;
+}
+
+void notrace bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr)
+{
+        int idx;
+
+        if (!ptr)
+                return;
+
+        idx = bpf_mem_cache_idx(__ksize(ptr - LLIST_NODE_SZ));
+        if (idx < 0)
+                return;
+
+        unit_free(this_cpu_ptr(ma->caches)->cache + idx, ptr);
+}
+
+void notrace *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma)
+{
+        void *ret;
+
+        ret = unit_alloc(this_cpu_ptr(ma->cache));
+        return !ret ? NULL : ret + LLIST_NODE_SZ;
+}
+
+void notrace bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr)
+{
+        if (!ptr)
+                return;
+
+        unit_free(this_cpu_ptr(ma->cache), ptr);
+}

From patchwork Fri Aug 26 02:44:17 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12955494
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
 delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 02/15] bpf: Convert hash map to bpf_mem_alloc.
Date: Thu, 25 Aug 2022 19:44:17 -0700
Message-Id: <20220826024430.84565-3-alexei.starovoitov@gmail.com>
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Convert bpf hash map to use bpf memory allocator.

Acked-by: Kumar Kartikeya Dwivedi
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index b301a63afa2f..bd23c8830d49 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -14,6 +14,7 @@
 #include "percpu_freelist.h"
 #include "bpf_lru_list.h"
 #include "map_in_map.h"
+#include <linux/bpf_mem_alloc.h>

 #define HTAB_CREATE_FLAG_MASK \
         (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \
@@ -92,6 +93,7 @@ struct bucket {

 struct bpf_htab {
         struct bpf_map map;
+        struct bpf_mem_alloc ma;
         struct bucket *buckets;
         void *elems;
         union {
@@ -563,6 +565,10 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
                         if (err)
                                 goto free_prealloc;
                 }
+        } else {
+                err = bpf_mem_alloc_init(&htab->ma, htab->elem_size);
+                if (err)
+                        goto free_map_locked;
         }

         return &htab->map;
@@ -573,6 +579,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
         for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
                 free_percpu(htab->map_locked[i]);
         bpf_map_area_free(htab->buckets);
+        bpf_mem_alloc_destroy(&htab->ma);
 free_htab:
         lockdep_unregister_key(&htab->lockdep_key);
         bpf_map_area_free(htab);
@@ -849,7 +856,7 @@ static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
         if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
                 free_percpu(htab_elem_get_ptr(l, htab->map.key_size));
         check_and_free_fields(htab, l);
-        kfree(l);
+        bpf_mem_cache_free(&htab->ma, l);
 }
 static void htab_elem_free_rcu(struct rcu_head *head)
@@ -973,9 +980,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
                         l_new = ERR_PTR(-E2BIG);
                         goto dec_count;
                 }
-                l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size,
-                                             GFP_NOWAIT | __GFP_NOWARN,
-                                             htab->map.numa_node);
+                l_new = bpf_mem_cache_alloc(&htab->ma);
                 if (!l_new) {
                         l_new = ERR_PTR(-ENOMEM);
                         goto dec_count;
@@ -994,7 +999,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
                         pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
                                                     GFP_NOWAIT | __GFP_NOWARN);
                         if (!pptr) {
-                                kfree(l_new);
+                                bpf_mem_cache_free(&htab->ma, l_new);
                                 l_new = ERR_PTR(-ENOMEM);
                                 goto dec_count;
                         }
@@ -1489,6 +1494,7 @@ static void htab_map_free(struct bpf_map *map)
         bpf_map_free_kptr_off_tab(map);
         free_percpu(htab->extra_elems);
         bpf_map_area_free(htab->buckets);
+        bpf_mem_alloc_destroy(&htab->ma);
         for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
                 free_percpu(htab->map_locked[i]);
         lockdep_unregister_key(&htab->lockdep_key);

From patchwork Fri Aug 26 02:44:18 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12955495
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
    delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 03/15] selftests/bpf: Improve test coverage of test_maps
Date: Thu, 25 Aug 2022 19:44:18 -0700
Message-Id: <20220826024430.84565-4-alexei.starovoitov@gmail.com>
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Make test_maps more stressful with more parallelism in
update/delete/lookup/walk including different value sizes.
Acked-by: Kumar Kartikeya Dwivedi
Signed-off-by: Alexei Starovoitov
---
 tools/testing/selftests/bpf/test_maps.c | 38 ++++++++++++++++---------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index cbebfaa7c1e8..d1ffc76814d9 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -264,10 +264,11 @@ static void test_hashmap_percpu(unsigned int task, void *data)
 	close(fd);
 }

+#define VALUE_SIZE 3
 static int helper_fill_hashmap(int max_entries)
 {
 	int i, fd, ret;
-	long long key, value;
+	long long key, value[VALUE_SIZE] = {};

 	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
 			    max_entries, &map_opts);
@@ -276,8 +277,8 @@ static int helper_fill_hashmap(int max_entries)
 	      "err: %s, flags: 0x%x\n", strerror(errno), map_opts.map_flags);

 	for (i = 0; i < max_entries; i++) {
-		key = i; value = key;
-		ret = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST);
+		key = i; value[0] = key;
+		ret = bpf_map_update_elem(fd, &key, value, BPF_NOEXIST);
 		CHECK(ret != 0,
 		      "can't update hashmap",
 		      "err: %s\n", strerror(ret));
@@ -288,8 +289,8 @@ static int helper_fill_hashmap(int max_entries)

 static void test_hashmap_walk(unsigned int task, void *data)
 {
-	int fd, i, max_entries = 1000;
-	long long key, value, next_key;
+	int fd, i, max_entries = 10000;
+	long long key, value[VALUE_SIZE], next_key;
 	bool next_key_valid = true;

 	fd = helper_fill_hashmap(max_entries);
@@ -297,7 +298,7 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key,
 					 &next_key) == 0; i++) {
 		key = next_key;
-		assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
+		assert(bpf_map_lookup_elem(fd, &key, value) == 0);
 	}

 	assert(i == max_entries);
@@ -305,9 +306,9 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	assert(bpf_map_get_next_key(fd, NULL, &key) == 0);
 	for (i = 0; next_key_valid; i++) {
 		next_key_valid = bpf_map_get_next_key(fd, &key, &next_key) == 0;
-		assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
-		value++;
-		assert(bpf_map_update_elem(fd, &key, &value, BPF_EXIST) == 0);
+		assert(bpf_map_lookup_elem(fd, &key, value) == 0);
+		value[0]++;
+		assert(bpf_map_update_elem(fd, &key, value, BPF_EXIST) == 0);
 		key = next_key;
 	}

@@ -316,8 +317,8 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key,
 					 &next_key) == 0; i++) {
 		key = next_key;
-		assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
-		assert(value - 1 == key);
+		assert(bpf_map_lookup_elem(fd, &key, value) == 0);
+		assert(value[0] - 1 == key);
 	}

 	assert(i == max_entries);
@@ -1371,16 +1372,16 @@ static void __run_parallel(unsigned int tasks,

 static void test_map_stress(void)
 {
+	run_parallel(100, test_hashmap_walk, NULL);
 	run_parallel(100, test_hashmap, NULL);
 	run_parallel(100, test_hashmap_percpu, NULL);
 	run_parallel(100, test_hashmap_sizes, NULL);
-	run_parallel(100, test_hashmap_walk, NULL);

 	run_parallel(100, test_arraymap, NULL);
 	run_parallel(100, test_arraymap_percpu, NULL);
 }

-#define TASKS 1024
+#define TASKS 100

 #define DO_UPDATE 1
 #define DO_DELETE 0
@@ -1432,6 +1433,8 @@ static void test_update_delete(unsigned int fn, void *data)
 	int fd = ((int *)data)[0];
 	int i, key, value, err;

+	if (fn & 1)
+		test_hashmap_walk(fn, NULL);
 	for (i = fn; i < MAP_SIZE; i += TASKS) {
 		key = value = i;

@@ -1455,7 +1458,7 @@ static void test_update_delete(unsigned int fn, void *data)

 static void test_map_parallel(void)
 {
-	int i, fd, key = 0, value = 0;
+	int i, fd, key = 0, value = 0, j = 0;
 	int data[2];

 	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
@@ -1466,6 +1469,7 @@ static void test_map_parallel(void)
 		exit(1);
 	}

+again:
 	/* Use the same fd in children to add elements to this map:
 	 * child_0 adds key=0, key=1024, key=2048, ...
 	 * child_1 adds key=1, key=1025, key=2049, ...
@@ -1502,6 +1506,12 @@ static void test_map_parallel(void)
 	key = -1;
 	assert(bpf_map_get_next_key(fd, NULL, &key) < 0 && errno == ENOENT);
 	assert(bpf_map_get_next_key(fd, &key, &key) < 0 && errno == ENOENT);
+
+	key = 0;
+	bpf_map_delete_elem(fd, &key);
+	if (j++ < 5)
+		goto again;
+
 	close(fd);
 }

 static void test_map_rdonly(void)

From patchwork Fri Aug 26 02:44:19 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12955496
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
    delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 04/15] samples/bpf: Reduce syscall overhead in map_perf_test.
Date: Thu, 25 Aug 2022 19:44:19 -0700
Message-Id: <20220826024430.84565-5-alexei.starovoitov@gmail.com>
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Make map_perf_test for preallocated and non-preallocated hash map
spend more time inside bpf program to focus performance analysis
on the speed of update/lookup/delete operations performed by bpf program.

It makes 'perf report' of bpf_mem_alloc look like:
 11.76%  map_perf_test  [k] _raw_spin_lock_irqsave
 11.26%  map_perf_test  [k] htab_map_update_elem
  9.70%  map_perf_test  [k] _raw_spin_lock
  9.47%  map_perf_test  [k] htab_map_delete_elem
  8.57%  map_perf_test  [k] memcpy_erms
  5.58%  map_perf_test  [k] alloc_htab_elem
  4.09%  map_perf_test  [k] __htab_map_lookup_elem
  3.44%  map_perf_test  [k] syscall_exit_to_user_mode
  3.13%  map_perf_test  [k] lookup_nulls_elem_raw
  3.05%  map_perf_test  [k] migrate_enable
  3.04%  map_perf_test  [k] memcmp
  2.67%  map_perf_test  [k] unit_free
  2.39%  map_perf_test  [k] lookup_elem_raw

Reduce default iteration count as well to make 'map_perf_test' quick enough
even on debug kernels.
Acked-by: Kumar Kartikeya Dwivedi
Signed-off-by: Alexei Starovoitov
---
 samples/bpf/map_perf_test_kern.c | 44 ++++++++++++++++++++------------
 samples/bpf/map_perf_test_user.c |  2 +-
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
index 8773f22b6a98..7342c5b2f278 100644
--- a/samples/bpf/map_perf_test_kern.c
+++ b/samples/bpf/map_perf_test_kern.c
@@ -108,11 +108,14 @@ int stress_hmap(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;

-	bpf_map_update_elem(&hash_map, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&hash_map, &key);
-	if (value)
-		bpf_map_delete_elem(&hash_map, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&hash_map, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&hash_map, &key);
+		if (value)
+			bpf_map_delete_elem(&hash_map, &key);
+	}

 	return 0;
 }
@@ -123,11 +126,14 @@ int stress_percpu_hmap(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;

-	bpf_map_update_elem(&percpu_hash_map, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&percpu_hash_map, &key);
-	if (value)
-		bpf_map_delete_elem(&percpu_hash_map, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&percpu_hash_map, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&percpu_hash_map, &key);
+		if (value)
+			bpf_map_delete_elem(&percpu_hash_map, &key);
+	}

 	return 0;
 }
@@ -137,11 +143,14 @@ int stress_hmap_alloc(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;

-	bpf_map_update_elem(&hash_map_alloc, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&hash_map_alloc, &key);
-	if (value)
-		bpf_map_delete_elem(&hash_map_alloc, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&hash_map_alloc, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&hash_map_alloc, &key);
+		if (value)
+			bpf_map_delete_elem(&hash_map_alloc, &key);
+	}

 	return 0;
 }
@@ -151,11 +160,14 @@ int stress_percpu_hmap_alloc(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;

-	bpf_map_update_elem(&percpu_hash_map_alloc, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&percpu_hash_map_alloc, &key);
-	if (value)
-		bpf_map_delete_elem(&percpu_hash_map_alloc, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&percpu_hash_map_alloc, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&percpu_hash_map_alloc, &key);
+		if (value)
+			bpf_map_delete_elem(&percpu_hash_map_alloc, &key);
+	}
 	return 0;
 }

diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c
index b6fc174ab1f2..1bb53f4b29e1 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -72,7 +72,7 @@ static int test_flags = ~0;
 static uint32_t num_map_entries;
 static uint32_t inner_lru_hash_size;
 static int lru_hash_lookup_test_entries = 32;
-static uint32_t max_cnt = 1000000;
+static uint32_t max_cnt = 10000;

 static int check_test_flags(enum test_type t)
 {

From patchwork Fri Aug 26 02:44:20 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12955497
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
    delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 05/15] bpf: Relax the requirement to use preallocated hash maps in tracing progs.
Date: Thu, 25 Aug 2022 19:44:20 -0700
Message-Id: <20220826024430.84565-6-alexei.starovoitov@gmail.com>
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Since bpf hash map was converted to use bpf_mem_alloc it is safe to use
from tracing programs and in RT kernels.
But per-cpu hash map is still using dynamic allocation for per-cpu map
values, hence keep the warning for this map type.
In the future alloc_percpu_gfp can be front-end-ed with bpf_mem_cache
and this restriction will be completely lifted.

perf_event (NMI) bpf programs have to use preallocated hash maps,
because free_htab_elem() is using call_rcu which might crash if re-entered.

Sleepable bpf programs have to use preallocated hash maps, because
life time of the map elements is not protected by rcu_read_lock/unlock.
This restriction can be lifted in the future as well.
Acked-by: Kumar Kartikeya Dwivedi
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/verifier.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0194a36d0b36..3dce3166855f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12629,10 +12629,12 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 	 * For programs attached to PERF events this is mandatory as the
 	 * perf NMI can hit any arbitrary code sequence.
 	 *
-	 * All other trace types using preallocated hash maps are unsafe as
-	 * well because tracepoint or kprobes can be inside locked regions
-	 * of the memory allocator or at a place where a recursion into the
-	 * memory allocator would see inconsistent state.
+	 * All other trace types using non-preallocated per-cpu hash maps are
+	 * unsafe as well because tracepoint or kprobes can be inside locked
+	 * regions of the per-cpu memory allocator or at a place where a
+	 * recursion into the per-cpu memory allocator would see inconsistent
+	 * state. Non per-cpu hash maps are using bpf_mem_alloc-tor which is
+	 * safe to use from kprobe/fentry and in RT.
 	 *
 	 * On RT enabled kernels run-time allocation of all trace type
 	 * programs is strictly prohibited due to lock type constraints. On
@@ -12642,15 +12644,26 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 	 */
 	if (is_tracing_prog_type(prog_type) && !is_preallocated_map(map)) {
 		if (prog_type == BPF_PROG_TYPE_PERF_EVENT) {
+			/* perf_event bpf progs have to use preallocated hash maps
+			 * because non-prealloc is still relying on call_rcu to free
+			 * elements.
+			 */
 			verbose(env, "perf_event programs can only use preallocated hash map\n");
 			return -EINVAL;
 		}
-		if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
-			verbose(env, "trace type programs can only use preallocated hash map\n");
-			return -EINVAL;
+		if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
+		    (map->inner_map_meta &&
+		     map->inner_map_meta->map_type == BPF_MAP_TYPE_PERCPU_HASH)) {
+			if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+				verbose(env,
+					"trace type programs can only use preallocated per-cpu hash map\n");
+				return -EINVAL;
+			}
+			WARN_ONCE(1, "trace type BPF program uses run-time allocation\n");
+			verbose(env,
+				"trace type programs with run-time allocated per-cpu hash maps are unsafe."
+				" Switch to preallocated hash maps.\n");
 		}
-		WARN_ONCE(1, "trace type BPF program uses run-time allocation\n");
-		verbose(env, "trace type programs with run-time allocated hash maps are unsafe. Switch to preallocated hash maps.\n");
 	}

 	if (map_value_has_spin_lock(map)) {

From patchwork Fri Aug 26 02:44:21 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12955498
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
    delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 06/15] bpf: Optimize element count in non-preallocated hash map.
Date: Thu, 25 Aug 2022 19:44:21 -0700
Message-Id: <20220826024430.84565-7-alexei.starovoitov@gmail.com>
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

The atomic_inc/dec might cause extreme cache line bouncing when multiple cpus
access the same bpf map. Based on specified max_entries for the hash map
calculate when percpu_counter becomes faster than atomic_t and use it for such
maps. For example samples/bpf/map_perf_test is using hash map with
max_entries 1000. On a system with 16 cpus the 'map_perf_test 4' shows 14k
events per second using atomic_t. On a system with 15 cpus it shows 100k
events per second using percpu. map_perf_test is an extreme case where all
cpus collide on the same atomic_t, which causes extreme cache bouncing.
Note that the slow path of percpu_counter is 5k events per second vs 14k for
atomic, so the heuristic is necessary. See comment in the code why the
heuristic is based on num_online_cpus().
Acked-by: Kumar Kartikeya Dwivedi
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c | 70 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 62 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index bd23c8830d49..8f68c6e13339 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -101,7 +101,12 @@ struct bpf_htab {
 		struct bpf_lru lru;
 	};
 	struct htab_elem *__percpu *extra_elems;
-	atomic_t count;	/* number of elements in this hashtable */
+	/* number of elements in non-preallocated hashtable are kept
+	 * in either pcount or count
+	 */
+	struct percpu_counter pcount;
+	atomic_t count;
+	bool use_percpu_counter;
 	u32 n_buckets;	/* number of hash buckets */
 	u32 elem_size;	/* size of each element in bytes */
 	u32 hashrnd;
@@ -552,6 +557,29 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)

 	htab_init_buckets(htab);

+/* compute_batch_value() computes batch value as num_online_cpus() * 2
+ * and __percpu_counter_compare() needs
+ * htab->max_entries - cur_number_of_elems to be more than batch * num_online_cpus()
+ * for percpu_counter to be faster than atomic_t. In practice the average bpf
+ * hash map size is 10k, which means that a system with 64 cpus will fill
+ * hashmap to 20% of 10k before percpu_counter becomes ineffective. Therefore
+ * define our own batch count as 32 then 10k hash map can be filled up to 80%:
+ * 10k - 8k > 32 _batch_ * 64 _cpus_
+ * and __percpu_counter_compare() will still be fast. At that point hash map
+ * collisions will dominate its performance anyway. Assume that hash map filled
+ * to 50+% isn't going to be O(1) and use the following formula to choose
+ * between percpu_counter and atomic_t.
+ */
+#define PERCPU_COUNTER_BATCH 32
+	if (attr->max_entries / 2 > num_online_cpus() * PERCPU_COUNTER_BATCH)
+		htab->use_percpu_counter = true;
+
+	if (htab->use_percpu_counter) {
+		err = percpu_counter_init(&htab->pcount, 0, GFP_KERNEL);
+		if (err)
+			goto free_map_locked;
+	}
+
 	if (prealloc) {
 		err = prealloc_init(htab);
 		if (err)
@@ -878,6 +906,31 @@ static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
 	}
 }

+static bool is_map_full(struct bpf_htab *htab)
+{
+	if (htab->use_percpu_counter)
+		return __percpu_counter_compare(&htab->pcount, htab->map.max_entries,
+						PERCPU_COUNTER_BATCH) >= 0;
+	return atomic_read(&htab->count) >= htab->map.max_entries;
+}
+
+static void inc_elem_count(struct bpf_htab *htab)
+{
+	if (htab->use_percpu_counter)
+		percpu_counter_add_batch(&htab->pcount, 1, PERCPU_COUNTER_BATCH);
+	else
+		atomic_inc(&htab->count);
+}
+
+static void dec_elem_count(struct bpf_htab *htab)
+{
+	if (htab->use_percpu_counter)
+		percpu_counter_add_batch(&htab->pcount, -1, PERCPU_COUNTER_BATCH);
+	else
+		atomic_dec(&htab->count);
+}
+
+
 static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 {
 	htab_put_fd_value(htab, l);
@@ -886,7 +939,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 		check_and_free_fields(htab, l);
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
-		atomic_dec(&htab->count);
+		dec_elem_count(htab);
 		l->htab = htab;
 		call_rcu(&l->rcu, htab_elem_free_rcu);
 	}
@@ -970,16 +1023,15 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 			l_new = container_of(l, struct htab_elem, fnode);
 		}
 	} else {
-		if (atomic_inc_return(&htab->count) > htab->map.max_entries)
-			if (!old_elem) {
+		if (is_map_full(htab))
+			if (!old_elem)
 				/* when map is full and update() is replacing
 				 * old element, it's ok to allocate, since
 				 * old element will be freed immediately.
 				 * Otherwise return an error
 				 */
-				l_new = ERR_PTR(-E2BIG);
-				goto dec_count;
-			}
+				return ERR_PTR(-E2BIG);
+		inc_elem_count(htab);
 		l_new = bpf_mem_cache_alloc(&htab->ma);
 		if (!l_new) {
 			l_new = ERR_PTR(-ENOMEM);
@@ -1021,7 +1073,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 	l_new->hash = hash;
 	return l_new;
 dec_count:
-	atomic_dec(&htab->count);
+	dec_elem_count(htab);
 	return l_new;
 }
@@ -1495,6 +1547,8 @@ static void htab_map_free(struct bpf_map *map)
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
 	bpf_mem_alloc_destroy(&htab->ma);
+	if (htab->use_percpu_counter)
+		percpu_counter_destroy(&htab->pcount);
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	lockdep_unregister_key(&htab->lockdep_key);

From patchwork Fri Aug 26 02:44:22 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 07/15] bpf: Optimize call_rcu in non-preallocated hash map.
Date: Thu, 25 Aug 2022 19:44:22 -0700
Message-Id: <20220826024430.84565-8-alexei.starovoitov@gmail.com>
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Doing call_rcu() a million times a second becomes a bottleneck. Convert the non-preallocated hash map from call_rcu to SLAB_TYPESAFE_BY_RCU. The RCU critical section is no longer observed for each htab element, which makes the non-preallocated hash map behave just like the preallocated hash map. The map elements are released back to kernel memory after an RCU grace period is observed. This improves 'map_perf_test 4' performance from 100k events per second to 250k events per second. bpf_mem_alloc + percpu_counter + typesafe_by_rcu provide a 10x performance boost to the non-preallocated hash map and bring it within a few % of the preallocated map while consuming a fraction of the memory.
Acked-by: Kumar Kartikeya Dwivedi
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c                      |  8 ++++++--
 kernel/bpf/memalloc.c                     |  2 +-
 tools/testing/selftests/bpf/progs/timer.c | 11 -----------
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 8f68c6e13339..299ab98f9811 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -940,8 +940,12 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		dec_elem_count(htab);
-		l->htab = htab;
-		call_rcu(&l->rcu, htab_elem_free_rcu);
+		if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH) {
+			l->htab = htab;
+			call_rcu(&l->rcu, htab_elem_free_rcu);
+		} else {
+			htab_elem_free(htab, l);
+		}
 	}
 }

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 29f340016e9e..c1817f14c25a 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -277,7 +277,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
 		return -ENOMEM;
 	size += LLIST_NODE_SZ; /* room for llist_node */
 	snprintf(buf, sizeof(buf), "bpf-%u", size);
-	kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL);
+	kmem_cache = kmem_cache_create(buf, size, 8, SLAB_TYPESAFE_BY_RCU, NULL);
 	if (!kmem_cache) {
 		free_percpu(pc);
 		return -ENOMEM;

diff --git a/tools/testing/selftests/bpf/progs/timer.c b/tools/testing/selftests/bpf/progs/timer.c
index 5f5309791649..0053c5402173 100644
--- a/tools/testing/selftests/bpf/progs/timer.c
+++ b/tools/testing/selftests/bpf/progs/timer.c
@@ -208,17 +208,6 @@ static int timer_cb2(void *map, int *key, struct hmap_elem *val)
 		 */
 		bpf_map_delete_elem(map, key);
-		/* in non-preallocated hashmap both 'key' and 'val' are RCU
-		 * protected and still valid though this element was deleted
-		 * from the map. Arm this timer for ~35 seconds. When callback
-		 * finishes the call_rcu will invoke:
-		 *  htab_elem_free_rcu
-		 *    check_and_free_timer
-		 *      bpf_timer_cancel_and_free
-		 * to cancel this 35 second sleep and delete the timer for real.
-		 */
-		if (bpf_timer_start(&val->timer, 1ull << 35, 0) != 0)
-			err |= 256;
 		ok |= 4;
 	}
 	return 0;

From patchwork Fri Aug 26 02:44:23 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 08/15] bpf: Adjust low/high watermarks in bpf_mem_cache
Date: Thu, 25 Aug 2022 19:44:23 -0700
Message-Id: <20220826024430.84565-9-alexei.starovoitov@gmail.com>
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

The same low/high watermarks for every bucket in bpf_mem_cache consume a significant amount of memory. Preallocating 64 elements of 4096 bytes each in the free list is not efficient. Make the low/high watermarks and the batching value dependent on the element size. This change brings significant memory savings.

Acked-by: Kumar Kartikeya Dwivedi
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 50 +++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 14 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index c1817f14c25a..775c38132c4d 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -100,6 +100,7 @@ struct bpf_mem_cache {
 	int unit_size;
 	/* count of objects in free_llist */
 	int free_cnt;
+	int low_watermark, high_watermark, batch;
 };

 struct bpf_mem_caches {
@@ -118,14 +119,6 @@ static struct llist_node notrace *__llist_del_first(struct llist_head *head)
 	return entry;
 }

-#define BATCH 48
-#define LOW_WATERMARK 32
-#define HIGH_WATERMARK 96
-/* Assuming the average number of elements per bucket is 64, when all buckets
- * are used the total memory will be: 64*16*32 + 64*32*32 + 64*64*32 + ... +
- * 64*4096*32 ~ 20Mbyte
- */
-
 static void *__alloc(struct bpf_mem_cache *c, int node)
 {
 	/* Allocate, but don't deplete atomic reserves that typical
@@ -216,7 +209,7 @@ static void free_bulk(struct bpf_mem_cache *c)
 		if (IS_ENABLED(CONFIG_PREEMPT_RT))
 			local_irq_restore(flags);
 		free_one(c, llnode);
-	} while (cnt > (HIGH_WATERMARK + LOW_WATERMARK) / 2);
+	} while (cnt > (c->high_watermark + c->low_watermark) / 2);

 	/* and drain free_llist_extra */
 	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra))
@@ -230,12 +223,12 @@ static void bpf_mem_refill(struct irq_work *work)
 	/* Racy access to free_cnt. It doesn't need to be 100% accurate */
 	cnt = c->free_cnt;
-	if (cnt < LOW_WATERMARK)
+	if (cnt < c->low_watermark)
 		/* irq_work runs on this cpu and kmalloc will allocate
 		 * from the current numa node which is what we want here.
 		 */
-		alloc_bulk(c, BATCH, NUMA_NO_NODE);
-	else if (cnt > HIGH_WATERMARK)
+		alloc_bulk(c, c->batch, NUMA_NO_NODE);
+	else if (cnt > c->high_watermark)
 		free_bulk(c);
 }

@@ -244,9 +237,38 @@ static void notrace irq_work_raise(struct bpf_mem_cache *c)
 	irq_work_queue(&c->refill_work);
 }

+/* For typical bpf map case that uses bpf_mem_cache_alloc and single bucket
+ * the freelist cache will be elem_size * 64 (or less) on each cpu.
+ *
+ * For bpf programs that don't have statically known allocation sizes and
+ * assuming (low_mark + high_mark) / 2 as an average number of elements per
+ * bucket and all buckets are used the total amount of memory in freelists
+ * on each cpu will be:
+ * 64*16 + 64*32 + 64*64 + 64*96 + 64*128 + 64*196 + 64*256 + 32*512 + 16*1024 + 8*2048 + 4*4096
+ * == ~ 116 Kbyte using below heuristic.
+ * Initialized, but unused bpf allocator (not bpf map specific one) will
+ * consume ~ 11 Kbyte per cpu.
+ * Typical case will be between 11K and 116K closer to 11K.
+ * bpf progs can and should share bpf_mem_cache when possible.
+ */
+
 static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
 {
 	init_irq_work(&c->refill_work, bpf_mem_refill);
+	if (c->unit_size <= 256) {
+		c->low_watermark = 32;
+		c->high_watermark = 96;
+	} else {
+		/* When page_size == 4k, order-0 cache will have low_mark == 2
+		 * and high_mark == 6 with batch alloc of 3 individual pages at
+		 * a time.
+		 * 8k allocs and above low == 1, high == 3, batch == 1.
+		 */
+		c->low_watermark = max(32 * 256 / c->unit_size, 1);
+		c->high_watermark = max(96 * 256 / c->unit_size, 3);
+	}
+	c->batch = max((c->high_watermark - c->low_watermark) / 4 * 3, 1);
+
 	/* To avoid consuming memory assume that 1st run of bpf
 	 * prog won't be doing more than 4 map_update_elem from
 	 * irq disabled region
@@ -388,7 +410,7 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)

 	WARN_ON(cnt < 0);

-	if (cnt < LOW_WATERMARK)
+	if (cnt < c->low_watermark)
 		irq_work_raise(c);
 	return llnode;
 }
@@ -421,7 +443,7 @@ static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
 	local_dec(&c->active);
 	local_irq_restore(flags);

-	if (cnt > HIGH_WATERMARK)
+	if (cnt > c->high_watermark)
 		/* free few objects from current cpu into global kmalloc pool */
 		irq_work_raise(c);
 }

From patchwork Fri Aug 26 02:44:24 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 09/15] bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.
Date: Thu, 25 Aug 2022 19:44:24 -0700
Message-Id: <20220826024430.84565-10-alexei.starovoitov@gmail.com>
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

SLAB_TYPESAFE_BY_RCU makes kmem_caches non-mergeable and slows down kmem_cache_destroy. All bpf_mem_cache instances are safe to share across different maps and programs. Convert SLAB_TYPESAFE_BY_RCU to batched call_rcu. This change solves the memory consumption issue, avoids kmem_cache_destroy latency, and keeps bpf hash map performance the same.
Acked-by: Kumar Kartikeya Dwivedi Signed-off-by: Alexei Starovoitov --- kernel/bpf/memalloc.c | 64 +++++++++++++++++++++++++++++++++++++++++-- kernel/bpf/syscall.c | 5 +++- 2 files changed, 65 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index 775c38132c4d..6a252d495f6c 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -101,6 +101,11 @@ struct bpf_mem_cache { /* count of objects in free_llist */ int free_cnt; int low_watermark, high_watermark, batch; + + struct rcu_head rcu; + struct llist_head free_by_rcu; + struct llist_head waiting_for_gp; + atomic_t call_rcu_in_progress; }; struct bpf_mem_caches { @@ -189,6 +194,45 @@ static void free_one(struct bpf_mem_cache *c, void *obj) kfree(obj); } +static void __free_rcu(struct rcu_head *head) +{ + struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu); + struct llist_node *llnode = llist_del_all(&c->waiting_for_gp); + struct llist_node *pos, *t; + + llist_for_each_safe(pos, t, llnode) + free_one(c, pos); + atomic_set(&c->call_rcu_in_progress, 0); +} + +static void enque_to_free(struct bpf_mem_cache *c, void *obj) +{ + struct llist_node *llnode = obj; + + /* bpf_mem_cache is a per-cpu object. Freeing happens in irq_work. + * Nothing races to add to free_by_rcu list. + */ + __llist_add(llnode, &c->free_by_rcu); +} + +static void do_call_rcu(struct bpf_mem_cache *c) +{ + struct llist_node *llnode, *t; + + if (atomic_xchg(&c->call_rcu_in_progress, 1)) + return; + + WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp)); + llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu)) + /* There is no concurrent __llist_add(waiting_for_gp) access. + * It doesn't race with llist_del_all either. + * But there could be two concurrent llist_del_all(waiting_for_gp): + * from __free_rcu() and from drain_mem_cache(). 
+ */ + __llist_add(llnode, &c->waiting_for_gp); + call_rcu(&c->rcu, __free_rcu); +} + static void free_bulk(struct bpf_mem_cache *c) { struct llist_node *llnode, *t; @@ -208,12 +252,13 @@ static void free_bulk(struct bpf_mem_cache *c) local_dec(&c->active); if (IS_ENABLED(CONFIG_PREEMPT_RT)) local_irq_restore(flags); - free_one(c, llnode); + enque_to_free(c, llnode); } while (cnt > (c->high_watermark + c->low_watermark) / 2); /* and drain free_llist_extra */ llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra)) - free_one(c, llnode); + enque_to_free(c, llnode); + do_call_rcu(c); } static void bpf_mem_refill(struct irq_work *work) @@ -299,7 +344,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size) return -ENOMEM; size += LLIST_NODE_SZ; /* room for llist_node */ snprintf(buf, sizeof(buf), "bpf-%u", size); - kmem_cache = kmem_cache_create(buf, size, 8, SLAB_TYPESAFE_BY_RCU, NULL); + kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL); if (!kmem_cache) { free_percpu(pc); return -ENOMEM; @@ -341,6 +386,15 @@ static void drain_mem_cache(struct bpf_mem_cache *c) { struct llist_node *llnode, *t; + /* The caller has done rcu_barrier() and no progs are using this + * bpf_mem_cache, but htab_map_free() called bpf_mem_cache_free() for + * all remaining elements and they can be in free_by_rcu or in + * waiting_for_gp lists, so drain those lists now. + */ + llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu)) + free_one(c, llnode); + llist_for_each_safe(llnode, t, llist_del_all(&c->waiting_for_gp)) + free_one(c, llnode); llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist)) free_one(c, llnode); llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra)) @@ -362,6 +416,10 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma) kmem_cache_destroy(c->kmem_cache); if (c->objcg) obj_cgroup_put(c->objcg); + /* c->waiting_for_gp list was drained, but __free_rcu might + * still execute. Wait for it now before we free 'c'. 
+		 */
+		rcu_barrier();
 	free_percpu(ma->cache);
 	ma->cache = NULL;
 }

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4e9d4622aef7..074c901fbb4e 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -638,7 +638,10 @@ static void __bpf_map_put(struct bpf_map *map, bool do_idr_lock)
 		bpf_map_free_id(map, do_idr_lock);
 		btf_put(map->btf);
 		INIT_WORK(&map->work, bpf_map_free_deferred);
-		schedule_work(&map->work);
+		/* Avoid spawning kworkers, since they all might contend
+		 * for the same mutex like slab_mutex.
+		 */
+		queue_work(system_unbound_wq, &map->work);
 	}
 }

From patchwork Fri Aug 26 02:44:25 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 10/15] bpf: Add percpu allocation support to bpf_mem_alloc.
Date: Thu, 25 Aug 2022 19:44:25 -0700
Message-Id: <20220826024430.84565-11-alexei.starovoitov@gmail.com>
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Extend bpf_mem_alloc to cache a free list of fixed-size per-cpu allocations. Once such a cache is created, bpf_mem_cache_alloc() will return per-cpu objects, and bpf_mem_cache_free() will free them back into the global per-cpu pool after observing an RCU grace period. The per-cpu flavor of bpf_mem_alloc is going to be used by per-cpu hash maps.

The free list cache consists of tuples { llist_node, per-cpu pointer }. Unlike alloc_percpu(), which returns a per-cpu pointer, bpf_mem_cache_alloc() returns a pointer to a per-cpu pointer, and bpf_mem_cache_free() expects to receive it back.
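The { llist_node, per-cpu pointer } tuple layout can be illustrated with a small user-space sketch. All names here (`toy_alloc`, `toy_free`, `struct tuple`) are hypothetical, and real per-cpu storage is simulated with a plain zero-filled heap allocation; the point is only the pointer arithmetic: the cache hands out a pointer to the slot that stores the data pointer, and free walks back to the start of the tuple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model of the per-cpu tuple: slot 0 holds the freelist linkage
 * while the object sits in the cache, slot 1 holds the "per-cpu"
 * data pointer (simulated here with calloc).
 */
struct tuple {
	void *llist_node;  /* freelist linkage while cached */
	void *pptr;        /* stands in for the per-cpu pointer */
};

/* Returns a pointer to the pptr slot, mirroring how
 * bpf_mem_cache_alloc() returns a pointer to a per-cpu pointer.
 */
static void **toy_alloc(size_t unit_size)
{
	struct tuple *obj = malloc(sizeof(*obj));
	void *pptr = calloc(1, unit_size);  /* alloc_percpu() stand-in, zero-filled */

	if (!obj || !pptr) {
		free(pptr);
		free(obj);
		return NULL;
	}
	obj->pptr = pptr;
	return &obj->pptr;
}

/* Receives the same pointer back and recovers the start of the
 * tuple, like bpf_mem_cache_free() receiving l->ptr_to_pptr.
 */
static void toy_free(void **ptr_to_pptr)
{
	struct tuple *obj = (struct tuple *)((char *)ptr_to_pptr -
					     offsetof(struct tuple, pptr));
	free(obj->pptr);
	free(obj);
}
```

A caller dereferences the returned pointer once to reach the actual data, which is exactly what the hash-map conversion later in the series does with `pptr = *(void **)pptr`.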
Acked-by: Kumar Kartikeya Dwivedi
Signed-off-by: Alexei Starovoitov
---
 include/linux/bpf_mem_alloc.h |  2 +-
 kernel/bpf/hashtab.c          |  2 +-
 kernel/bpf/memalloc.c         | 44 +++++++++++++++++++++++++++++++----
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
index 804733070f8d..653ed1584a03 100644
--- a/include/linux/bpf_mem_alloc.h
+++ b/include/linux/bpf_mem_alloc.h
@@ -12,7 +12,7 @@ struct bpf_mem_alloc {
 	struct bpf_mem_cache __percpu *cache;
 };
 
-int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size);
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu);
 void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
 
 /* kmalloc/kfree equivalent: */

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 299ab98f9811..8daa1132d43c 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -594,7 +594,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 			goto free_prealloc;
 		}
 	} else {
-		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size);
+		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false);
 		if (err)
 			goto free_map_locked;
 	}

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 6a252d495f6c..54455a64699b 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -101,6 +101,7 @@ struct bpf_mem_cache {
 	/* count of objects in free_llist */
 	int free_cnt;
 	int low_watermark, high_watermark, batch;
+	bool percpu;
 
 	struct rcu_head rcu;
 	struct llist_head free_by_rcu;
@@ -133,6 +134,19 @@ static void *__alloc(struct bpf_mem_cache *c, int node)
 	 */
 	gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT;
 
+	if (c->percpu) {
+		void **obj = kmem_cache_alloc_node(c->kmem_cache, flags, node);
+		void *pptr = __alloc_percpu_gfp(c->unit_size, 8, flags);
+
+		if (!obj || !pptr) {
+			free_percpu(pptr);
+			kfree(obj);
+			return NULL;
+		}
+		obj[1] = pptr;
+		return obj;
+	}
+
 	if (c->kmem_cache)
 		return kmem_cache_alloc_node(c->kmem_cache, flags, node);
 
@@ -188,6 +202,12 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 
 static void free_one(struct bpf_mem_cache *c, void *obj)
 {
+	if (c->percpu) {
+		free_percpu(((void **)obj)[1]);
+		kmem_cache_free(c->kmem_cache, obj);
+		return;
+	}
+
 	if (c->kmem_cache)
 		kmem_cache_free(c->kmem_cache, obj);
 	else
@@ -328,21 +348,30 @@ static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
  * kmalloc/kfree. Max allocation size is 4096 in this case.
  * This is bpf_dynptr and bpf_kptr use case.
  */
-int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 {
 	static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};
 	struct bpf_mem_caches *cc, __percpu *pcc;
 	struct bpf_mem_cache *c, __percpu *pc;
-	struct kmem_cache *kmem_cache;
+	struct kmem_cache *kmem_cache = NULL;
 	struct obj_cgroup *objcg = NULL;
 	char buf[32];
-	int cpu, i;
+	int cpu, i, unit_size;
 
 	if (size) {
 		pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
 		if (!pc)
 			return -ENOMEM;
-		size += LLIST_NODE_SZ; /* room for llist_node */
+
+		if (percpu) {
+			unit_size = size;
+			/* room for llist_node and per-cpu pointer */
+			size = LLIST_NODE_SZ + sizeof(void *);
+		} else {
+			size += LLIST_NODE_SZ; /* room for llist_node */
+			unit_size = size;
+		}
+
 		snprintf(buf, sizeof(buf), "bpf-%u", size);
 		kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL);
 		if (!kmem_cache) {
@@ -355,14 +384,19 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
 		for_each_possible_cpu(cpu) {
 			c = per_cpu_ptr(pc, cpu);
 			c->kmem_cache = kmem_cache;
-			c->unit_size = size;
+			c->unit_size = unit_size;
 			c->objcg = objcg;
+			c->percpu = percpu;
 			prefill_mem_cache(c, cpu);
 		}
 		ma->cache = pc;
 		return 0;
 	}
 
+	/* size == 0 && percpu is an invalid combination */
+	if (WARN_ON_ONCE(percpu))
+		return -EINVAL;
+
 	pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL);
 	if (!pcc)
 		return -ENOMEM;

From patchwork Fri Aug 26 02:44:26 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 11/15] bpf: Convert percpu hash map to per-cpu bpf_mem_alloc.
Date: Thu, 25 Aug 2022 19:44:26 -0700
Message-Id: <20220826024430.84565-12-alexei.starovoitov@gmail.com>
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Convert dynamic allocations in the percpu hash map from alloc_percpu() to bpf_mem_cache_alloc() from a per-cpu bpf_mem_alloc. Since bpf_mem_alloc frees objects only after an RCU grace period, the call_rcu() in the hash map is removed. pcpu_init_value() now needs to zero-fill per-cpu allocations: dynamically allocated map elements now behave like fully preallocated ones, because alloc_percpu() is no longer called inline and elements are reused from the free list.

Acked-by: Kumar Kartikeya Dwivedi
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c | 45 +++++++++++++++++++------------------------
 1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 8daa1132d43c..89f26cbddef5 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -94,6 +94,7 @@ struct bucket {
 struct bpf_htab {
 	struct bpf_map map;
 	struct bpf_mem_alloc ma;
+	struct bpf_mem_alloc pcpu_ma;
 	struct bucket *buckets;
 	void *elems;
 	union {
@@ -121,14 +122,14 @@ struct htab_elem {
 		struct {
 			void *padding;
 			union {
-				struct bpf_htab *htab;
 				struct pcpu_freelist_node fnode;
 				struct htab_elem *batch_flink;
 			};
 		};
 	};
 	union {
-		struct rcu_head rcu;
+		/* pointer to per-cpu pointer */
+		void *ptr_to_pptr;
 		struct bpf_lru_node lru_node;
 	};
 	u32 hash;
@@ -435,8 +436,6 @@ static int htab_map_alloc_check(union bpf_attr *attr)
 	bool zero_seed = (attr->map_flags & BPF_F_ZERO_SEED);
 	int numa_node = bpf_map_attr_numa_node(attr);
 
-	BUILD_BUG_ON(offsetof(struct htab_elem, htab) !=
-		     offsetof(struct htab_elem, hash_node.pprev));
 	BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) !=
 		     offsetof(struct htab_elem, hash_node.pprev));
 
@@ -597,6 +596,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false);
 		if (err)
 			goto free_map_locked;
+		if (percpu) {
+			err = bpf_mem_alloc_init(&htab->pcpu_ma,
+						 round_up(htab->map.value_size, 8), true);
+			if (err)
+				goto free_map_locked;
+		}
 	}
 
 	return &htab->map;
@@ -607,6 +612,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->pcpu_ma);
 	bpf_mem_alloc_destroy(&htab->ma);
 free_htab:
 	lockdep_unregister_key(&htab->lockdep_key);
@@ -882,19 +888,11 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
 {
 	if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
-		free_percpu(htab_elem_get_ptr(l, htab->map.key_size));
+		bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr);
 	check_and_free_fields(htab, l);
 	bpf_mem_cache_free(&htab->ma, l);
 }
 
-static void htab_elem_free_rcu(struct rcu_head *head)
-{
-	struct htab_elem *l = container_of(head, struct htab_elem, rcu);
-	struct bpf_htab *htab = l->htab;
-
-	htab_elem_free(htab, l);
-}
-
 static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
 {
 	struct bpf_map *map = &htab->map;
@@ -940,12 +938,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		dec_elem_count(htab);
-		if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH) {
-			l->htab = htab;
-			call_rcu(&l->rcu, htab_elem_free_rcu);
-		} else {
-			htab_elem_free(htab, l);
-		}
+		htab_elem_free(htab, l);
 	}
 }
 
@@ -970,13 +963,12 @@ static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
 static void pcpu_init_value(struct bpf_htab *htab, void __percpu *pptr,
 			    void *value, bool onallcpus)
 {
-	/* When using prealloc and not setting the initial value on all cpus,
-	 * zero-fill element values for other cpus (just as what happens when
-	 * not using prealloc). Otherwise, bpf program has no way to ensure
+	/* When not setting the initial value on all cpus, zero-fill element
+	 * values for other cpus. Otherwise, bpf program has no way to ensure
 	 * known initial values for cpus other than current one
 	 * (onallcpus=false always when coming from bpf prog).
 	 */
-	if (htab_is_prealloc(htab) && !onallcpus) {
+	if (!onallcpus) {
 		u32 size = round_up(htab->map.value_size, 8);
 		int current_cpu = raw_smp_processor_id();
 		int cpu;
@@ -1047,18 +1039,18 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 	memcpy(l_new->key, key, key_size);
 	if (percpu) {
-		size = round_up(size, 8);
 		if (prealloc) {
 			pptr = htab_elem_get_ptr(l_new, key_size);
 		} else {
 			/* alloc_percpu zero-fills */
-			pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
-						    GFP_NOWAIT | __GFP_NOWARN);
+			pptr = bpf_mem_cache_alloc(&htab->pcpu_ma);
 			if (!pptr) {
 				bpf_mem_cache_free(&htab->ma, l_new);
 				l_new = ERR_PTR(-ENOMEM);
 				goto dec_count;
 			}
+			l_new->ptr_to_pptr = pptr;
+			pptr = *(void **)pptr;
 		}
 
 		pcpu_init_value(htab, pptr, value, onallcpus);
@@ -1550,6 +1542,7 @@ static void htab_map_free(struct bpf_map *map)
 	bpf_map_free_kptr_off_tab(map);
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->pcpu_ma);
 	bpf_mem_alloc_destroy(&htab->ma);
 	if (htab->use_percpu_counter)
 		percpu_counter_destroy(&htab->pcount);

From patchwork Fri Aug 26 02:44:27 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 12/15] bpf: Remove tracing program restriction on map types
Date: Thu, 25 Aug 2022 19:44:27 -0700
Message-Id: <20220826024430.84565-13-alexei.starovoitov@gmail.com>
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

The hash map is now fully converted to bpf_mem_alloc. Its implementation is not allocating synchronously and not calling call_rcu() directly. It's now safe to use non-preallocated hash maps in all types of tracing programs including BPF_PROG_TYPE_PERF_EVENT that runs out of NMI context.
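Why this is safe comes down to the allocator's fast path being lock-free: a pop from the per-cpu free list uses compare-and-swap rather than a lock, so an allocation attempt from NMI or tracing context either succeeds or fails immediately, but never blocks on a lock the interrupted code may hold. The following is a conceptual user-space sketch of that property, not the kernel's llist implementation (which additionally restricts concurrent del_first/del_all usage to sidestep the ABA problem noted below):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node { struct node *next; };

/* Lock-free LIFO freelist via compare-and-swap: no lock is taken,
 * so a pop from NMI context cannot deadlock on a lock already held
 * by the code it interrupted.  Simplified sketch: a production
 * implementation must also deal with the ABA problem on pop.
 */
static _Atomic(struct node *) freelist;

static void fl_push(struct node *n)
{
	struct node *head = atomic_load(&freelist);

	do {
		n->next = head;  /* link to current head, retry on race */
	} while (!atomic_compare_exchange_weak(&freelist, &head, n));
}

static struct node *fl_pop(void)
{
	struct node *head = atomic_load(&freelist);

	/* Swing head to head->next; on failure 'head' is reloaded. */
	while (head && !atomic_compare_exchange_weak(&freelist, &head, head->next))
		;
	return head;  /* NULL means cache empty: fail fast, never block */
}
```

When the cache is empty the allocator fails the request (and refills asynchronously via irq_work), which is exactly the behavior the verifier can now rely on for all tracing program types.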
Acked-by: Kumar Kartikeya Dwivedi
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/verifier.c | 42 ------------------------------------------
 1 file changed, 42 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3dce3166855f..57ec06b1d09d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12623,48 +12623,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(prog);
 
-	/*
-	 * Validate that trace type programs use preallocated hash maps.
-	 *
-	 * For programs attached to PERF events this is mandatory as the
-	 * perf NMI can hit any arbitrary code sequence.
-	 *
-	 * All other trace types using non-preallocated per-cpu hash maps are
-	 * unsafe as well because tracepoint or kprobes can be inside locked
-	 * regions of the per-cpu memory allocator or at a place where a
-	 * recursion into the per-cpu memory allocator would see inconsistent
-	 * state. Non per-cpu hash maps are using bpf_mem_alloc-tor which is
-	 * safe to use from kprobe/fentry and in RT.
-	 *
-	 * On RT enabled kernels run-time allocation of all trace type
-	 * programs is strictly prohibited due to lock type constraints. On
-	 * !RT kernels it is allowed for backwards compatibility reasons for
-	 * now, but warnings are emitted so developers are made aware of
-	 * the unsafety and can fix their programs before this is enforced.
-	 */
-	if (is_tracing_prog_type(prog_type) && !is_preallocated_map(map)) {
-		if (prog_type == BPF_PROG_TYPE_PERF_EVENT) {
-			/* perf_event bpf progs have to use preallocated hash maps
-			 * because non-prealloc is still relying on call_rcu to free
-			 * elements.
-			 */
-			verbose(env, "perf_event programs can only use preallocated hash map\n");
-			return -EINVAL;
-		}
-		if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
-		    (map->inner_map_meta &&
-		     map->inner_map_meta->map_type == BPF_MAP_TYPE_PERCPU_HASH)) {
-			if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
-				verbose(env,
-					"trace type programs can only use preallocated per-cpu hash map\n");
-				return -EINVAL;
-			}
-			WARN_ONCE(1, "trace type BPF program uses run-time allocation\n");
-			verbose(env,
-				"trace type programs with run-time allocated per-cpu hash maps are unsafe."
-				" Switch to preallocated hash maps.\n");
-		}
-	}
 
 	if (map_value_has_spin_lock(map)) {
 		if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) {

From patchwork Fri Aug 26 02:44:28 2022
(mail-pj1-f45.google.com [209.85.216.45]) by imf17.hostedemail.com (Postfix) with ESMTP id 0E8EB4000C for ; Fri, 26 Aug 2022 02:45:25 +0000 (UTC) Received: by mail-pj1-f45.google.com with SMTP id h11-20020a17090a470b00b001fbc5ba5224so267692pjg.2 for ; Thu, 25 Aug 2022 19:45:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc; bh=v7ijpUUJZb/iDPXwwv+nfCSTFfI8M4d6OICZ7qDTZBE=; b=cn76eGXganlgoSYlgV2oR77vIYIassdgrhDQYBNoAST9ehRnVjOr2Y1A6JjG/Tsy5I Ppz8xlSx8zrByejMn2V9lfaHXu/5H6GWFlN90v05qIDJ9PKnV21lywriUSObeiwqEBdr YMT3umMkrY/zkFJS+KFqUA7Aomj/mZVs2FyFwatqOsy0dNbG2Z58T90p4Me2mvTNJGSw uo2qaUJELJZ9ZmI92gG80VbIqCPbpCes0ti9wNMqQFSYaEuRSkgFGVkENkvc5HEvLPKP pqjBConTXnhwHmVFJ5cN+k+UvEbjAmcPq7MSnBR2wjL632w0jNq/EG/UghBGxwzwUz8p W3zQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc; bh=v7ijpUUJZb/iDPXwwv+nfCSTFfI8M4d6OICZ7qDTZBE=; b=y8tAz+SLqQsKoAaCa7WJf88SzIZN54DRbBT6e5+ILPPaNypKpKX6mcN/yNxB1zyV09 w6qqWN/8fXEXRBhnj5J1eQ+d0R4NFK6xHN2lbMomJTJhSIXDQLYRgRhR7EJBRdwHPmWW m8PvLx4TzXrDefMRM5wme6ly6exQ+y2BADlv63tCtcCUZMxvYn424NkogsI8IYBZIHtC IW5aUMEU0wlK63WT/i53N5QuJRm71CcLgFKRY1LKneSwJpdcEwdTFnRXEwbDULmCgbp/ ed+zNeqI2BSst3R59ykNp4jMDvshSpsXyPK0GO2SV/KQ0+kk1GvBGm8wyEMzeOgxKLAb 6NkQ== X-Gm-Message-State: ACgBeo2XbrDf9dG9SuRW2uwDVWNrKjH4wubot0j7dJPR2oYaL44j3tEb rrw+0vzhz8Pg3EWpGrufP6Y= X-Google-Smtp-Source: AA6agR4yPXRgq0g4xyroom/XnM5eNSLLTGaNkOmL/K+bgD8y2UnupRj5bBLPVT83fXmgj1fvSWheRQ== X-Received: by 2002:a17:90b:483:b0:1fb:137e:4bb9 with SMTP id bh3-20020a17090b048300b001fb137e4bb9mr2064418pjb.188.1661481925121; Thu, 25 Aug 2022 19:45:25 -0700 (PDT) Received: from macbook-pro-3.dhcp.thefacebook.com ([2620:10d:c090:400::5:15dc]) by smtp.gmail.com with ESMTPSA id 

From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 13/15] bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
Date: Thu, 25 Aug 2022 19:44:28 -0700
Message-Id: <20220826024430.84565-14-alexei.starovoitov@gmail.com>
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Use call_rcu_tasks_trace() to wait for sleepable progs to finish. Then
use call_rcu() to wait for normal progs to finish, and finally do
free_one() on each element when freeing objects into the global memory
pool.

Acked-by: Kumar Kartikeya Dwivedi
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 54455a64699b..9caeeaaf9bcb 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -225,6 +225,13 @@ static void __free_rcu(struct rcu_head *head)
 	atomic_set(&c->call_rcu_in_progress, 0);
 }
 
+static void __free_rcu_tasks_trace(struct rcu_head *head)
+{
+	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
+
+	call_rcu(&c->rcu, __free_rcu);
+}
+
 static void enque_to_free(struct bpf_mem_cache *c, void *obj)
 {
 	struct llist_node *llnode = obj;
@@ -250,7 +257,11 @@ static void do_call_rcu(struct bpf_mem_cache *c)
 	 * from __free_rcu() and from drain_mem_cache().
 	 */
 	__llist_add(llnode, &c->waiting_for_gp);
-	call_rcu(&c->rcu, __free_rcu);
+	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
+	 * Then use call_rcu() to wait for normal progs to finish
+	 * and finally do free_one() on each element.
+	 */
+	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
 }
 
 static void free_bulk(struct bpf_mem_cache *c)
@@ -453,6 +464,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 	/* c->waiting_for_gp list was drained, but __free_rcu might
 	 * still execute. Wait for it now before we free 'c'.
 	 */
+	rcu_barrier_tasks_trace();
 	rcu_barrier();
 	free_percpu(ma->cache);
 	ma->cache = NULL;
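The two-stage free above can be sketched in userspace. This is an illustrative model only: the `fake_*` helpers and `run_*_gp()` drivers stand in for the kernel's `call_rcu_tasks_trace()`/`call_rcu()` and their grace periods, and all names here are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* An object is first queued behind a tasks-trace grace period; that
 * callback re-queues it behind a regular RCU grace period, whose
 * callback finally frees it. */
typedef void (*cb_t)(void *obj);

static cb_t tt_cb;  static void *tt_arg;   /* pending tasks-trace callback */
static cb_t rcu_cb; static void *rcu_arg;  /* pending regular RCU callback */
static int freed;

static void fake_call_rcu(void *obj, cb_t cb) { rcu_cb = cb; rcu_arg = obj; }
static void fake_call_rcu_tasks_trace(void *obj, cb_t cb) { tt_cb = cb; tt_arg = obj; }

static void free_one(void *obj) { (void)obj; freed = 1; }

/* Mirrors __free_rcu_tasks_trace(): chain into the second grace period. */
static void free_rcu_tasks_trace(void *obj) { fake_call_rcu(obj, free_one); }

static void run_tasks_trace_gp(void)
{
	cb_t cb = tt_cb; tt_cb = NULL;
	if (cb) cb(tt_arg);
}

static void run_rcu_gp(void)
{
	cb_t cb = rcu_cb; rcu_cb = NULL;
	if (cb) cb(rcu_arg);
}
```

The ordering is the point: the object is not freed after only the tasks-trace stage has elapsed; both grace periods must pass, which is also why bpf_mem_alloc_destroy() needs rcu_barrier_tasks_trace() before rcu_barrier().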

From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 14/15] bpf: Remove prealloc-only restriction for sleepable bpf programs.
Date: Thu, 25 Aug 2022 19:44:29 -0700
Message-Id: <20220826024430.84565-15-alexei.starovoitov@gmail.com>
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Since the hash map is now converted to bpf_mem_alloc, and it waits for
both rcu and rcu_tasks_trace grace periods before freeing elements back
into the global memory slabs, it is safe to use dynamically allocated
hash maps in sleepable bpf programs.

Acked-by: Kumar Kartikeya Dwivedi
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/verifier.c | 23 -----------------------
 1 file changed, 23 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 57ec06b1d09d..068b20ed34d2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12586,14 +12586,6 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env,
 	return err;
 }
 
-static int check_map_prealloc(struct bpf_map *map)
-{
-	return (map->map_type != BPF_MAP_TYPE_HASH &&
-		map->map_type != BPF_MAP_TYPE_PERCPU_HASH &&
-		map->map_type != BPF_MAP_TYPE_HASH_OF_MAPS) ||
-		!(map->map_flags & BPF_F_NO_PREALLOC);
-}
-
 static bool is_tracing_prog_type(enum bpf_prog_type type)
 {
 	switch (type) {
@@ -12608,15 +12600,6 @@ static bool is_tracing_prog_type(enum bpf_prog_type type)
 	}
 }
 
-static bool is_preallocated_map(struct bpf_map *map)
-{
-	if (!check_map_prealloc(map))
-		return false;
-	if (map->inner_map_meta && !check_map_prealloc(map->inner_map_meta))
-		return false;
-	return true;
-}
-
 static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 					struct bpf_map *map,
 					struct bpf_prog *prog)
@@ -12669,12 +12652,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 	case BPF_MAP_TYPE_LRU_PERCPU_HASH:
 	case BPF_MAP_TYPE_ARRAY_OF_MAPS:
 	case BPF_MAP_TYPE_HASH_OF_MAPS:
-		if (!is_preallocated_map(map)) {
-			verbose(env,
-				"Sleepable programs can only use preallocated maps\n");
-			return -EINVAL;
-		}
-		break;
 	case BPF_MAP_TYPE_RINGBUF:
 	case BPF_MAP_TYPE_INODE_STORAGE:
 	case BPF_MAP_TYPE_SK_STORAGE:
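The predicate removed by this patch can be sketched in isolation: a map counts as preallocated unless it is a hash-type map created with BPF_F_NO_PREALLOC, and a map-in-map additionally needs a preallocated inner map. The struct and constants below are local stand-ins for the sketch, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins mirroring the shape of the removed verifier helpers. */
enum map_type { T_HASH, T_PERCPU_HASH, T_HASH_OF_MAPS, T_ARRAY };
#define NO_PREALLOC 0x1u

struct map {
	enum map_type type;
	unsigned int flags;
	const struct map *inner;	/* inner_map_meta analogue, may be NULL */
};

static bool map_prealloc_ok(const struct map *m)
{
	return (m->type != T_HASH && m->type != T_PERCPU_HASH &&
		m->type != T_HASH_OF_MAPS) || !(m->flags & NO_PREALLOC);
}

static bool is_preallocated_map(const struct map *m)
{
	if (!map_prealloc_ok(m))
		return false;
	if (m->inner && !map_prealloc_ok(m->inner))
		return false;
	return true;
}
```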

From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 bpf-next 15/15] bpf: Introduce sysctl kernel.bpf_force_dyn_alloc.
Date: Thu, 25 Aug 2022 19:44:30 -0700
Message-Id: <20220826024430.84565-16-alexei.starovoitov@gmail.com>
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Introduce sysctl kernel.bpf_force_dyn_alloc to force dynamic allocation
in bpf hash maps. All selftests/bpf should pass with bpf_force_dyn_alloc
set to 0 or 1, and all bpf programs (both sleepable and not) should see
no functional difference. The sysctl's only observable effect should be
improved memory usage.

Acked-by: Kumar Kartikeya Dwivedi
Signed-off-by: Alexei Starovoitov
---
 include/linux/filter.h | 2 ++
 kernel/bpf/core.c      | 2 ++
 kernel/bpf/hashtab.c   | 5 +++++
 kernel/bpf/syscall.c   | 9 +++++++++
 4 files changed, 18 insertions(+)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index a5f21dc3c432..eb4d4a0c0bde 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1009,6 +1009,8 @@ bpf_run_sk_reuseport(struct sock_reuseport *reuse, struct sock *sk,
 }
 #endif
 
+extern int bpf_force_dyn_alloc;
+
 #ifdef CONFIG_BPF_JIT
 extern int bpf_jit_enable;
 extern int bpf_jit_harden;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 639437f36928..a13e78ea4b90 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -533,6 +533,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
 	bpf_prog_kallsyms_del(fp);
 }
 
+int bpf_force_dyn_alloc __read_mostly;
+
 #ifdef CONFIG_BPF_JIT
 /* All BPF JIT sysctl knobs here. */
 int bpf_jit_enable __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_DEFAULT_ON);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 89f26cbddef5..f68a3400939e 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -505,6 +505,11 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 
 	bpf_map_init_from_attr(&htab->map, attr);
 
+	if (!lru && bpf_force_dyn_alloc) {
+		prealloc = false;
+		htab->map.map_flags |= BPF_F_NO_PREALLOC;
+	}
+
 	if (percpu_lru) {
 		/* ensure each CPU's lru list has >=1 elements.
 		 * since we are at it, make each lru list has the same
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 074c901fbb4e..5c631244b63b 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5299,6 +5299,15 @@ static struct ctl_table bpf_syscall_table[] = {
 		.mode		= 0644,
 		.proc_handler	= bpf_stats_handler,
 	},
+	{
+		.procname	= "bpf_force_dyn_alloc",
+		.data		= &bpf_force_dyn_alloc,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 	{ }
 };
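The policy the hashtab.c hunk adds is small enough to sketch on its own: when the knob is set, non-LRU hash maps are switched to dynamic allocation by setting BPF_F_NO_PREALLOC, while LRU maps keep their preallocated pools. Names below are local stand-ins for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for BPF_F_NO_PREALLOC in this sketch. */
#define F_NO_PREALLOC 0x1u

/* Mirrors the decision added to htab_map_alloc(): force dynamic
 * allocation for non-LRU hash maps when the sysctl is enabled. */
static unsigned int effective_map_flags(bool lru, int force_dyn_alloc,
					unsigned int flags)
{
	if (!lru && force_dyn_alloc)
		flags |= F_NO_PREALLOC;
	return flags;
}
```

On a kernel carrying this patch, the knob itself would be toggled from userspace with something like `sysctl kernel.bpf_force_dyn_alloc=1` (mode 0600, so root only, clamped to 0/1 by proc_dointvec_minmax).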