From patchwork Fri Sep 2 21:10:43 2022
X-Patchwork-Id: 12964679
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 bpf-next 01/16] bpf: Introduce any context BPF specific memory allocator.
Date: Fri, 2 Sep 2022 14:10:43 -0700
Message-Id: <20220902211058.60789-2-alexei.starovoitov@gmail.com>
In-Reply-To: <20220902211058.60789-1-alexei.starovoitov@gmail.com>
References: <20220902211058.60789-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Tracing BPF programs can attach to kprobe and fentry, so they run in an
unknown context where calling plain kmalloc() might not be safe.

Front-end kmalloc() with a minimal per-cpu cache of free elements and
refill this cache asynchronously from irq_work.

BPF programs always run with migration disabled, so it's safe to allocate
from the cache of the current cpu with irqs disabled. Freeing is always
done into the bucket of the current cpu as well. irq_work trims extra
free elements from buckets with kfree and refills them with kmalloc, so
the global kmalloc logic takes care of freeing objects that were
allocated on one cpu and freed on another.

struct bpf_mem_alloc supports two modes:
- When size != 0, create a kmem_cache and a bpf_mem_cache for each cpu.
  This is the typical bpf hash map use case, where all elements have
  equal size.
- When size == 0, allocate 11 bpf_mem_cache-s for each cpu, then rely on
  kmalloc/kfree. The maximum allocation size is 4096 in this case.
  This is the bpf_dynptr and bpf_kptr use case.

bpf_mem_alloc/bpf_mem_free are bpf-specific 'wrappers' of kmalloc/kfree.
bpf_mem_cache_alloc/bpf_mem_cache_free are 'wrappers' of
kmem_cache_alloc/kmem_cache_free.

The allocators are NMI-safe only when called from bpf programs; they are
not NMI-safe in general.
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 include/linux/bpf_mem_alloc.h |  26 ++
 kernel/bpf/Makefile           |   2 +-
 kernel/bpf/memalloc.c         | 480 ++++++++++++++++++++++++++++++++++
 3 files changed, 507 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/bpf_mem_alloc.h
 create mode 100644 kernel/bpf/memalloc.c

diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
new file mode 100644
index 000000000000..804733070f8d
--- /dev/null
+++ b/include/linux/bpf_mem_alloc.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#ifndef _BPF_MEM_ALLOC_H
+#define _BPF_MEM_ALLOC_H
+#include 
+
+struct bpf_mem_cache;
+struct bpf_mem_caches;
+
+struct bpf_mem_alloc {
+	struct bpf_mem_caches __percpu *caches;
+	struct bpf_mem_cache __percpu *cache;
+};
+
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size);
+void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
+
+/* kmalloc/kfree equivalent: */
+void *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size);
+void bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr);
+
+/* kmem_cache_alloc/free equivalent: */
+void *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma);
+void bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr);
+
+#endif /* _BPF_MEM_ALLOC_H */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 00e05b69a4df..341c94f208f4 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -13,7 +13,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
 obj-$(CONFIG_BPF_JIT) += trampoline.o
-obj-$(CONFIG_BPF_SYSCALL) += btf.o
+obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
 obj-$(CONFIG_BPF_JIT) += dispatcher.o
 ifeq ($(CONFIG_NET),y)
 obj-$(CONFIG_BPF_SYSCALL) += devmap.o
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
new file mode 100644
index 000000000000..1c46763d855e
--- /dev/null
+++ b/kernel/bpf/memalloc.c
@@ -0,0 +1,480 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Any context (including NMI) BPF specific memory allocator.
+ *
+ * Tracing BPF programs can attach to kprobe and fentry. Hence they
+ * run in unknown context where calling plain kmalloc() might not be safe.
+ *
+ * Front-end kmalloc() with per-cpu per-bucket cache of free elements.
+ * Refill this cache asynchronously from irq_work.
+ *
+ * CPU_0 buckets
+ * 16 32 64 96 128 192 256 512 1024 2048 4096
+ * ...
+ * CPU_N buckets
+ * 16 32 64 96 128 192 256 512 1024 2048 4096
+ *
+ * The buckets are prefilled at the start.
+ * BPF programs always run with migration disabled.
+ * It's safe to allocate from cache of the current cpu with irqs disabled.
+ * Freeing is always done into bucket of the current cpu as well.
+ * irq_work trims extra free elements from buckets with kfree
+ * and refills them with kmalloc, so global kmalloc logic takes care
+ * of freeing objects allocated by one cpu and freed on another.
+ *
+ * Every allocated object is padded with extra 8 bytes that contain
+ * struct llist_node.
+ */
+#define LLIST_NODE_SZ sizeof(struct llist_node)
+
+/* similar to kmalloc, but sizeof == 8 bucket is gone */
+static u8 size_index[24] __ro_after_init = {
+	3,	/* 8 */
+	3,	/* 16 */
+	4,	/* 24 */
+	4,	/* 32 */
+	5,	/* 40 */
+	5,	/* 48 */
+	5,	/* 56 */
+	5,	/* 64 */
+	1,	/* 72 */
+	1,	/* 80 */
+	1,	/* 88 */
+	1,	/* 96 */
+	6,	/* 104 */
+	6,	/* 112 */
+	6,	/* 120 */
+	6,	/* 128 */
+	2,	/* 136 */
+	2,	/* 144 */
+	2,	/* 152 */
+	2,	/* 160 */
+	2,	/* 168 */
+	2,	/* 176 */
+	2,	/* 184 */
+	2	/* 192 */
+};
+
+static int bpf_mem_cache_idx(size_t size)
+{
+	if (!size || size > 4096)
+		return -1;
+
+	if (size <= 192)
+		return size_index[(size - 1) / 8] - 1;
+
+	return fls(size - 1) - 2;
+}
+
+#define NUM_CACHES 11
+
+struct bpf_mem_cache {
+	/* per-cpu list of free objects of size 'unit_size'.
+	 * All accesses are done with interrupts disabled and 'active' counter
+	 * protection with __llist_add() and __llist_del_first().
+	 */
+	struct llist_head free_llist;
+	local_t active;
+
+	/* Operations on the free_list from unit_alloc/unit_free/bpf_mem_refill
+	 * are sequenced by per-cpu 'active' counter. But unit_free() cannot
+	 * fail. When 'active' is busy the unit_free() will add an object to
+	 * free_llist_extra.
+	 */
+	struct llist_head free_llist_extra;
+
+	/* kmem_cache != NULL when bpf_mem_alloc was created for specific
+	 * element size.
+	 */
+	struct kmem_cache *kmem_cache;
+	struct irq_work refill_work;
+	struct obj_cgroup *objcg;
+	int unit_size;
+	/* count of objects in free_llist */
+	int free_cnt;
+};
+
+struct bpf_mem_caches {
+	struct bpf_mem_cache cache[NUM_CACHES];
+};
+
+static struct llist_node notrace *__llist_del_first(struct llist_head *head)
+{
+	struct llist_node *entry, *next;
+
+	entry = head->first;
+	if (!entry)
+		return NULL;
+	next = entry->next;
+	head->first = next;
+	return entry;
+}
+
+#define BATCH 48
+#define LOW_WATERMARK 32
+#define HIGH_WATERMARK 96
+/* Assuming the average number of elements per bucket is 64, when all buckets
+ * are used the total memory will be: 64*16*32 + 64*32*32 + 64*64*32 + ... +
+ * 64*4096*32 ~ 20Mbyte
+ */
+
+static void *__alloc(struct bpf_mem_cache *c, int node)
+{
+	/* Allocate, but don't deplete atomic reserves that typical
+	 * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
+	 * will allocate from the current numa node which is what we
+	 * want here.
+	 */
+	gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT;
+
+	if (c->kmem_cache)
+		return kmem_cache_alloc_node(c->kmem_cache, flags, node);
+
+	return kmalloc_node(c->unit_size, flags, node);
+}
+
+static struct mem_cgroup *get_memcg(const struct bpf_mem_cache *c)
+{
+#ifdef CONFIG_MEMCG_KMEM
+	if (c->objcg)
+		return get_mem_cgroup_from_objcg(c->objcg);
+#endif
+
+#ifdef CONFIG_MEMCG
+	return root_mem_cgroup;
+#else
+	return NULL;
+#endif
+}
+
+/* Mostly runs from irq_work except __init phase.
+ */
+static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
+{
+	struct mem_cgroup *memcg = NULL, *old_memcg;
+	unsigned long flags;
+	void *obj;
+	int i;
+
+	memcg = get_memcg(c);
+	old_memcg = set_active_memcg(memcg);
+	for (i = 0; i < cnt; i++) {
+		obj = __alloc(c, node);
+		if (!obj)
+			break;
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			/* In RT irq_work runs in per-cpu kthread, so disable
+			 * interrupts to avoid preemption and interrupts and
+			 * reduce the chance of bpf prog executing on this cpu
+			 * when active counter is busy.
+			 */
+			local_irq_save(flags);
+		/* alloc_bulk runs from irq_work which will not preempt a bpf
+		 * program that does unit_alloc/unit_free since IRQs are
+		 * disabled there. There is no race to increment 'active'
+		 * counter. It protects free_llist from corruption in case NMI
+		 * bpf prog preempted this loop.
+		 */
+		WARN_ON_ONCE(local_inc_return(&c->active) != 1);
+		__llist_add(obj, &c->free_llist);
+		c->free_cnt++;
+		local_dec(&c->active);
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_restore(flags);
+	}
+	set_active_memcg(old_memcg);
+	mem_cgroup_put(memcg);
+}
+
+static void free_one(struct bpf_mem_cache *c, void *obj)
+{
+	if (c->kmem_cache)
+		kmem_cache_free(c->kmem_cache, obj);
+	else
+		kfree(obj);
+}
+
+static void free_bulk(struct bpf_mem_cache *c)
+{
+	struct llist_node *llnode, *t;
+	unsigned long flags;
+	int cnt;
+
+	do {
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_save(flags);
+		WARN_ON_ONCE(local_inc_return(&c->active) != 1);
+		llnode = __llist_del_first(&c->free_llist);
+		if (llnode)
+			cnt = --c->free_cnt;
+		else
+			cnt = 0;
+		local_dec(&c->active);
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_restore(flags);
+		free_one(c, llnode);
+	} while (cnt > (HIGH_WATERMARK + LOW_WATERMARK) / 2);
+
+	/* and drain free_llist_extra */
+	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra))
+		free_one(c, llnode);
+}
+
+static void bpf_mem_refill(struct irq_work *work)
+{
+	struct bpf_mem_cache *c = container_of(work, struct bpf_mem_cache, refill_work);
+	int cnt;
+
+	/* Racy access to free_cnt. It doesn't need to be 100% accurate */
+	cnt = c->free_cnt;
+	if (cnt < LOW_WATERMARK)
+		/* irq_work runs on this cpu and kmalloc will allocate
+		 * from the current numa node which is what we want here.
+		 */
+		alloc_bulk(c, BATCH, NUMA_NO_NODE);
+	else if (cnt > HIGH_WATERMARK)
+		free_bulk(c);
+}
+
+static void notrace irq_work_raise(struct bpf_mem_cache *c)
+{
+	irq_work_queue(&c->refill_work);
+}
+
+static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
+{
+	init_irq_work(&c->refill_work, bpf_mem_refill);
+	/* To avoid consuming memory assume that 1st run of bpf
+	 * prog won't be doing more than 4 map_update_elem from
+	 * irq disabled region
+	 */
+	alloc_bulk(c, c->unit_size <= 256 ? 4 : 1, cpu_to_node(cpu));
+}
+
+/* When size != 0 create kmem_cache and bpf_mem_cache for each cpu.
+ * This is typical bpf hash map use case when all elements have equal size.
+ *
+ * When size == 0 allocate 11 bpf_mem_cache-s for each cpu, then rely on
+ * kmalloc/kfree. Max allocation size is 4096 in this case.
+ * This is bpf_dynptr and bpf_kptr use case.
+ */
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
+{
+	static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};
+	struct bpf_mem_caches *cc, __percpu *pcc;
+	struct bpf_mem_cache *c, __percpu *pc;
+	struct kmem_cache *kmem_cache;
+	struct obj_cgroup *objcg = NULL;
+	char buf[32];
+	int cpu, i;
+
+	if (size) {
+		pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
+		if (!pc)
+			return -ENOMEM;
+		size += LLIST_NODE_SZ; /* room for llist_node */
+		snprintf(buf, sizeof(buf), "bpf-%u", size);
+		kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL);
+		if (!kmem_cache) {
+			free_percpu(pc);
+			return -ENOMEM;
+		}
+#ifdef CONFIG_MEMCG_KMEM
+		objcg = get_obj_cgroup_from_current();
+#endif
+		for_each_possible_cpu(cpu) {
+			c = per_cpu_ptr(pc, cpu);
+			c->kmem_cache = kmem_cache;
+			c->unit_size = size;
+			c->objcg = objcg;
+			prefill_mem_cache(c, cpu);
+		}
+		ma->cache = pc;
+		return 0;
+	}
+
+	pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL);
+	if (!pcc)
+		return -ENOMEM;
+#ifdef CONFIG_MEMCG_KMEM
+	objcg = get_obj_cgroup_from_current();
+#endif
+	for_each_possible_cpu(cpu) {
+		cc = per_cpu_ptr(pcc, cpu);
+		for (i = 0; i < NUM_CACHES; i++) {
+			c = &cc->cache[i];
+			c->unit_size = sizes[i];
+			c->objcg = objcg;
+			prefill_mem_cache(c, cpu);
+		}
+	}
+	ma->caches = pcc;
+	return 0;
+}
+
+static void drain_mem_cache(struct bpf_mem_cache *c)
+{
+	struct llist_node *llnode, *t;
+
+	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist))
+		free_one(c, llnode);
+	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra))
+		free_one(c, llnode);
+}
+
+void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
+{
+	struct bpf_mem_caches *cc;
+	struct bpf_mem_cache *c;
+	int cpu, i;
+
+	if (ma->cache) {
+		for_each_possible_cpu(cpu) {
+			c = per_cpu_ptr(ma->cache, cpu);
+			drain_mem_cache(c);
+		}
+		/* kmem_cache and memcg are the same across cpus */
+		kmem_cache_destroy(c->kmem_cache);
+		if (c->objcg)
+			obj_cgroup_put(c->objcg);
+		free_percpu(ma->cache);
+		ma->cache = NULL;
+	}
+	if (ma->caches) {
+		for_each_possible_cpu(cpu) {
+			cc = per_cpu_ptr(ma->caches, cpu);
+			for (i = 0; i < NUM_CACHES; i++) {
+				c = &cc->cache[i];
+				drain_mem_cache(c);
+			}
+		}
+		if (c->objcg)
+			obj_cgroup_put(c->objcg);
+		free_percpu(ma->caches);
+		ma->caches = NULL;
+	}
+}
+
+/* notrace is necessary here and in other functions to make sure
+ * bpf programs cannot attach to them and cause llist corruptions.
+ */
+static void notrace *unit_alloc(struct bpf_mem_cache *c)
+{
+	struct llist_node *llnode = NULL;
+	unsigned long flags;
+	int cnt = 0;
+
+	/* Disable irqs to prevent the following race for majority of prog types:
+	 * prog_A
+	 *   bpf_mem_alloc
+	 *      preemption or irq -> prog_B
+	 *        bpf_mem_alloc
+	 *
+	 * but prog_B could be a perf_event NMI prog.
+	 * Use per-cpu 'active' counter to order free_list access between
+	 * unit_alloc/unit_free/bpf_mem_refill.
+	 */
+	local_irq_save(flags);
+	if (local_inc_return(&c->active) == 1) {
+		llnode = __llist_del_first(&c->free_llist);
+		if (llnode)
+			cnt = --c->free_cnt;
+	}
+	local_dec(&c->active);
+	local_irq_restore(flags);
+
+	WARN_ON(cnt < 0);
+
+	if (cnt < LOW_WATERMARK)
+		irq_work_raise(c);
+	return llnode;
+}
+
+/* Though 'ptr' object could have been allocated on a different cpu
+ * add it to the free_llist of the current cpu.
+ * Let kfree() logic deal with it when it's later called from irq_work.
+ */
+static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
+{
+	struct llist_node *llnode = ptr - LLIST_NODE_SZ;
+	unsigned long flags;
+	int cnt = 0;
+
+	BUILD_BUG_ON(LLIST_NODE_SZ > 8);
+
+	local_irq_save(flags);
+	if (local_inc_return(&c->active) == 1) {
+		__llist_add(llnode, &c->free_llist);
+		cnt = ++c->free_cnt;
+	} else {
+		/* unit_free() cannot fail. Therefore add an object to atomic
+		 * llist. free_bulk() will drain it. Though free_llist_extra is
+		 * a per-cpu list we have to use atomic llist_add here, since
+		 * it also can be interrupted by bpf nmi prog that does another
+		 * unit_free() into the same free_llist_extra.
+		 */
+		llist_add(llnode, &c->free_llist_extra);
+	}
+	local_dec(&c->active);
+	local_irq_restore(flags);
+
+	if (cnt > HIGH_WATERMARK)
+		/* free few objects from current cpu into global kmalloc pool */
+		irq_work_raise(c);
+}
+
+/* Called from BPF program or from sys_bpf syscall.
+ * In both cases migration is disabled.
+ */
+void notrace *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size)
+{
+	int idx;
+	void *ret;
+
+	if (!size)
+		return ZERO_SIZE_PTR;
+
+	idx = bpf_mem_cache_idx(size + LLIST_NODE_SZ);
+	if (idx < 0)
+		return NULL;
+
+	ret = unit_alloc(this_cpu_ptr(ma->caches)->cache + idx);
+	return !ret ? NULL : ret + LLIST_NODE_SZ;
+}
+
+void notrace bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr)
+{
+	int idx;
+
+	if (!ptr)
+		return;
+
+	idx = bpf_mem_cache_idx(__ksize(ptr - LLIST_NODE_SZ));
+	if (idx < 0)
+		return;
+
+	unit_free(this_cpu_ptr(ma->caches)->cache + idx, ptr);
+}
+
+void notrace *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma)
+{
+	void *ret;
+
+	ret = unit_alloc(this_cpu_ptr(ma->cache));
+	return !ret ? NULL : ret + LLIST_NODE_SZ;
+}
+
+void notrace bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr)
+{
+	if (!ptr)
+		return;
+
+	unit_free(this_cpu_ptr(ma->cache), ptr);
+}

From patchwork Fri Sep 2 21:10:44 2022
X-Patchwork-Id: 12964680
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 bpf-next 02/16] bpf: Convert hash map to bpf_mem_alloc.
Date: Fri, 2 Sep 2022 14:10:44 -0700
Message-Id: <20220902211058.60789-3-alexei.starovoitov@gmail.com>
In-Reply-To: <20220902211058.60789-1-alexei.starovoitov@gmail.com>
References: <20220902211058.60789-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Convert bpf hash map to use bpf memory allocator.

Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index eb1263f03e9b..508e64351f87 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -14,6 +14,7 @@
 #include "percpu_freelist.h"
 #include "bpf_lru_list.h"
 #include "map_in_map.h"
+#include 
 
 #define HTAB_CREATE_FLAG_MASK \
 	(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \
@@ -92,6 +93,7 @@ struct bucket {
 
 struct bpf_htab {
 	struct bpf_map map;
+	struct bpf_mem_alloc ma;
 	struct bucket *buckets;
 	void *elems;
 	union {
@@ -576,6 +578,10 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 			if (err)
 				goto free_prealloc;
 		}
+	} else {
+		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size);
+		if (err)
+			goto free_map_locked;
 	}
 
 	return &htab->map;
@@ -586,6 +592,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->ma);
 free_htab:
 	lockdep_unregister_key(&htab->lockdep_key);
 	bpf_map_area_free(htab);
@@ -862,7 +869,7 @@ static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
 	if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
 		free_percpu(htab_elem_get_ptr(l, htab->map.key_size));
 	check_and_free_fields(htab, l);
-	kfree(l);
+	bpf_mem_cache_free(&htab->ma, l);
 }
 
 static void htab_elem_free_rcu(struct rcu_head *head)
@@ -986,9 +993,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 				l_new = ERR_PTR(-E2BIG);
 				goto dec_count;
 			}
-		l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size,
-					     GFP_NOWAIT | __GFP_NOWARN,
-					     htab->map.numa_node);
+		l_new = bpf_mem_cache_alloc(&htab->ma);
 		if (!l_new) {
 			l_new = ERR_PTR(-ENOMEM);
 			goto dec_count;
@@ -1007,7 +1012,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 			pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
 						    GFP_NOWAIT | __GFP_NOWARN);
 			if (!pptr) {
-				kfree(l_new);
+				bpf_mem_cache_free(&htab->ma, l_new);
 				l_new = ERR_PTR(-ENOMEM);
 				goto dec_count;
 			}
@@ -1429,6 +1434,10 @@ static void delete_all_elements(struct bpf_htab *htab)
 {
 	int i;
 
+	/* It's called from a worker thread, so disable migration here,
+	 * since bpf_mem_cache_free() relies on that.
+	 */
+	migrate_disable();
 	for (i = 0; i < htab->n_buckets; i++) {
 		struct hlist_nulls_head *head = select_bucket(htab, i);
 		struct hlist_nulls_node *n;
@@ -1439,6 +1448,7 @@ static void delete_all_elements(struct bpf_htab *htab)
 			htab_elem_free(htab, l);
 		}
 	}
+	migrate_enable();
 }
 
 static void htab_free_malloced_timers(struct bpf_htab *htab)
@@ -1502,6 +1512,7 @@ static void htab_map_free(struct bpf_map *map)
 	bpf_map_free_kptr_off_tab(map);
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->ma);
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	lockdep_unregister_key(&htab->lockdep_key);

From patchwork Fri Sep 2 21:10:45 2022
X-Patchwork-Id: 12964681
Received: by kanga.kvack.org (Postfix, from userid 63042) id C5C4880130; Fri, 2 Sep 2022 17:11:14 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id B026880120 for ; Fri, 2 Sep 2022 17:11:14 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 8C0F1C02CA for ; Fri, 2 Sep 2022 21:11:14 +0000 (UTC) X-FDA: 79868390868.20.B452DFE Received: from mail-pg1-f173.google.com (mail-pg1-f173.google.com [209.85.215.173]) by imf04.hostedemail.com (Postfix) with ESMTP id 35DB84004E for ; Fri, 2 Sep 2022 21:11:14 +0000 (UTC) Received: by mail-pg1-f173.google.com with SMTP id r69so2980677pgr.2 for ; Fri, 02 Sep 2022 14:11:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date; bh=fN5UTsD9/UNy4JwRwsLxsSRdDj8AsSF1mCpGXo1LAXw=; b=MzhJb7DeGqrdXTr68P9EuoEFICUOC+F3zkwB7lo4l3TqEH95m7Ve+aXypNk19AOh65 yZM0CMp01BZtiR4vXFlq4EEevvOJ3oQ9RBArId6AYe6xSMGcYqVJW6tU+OGn2222jYR/ 7tfsLlZVy8VP0QZkj1BTy3Sews6BaMbsOzdTLZ1FUyIqthRuRTw4GdnmISsz6aKRflqZ 8oRAON32QbXfkaVS4wbQw4MQhUGUB542Anpw2ESQv7RCQ4sOKlNmPWQN+eXBTMhO31dl HSWYQ4OlnSIJYxljp4c7UNYtBJZFh2nBrhKBmuRUNDiKMpYw3z5ddbBdbazRAew0lxL/ BVhQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date; bh=fN5UTsD9/UNy4JwRwsLxsSRdDj8AsSF1mCpGXo1LAXw=; b=OwcLR8eSIuBenF/KfAohYwxvE3sGJFV8/0e7MQP9lzwcwEZFnwA1Y1S0m5ZPLDWUbh h+9YvO8Te/ZBNPsNo1lBsBtCslOkAweULKwXLhTimbHy9pfqanMXi0g/qT3qFgfZNzg0 m2f6vhqG8QP1LLkYqG+segWEolIl7Guc0vMAo/GjOP8vXL2SSkzXH0ZNXUt9hgY/hYOr 
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 bpf-next 03/16] selftests/bpf: Improve test coverage of test_maps
Date: Fri, 2 Sep 2022 14:10:45 -0700
Message-Id: <20220902211058.60789-4-alexei.starovoitov@gmail.com>
In-Reply-To: <20220902211058.60789-1-alexei.starovoitov@gmail.com>
References: <20220902211058.60789-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Make test_maps more stressful with more parallelism in
update/delete/lookup/walk including different value sizes.
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 tools/testing/selftests/bpf/test_maps.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index c49f2056e14f..00b9cc305e58 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -264,10 +264,11 @@ static void test_hashmap_percpu(unsigned int task, void *data)
 	close(fd);
 }
 
+#define VALUE_SIZE 3
 static int helper_fill_hashmap(int max_entries)
 {
 	int i, fd, ret;
-	long long key, value;
+	long long key, value[VALUE_SIZE] = {};
 
 	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
 			    max_entries, &map_opts);
@@ -276,8 +277,8 @@ static int helper_fill_hashmap(int max_entries)
 	      "err: %s, flags: 0x%x\n", strerror(errno), map_opts.map_flags);
 
 	for (i = 0; i < max_entries; i++) {
-		key = i; value = key;
-		ret = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST);
+		key = i; value[0] = key;
+		ret = bpf_map_update_elem(fd, &key, value, BPF_NOEXIST);
 		CHECK(ret != 0,
 		      "can't update hashmap",
 		      "err: %s\n", strerror(ret));
@@ -288,8 +289,8 @@ static int helper_fill_hashmap(int max_entries)
 
 static void test_hashmap_walk(unsigned int task, void *data)
 {
-	int fd, i, max_entries = 1000;
-	long long key, value, next_key;
+	int fd, i, max_entries = 10000;
+	long long key, value[VALUE_SIZE], next_key;
 	bool next_key_valid = true;
 
 	fd = helper_fill_hashmap(max_entries);
@@ -297,7 +298,7 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key,
 					 &next_key) == 0; i++) {
 		key = next_key;
-		assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
+		assert(bpf_map_lookup_elem(fd, &key, value) == 0);
 	}
 
 	assert(i == max_entries);
@@ -305,9 +306,9 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	assert(bpf_map_get_next_key(fd, NULL, &key) == 0);
 	for (i = 0; next_key_valid; i++) {
 		next_key_valid = bpf_map_get_next_key(fd, &key, &next_key) == 0;
-		assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
-		value++;
-		assert(bpf_map_update_elem(fd, &key, &value, BPF_EXIST) == 0);
+		assert(bpf_map_lookup_elem(fd, &key, value) == 0);
+		value[0]++;
+		assert(bpf_map_update_elem(fd, &key, value, BPF_EXIST) == 0);
 		key = next_key;
 	}
 
@@ -316,8 +317,8 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key,
 					 &next_key) == 0; i++) {
 		key = next_key;
-		assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
-		assert(value - 1 == key);
+		assert(bpf_map_lookup_elem(fd, &key, value) == 0);
+		assert(value[0] - 1 == key);
 	}
 
 	assert(i == max_entries);
@@ -1371,16 +1372,16 @@ static void __run_parallel(unsigned int tasks,
 
 static void test_map_stress(void)
 {
+	run_parallel(100, test_hashmap_walk, NULL);
 	run_parallel(100, test_hashmap, NULL);
 	run_parallel(100, test_hashmap_percpu, NULL);
 	run_parallel(100, test_hashmap_sizes, NULL);
-	run_parallel(100, test_hashmap_walk, NULL);
 
 	run_parallel(100, test_arraymap, NULL);
 	run_parallel(100, test_arraymap_percpu, NULL);
 }
 
-#define TASKS 1024
+#define TASKS 100
 
 #define DO_UPDATE 1
 #define DO_DELETE 0
@@ -1432,6 +1433,8 @@ static void test_update_delete(unsigned int fn, void *data)
 	int fd = ((int *)data)[0];
 	int i, key, value, err;
 
+	if (fn & 1)
+		test_hashmap_walk(fn, NULL);
 	for (i = fn; i < MAP_SIZE; i += TASKS) {
 		key = value = i;
 
@@ -1455,7 +1458,7 @@ static void test_update_delete(unsigned int fn, void *data)
 
 static void test_map_parallel(void)
 {
-	int i, fd, key = 0, value = 0;
+	int i, fd, key = 0, value = 0, j = 0;
 	int data[2];
 
 	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
@@ -1466,6 +1469,7 @@ static void test_map_parallel(void)
 		exit(1);
 	}
 
+again:
 	/* Use the same fd in children to add elements to this map:
 	 * child_0 adds key=0, key=1024, key=2048, ...
 	 * child_1 adds key=1, key=1025, key=2049, ...
@@ -1502,6 +1506,12 @@ static void test_map_parallel(void)
 	key = -1;
 	assert(bpf_map_get_next_key(fd, NULL, &key) < 0 && errno == ENOENT);
 	assert(bpf_map_get_next_key(fd, &key, &key) < 0 && errno == ENOENT);
+
+	key = 0;
+	bpf_map_delete_elem(fd, &key);
+	if (j++ < 5)
+		goto again;
+	close(fd);
 }
 
 static void test_map_rdonly(void)

From patchwork Fri Sep 2 21:10:46 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12964682
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 bpf-next 04/16] samples/bpf: Reduce syscall overhead in map_perf_test.
Date: Fri, 2 Sep 2022 14:10:46 -0700
Message-Id: <20220902211058.60789-5-alexei.starovoitov@gmail.com>
In-Reply-To: <20220902211058.60789-1-alexei.starovoitov@gmail.com>
References: <20220902211058.60789-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Make map_perf_test for preallocated and non-preallocated hash map
spend more time inside bpf program to focus performance analysis
on the speed of update/lookup/delete operations performed by bpf program.

It makes 'perf report' of bpf_mem_alloc look like:
 11.76%  map_perf_test  [k] _raw_spin_lock_irqsave
 11.26%  map_perf_test  [k] htab_map_update_elem
  9.70%  map_perf_test  [k] _raw_spin_lock
  9.47%  map_perf_test  [k] htab_map_delete_elem
  8.57%  map_perf_test  [k] memcpy_erms
  5.58%  map_perf_test  [k] alloc_htab_elem
  4.09%  map_perf_test  [k] __htab_map_lookup_elem
  3.44%  map_perf_test  [k] syscall_exit_to_user_mode
  3.13%  map_perf_test  [k] lookup_nulls_elem_raw
  3.05%  map_perf_test  [k] migrate_enable
  3.04%  map_perf_test  [k] memcmp
  2.67%  map_perf_test  [k] unit_free
  2.39%  map_perf_test  [k] lookup_elem_raw

Reduce default iteration count as well to make 'map_perf_test' quick enough
even on debug kernels.
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 samples/bpf/map_perf_test_kern.c | 44 ++++++++++++++++++++++++++++----------------
 samples/bpf/map_perf_test_user.c |  2 +-
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
index 8773f22b6a98..7342c5b2f278 100644
--- a/samples/bpf/map_perf_test_kern.c
+++ b/samples/bpf/map_perf_test_kern.c
@@ -108,11 +108,14 @@ int stress_hmap(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&hash_map, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&hash_map, &key);
-	if (value)
-		bpf_map_delete_elem(&hash_map, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&hash_map, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&hash_map, &key);
+		if (value)
+			bpf_map_delete_elem(&hash_map, &key);
+	}
 
 	return 0;
 }
@@ -123,11 +126,14 @@ int stress_percpu_hmap(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&percpu_hash_map, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&percpu_hash_map, &key);
-	if (value)
-		bpf_map_delete_elem(&percpu_hash_map, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&percpu_hash_map, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&percpu_hash_map, &key);
+		if (value)
+			bpf_map_delete_elem(&percpu_hash_map, &key);
+	}
 
 	return 0;
 }
@@ -137,11 +143,14 @@ int stress_hmap_alloc(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&hash_map_alloc, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&hash_map_alloc, &key);
-	if (value)
-		bpf_map_delete_elem(&hash_map_alloc, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&hash_map_alloc, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&hash_map_alloc, &key);
+		if (value)
+			bpf_map_delete_elem(&hash_map_alloc, &key);
+	}
 
 	return 0;
 }
@@ -151,11 +160,14 @@ int stress_percpu_hmap_alloc(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&percpu_hash_map_alloc, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&percpu_hash_map_alloc, &key);
-	if (value)
-		bpf_map_delete_elem(&percpu_hash_map_alloc, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&percpu_hash_map_alloc, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&percpu_hash_map_alloc, &key);
+		if (value)
+			bpf_map_delete_elem(&percpu_hash_map_alloc, &key);
+	}
 
 	return 0;
 }
diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c
index b6fc174ab1f2..1bb53f4b29e1 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -72,7 +72,7 @@ static int test_flags = ~0;
 static uint32_t num_map_entries;
 static uint32_t inner_lru_hash_size;
 static int lru_hash_lookup_test_entries = 32;
-static uint32_t max_cnt = 1000000;
+static uint32_t max_cnt = 10000;
 
 static int check_test_flags(enum test_type t)
 {

From patchwork Fri Sep 2 21:10:47 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12964683
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 bpf-next 05/16] bpf: Relax the requirement to use preallocated hash maps in tracing progs.
Date: Fri, 2 Sep 2022 14:10:47 -0700
Message-Id: <20220902211058.60789-6-alexei.starovoitov@gmail.com>
In-Reply-To: <20220902211058.60789-1-alexei.starovoitov@gmail.com>
References: <20220902211058.60789-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Since the bpf hash map was converted to use bpf_mem_alloc, it is safe to use
from tracing programs and in RT kernels.
The per-cpu hash map, however, still uses dynamic allocation for per-cpu map
values, so keep the warning for this map type.
In the future, alloc_percpu_gfp can be fronted by bpf_mem_cache
and this restriction will be completely lifted.

perf_event (NMI) bpf programs have to use preallocated hash maps,
because free_htab_elem() uses call_rcu, which might crash if re-entered.

Sleepable bpf programs have to use preallocated hash maps, because
the lifetime of the map elements is not protected by rcu_read_lock/unlock.
This restriction can be lifted in the future as well.
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/verifier.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0194a36d0b36..3dce3166855f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12629,10 +12629,12 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 	 * For programs attached to PERF events this is mandatory as the
 	 * perf NMI can hit any arbitrary code sequence.
 	 *
-	 * All other trace types using preallocated hash maps are unsafe as
-	 * well because tracepoint or kprobes can be inside locked regions
-	 * of the memory allocator or at a place where a recursion into the
-	 * memory allocator would see inconsistent state.
+	 * All other trace types using non-preallocated per-cpu hash maps are
+	 * unsafe as well because tracepoint or kprobes can be inside locked
+	 * regions of the per-cpu memory allocator or at a place where a
+	 * recursion into the per-cpu memory allocator would see inconsistent
+	 * state. Non per-cpu hash maps are using bpf_mem_alloc-tor which is
+	 * safe to use from kprobe/fentry and in RT.
 	 *
 	 * On RT enabled kernels run-time allocation of all trace type
 	 * programs is strictly prohibited due to lock type constraints. On
@@ -12642,15 +12644,26 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 	 */
 	if (is_tracing_prog_type(prog_type) && !is_preallocated_map(map)) {
 		if (prog_type == BPF_PROG_TYPE_PERF_EVENT) {
+			/* perf_event bpf progs have to use preallocated hash maps
+			 * because non-prealloc is still relying on call_rcu to free
+			 * elements.
+			 */
 			verbose(env, "perf_event programs can only use preallocated hash map\n");
 			return -EINVAL;
 		}
-		if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
-			verbose(env, "trace type programs can only use preallocated hash map\n");
-			return -EINVAL;
+		if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
+		    (map->inner_map_meta &&
+		     map->inner_map_meta->map_type == BPF_MAP_TYPE_PERCPU_HASH)) {
+			if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+				verbose(env,
+					"trace type programs can only use preallocated per-cpu hash map\n");
+				return -EINVAL;
+			}
+			WARN_ONCE(1, "trace type BPF program uses run-time allocation\n");
+			verbose(env,
+				"trace type programs with run-time allocated per-cpu hash maps are unsafe."
+				" Switch to preallocated hash maps.\n");
 		}
-		WARN_ONCE(1, "trace type BPF program uses run-time allocation\n");
-		verbose(env, "trace type programs with run-time allocated hash maps are unsafe. Switch to preallocated hash maps.\n");
 	}
 
 	if (map_value_has_spin_lock(map)) {

From patchwork Fri Sep 2 21:10:48 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12964684
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 bpf-next 06/16] bpf: Optimize element count in non-preallocated hash map.
Date: Fri, 2 Sep 2022 14:10:48 -0700
Message-Id: <20220902211058.60789-7-alexei.starovoitov@gmail.com>
In-Reply-To: <20220902211058.60789-1-alexei.starovoitov@gmail.com>
References: <20220902211058.60789-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

The atomic_inc/dec might cause extreme cache line bouncing when multiple cpus
access the same bpf map. Based on the specified max_entries of the hash map,
calculate when percpu_counter becomes faster than atomic_t and use it for such
maps.
For example, samples/bpf/map_perf_test uses a hash map with max_entries 1000.
On a system with 16 cpus, 'map_perf_test 4' shows 14k events per second using
atomic_t. On a system with 15 cpus, it shows 100k events per second using
percpu_counter. map_perf_test is an extreme case where all cpus collide on the
atomic_t, which causes extreme cache bouncing.

Note that the slow path of percpu_counter gives 5k events per second vs 14k for
atomic_t, so the heuristic is necessary. See the comment in the code for why the
heuristic is based on num_online_cpus().
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c | 70 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 62 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 508e64351f87..36aa16dc43ad 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -101,7 +101,12 @@ struct bpf_htab {
 		struct bpf_lru lru;
 	};
 	struct htab_elem *__percpu *extra_elems;
-	atomic_t count;	/* number of elements in this hashtable */
+	/* number of elements in non-preallocated hashtable are kept
+	 * in either pcount or count
+	 */
+	struct percpu_counter pcount;
+	atomic_t count;
+	bool use_percpu_counter;
 	u32 n_buckets;	/* number of hash buckets */
 	u32 elem_size;	/* size of each element in bytes */
 	u32 hashrnd;
@@ -565,6 +570,29 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 
 	htab_init_buckets(htab);
 
+/* compute_batch_value() computes batch value as num_online_cpus() * 2
+ * and __percpu_counter_compare() needs
+ * htab->max_entries - cur_number_of_elems to be more than batch * num_online_cpus()
+ * for percpu_counter to be faster than atomic_t. In practice the average bpf
+ * hash map size is 10k, which means that a system with 64 cpus will fill
+ * hashmap to 20% of 10k before percpu_counter becomes ineffective. Therefore
+ * define our own batch count as 32 then 10k hash map can be filled up to 80%:
+ * 10k - 8k > 32 _batch_ * 64 _cpus_
+ * and __percpu_counter_compare() will still be fast. At that point hash map
+ * collisions will dominate its performance anyway. Assume that hash map filled
+ * to 50+% isn't going to be O(1) and use the following formula to choose
+ * between percpu_counter and atomic_t.
+ */
+#define PERCPU_COUNTER_BATCH 32
+	if (attr->max_entries / 2 > num_online_cpus() * PERCPU_COUNTER_BATCH)
+		htab->use_percpu_counter = true;
+
+	if (htab->use_percpu_counter) {
+		err = percpu_counter_init(&htab->pcount, 0, GFP_KERNEL);
+		if (err)
+			goto free_map_locked;
+	}
+
 	if (prealloc) {
 		err = prealloc_init(htab);
 		if (err)
@@ -891,6 +919,31 @@ static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
 	}
 }
 
+static bool is_map_full(struct bpf_htab *htab)
+{
+	if (htab->use_percpu_counter)
+		return __percpu_counter_compare(&htab->pcount, htab->map.max_entries,
+						PERCPU_COUNTER_BATCH) >= 0;
+	return atomic_read(&htab->count) >= htab->map.max_entries;
+}
+
+static void inc_elem_count(struct bpf_htab *htab)
+{
+	if (htab->use_percpu_counter)
+		percpu_counter_add_batch(&htab->pcount, 1, PERCPU_COUNTER_BATCH);
+	else
+		atomic_inc(&htab->count);
+}
+
+static void dec_elem_count(struct bpf_htab *htab)
+{
+	if (htab->use_percpu_counter)
+		percpu_counter_add_batch(&htab->pcount, -1, PERCPU_COUNTER_BATCH);
+	else
+		atomic_dec(&htab->count);
+}
+
+
 static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 {
 	htab_put_fd_value(htab, l);
@@ -899,7 +952,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 		check_and_free_fields(htab, l);
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
-		atomic_dec(&htab->count);
+		dec_elem_count(htab);
 		l->htab = htab;
 		call_rcu(&l->rcu, htab_elem_free_rcu);
 	}
@@ -983,16 +1036,15 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 			l_new = container_of(l, struct htab_elem, fnode);
 		}
 	} else {
-		if (atomic_inc_return(&htab->count) > htab->map.max_entries)
-			if (!old_elem) {
+		if (is_map_full(htab))
+			if (!old_elem)
 				/* when map is full and update() is replacing
 				 * old element, it's ok to allocate, since
 				 * old element will be freed immediately.
 				 * Otherwise return an error
 				 */
-				l_new = ERR_PTR(-E2BIG);
-				goto dec_count;
-			}
+				return ERR_PTR(-E2BIG);
+		inc_elem_count(htab);
 		l_new = bpf_mem_cache_alloc(&htab->ma);
 		if (!l_new) {
 			l_new = ERR_PTR(-ENOMEM);
@@ -1034,7 +1086,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 	l_new->hash = hash;
 	return l_new;
 dec_count:
-	atomic_dec(&htab->count);
+	dec_elem_count(htab);
 	return l_new;
 }
 
@@ -1513,6 +1565,8 @@ static void htab_map_free(struct bpf_map *map)
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
 	bpf_mem_alloc_destroy(&htab->ma);
+	if (htab->use_percpu_counter)
+		percpu_counter_destroy(&htab->pcount);
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	lockdep_unregister_key(&htab->lockdep_key);
From: Alexei Starovoitov
Date: Fri, 2 Sep 2022 14:10:49 -0700
Subject: [PATCH v6 bpf-next 07/16] bpf: Optimize call_rcu in non-preallocated hash map.
Message-Id: <20220902211058.60789-8-alexei.starovoitov@gmail.com>

Calling call_rcu() a million times a second becomes a bottleneck. Convert the non-preallocated hash map from call_rcu to SLAB_TYPESAFE_BY_RCU. An RCU grace period is no longer observed for each individual htab element, which makes the non-preallocated hash map behave just like the preallocated one. The map elements are released back to kernel memory only after an RCU grace period has been observed.

This improves 'map_perf_test 4' performance from 100k events per second to 250k events per second.

bpf_mem_alloc + percpu_counter + typesafe_by_rcu provide a 10x performance boost to the non-preallocated hash map and bring it within a few percent of the preallocated map while consuming a fraction of the memory.
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c                      |  8 ++++++--
 kernel/bpf/memalloc.c                     |  2 +-
 tools/testing/selftests/bpf/progs/timer.c | 11 -----------
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 36aa16dc43ad..0d888a90a805 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -953,8 +953,12 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		dec_elem_count(htab);
-		l->htab = htab;
-		call_rcu(&l->rcu, htab_elem_free_rcu);
+		if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH) {
+			l->htab = htab;
+			call_rcu(&l->rcu, htab_elem_free_rcu);
+		} else {
+			htab_elem_free(htab, l);
+		}
 	}
 }
 
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 1c46763d855e..da0721f8c28f 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -281,7 +281,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
 			return -ENOMEM;
 		size += LLIST_NODE_SZ; /* room for llist_node */
 		snprintf(buf, sizeof(buf), "bpf-%u", size);
-		kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL);
+		kmem_cache = kmem_cache_create(buf, size, 8, SLAB_TYPESAFE_BY_RCU, NULL);
 		if (!kmem_cache) {
 			free_percpu(pc);
 			return -ENOMEM;
diff --git a/tools/testing/selftests/bpf/progs/timer.c b/tools/testing/selftests/bpf/progs/timer.c
index 5f5309791649..0053c5402173 100644
--- a/tools/testing/selftests/bpf/progs/timer.c
+++ b/tools/testing/selftests/bpf/progs/timer.c
@@ -208,17 +208,6 @@ static int timer_cb2(void *map, int *key, struct hmap_elem *val)
 		 */
 		bpf_map_delete_elem(map, key);
 
-		/* in non-preallocated hashmap both 'key' and 'val' are RCU
-		 * protected and still valid though this element was deleted
-		 * from the map. Arm this timer for ~35 seconds. When callback
-		 * finishes the call_rcu will invoke:
-		 * htab_elem_free_rcu
-		 * check_and_free_timer
-		 * bpf_timer_cancel_and_free
-		 * to cancel this 35 second sleep and delete the timer for real.
-		 */
-		if (bpf_timer_start(&val->timer, 1ull << 35, 0) != 0)
-			err |= 256;
 		ok |= 4;
 	}
 	return 0;
From: Alexei Starovoitov
Date: Fri, 2 Sep 2022 14:10:50 -0700
Subject: [PATCH v6 bpf-next 08/16] bpf: Adjust low/high watermarks in bpf_mem_cache
Message-Id: <20220902211058.60789-9-alexei.starovoitov@gmail.com>

Using the same low/high watermarks for every bucket in bpf_mem_cache consumes a significant amount of memory: preallocating 64 elements of 4096 bytes each in the free list (256 KiB per CPU for that bucket alone) is not efficient. Make the low/high watermarks and the batching value depend on the element size. This change brings significant memory savings.

Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 50 +++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 14 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index da0721f8c28f..7e5df6866d92 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -100,6 +100,7 @@ struct bpf_mem_cache {
 	int unit_size;
 	/* count of objects in free_llist */
 	int free_cnt;
+	int low_watermark, high_watermark, batch;
 };
 
 struct bpf_mem_caches {
@@ -118,14 +119,6 @@ static struct llist_node notrace *__llist_del_first(struct llist_head *head)
 	return entry;
 }
 
-#define BATCH 48
-#define LOW_WATERMARK 32
-#define HIGH_WATERMARK 96
-/* Assuming the average number of elements per bucket is 64, when all buckets
- * are used the total memory will be: 64*16*32 + 64*32*32 + 64*64*32 + ... +
- * 64*4096*32 ~ 20Mbyte
- */
-
 static void *__alloc(struct bpf_mem_cache *c, int node)
 {
 	/* Allocate, but don't deplete atomic reserves that typical
@@ -220,7 +213,7 @@ static void free_bulk(struct bpf_mem_cache *c)
 		if (IS_ENABLED(CONFIG_PREEMPT_RT))
 			local_irq_restore(flags);
 		free_one(c, llnode);
-	} while (cnt > (HIGH_WATERMARK + LOW_WATERMARK) / 2);
+	} while (cnt > (c->high_watermark + c->low_watermark) / 2);
 
 	/* and drain free_llist_extra */
 	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra))
@@ -234,12 +227,12 @@ static void bpf_mem_refill(struct irq_work *work)
 
 	/* Racy access to free_cnt. It doesn't need to be 100% accurate */
 	cnt = c->free_cnt;
-	if (cnt < LOW_WATERMARK)
+	if (cnt < c->low_watermark)
 		/* irq_work runs on this cpu and kmalloc will allocate
 		 * from the current numa node which is what we want here.
 		 */
-		alloc_bulk(c, BATCH, NUMA_NO_NODE);
-	else if (cnt > HIGH_WATERMARK)
+		alloc_bulk(c, c->batch, NUMA_NO_NODE);
+	else if (cnt > c->high_watermark)
 		free_bulk(c);
 }
 
@@ -248,9 +241,38 @@ static void notrace irq_work_raise(struct bpf_mem_cache *c)
 	irq_work_queue(&c->refill_work);
 }
 
+/* For typical bpf map case that uses bpf_mem_cache_alloc and single bucket
+ * the freelist cache will be elem_size * 64 (or less) on each cpu.
+ *
+ * For bpf programs that don't have statically known allocation sizes and
+ * assuming (low_mark + high_mark) / 2 as an average number of elements per
+ * bucket and all buckets are used the total amount of memory in freelists
+ * on each cpu will be:
+ * 64*16 + 64*32 + 64*64 + 64*96 + 64*128 + 64*196 + 64*256 + 32*512 + 16*1024 + 8*2048 + 4*4096
+ * == ~ 116 Kbyte using below heuristic.
+ * Initialized, but unused bpf allocator (not bpf map specific one) will
+ * consume ~ 11 Kbyte per cpu.
+ * Typical case will be between 11K and 116K closer to 11K.
+ * bpf progs can and should share bpf_mem_cache when possible.
+ */
 static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
 {
 	init_irq_work(&c->refill_work, bpf_mem_refill);
+	if (c->unit_size <= 256) {
+		c->low_watermark = 32;
+		c->high_watermark = 96;
+	} else {
+		/* When page_size == 4k, order-0 cache will have low_mark == 2
+		 * and high_mark == 6 with batch alloc of 3 individual pages at
+		 * a time.
+		 * 8k allocs and above low == 1, high == 3, batch == 1.
+		 */
+		c->low_watermark = max(32 * 256 / c->unit_size, 1);
+		c->high_watermark = max(96 * 256 / c->unit_size, 3);
+	}
+	c->batch = max((c->high_watermark - c->low_watermark) / 4 * 3, 1);
+
 	/* To avoid consuming memory assume that 1st run of bpf
 	 * prog won't be doing more than 4 map_update_elem from
 	 * irq disabled region
@@ -392,7 +414,7 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
 
 	WARN_ON(cnt < 0);
 
-	if (cnt < LOW_WATERMARK)
+	if (cnt < c->low_watermark)
 		irq_work_raise(c);
 	return llnode;
 }
@@ -425,7 +447,7 @@ static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
 	local_dec(&c->active);
 	local_irq_restore(flags);
 
-	if (cnt > HIGH_WATERMARK)
+	if (cnt > c->high_watermark)
 		/* free few objects from current cpu into global kmalloc pool */
 		irq_work_raise(c);
 }
Fri, 2 Sep 2022 17:11:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 3D91A80120 for ; Fri, 2 Sep 2022 17:11:37 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 180521205E9 for ; Fri, 2 Sep 2022 21:11:37 +0000 (UTC) X-FDA: 79868391834.01.762F3D7 Received: from mail-pg1-f176.google.com (mail-pg1-f176.google.com [209.85.215.176]) by imf20.hostedemail.com (Postfix) with ESMTP id A67A21C0060 for ; Fri, 2 Sep 2022 21:11:36 +0000 (UTC) Received: by mail-pg1-f176.google.com with SMTP id r69so2981354pgr.2 for ; Fri, 02 Sep 2022 14:11:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date; bh=Sd1n9vsZ5EUML56NQRA7na2eSwXiAznN5BvSCeGf4jc=; b=FFwxyrYnvWCyhFpmMpEDS2jIBMnTz36dJ0TKA/Kr3TsANM4z2PT6xDhup/Aa5922Ey IqU9Og4DEl2i6COz6fyUYcwKG1wTKTIpj7kmV27g1tZic4We4PAEM8sU2F9ARt5I5I3W nOhB/rOQQJ00pxdc/+JiEc7yNA1nRZcDeRhil82kQS/3LDvedEgb/5GEYpVnxMEvHvyU spfjWhIJ6y8D4txvS2Rlv9rWbX4VPm1yfDXQUPOCJIMZPnMGhv0mAuutt5Fpb2BWWAxR 6v5FJi38ywtJyLEUGWCPdu5veu0FFY4Y53RKVZMkOTzlTVLK4qHQwstxXthl+GxGYaUL I2Wg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date; bh=Sd1n9vsZ5EUML56NQRA7na2eSwXiAznN5BvSCeGf4jc=; b=ddTC/mnvHfPLPFD5WI2L6Lm9uFgSOsE2WSdpzMugsgpo2y7WjKu1etjLV7CVIwyY/+ j9RQZ5h4V19bNje+TWrlxrlewc4umvT2ycuhE1M6MgnsTS+0hDBzaSiaVia2JUpL4Vb5 IRVWEX/rWQHS/A8r1BRIrhaDAxUdxp2DkTrix9Bi4wKYcIEk0gm7rS2DPAar7lAQePyt 9zHnNfR7XBnmnqtWRivRuwb/rIkvvewLxd5wVDwF8fWHUlQ12XVsXYRBPVngE8UGSusE 
1G7tGKKh8ofyWS5aSMqnEmduYdu9OSenXZWrhWu1TqC2YnFptfjj+3whfe1O2Ttck32h 344A== X-Gm-Message-State: ACgBeo1r+ex61iBmDIHg/w7p9hnbp/7SHLJv5x51HH06pQcBRbJd2pKB 5a5kml5Qq3TenUymXoPviN0= X-Google-Smtp-Source: AA6agR7zm2Fu+VEBkC8swnpomAV0NgCT5uIBkP+JnFg/9kk9RW1cMzm4EY6q71QYrmAQmUJ9/UNfzA== X-Received: by 2002:a63:395:0:b0:42b:80a2:7ad2 with SMTP id 143-20020a630395000000b0042b80a27ad2mr30288190pgd.194.1662153095621; Fri, 02 Sep 2022 14:11:35 -0700 (PDT) Received: from localhost.localdomain ([2620:10d:c090:500::c978]) by smtp.gmail.com with ESMTPSA id t128-20020a628186000000b0052d50e14f1dsm2307178pfd.78.2022.09.02.14.11.34 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 02 Sep 2022 14:11:35 -0700 (PDT) From: Alexei Starovoitov To: davem@davemloft.net Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v6 bpf-next 09/16] bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU. 
Date: Fri, 2 Sep 2022 14:10:51 -0700 Message-Id: <20220902211058.60789-10-alexei.starovoitov@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220902211058.60789-1-alexei.starovoitov@gmail.com> References: <20220902211058.60789-1-alexei.starovoitov@gmail.com> MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1662153096; a=rsa-sha256; cv=none; b=HBa8YURnHKj+BG3MwvmFlJ9ooocb7ElmIGcuE+26j+dpXdy3hRAMaKCrWCrllk9GGOzp0c 0evs/KlJmsL9q2ZV0R/L8pHYaFi3M+G2KlOMxWEwNAF0Mc03pAisJNd8EP/K4sYbuNiqM3 dY89pI1LpJkir+qe2MLTlL+57AIg4g8= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=FFwxyrYn; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf20.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.215.176 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1662153096; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Sd1n9vsZ5EUML56NQRA7na2eSwXiAznN5BvSCeGf4jc=; b=whRga3NYARm8/wexe25W8wGgephD6mO2qPsdxZ00xUCFyhExy1RLP8lN4cCsTjLPEDivma KW1DEKmr+1EKAe6gIGhnyBd/YGxOskVPyWdamkB/dSUfrt8y8I76QbimLEC0B0XF5s9ypp XB9AC4PHhSBhqy3tWILzNAGPRmfT1qI= X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: A67A21C0060 Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=FFwxyrYn; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf20.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.215.176 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com X-Rspam-User: X-Stat-Signature: jqcdnbpgskntfyfqj81czhbthah3m65z X-HE-Tag: 1662153096-542218 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, 
version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexei Starovoitov SLAB_TYPESAFE_BY_RCU makes kmem_caches non mergeable and slows down kmem_cache_destroy. All bpf_mem_cache are safe to share across different maps and programs. Convert SLAB_TYPESAFE_BY_RCU to batched call_rcu. This change solves the memory consumption issue, avoids kmem_cache_destroy latency and keeps bpf hash map performance the same. Acked-by: Kumar Kartikeya Dwivedi Acked-by: Andrii Nakryiko Signed-off-by: Alexei Starovoitov --- kernel/bpf/memalloc.c | 65 +++++++++++++++++++++++++++++++++++++++++-- kernel/bpf/syscall.c | 5 +++- 2 files changed, 66 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index 7e5df6866d92..5d8648a01b5c 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -101,6 +101,11 @@ struct bpf_mem_cache { /* count of objects in free_llist */ int free_cnt; int low_watermark, high_watermark, batch; + + struct rcu_head rcu; + struct llist_head free_by_rcu; + struct llist_head waiting_for_gp; + atomic_t call_rcu_in_progress; }; struct bpf_mem_caches { @@ -194,6 +199,45 @@ static void free_one(struct bpf_mem_cache *c, void *obj) kfree(obj); } +static void __free_rcu(struct rcu_head *head) +{ + struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu); + struct llist_node *llnode = llist_del_all(&c->waiting_for_gp); + struct llist_node *pos, *t; + + llist_for_each_safe(pos, t, llnode) + free_one(c, pos); + atomic_set(&c->call_rcu_in_progress, 0); +} + +static void enque_to_free(struct bpf_mem_cache *c, void *obj) +{ + struct llist_node *llnode = obj; + + /* bpf_mem_cache is a per-cpu object. Freeing happens in irq_work. + * Nothing races to add to free_by_rcu list. 
+ */ + __llist_add(llnode, &c->free_by_rcu); +} + +static void do_call_rcu(struct bpf_mem_cache *c) +{ + struct llist_node *llnode, *t; + + if (atomic_xchg(&c->call_rcu_in_progress, 1)) + return; + + WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp)); + llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu)) + /* There is no concurrent __llist_add(waiting_for_gp) access. + * It doesn't race with llist_del_all either. + * But there could be two concurrent llist_del_all(waiting_for_gp): + * from __free_rcu() and from drain_mem_cache(). + */ + __llist_add(llnode, &c->waiting_for_gp); + call_rcu(&c->rcu, __free_rcu); +} + static void free_bulk(struct bpf_mem_cache *c) { struct llist_node *llnode, *t; @@ -212,12 +256,13 @@ static void free_bulk(struct bpf_mem_cache *c) local_dec(&c->active); if (IS_ENABLED(CONFIG_PREEMPT_RT)) local_irq_restore(flags); - free_one(c, llnode); + enque_to_free(c, llnode); } while (cnt > (c->high_watermark + c->low_watermark) / 2); /* and drain free_llist_extra */ llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra)) - free_one(c, llnode); + enque_to_free(c, llnode); + do_call_rcu(c); } static void bpf_mem_refill(struct irq_work *work) @@ -303,7 +348,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size) return -ENOMEM; size += LLIST_NODE_SZ; /* room for llist_node */ snprintf(buf, sizeof(buf), "bpf-%u", size); - kmem_cache = kmem_cache_create(buf, size, 8, SLAB_TYPESAFE_BY_RCU, NULL); + kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL); if (!kmem_cache) { free_percpu(pc); return -ENOMEM; @@ -345,6 +390,15 @@ static void drain_mem_cache(struct bpf_mem_cache *c) { struct llist_node *llnode, *t; + /* The caller has done rcu_barrier() and no progs are using this + * bpf_mem_cache, but htab_map_free() called bpf_mem_cache_free() for + * all remaining elements and they can be in free_by_rcu or in + * waiting_for_gp lists, so drain those lists now. 
+	 */
+	llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu))
+		free_one(c, llnode);
+	llist_for_each_safe(llnode, t, llist_del_all(&c->waiting_for_gp))
+		free_one(c, llnode);
 	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist))
 		free_one(c, llnode);
 	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra))
@@ -366,6 +420,10 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 		kmem_cache_destroy(c->kmem_cache);
 		if (c->objcg)
 			obj_cgroup_put(c->objcg);
+		/* c->waiting_for_gp list was drained, but __free_rcu might
+		 * still execute. Wait for it now before we free 'c'.
+		 */
+		rcu_barrier();
 		free_percpu(ma->cache);
 		ma->cache = NULL;
 	}
@@ -379,6 +437,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 		}
 		if (c->objcg)
 			obj_cgroup_put(c->objcg);
+		rcu_barrier();
 		free_percpu(ma->caches);
 		ma->caches = NULL;
 	}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4e9d4622aef7..074c901fbb4e 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -638,7 +638,10 @@ static void __bpf_map_put(struct bpf_map *map, bool do_idr_lock)
 		bpf_map_free_id(map, do_idr_lock);
 		btf_put(map->btf);
 		INIT_WORK(&map->work, bpf_map_free_deferred);
-		schedule_work(&map->work);
+		/* Avoid spawning kworkers, since they all might contend
+		 * for the same mutex like slab_mutex.
+		 */
+		queue_work(system_unbound_wq, &map->work);
 	}
 }
 
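The batching introduced by this patch can be illustrated with a small, single-threaded userspace model. This is a sketch under stated assumptions, not kernel code. The model_* names are hypothetical, plain pointer lists stand in for llist, and a grace period is simulated by calling model_grace_period() explicitly instead of waiting for call_rcu(). The model shows the key property of do_call_rcu(): while one batch is waiting for a grace period, newly freed objects keep accumulating in free_by_rcu. They are handed over only on a later call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct llist_node, stored at the start of each object. */
struct node {
	struct node *next;
};

/* Hypothetical userspace model of the fields this patch adds to
 * struct bpf_mem_cache. Not kernel code; no concurrency is modelled. */
struct model_cache {
	struct node *free_by_rcu;      /* freed since the last handover */
	struct node *waiting_for_gp;   /* batch waiting for a grace period */
	atomic_int call_rcu_in_progress;
	bool callback_pending;         /* stands in for a queued call_rcu() */
	size_t freed;                  /* objects actually released */
};

static void push(struct node **head, struct node *n)
{
	n->next = *head;
	*head = n;
}

/* Mirrors enque_to_free(): queue the object, do not free it yet. */
static void model_enqueue_to_free(struct model_cache *c, void *obj)
{
	push(&c->free_by_rcu, obj);
}

/* Mirrors __free_rcu(): runs after the (simulated) grace period. */
static void model_free_rcu(struct model_cache *c)
{
	struct node *n = c->waiting_for_gp, *t;

	c->waiting_for_gp = NULL;
	for (; n; n = t) {
		t = n->next;
		free(n);
		c->freed++;
	}
	atomic_store(&c->call_rcu_in_progress, 0);
}

/* Mirrors do_call_rcu(): at most one batch is in flight per cache. */
static void model_do_call_rcu(struct model_cache *c)
{
	struct node *n, *t;

	if (atomic_exchange(&c->call_rcu_in_progress, 1))
		return; /* a batch is already waiting; keep accumulating */

	assert(c->waiting_for_gp == NULL);
	n = c->free_by_rcu;
	c->free_by_rcu = NULL;
	for (; n; n = t) {
		t = n->next;
		push(&c->waiting_for_gp, n);
	}
	c->callback_pending = true; /* the kernel calls call_rcu() here */
}

/* Test helper: pretend a grace period elapsed and run the callback. */
static void model_grace_period(struct model_cache *c)
{
	if (c->callback_pending) {
		c->callback_pending = false;
		model_free_rcu(c);
	}
}
```

The model suggests why one struct rcu_head per bpf_mem_cache is enough. Because at most one batch is in flight, the call_rcu_in_progress flag keeps that rcu_head from being queued a second time before its callback has run.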
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 bpf-next 10/16] bpf: Add percpu allocation support to bpf_mem_alloc.
Date: Fri, 2 Sep 2022 14:10:52 -0700
Message-Id: <20220902211058.60789-11-alexei.starovoitov@gmail.com>

Extend bpf_mem_alloc to cache free lists of fixed-size per-cpu allocations.
Once such a cache is created, bpf_mem_cache_alloc() will return per-cpu
objects, and bpf_mem_cache_free() will free them back into the global
per-cpu pool after observing an RCU grace period. The per-cpu flavor of
bpf_mem_alloc is going to be used by per-cpu hash maps.

The free list cache consists of tuples { llist_node, per-cpu pointer }.
Unlike alloc_percpu(), which returns a per-cpu pointer,
bpf_mem_cache_alloc() returns a pointer to a per-cpu pointer, and
bpf_mem_cache_free() expects to receive it back.

Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 include/linux/bpf_mem_alloc.h |  2 +-
 kernel/bpf/hashtab.c          |  2 +-
 kernel/bpf/memalloc.c         | 44 +++++++++++++++++++++++++++++++----
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
index 804733070f8d..653ed1584a03 100644
--- a/include/linux/bpf_mem_alloc.h
+++ b/include/linux/bpf_mem_alloc.h
@@ -12,7 +12,7 @@ struct bpf_mem_alloc {
 	struct bpf_mem_cache __percpu *cache;
 };
 
-int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size);
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu);
 void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
 
 /* kmalloc/kfree equivalent: */
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 0d888a90a805..70b02ff4445e 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -607,7 +607,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 				goto free_prealloc;
 		}
 	} else {
-		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size);
+		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false);
 		if (err)
 			goto free_map_locked;
 	}
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 5d8648a01b5c..f7b07787581b 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -101,6 +101,7 @@ struct bpf_mem_cache {
 	/* count of objects in free_llist */
 	int free_cnt;
 	int low_watermark, high_watermark, batch;
+	bool percpu;
 
 	struct rcu_head rcu;
 	struct llist_head free_by_rcu;
@@ -133,6 +134,19 @@ static void *__alloc(struct bpf_mem_cache *c, int node)
 	 */
 	gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT;
 
+	if (c->percpu) {
+		void **obj = kmem_cache_alloc_node(c->kmem_cache, flags, node);
+		void *pptr = __alloc_percpu_gfp(c->unit_size, 8, flags);
+
+		if (!obj || !pptr) {
+			free_percpu(pptr);
+			kfree(obj);
+			return NULL;
+		}
+		obj[1] = pptr;
+		return obj;
+	}
+
 	if (c->kmem_cache)
 		return kmem_cache_alloc_node(c->kmem_cache, flags, node);
 
@@ -193,6 +207,12 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 
 static void free_one(struct bpf_mem_cache *c, void *obj)
 {
+	if (c->percpu) {
+		free_percpu(((void **)obj)[1]);
+		kmem_cache_free(c->kmem_cache, obj);
+		return;
+	}
+
 	if (c->kmem_cache)
 		kmem_cache_free(c->kmem_cache, obj);
 	else
@@ -332,21 +352,30 @@ static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
  * kmalloc/kfree. Max allocation size is 4096 in this case.
  * This is bpf_dynptr and bpf_kptr use case.
  */
-int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 {
 	static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};
 	struct bpf_mem_caches *cc, __percpu *pcc;
 	struct bpf_mem_cache *c, __percpu *pc;
-	struct kmem_cache *kmem_cache;
+	struct kmem_cache *kmem_cache = NULL;
 	struct obj_cgroup *objcg = NULL;
 	char buf[32];
-	int cpu, i;
+	int cpu, i, unit_size;
 
 	if (size) {
 		pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
 		if (!pc)
 			return -ENOMEM;
-		size += LLIST_NODE_SZ; /* room for llist_node */
+
+		if (percpu) {
+			unit_size = size;
+			/* room for llist_node and per-cpu pointer */
+			size = LLIST_NODE_SZ + sizeof(void *);
+		} else {
+			size += LLIST_NODE_SZ; /* room for llist_node */
+			unit_size = size;
+		}
+
 		snprintf(buf, sizeof(buf), "bpf-%u", size);
 		kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL);
 		if (!kmem_cache) {
@@ -359,14 +388,19 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
 		for_each_possible_cpu(cpu) {
 			c = per_cpu_ptr(pc, cpu);
 			c->kmem_cache = kmem_cache;
-			c->unit_size = size;
+			c->unit_size = unit_size;
 			c->objcg = objcg;
+			c->percpu = percpu;
 			prefill_mem_cache(c, cpu);
 		}
 		ma->cache = pc;
 		return 0;
 	}
 
+	/* size == 0 && percpu is an invalid combination */
+	if (WARN_ON_ONCE(percpu))
+		return -EINVAL;
+
 	pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL);
 	if (!pcc)
 		return -ENOMEM;
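The { llist_node, per-cpu pointer } tuple described in the commit message can be sketched in userspace. All of the following are hypothetical assumptions:

- MODEL_NR_CPUS fixes the CPU count.
- The per-cpu area is modelled as one calloc()'d array with a unit_size slice per CPU. Real per-cpu memory is laid out differently.
- Following the commit message, the caller is handed &obj[1], a pointer to the per-cpu pointer.

The hashtab conversion in the following patch dereferences that pointer the same way (pptr = *(void **)pptr).

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count for the model */

/* Model of one free-list entry: slot 0 stands in for the llist_node,
 * slot 1 holds the "per-cpu pointer" (obj[1] = pptr in __alloc()). */
static void **model_pcpu_alloc_obj(size_t unit_size)
{
	void **obj = malloc(2 * sizeof(void *));
	/* Model of a per-cpu area: one unit_size slice per CPU, zeroed. */
	void *pptr = calloc(MODEL_NR_CPUS, unit_size);

	if (!obj || !pptr) {
		free(pptr);
		free(obj);
		return NULL;
	}
	obj[1] = pptr;
	return obj;
}

/* What the caller sees: a pointer to the per-cpu pointer. */
static void *model_cache_alloc(size_t unit_size)
{
	void **obj = model_pcpu_alloc_obj(unit_size);

	return obj ? (void *)&obj[1] : NULL;
}

/* Model of per_cpu_ptr(): locate one CPU's copy inside the area. */
static void *model_per_cpu_ptr(void *pptr, int cpu, size_t unit_size)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return (char *)pptr + (size_t)cpu * unit_size;
}

/* Mirrors free_one(): release the per-cpu area, then the tuple. */
static void model_cache_free(void *ptr_to_pptr)
{
	void **obj = (void **)ptr_to_pptr - 1; /* step back over slot 0 */

	free(obj[1]);
	free(obj);
}
```

One plausible reading of the design is that a per-cpu pointer has no per-object header of its own. Wrapping it in a small slab object gives each per-cpu allocation somewhere to keep its llist_node, so the per-cpu flavor can reuse the same free-list machinery as the regular one.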
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 bpf-next 11/16] bpf: Convert percpu hash map to per-cpu bpf_mem_alloc.
Date: Fri, 2 Sep 2022 14:10:53 -0700
Message-Id: <20220902211058.60789-12-alexei.starovoitov@gmail.com>

Convert dynamic allocations in the percpu hash map from alloc_percpu() to
bpf_mem_cache_alloc() from a per-cpu bpf_mem_alloc. Since bpf_mem_alloc
frees objects after an RCU grace period, the call_rcu() is removed.
pcpu_init_value() now needs to zero-fill per-cpu allocations: dynamically
allocated map elements now behave like fully preallocated ones, because
alloc_percpu() is no longer called inline and elements are reused from the
free list.

Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c | 45 +++++++++++++++++++------------------------
 1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 70b02ff4445e..a77b9c4a4e48 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -94,6 +94,7 @@ struct bucket {
 struct bpf_htab {
 	struct bpf_map map;
 	struct bpf_mem_alloc ma;
+	struct bpf_mem_alloc pcpu_ma;
 	struct bucket *buckets;
 	void *elems;
 	union {
@@ -121,14 +122,14 @@ struct htab_elem {
 		struct {
 			void *padding;
 			union {
-				struct bpf_htab *htab;
 				struct pcpu_freelist_node fnode;
 				struct htab_elem *batch_flink;
 			};
 		};
 	};
 	union {
-		struct rcu_head rcu;
+		/* pointer to per-cpu pointer */
+		void *ptr_to_pptr;
 		struct bpf_lru_node lru_node;
 	};
 	u32 hash;
@@ -448,8 +449,6 @@ static int htab_map_alloc_check(union bpf_attr *attr)
 	bool zero_seed = (attr->map_flags & BPF_F_ZERO_SEED);
 	int numa_node = bpf_map_attr_numa_node(attr);
 
-	BUILD_BUG_ON(offsetof(struct htab_elem, htab) !=
-		     offsetof(struct htab_elem, hash_node.pprev));
 	BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) !=
 		     offsetof(struct htab_elem, hash_node.pprev));
 
@@ -610,6 +609,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false);
 		if (err)
 			goto free_map_locked;
+		if (percpu) {
+			err = bpf_mem_alloc_init(&htab->pcpu_ma,
+						 round_up(htab->map.value_size, 8), true);
+			if (err)
+				goto free_map_locked;
+		}
 	}
 
 	return &htab->map;
@@ -620,6 +625,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->pcpu_ma);
 	bpf_mem_alloc_destroy(&htab->ma);
 free_htab:
 	lockdep_unregister_key(&htab->lockdep_key);
@@ -895,19 +901,11 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
 {
 	if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
-		free_percpu(htab_elem_get_ptr(l, htab->map.key_size));
+		bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr);
 	check_and_free_fields(htab, l);
 	bpf_mem_cache_free(&htab->ma, l);
 }
 
-static void htab_elem_free_rcu(struct rcu_head *head)
-{
-	struct htab_elem *l = container_of(head, struct htab_elem, rcu);
-	struct bpf_htab *htab = l->htab;
-
-	htab_elem_free(htab, l);
-}
-
 static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
 {
 	struct bpf_map *map = &htab->map;
@@ -953,12 +951,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		dec_elem_count(htab);
-		if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH) {
-			l->htab = htab;
-			call_rcu(&l->rcu, htab_elem_free_rcu);
-		} else {
-			htab_elem_free(htab, l);
-		}
+		htab_elem_free(htab, l);
 	}
 }
 
@@ -983,13 +976,12 @@ static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
 static void pcpu_init_value(struct bpf_htab *htab, void __percpu *pptr,
 			    void *value, bool onallcpus)
 {
-	/* When using prealloc and not setting the initial value on all cpus,
-	 * zero-fill element values for other cpus (just as what happens when
-	 * not using prealloc). Otherwise, bpf program has no way to ensure
+	/* When not setting the initial value on all cpus, zero-fill element
+	 * values for other cpus. Otherwise, bpf program has no way to ensure
 	 * known initial values for cpus other than current one
 	 * (onallcpus=false always when coming from bpf prog).
 	 */
-	if (htab_is_prealloc(htab) && !onallcpus) {
+	if (!onallcpus) {
 		u32 size = round_up(htab->map.value_size, 8);
 		int current_cpu = raw_smp_processor_id();
 		int cpu;
@@ -1060,18 +1052,18 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 
 	memcpy(l_new->key, key, key_size);
 	if (percpu) {
-		size = round_up(size, 8);
 		if (prealloc) {
 			pptr = htab_elem_get_ptr(l_new, key_size);
 		} else {
 			/* alloc_percpu zero-fills */
-			pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
-						    GFP_NOWAIT | __GFP_NOWARN);
+			pptr = bpf_mem_cache_alloc(&htab->pcpu_ma);
 			if (!pptr) {
 				bpf_mem_cache_free(&htab->ma, l_new);
 				l_new = ERR_PTR(-ENOMEM);
 				goto dec_count;
 			}
+			l_new->ptr_to_pptr = pptr;
+			pptr = *(void **)pptr;
 		}
 
 		pcpu_init_value(htab, pptr, value, onallcpus);
@@ -1568,6 +1560,7 @@ static void htab_map_free(struct bpf_map *map)
 	bpf_map_free_kptr_off_tab(map);
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->pcpu_ma);
 	bpf_mem_alloc_destroy(&htab->ma);
 	if (htab->use_percpu_counter)
 		percpu_counter_destroy(&htab->pcount);
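The behavioral change in pcpu_init_value() can be checked with a small model.

Before this patch, only preallocated elements needed explicit zeroing. The dynamic path got fresh memory from alloc_percpu(), which zero-fills, per the comment kept in alloc_htab_elem(). Now a per-cpu area can be recycled from the bpf_mem_alloc free list with stale bytes, so every CPU other than the current one is zeroed whenever onallcpus is false.

Assumptions:

- MODEL_NR_CPUS and the model_* names are hypothetical.
- The per-cpu area is a flat array.
- Copying value_size bytes into the current CPU's slice and zeroing the padding is a simplification. The kernel's copy for the current CPU is not shown in this hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count */

/* round_up() for a power-of-two alignment, as applied to value_size. */
static size_t model_round_up(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Model of the !onallcpus branch of pcpu_init_value() after this patch.
 * 'area' holds MODEL_NR_CPUS slices of round_up(value_size, 8) bytes and
 * may contain stale data from a recycled element. */
static void model_pcpu_init_value(unsigned char *area, size_t value_size,
				  const void *value, int current_cpu)
{
	size_t size = model_round_up(value_size, 8);

	assert(current_cpu >= 0 && current_cpu < MODEL_NR_CPUS);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		unsigned char *dst = area + (size_t)cpu * size;

		/* Zero the whole slice so no CPU sees stale bytes ... */
		memset(dst, 0, size);
		/* ... then give the current CPU the caller's value. */
		if (cpu == current_cpu)
			memcpy(dst, value, value_size);
	}
}
```

Running it over a buffer pre-filled with a non-zero pattern leaves only the current CPU's slice non-zero, which is the guarantee the updated comment describes for bpf programs.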
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 bpf-next 12/16] bpf: Remove tracing program restriction on map types
Date: Fri, 2 Sep 2022 14:10:54 -0700
Message-Id: <20220902211058.60789-13-alexei.starovoitov@gmail.com>

The hash map is now fully converted to bpf_mem_alloc. Its implementation
neither allocates synchronously nor calls call_rcu() directly, so it is
now safe to use non-preallocated hash maps in all types of tracing
programs, including BPF_PROG_TYPE_PERF_EVENT, which can run in NMI context.

Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/verifier.c | 42 ------------------------------------------
 1 file changed, 42 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3dce3166855f..57ec06b1d09d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12623,48 +12623,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(prog);
 
-	/*
-	 * Validate that trace type programs use preallocated hash maps.
-	 *
-	 * For programs attached to PERF events this is mandatory as the
-	 * perf NMI can hit any arbitrary code sequence.
-	 *
-	 * All other trace types using non-preallocated per-cpu hash maps are
-	 * unsafe as well because tracepoint or kprobes can be inside locked
-	 * regions of the per-cpu memory allocator or at a place where a
-	 * recursion into the per-cpu memory allocator would see inconsistent
-	 * state. Non per-cpu hash maps are using bpf_mem_alloc-tor which is
-	 * safe to use from kprobe/fentry and in RT.
-	 *
-	 * On RT enabled kernels run-time allocation of all trace type
-	 * programs is strictly prohibited due to lock type constraints. On
-	 * !RT kernels it is allowed for backwards compatibility reasons for
-	 * now, but warnings are emitted so developers are made aware of
-	 * the unsafety and can fix their programs before this is enforced.
-	 */
-	if (is_tracing_prog_type(prog_type) && !is_preallocated_map(map)) {
-		if (prog_type == BPF_PROG_TYPE_PERF_EVENT) {
-			/* perf_event bpf progs have to use preallocated hash maps
-			 * because non-prealloc is still relying on call_rcu to free
-			 * elements.
-			 */
-			verbose(env, "perf_event programs can only use preallocated hash map\n");
-			return -EINVAL;
-		}
-		if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
-		    (map->inner_map_meta &&
-		     map->inner_map_meta->map_type == BPF_MAP_TYPE_PERCPU_HASH)) {
-			if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
-				verbose(env,
-					"trace type programs can only use preallocated per-cpu hash map\n");
-				return -EINVAL;
-			}
-			WARN_ONCE(1, "trace type BPF program uses run-time allocation\n");
-			verbose(env,
-				"trace type programs with run-time allocated per-cpu hash maps are unsafe."
-				" Switch to preallocated hash maps.\n");
-		}
-	}
 
 	if (map_value_has_spin_lock(map)) {
 		if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) {
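The claim that the allocator "is not allocating synchronously" can be illustrated with a watermark model. The relevant code lives in an earlier patch of the series that is not shown here, so the following are assumptions:

- Allocation and free only touch a per-cpu free list. They request deferred work (irq_work in the kernel, cf. bpf_mem_refill() above) when free_cnt leaves the [low_watermark, high_watermark] band.
- The deferred work either refills batch objects or trims the list down to (high_watermark + low_watermark) / 2. That stop condition is the one visible in the free_bulk() hunk.

The wm_* names and the watermark values are hypothetical. The model is single-threaded, omits the c->active and IRQ guards, and frees immediately. The kernel instead hands trimmed objects to the RCU batching added earlier in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct mnode {
	struct mnode *next;
};

/* Hypothetical model of one per-cpu cache; not kernel code. */
struct wm_cache {
	struct mnode *free_list;
	int free_cnt;
	int low_watermark, high_watermark, batch;
	bool work_pending; /* stands in for a queued irq_work */
};

static void wm_push(struct wm_cache *c, struct mnode *n)
{
	n->next = c->free_list;
	c->free_list = n;
	c->free_cnt++;
}

static struct mnode *wm_pop(struct wm_cache *c)
{
	struct mnode *n = c->free_list;

	assert((n == NULL) == (c->free_cnt == 0));
	if (n) {
		c->free_list = n->next;
		c->free_cnt--;
	}
	return n;
}

/* Fast path: never calls malloc(); only requests deferred work. */
static void *wm_alloc(struct wm_cache *c)
{
	struct mnode *n = wm_pop(c);

	if (c->free_cnt < c->low_watermark)
		c->work_pending = true;
	return n;
}

/* Fast path: never calls free(); only requests deferred work. */
static void wm_free(struct wm_cache *c, void *obj)
{
	wm_push(c, obj);
	if (c->free_cnt > c->high_watermark)
		c->work_pending = true;
}

/* Deferred work: refill below low_watermark, trim above high_watermark. */
static void wm_run_deferred(struct wm_cache *c)
{
	if (!c->work_pending)
		return;
	c->work_pending = false;
	if (c->free_cnt < c->low_watermark) {
		for (int i = 0; i < c->batch; i++) {
			struct mnode *n = malloc(sizeof(*n));

			if (!n)
				break;
			wm_push(c, n);
		}
	} else if (c->free_cnt > c->high_watermark) {
		/* same stop condition as the free_bulk() loop */
		while (c->free_cnt > (c->high_watermark + c->low_watermark) / 2)
			free(wm_pop(c));
	}
}
```

In this model the only calls into the underlying allocator happen in wm_run_deferred(). That mirrors why a program that cannot safely call the allocator itself, such as one in NMI context, can still use the fast paths.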
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 bpf-next 13/16] bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
Date: Fri, 2 Sep 2022 14:10:55 -0700
Message-Id: <20220902211058.60789-14-alexei.starovoitov@gmail.com>
In-Reply-To: <20220902211058.60789-1-alexei.starovoitov@gmail.com>
References: <20220902211058.60789-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
Then use call_rcu() to wait for normal progs to finish
and finally do free_one() on each element when freeing objects
into the global memory pool.

Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index f7b07787581b..8895c016dcdb 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -230,6 +230,13 @@ static void __free_rcu(struct rcu_head *head)
 	atomic_set(&c->call_rcu_in_progress, 0);
 }
 
+static void __free_rcu_tasks_trace(struct rcu_head *head)
+{
+	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
+
+	call_rcu(&c->rcu, __free_rcu);
+}
+
 static void enque_to_free(struct bpf_mem_cache *c, void *obj)
 {
 	struct llist_node *llnode = obj;
@@ -255,7 +262,11 @@ static void do_call_rcu(struct bpf_mem_cache *c)
 	 * from __free_rcu() and from drain_mem_cache().
 	 */
 	__llist_add(llnode, &c->waiting_for_gp);
-	call_rcu(&c->rcu, __free_rcu);
+	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
+	 * Then use call_rcu() to wait for normal progs to finish
+	 * and finally do free_one() on each element.
+	 */
+	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
 }
 
 static void free_bulk(struct bpf_mem_cache *c)
@@ -457,6 +468,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 		/* c->waiting_for_gp list was drained, but __free_rcu might
 		 * still execute. Wait for it now before we free 'c'.
 		 */
+		rcu_barrier_tasks_trace();
 		rcu_barrier();
 		free_percpu(ma->cache);
 		ma->cache = NULL;
@@ -471,6 +483,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 		}
 		if (c->objcg)
 			obj_cgroup_put(c->objcg);
+		rcu_barrier_tasks_trace();
 		rcu_barrier();
 		free_percpu(ma->caches);
 		ma->caches = NULL;

From patchwork Fri Sep 2 21:10:56 2022
X-Patchwork-Id: 12964692
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 bpf-next 14/16] bpf: Remove prealloc-only restriction for sleepable bpf programs.
Date: Fri, 2 Sep 2022 14:10:56 -0700
Message-Id: <20220902211058.60789-15-alexei.starovoitov@gmail.com>
In-Reply-To: <20220902211058.60789-1-alexei.starovoitov@gmail.com>
References: <20220902211058.60789-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Since the hash map is now converted to bpf_mem_alloc, which waits for rcu
and rcu_tasks_trace GPs before freeing elements into the global memory
slabs, it's safe to use dynamically allocated hash maps in sleepable
bpf programs.

Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/verifier.c | 23 -----------------------
 1 file changed, 23 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 57ec06b1d09d..068b20ed34d2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12586,14 +12586,6 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env,
 	return err;
 }
 
-static int check_map_prealloc(struct bpf_map *map)
-{
-	return (map->map_type != BPF_MAP_TYPE_HASH &&
-		map->map_type != BPF_MAP_TYPE_PERCPU_HASH &&
-		map->map_type != BPF_MAP_TYPE_HASH_OF_MAPS) ||
-		!(map->map_flags & BPF_F_NO_PREALLOC);
-}
-
 static bool is_tracing_prog_type(enum bpf_prog_type type)
 {
 	switch (type) {
@@ -12608,15 +12600,6 @@ static bool is_tracing_prog_type(enum bpf_prog_type type)
 	}
 }
 
-static bool is_preallocated_map(struct bpf_map *map)
-{
-	if (!check_map_prealloc(map))
-		return false;
-	if (map->inner_map_meta && !check_map_prealloc(map->inner_map_meta))
-		return false;
-	return true;
-}
-
 static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 					struct bpf_map *map,
 					struct bpf_prog *prog)
@@ -12669,12 +12652,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 		case BPF_MAP_TYPE_LRU_PERCPU_HASH:
 		case BPF_MAP_TYPE_ARRAY_OF_MAPS:
 		case BPF_MAP_TYPE_HASH_OF_MAPS:
-			if (!is_preallocated_map(map)) {
-				verbose(env,
-					"Sleepable programs can only use preallocated maps\n");
-				return -EINVAL;
-			}
-			break;
 		case BPF_MAP_TYPE_RINGBUF:
 		case BPF_MAP_TYPE_INODE_STORAGE:
 		case BPF_MAP_TYPE_SK_STORAGE:

From patchwork Fri Sep 2 21:10:57 2022
X-Patchwork-Id: 12964693
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 bpf-next 15/16] bpf: Remove usage of kmem_cache from bpf_mem_cache.
Date: Fri, 2 Sep 2022 14:10:57 -0700
Message-Id: <20220902211058.60789-16-alexei.starovoitov@gmail.com>
In-Reply-To: <20220902211058.60789-1-alexei.starovoitov@gmail.com>
References: <20220902211058.60789-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

For bpf_mem_cache based hash maps the following stress test:
for (i = 1; i <= 512; i <<= 1)
  for (j = 1; j <= 1 << 18; j <<= 1)
    fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, i, j, 2, 0);
creates many kmem_cache-s that are not mergeable in debug kernels
and consume an unnecessary amount of memory.
It turned out that bpf_mem_cache's free_list logic does batching well,
so using kmem_cache for fixed-size allocations doesn't bring any
performance benefit over normal kmalloc.
Hence get rid of kmem_cache in bpf_mem_cache.
That saves memory and speeds up map create/destroy operations,
while maintaining hash map update/delete performance.

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 50 ++++++++++++++------------------------------------
 1 file changed, 14 insertions(+), 36 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 8895c016dcdb..38fbd15c130a 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -91,17 +91,13 @@ struct bpf_mem_cache {
 	 */
 	struct llist_head free_llist_extra;
 
-	/* kmem_cache != NULL when bpf_mem_alloc was created for specific
-	 * element size.
-	 */
-	struct kmem_cache *kmem_cache;
 	struct irq_work refill_work;
 	struct obj_cgroup *objcg;
 	int unit_size;
 	/* count of objects in free_llist */
 	int free_cnt;
 	int low_watermark, high_watermark, batch;
-	bool percpu;
+	int percpu_size;
 
 	struct rcu_head rcu;
 	struct llist_head free_by_rcu;
@@ -134,8 +130,8 @@ static void *__alloc(struct bpf_mem_cache *c, int node)
 	 */
 	gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT;
 
-	if (c->percpu) {
-		void **obj = kmem_cache_alloc_node(c->kmem_cache, flags, node);
+	if (c->percpu_size) {
+		void **obj = kmalloc_node(c->percpu_size, flags, node);
 		void *pptr = __alloc_percpu_gfp(c->unit_size, 8, flags);
 
 		if (!obj || !pptr) {
@@ -147,9 +143,6 @@ static void *__alloc(struct bpf_mem_cache *c, int node)
 		return obj;
 	}
 
-	if (c->kmem_cache)
-		return kmem_cache_alloc_node(c->kmem_cache, flags, node);
-
 	return kmalloc_node(c->unit_size, flags, node);
 }
 
@@ -207,16 +200,13 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 
 static void free_one(struct bpf_mem_cache *c, void *obj)
 {
-	if (c->percpu) {
+	if (c->percpu_size) {
 		free_percpu(((void **)obj)[1]);
-		kmem_cache_free(c->kmem_cache, obj);
+		kfree(obj);
 		return;
 	}
 
-	if (c->kmem_cache)
-		kmem_cache_free(c->kmem_cache, obj);
-	else
-		kfree(obj);
+	kfree(obj);
 }
 
 static void __free_rcu(struct rcu_head *head)
@@ -356,7 +346,7 @@ static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
 	alloc_bulk(c, c->unit_size <= 256 ? 4 : 1, cpu_to_node(cpu));
 }
 
-/* When size != 0 create kmem_cache and bpf_mem_cache for each cpu.
+/* When size != 0 bpf_mem_cache for each cpu.
  * This is typical bpf hash map use case when all elements have equal size.
  *
  * When size == 0 allocate 11 bpf_mem_cache-s for each cpu, then rely on
@@ -368,40 +358,29 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 	static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};
 	struct bpf_mem_caches *cc, __percpu *pcc;
 	struct bpf_mem_cache *c, __percpu *pc;
-	struct kmem_cache *kmem_cache = NULL;
 	struct obj_cgroup *objcg = NULL;
-	char buf[32];
-	int cpu, i, unit_size;
+	int cpu, i, unit_size, percpu_size = 0;
 
 	if (size) {
 		pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
 		if (!pc)
 			return -ENOMEM;
 
-		if (percpu) {
-			unit_size = size;
+		if (percpu)
 			/* room for llist_node and per-cpu pointer */
-			size = LLIST_NODE_SZ + sizeof(void *);
-		} else {
+			percpu_size = LLIST_NODE_SZ + sizeof(void *);
+		else
 			size += LLIST_NODE_SZ; /* room for llist_node */
-			unit_size = size;
-		}
+		unit_size = size;
 
-		snprintf(buf, sizeof(buf), "bpf-%u", size);
-		kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL);
-		if (!kmem_cache) {
-			free_percpu(pc);
-			return -ENOMEM;
-		}
 #ifdef CONFIG_MEMCG_KMEM
 		objcg = get_obj_cgroup_from_current();
 #endif
 		for_each_possible_cpu(cpu) {
 			c = per_cpu_ptr(pc, cpu);
-			c->kmem_cache = kmem_cache;
 			c->unit_size = unit_size;
 			c->objcg = objcg;
-			c->percpu = percpu;
+			c->percpu_size = percpu_size;
 			prefill_mem_cache(c, cpu);
 		}
 		ma->cache = pc;
@@ -461,8 +440,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 			c = per_cpu_ptr(ma->cache, cpu);
 			drain_mem_cache(c);
 		}
-		/* kmem_cache and memcg are the same across cpus */
-		kmem_cache_destroy(c->kmem_cache);
+		/* objcg is the same across cpus */
 		if (c->objcg)
 			obj_cgroup_put(c->objcg);
 		/* c->waiting_for_gp list was drained, but __free_rcu might

From patchwork Fri Sep 2 21:10:58 2022
X-Patchwork-Id: 12964694
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v6 bpf-next 16/16] bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc.
Date: Fri, 2 Sep 2022 14:10:58 -0700
Message-Id: <20220902211058.60789-17-alexei.starovoitov@gmail.com>
In-Reply-To: <20220902211058.60789-1-alexei.starovoitov@gmail.com>
References: <20220902211058.60789-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

User space might be creating and destroying a lot of hash maps. Synchronous
rcu_barrier-s in the destruction path of a hash map delay freeing of hash
buckets and other map memory and may cause an artificial OOM situation
under stress.
Optimize rcu_barrier usage between bpf hash map and bpf_mem_alloc:
- remove rcu_barrier from hash map, since htab doesn't use call_rcu
  directly and there are no callbacks to wait for.
- bpf_mem_alloc has a call_rcu_in_progress flag that indicates pending
  callbacks. Use it to avoid barriers in the fast path.
- When barriers are needed, copy bpf_mem_alloc into a temp structure
  and wait for rcu barrier-s in the worker to let the rest of
  the hash map freeing proceed.

Signed-off-by: Alexei Starovoitov
---
 include/linux/bpf_mem_alloc.h |  2 +
 kernel/bpf/hashtab.c          |  6 +--
 kernel/bpf/memalloc.c         | 80 ++++++++++++++++++++++++++++-------
 3 files changed, 69 insertions(+), 19 deletions(-)

diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
index 653ed1584a03..3e164b8efaa9 100644
--- a/include/linux/bpf_mem_alloc.h
+++ b/include/linux/bpf_mem_alloc.h
@@ -3,6 +3,7 @@
 #ifndef _BPF_MEM_ALLOC_H
 #define _BPF_MEM_ALLOC_H
 #include
+#include
 
 struct bpf_mem_cache;
 struct bpf_mem_caches;
@@ -10,6 +11,7 @@ struct bpf_mem_caches;
 struct bpf_mem_alloc {
 	struct bpf_mem_caches __percpu *caches;
 	struct bpf_mem_cache __percpu *cache;
+	struct work_struct work;
 };
 
 int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index a77b9c4a4e48..0fe3f136cbbe 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1546,10 +1546,10 @@ static void htab_map_free(struct bpf_map *map)
 	 * There is no need to synchronize_rcu() here to protect map elements.
 	 */
 
-	/* some of free_htab_elem() callbacks for elements of this map may
-	 * not have executed. Wait for them.
+	/* htab no longer uses call_rcu() directly. bpf_mem_alloc does it
+	 * underneath and is responsible for waiting for callbacks to finish
+	 * during bpf_mem_alloc_destroy().
 	 */
-	rcu_barrier();
 	if (!htab_is_prealloc(htab)) {
 		delete_all_elements(htab);
 	} else {
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 38fbd15c130a..5cc952da7d41 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -414,10 +414,9 @@ static void drain_mem_cache(struct bpf_mem_cache *c)
 {
 	struct llist_node *llnode, *t;
 
-	/* The caller has done rcu_barrier() and no progs are using this
-	 * bpf_mem_cache, but htab_map_free() called bpf_mem_cache_free() for
-	 * all remaining elements and they can be in free_by_rcu or in
-	 * waiting_for_gp lists, so drain those lists now.
+	/* No progs are using this bpf_mem_cache, but htab_map_free() called
+	 * bpf_mem_cache_free() for all remaining elements and they can be in
+	 * free_by_rcu or in waiting_for_gp lists, so drain those lists now.
 	 */
 	llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu))
 		free_one(c, llnode);
@@ -429,42 +428,91 @@ static void drain_mem_cache(struct bpf_mem_cache *c)
 		free_one(c, llnode);
 }
 
+static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
+{
+	free_percpu(ma->cache);
+	free_percpu(ma->caches);
+	ma->cache = NULL;
+	ma->caches = NULL;
+}
+
+static void free_mem_alloc(struct bpf_mem_alloc *ma)
+{
+	/* waiting_for_gp lists were drained, but __free_rcu might
+	 * still execute. Wait for it now before freeing percpu caches.
+	 */
+	rcu_barrier_tasks_trace();
+	rcu_barrier();
+	free_mem_alloc_no_barrier(ma);
+}
+
+static void free_mem_alloc_deferred(struct work_struct *work)
+{
+	struct bpf_mem_alloc *ma = container_of(work, struct bpf_mem_alloc, work);
+
+	free_mem_alloc(ma);
+	kfree(ma);
+}
+
+static void destroy_mem_alloc(struct bpf_mem_alloc *ma, int rcu_in_progress)
+{
+	struct bpf_mem_alloc *copy;
+
+	if (!rcu_in_progress) {
+		/* Fast path. No callbacks are pending, hence no need to do
+		 * rcu_barrier-s.
+		 */
+		free_mem_alloc_no_barrier(ma);
+		return;
+	}
+
+	copy = kmalloc(sizeof(*ma), GFP_KERNEL);
+	if (!copy) {
+		/* Slow path with inline barrier-s */
+		free_mem_alloc(ma);
+		return;
+	}
+
+	/* Defer barriers into worker to let the rest of map memory be freed */
+	copy->cache = ma->cache;
+	ma->cache = NULL;
+	copy->caches = ma->caches;
+	ma->caches = NULL;
+	INIT_WORK(&copy->work, free_mem_alloc_deferred);
+	queue_work(system_unbound_wq, &copy->work);
+}
+
 void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 {
 	struct bpf_mem_caches *cc;
 	struct bpf_mem_cache *c;
-	int cpu, i;
+	int cpu, i, rcu_in_progress;
 
 	if (ma->cache) {
+		rcu_in_progress = 0;
 		for_each_possible_cpu(cpu) {
 			c = per_cpu_ptr(ma->cache, cpu);
 			drain_mem_cache(c);
+			rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
 		}
 		/* objcg is the same across cpus */
 		if (c->objcg)
 			obj_cgroup_put(c->objcg);
-		/* c->waiting_for_gp list was drained, but __free_rcu might
-		 * still execute. Wait for it now before we free 'c'.
-		 */
-		rcu_barrier_tasks_trace();
-		rcu_barrier();
-		free_percpu(ma->cache);
-		ma->cache = NULL;
+		destroy_mem_alloc(ma, rcu_in_progress);
 	}
 	if (ma->caches) {
+		rcu_in_progress = 0;
 		for_each_possible_cpu(cpu) {
 			cc = per_cpu_ptr(ma->caches, cpu);
 			for (i = 0; i < NUM_CACHES; i++) {
 				c = &cc->cache[i];
 				drain_mem_cache(c);
+				rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
 			}
 		}
 		if (c->objcg)
 			obj_cgroup_put(c->objcg);
-		rcu_barrier_tasks_trace();
-		rcu_barrier();
-		free_percpu(ma->caches);
-		ma->caches = NULL;
+		destroy_mem_alloc(ma, rcu_in_progress);
 	}
 }
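The decision made by destroy_mem_alloc() can be shown outside the kernel. The fast path frees immediately when nothing is pending. Otherwise ownership moves to a heap copy whose barrier and free run later, and an inline barrier is the fallback if the copy cannot be allocated. What follows is a minimal userspace sketch under stated assumptions, not kernel code:

- struct mem_alloc, wait_for_callbacks(), destroy_alloc(), run_deferred_work() and sum_in_progress() are all hypothetical names.
- A plain int stands in for call_rcu_in_progress.
- wait_for_callbacks() stands in for rcu_barrier() by simply pretending every pending callback has run.
- A single-threaded linked list stands in for queue_work() on system_unbound_wq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bpf_mem_alloc: two buffers that pending
 * asynchronous callbacks may still touch until a barrier has completed. */
struct mem_alloc {
	void *cache;
	void *caches;
	struct mem_alloc *next_work;	/* link for the toy work queue below */
};

static int pending_callbacks;		/* stand-in for call_rcu_in_progress */
static int barriers_waited;		/* how many times a barrier was paid for */
static struct mem_alloc *work_head;	/* toy single-threaded work queue */

/* Mirrors the per-CPU loop in bpf_mem_alloc_destroy(): sum every flag. */
static int sum_in_progress(const int *flags, size_t n)
{
	int total = 0;

	for (size_t i = 0; i < n; i++)
		total += flags[i];
	return total;
}

/* Hypothetical stand-in for rcu_barrier(): pretend all callbacks have run. */
static void wait_for_callbacks(void)
{
	pending_callbacks = 0;
	barriers_waited++;
}

static void free_no_barrier(struct mem_alloc *ma)
{
	free(ma->cache);
	free(ma->caches);
	ma->cache = NULL;
	ma->caches = NULL;
}

static void free_with_barrier(struct mem_alloc *ma)
{
	wait_for_callbacks();
	free_no_barrier(ma);
}

/* Same shape as destroy_mem_alloc(): fast path, inline slow path, or defer. */
static void destroy_alloc(struct mem_alloc *ma, int in_progress)
{
	struct mem_alloc *copy;

	assert(in_progress >= 0);
	if (!in_progress) {
		free_no_barrier(ma);	/* fast path: no barrier needed */
		return;
	}
	copy = malloc(sizeof(*copy));
	if (!copy) {
		free_with_barrier(ma);	/* slow path: barrier paid inline */
		return;
	}
	copy->cache = ma->cache;	/* ownership moves to the copy */
	copy->caches = ma->caches;
	ma->cache = NULL;		/* caller can no longer free these */
	ma->caches = NULL;
	copy->next_work = work_head;	/* stand-in for queue_work() */
	work_head = copy;
}

/* Stand-in for the worker running free_mem_alloc_deferred(). */
static void run_deferred_work(void)
{
	while (work_head) {
		struct mem_alloc *copy = work_head;

		work_head = copy->next_work;
		free_with_barrier(copy);
		free(copy);
	}
}
```

When the summed count is zero, the buffers are freed at once and no barrier is paid. When it is non-zero, destroy_alloc() returns without waiting and the barrier cost moves to the deferred work. As in the patch, the original fields are set to NULL after the hand-off, so the caller cannot free the same buffers a second time.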