From patchwork Thu Sep 1 16:15:33 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12962884
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v5 bpf-next 01/15] bpf: Introduce any context BPF specific memory allocator.
Date: Thu, 1 Sep 2022 09:15:33 -0700
Message-Id: <20220901161547.57722-2-alexei.starovoitov@gmail.com>
In-Reply-To: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
References: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Tracing BPF programs can attach to kprobe and fentry. Hence they run in
unknown context where calling plain kmalloc() might not be safe.

Front-end kmalloc() with minimal per-cpu cache of free elements. Refill
this cache asynchronously from irq_work.

BPF programs always run with migration disabled. It's safe to allocate
from cache of the current cpu with irqs disabled. Free-ing is always
done into bucket of the current cpu as well. irq_work trims extra free
elements from buckets with kfree and refills them with kmalloc, so
global kmalloc logic takes care of freeing objects allocated by one cpu
and freed on another.

struct bpf_mem_alloc supports two modes:
- When size != 0 create kmem_cache and bpf_mem_cache for each cpu.
  This is typical bpf hash map use case when all elements have equal size.
- When size == 0 allocate 11 bpf_mem_cache-s for each cpu, then rely on
  kmalloc/kfree. Max allocation size is 4096 in this case.
  This is bpf_dynptr and bpf_kptr use case.

bpf_mem_alloc/bpf_mem_free are bpf specific 'wrappers' of kmalloc/kfree.
bpf_mem_cache_alloc/bpf_mem_cache_free are 'wrappers' of
kmem_cache_alloc/kmem_cache_free.

The allocators are NMI-safe from bpf programs only. They are not
NMI-safe in general.
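The cache policy described above (pop from a per-cpu free list, push back on free, refill or trim asynchronously between a low and a high watermark) can be modeled outside the kernel. The following is a minimal single-threaded userspace sketch, not the kernel API: the `cache`, `push`, `pop` and `refill` names are illustrative, and the irq_work/`active`-counter protection that makes the real allocator any-context safe is deliberately omitted.

```c
#include <assert.h>
#include <stdlib.h>

#define LOW_WATERMARK 32
#define HIGH_WATERMARK 96
#define BATCH 48

struct node { struct node *next; };

struct cache {
	struct node *free_list;
	int free_cnt;
};

/* Models unit_free(): return an object to the current cache's free list. */
static void push(struct cache *c, struct node *n)
{
	n->next = c->free_list;
	c->free_list = n;
	c->free_cnt++;
}

/* Models unit_alloc(): take an object from the free list, NULL if empty. */
static struct node *pop(struct cache *c)
{
	struct node *n = c->free_list;

	if (n) {
		c->free_list = n->next;
		c->free_cnt--;
	}
	return n;
}

/* Models bpf_mem_refill(): top up below the low watermark with a batch
 * of allocations, trim above the high watermark back to the midpoint,
 * so the global allocator reclaims the surplus.
 */
static void refill(struct cache *c)
{
	if (c->free_cnt < LOW_WATERMARK) {
		for (int i = 0; i < BATCH; i++)
			push(c, malloc(sizeof(struct node)));
	} else if (c->free_cnt > HIGH_WATERMARK) {
		while (c->free_cnt > (HIGH_WATERMARK + LOW_WATERMARK) / 2)
			free(pop(c));
	}
}
```

In the kernel the `refill` step runs from irq_work raised by `unit_alloc`/`unit_free`, which is what lets the fast path stay lockless and NMI-safe for bpf programs.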
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 include/linux/bpf_mem_alloc.h |  26 ++
 kernel/bpf/Makefile           |   2 +-
 kernel/bpf/memalloc.c         | 480 ++++++++++++++++++++++++++++++++++
 3 files changed, 507 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/bpf_mem_alloc.h
 create mode 100644 kernel/bpf/memalloc.c

diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
new file mode 100644
index 000000000000..804733070f8d
--- /dev/null
+++ b/include/linux/bpf_mem_alloc.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#ifndef _BPF_MEM_ALLOC_H
+#define _BPF_MEM_ALLOC_H
+#include
+
+struct bpf_mem_cache;
+struct bpf_mem_caches;
+
+struct bpf_mem_alloc {
+	struct bpf_mem_caches __percpu *caches;
+	struct bpf_mem_cache __percpu *cache;
+};
+
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size);
+void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
+
+/* kmalloc/kfree equivalent: */
+void *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size);
+void bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr);
+
+/* kmem_cache_alloc/free equivalent: */
+void *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma);
+void bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr);
+
+#endif /* _BPF_MEM_ALLOC_H */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 00e05b69a4df..341c94f208f4 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -13,7 +13,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
 obj-$(CONFIG_BPF_JIT) += trampoline.o
-obj-$(CONFIG_BPF_SYSCALL) += btf.o
+obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
 obj-$(CONFIG_BPF_JIT) += dispatcher.o
 ifeq ($(CONFIG_NET),y)
 obj-$(CONFIG_BPF_SYSCALL) += devmap.o
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
new file mode 100644
index 000000000000..1c46763d855e
--- /dev/null
+++ b/kernel/bpf/memalloc.c
@@ -0,0 +1,480 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/* Any context (including NMI) BPF specific memory allocator.
+ *
+ * Tracing BPF programs can attach to kprobe and fentry. Hence they
+ * run in unknown context where calling plain kmalloc() might not be safe.
+ *
+ * Front-end kmalloc() with per-cpu per-bucket cache of free elements.
+ * Refill this cache asynchronously from irq_work.
+ *
+ *   CPU_0 buckets
+ *   16 32 64 96 128 196 256 512 1024 2048 4096
+ *   ...
+ *   CPU_N buckets
+ *   16 32 64 96 128 196 256 512 1024 2048 4096
+ *
+ * The buckets are prefilled at the start.
+ * BPF programs always run with migration disabled.
+ * It's safe to allocate from cache of the current cpu with irqs disabled.
+ * Free-ing is always done into bucket of the current cpu as well.
+ * irq_work trims extra free elements from buckets with kfree
+ * and refills them with kmalloc, so global kmalloc logic takes care
+ * of freeing objects allocated by one cpu and freed on another.
+ *
+ * Every allocated object is padded with extra 8 bytes that contains
+ * struct llist_node.
+ */
+#define LLIST_NODE_SZ sizeof(struct llist_node)
+
+/* similar to kmalloc, but sizeof == 8 bucket is gone */
+static u8 size_index[24] __ro_after_init = {
+	3,	/* 8 */
+	3,	/* 16 */
+	4,	/* 24 */
+	4,	/* 32 */
+	5,	/* 40 */
+	5,	/* 48 */
+	5,	/* 56 */
+	5,	/* 64 */
+	1,	/* 72 */
+	1,	/* 80 */
+	1,	/* 88 */
+	1,	/* 96 */
+	6,	/* 104 */
+	6,	/* 112 */
+	6,	/* 120 */
+	6,	/* 128 */
+	2,	/* 136 */
+	2,	/* 144 */
+	2,	/* 152 */
+	2,	/* 160 */
+	2,	/* 168 */
+	2,	/* 176 */
+	2,	/* 184 */
+	2	/* 192 */
+};
+
+static int bpf_mem_cache_idx(size_t size)
+{
+	if (!size || size > 4096)
+		return -1;
+
+	if (size <= 192)
+		return size_index[(size - 1) / 8] - 1;
+
+	return fls(size - 1) - 2;
+}
+
+#define NUM_CACHES 11
+
+struct bpf_mem_cache {
+	/* per-cpu list of free objects of size 'unit_size'.
+	 * All accesses are done with interrupts disabled and 'active' counter
+	 * protection with __llist_add() and __llist_del_first().
+	 */
+	struct llist_head free_llist;
+	local_t active;
+
+	/* Operations on the free_list from unit_alloc/unit_free/bpf_mem_refill
+	 * are sequenced by per-cpu 'active' counter. But unit_free() cannot
+	 * fail. When 'active' is busy the unit_free() will add an object to
+	 * free_llist_extra.
+	 */
+	struct llist_head free_llist_extra;
+
+	/* kmem_cache != NULL when bpf_mem_alloc was created for specific
+	 * element size.
+	 */
+	struct kmem_cache *kmem_cache;
+	struct irq_work refill_work;
+	struct obj_cgroup *objcg;
+	int unit_size;
+	/* count of objects in free_llist */
+	int free_cnt;
+};
+
+struct bpf_mem_caches {
+	struct bpf_mem_cache cache[NUM_CACHES];
+};
+
+static struct llist_node notrace *__llist_del_first(struct llist_head *head)
+{
+	struct llist_node *entry, *next;
+
+	entry = head->first;
+	if (!entry)
+		return NULL;
+	next = entry->next;
+	head->first = next;
+	return entry;
+}
+
+#define BATCH 48
+#define LOW_WATERMARK 32
+#define HIGH_WATERMARK 96
+/* Assuming the average number of elements per bucket is 64, when all buckets
+ * are used the total memory will be: 64*16*32 + 64*32*32 + 64*64*32 + ... +
+ * 64*4096*32 ~ 20Mbyte
+ */
+
+static void *__alloc(struct bpf_mem_cache *c, int node)
+{
+	/* Allocate, but don't deplete atomic reserves that typical
+	 * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
+	 * will allocate from the current numa node which is what we
+	 * want here.
+	 */
+	gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT;
+
+	if (c->kmem_cache)
+		return kmem_cache_alloc_node(c->kmem_cache, flags, node);
+
+	return kmalloc_node(c->unit_size, flags, node);
+}
+
+static struct mem_cgroup *get_memcg(const struct bpf_mem_cache *c)
+{
+#ifdef CONFIG_MEMCG_KMEM
+	if (c->objcg)
+		return get_mem_cgroup_from_objcg(c->objcg);
+#endif
+
+#ifdef CONFIG_MEMCG
+	return root_mem_cgroup;
+#else
+	return NULL;
+#endif
+}
+
+/* Mostly runs from irq_work except __init phase.
+ */
+static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
+{
+	struct mem_cgroup *memcg = NULL, *old_memcg;
+	unsigned long flags;
+	void *obj;
+	int i;
+
+	memcg = get_memcg(c);
+	old_memcg = set_active_memcg(memcg);
+	for (i = 0; i < cnt; i++) {
+		obj = __alloc(c, node);
+		if (!obj)
+			break;
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			/* In RT irq_work runs in per-cpu kthread, so disable
+			 * interrupts to avoid preemption and interrupts and
+			 * reduce the chance of bpf prog executing on this cpu
+			 * when active counter is busy.
+			 */
+			local_irq_save(flags);
+		/* alloc_bulk runs from irq_work which will not preempt a bpf
+		 * program that does unit_alloc/unit_free since IRQs are
+		 * disabled there. There is no race to increment 'active'
+		 * counter. It protects free_llist from corruption in case NMI
+		 * bpf prog preempted this loop.
+		 */
+		WARN_ON_ONCE(local_inc_return(&c->active) != 1);
+		__llist_add(obj, &c->free_llist);
+		c->free_cnt++;
+		local_dec(&c->active);
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_restore(flags);
+	}
+	set_active_memcg(old_memcg);
+	mem_cgroup_put(memcg);
+}
+
+static void free_one(struct bpf_mem_cache *c, void *obj)
+{
+	if (c->kmem_cache)
+		kmem_cache_free(c->kmem_cache, obj);
+	else
+		kfree(obj);
+}
+
+static void free_bulk(struct bpf_mem_cache *c)
+{
+	struct llist_node *llnode, *t;
+	unsigned long flags;
+	int cnt;
+
+	do {
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_save(flags);
+		WARN_ON_ONCE(local_inc_return(&c->active) != 1);
+		llnode = __llist_del_first(&c->free_llist);
+		if (llnode)
+			cnt = --c->free_cnt;
+		else
+			cnt = 0;
+		local_dec(&c->active);
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_restore(flags);
+		free_one(c, llnode);
+	} while (cnt > (HIGH_WATERMARK + LOW_WATERMARK) / 2);
+
+	/* and drain free_llist_extra */
+	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra))
+		free_one(c, llnode);
+}
+
+static void bpf_mem_refill(struct irq_work *work)
+{
+	struct bpf_mem_cache *c = container_of(work, struct bpf_mem_cache, refill_work);
+	int cnt;
+
+	/* Racy access to free_cnt. It doesn't need to be 100% accurate */
+	cnt = c->free_cnt;
+	if (cnt < LOW_WATERMARK)
+		/* irq_work runs on this cpu and kmalloc will allocate
+		 * from the current numa node which is what we want here.
+		 */
+		alloc_bulk(c, BATCH, NUMA_NO_NODE);
+	else if (cnt > HIGH_WATERMARK)
+		free_bulk(c);
+}
+
+static void notrace irq_work_raise(struct bpf_mem_cache *c)
+{
+	irq_work_queue(&c->refill_work);
+}
+
+static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
+{
+	init_irq_work(&c->refill_work, bpf_mem_refill);
+	/* To avoid consuming memory assume that 1st run of bpf
+	 * prog won't be doing more than 4 map_update_elem from
+	 * irq disabled region
+	 */
+	alloc_bulk(c, c->unit_size <= 256 ? 4 : 1, cpu_to_node(cpu));
+}
+
+/* When size != 0 create kmem_cache and bpf_mem_cache for each cpu.
+ * This is typical bpf hash map use case when all elements have equal size.
+ *
+ * When size == 0 allocate 11 bpf_mem_cache-s for each cpu, then rely on
+ * kmalloc/kfree. Max allocation size is 4096 in this case.
+ * This is bpf_dynptr and bpf_kptr use case.
+ */
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
+{
+	static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};
+	struct bpf_mem_caches *cc, __percpu *pcc;
+	struct bpf_mem_cache *c, __percpu *pc;
+	struct kmem_cache *kmem_cache;
+	struct obj_cgroup *objcg = NULL;
+	char buf[32];
+	int cpu, i;
+
+	if (size) {
+		pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
+		if (!pc)
+			return -ENOMEM;
+		size += LLIST_NODE_SZ; /* room for llist_node */
+		snprintf(buf, sizeof(buf), "bpf-%u", size);
+		kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL);
+		if (!kmem_cache) {
+			free_percpu(pc);
+			return -ENOMEM;
+		}
+#ifdef CONFIG_MEMCG_KMEM
+		objcg = get_obj_cgroup_from_current();
+#endif
+		for_each_possible_cpu(cpu) {
+			c = per_cpu_ptr(pc, cpu);
+			c->kmem_cache = kmem_cache;
+			c->unit_size = size;
+			c->objcg = objcg;
+			prefill_mem_cache(c, cpu);
+		}
+		ma->cache = pc;
+		return 0;
+	}
+
+	pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL);
+	if (!pcc)
+		return -ENOMEM;
+#ifdef CONFIG_MEMCG_KMEM
+	objcg = get_obj_cgroup_from_current();
+#endif
+	for_each_possible_cpu(cpu) {
+		cc = per_cpu_ptr(pcc, cpu);
+		for (i = 0; i < NUM_CACHES; i++) {
+			c = &cc->cache[i];
+			c->unit_size = sizes[i];
+			c->objcg = objcg;
+			prefill_mem_cache(c, cpu);
+		}
+	}
+	ma->caches = pcc;
+	return 0;
+}
+
+static void drain_mem_cache(struct bpf_mem_cache *c)
+{
+	struct llist_node *llnode, *t;
+
+	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist))
+		free_one(c, llnode);
+	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra))
+		free_one(c, llnode);
+}
+
+void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
+{
+	struct bpf_mem_caches *cc;
+	struct bpf_mem_cache *c;
+	int cpu, i;
+
+	if (ma->cache) {
+		for_each_possible_cpu(cpu) {
+			c = per_cpu_ptr(ma->cache, cpu);
+			drain_mem_cache(c);
+		}
+		/* kmem_cache and memcg are the same across cpus */
+		kmem_cache_destroy(c->kmem_cache);
+		if (c->objcg)
+			obj_cgroup_put(c->objcg);
+		free_percpu(ma->cache);
+		ma->cache = NULL;
+	}
+	if (ma->caches) {
+		for_each_possible_cpu(cpu) {
+			cc = per_cpu_ptr(ma->caches, cpu);
+			for (i = 0; i < NUM_CACHES; i++) {
+				c = &cc->cache[i];
+				drain_mem_cache(c);
+			}
+		}
+		if (c->objcg)
+			obj_cgroup_put(c->objcg);
+		free_percpu(ma->caches);
+		ma->caches = NULL;
+	}
+}
+
+/* notrace is necessary here and in other functions to make sure
+ * bpf programs cannot attach to them and cause llist corruptions.
+ */
+static void notrace *unit_alloc(struct bpf_mem_cache *c)
+{
+	struct llist_node *llnode = NULL;
+	unsigned long flags;
+	int cnt = 0;
+
+	/* Disable irqs to prevent the following race for majority of prog types:
+	 * prog_A
+	 *   bpf_mem_alloc
+	 *      preemption or irq -> prog_B
+	 *        bpf_mem_alloc
+	 *
+	 * but prog_B could be a perf_event NMI prog.
+	 * Use per-cpu 'active' counter to order free_list access between
+	 * unit_alloc/unit_free/bpf_mem_refill.
+	 */
+	local_irq_save(flags);
+	if (local_inc_return(&c->active) == 1) {
+		llnode = __llist_del_first(&c->free_llist);
+		if (llnode)
+			cnt = --c->free_cnt;
+	}
+	local_dec(&c->active);
+	local_irq_restore(flags);
+
+	WARN_ON(cnt < 0);
+
+	if (cnt < LOW_WATERMARK)
+		irq_work_raise(c);
+	return llnode;
+}
+
+/* Though 'ptr' object could have been allocated on a different cpu
+ * add it to the free_llist of the current cpu.
+ * Let kfree() logic deal with it when it's later called from irq_work.
+ */
+static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
+{
+	struct llist_node *llnode = ptr - LLIST_NODE_SZ;
+	unsigned long flags;
+	int cnt = 0;
+
+	BUILD_BUG_ON(LLIST_NODE_SZ > 8);
+
+	local_irq_save(flags);
+	if (local_inc_return(&c->active) == 1) {
+		__llist_add(llnode, &c->free_llist);
+		cnt = ++c->free_cnt;
+	} else {
+		/* unit_free() cannot fail. Therefore add an object to atomic
+		 * llist. free_bulk() will drain it. Though free_llist_extra is
+		 * a per-cpu list we have to use atomic llist_add here, since
+		 * it also can be interrupted by bpf nmi prog that does another
+		 * unit_free() into the same free_llist_extra.
+		 */
+		llist_add(llnode, &c->free_llist_extra);
+	}
+	local_dec(&c->active);
+	local_irq_restore(flags);
+
+	if (cnt > HIGH_WATERMARK)
+		/* free few objects from current cpu into global kmalloc pool */
+		irq_work_raise(c);
+}
+
+/* Called from BPF program or from sys_bpf syscall.
+ * In both cases migration is disabled.
+ */
+void notrace *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size)
+{
+	int idx;
+	void *ret;
+
+	if (!size)
+		return ZERO_SIZE_PTR;
+
+	idx = bpf_mem_cache_idx(size + LLIST_NODE_SZ);
+	if (idx < 0)
+		return NULL;
+
+	ret = unit_alloc(this_cpu_ptr(ma->caches)->cache + idx);
+	return !ret ? NULL : ret + LLIST_NODE_SZ;
+}
+
+void notrace bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr)
+{
+	int idx;
+
+	if (!ptr)
+		return;
+
+	idx = bpf_mem_cache_idx(__ksize(ptr - LLIST_NODE_SZ));
+	if (idx < 0)
+		return;
+
+	unit_free(this_cpu_ptr(ma->caches)->cache + idx, ptr);
+}
+
+void notrace *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma)
+{
+	void *ret;
+
+	ret = unit_alloc(this_cpu_ptr(ma->cache));
+	return !ret ? NULL : ret + LLIST_NODE_SZ;
+}
+
+void notrace bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr)
+{
+	if (!ptr)
+		return;
+
+	unit_free(this_cpu_ptr(ma->cache), ptr);
+}

From patchwork Thu Sep 1 16:15:34 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12962885
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v5 bpf-next 02/15] bpf: Convert hash map to bpf_mem_alloc.
Date: Thu, 1 Sep 2022 09:15:34 -0700
Message-Id: <20220901161547.57722-3-alexei.starovoitov@gmail.com>
In-Reply-To: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
References: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexei Starovoitov Convert bpf hash map to use bpf memory allocator. Acked-by: Kumar Kartikeya Dwivedi Acked-by: Andrii Nakryiko Signed-off-by: Alexei Starovoitov --- kernel/bpf/hashtab.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index eb1263f03e9b..508e64351f87 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -14,6 +14,7 @@ #include "percpu_freelist.h" #include "bpf_lru_list.h" #include "map_in_map.h" +#include #define HTAB_CREATE_FLAG_MASK \ (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \ @@ -92,6 +93,7 @@ struct bucket { struct bpf_htab { struct bpf_map map; + struct bpf_mem_alloc ma; struct bucket *buckets; void *elems; union { @@ -576,6 +578,10 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) if (err) goto free_prealloc; } + } else { + err = bpf_mem_alloc_init(&htab->ma, htab->elem_size); + if (err) + goto free_map_locked; } return &htab->map; @@ -586,6 +592,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) free_percpu(htab->map_locked[i]); bpf_map_area_free(htab->buckets); + bpf_mem_alloc_destroy(&htab->ma); free_htab: lockdep_unregister_key(&htab->lockdep_key); bpf_map_area_free(htab); @@ -862,7 +869,7 @@ static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l) if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH) free_percpu(htab_elem_get_ptr(l, htab->map.key_size)); check_and_free_fields(htab, l); - kfree(l); + bpf_mem_cache_free(&htab->ma, l); } static void htab_elem_free_rcu(struct rcu_head *head) @@ -986,9 +993,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key, l_new = ERR_PTR(-E2BIG); goto dec_count; } - l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size, - GFP_NOWAIT | __GFP_NOWARN, - 
htab->map.numa_node); + l_new = bpf_mem_cache_alloc(&htab->ma); if (!l_new) { l_new = ERR_PTR(-ENOMEM); goto dec_count; @@ -1007,7 +1012,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key, pptr = bpf_map_alloc_percpu(&htab->map, size, 8, GFP_NOWAIT | __GFP_NOWARN); if (!pptr) { - kfree(l_new); + bpf_mem_cache_free(&htab->ma, l_new); l_new = ERR_PTR(-ENOMEM); goto dec_count; } @@ -1429,6 +1434,10 @@ static void delete_all_elements(struct bpf_htab *htab) { int i; + /* It's called from a worker thread, so disable migration here, + * since bpf_mem_cache_free() relies on that. + */ + migrate_disable(); for (i = 0; i < htab->n_buckets; i++) { struct hlist_nulls_head *head = select_bucket(htab, i); struct hlist_nulls_node *n; @@ -1439,6 +1448,7 @@ static void delete_all_elements(struct bpf_htab *htab) htab_elem_free(htab, l); } } + migrate_enable(); } static void htab_free_malloced_timers(struct bpf_htab *htab) @@ -1502,6 +1512,7 @@ static void htab_map_free(struct bpf_map *map) bpf_map_free_kptr_off_tab(map); free_percpu(htab->extra_elems); bpf_map_area_free(htab->buckets); + bpf_mem_alloc_destroy(&htab->ma); for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) free_percpu(htab->map_locked[i]); lockdep_unregister_key(&htab->lockdep_key); From patchwork Thu Sep 1 16:15:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexei Starovoitov X-Patchwork-Id: 12962886 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4CD2CECAAD1 for ; Thu, 1 Sep 2022 16:16:05 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C6FBC6B009A; Thu, 1 Sep 2022 12:16:04 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BF8D8940009; Thu, 1 Sep 2022 12:16:04 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org 
Received: by kanga.kvack.org (Postfix, from userid 63042) id 9D56C940008; Thu, 1 Sep 2022 12:16:04 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 856DE6B009A for ; Thu, 1 Sep 2022 12:16:04 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 4CE4BC0E28 for ; Thu, 1 Sep 2022 16:16:04 +0000 (UTC) X-FDA: 79864018248.04.C8845FF Received: from mail-pf1-f174.google.com (mail-pf1-f174.google.com [209.85.210.174]) by imf23.hostedemail.com (Postfix) with ESMTP id CC661140058 for ; Thu, 1 Sep 2022 16:16:03 +0000 (UTC) Received: by mail-pf1-f174.google.com with SMTP id 76so17923262pfy.3 for ; Thu, 01 Sep 2022 09:16:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc; bh=skeuYRhB3boXKMMrYYkvu6yydL2u1ik1iHlAVHEk0Qs=; b=pHv8i1ZOlSaTxKI78hoU2nrPIVXS6hK6WPU320GqacH651lznRVcZAS/qQSVoiQYzm NgXqumhbrKjmKfObsslv/8eTrW5PRxtCiOJ9Tyjs9XGLvHcilptnUqZbp3eQHHbNsr33 At1rXQnY7x+y2K4J7bS4lpPKnFYxH7nQaICbEs46HRMd6SvMyK4QHbueG4amio76/RqL CVC7ErstjJp2eB5zhn1Wip3P43R2033K9tiUYrna01Fewq2X6PMs2fAZoaVGPp6ArzPs x51DsuzZUqcpGGfzDk36tHqwvPtgDfAXKXw9Tw8GB9xuWkm3tzx4G9ae8H4kkEKAAxzw TDbg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc; bh=skeuYRhB3boXKMMrYYkvu6yydL2u1ik1iHlAVHEk0Qs=; b=7/QF7oWYBNfhQWL+yJHHhHqt1L53A7w35fQNV1MrBvg/YgJ+VZP+4tFLFd3qDPLrOl MGXelICwe4Rj8szAYWtOWMiI2x/t03m/g+nGImSRJr6/pamg52FRF5ZUkL8S2XA4c2bz sCmVKat4dZ7Ay7mYaUsyc1M0fmBs5DmbsILPOVdLM1W3LUJrzzINL8NJ6/PDQRUt6+i0 2bbEIE2EqpgODOSDFM9E5kQ8iTyRFo9HEoXA9RyD8TdTilYNsam+jwj6dWFk8lkJkbA6 
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
 delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v5 bpf-next 03/15] selftests/bpf: Improve test coverage of
 test_maps
Date: Thu, 1 Sep 2022 09:15:35 -0700
Message-Id: <20220901161547.57722-4-alexei.starovoitov@gmail.com>
In-Reply-To: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
References: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Make test_maps more stressful with more parallelism in
update/delete/lookup/walk including different value sizes.
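The striding scheme the test relies on (worker fn touches keys fn, fn + TASKS, fn + 2*TASKS, ... so no two workers collide on a key) can be sketched without any BPF at all. Everything below is illustrative: a plain int array stands in for the BPF map, `MAP_SIZE` is an arbitrary size chosen for this sketch, and only `TASKS = 100` echoes the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define TASKS    100	/* matches the new TASKS value in the patch */
#define MAP_SIZE 10000	/* arbitrary size for this sketch */

static int fake_map[MAP_SIZE];	/* stands in for a BPF hash map */

/* worker "fn" touches keys fn, fn + TASKS, fn + 2*TASKS, ... so the
 * TASKS workers cover the whole key space with no shared keys
 */
static void *update_task(void *data)
{
	long fn = (long)data;

	for (long i = fn; i < MAP_SIZE; i += TASKS)
		fake_map[i] = (int)i;	/* stand-in for bpf_map_update_elem() */
	return NULL;
}

/* run all workers in parallel, then verify full, exactly-once coverage */
static int fill_in_parallel(void)
{
	pthread_t tid[TASKS];

	memset(fake_map, 0xff, sizeof(fake_map));	/* every slot = -1 */
	for (long fn = 0; fn < TASKS; fn++)
		pthread_create(&tid[fn], NULL, update_task, (void *)fn);
	for (int fn = 0; fn < TASKS; fn++)
		pthread_join(tid[fn], NULL);

	for (int i = 0; i < MAP_SIZE; i++)
		if (fake_map[i] != i)
			return -1;
	return 0;
}
```

Because the per-worker key ranges are disjoint, the workers need no locking among themselves; the real test hammers one shared map fd instead, which is what exercises the map internals under contention.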
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 tools/testing/selftests/bpf/test_maps.c | 38 ++++++++++++++++---------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index cbebfaa7c1e8..d1ffc76814d9 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -264,10 +264,11 @@ static void test_hashmap_percpu(unsigned int task, void *data)
 	close(fd);
 }
 
+#define VALUE_SIZE 3
 static int helper_fill_hashmap(int max_entries)
 {
 	int i, fd, ret;
-	long long key, value;
+	long long key, value[VALUE_SIZE] = {};
 
 	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
 			    max_entries, &map_opts);
@@ -276,8 +277,8 @@ static int helper_fill_hashmap(int max_entries)
 	      "err: %s, flags: 0x%x\n", strerror(errno), map_opts.map_flags);
 
 	for (i = 0; i < max_entries; i++) {
-		key = i; value = key;
-		ret = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST);
+		key = i; value[0] = key;
+		ret = bpf_map_update_elem(fd, &key, value, BPF_NOEXIST);
 		CHECK(ret != 0,
 		      "can't update hashmap",
 		      "err: %s\n", strerror(ret));
@@ -288,8 +289,8 @@ static int helper_fill_hashmap(int max_entries)
 
 static void test_hashmap_walk(unsigned int task, void *data)
 {
-	int fd, i, max_entries = 1000;
-	long long key, value, next_key;
+	int fd, i, max_entries = 10000;
+	long long key, value[VALUE_SIZE], next_key;
 	bool next_key_valid = true;
 
 	fd = helper_fill_hashmap(max_entries);
@@ -297,7 +298,7 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key,
 					 &next_key) == 0; i++) {
 		key = next_key;
-		assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
+		assert(bpf_map_lookup_elem(fd, &key, value) == 0);
 	}
 
 	assert(i == max_entries);
@@ -305,9 +306,9 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	assert(bpf_map_get_next_key(fd, NULL, &key) == 0);
 	for (i = 0; next_key_valid; i++) {
 		next_key_valid = bpf_map_get_next_key(fd, &key, &next_key) == 0;
-		assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
-		value++;
-		assert(bpf_map_update_elem(fd, &key, &value, BPF_EXIST) == 0);
+		assert(bpf_map_lookup_elem(fd, &key, value) == 0);
+		value[0]++;
+		assert(bpf_map_update_elem(fd, &key, value, BPF_EXIST) == 0);
 		key = next_key;
 	}
 
@@ -316,8 +317,8 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key,
 					 &next_key) == 0; i++) {
 		key = next_key;
-		assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
-		assert(value - 1 == key);
+		assert(bpf_map_lookup_elem(fd, &key, value) == 0);
+		assert(value[0] - 1 == key);
 	}
 
 	assert(i == max_entries);
@@ -1371,16 +1372,16 @@ static void __run_parallel(unsigned int tasks,
 
 static void test_map_stress(void)
 {
+	run_parallel(100, test_hashmap_walk, NULL);
 	run_parallel(100, test_hashmap, NULL);
 	run_parallel(100, test_hashmap_percpu, NULL);
 	run_parallel(100, test_hashmap_sizes, NULL);
-	run_parallel(100, test_hashmap_walk, NULL);
 
 	run_parallel(100, test_arraymap, NULL);
 	run_parallel(100, test_arraymap_percpu, NULL);
 }
 
-#define TASKS 1024
+#define TASKS 100
 
 #define DO_UPDATE 1
 #define DO_DELETE 0
@@ -1432,6 +1433,8 @@ static void test_update_delete(unsigned int fn, void *data)
 	int fd = ((int *)data)[0];
 	int i, key, value, err;
 
+	if (fn & 1)
+		test_hashmap_walk(fn, NULL);
 	for (i = fn; i < MAP_SIZE; i += TASKS) {
 		key = value = i;
 
@@ -1455,7 +1458,7 @@ static void test_update_delete(unsigned int fn, void *data)
 
 static void test_map_parallel(void)
 {
-	int i, fd, key = 0, value = 0;
+	int i, fd, key = 0, value = 0, j = 0;
 	int data[2];
 
 	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
@@ -1466,6 +1469,7 @@ static void test_map_parallel(void)
 		exit(1);
 	}
 
+again:
 	/* Use the same fd in children to add elements to this map:
 	 * child_0 adds key=0, key=1024, key=2048, ...
 	 * child_1 adds key=1, key=1025, key=2049, ...
@@ -1502,6 +1506,12 @@ static void test_map_parallel(void)
 	key = -1;
 	assert(bpf_map_get_next_key(fd, NULL, &key) < 0 && errno == ENOENT);
 	assert(bpf_map_get_next_key(fd, &key, &key) < 0 && errno == ENOENT);
+
+	key = 0;
+	bpf_map_delete_elem(fd, &key);
+	if (j++ < 5)
+		goto again;
+
 	close(fd);
 }
 
 static void test_map_rdonly(void)

From patchwork Thu Sep 1 16:15:36 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12962887
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
 delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v5 bpf-next 04/15] samples/bpf: Reduce syscall overhead in
 map_perf_test.
Date: Thu, 1 Sep 2022 09:15:36 -0700
Message-Id: <20220901161547.57722-5-alexei.starovoitov@gmail.com>
In-Reply-To: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
References: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Make map_perf_test for preallocated and non-preallocated hash map
spend more time inside bpf program to focus performance analysis
on the speed of update/lookup/delete operations performed by bpf program.

It makes 'perf report' of bpf_mem_alloc look like:
 11.76%  map_perf_test    [k] _raw_spin_lock_irqsave
 11.26%  map_perf_test    [k] htab_map_update_elem
  9.70%  map_perf_test    [k] _raw_spin_lock
  9.47%  map_perf_test    [k] htab_map_delete_elem
  8.57%  map_perf_test    [k] memcpy_erms
  5.58%  map_perf_test    [k] alloc_htab_elem
  4.09%  map_perf_test    [k] __htab_map_lookup_elem
  3.44%  map_perf_test    [k] syscall_exit_to_user_mode
  3.13%  map_perf_test    [k] lookup_nulls_elem_raw
  3.05%  map_perf_test    [k] migrate_enable
  3.04%  map_perf_test    [k] memcmp
  2.67%  map_perf_test    [k] unit_free
  2.39%  map_perf_test    [k] lookup_elem_raw

Reduce default iteration count as well to make 'map_perf_test'
quick enough even on debug kernels.
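The reason looping ten times inside the program sharpens the profile is simple arithmetic: the fixed per-invocation cost (syscall entry/exit, program setup) is amortized over more map operations. The numbers below are made up for illustration, not measurements; only the batching idea comes from the patch.

```c
#include <assert.h>

/* fraction (in percent, rounded down) of total time spent in fixed
 * per-invocation overhead when k map operations run per invocation
 */
static int overhead_pct(int overhead_ns, int op_ns, int k)
{
	return 100 * overhead_ns / (overhead_ns + k * op_ns);
}
```

With a hypothetical 1000 ns entry/exit cost and 100 ns per map op, one op per invocation spends 90% of the time in overhead; ten ops per invocation cut that to 50%, so update/lookup/delete dominate the perf report instead of syscall plumbing.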
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 samples/bpf/map_perf_test_kern.c | 44 ++++++++++++++++++++------------
 samples/bpf/map_perf_test_user.c |  2 +-
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
index 8773f22b6a98..7342c5b2f278 100644
--- a/samples/bpf/map_perf_test_kern.c
+++ b/samples/bpf/map_perf_test_kern.c
@@ -108,11 +108,14 @@ int stress_hmap(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&hash_map, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&hash_map, &key);
-	if (value)
-		bpf_map_delete_elem(&hash_map, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&hash_map, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&hash_map, &key);
+		if (value)
+			bpf_map_delete_elem(&hash_map, &key);
+	}
 
 	return 0;
 }
@@ -123,11 +126,14 @@ int stress_percpu_hmap(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&percpu_hash_map, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&percpu_hash_map, &key);
-	if (value)
-		bpf_map_delete_elem(&percpu_hash_map, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&percpu_hash_map, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&percpu_hash_map, &key);
+		if (value)
+			bpf_map_delete_elem(&percpu_hash_map, &key);
+	}
 
 	return 0;
 }
@@ -137,11 +143,14 @@ int stress_hmap_alloc(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&hash_map_alloc, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&hash_map_alloc, &key);
-	if (value)
-		bpf_map_delete_elem(&hash_map_alloc, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&hash_map_alloc, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&hash_map_alloc, &key);
+		if (value)
+			bpf_map_delete_elem(&hash_map_alloc, &key);
+	}
 
 	return 0;
 }
@@ -151,11 +160,14 @@ int stress_percpu_hmap_alloc(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&percpu_hash_map_alloc, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&percpu_hash_map_alloc, &key);
-	if (value)
-		bpf_map_delete_elem(&percpu_hash_map_alloc, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&percpu_hash_map_alloc, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&percpu_hash_map_alloc, &key);
+		if (value)
+			bpf_map_delete_elem(&percpu_hash_map_alloc, &key);
+	}
 
 	return 0;
 }
diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c
index b6fc174ab1f2..1bb53f4b29e1 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -72,7 +72,7 @@ static int test_flags = ~0;
 static uint32_t num_map_entries;
 static uint32_t inner_lru_hash_size;
 static int lru_hash_lookup_test_entries = 32;
-static uint32_t max_cnt = 1000000;
+static uint32_t max_cnt = 10000;
 
 static int check_test_flags(enum test_type t)
 {

From patchwork Thu Sep 1 16:15:37 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12962888
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
 delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v5 bpf-next 05/15] bpf: Relax the requirement to use
 preallocated hash maps in tracing progs.
Date: Thu, 1 Sep 2022 09:15:37 -0700
Message-Id: <20220901161547.57722-6-alexei.starovoitov@gmail.com>
In-Reply-To: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
References: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Since bpf hash map was converted to use bpf_mem_alloc it is safe to use
from tracing programs and in RT kernels.
But per-cpu hash map is still using dynamic allocation for per-cpu map
values, hence keep the warning for this map type.
In the future alloc_percpu_gfp can be front-end-ed with bpf_mem_cache
and this restriction will be completely lifted.

perf_event (NMI) bpf programs have to use preallocated hash maps,
because free_htab_elem() is using call_rcu which might crash if re-entered.

Sleepable bpf programs have to use preallocated hash maps, because
life time of the map elements is not protected by rcu_read_lock/unlock.
This restriction can be lifted in the future as well.
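The resulting policy can be summarized as a small decision table. This is a simplified model, not the verifier code: the enum values and the helper name are invented here, and the real check in check_map_prog_compatibility() (kernel/bpf/verifier.c) also handles inner maps, sleepable programs and more program types.

```c
#include <assert.h>
#include <stdbool.h>

enum prog_kind { PROG_PERF_EVENT, PROG_KPROBE };	/* invented for this sketch */
enum map_kind  { MAP_HASH, MAP_PERCPU_HASH };

/* 0 = allowed, -1 = rejected; the non-RT per-cpu case is allowed
 * but the real verifier prints a warning for it
 */
static int map_prog_compat(enum prog_kind p, enum map_kind m,
			   bool prealloc, bool rt_kernel)
{
	if (prealloc)
		return 0;		/* preallocated maps: always fine */
	if (p == PROG_PERF_EVENT)
		return -1;		/* NMI progs: call_rcu may be re-entered */
	if (m == MAP_PERCPU_HASH)	/* per-cpu values still dynamically allocated */
		return rt_kernel ? -1 : 0;
	return 0;			/* plain hash now uses bpf_mem_alloc: safe */
}
```

A kprobe program with a non-preallocated plain hash map is now accepted even on RT, while the same map as a per-cpu hash on RT is still rejected.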
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/verifier.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0194a36d0b36..3dce3166855f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12629,10 +12629,12 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 	 * For programs attached to PERF events this is mandatory as the
 	 * perf NMI can hit any arbitrary code sequence.
 	 *
-	 * All other trace types using preallocated hash maps are unsafe as
-	 * well because tracepoint or kprobes can be inside locked regions
-	 * of the memory allocator or at a place where a recursion into the
-	 * memory allocator would see inconsistent state.
+	 * All other trace types using non-preallocated per-cpu hash maps are
+	 * unsafe as well because tracepoint or kprobes can be inside locked
+	 * regions of the per-cpu memory allocator or at a place where a
+	 * recursion into the per-cpu memory allocator would see inconsistent
+	 * state. Non per-cpu hash maps are using bpf_mem_alloc-tor which is
+	 * safe to use from kprobe/fentry and in RT.
 	 *
 	 * On RT enabled kernels run-time allocation of all trace type
 	 * programs is strictly prohibited due to lock type constraints. On
@@ -12642,15 +12644,26 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 	 */
 	if (is_tracing_prog_type(prog_type) && !is_preallocated_map(map)) {
 		if (prog_type == BPF_PROG_TYPE_PERF_EVENT) {
+			/* perf_event bpf progs have to use preallocated hash maps
+			 * because non-prealloc is still relying on call_rcu to free
+			 * elements.
+			 */
 			verbose(env, "perf_event programs can only use preallocated hash map\n");
 			return -EINVAL;
 		}
-		if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
-			verbose(env, "trace type programs can only use preallocated hash map\n");
-			return -EINVAL;
+		if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
+		    (map->inner_map_meta &&
+		     map->inner_map_meta->map_type == BPF_MAP_TYPE_PERCPU_HASH)) {
+			if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+				verbose(env,
+					"trace type programs can only use preallocated per-cpu hash map\n");
+				return -EINVAL;
+			}
+			WARN_ONCE(1, "trace type BPF program uses run-time allocation\n");
+			verbose(env,
+				"trace type programs with run-time allocated per-cpu hash maps are unsafe."
+				" Switch to preallocated hash maps.\n");
 		}
-		WARN_ONCE(1, "trace type BPF program uses run-time allocation\n");
-		verbose(env, "trace type programs with run-time allocated hash maps are unsafe. Switch to preallocated hash maps.\n");
 	}
 
 	if (map_value_has_spin_lock(map)) {

From patchwork Thu Sep 1 16:15:38 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12962889
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
 delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v5 bpf-next 06/15] bpf: Optimize element count in
 non-preallocated hash map.
Date: Thu, 1 Sep 2022 09:15:38 -0700
Message-Id: <20220901161547.57722-7-alexei.starovoitov@gmail.com>
In-Reply-To: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
References: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

The atomic_inc/dec might cause extreme cache line bouncing when multiple
cpus access the same bpf map. Based on specified max_entries for the hash
map calculate when percpu_counter becomes faster than atomic_t and use it
for such maps. For example samples/bpf/map_perf_test is using hash map
with max_entries 1000. On a system with 16 cpus the 'map_perf_test 4'
shows 14k events per second using atomic_t. On a system with 15 cpus it
shows 100k events per second using percpu. map_perf_test is an extreme
case where all cpus are colliding on atomic_t which causes extreme cache
bouncing. Note that the slow path of percpu_counter is 5k events per
second vs 14k for atomic, so the heuristic is necessary. See comment in
the code why the heuristic is based on num_online_cpus().
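The choice between the two counters boils down to one comparison. Restated as a standalone helper for checking the arithmetic (the function name is mine; the constant and the `max_entries / 2` formula come from the patch):

```c
#include <assert.h>
#include <stdbool.h>

#define PERCPU_COUNTER_BATCH 32	/* batch value chosen in the patch */

/* true when percpu_counter is expected to beat atomic_t: the map must be
 * able to sit at 50% occupancy while staying more than
 * batch * num_online_cpus() elements away from max_entries, so that
 * __percpu_counter_compare() can keep taking its fast path
 */
static bool should_use_percpu_counter(unsigned int max_entries,
				      unsigned int online_cpus)
{
	return max_entries / 2 > online_cpus * PERCPU_COUNTER_BATCH;
}
```

For the 10k-entry map from the code comment on 64 cpus this yields 5000 > 2048, so percpu_counter is used; the 1000-entry map from map_perf_test on the same machine stays on atomic_t (500 vs 2048), which is exactly why the slow-path numbers in the log matter.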
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c | 70 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 62 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 508e64351f87..36aa16dc43ad 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -101,7 +101,12 @@ struct bpf_htab {
 		struct bpf_lru lru;
 	};
 	struct htab_elem *__percpu *extra_elems;
-	atomic_t count;	/* number of elements in this hashtable */
+	/* number of elements in non-preallocated hashtable are kept
+	 * in either pcount or count
+	 */
+	struct percpu_counter pcount;
+	atomic_t count;
+	bool use_percpu_counter;
 	u32 n_buckets;	/* number of hash buckets */
 	u32 elem_size;	/* size of each element in bytes */
 	u32 hashrnd;
@@ -565,6 +570,29 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 
 	htab_init_buckets(htab);
 
+/* compute_batch_value() computes batch value as num_online_cpus() * 2
+ * and __percpu_counter_compare() needs
+ * htab->max_entries - cur_number_of_elems to be more than batch * num_online_cpus()
+ * for percpu_counter to be faster than atomic_t. In practice the average bpf
+ * hash map size is 10k, which means that a system with 64 cpus will fill
+ * hashmap to 20% of 10k before percpu_counter becomes ineffective. Therefore
+ * define our own batch count as 32 then 10k hash map can be filled up to 80%:
+ * 10k - 8k > 32 _batch_ * 64 _cpus_
+ * and __percpu_counter_compare() will still be fast. At that point hash map
+ * collisions will dominate its performance anyway. Assume that hash map filled
+ * to 50+% isn't going to be O(1) and use the following formula to choose
+ * between percpu_counter and atomic_t.
+ */
+#define PERCPU_COUNTER_BATCH 32
+	if (attr->max_entries / 2 > num_online_cpus() * PERCPU_COUNTER_BATCH)
+		htab->use_percpu_counter = true;
+
+	if (htab->use_percpu_counter) {
+		err = percpu_counter_init(&htab->pcount, 0, GFP_KERNEL);
+		if (err)
+			goto free_map_locked;
+	}
+
 	if (prealloc) {
 		err = prealloc_init(htab);
 		if (err)
@@ -891,6 +919,31 @@ static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
 	}
 }
 
+static bool is_map_full(struct bpf_htab *htab)
+{
+	if (htab->use_percpu_counter)
+		return __percpu_counter_compare(&htab->pcount, htab->map.max_entries,
+						PERCPU_COUNTER_BATCH) >= 0;
+	return atomic_read(&htab->count) >= htab->map.max_entries;
+}
+
+static void inc_elem_count(struct bpf_htab *htab)
+{
+	if (htab->use_percpu_counter)
+		percpu_counter_add_batch(&htab->pcount, 1, PERCPU_COUNTER_BATCH);
+	else
+		atomic_inc(&htab->count);
+}
+
+static void dec_elem_count(struct bpf_htab *htab)
+{
+	if (htab->use_percpu_counter)
+		percpu_counter_add_batch(&htab->pcount, -1, PERCPU_COUNTER_BATCH);
+	else
+		atomic_dec(&htab->count);
+}
+
+
 static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 {
 	htab_put_fd_value(htab, l);
@@ -899,7 +952,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 		check_and_free_fields(htab, l);
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
-		atomic_dec(&htab->count);
+		dec_elem_count(htab);
 		l->htab = htab;
 		call_rcu(&l->rcu, htab_elem_free_rcu);
 	}
@@ -983,16 +1036,15 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 			l_new = container_of(l, struct htab_elem, fnode);
 		}
 	} else {
-		if (atomic_inc_return(&htab->count) > htab->map.max_entries)
-			if (!old_elem) {
+		if (is_map_full(htab))
+			if (!old_elem)
 				/* when map is full and update() is replacing
 				 * old element, it's ok to allocate, since
 				 * old element will be freed immediately.
 				 * Otherwise return an error
 				 */
-				l_new = ERR_PTR(-E2BIG);
-				goto dec_count;
-			}
+				return ERR_PTR(-E2BIG);
+		inc_elem_count(htab);
 		l_new = bpf_mem_cache_alloc(&htab->ma);
 		if (!l_new) {
 			l_new = ERR_PTR(-ENOMEM);
@@ -1034,7 +1086,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 	l_new->hash = hash;
 	return l_new;
 dec_count:
-	atomic_dec(&htab->count);
+	dec_elem_count(htab);
 	return l_new;
 }
 
@@ -1513,6 +1565,8 @@ static void htab_map_free(struct bpf_map *map)
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
 	bpf_mem_alloc_destroy(&htab->ma);
+	if (htab->use_percpu_counter)
+		percpu_counter_destroy(&htab->pcount);
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	lockdep_unregister_key(&htab->lockdep_key);

From patchwork Thu Sep 1 16:15:39 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v5 bpf-next 07/15] bpf: Optimize call_rcu in non-preallocated hash map.
Date: Thu, 1 Sep 2022 09:15:39 -0700
Message-Id: <20220901161547.57722-8-alexei.starovoitov@gmail.com>
In-Reply-To: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
References: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Doing call_rcu() a million times a second becomes a bottleneck. Convert the non-preallocated hash map from call_rcu to SLAB_TYPESAFE_BY_RCU. The rcu critical section is no longer observed for one htab element, which makes the non-preallocated hash map behave just like the preallocated hash map. The map elements are released back to kernel memory after observing an rcu critical section. This improves 'map_perf_test 4' performance from 100k events per second to 250k events per second.

bpf_mem_alloc + percpu_counter + typesafe_by_rcu provide a 10x performance boost to the non-preallocated hash map and bring it within a few percent of the preallocated map while consuming a fraction of the memory.
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c                      |  8 ++++++--
 kernel/bpf/memalloc.c                     |  2 +-
 tools/testing/selftests/bpf/progs/timer.c | 11 -----------
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 36aa16dc43ad..0d888a90a805 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -953,8 +953,12 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		dec_elem_count(htab);
-		l->htab = htab;
-		call_rcu(&l->rcu, htab_elem_free_rcu);
+		if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH) {
+			l->htab = htab;
+			call_rcu(&l->rcu, htab_elem_free_rcu);
+		} else {
+			htab_elem_free(htab, l);
+		}
 	}
 }
 
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 1c46763d855e..da0721f8c28f 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -281,7 +281,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
 			return -ENOMEM;
 		size += LLIST_NODE_SZ; /* room for llist_node */
 		snprintf(buf, sizeof(buf), "bpf-%u", size);
-		kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL);
+		kmem_cache = kmem_cache_create(buf, size, 8, SLAB_TYPESAFE_BY_RCU, NULL);
 		if (!kmem_cache) {
 			free_percpu(pc);
 			return -ENOMEM;
diff --git a/tools/testing/selftests/bpf/progs/timer.c b/tools/testing/selftests/bpf/progs/timer.c
index 5f5309791649..0053c5402173 100644
--- a/tools/testing/selftests/bpf/progs/timer.c
+++ b/tools/testing/selftests/bpf/progs/timer.c
@@ -208,17 +208,6 @@ static int timer_cb2(void *map, int *key, struct hmap_elem *val)
 		 */
 		bpf_map_delete_elem(map, key);
-		/* in non-preallocated hashmap both 'key' and 'val' are RCU
-		 * protected and still valid though this element was deleted
-		 * from the map. Arm this timer for ~35 seconds. When callback
-		 * finishes the call_rcu will invoke:
-		 *  htab_elem_free_rcu
-		 *    check_and_free_timer
-		 *      bpf_timer_cancel_and_free
-		 * to cancel this 35 second sleep and delete the timer for real.
-		 */
-		if (bpf_timer_start(&val->timer, 1ull << 35, 0) != 0)
-			err |= 256;
 		ok |= 4;
 	}
 	return 0;

From patchwork Thu Sep 1 16:15:40 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v5 bpf-next 08/15] bpf: Adjust low/high watermarks in bpf_mem_cache
Date: Thu, 1 Sep 2022 09:15:40 -0700
Message-Id: <20220901161547.57722-9-alexei.starovoitov@gmail.com>
In-Reply-To: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
References: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

The same low/high watermarks for every bucket in bpf_mem_cache consume a significant amount of memory. Preallocating 64 elements of 4096 bytes each in the free list is not efficient. Make the low/high watermarks and batching value dependent on the element size. This change brings significant memory savings.

Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 50 +++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 14 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index da0721f8c28f..7e5df6866d92 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -100,6 +100,7 @@ struct bpf_mem_cache {
 	int unit_size;
 	/* count of objects in free_llist */
 	int free_cnt;
+	int low_watermark, high_watermark, batch;
 };
 
 struct bpf_mem_caches {
@@ -118,14 +119,6 @@ static struct llist_node notrace *__llist_del_first(struct llist_head *head)
 	return entry;
 }
 
-#define BATCH 48
-#define LOW_WATERMARK 32
-#define HIGH_WATERMARK 96
-/* Assuming the average number of elements per bucket is 64, when all buckets
- * are used the total memory will be: 64*16*32 + 64*32*32 + 64*64*32 + ... +
- * 64*4096*32 ~ 20Mbyte
- */
-
 static void *__alloc(struct bpf_mem_cache *c, int node)
 {
 	/* Allocate, but don't deplete atomic reserves that typical
@@ -220,7 +213,7 @@ static void free_bulk(struct bpf_mem_cache *c)
 		if (IS_ENABLED(CONFIG_PREEMPT_RT))
 			local_irq_restore(flags);
 		free_one(c, llnode);
-	} while (cnt > (HIGH_WATERMARK + LOW_WATERMARK) / 2);
+	} while (cnt > (c->high_watermark + c->low_watermark) / 2);
 
 	/* and drain free_llist_extra */
 	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra))
@@ -234,12 +227,12 @@ static void bpf_mem_refill(struct irq_work *work)
 	/* Racy access to free_cnt. It doesn't need to be 100% accurate */
 	cnt = c->free_cnt;
-	if (cnt < LOW_WATERMARK)
+	if (cnt < c->low_watermark)
 		/* irq_work runs on this cpu and kmalloc will allocate
 		 * from the current numa node which is what we want here.
 		 */
-		alloc_bulk(c, BATCH, NUMA_NO_NODE);
-	else if (cnt > HIGH_WATERMARK)
+		alloc_bulk(c, c->batch, NUMA_NO_NODE);
+	else if (cnt > c->high_watermark)
 		free_bulk(c);
 }
 
@@ -248,9 +241,38 @@ static void notrace irq_work_raise(struct bpf_mem_cache *c)
 	irq_work_queue(&c->refill_work);
 }
 
+/* For typical bpf map case that uses bpf_mem_cache_alloc and single bucket
+ * the freelist cache will be elem_size * 64 (or less) on each cpu.
+ *
+ * For bpf programs that don't have statically known allocation sizes and
+ * assuming (low_mark + high_mark) / 2 as an average number of elements per
+ * bucket and all buckets are used the total amount of memory in freelists
+ * on each cpu will be:
+ * 64*16 + 64*32 + 64*64 + 64*96 + 64*128 + 64*196 + 64*256 + 32*512 + 16*1024 + 8*2048 + 4*4096
+ * == ~ 116 Kbyte using below heuristic.
+ * Initialized, but unused bpf allocator (not bpf map specific one) will
+ * consume ~ 11 Kbyte per cpu.
+ * Typical case will be between 11K and 116K closer to 11K.
+ * bpf progs can and should share bpf_mem_cache when possible.
+ */
+
 static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
 {
 	init_irq_work(&c->refill_work, bpf_mem_refill);
+	if (c->unit_size <= 256) {
+		c->low_watermark = 32;
+		c->high_watermark = 96;
+	} else {
+		/* When page_size == 4k, order-0 cache will have low_mark == 2
+		 * and high_mark == 6 with batch alloc of 3 individual pages at
+		 * a time.
+		 * 8k allocs and above low == 1, high == 3, batch == 1.
+		 */
+		c->low_watermark = max(32 * 256 / c->unit_size, 1);
+		c->high_watermark = max(96 * 256 / c->unit_size, 3);
+	}
+	c->batch = max((c->high_watermark - c->low_watermark) / 4 * 3, 1);
+
 	/* To avoid consuming memory assume that 1st run of bpf
 	 * prog won't be doing more than 4 map_update_elem from
 	 * irq disabled region
@@ -392,7 +414,7 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
 
 	WARN_ON(cnt < 0);
 
-	if (cnt < LOW_WATERMARK)
+	if (cnt < c->low_watermark)
 		irq_work_raise(c);
 	return llnode;
 }
@@ -425,7 +447,7 @@ static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
 	local_dec(&c->active);
 	local_irq_restore(flags);
 
-	if (cnt > HIGH_WATERMARK)
+	if (cnt > c->high_watermark)
 		/* free few objects from current cpu into global kmalloc pool */
 		irq_work_raise(c);
 }

From patchwork Thu Sep 1 16:15:41 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v5 bpf-next 09/15] bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.
Date: Thu, 1 Sep 2022 09:15:41 -0700
Message-Id: <20220901161547.57722-10-alexei.starovoitov@gmail.com>
In-Reply-To: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
References: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

SLAB_TYPESAFE_BY_RCU makes kmem_caches non-mergeable and slows down kmem_cache_destroy. All bpf_mem_cache instances are safe to share across different maps and programs. Convert SLAB_TYPESAFE_BY_RCU to batched call_rcu. This change solves the memory consumption issue, avoids the kmem_cache_destroy latency and keeps bpf hash map performance the same.
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 64 +++++++++++++++++++++++++++++++++++++++++--
 kernel/bpf/syscall.c  |  5 +++-
 2 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 7e5df6866d92..2d553f91e8ab 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -101,6 +101,11 @@ struct bpf_mem_cache {
 	/* count of objects in free_llist */
 	int free_cnt;
 	int low_watermark, high_watermark, batch;
+
+	struct rcu_head rcu;
+	struct llist_head free_by_rcu;
+	struct llist_head waiting_for_gp;
+	atomic_t call_rcu_in_progress;
 };
 
 struct bpf_mem_caches {
@@ -194,6 +199,45 @@ static void free_one(struct bpf_mem_cache *c, void *obj)
 	kfree(obj);
 }
 
+static void __free_rcu(struct rcu_head *head)
+{
+	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
+	struct llist_node *llnode = llist_del_all(&c->waiting_for_gp);
+	struct llist_node *pos, *t;
+
+	llist_for_each_safe(pos, t, llnode)
+		free_one(c, pos);
+	atomic_set(&c->call_rcu_in_progress, 0);
+}
+
+static void enque_to_free(struct bpf_mem_cache *c, void *obj)
+{
+	struct llist_node *llnode = obj;
+
+	/* bpf_mem_cache is a per-cpu object. Freeing happens in irq_work.
+	 * Nothing races to add to free_by_rcu list.
+	 */
+	__llist_add(llnode, &c->free_by_rcu);
+}
+
+static void do_call_rcu(struct bpf_mem_cache *c)
+{
+	struct llist_node *llnode, *t;
+
+	if (atomic_xchg(&c->call_rcu_in_progress, 1))
+		return;
+
+	WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp));
+	llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu))
+		/* There is no concurrent __llist_add(waiting_for_gp) access.
+		 * It doesn't race with llist_del_all either.
+		 * But there could be two concurrent llist_del_all(waiting_for_gp):
+		 * from __free_rcu() and from drain_mem_cache().
+		 */
+		__llist_add(llnode, &c->waiting_for_gp);
+	call_rcu(&c->rcu, __free_rcu);
+}
+
 static void free_bulk(struct bpf_mem_cache *c)
 {
 	struct llist_node *llnode, *t;
@@ -212,12 +256,13 @@ static void free_bulk(struct bpf_mem_cache *c)
 		local_dec(&c->active);
 		if (IS_ENABLED(CONFIG_PREEMPT_RT))
 			local_irq_restore(flags);
-		free_one(c, llnode);
+		enque_to_free(c, llnode);
 	} while (cnt > (c->high_watermark + c->low_watermark) / 2);
 
 	/* and drain free_llist_extra */
 	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra))
-		free_one(c, llnode);
+		enque_to_free(c, llnode);
+	do_call_rcu(c);
 }
 
 static void bpf_mem_refill(struct irq_work *work)
@@ -303,7 +348,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
 			return -ENOMEM;
 		size += LLIST_NODE_SZ; /* room for llist_node */
 		snprintf(buf, sizeof(buf), "bpf-%u", size);
-		kmem_cache = kmem_cache_create(buf, size, 8, SLAB_TYPESAFE_BY_RCU, NULL);
+		kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL);
 		if (!kmem_cache) {
 			free_percpu(pc);
 			return -ENOMEM;
@@ -345,6 +390,15 @@ static void drain_mem_cache(struct bpf_mem_cache *c)
 {
 	struct llist_node *llnode, *t;
 
+	/* The caller has done rcu_barrier() and no progs are using this
+	 * bpf_mem_cache, but htab_map_free() called bpf_mem_cache_free() for
+	 * all remaining elements and they can be in free_by_rcu or in
+	 * waiting_for_gp lists, so drain those lists now.
+	 */
+	llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu))
+		free_one(c, llnode);
+	llist_for_each_safe(llnode, t, llist_del_all(&c->waiting_for_gp))
+		free_one(c, llnode);
 	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist))
 		free_one(c, llnode);
 	llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra))
@@ -366,6 +420,10 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 			kmem_cache_destroy(c->kmem_cache);
 		if (c->objcg)
 			obj_cgroup_put(c->objcg);
+		/* c->waiting_for_gp list was drained, but __free_rcu might
+		 * still execute. Wait for it now before we free 'c'.
+		 */
+		rcu_barrier();
 		free_percpu(ma->cache);
 		ma->cache = NULL;
 	}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4e9d4622aef7..074c901fbb4e 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -638,7 +638,10 @@ static void __bpf_map_put(struct bpf_map *map, bool do_idr_lock)
 		bpf_map_free_id(map, do_idr_lock);
 		btf_put(map->btf);
 		INIT_WORK(&map->work, bpf_map_free_deferred);
-		schedule_work(&map->work);
+		/* Avoid spawning kworkers, since they all might contend
+		 * for the same mutex like slab_mutex.
+		 */
+		queue_work(system_unbound_wq, &map->work);
 	}
 }

From patchwork Thu Sep 1 16:15:42 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v5 bpf-next 10/15] bpf: Add percpu allocation support to bpf_mem_alloc.
Date: Thu, 1 Sep 2022 09:15:42 -0700
Message-Id: <20220901161547.57722-11-alexei.starovoitov@gmail.com>
In-Reply-To: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
References: <20220901161547.57722-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Extend bpf_mem_alloc to cache a free list of fixed-size per-cpu allocations. Once such a cache is created, bpf_mem_cache_alloc() will return per-cpu objects and bpf_mem_cache_free() will free them back into the global per-cpu pool after observing an RCU grace period.

The per-cpu flavor of bpf_mem_alloc is going to be used by per-cpu hash maps.

The free list cache consists of tuples { llist_node, per-cpu pointer }. Unlike alloc_percpu(), which returns a per-cpu pointer, bpf_mem_cache_alloc() returns a pointer to a per-cpu pointer, and bpf_mem_cache_free() expects to receive that same pointer back.
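The { llist_node, per-cpu pointer } tuple layout described above can be sketched in plain userspace C. This is only an illustration: malloc()/calloc() stand in for kmem_cache_alloc_node() and __alloc_percpu_gfp(), and the struct and function names here are invented; the real layout lives in __alloc() and free_one() in kernel/bpf/memalloc.c.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct llist_node; in the kernel this links
 * the object into the cache's lock-less free list. */
struct llist_node { struct llist_node *next; };

/* Free-list element: the { llist_node, per-cpu pointer } tuple. */
struct pcpu_tuple {
	struct llist_node ll;
	void *pptr;	/* would be an alloc_percpu()-style pointer */
};

/* Returns a pointer to the per-cpu pointer, not the pointer itself. */
static void **pcpu_tuple_alloc(size_t unit_size)
{
	struct pcpu_tuple *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->pptr = calloc(1, unit_size);	/* stand-in for __alloc_percpu_gfp() */
	if (!t->pptr) {
		free(t);
		return NULL;
	}
	return &t->pptr;
}

/* The caller hands the same pointer-to-pointer back, which lets the
 * allocator recover both the llist_node and the per-cpu area. */
static void pcpu_tuple_free(void **obj)
{
	struct pcpu_tuple *t = (struct pcpu_tuple *)
		((char *)obj - offsetof(struct pcpu_tuple, pptr));

	free(t->pptr);
	free(t);
}
```

The extra indirection is the point: handing back the pointer-to-pointer gives bpf_mem_cache_free() the tuple address for the free list while `*obj` remains the usable per-cpu pointer.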
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 include/linux/bpf_mem_alloc.h |  2 +-
 kernel/bpf/hashtab.c          |  2 +-
 kernel/bpf/memalloc.c         | 44 +++++++++++++++++++++++++++++++----
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
index 804733070f8d..653ed1584a03 100644
--- a/include/linux/bpf_mem_alloc.h
+++ b/include/linux/bpf_mem_alloc.h
@@ -12,7 +12,7 @@ struct bpf_mem_alloc {
 	struct bpf_mem_cache __percpu *cache;
 };
 
-int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size);
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu);
 void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
 
 /* kmalloc/kfree equivalent: */
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 0d888a90a805..70b02ff4445e 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -607,7 +607,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 			goto free_prealloc;
 		}
 	} else {
-		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size);
+		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false);
 		if (err)
 			goto free_map_locked;
 	}
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 2d553f91e8ab..967ccd02ecb8 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -101,6 +101,7 @@ struct bpf_mem_cache {
 	/* count of objects in free_llist */
 	int free_cnt;
 	int low_watermark, high_watermark, batch;
+	bool percpu;
 
 	struct rcu_head rcu;
 	struct llist_head free_by_rcu;
@@ -133,6 +134,19 @@ static void *__alloc(struct bpf_mem_cache *c, int node)
 	 */
 	gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT;
 
+	if (c->percpu) {
+		void **obj = kmem_cache_alloc_node(c->kmem_cache, flags, node);
+		void *pptr = __alloc_percpu_gfp(c->unit_size, 8, flags);
+
+		if (!obj || !pptr) {
+			free_percpu(pptr);
+			kfree(obj);
+			return NULL;
+		}
+		obj[1] = pptr;
+		return obj;
+	}
+
 	if (c->kmem_cache)
 		return kmem_cache_alloc_node(c->kmem_cache, flags, node);
@@ -193,6 +207,12 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 
 static void free_one(struct bpf_mem_cache *c, void *obj)
 {
+	if (c->percpu) {
+		free_percpu(((void **)obj)[1]);
+		kmem_cache_free(c->kmem_cache, obj);
+		return;
+	}
+
 	if (c->kmem_cache)
 		kmem_cache_free(c->kmem_cache, obj);
 	else
@@ -332,21 +352,30 @@ static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
  * kmalloc/kfree. Max allocation size is 4096 in this case.
  * This is bpf_dynptr and bpf_kptr use case.
  */
-int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 {
 	static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};
 	struct bpf_mem_caches *cc, __percpu *pcc;
 	struct bpf_mem_cache *c, __percpu *pc;
-	struct kmem_cache *kmem_cache;
+	struct kmem_cache *kmem_cache = NULL;
 	struct obj_cgroup *objcg = NULL;
 	char buf[32];
-	int cpu, i;
+	int cpu, i, unit_size;
 
 	if (size) {
 		pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
 		if (!pc)
 			return -ENOMEM;
-		size += LLIST_NODE_SZ; /* room for llist_node */
+
+		if (percpu) {
+			unit_size = size;
+			/* room for llist_node and per-cpu pointer */
+			size = LLIST_NODE_SZ + sizeof(void *);
+		} else {
+			size += LLIST_NODE_SZ; /* room for llist_node */
+			unit_size = size;
+		}
+
 		snprintf(buf, sizeof(buf), "bpf-%u", size);
 		kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL);
 		if (!kmem_cache) {
@@ -359,14 +388,19 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
 		for_each_possible_cpu(cpu) {
 			c = per_cpu_ptr(pc, cpu);
 			c->kmem_cache = kmem_cache;
-			c->unit_size = size;
+			c->unit_size = unit_size;
 			c->objcg = objcg;
+			c->percpu = percpu;
 			prefill_mem_cache(c, cpu);
 		}
 		ma->cache = pc;
 		return 0;
 	}
 
+	/* size == 0 && percpu is an invalid combination */
+	if (WARN_ON_ONCE(percpu))
+		return -EINVAL;
+
 	pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL);
 	if (!pcc)
 		return -ENOMEM;

From patchwork Thu Sep 1 16:15:43 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12962894
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v5 bpf-next 11/15] bpf: Convert percpu hash map to per-cpu bpf_mem_alloc.
Date: Thu, 1 Sep 2022 09:15:43 -0700
Message-Id: <20220901161547.57722-12-alexei.starovoitov@gmail.com>
In-Reply-To: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
References: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Convert the dynamic allocations in the percpu hash map from alloc_percpu() to bpf_mem_cache_alloc() from a per-cpu bpf_mem_alloc. Since bpf_mem_alloc frees objects only after an RCU grace period, the explicit call_rcu() is removed.

pcpu_init_value() now needs to zero-fill per-cpu allocations: dynamically allocated map elements now behave like fully preallocated ones, because alloc_percpu() is no longer called inline and the elements are reused from the free list.

Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c | 45 +++++++++++++++++++------------------------
 1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 70b02ff4445e..a77b9c4a4e48 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -94,6 +94,7 @@ struct bucket {
 struct bpf_htab {
 	struct bpf_map map;
 	struct bpf_mem_alloc ma;
+	struct bpf_mem_alloc pcpu_ma;
 	struct bucket *buckets;
 	void *elems;
 	union {
@@ -121,14 +122,14 @@ struct htab_elem {
 		struct {
 			void *padding;
 			union {
-				struct bpf_htab *htab;
 				struct pcpu_freelist_node fnode;
 				struct htab_elem *batch_flink;
 			};
 		};
 	};
 	union {
-		struct rcu_head rcu;
+		/* pointer to per-cpu pointer */
+		void *ptr_to_pptr;
 		struct bpf_lru_node lru_node;
 	};
 	u32 hash;
@@ -448,8 +449,6 @@ static int htab_map_alloc_check(union bpf_attr *attr)
 	bool zero_seed = (attr->map_flags & BPF_F_ZERO_SEED);
 	int numa_node = bpf_map_attr_numa_node(attr);
 
-	BUILD_BUG_ON(offsetof(struct htab_elem, htab) !=
-		     offsetof(struct htab_elem, hash_node.pprev));
 	BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) !=
 		     offsetof(struct htab_elem, hash_node.pprev));
 
@@ -610,6 +609,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false);
 		if (err)
 			goto free_map_locked;
+		if (percpu) {
+			err = bpf_mem_alloc_init(&htab->pcpu_ma,
+						 round_up(htab->map.value_size, 8), true);
+			if (err)
+				goto free_map_locked;
+		}
 	}
 
 	return &htab->map;
@@ -620,6 +625,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->pcpu_ma);
 	bpf_mem_alloc_destroy(&htab->ma);
 free_htab:
 	lockdep_unregister_key(&htab->lockdep_key);
@@ -895,19 +901,11 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
 {
 	if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
-		free_percpu(htab_elem_get_ptr(l, htab->map.key_size));
+		bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr);
 	check_and_free_fields(htab, l);
 	bpf_mem_cache_free(&htab->ma, l);
 }
 
-static void htab_elem_free_rcu(struct rcu_head *head)
-{
-	struct htab_elem *l = container_of(head, struct htab_elem, rcu);
-	struct bpf_htab *htab = l->htab;
-
-	htab_elem_free(htab, l);
-}
-
 static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
 {
 	struct bpf_map *map = &htab->map;
@@ -953,12 +951,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		dec_elem_count(htab);
-		if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH) {
-			l->htab = htab;
-			call_rcu(&l->rcu, htab_elem_free_rcu);
-		} else {
-			htab_elem_free(htab, l);
-		}
+		htab_elem_free(htab, l);
 	}
 }
 
@@ -983,13 +976,12 @@ static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
 static void pcpu_init_value(struct bpf_htab *htab, void __percpu *pptr,
 			    void *value, bool onallcpus)
 {
-	/* When using prealloc and not setting the initial value on all cpus,
-	 * zero-fill element values for other cpus (just as what happens when
-	 * not using prealloc). Otherwise, bpf program has no way to ensure
+	/* When not setting the initial value on all cpus, zero-fill element
+	 * values for other cpus. Otherwise, bpf program has no way to ensure
 	 * known initial values for cpus other than current one
 	 * (onallcpus=false always when coming from bpf prog).
 	 */
-	if (htab_is_prealloc(htab) && !onallcpus) {
+	if (!onallcpus) {
 		u32 size = round_up(htab->map.value_size, 8);
 		int current_cpu = raw_smp_processor_id();
 		int cpu;
@@ -1060,18 +1052,18 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 	memcpy(l_new->key, key, key_size);
 	if (percpu) {
-		size = round_up(size, 8);
 		if (prealloc) {
 			pptr = htab_elem_get_ptr(l_new, key_size);
 		} else {
 			/* alloc_percpu zero-fills */
-			pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
-						    GFP_NOWAIT | __GFP_NOWARN);
+			pptr = bpf_mem_cache_alloc(&htab->pcpu_ma);
 			if (!pptr) {
 				bpf_mem_cache_free(&htab->ma, l_new);
 				l_new = ERR_PTR(-ENOMEM);
 				goto dec_count;
 			}
+			l_new->ptr_to_pptr = pptr;
+			pptr = *(void **)pptr;
 		}
 
 		pcpu_init_value(htab, pptr, value, onallcpus);
@@ -1568,6 +1560,7 @@ static void htab_map_free(struct bpf_map *map)
 	bpf_map_free_kptr_off_tab(map);
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->pcpu_ma);
 	bpf_mem_alloc_destroy(&htab->ma);
 	if (htab->use_percpu_counter)
 		percpu_counter_destroy(&htab->pcount);

From patchwork Thu Sep 1 16:15:44 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12962895
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v5 bpf-next 12/15] bpf: Remove tracing program restriction on map types
Date: Thu, 1 Sep 2022 09:15:44 -0700
Message-Id: <20220901161547.57722-13-alexei.starovoitov@gmail.com>
In-Reply-To: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
References: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

The hash map is now fully converted to bpf_mem_alloc. Its implementation no longer allocates synchronously and no longer calls call_rcu() directly. It is now safe to use non-preallocated hash maps in all types of tracing programs, including BPF_PROG_TYPE_PERF_EVENT, which can run from NMI context.
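Part of why this is now safe is that element frees are batched behind at most one outstanding call_rcu() per cache (the do_call_rcu() gate added in patch 1 of this series). Below is a minimal userspace sketch of that single-flight gate, with C11 stdatomic standing in for the kernel's atomic_t; the names and the rcu_calls counter are illustrative, not kernel API.

```c
#include <stdatomic.h>

static atomic_int call_rcu_in_progress;
static int rcu_calls;	/* counts how many callbacks were scheduled */

/* Single-flight gate: only the first caller schedules the (stand-in)
 * RCU callback; later callers find the gate already armed and return.
 * Returns 1 if this call scheduled the callback, 0 otherwise. */
static int maybe_call_rcu(void)
{
	/* atomic_exchange returns the previous value, like atomic_xchg() */
	if (atomic_exchange(&call_rcu_in_progress, 1))
		return 0;	/* a callback is already in flight */
	rcu_calls++;		/* stand-in for call_rcu(&c->rcu, __free_rcu) */
	return 1;
}

/* Stand-in for __free_rcu() re-arming the gate once the grace period
 * has elapsed and the queued objects were freed. */
static void free_rcu_done(void)
{
	atomic_store(&call_rcu_in_progress, 0);
}
```

Because atomic_exchange() returns the previous value, exactly one caller observes 0 and schedules the callback; all the others merely leave their objects queued for the already-pending grace period, so frees never pile up more than one RCU callback deep per cache.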
Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/verifier.c | 42 ------------------------------------------
 1 file changed, 42 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3dce3166855f..57ec06b1d09d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12623,48 +12623,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(prog);
 
-	/*
-	 * Validate that trace type programs use preallocated hash maps.
-	 *
-	 * For programs attached to PERF events this is mandatory as the
-	 * perf NMI can hit any arbitrary code sequence.
-	 *
-	 * All other trace types using non-preallocated per-cpu hash maps are
-	 * unsafe as well because tracepoint or kprobes can be inside locked
-	 * regions of the per-cpu memory allocator or at a place where a
-	 * recursion into the per-cpu memory allocator would see inconsistent
-	 * state. Non per-cpu hash maps are using bpf_mem_alloc-tor which is
-	 * safe to use from kprobe/fentry and in RT.
-	 *
-	 * On RT enabled kernels run-time allocation of all trace type
-	 * programs is strictly prohibited due to lock type constraints. On
-	 * !RT kernels it is allowed for backwards compatibility reasons for
-	 * now, but warnings are emitted so developers are made aware of
-	 * the unsafety and can fix their programs before this is enforced.
-	 */
-	if (is_tracing_prog_type(prog_type) && !is_preallocated_map(map)) {
-		if (prog_type == BPF_PROG_TYPE_PERF_EVENT) {
-			/* perf_event bpf progs have to use preallocated hash maps
-			 * because non-prealloc is still relying on call_rcu to free
-			 * elements.
-			 */
-			verbose(env, "perf_event programs can only use preallocated hash map\n");
-			return -EINVAL;
-		}
-		if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
-		    (map->inner_map_meta &&
-		     map->inner_map_meta->map_type == BPF_MAP_TYPE_PERCPU_HASH)) {
-			if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
-				verbose(env,
-					"trace type programs can only use preallocated per-cpu hash map\n");
-				return -EINVAL;
-			}
-			WARN_ONCE(1, "trace type BPF program uses run-time allocation\n");
-			verbose(env,
-				"trace type programs with run-time allocated per-cpu hash maps are unsafe."
-				" Switch to preallocated hash maps.\n");
-		}
-	}
-
 	if (map_value_has_spin_lock(map)) {
 		if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) {

From patchwork Thu Sep 1 16:15:45 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12962896
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v5 bpf-next 13/15] bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
Date: Thu, 1 Sep 2022 09:15:45 -0700
Message-Id: <20220901161547.57722-14-alexei.starovoitov@gmail.com>
In-Reply-To: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
References: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
header.d=gmail.com header.s=20210112 header.b=mEyjFDXt; spf=pass (imf04.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.216.53 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com; dmarc=pass (policy=none) header.from=gmail.com X-Rspamd-Server: rspam11 X-Stat-Signature: 78cfdzwmricmdqznbra73xtwp75gbu1x X-Rspamd-Queue-Id: E30804004D X-Rspam-User: X-HE-Tag: 1662049000-541801 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexei Starovoitov Use call_rcu_tasks_trace() to wait for sleepable progs to finish. Then use call_rcu() to wait for normal progs to finish and finally do free_one() on each element when freeing objects into global memory pool. Acked-by: Kumar Kartikeya Dwivedi Acked-by: Andrii Nakryiko Signed-off-by: Alexei Starovoitov --- kernel/bpf/memalloc.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index 967ccd02ecb8..a66bca8caddf 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -230,6 +230,13 @@ static void __free_rcu(struct rcu_head *head) atomic_set(&c->call_rcu_in_progress, 0); } +static void __free_rcu_tasks_trace(struct rcu_head *head) +{ + struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu); + + call_rcu(&c->rcu, __free_rcu); +} + static void enque_to_free(struct bpf_mem_cache *c, void *obj) { struct llist_node *llnode = obj; @@ -255,7 +262,11 @@ static void do_call_rcu(struct bpf_mem_cache *c) * from __free_rcu() and from drain_mem_cache(). */ __llist_add(llnode, &c->waiting_for_gp); - call_rcu(&c->rcu, __free_rcu); + /* Use call_rcu_tasks_trace() to wait for sleepable progs to finish. + * Then use call_rcu() to wait for normal progs to finish + * and finally do free_one() on each element. 
+ */ + call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace); } static void free_bulk(struct bpf_mem_cache *c) @@ -457,6 +468,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma) /* c->waiting_for_gp list was drained, but __free_rcu might * still execute. Wait for it now before we free 'c'. */ + rcu_barrier_tasks_trace(); rcu_barrier(); free_percpu(ma->cache); ma->cache = NULL; From patchwork Thu Sep 1 16:15:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexei Starovoitov X-Patchwork-Id: 12962897 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CEBD0ECAAD8 for ; Thu, 1 Sep 2022 16:16:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 67ABB6B00AB; Thu, 1 Sep 2022 12:16:45 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 605978000C; Thu, 1 Sep 2022 12:16:45 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 455D48000B; Thu, 1 Sep 2022 12:16:45 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 34E406B00AB for ; Thu, 1 Sep 2022 12:16:45 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 0BB4A140FD7 for ; Thu, 1 Sep 2022 16:16:45 +0000 (UTC) X-FDA: 79864019970.16.95CD729 Received: from mail-pg1-f173.google.com (mail-pg1-f173.google.com [209.85.215.173]) by imf09.hostedemail.com (Postfix) with ESMTP id A35FE14005D for ; Thu, 1 Sep 2022 16:16:44 +0000 (UTC) Received: by mail-pg1-f173.google.com with SMTP id bh13so16831928pgb.4 for ; Thu, 01 Sep 2022 09:16:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; 
Subject: [PATCH v5 bpf-next 14/15] bpf: Remove prealloc-only restriction for
 sleepable bpf programs.
Date: Thu, 1 Sep 2022 09:15:46 -0700
Message-Id: <20220901161547.57722-15-alexei.starovoitov@gmail.com>
In-Reply-To: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
References: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
 delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com

From: Alexei Starovoitov

Since the hash map is now converted to bpf_mem_alloc, which waits for
both RCU and RCU tasks trace grace periods before freeing elements back
into the global memory slabs, it is safe to use dynamically allocated
hash maps in sleepable bpf programs.

Acked-by: Kumar Kartikeya Dwivedi
Acked-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/verifier.c | 23 -----------------------
 1 file changed, 23 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 57ec06b1d09d..068b20ed34d2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12586,14 +12586,6 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env,
 	return err;
 }
 
-static int check_map_prealloc(struct bpf_map *map)
-{
-	return (map->map_type != BPF_MAP_TYPE_HASH &&
-		map->map_type != BPF_MAP_TYPE_PERCPU_HASH &&
-		map->map_type != BPF_MAP_TYPE_HASH_OF_MAPS) ||
-		!(map->map_flags & BPF_F_NO_PREALLOC);
-}
-
 static bool is_tracing_prog_type(enum bpf_prog_type type)
 {
 	switch (type) {
@@ -12608,15 +12600,6 @@ static bool is_tracing_prog_type(enum bpf_prog_type type)
 	}
 }
 
-static bool is_preallocated_map(struct bpf_map *map)
-{
-	if (!check_map_prealloc(map))
-		return false;
-	if (map->inner_map_meta && !check_map_prealloc(map->inner_map_meta))
-		return false;
-	return true;
-}
-
 static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 					struct bpf_map *map,
 					struct bpf_prog *prog)
@@ -12669,12 +12652,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 	case BPF_MAP_TYPE_LRU_PERCPU_HASH:
 	case BPF_MAP_TYPE_ARRAY_OF_MAPS:
 	case BPF_MAP_TYPE_HASH_OF_MAPS:
-		if (!is_preallocated_map(map)) {
-			verbose(env,
-				"Sleepable programs can only use preallocated maps\n");
-			return -EINVAL;
-		}
-		break;
 	case BPF_MAP_TYPE_RINGBUF:
 	case BPF_MAP_TYPE_INODE_STORAGE:
 	case BPF_MAP_TYPE_SK_STORAGE:

From patchwork Thu Sep 1 16:15:47 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12962898
Subject: [PATCH v5 bpf-next 15/15] bpf: Remove usage of kmem_cache from
 bpf_mem_cache.
Date: Thu, 1 Sep 2022 09:15:47 -0700
Message-Id: <20220901161547.57722-16-alexei.starovoitov@gmail.com>
In-Reply-To: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
References: <20220901161547.57722-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
 delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com

From: Alexei Starovoitov

For bpf_mem_cache based hash maps the following stress test:

  for (i = 1; i <= 512; i <<= 1)
    for (j = 1; j <= 1 << 18; j <<= 1)
      fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, i, j, 2, 0);

creates many kmem_cache-s that are not mergeable in debug kernels and
consume an unnecessary amount of memory. It turned out that
bpf_mem_cache's free_list logic does batching well, so using a
kmem_cache for fixed-size allocations doesn't bring any performance
benefit over plain kmalloc. Hence get rid of kmem_cache in
bpf_mem_cache. That saves memory and speeds up map create/destroy
operations while maintaining hash map update/delete performance.

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 50 ++++++++++++-------------------------------
 1 file changed, 14 insertions(+), 36 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index a66bca8caddf..00757b6fb294 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -91,17 +91,13 @@ struct bpf_mem_cache {
 	 */
 	struct llist_head free_llist_extra;
 
-	/* kmem_cache != NULL when bpf_mem_alloc was created for specific
-	 * element size.
-	 */
-	struct kmem_cache *kmem_cache;
 	struct irq_work refill_work;
 	struct obj_cgroup *objcg;
 	int unit_size;
 	/* count of objects in free_llist */
 	int free_cnt;
 	int low_watermark, high_watermark, batch;
-	bool percpu;
+	int percpu_size;
 
 	struct rcu_head rcu;
 	struct llist_head free_by_rcu;
@@ -134,8 +130,8 @@ static void *__alloc(struct bpf_mem_cache *c, int node)
 	 */
 	gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT;
 
-	if (c->percpu) {
-		void **obj = kmem_cache_alloc_node(c->kmem_cache, flags, node);
+	if (c->percpu_size) {
+		void **obj = kmalloc_node(c->percpu_size, flags, node);
 		void *pptr = __alloc_percpu_gfp(c->unit_size, 8, flags);
 
 		if (!obj || !pptr) {
@@ -147,9 +143,6 @@ static void *__alloc(struct bpf_mem_cache *c, int node)
 		return obj;
 	}
 
-	if (c->kmem_cache)
-		return kmem_cache_alloc_node(c->kmem_cache, flags, node);
-
 	return kmalloc_node(c->unit_size, flags, node);
 }
 
@@ -207,16 +200,13 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 
 static void free_one(struct bpf_mem_cache *c, void *obj)
 {
-	if (c->percpu) {
+	if (c->percpu_size) {
 		free_percpu(((void **)obj)[1]);
-		kmem_cache_free(c->kmem_cache, obj);
+		kfree(obj);
 		return;
 	}
 
-	if (c->kmem_cache)
-		kmem_cache_free(c->kmem_cache, obj);
-	else
-		kfree(obj);
+	kfree(obj);
 }
 
 static void __free_rcu(struct rcu_head *head)
@@ -356,7 +346,7 @@ static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
 	alloc_bulk(c, c->unit_size <= 256 ? 4 : 1, cpu_to_node(cpu));
 }
 
-/* When size != 0 create kmem_cache and bpf_mem_cache for each cpu.
+/* When size != 0 bpf_mem_cache for each cpu.
 * This is typical bpf hash map use case when all elements have equal size.
 *
 * When size == 0 allocate 11 bpf_mem_cache-s for each cpu, then rely on
@@ -368,40 +358,29 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 	static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};
 	struct bpf_mem_caches *cc, __percpu *pcc;
 	struct bpf_mem_cache *c, __percpu *pc;
-	struct kmem_cache *kmem_cache = NULL;
 	struct obj_cgroup *objcg = NULL;
-	char buf[32];
-	int cpu, i, unit_size;
+	int cpu, i, unit_size, percpu_size = 0;
 
 	if (size) {
 		pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
 		if (!pc)
 			return -ENOMEM;
 
-		if (percpu) {
-			unit_size = size;
+		if (percpu)
 			/* room for llist_node and per-cpu pointer */
-			size = LLIST_NODE_SZ + sizeof(void *);
-		} else {
+			percpu_size = LLIST_NODE_SZ + sizeof(void *);
+		else
 			size += LLIST_NODE_SZ; /* room for llist_node */
-			unit_size = size;
-		}
+		unit_size = size;
 
-		snprintf(buf, sizeof(buf), "bpf-%u", size);
-		kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL);
-		if (!kmem_cache) {
-			free_percpu(pc);
-			return -ENOMEM;
-		}
 #ifdef CONFIG_MEMCG_KMEM
 		objcg = get_obj_cgroup_from_current();
 #endif
 		for_each_possible_cpu(cpu) {
 			c = per_cpu_ptr(pc, cpu);
-			c->kmem_cache = kmem_cache;
 			c->unit_size = unit_size;
 			c->objcg = objcg;
-			c->percpu = percpu;
+			c->percpu_size = percpu_size;
 			prefill_mem_cache(c, cpu);
 		}
 		ma->cache = pc;
@@ -461,8 +440,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
 		c = per_cpu_ptr(ma->cache, cpu);
 		drain_mem_cache(c);
 	}
-	/* kmem_cache and memcg are the same across cpus */
-	kmem_cache_destroy(c->kmem_cache);
+	/* objcg is the same across cpus */
 	if (c->objcg)
 		obj_cgroup_put(c->objcg);
 	/* c->waiting_for_gp list was drained, but __free_rcu might