From patchwork Wed Aug 17 21:04:08 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 bpf-next 01/12] bpf: Introduce any context BPF specific memory allocator.
Date: Wed, 17 Aug 2022 14:04:08 -0700
Message-Id: <20220817210419.95560-2-alexei.starovoitov@gmail.com>
In-Reply-To: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
References: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Tracing BPF programs can attach to kprobe and fentry. Hence they run in
unknown context where calling plain kmalloc() might not be safe.

Front-end kmalloc() with minimal per-cpu cache of free elements. Refill
this cache asynchronously from irq_work.

BPF programs always run with migration disabled. It's safe to allocate
from cache of the current cpu with irqs disabled. Free-ing is always
done into bucket of the current cpu as well. irq_work trims extra free
elements from buckets with kfree and refills them with kmalloc, so
global kmalloc logic takes care of freeing objects allocated by one cpu
and freed on another.

struct bpf_mem_alloc supports two modes:
- When size != 0 create kmem_cache and bpf_mem_cache for each cpu.
  This is typical bpf hash map use case when all elements have equal size.
- When size == 0 allocate 11 bpf_mem_cache-s for each cpu, then rely on
  kmalloc/kfree. Max allocation size is 4096 in this case.
  This is bpf_dynptr and bpf_kptr use case.

bpf_mem_alloc/bpf_mem_free are bpf specific 'wrappers' of kmalloc/kfree.
bpf_mem_cache_alloc/bpf_mem_cache_free are 'wrappers' of
kmem_cache_alloc/kmem_cache_free.

The allocators are NMI-safe from bpf programs only. They are not
NMI-safe in general.

Signed-off-by: Alexei Starovoitov
---
 include/linux/bpf_mem_alloc.h |  26 ++
 kernel/bpf/Makefile           |   2 +-
 kernel/bpf/memalloc.c         | 526 ++++++++++++++++++++++++++++++++++
 3 files changed, 553 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/bpf_mem_alloc.h
 create mode 100644 kernel/bpf/memalloc.c

diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
new file mode 100644
index 000000000000..804733070f8d
--- /dev/null
+++ b/include/linux/bpf_mem_alloc.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#ifndef _BPF_MEM_ALLOC_H
+#define _BPF_MEM_ALLOC_H
+#include <linux/compiler_types.h>
+
+struct bpf_mem_cache;
+struct bpf_mem_caches;
+
+struct bpf_mem_alloc {
+	struct bpf_mem_caches __percpu *caches;
+	struct bpf_mem_cache __percpu *cache;
+};
+
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size);
+void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
+
+/* kmalloc/kfree equivalent: */
+void *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size);
+void bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr);
+
+/* kmem_cache_alloc/free equivalent: */
+void *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma);
+void bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr);
+
+#endif /* _BPF_MEM_ALLOC_H */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 057ba8e01e70..11fb9220909b 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -13,7 +13,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
 obj-$(CONFIG_BPF_JIT) += trampoline.o
-obj-$(CONFIG_BPF_SYSCALL) += btf.o
+obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
 obj-$(CONFIG_BPF_JIT) += dispatcher.o
 ifeq ($(CONFIG_NET),y)
 obj-$(CONFIG_BPF_SYSCALL) += devmap.o
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
new file mode 100644
index 000000000000..8de268922380
--- /dev/null
+++ b/kernel/bpf/memalloc.c
@@ -0,0 +1,526 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#include <linux/mm.h>
+#include <linux/llist.h>
+#include <linux/bpf.h>
+#include <linux/irq_work.h>
+#include <linux/bpf_mem_alloc.h>
+#include <linux/memcontrol.h>
+
+/* Any context (including NMI) BPF specific memory allocator.
+ *
+ * Tracing BPF programs can attach to kprobe and fentry. Hence they
+ * run in unknown context where calling plain kmalloc() might not be safe.
+ *
+ * Front-end kmalloc() with per-cpu per-bucket cache of free elements.
+ * Refill this cache asynchronously from irq_work.
+ *
+ *   CPU_0 buckets
+ *   16 32 64 96 128 192 256 512 1024 2048 4096
+ *   ...
+ *   CPU_N buckets
+ *   16 32 64 96 128 192 256 512 1024 2048 4096
+ *
+ * The buckets are prefilled at the start.
+ * BPF programs always run with migration disabled.
+ * It's safe to allocate from cache of the current cpu with irqs disabled.
+ * Free-ing is always done into bucket of the current cpu as well.
+ * irq_work trims extra free elements from buckets with kfree
+ * and refills them with kmalloc, so global kmalloc logic takes care
+ * of freeing objects allocated by one cpu and freed on another.
+ *
+ * Every allocated object is padded with extra 8 bytes that contains
+ * struct llist_node.
+ */
+#define LLIST_NODE_SZ sizeof(struct llist_node)
+
+/* similar to kmalloc, but sizeof == 8 bucket is gone */
+static u8 size_index[24] __ro_after_init = {
+	3,	/* 8 */
+	3,	/* 16 */
+	4,	/* 24 */
+	4,	/* 32 */
+	5,	/* 40 */
+	5,	/* 48 */
+	5,	/* 56 */
+	5,	/* 64 */
+	1,	/* 72 */
+	1,	/* 80 */
+	1,	/* 88 */
+	1,	/* 96 */
+	6,	/* 104 */
+	6,	/* 112 */
+	6,	/* 120 */
+	6,	/* 128 */
+	2,	/* 136 */
+	2,	/* 144 */
+	2,	/* 152 */
+	2,	/* 160 */
+	2,	/* 168 */
+	2,	/* 176 */
+	2,	/* 184 */
+	2	/* 192 */
+};
+
+static int bpf_mem_cache_idx(size_t size)
+{
+	if (!size || size > 4096)
+		return -1;
+
+	if (size <= 192)
+		return size_index[(size - 1) / 8] - 1;
+
+	return fls(size - 1) - 2;
+}
+
+#define NUM_CACHES 11
+
+struct bpf_mem_cache {
+	/* per-cpu list of free objects of size 'unit_size'.
+	 * All accesses are done with preemption disabled
+	 * with __llist_add() and __llist_del_first().
+	 */
+	struct llist_head free_llist;
+
+	/* NMI only free list.
+	 * All accesses are NMI-safe llist_add() and llist_del_first().
+	 *
+	 * Each allocated object is either on free_llist or on free_llist_nmi.
+	 * One cpu can allocate it from NMI by doing llist_del_first() from
+	 * free_llist_nmi, while another might free it back from non-NMI by
+	 * doing llist_add() into free_llist.
+	 */
+	struct llist_head free_llist_nmi;
+
+	/* kmem_cache != NULL when bpf_mem_alloc was created for specific
+	 * element size.
+	 */
+	struct kmem_cache *kmem_cache;
+	struct irq_work refill_work;
+	struct obj_cgroup *objcg;
+	int unit_size;
+	/* count of objects in free_llist */
+	int free_cnt;
+	/* count of objects in free_llist_nmi */
+	atomic_t free_cnt_nmi;
+	/* flag to refill nmi list too */
+	bool refill_nmi_list;
+};
+
+struct bpf_mem_caches {
+	struct bpf_mem_cache cache[NUM_CACHES];
+};
+
+static struct llist_node notrace *__llist_del_first(struct llist_head *head)
+{
+	struct llist_node *entry, *next;
+
+	entry = head->first;
+	if (!entry)
+		return NULL;
+	next = entry->next;
+	head->first = next;
+	return entry;
+}
+
+#define BATCH 48
+#define LOW_WATERMARK 32
+#define HIGH_WATERMARK 96
+/* Assuming the average number of elements per bucket is 64, when all buckets
+ * are used the total memory will be: 64*16*32 + 64*32*32 + 64*64*32 + ... +
+ * 64*4096*32 ~ 20Mbyte
+ */
+
+/* extra macro useful for testing by randomizing in_nmi condition */
+#define bpf_in_nmi() in_nmi()
+
+static void *__alloc(struct bpf_mem_cache *c, int node)
+{
+	/* Allocate, but don't deplete atomic reserves that typical
+	 * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
+	 * will allocate from the current numa node which is what we
+	 * want here.
+	 */
+	gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT;
+
+	if (c->kmem_cache)
+		return kmem_cache_alloc_node(c->kmem_cache, flags, node);
+
+	return kmalloc_node(c->unit_size, flags, node);
+}
+
+static struct mem_cgroup *get_memcg(const struct bpf_mem_cache *c)
+{
+#ifdef CONFIG_MEMCG_KMEM
+	if (c->objcg)
+		return get_mem_cgroup_from_objcg(c->objcg);
+#endif
+
+#ifdef CONFIG_MEMCG
+	return root_mem_cgroup;
+#else
+	return NULL;
+#endif
+}
+
+/* Mostly runs from irq_work except __init phase. */
+static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
+{
+	struct mem_cgroup *memcg = NULL, *old_memcg;
+	unsigned long flags;
+	void *obj;
+	int i;
+
+	memcg = get_memcg(c);
+	old_memcg = set_active_memcg(memcg);
+	for (i = 0; i < cnt; i++) {
+		obj = __alloc(c, node);
+		if (!obj)
+			break;
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			/* In RT irq_work runs in per-cpu kthread, so we have
+			 * to disable interrupts to avoid race with
+			 * bpf_mem_alloc/free. Note the read of free_cnt in
+			 * bpf_mem_refill is racy in RT. It's ok to do.
+			 */
+			local_irq_save(flags);
+		__llist_add(obj, &c->free_llist);
+		c->free_cnt++;
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_restore(flags);
+	}
+	set_active_memcg(old_memcg);
+	mem_cgroup_put(memcg);
+}
+
+/* Refill NMI specific llist. Mostly runs from irq_work except __init phase. */
+static void alloc_bulk_nmi(struct bpf_mem_cache *c, int cnt, int node)
+{
+	struct mem_cgroup *memcg = NULL, *old_memcg;
+	void *obj;
+	int i;
+
+	memcg = get_memcg(c);
+	old_memcg = set_active_memcg(memcg);
+	for (i = 0; i < cnt; i++) {
+		obj = __alloc(c, node);
+		if (!obj)
+			break;
+		llist_add(obj, &c->free_llist_nmi);
+		atomic_inc(&c->free_cnt_nmi);
+	}
+	set_active_memcg(old_memcg);
+	mem_cgroup_put(memcg);
+}
+
+static void free_one(struct bpf_mem_cache *c, void *obj)
+{
+	if (c->kmem_cache)
+		kmem_cache_free(c->kmem_cache, obj);
+	else
+		kfree(obj);
+}
+
+static void free_bulk(struct bpf_mem_cache *c)
+{
+	struct llist_node *llnode;
+	unsigned long flags;
+	int cnt;
+
+	do {
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_save(flags);
+		llnode = __llist_del_first(&c->free_llist);
+		if (llnode)
+			cnt = --c->free_cnt;
+		else
+			cnt = 0;
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_restore(flags);
+		free_one(c, llnode);
+	} while (cnt > (HIGH_WATERMARK + LOW_WATERMARK) / 2);
+}
+
+static void free_bulk_nmi(struct bpf_mem_cache *c)
+{
+	struct llist_node *llnode;
+	int cnt;
+
+	do {
+		llnode = llist_del_first(&c->free_llist_nmi);
+		if (llnode)
+			cnt = atomic_dec_return(&c->free_cnt_nmi);
+		else
+			cnt = 0;
+		free_one(c, llnode);
+	} while (cnt > (HIGH_WATERMARK + LOW_WATERMARK) / 2);
+}
+
+static void bpf_mem_refill(struct irq_work *work)
+{
+	struct bpf_mem_cache *c = container_of(work, struct bpf_mem_cache, refill_work);
+	int cnt;
+
+	cnt = c->free_cnt;
+	if (cnt < LOW_WATERMARK)
+		/* irq_work runs on this cpu and kmalloc will allocate
+		 * from the current numa node which is what we want here.
+		 */
+		alloc_bulk(c, BATCH, NUMA_NO_NODE);
+	else if (cnt > HIGH_WATERMARK)
+		free_bulk(c);
+
+	if (!c->refill_nmi_list)
+		/* don't refill NMI specific freelist
+		 * until alloc/free from NMI.
+		 */
+		return;
+	cnt = atomic_read(&c->free_cnt_nmi);
+	if (cnt < LOW_WATERMARK)
+		alloc_bulk_nmi(c, BATCH, NUMA_NO_NODE);
+	else if (cnt > HIGH_WATERMARK)
+		free_bulk_nmi(c);
+	c->refill_nmi_list = false;
+}
+
+static void notrace irq_work_raise(struct bpf_mem_cache *c, bool in_nmi)
+{
+	if (in_nmi)
+		/* Raise the flag only if in_nmi. Cannot assign it
+		 * unconditionally since subsequent non-nmi irq_work_raise
+		 * might clear it.
+		 */
+		c->refill_nmi_list = in_nmi;
+	irq_work_queue(&c->refill_work);
+}
+
+static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
+{
+	init_irq_work(&c->refill_work, bpf_mem_refill);
+	/* To avoid consuming memory assume that 1st run of bpf
+	 * prog won't be doing more than 4 map_update_elem from
+	 * irq disabled region
+	 */
+	alloc_bulk(c, c->unit_size < 256 ? 4 : 1, cpu_to_node(cpu));
+
+	/* NMI progs are rare. Assume they have one map_update
+	 * per prog at the very beginning.
+	 */
+	alloc_bulk_nmi(c, 1, cpu_to_node(cpu));
+}
+
+/* When size != 0 create kmem_cache and bpf_mem_cache for each cpu.
+ * This is typical bpf hash map use case when all elements have equal size.
+ *
+ * When size == 0 allocate 11 bpf_mem_cache-s for each cpu, then rely on
+ * kmalloc/kfree. Max allocation size is 4096 in this case.
+ * This is bpf_dynptr and bpf_kptr use case.
+ */
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
+{
+	static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};
+	struct bpf_mem_caches *cc, __percpu *pcc;
+	struct bpf_mem_cache *c, __percpu *pc;
+	struct kmem_cache *kmem_cache;
+	struct obj_cgroup *objcg = NULL;
+	char buf[32];
+	int cpu, i;
+
+	if (size) {
+		pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
+		if (!pc)
+			return -ENOMEM;
+		size += LLIST_NODE_SZ; /* room for llist_node */
+		snprintf(buf, sizeof(buf), "bpf-%u", size);
+		kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL);
+		if (!kmem_cache) {
+			free_percpu(pc);
+			return -ENOMEM;
+		}
+#ifdef CONFIG_MEMCG_KMEM
+		objcg = get_obj_cgroup_from_current();
+#endif
+		for_each_possible_cpu(cpu) {
+			c = per_cpu_ptr(pc, cpu);
+			c->kmem_cache = kmem_cache;
+			c->unit_size = size;
+			c->objcg = objcg;
+			prefill_mem_cache(c, cpu);
+		}
+		ma->cache = pc;
+		return 0;
+	}
+
+	pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL);
+	if (!pcc)
+		return -ENOMEM;
+#ifdef CONFIG_MEMCG_KMEM
+	objcg = get_obj_cgroup_from_current();
+#endif
+	for_each_possible_cpu(cpu) {
+		cc = per_cpu_ptr(pcc, cpu);
+		for (i = 0; i < NUM_CACHES; i++) {
+			c = &cc->cache[i];
+			c->unit_size = sizes[i];
+			c->objcg = objcg;
+			prefill_mem_cache(c, cpu);
+		}
+	}
+	ma->caches = pcc;
+	return 0;
+}
+
+static void drain_mem_cache(struct bpf_mem_cache *c)
+{
+	struct llist_node *llnode;
+
+	while ((llnode = llist_del_first(&c->free_llist_nmi)))
+		free_one(c, llnode);
+	while ((llnode = __llist_del_first(&c->free_llist)))
+		free_one(c, llnode);
+}
+
+void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
+{
+	struct bpf_mem_caches *cc;
+	struct bpf_mem_cache *c;
+	int cpu, i;
+
+	if (ma->cache) {
+		for_each_possible_cpu(cpu) {
+			c = per_cpu_ptr(ma->cache, cpu);
+			drain_mem_cache(c);
+		}
+		/* kmem_cache and memcg are the same across cpus */
+		kmem_cache_destroy(c->kmem_cache);
+		if (c->objcg)
+			obj_cgroup_put(c->objcg);
+		free_percpu(ma->cache);
+		ma->cache = NULL;
+	}
+	if (ma->caches) {
+		for_each_possible_cpu(cpu) {
+			cc = per_cpu_ptr(ma->caches, cpu);
+			for (i = 0; i < NUM_CACHES; i++) {
+				c = &cc->cache[i];
+				drain_mem_cache(c);
+			}
+		}
+		if (c->objcg)
+			obj_cgroup_put(c->objcg);
+		free_percpu(ma->caches);
+		ma->caches = NULL;
+	}
+}
+
+/* notrace is necessary here and in other functions to make sure
+ * bpf programs cannot attach to them and cause llist corruptions.
+ */
+static void notrace *unit_alloc(struct bpf_mem_cache *c)
+{
+	bool in_nmi = bpf_in_nmi();
+	struct llist_node *llnode;
+	unsigned long flags;
+	int cnt = 0;
+
+	if (unlikely(in_nmi)) {
+		llnode = llist_del_first(&c->free_llist_nmi);
+		if (llnode)
+			cnt = atomic_dec_return(&c->free_cnt_nmi);
+	} else {
+		/* Disable irqs to prevent the following race:
+		 * bpf_prog_A
+		 *   bpf_mem_alloc
+		 *     preemption or irq -> bpf_prog_B
+		 *       bpf_mem_alloc
+		 */
+		local_irq_save(flags);
+		llnode = __llist_del_first(&c->free_llist);
+		if (llnode)
+			cnt = --c->free_cnt;
+		local_irq_restore(flags);
+	}
+	WARN_ON(cnt < 0);
+
+	if (cnt < LOW_WATERMARK)
+		irq_work_raise(c, in_nmi);
+	return llnode;
+}
+
+/* Though 'ptr' object could have been allocated on a different cpu
+ * add it to the free_llist of the current cpu.
+ * Let kfree() logic deal with it when it's later called from irq_work.
+ */
+static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
+{
+	struct llist_node *llnode = ptr - LLIST_NODE_SZ;
+	bool in_nmi = bpf_in_nmi();
+	unsigned long flags;
+	int cnt;
+
+	BUILD_BUG_ON(LLIST_NODE_SZ > 8);
+
+	if (unlikely(in_nmi)) {
+		llist_add(llnode, &c->free_llist_nmi);
+		cnt = atomic_inc_return(&c->free_cnt_nmi);
+	} else {
+		local_irq_save(flags);
+		__llist_add(llnode, &c->free_llist);
+		cnt = ++c->free_cnt;
+		local_irq_restore(flags);
+	}
+	WARN_ON(cnt <= 0);
+
+	if (cnt > HIGH_WATERMARK)
+		/* free few objects from current cpu into global kmalloc pool */
+		irq_work_raise(c, in_nmi);
+}
+
+/* Called from BPF program or from sys_bpf syscall.
+ * In both cases migration is disabled.
+ */
+void notrace *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size)
+{
+	int idx;
+	void *ret;
+
+	if (!size)
+		return ZERO_SIZE_PTR;
+
+	idx = bpf_mem_cache_idx(size + LLIST_NODE_SZ);
+	if (idx < 0)
+		return NULL;
+
+	ret = unit_alloc(this_cpu_ptr(ma->caches)->cache + idx);
+	return !ret ? NULL : ret + LLIST_NODE_SZ;
+}
+
+void notrace bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr)
+{
+	int idx;
+
+	if (!ptr)
+		return;
+
+	idx = bpf_mem_cache_idx(__ksize(ptr - LLIST_NODE_SZ));
+	if (idx < 0)
+		return;
+
+	unit_free(this_cpu_ptr(ma->caches)->cache + idx, ptr);
+}
+
+void notrace *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma)
+{
+	void *ret;
+
+	ret = unit_alloc(this_cpu_ptr(ma->cache));
+	return !ret ? NULL : ret + LLIST_NODE_SZ;
+}
+
+void notrace bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr)
+{
+	if (!ptr)
+		return;
+
+	unit_free(this_cpu_ptr(ma->cache), ptr);
+}
From patchwork Wed Aug 17 21:04:09 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 bpf-next 02/12] bpf: Convert hash map to bpf_mem_alloc.
Date: Wed, 17 Aug 2022 14:04:09 -0700
Message-Id: <20220817210419.95560-3-alexei.starovoitov@gmail.com>
In-Reply-To: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
References: <20220817210419.95560-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Convert bpf hash map to use bpf memory allocator.

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 8392f7f8a8ac..6c0db430507a 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -14,6 +14,7 @@
 #include "percpu_freelist.h"
 #include "bpf_lru_list.h"
 #include "map_in_map.h"
+#include <linux/bpf_mem_alloc.h>
 
 #define HTAB_CREATE_FLAG_MASK \
 	(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \
@@ -92,6 +93,7 @@ struct bucket {
 struct bpf_htab {
 	struct bpf_map map;
+	struct bpf_mem_alloc ma;
 	struct bucket *buckets;
 	void *elems;
 	union {
@@ -567,6 +569,10 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 			if (err)
 				goto free_prealloc;
 		}
+	} else {
+		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size);
+		if (err)
+			goto free_map_locked;
 	}
 
 	return &htab->map;
@@ -577,6 +583,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->ma);
 free_htab:
 	lockdep_unregister_key(&htab->lockdep_key);
 	bpf_map_area_free(htab);
@@ -853,7 +860,7 @@ static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
 	if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
 		free_percpu(htab_elem_get_ptr(l, htab->map.key_size));
 	check_and_free_fields(htab, l);
-	kfree(l);
+	bpf_mem_cache_free(&htab->ma, l);
 }
 
 static void htab_elem_free_rcu(struct rcu_head *head)
@@ -977,9 +984,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 			l_new = ERR_PTR(-E2BIG);
 			goto dec_count;
 		}
-		l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size,
-					     GFP_NOWAIT | __GFP_NOWARN,
-					     htab->map.numa_node);
+		l_new = bpf_mem_cache_alloc(&htab->ma);
 		if (!l_new) {
 			l_new = ERR_PTR(-ENOMEM);
 			goto dec_count;
 		}
@@ -998,7 +1003,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 		pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
 					    GFP_NOWAIT | __GFP_NOWARN);
 		if (!pptr) {
-			kfree(l_new);
+			bpf_mem_cache_free(&htab->ma, l_new);
 			l_new = ERR_PTR(-ENOMEM);
 			goto dec_count;
 		}
@@ -1493,6 +1498,7 @@ static void htab_map_free(struct bpf_map *map)
 	bpf_map_free_kptr_off_tab(map);
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->ma);
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	lockdep_unregister_key(&htab->lockdep_key);
From patchwork Wed Aug 17 21:04:10 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 bpf-next 03/12] selftests/bpf: Improve test coverage of test_maps
Date: Wed, 17 Aug 2022 14:04:10 -0700
Message-Id: <20220817210419.95560-4-alexei.starovoitov@gmail.com>
In-Reply-To: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
References: <20220817210419.95560-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Make test_maps more stressful with more parallelism in
update/delete/lookup/walk including different value sizes.
Signed-off-by: Alexei Starovoitov
---
 tools/testing/selftests/bpf/test_maps.c | 38 ++++++++++++++++---------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index cbebfaa7c1e8..d1ffc76814d9 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -264,10 +264,11 @@ static void test_hashmap_percpu(unsigned int task, void *data)
 	close(fd);
 }
 
+#define VALUE_SIZE 3
 static int helper_fill_hashmap(int max_entries)
 {
 	int i, fd, ret;
-	long long key, value;
+	long long key, value[VALUE_SIZE] = {};
 
 	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
 			    max_entries, &map_opts);
@@ -276,8 +277,8 @@ static int helper_fill_hashmap(int max_entries)
 	      "err: %s, flags: 0x%x\n", strerror(errno), map_opts.map_flags);
 
 	for (i = 0; i < max_entries; i++) {
-		key = i; value = key;
-		ret = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST);
+		key = i; value[0] = key;
+		ret = bpf_map_update_elem(fd, &key, value, BPF_NOEXIST);
 		CHECK(ret != 0,
 		      "can't update hashmap",
 		      "err: %s\n", strerror(ret));
@@ -288,8 +289,8 @@ static int helper_fill_hashmap(int max_entries)
 
 static void test_hashmap_walk(unsigned int task, void *data)
 {
-	int fd, i, max_entries = 1000;
-	long long key, value, next_key;
+	int fd, i, max_entries = 10000;
+	long long key, value[VALUE_SIZE], next_key;
 	bool next_key_valid = true;
 
 	fd = helper_fill_hashmap(max_entries);
@@ -297,7 +298,7 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key,
 					 &next_key) == 0; i++) {
 		key = next_key;
-		assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
+		assert(bpf_map_lookup_elem(fd, &key, value) == 0);
 	}
 
 	assert(i == max_entries);
@@ -305,9 +306,9 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	assert(bpf_map_get_next_key(fd, NULL, &key) == 0);
 	for (i = 0; next_key_valid; i++) {
 		next_key_valid = bpf_map_get_next_key(fd, &key, &next_key) == 0;
-		assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
-		value++;
-		assert(bpf_map_update_elem(fd, &key, &value, BPF_EXIST) == 0);
+		assert(bpf_map_lookup_elem(fd, &key, value) == 0);
+		value[0]++;
+		assert(bpf_map_update_elem(fd, &key, value, BPF_EXIST) == 0);
 		key = next_key;
 	}
 
@@ -316,8 +317,8 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key,
 					 &next_key) == 0; i++) {
 		key = next_key;
-		assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
-		assert(value - 1 == key);
+		assert(bpf_map_lookup_elem(fd, &key, value) == 0);
+		assert(value[0] - 1 == key);
 	}
 
 	assert(i == max_entries);
@@ -1371,16 +1372,16 @@ static void __run_parallel(unsigned int tasks,
 
 static void test_map_stress(void)
 {
+	run_parallel(100, test_hashmap_walk, NULL);
 	run_parallel(100, test_hashmap, NULL);
 	run_parallel(100, test_hashmap_percpu, NULL);
 	run_parallel(100, test_hashmap_sizes, NULL);
-	run_parallel(100, test_hashmap_walk, NULL);
 
 	run_parallel(100, test_arraymap, NULL);
 	run_parallel(100, test_arraymap_percpu, NULL);
 }
 
-#define TASKS 1024
+#define TASKS 100
 
 #define DO_UPDATE 1
 #define DO_DELETE 0
@@ -1432,6 +1433,8 @@ static void test_update_delete(unsigned int fn, void *data)
 	int fd = ((int *)data)[0];
 	int i, key, value, err;
 
+	if (fn & 1)
+		test_hashmap_walk(fn, NULL);
 	for (i = fn; i < MAP_SIZE; i += TASKS) {
 		key = value = i;
 
@@ -1455,7 +1458,7 @@ static void test_update_delete(unsigned int fn, void *data)
 
 static void test_map_parallel(void)
 {
-	int i, fd, key = 0, value = 0;
+	int i, fd, key = 0, value = 0, j = 0;
 	int data[2];
 
 	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
@@ -1466,6 +1469,7 @@ static void test_map_parallel(void)
 		exit(1);
 	}
 
+again:
 	/* Use the same fd in children to add elements to this map:
 	 * child_0 adds key=0, key=1024, key=2048, ...
 	 * child_1 adds key=1, key=1025, key=2049, ...
@@ -1502,6 +1506,12 @@ static void test_map_parallel(void)
 	key = -1;
 	assert(bpf_map_get_next_key(fd, NULL, &key) < 0 && errno == ENOENT);
 	assert(bpf_map_get_next_key(fd, &key, &key) < 0 && errno == ENOENT);
+
+	key = 0;
+	bpf_map_delete_elem(fd, &key);
+	if (j++ < 5)
+		goto again;
+
 	close(fd);
 }
 
 static void test_map_rdonly(void)

From patchwork Wed Aug 17 21:04:11 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12946474
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 bpf-next 04/12] samples/bpf: Reduce syscall overhead in map_perf_test.
Date: Wed, 17 Aug 2022 14:04:11 -0700
Message-Id: <20220817210419.95560-5-alexei.starovoitov@gmail.com>
In-Reply-To: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
References: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Make map_perf_test for preallocated and non-preallocated hash maps
spend more time inside the bpf program, to focus performance analysis
on the speed of update/lookup/delete operations performed by the bpf
program.

It makes 'perf report' of bpf_mem_alloc look like:
 11.76%  map_perf_test  [k] _raw_spin_lock_irqsave
 11.26%  map_perf_test  [k] htab_map_update_elem
  9.70%  map_perf_test  [k] _raw_spin_lock
  9.47%  map_perf_test  [k] htab_map_delete_elem
  8.57%  map_perf_test  [k] memcpy_erms
  5.58%  map_perf_test  [k] alloc_htab_elem
  4.09%  map_perf_test  [k] __htab_map_lookup_elem
  3.44%  map_perf_test  [k] syscall_exit_to_user_mode
  3.13%  map_perf_test  [k] lookup_nulls_elem_raw
  3.05%  map_perf_test  [k] migrate_enable
  3.04%  map_perf_test  [k] memcmp
  2.67%  map_perf_test  [k] unit_free
  2.39%  map_perf_test  [k] lookup_elem_raw

Reduce the default iteration count as well to keep 'map_perf_test'
quick enough even on debug kernels.
Signed-off-by: Alexei Starovoitov
---
 samples/bpf/map_perf_test_kern.c | 44 ++++++++++++++++++++------------
 samples/bpf/map_perf_test_user.c |  2 +-
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
index 8773f22b6a98..7342c5b2f278 100644
--- a/samples/bpf/map_perf_test_kern.c
+++ b/samples/bpf/map_perf_test_kern.c
@@ -108,11 +108,14 @@ int stress_hmap(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&hash_map, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&hash_map, &key);
-	if (value)
-		bpf_map_delete_elem(&hash_map, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&hash_map, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&hash_map, &key);
+		if (value)
+			bpf_map_delete_elem(&hash_map, &key);
+	}
 
 	return 0;
 }
@@ -123,11 +126,14 @@ int stress_percpu_hmap(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&percpu_hash_map, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&percpu_hash_map, &key);
-	if (value)
-		bpf_map_delete_elem(&percpu_hash_map, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&percpu_hash_map, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&percpu_hash_map, &key);
+		if (value)
+			bpf_map_delete_elem(&percpu_hash_map, &key);
+	}
 
 	return 0;
 }
@@ -137,11 +143,14 @@ int stress_hmap_alloc(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&hash_map_alloc, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&hash_map_alloc, &key);
-	if (value)
-		bpf_map_delete_elem(&hash_map_alloc, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&hash_map_alloc, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&hash_map_alloc, &key);
+		if (value)
+			bpf_map_delete_elem(&hash_map_alloc, &key);
+	}
 
 	return 0;
 }
@@ -151,11 +160,14 @@ int stress_percpu_hmap_alloc(struct pt_regs *ctx)
 	u32 key = bpf_get_current_pid_tgid();
 	long init_val = 1;
 	long *value;
+	int i;
 
-	bpf_map_update_elem(&percpu_hash_map_alloc, &key, &init_val, BPF_ANY);
-	value = bpf_map_lookup_elem(&percpu_hash_map_alloc, &key);
-	if (value)
-		bpf_map_delete_elem(&percpu_hash_map_alloc, &key);
+	for (i = 0; i < 10; i++) {
+		bpf_map_update_elem(&percpu_hash_map_alloc, &key, &init_val, BPF_ANY);
+		value = bpf_map_lookup_elem(&percpu_hash_map_alloc, &key);
+		if (value)
+			bpf_map_delete_elem(&percpu_hash_map_alloc, &key);
+	}
 
 	return 0;
 }
diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c
index b6fc174ab1f2..1bb53f4b29e1 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -72,7 +72,7 @@ static int test_flags = ~0;
 static uint32_t num_map_entries;
 static uint32_t inner_lru_hash_size;
 static int lru_hash_lookup_test_entries = 32;
-static uint32_t max_cnt = 1000000;
+static uint32_t max_cnt = 10000;
 
 static int check_test_flags(enum test_type t)
 {

From patchwork Wed Aug 17 21:04:12 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12946475
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 bpf-next 05/12] bpf: Relax the requirement to use preallocated hash maps in tracing progs.
Date: Wed, 17 Aug 2022 14:04:12 -0700
Message-Id: <20220817210419.95560-6-alexei.starovoitov@gmail.com>
In-Reply-To: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
References: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Since the bpf hash map was converted to use bpf_mem_alloc, it is safe
to use from tracing programs and in RT kernels. But the per-cpu hash
map is still using dynamic allocation for per-cpu map values, hence
keep the warning for this map type. In the future alloc_percpu_gfp can
be front-ended with bpf_mem_cache and this restriction will be
completely lifted.

perf_event (NMI) bpf programs have to use preallocated hash maps,
because free_htab_elem() is using call_rcu, which might crash if
re-entered.

Sleepable bpf programs have to use preallocated hash maps, because the
lifetime of map elements is not protected by rcu_read_lock/unlock.
This restriction can be lifted in the future as well.
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/verifier.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2c1f8069f7b7..d785f29047d7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12605,10 +12605,12 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 	 * For programs attached to PERF events this is mandatory as the
 	 * perf NMI can hit any arbitrary code sequence.
 	 *
-	 * All other trace types using preallocated hash maps are unsafe as
-	 * well because tracepoint or kprobes can be inside locked regions
-	 * of the memory allocator or at a place where a recursion into the
-	 * memory allocator would see inconsistent state.
+	 * All other trace types using non-preallocated per-cpu hash maps are
+	 * unsafe as well because tracepoint or kprobes can be inside locked
+	 * regions of the per-cpu memory allocator or at a place where a
+	 * recursion into the per-cpu memory allocator would see inconsistent
+	 * state. Non per-cpu hash maps are using bpf_mem_alloc-tor which is
+	 * safe to use from kprobe/fentry and in RT.
 	 *
 	 * On RT enabled kernels run-time allocation of all trace type
 	 * programs is strictly prohibited due to lock type constraints. On
@@ -12618,15 +12620,26 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 	 */
 	if (is_tracing_prog_type(prog_type) && !is_preallocated_map(map)) {
 		if (prog_type == BPF_PROG_TYPE_PERF_EVENT) {
+			/* perf_event bpf progs have to use preallocated hash maps
+			 * because non-prealloc is still relying on call_rcu to free
+			 * elements.
+			 */
 			verbose(env, "perf_event programs can only use preallocated hash map\n");
 			return -EINVAL;
 		}
-		if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
-			verbose(env, "trace type programs can only use preallocated hash map\n");
-			return -EINVAL;
+		if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
+		    (map->inner_map_meta &&
+		     map->inner_map_meta->map_type == BPF_MAP_TYPE_PERCPU_HASH)) {
+			if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+				verbose(env,
+					"trace type programs can only use preallocated per-cpu hash map\n");
+				return -EINVAL;
+			}
+			WARN_ONCE(1, "trace type BPF program uses run-time allocation\n");
+			verbose(env,
+				"trace type programs with run-time allocated per-cpu hash maps are unsafe."
+				" Switch to preallocated hash maps.\n");
 		}
-		WARN_ONCE(1, "trace type BPF program uses run-time allocation\n");
-		verbose(env, "trace type programs with run-time allocated hash maps are unsafe. Switch to preallocated hash maps.\n");
 	}
 
 	if (map_value_has_spin_lock(map)) {

From patchwork Wed Aug 17 21:04:13 2022
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 12946476
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 bpf-next 06/12] bpf: Optimize element count in non-preallocated hash map.
Date: Wed, 17 Aug 2022 14:04:13 -0700
Message-Id: <20220817210419.95560-7-alexei.starovoitov@gmail.com>
In-Reply-To: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
References: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

The atomic_inc/dec might cause extreme cache line bouncing when
multiple cpus access the same bpf map. Based on the specified
max_entries for the hash map, calculate when percpu_counter becomes
faster than atomic_t and use it for such maps. For example,
samples/bpf/map_perf_test is using a hash map with max_entries 1000.
On a system with 16 cpus 'map_perf_test 4' shows 14k events per second
using atomic_t. On a system with 15 cpus it shows 100k events per
second using percpu. map_perf_test is an extreme case where all cpus
collide on atomic_t, which causes extreme cache bouncing. Note that
the slow path of percpu_counter is 5k events per second vs 14k for
atomic, so the heuristic is necessary. See the comment in the code for
why the heuristic is based on num_online_cpus().
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c | 70 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 62 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 6c0db430507a..65ebe5a719f5 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -101,7 +101,12 @@ struct bpf_htab {
 		struct bpf_lru lru;
 	};
 	struct htab_elem *__percpu *extra_elems;
-	atomic_t count;	/* number of elements in this hashtable */
+	/* number of elements in non-preallocated hashtable are kept
+	 * in either pcount or count
+	 */
+	struct percpu_counter pcount;
+	atomic_t count;
+	bool use_percpu_counter;
 	u32 n_buckets;	/* number of hash buckets */
 	u32 elem_size;	/* size of each element in bytes */
 	u32 hashrnd;
@@ -556,6 +561,29 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)

 	htab_init_buckets(htab);

+/* compute_batch_value() computes batch value as num_online_cpus() * 2
+ * and __percpu_counter_compare() needs
+ * htab->max_entries - cur_number_of_elems to be more than batch * num_online_cpus()
+ * for percpu_counter to be faster than atomic_t. In practice the average bpf
+ * hash map size is 10k, which means that a system with 64 cpus will fill
+ * hashmap to 20% of 10k before percpu_counter becomes ineffective. Therefore
+ * define our own batch count as 32 then 10k hash map can be filled up to 80%:
+ * 10k - 8k > 32 _batch_ * 64 _cpus_
+ * and __percpu_counter_compare() will still be fast. At that point hash map
+ * collisions will dominate its performance anyway. Assume that hash map filled
+ * to 50+% isn't going to be O(1) and use the following formula to choose
+ * between percpu_counter and atomic_t.
+ */
+#define PERCPU_COUNTER_BATCH 32
+	if (attr->max_entries / 2 > num_online_cpus() * PERCPU_COUNTER_BATCH)
+		htab->use_percpu_counter = true;
+
+	if (htab->use_percpu_counter) {
+		err = percpu_counter_init(&htab->pcount, 0, GFP_KERNEL);
+		if (err)
+			goto free_map_locked;
+	}
+
 	if (prealloc) {
 		err = prealloc_init(htab);
 		if (err)
@@ -882,6 +910,31 @@ static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
 	}
 }

+static bool is_map_full(struct bpf_htab *htab)
+{
+	if (htab->use_percpu_counter)
+		return __percpu_counter_compare(&htab->pcount, htab->map.max_entries,
+						PERCPU_COUNTER_BATCH) >= 0;
+	return atomic_read(&htab->count) >= htab->map.max_entries;
+}
+
+static void inc_elem_count(struct bpf_htab *htab)
+{
+	if (htab->use_percpu_counter)
+		percpu_counter_add_batch(&htab->pcount, 1, PERCPU_COUNTER_BATCH);
+	else
+		atomic_inc(&htab->count);
+}
+
+static void dec_elem_count(struct bpf_htab *htab)
+{
+	if (htab->use_percpu_counter)
+		percpu_counter_add_batch(&htab->pcount, -1, PERCPU_COUNTER_BATCH);
+	else
+		atomic_dec(&htab->count);
+}
+
+
 static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 {
 	htab_put_fd_value(htab, l);
@@ -890,7 +943,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 		check_and_free_fields(htab, l);
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
-		atomic_dec(&htab->count);
+		dec_elem_count(htab);
 		l->htab = htab;
 		call_rcu(&l->rcu, htab_elem_free_rcu);
 	}
@@ -974,16 +1027,15 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 			l_new = container_of(l, struct htab_elem, fnode);
 		}
 	} else {
-		if (atomic_inc_return(&htab->count) > htab->map.max_entries)
-			if (!old_elem) {
+		if (is_map_full(htab))
+			if (!old_elem)
 				/* when map is full and update() is replacing
 				 * old element, it's ok to allocate, since
 				 * old element will be freed immediately.
 				 * Otherwise return an error
 				 */
-				l_new = ERR_PTR(-E2BIG);
-				goto dec_count;
-			}
+				return ERR_PTR(-E2BIG);
+		inc_elem_count(htab);
 		l_new = bpf_mem_cache_alloc(&htab->ma);
 		if (!l_new) {
 			l_new = ERR_PTR(-ENOMEM);
@@ -1025,7 +1077,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 	l_new->hash = hash;
 	return l_new;
 dec_count:
-	atomic_dec(&htab->count);
+	dec_elem_count(htab);
 	return l_new;
 }

@@ -1499,6 +1551,8 @@ static void htab_map_free(struct bpf_map *map)
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
 	bpf_mem_alloc_destroy(&htab->ma);
+	if (htab->use_percpu_counter)
+		percpu_counter_destroy(&htab->pcount);
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	lockdep_unregister_key(&htab->lockdep_key);

From patchwork Wed Aug 17 21:04:14 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
 delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 bpf-next 07/12] bpf: Optimize call_rcu in non-preallocated hash map.
Date: Wed, 17 Aug 2022 14:04:14 -0700
Message-Id: <20220817210419.95560-8-alexei.starovoitov@gmail.com>
In-Reply-To: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
References: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Doing call_rcu() a million times a second becomes a bottleneck. Convert the
non-preallocated hash map from call_rcu to SLAB_TYPESAFE_BY_RCU. The RCU
critical section is no longer observed for each individual htab element, which
makes the non-preallocated hash map behave just like the preallocated one. The
map elements are released back to kernel memory after an RCU grace period is
observed. This improves 'map_perf_test 4' performance from 100k events per
second to 250k events per second. bpf_mem_alloc + percpu_counter +
typesafe_by_rcu provide a 10x performance boost to the non-preallocated hash
map, bringing it within a few percent of the preallocated map while consuming
a fraction of the memory.
Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c                      |  8 ++++++--
 kernel/bpf/memalloc.c                     |  2 +-
 tools/testing/selftests/bpf/progs/timer.c | 11 -----------
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 65ebe5a719f5..3c1d15fd052a 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -944,8 +944,12 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		dec_elem_count(htab);
-		l->htab = htab;
-		call_rcu(&l->rcu, htab_elem_free_rcu);
+		if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH) {
+			l->htab = htab;
+			call_rcu(&l->rcu, htab_elem_free_rcu);
+		} else {
+			htab_elem_free(htab, l);
+		}
 	}
 }

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 8de268922380..a43630371b9f 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -332,7 +332,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
 		return -ENOMEM;
 	size += LLIST_NODE_SZ; /* room for llist_node */
 	snprintf(buf, sizeof(buf), "bpf-%u", size);
-	kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL);
+	kmem_cache = kmem_cache_create(buf, size, 8, SLAB_TYPESAFE_BY_RCU, NULL);
 	if (!kmem_cache) {
 		free_percpu(pc);
 		return -ENOMEM;
diff --git a/tools/testing/selftests/bpf/progs/timer.c b/tools/testing/selftests/bpf/progs/timer.c
index 5f5309791649..0053c5402173 100644
--- a/tools/testing/selftests/bpf/progs/timer.c
+++ b/tools/testing/selftests/bpf/progs/timer.c
@@ -208,17 +208,6 @@ static int timer_cb2(void *map, int *key, struct hmap_elem *val)
 		 */
 		bpf_map_delete_elem(map, key);

-		/* in non-preallocated hashmap both 'key' and 'val' are RCU
-		 * protected and still valid though this element was deleted
-		 * from the map. Arm this timer for ~35 seconds. When callback
-		 * finishes the call_rcu will invoke:
-		 *    htab_elem_free_rcu
-		 *      check_and_free_timer
-		 *        bpf_timer_cancel_and_free
-		 * to cancel this 35 second sleep and delete the timer for real.
-		 */
-		if (bpf_timer_start(&val->timer, 1ull << 35, 0) != 0)
-			err |= 256;
 		ok |= 4;
 	}
 	return 0;

From patchwork Wed Aug 17 21:04:15 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
 delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 bpf-next 08/12] bpf: Adjust low/high watermarks in bpf_mem_cache
Date: Wed, 17 Aug 2022 14:04:15 -0700
Message-Id: <20220817210419.95560-9-alexei.starovoitov@gmail.com>
In-Reply-To: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
References: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Using the same low/high watermarks for every bucket in bpf_mem_cache consumes
a significant amount of memory. Preallocating 64 PAGE_SIZE elements to the
free list is not efficient. Make the low/high watermarks and the batching
value depend on the element size. This change brings significant memory
savings.

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/memalloc.c | 64 ++++++++++++++++++++++++++++++-------------
 1 file changed, 45 insertions(+), 19 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index a43630371b9f..be8262f5c9ec 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -105,6 +105,7 @@ struct bpf_mem_cache {
 	atomic_t free_cnt_nmi;
 	/* flag to refill nmi list too */
 	bool refill_nmi_list;
+	int low_watermark, high_watermark, batch;
 };

 struct bpf_mem_caches {
@@ -123,14 +124,6 @@ static struct llist_node notrace *__llist_del_first(struct llist_head *head)
 	return entry;
 }

-#define BATCH 48
-#define LOW_WATERMARK 32
-#define HIGH_WATERMARK 96
-/* Assuming the average number of elements per bucket is 64, when all buckets
- * are used the total memory will be: 64*16*32 + 64*32*32 + 64*64*32 + ... +
- * 64*4096*32 ~ 20Mbyte
- */
-
 /* extra macro useful for testing by randomizing in_nmi condition */
 #define bpf_in_nmi() in_nmi()

@@ -238,7 +231,7 @@ static void free_bulk(struct bpf_mem_cache *c)
 		if (IS_ENABLED(CONFIG_PREEMPT_RT))
 			local_irq_restore(flags);
 		free_one(c, llnode);
-	} while (cnt > (HIGH_WATERMARK + LOW_WATERMARK) / 2);
+	} while (cnt > (c->high_watermark + c->low_watermark) / 2);
 }

 static void free_bulk_nmi(struct bpf_mem_cache *c)
@@ -253,7 +246,7 @@ static void free_bulk_nmi(struct bpf_mem_cache *c)
 		else
 			cnt = 0;
 		free_one(c, llnode);
-	} while (cnt > (HIGH_WATERMARK + LOW_WATERMARK) / 2);
+	} while (cnt > (c->high_watermark + c->low_watermark) / 2);
 }

 static void bpf_mem_refill(struct irq_work *work)
@@ -262,12 +255,12 @@ static void bpf_mem_refill(struct irq_work *work)
 	int cnt;

 	cnt = c->free_cnt;
-	if (cnt < LOW_WATERMARK)
+	if (cnt < c->low_watermark)
 		/* irq_work runs on this cpu and kmalloc will allocate
 		 * from the current numa node which is what we want here.
 		 */
-		alloc_bulk(c, BATCH, NUMA_NO_NODE);
-	else if (cnt > HIGH_WATERMARK)
+		alloc_bulk(c, c->batch, NUMA_NO_NODE);
+	else if (cnt > c->high_watermark)
 		free_bulk(c);

 	if (!c->refill_nmi_list)
@@ -276,9 +269,9 @@ static void bpf_mem_refill(struct irq_work *work)
 		 */
 		return;
 	cnt = atomic_read(&c->free_cnt_nmi);
-	if (cnt < LOW_WATERMARK)
-		alloc_bulk_nmi(c, BATCH, NUMA_NO_NODE);
-	else if (cnt > HIGH_WATERMARK)
+	if (cnt < c->low_watermark)
+		alloc_bulk_nmi(c, c->batch, NUMA_NO_NODE);
+	else if (cnt > c->high_watermark)
 		free_bulk_nmi(c);
 	c->refill_nmi_list = false;
 }
@@ -294,14 +287,47 @@ static void notrace irq_work_raise(struct bpf_mem_cache *c, bool in_nmi)
 	irq_work_queue(&c->refill_work);
 }

+/* For typical bpf map case that uses bpf_mem_cache_alloc and single bucket
+ * the freelist cache will be elem_size * 64 (or less) on each cpu.
+ *
+ * For bpf programs that don't have statically known allocation sizes and
+ * assuming (low_mark + high_mark) / 2 as an average number of elements per
+ * bucket and all buckets are used the total amount of memory in freelists
+ * on each cpu will be:
+ * 64*16 + 64*32 + 64*64 + 64*96 + 64*128 + 64*196 + 64*256 + 32*512 + 16*1024 + 8*2048 + 4*4096
+ * + nmi's reserves
+ * 1*16 + 1*32 + 1*64 + 1*96 + 1*128 + 1*196 + 1*256 + 1*512 + 1*1024 + 1*2048 + 1*4096
+ * == ~ 122 Kbyte using below heuristic.
+ * In unlikely worst case where bpf progs used all allocation sizes from
+ * non-NMI and from NMI too: ~ 227 Kbyte per cpu.
+ * Initialized, but unused bpf allocator (not bpf map specific one) will
+ * consume ~ 19 Kbyte per cpu.
+ * Typical case will be between 19K and 122K closer to 19K.
+ * bpf progs can and should share bpf_mem_cache when possible.
+ */
+
 static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
 {
 	init_irq_work(&c->refill_work, bpf_mem_refill);
+	if (c->unit_size <= 256) {
+		c->low_watermark = 32;
+		c->high_watermark = 96;
+	} else {
+		/* When page_size == 4k, order-0 cache will have low_mark == 2
+		 * and high_mark == 6 with batch alloc of 3 individual pages at
+		 * a time.
+		 * 8k allocs and above low == 1, high == 3, batch == 1.
+		 */
+		c->low_watermark = max(32 * 256 / c->unit_size, 1);
+		c->high_watermark = max(96 * 256 / c->unit_size, 3);
+	}
+	c->batch = max((c->high_watermark - c->low_watermark) / 4 * 3, 1);
+
 	/* To avoid consuming memory assume that 1st run of bpf
 	 * prog won't be doing more than 4 map_update_elem from
 	 * irq disabled region
 	 */
-	alloc_bulk(c, c->unit_size < 256 ? 4 : 1, cpu_to_node(cpu));
+	alloc_bulk(c, c->unit_size <= 256 ? 4 : 1, cpu_to_node(cpu));

 	/* NMI progs are rare. Assume they have one map_update
 	 * per prog at the very beginning.
@@ -442,7 +468,7 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
 	}
 	WARN_ON(cnt < 0);

-	if (cnt < LOW_WATERMARK)
+	if (cnt < c->low_watermark)
 		irq_work_raise(c, in_nmi);
 	return llnode;
 }
@@ -471,7 +497,7 @@ static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
 	}
 	WARN_ON(cnt <= 0);

-	if (cnt > HIGH_WATERMARK)
+	if (cnt > c->high_watermark)
 		/* free few objects from current cpu into global kmalloc pool */
 		irq_work_raise(c, in_nmi);
 }
From patchwork Wed Aug 17 21:04:16 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
 delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 bpf-next 09/12] bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.
Date: Wed, 17 Aug 2022 14:04:16 -0700
Message-Id: <20220817210419.95560-10-alexei.starovoitov@gmail.com>
In-Reply-To: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
References: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
smtp.mailfrom=alexei.starovoitov@gmail.com; dmarc=pass (policy=none) header.from=gmail.com X-Stat-Signature: adctu9safmmzstp7zxetexjd7reypgdr X-Rspamd-Queue-Id: D6F90100061 X-Rspam-User: X-Rspamd-Server: rspam01 X-HE-Tag: 1660770296-412739 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexei Starovoitov SLAB_TYPESAFE_BY_RCU makes kmem_caches non mergeable and slows down kmem_cache_destroy. All bpf_mem_cache are safe to share across different maps and programs. Convert SLAB_TYPESAFE_BY_RCU to batched call_rcu. This change solves the memory consumption issue, avoids kmem_cache_destroy latency and keeps bpf hash map performance the same. Signed-off-by: Alexei Starovoitov --- kernel/bpf/memalloc.c | 58 ++++++++++++++++++++++++++++++++++++++++--- kernel/bpf/syscall.c | 5 +++- 2 files changed, 59 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index be8262f5c9ec..ae4cdc9493c3 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -106,6 +106,11 @@ struct bpf_mem_cache { /* flag to refill nmi list too */ bool refill_nmi_list; int low_watermark, high_watermark, batch; + + struct rcu_head rcu; + struct llist_head free_by_rcu; + struct llist_head waiting_for_gp; + atomic_t call_rcu_in_progress; }; struct bpf_mem_caches { @@ -214,6 +219,39 @@ static void free_one(struct bpf_mem_cache *c, void *obj) kfree(obj); } +static void __free_rcu(struct rcu_head *head) +{ + struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu); + struct llist_node *llnode = __llist_del_all(&c->waiting_for_gp); + struct llist_node *pos, *t; + + llist_for_each_safe(pos, t, llnode) + free_one(c, pos); + atomic_set(&c->call_rcu_in_progress, 0); +} + +static void enque_to_free(struct bpf_mem_cache *c, void *obj) +{ + struct llist_node *llnode = obj; + + /* bpf_mem_cache is a per-cpu object. Freeing happens in irq_work. 
+ * Nothing races to add to free_by_rcu list. + */ + __llist_add(llnode, &c->free_by_rcu); +} + +static void do_call_rcu(struct bpf_mem_cache *c) +{ + struct llist_node *llnode, *t; + + if (atomic_xchg(&c->call_rcu_in_progress, 1)) + return; + + llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu)) + __llist_add(llnode, &c->waiting_for_gp); + call_rcu(&c->rcu, __free_rcu); +} + static void free_bulk(struct bpf_mem_cache *c) { struct llist_node *llnode; @@ -230,8 +268,9 @@ static void free_bulk(struct bpf_mem_cache *c) cnt = 0; if (IS_ENABLED(CONFIG_PREEMPT_RT)) local_irq_restore(flags); - free_one(c, llnode); + enque_to_free(c, llnode); } while (cnt > (c->high_watermark + c->low_watermark) / 2); + do_call_rcu(c); } static void free_bulk_nmi(struct bpf_mem_cache *c) @@ -245,8 +284,9 @@ static void free_bulk_nmi(struct bpf_mem_cache *c) cnt = atomic_dec_return(&c->free_cnt_nmi); else cnt = 0; - free_one(c, llnode); + enque_to_free(c, llnode); } while (cnt > (c->high_watermark + c->low_watermark) / 2); + do_call_rcu(c); } static void bpf_mem_refill(struct irq_work *work) @@ -358,7 +398,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size) return -ENOMEM; size += LLIST_NODE_SZ; /* room for llist_node */ snprintf(buf, sizeof(buf), "bpf-%u", size); - kmem_cache = kmem_cache_create(buf, size, 8, SLAB_TYPESAFE_BY_RCU, NULL); + kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL); if (!kmem_cache) { free_percpu(pc); return -ENOMEM; @@ -400,6 +440,18 @@ static void drain_mem_cache(struct bpf_mem_cache *c) { struct llist_node *llnode; + /* The caller has done rcu_barrier() and no progs are using this + * bpf_mem_cache, but htab_map_free() called bpf_mem_cache_free() for + * all remaining elements and they can be in free_by_rcu or in + * waiting_for_gp lists, so drain accumulating free_by_rcu list and + * optionally wait for callbacks to finish. 
+	 */
+	while ((llnode = __llist_del_first(&c->free_by_rcu)))
+		free_one(c, llnode);
+	if (atomic_xchg(&c->call_rcu_in_progress, 1))
+		rcu_barrier();
+	WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp));
+
 	while ((llnode = llist_del_first(&c->free_llist_nmi)))
 		free_one(c, llnode);
 	while ((llnode = __llist_del_first(&c->free_llist)))
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 83c7136c5788..eeef64b27683 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -638,7 +638,10 @@ static void __bpf_map_put(struct bpf_map *map, bool do_idr_lock)
 		bpf_map_free_id(map, do_idr_lock);
 		btf_put(map->btf);
 		INIT_WORK(&map->work, bpf_map_free_deferred);
-		schedule_work(&map->work);
+		/* Avoid spawning kworkers, since they all might contend
+		 * for the same mutex like slab_mutex.
+		 */
+		queue_work(system_unbound_wq, &map->work);
 	}
 }

From patchwork Wed Aug 17 21:04:17 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
    delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 bpf-next 10/12] bpf: Add percpu allocation support to bpf_mem_alloc.
Date: Wed, 17 Aug 2022 14:04:17 -0700
Message-Id: <20220817210419.95560-11-alexei.starovoitov@gmail.com>
In-Reply-To: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
References: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Extend bpf_mem_alloc to cache a free list of fixed-size per-cpu
allocations. Once such a cache is created, bpf_mem_cache_alloc() will
return per-cpu objects and bpf_mem_cache_free() will free them back into
the global per-cpu pool after observing an RCU grace period.

The per-cpu flavor of bpf_mem_alloc is going to be used by per-cpu hash
maps.

The free list cache consists of tuples { llist_node, per-cpu pointer }.
Unlike alloc_percpu(), which returns a per-cpu pointer,
bpf_mem_cache_alloc() returns a pointer to a per-cpu pointer, and
bpf_mem_cache_free() expects to receive it back.
Signed-off-by: Alexei Starovoitov
---
 include/linux/bpf_mem_alloc.h |  2 +-
 kernel/bpf/hashtab.c          |  2 +-
 kernel/bpf/memalloc.c         | 44 +++++++++++++++++++++++++++++++----
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
index 804733070f8d..653ed1584a03 100644
--- a/include/linux/bpf_mem_alloc.h
+++ b/include/linux/bpf_mem_alloc.h
@@ -12,7 +12,7 @@ struct bpf_mem_alloc {
 	struct bpf_mem_cache __percpu *cache;
 };
 
-int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size);
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu);
 void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
 
 /* kmalloc/kfree equivalent: */
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 3c1d15fd052a..bf20c45002fe 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -598,7 +598,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 			goto free_prealloc;
 		}
 	} else {
-		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size);
+		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false);
 		if (err)
 			goto free_map_locked;
 	}
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index ae4cdc9493c3..633e7eb9ba62 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -105,6 +105,7 @@ struct bpf_mem_cache {
 	atomic_t free_cnt_nmi;
 	/* flag to refill nmi list too */
 	bool refill_nmi_list;
+	bool percpu;
 	int low_watermark, high_watermark, batch;
 
 	struct rcu_head rcu;
@@ -141,6 +142,19 @@ static void *__alloc(struct bpf_mem_cache *c, int node)
 	 */
 	gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT;
 
+	if (c->percpu) {
+		void **obj = kmem_cache_alloc_node(c->kmem_cache, flags, node);
+		void *pptr = __alloc_percpu_gfp(c->unit_size, 8, flags);
+
+		if (!obj || !pptr) {
+			free_percpu(pptr);
+			kfree(obj);
+			return NULL;
+		}
+		obj[1] = pptr;
+		return obj;
+	}
+
 	if (c->kmem_cache)
 		return kmem_cache_alloc_node(c->kmem_cache, flags, node);
 
@@ -213,6 +227,12 @@ static void
alloc_bulk_nmi(struct bpf_mem_cache *c, int cnt, int node)
 
 static void free_one(struct bpf_mem_cache *c, void *obj)
 {
+	if (c->percpu) {
+		free_percpu(((void **)obj)[1]);
+		kmem_cache_free(c->kmem_cache, obj);
+		return;
+	}
+
 	if (c->kmem_cache)
 		kmem_cache_free(c->kmem_cache, obj);
 	else
@@ -382,21 +402,30 @@ static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
  * kmalloc/kfree. Max allocation size is 4096 in this case.
  * This is bpf_dynptr and bpf_kptr use case.
  */
-int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
+int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 {
 	static u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};
 	struct bpf_mem_caches *cc, __percpu *pcc;
 	struct bpf_mem_cache *c, __percpu *pc;
-	struct kmem_cache *kmem_cache;
+	struct kmem_cache *kmem_cache = NULL;
 	struct obj_cgroup *objcg = NULL;
 	char buf[32];
-	int cpu, i;
+	int cpu, i, unit_size;
 
 	if (size) {
 		pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
 		if (!pc)
 			return -ENOMEM;
-		size += LLIST_NODE_SZ; /* room for llist_node */
+
+		if (percpu) {
+			unit_size = size;
+			/* room for llist_node and per-cpu pointer */
+			size = LLIST_NODE_SZ + sizeof(void *);
+		} else {
+			size += LLIST_NODE_SZ; /* room for llist_node */
+			unit_size = size;
+		}
+
 		snprintf(buf, sizeof(buf), "bpf-%u", size);
 		kmem_cache = kmem_cache_create(buf, size, 8, 0, NULL);
 		if (!kmem_cache) {
@@ -409,14 +438,19 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size)
 		for_each_possible_cpu(cpu) {
 			c = per_cpu_ptr(pc, cpu);
 			c->kmem_cache = kmem_cache;
-			c->unit_size = size;
+			c->unit_size = unit_size;
 			c->objcg = objcg;
+			c->percpu = percpu;
 			prefill_mem_cache(c, cpu);
 		}
 		ma->cache = pc;
 		return 0;
 	}
 
+	/* size == 0 && percpu is an invalid combination */
+	if (WARN_ON_ONCE(percpu))
+		return -EINVAL;
+
 	pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL);
 	if (!pcc)
 		return -ENOMEM;

From patchwork Wed Aug 17 21:04:18 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
    delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 bpf-next 11/12] bpf: Convert percpu hash map to per-cpu bpf_mem_alloc.
Date: Wed, 17 Aug 2022 14:04:18 -0700
Message-Id: <20220817210419.95560-12-alexei.starovoitov@gmail.com>
In-Reply-To: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
References: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Convert dynamic allocations in the percpu hash map from alloc_percpu() to
bpf_mem_cache_alloc() from a per-cpu bpf_mem_alloc. Since bpf_mem_alloc
frees objects after an RCU grace period, the call_rcu() path is removed.

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/hashtab.c | 38 ++++++++++++++++----------------------
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index bf20c45002fe..921f6fa9dc1b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -94,6 +94,7 @@ struct bucket {
 struct bpf_htab {
 	struct bpf_map map;
 	struct bpf_mem_alloc ma;
+	struct bpf_mem_alloc pcpu_ma;
 	struct bucket *buckets;
 	void *elems;
 	union {
@@ -121,14 +122,14 @@ struct htab_elem {
 		struct {
 			void *padding;
 			union {
-				struct bpf_htab *htab;
 				struct pcpu_freelist_node fnode;
 				struct htab_elem *batch_flink;
 			};
 		};
 	};
 	union {
-		struct rcu_head rcu;
+		/* pointer to per-cpu pointer */
+		void *ptr_to_pptr;
 		struct bpf_lru_node lru_node;
 	};
 	u32 hash;
@@ -439,8 +440,6 @@ static int htab_map_alloc_check(union bpf_attr *attr)
 	bool zero_seed = (attr->map_flags & BPF_F_ZERO_SEED);
 	int numa_node = bpf_map_attr_numa_node(attr);
 
-	BUILD_BUG_ON(offsetof(struct htab_elem, htab) !=
-		     offsetof(struct htab_elem, hash_node.pprev));
 	BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) !=
 		     offsetof(struct htab_elem, hash_node.pprev));
 
@@ -601,6 +600,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false);
 		if (err)
 			goto free_map_locked;
+		if (percpu) {
+			err = bpf_mem_alloc_init(&htab->pcpu_ma,
+						 round_up(htab->map.value_size, 8), true);
+			if (err)
+				goto free_map_locked;
+		}
 	}
 
 	return &htab->map;
@@ -611,6 +616,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
 		free_percpu(htab->map_locked[i]);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->pcpu_ma);
 	bpf_mem_alloc_destroy(&htab->ma);
 free_htab:
 	lockdep_unregister_key(&htab->lockdep_key);
@@ -886,19 +892,11 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
 {
 	if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
-		free_percpu(htab_elem_get_ptr(l, htab->map.key_size));
+		bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr);
 	check_and_free_fields(htab, l);
 	bpf_mem_cache_free(&htab->ma, l);
 }
 
-static void htab_elem_free_rcu(struct rcu_head *head)
-{
-	struct htab_elem *l = container_of(head, struct htab_elem, rcu);
-	struct bpf_htab *htab = l->htab;
-
-	htab_elem_free(htab, l);
-}
-
 static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
 {
 	struct bpf_map *map = &htab->map;
@@ -944,12 +942,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		dec_elem_count(htab);
-		if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH) {
-			l->htab = htab;
-			call_rcu(&l->rcu, htab_elem_free_rcu);
-		} else {
-			htab_elem_free(htab, l);
-		}
+		htab_elem_free(htab, l);
 	}
 }
 
@@ -1051,18 +1044,18 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 	memcpy(l_new->key, key, key_size);
 	if (percpu) {
-		size = round_up(size, 8);
 		if (prealloc) {
 			pptr = htab_elem_get_ptr(l_new, key_size);
 		} else {
 			/* alloc_percpu zero-fills */
-			pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
-						    GFP_NOWAIT | __GFP_NOWARN);
+			pptr = bpf_mem_cache_alloc(&htab->pcpu_ma);
 			if (!pptr) {
 				bpf_mem_cache_free(&htab->ma, l_new);
 				l_new = ERR_PTR(-ENOMEM);
 				goto dec_count;
 			}
+			l_new->ptr_to_pptr = pptr;
+			pptr = *(void **)pptr;
 		}
 
 		pcpu_init_value(htab, pptr, value, onallcpus);
@@ -1554,6 +1547,7 @@ static void htab_map_free(struct bpf_map *map)
 	bpf_map_free_kptr_off_tab(map);
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
+	bpf_mem_alloc_destroy(&htab->pcpu_ma);
 	bpf_mem_alloc_destroy(&htab->ma);
 	if (htab->use_percpu_counter)
 		percpu_counter_destroy(&htab->pcount);

From patchwork Wed Aug 17 21:04:19 2022
From: Alexei Starovoitov
To: davem@davemloft.net
Cc: daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com,
    delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 bpf-next 12/12] bpf: Remove tracing program restriction on map types
Date: Wed, 17 Aug 2022 14:04:19 -0700
Message-Id:
<20220817210419.95560-13-alexei.starovoitov@gmail.com>
In-Reply-To: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
References: <20220817210419.95560-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

The hash map is now fully converted to bpf_mem_alloc. Its implementation
no longer allocates synchronously and no longer calls call_rcu() directly.
It's now safe to use non-preallocated hash maps in all types of tracing
programs, including BPF_PROG_TYPE_PERF_EVENT programs that run out of NMI
context.

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/verifier.c | 42 ------------------------------------------
 1 file changed, 42 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d785f29047d7..a1ada707c57c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12599,48 +12599,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(prog);
 
-	/*
-	 * Validate that trace type programs use preallocated hash maps.
-	 *
-	 * For programs attached to PERF events this is mandatory as the
-	 * perf NMI can hit any arbitrary code sequence.
-	 *
-	 * All other trace types using non-preallocated per-cpu hash maps are
-	 * unsafe as well because tracepoint or kprobes can be inside locked
-	 * regions of the per-cpu memory allocator or at a place where a
-	 * recursion into the per-cpu memory allocator would see inconsistent
-	 * state. Non per-cpu hash maps are using bpf_mem_alloc-tor which is
-	 * safe to use from kprobe/fentry and in RT.
-	 *
-	 * On RT enabled kernels run-time allocation of all trace type
-	 * programs is strictly prohibited due to lock type constraints. On
-	 * !RT kernels it is allowed for backwards compatibility reasons for
-	 * now, but warnings are emitted so developers are made aware of
-	 * the unsafety and can fix their programs before this is enforced.
-	 */
-	if (is_tracing_prog_type(prog_type) && !is_preallocated_map(map)) {
-		if (prog_type == BPF_PROG_TYPE_PERF_EVENT) {
-			/* perf_event bpf progs have to use preallocated hash maps
-			 * because non-prealloc is still relying on call_rcu to free
-			 * elements.
-			 */
-			verbose(env, "perf_event programs can only use preallocated hash map\n");
-			return -EINVAL;
-		}
-		if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
-		    (map->inner_map_meta &&
-		     map->inner_map_meta->map_type == BPF_MAP_TYPE_PERCPU_HASH)) {
-			if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
-				verbose(env,
-					"trace type programs can only use preallocated per-cpu hash map\n");
-				return -EINVAL;
-			}
-			WARN_ONCE(1, "trace type BPF program uses run-time allocation\n");
-			verbose(env,
-				"trace type programs with run-time allocated per-cpu hash maps are unsafe."
-				" Switch to preallocated hash maps.\n");
-		}
-	}
 
 	if (map_value_has_spin_lock(map)) {
 		if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) {