From patchwork Wed Jul 6 15:58:47 2022
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 12908260
From: Yafang Shao
To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, quentin@isovalent.com, roman.gushchin@linux.dev, haoluo@google.com
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, Yafang Shao
Subject: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
Date: Wed, 6 Jul 2022 15:58:47 +0000
Message-Id: <20220706155848.4939-2-laoar.shao@gmail.com>
In-Reply-To: <20220706155848.4939-1-laoar.shao@gmail.com>
References: <20220706155848.4939-1-laoar.shao@gmail.com>

GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially if we allocate too much GFP_ATOMIC memory.
For example, when we set a memcg limit to bound non-preallocated bpf map memory, GFP_ATOMIC allocations can easily break the limit via force charging. So it is dangerous to use GFP_ATOMIC in the non-preallocated case.

One way to make it safe is to remove __GFP_HIGH from GFP_ATOMIC, IOW, to use (__GFP_ATOMIC | __GFP_KSWAPD_RECLAIM) instead; then the allocation will be limited once we allocate too much memory. BPF_F_NO_PREALLOC was introduced because full map pre-allocation is too memory-expensive for some cases. Removing __GFP_HIGH therefore doesn't break the contract of BPF_F_NO_PREALLOC, but shares its goal: avoiding issues caused by too much memory. So let's remove it.

The force charge of GFP_ATOMIC was introduced in commit 869712fd3de5 ("mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges") by checking __GFP_ATOMIC, and later improved in commit 1461e8c2b6af ("memcg: unify force charging conditions") to check __GFP_HIGH instead (which is fine, because both __GFP_HIGH and __GFP_ATOMIC are set in GFP_ATOMIC). So if we wanted to fix this in memcg, we would have to carefully verify all the callsites. Now that we can fix it in BPF, we'd better not modify the memcg code.

This fix also applies to other run-time allocations, for example in the lpm trie, local storage and devmap. So let's fix it consistently across the bpf code.

__GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg pressure either, currently. But the memcg code can be improved to make __GFP_KSWAPD_RECLAIM work well under memcg pressure if desired.

This patch also fixes a typo in a comment.
Signed-off-by: Yafang Shao
Reviewed-by: Roman Gushchin
---
 kernel/bpf/devmap.c        | 3 ++-
 kernel/bpf/hashtab.c       | 8 +++++---
 kernel/bpf/local_storage.c | 3 ++-
 kernel/bpf/lpm_trie.c      | 3 ++-
 4 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index c2867068e5bd..7672946126d5 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -845,7 +845,8 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
 	struct bpf_dtab_netdev *dev;
 
 	dev = bpf_map_kmalloc_node(&dtab->map, sizeof(*dev),
-				   GFP_ATOMIC | __GFP_NOWARN,
+				   __GFP_ATOMIC | __GFP_NOWARN |
+				   __GFP_KSWAPD_RECLAIM,
 				   dtab->map.numa_node);
 	if (!dev)
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 17fb69c0e0dc..9d4559a1c032 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -61,7 +61,7 @@
  *
  * As regular device interrupt handlers and soft interrupts are forced into
  * thread context, the existing code which does
- *   spin_lock*(); alloc(GPF_ATOMIC); spin_unlock*();
+ *   spin_lock*(); alloc(GFP_ATOMIC); spin_unlock*();
  * just works.
  *
  * In theory the BPF locks could be converted to regular spinlocks as well,
@@ -978,7 +978,8 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 			goto dec_count;
 		}
 		l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size,
-					     GFP_ATOMIC | __GFP_NOWARN,
+					     __GFP_ATOMIC | __GFP_NOWARN |
+					     __GFP_KSWAPD_RECLAIM,
 					     htab->map.numa_node);
 		if (!l_new) {
 			l_new = ERR_PTR(-ENOMEM);
@@ -996,7 +997,8 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 	} else {
 		/* alloc_percpu zero-fills */
 		pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
-					    GFP_ATOMIC | __GFP_NOWARN);
+					    __GFP_ATOMIC | __GFP_NOWARN |
+					    __GFP_KSWAPD_RECLAIM);
 		if (!pptr) {
 			kfree(l_new);
 			l_new = ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 8654fc97f5fe..534b69682b17 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -165,7 +165,8 @@ static int cgroup_storage_update_elem(struct bpf_map *map, void *key,
 	}
 
 	new = bpf_map_kmalloc_node(map, struct_size(new, data, map->value_size),
-				   __GFP_ZERO | GFP_ATOMIC | __GFP_NOWARN,
+				   __GFP_ZERO | __GFP_ATOMIC | __GFP_NOWARN |
+				   __GFP_KSWAPD_RECLAIM,
 				   map->numa_node);
 	if (!new)
 		return -ENOMEM;
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index f0d05a3cc4b9..7bae7133f1dd 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -285,7 +285,8 @@ static struct lpm_trie_node *lpm_trie_node_alloc(const struct lpm_trie *trie,
 	if (value)
 		size += trie->map.value_size;
 
-	node = bpf_map_kmalloc_node(&trie->map, size, GFP_ATOMIC | __GFP_NOWARN,
+	node = bpf_map_kmalloc_node(&trie->map, size, __GFP_ATOMIC |
+				    __GFP_KSWAPD_RECLAIM | __GFP_NOWARN,
 				    trie->map.numa_node);
 	if (!node)
 		return NULL;