From patchwork Wed Jan 15 02:17:40 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13939750
X-Patchwork-Delegate: bpf@iogearbox.net
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org,
 linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v5 1/7] mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
Date: Tue, 14 Jan 2025 18:17:40 -0800
Message-Id: <20250115021746.34691-2-alexei.starovoitov@gmail.com>
In-Reply-To: <20250115021746.34691-1-alexei.starovoitov@gmail.com>
References: <20250115021746.34691-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Tracing BPF programs execute from tracepoints and kprobes where the
running context is unknown, but they need to request additional memory.
The prior workarounds used pre-allocated memory and BPF-specific
freelists to satisfy such allocation requests. Instead, introduce a
gfpflags_allow_spinning() condition that signals to the allocator that
the running context is unknown. Then rely on the percpu free list of
pages to allocate a page. try_alloc_pages() -> get_page_from_freelist()
-> rmqueue() -> rmqueue_pcplist() will spin_trylock to grab the page
from the percpu free list. If that fails (due to re-entrancy or the
list being empty) then rmqueue_bulk()/rmqueue_buddy() will attempt to
spin_trylock zone->lock and grab the page from there. spin_trylock() is
not safe in RT when in NMI or in hard IRQ, so bail out early in those
cases. The support for gfpflags_allow_spinning() mode in free_page and
memcg comes in the next patches.

This is a first step towards supporting BPF requirements in SLUB and
getting rid of bpf_mem_alloc. That goal was discussed at LSFMM:
https://lwn.net/Articles/974138/

Acked-by: Michal Hocko
Signed-off-by: Alexei Starovoitov
Acked-by: Vlastimil Babka
---
 include/linux/gfp.h | 22 ++++++++++
 mm/internal.h       |  1 +
 mm/page_alloc.c     | 98 +++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 118 insertions(+), 3 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b0fe9f62d15b..b41bb6e01781 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -39,6 +39,25 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 	return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
 }
 
+static inline bool gfpflags_allow_spinning(const gfp_t gfp_flags)
+{
+	/*
+	 * !__GFP_DIRECT_RECLAIM -> direct reclaim is not allowed.
+	 * !__GFP_KSWAPD_RECLAIM -> it's not safe to wake up kswapd.
+	 * All GFP_* flags including GFP_NOWAIT use one or both flags.
+	 * try_alloc_pages() is the only API that doesn't specify either flag.
+	 *
+	 * This is stronger than GFP_NOWAIT or GFP_ATOMIC because
+	 * those are guaranteed to never block on a sleeping lock.
+	 * Here we are enforcing that the allocation doesn't ever spin
+	 * on any locks (i.e. only trylocks). There is no high-level
+	 * GFP_$FOO flag for this use in try_alloc_pages() as the
+	 * regular page allocator doesn't fully support this
+	 * allocation mode.
+	 */
+	return !(gfp_flags & __GFP_RECLAIM);
+}
+
 #ifdef CONFIG_HIGHMEM
 #define OPT_ZONE_HIGHMEM ZONE_HIGHMEM
 #else
@@ -347,6 +366,9 @@ static inline struct page *alloc_page_vma_noprof(gfp_t gfp,
 }
 #define alloc_page_vma(...)	alloc_hooks(alloc_page_vma_noprof(__VA_ARGS__))
 
+struct page *try_alloc_pages_noprof(int nid, unsigned int order);
+#define try_alloc_pages(...)	alloc_hooks(try_alloc_pages_noprof(__VA_ARGS__))
+
 extern unsigned long get_free_pages_noprof(gfp_t gfp_mask, unsigned int order);
 #define __get_free_pages(...)	alloc_hooks(get_free_pages_noprof(__VA_ARGS__))
 
diff --git a/mm/internal.h b/mm/internal.h
index cb8d8e8e3ffa..5454fa610aac 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1174,6 +1174,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_NOFRAGMENT	  0x0
 #endif
 #define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
+#define ALLOC_TRYLOCK		0x400 /* Only use spin_trylock in allocation path */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 /* Flags that allow allocations below the min watermark. */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1cb4b8c8886d..74c2a7af1a77 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2304,7 +2304,11 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 	unsigned long flags;
 	int i;
 
-	spin_lock_irqsave(&zone->lock, flags);
+	if (!spin_trylock_irqsave(&zone->lock, flags)) {
+		if (unlikely(alloc_flags & ALLOC_TRYLOCK))
+			return 0;
+		spin_lock_irqsave(&zone->lock, flags);
+	}
 	for (i = 0; i < count; ++i) {
 		struct page *page = __rmqueue(zone, order, migratetype,
 								alloc_flags);
@@ -2904,7 +2908,11 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 
 	do {
 		page = NULL;
-		spin_lock_irqsave(&zone->lock, flags);
+		if (!spin_trylock_irqsave(&zone->lock, flags)) {
+			if (unlikely(alloc_flags & ALLOC_TRYLOCK))
+				return NULL;
+			spin_lock_irqsave(&zone->lock, flags);
+		}
 		if (alloc_flags & ALLOC_HIGHATOMIC)
 			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
 		if (!page) {
@@ -4509,7 +4517,8 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 
 	might_alloc(gfp_mask);
 
-	if (should_fail_alloc_page(gfp_mask, order))
+	if (!(*alloc_flags & ALLOC_TRYLOCK) &&
+	    should_fail_alloc_page(gfp_mask, order))
 		return false;
 
 	*alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, *alloc_flags);
@@ -7023,3 +7032,86 @@ static bool __free_unaccepted(struct page *page)
 }
 
 #endif /* CONFIG_UNACCEPTED_MEMORY */
+
+/**
+ * try_alloc_pages_noprof - opportunistic reentrant allocation from any context
+ * @nid: node to allocate from
+ * @order: allocation order size
+ *
+ * Allocates pages of a given order from the given node. This is safe to
+ * call from any context (from atomic, NMI, and also reentrant
+ * allocator -> tracepoint -> try_alloc_pages_noprof).
+ * Allocation is best effort and expected to fail easily, so nobody should
+ * rely on the success.
+ * Failures are not reported via warn_alloc().
+ *
+ * Return: allocated page or NULL on failure.
+ */
+struct page *try_alloc_pages_noprof(int nid, unsigned int order)
+{
+	/*
+	 * Do not specify __GFP_DIRECT_RECLAIM, since direct reclaim is not
+	 * allowed. Do not specify __GFP_KSWAPD_RECLAIM either, since wake up
+	 * of kswapd is not safe in arbitrary context.
+	 *
+	 * These two are the conditions for gfpflags_allow_spinning() being true.
+	 *
+	 * Specify __GFP_NOWARN since failing try_alloc_pages() is not a reason
+	 * to warn. Also warn would trigger printk() which is unsafe from
+	 * various contexts. We cannot use printk_deferred_enter() to mitigate,
+	 * since the running context is unknown.
+	 *
+	 * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
+	 * is safe in any context. Also zeroing the page is mandatory for
+	 * BPF use cases.
+	 *
+	 * Though __GFP_NOMEMALLOC is not checked in the code path below,
+	 * specify it here to highlight that try_alloc_pages()
+	 * doesn't want to deplete reserves.
+	 */
+	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC;
+	unsigned int alloc_flags = ALLOC_TRYLOCK;
+	struct alloc_context ac = { };
+	struct page *page;
+
+	/*
+	 * In RT spin_trylock() may call raw_spin_lock() which is unsafe in NMI.
+	 * If spin_trylock() is called from hard IRQ the current task may be
+	 * waiting for one rt_spin_lock, but rt_spin_trylock() will mark the
+	 * task as the owner of another rt_spin_lock which will confuse PI
+	 * logic, so return immediately if called from hard IRQ or NMI.
+	 *
+	 * Note, irqs_disabled() case is ok. This function can be called
+	 * from a raw_spin_lock_irqsave region.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
+		return NULL;
+	if (!pcp_allowed_order(order))
+		return NULL;
+
+#ifdef CONFIG_UNACCEPTED_MEMORY
+	/* Bailout, since try_to_accept_memory_one() needs to take a lock */
+	if (has_unaccepted_memory())
+		return NULL;
+#endif
+	/* Bailout, since _deferred_grow_zone() needs to take a lock */
+	if (deferred_pages_enabled())
+		return NULL;
+
+	if (nid == NUMA_NO_NODE)
+		nid = numa_node_id();
+
+	prepare_alloc_pages(alloc_gfp, order, nid, NULL, &ac,
+			    &alloc_gfp, &alloc_flags);
+
+	/*
+	 * Best effort allocation from percpu free list.
+	 * If it's empty attempt to spin_trylock zone->lock.
+	 */
+	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
+
+	/* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */
+
+	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
+	kmsan_alloc_page(page, order, alloc_gfp);
+	return page;
+}
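
To make the contract concrete, here is a minimal caller-side sketch
(illustrative only; grab_scratch_page() is a hypothetical helper, not
part of this series):

	static void *grab_scratch_page(int nid)
	{
		struct page *page;

		/* Never blocks and never spins; failure is a normal outcome. */
		page = try_alloc_pages(nid, 0);
		if (!page)
			return NULL;	/* caller must have a fallback */

		/* try_alloc_pages() passes __GFP_ZERO, so the page is zeroed. */
		return page_address(page);
	}

As a quick reference for gfpflags_allow_spinning(): GFP_KERNEL sets
__GFP_DIRECT_RECLAIM, and GFP_ATOMIC and GFP_NOWAIT set
__GFP_KSWAPD_RECLAIM, so all of them still allow spinning; only a mask
that clears both reclaim bits, like the internal mask of
try_alloc_pages(), selects trylock-only mode.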
From patchwork Wed Jan 15 02:17:41 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13939751
X-Patchwork-Delegate: bpf@iogearbox.net
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org,
 linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v5 2/7] mm, bpf: Introduce free_pages_nolock()
Date: Tue, 14 Jan 2025 18:17:41 -0800
Message-Id: <20250115021746.34691-3-alexei.starovoitov@gmail.com>
In-Reply-To: <20250115021746.34691-1-alexei.starovoitov@gmail.com>
References: <20250115021746.34691-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Introduce free_pages_nolock() that can free pages without taking locks.
It relies on trylock and can be called from any context. Since
spin_trylock() cannot be used in RT from hard IRQ or NMI, it uses a
lockless link list to stash the pages, which will be freed by a
subsequent free_pages() from a good context.

Do not use llist unconditionally. BPF maps continuously allocate and
free, so we cannot unconditionally delay the freeing to llist. When the
memory becomes free make it available to the kernel and BPF users right
away if possible, and fall back to llist as the last resort.
Signed-off-by: Alexei Starovoitov
Acked-by: Vlastimil Babka
---
 include/linux/gfp.h      |  1 +
 include/linux/mm_types.h |  4 ++
 include/linux/mmzone.h   |  3 ++
 mm/page_alloc.c          | 79 ++++++++++++++++++++++++++++++++++++----
 4 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b41bb6e01781..6eba2d80feb8 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -391,6 +391,7 @@ __meminit void *alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mas
 	__get_free_pages((gfp_mask) | GFP_DMA, (order))
 
 extern void __free_pages(struct page *page, unsigned int order);
+extern void free_pages_nolock(struct page *page, unsigned int order);
 extern void free_pages(unsigned long addr, unsigned int order);
 
 #define __free_page(page) __free_pages((page), 0)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 7361a8f3ab68..52547b3e5fd8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -99,6 +99,10 @@ struct page {
 				/* Or, free page */
 				struct list_head buddy_list;
 				struct list_head pcp_list;
+				struct {
+					struct llist_node pcp_llist;
+					unsigned int order;
+				};
 			};
 			/* See page-flags.h for PAGE_MAPPING_FLAGS */
 			struct address_space *mapping;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b36124145a16..1a854e0a9e3b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -953,6 +953,9 @@ struct zone {
 	/* Primarily protects free_area */
 	spinlock_t		lock;
 
+	/* Pages to be freed when next trylock succeeds */
+	struct llist_head	trylock_free_pages;
+
 	/* Write-intensive fields used by compaction and vmstats. */
 	CACHELINE_PADDING(_pad2_);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 74c2a7af1a77..a9c639e3db91 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -88,6 +88,9 @@ typedef int __bitwise fpi_t;
  */
 #define FPI_TO_TAIL		((__force fpi_t)BIT(1))
 
+/* Free the page without taking locks. Rely on trylock only. */
+#define FPI_TRYLOCK		((__force fpi_t)BIT(2))
+
 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
 static DEFINE_MUTEX(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_HIGH_FRACTION (8)
@@ -1247,13 +1250,44 @@ static void split_large_buddy(struct zone *zone, struct page *page,
 	}
 }
 
+static void add_page_to_zone_llist(struct zone *zone, struct page *page,
+				   unsigned int order)
+{
+	/* Remember the order */
+	page->order = order;
+	/* Add the page to the free list */
+	llist_add(&page->pcp_llist, &zone->trylock_free_pages);
+}
+
 static void free_one_page(struct zone *zone, struct page *page,
 			  unsigned long pfn, unsigned int order,
 			  fpi_t fpi_flags)
 {
+	struct llist_head *llhead;
 	unsigned long flags;
 
-	spin_lock_irqsave(&zone->lock, flags);
+	if (!spin_trylock_irqsave(&zone->lock, flags)) {
+		if (unlikely(fpi_flags & FPI_TRYLOCK)) {
+			add_page_to_zone_llist(zone, page, order);
+			return;
+		}
+		spin_lock_irqsave(&zone->lock, flags);
+	}
+
+	/* The lock succeeded. Process deferred pages. */
+	llhead = &zone->trylock_free_pages;
+	if (unlikely(!llist_empty(llhead) && !(fpi_flags & FPI_TRYLOCK))) {
+		struct llist_node *llnode;
+		struct page *p, *tmp;
+
+		llnode = llist_del_all(llhead);
+		llist_for_each_entry_safe(p, tmp, llnode, pcp_llist) {
+			unsigned int p_order = p->order;
+
+			split_large_buddy(zone, p, page_to_pfn(p), p_order, fpi_flags);
+			__count_vm_events(PGFREE, 1 << p_order);
+		}
+	}
 	split_large_buddy(zone, page, pfn, order, fpi_flags);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
@@ -2596,7 +2630,7 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
 
 static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 				   struct page *page, int migratetype,
-				   unsigned int order)
+				   unsigned int order, fpi_t fpi_flags)
 {
 	int high, batch;
 	int pindex;
@@ -2631,6 +2665,14 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	}
 	if (pcp->free_count < (batch << CONFIG_PCP_BATCH_SCALE_MAX))
 		pcp->free_count += (1 << order);
+
+	if (unlikely(fpi_flags & FPI_TRYLOCK)) {
+		/*
+		 * Do not attempt to take a zone lock. Let pcp->count get
+		 * over high mark temporarily.
+		 */
+		return;
+	}
 	high = nr_pcp_high(pcp, zone, batch, free_high);
 	if (pcp->count >= high) {
 		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
@@ -2645,7 +2687,8 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 /*
  * Free a pcp page
  */
-void free_unref_page(struct page *page, unsigned int order)
+static void __free_unref_page(struct page *page, unsigned int order,
+			      fpi_t fpi_flags)
 {
 	unsigned long __maybe_unused UP_flags;
 	struct per_cpu_pages *pcp;
@@ -2654,7 +2697,7 @@ void free_unref_page(struct page *page, unsigned int order)
 	int migratetype;
 
 	if (!pcp_allowed_order(order)) {
-		__free_pages_ok(page, order, FPI_NONE);
+		__free_pages_ok(page, order, fpi_flags);
 		return;
 	}
 
@@ -2671,24 +2714,33 @@ void free_unref_page(struct page *page, unsigned int order)
 	migratetype = get_pfnblock_migratetype(page, pfn);
 	if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
 		if (unlikely(is_migrate_isolate(migratetype))) {
-			free_one_page(page_zone(page), page, pfn, order, FPI_NONE);
+			free_one_page(page_zone(page), page, pfn, order, fpi_flags);
 			return;
 		}
 		migratetype = MIGRATE_MOVABLE;
 	}
 
 	zone = page_zone(page);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq())) {
+		add_page_to_zone_llist(zone, page, order);
+		return;
+	}
 	pcp_trylock_prepare(UP_flags);
 	pcp = pcp_spin_trylock(zone->per_cpu_pageset);
 	if (pcp) {
-		free_unref_page_commit(zone, pcp, page, migratetype, order);
+		free_unref_page_commit(zone, pcp, page, migratetype, order, fpi_flags);
 		pcp_spin_unlock(pcp);
 	} else {
-		free_one_page(zone, page, pfn, order, FPI_NONE);
+		free_one_page(zone, page, pfn, order, fpi_flags);
 	}
 	pcp_trylock_finish(UP_flags);
 }
 
+void free_unref_page(struct page *page, unsigned int order)
+{
+	__free_unref_page(page, order, FPI_NONE);
+}
+
 /*
  * Free a batch of folios
 */
@@ -2777,7 +2829,7 @@ void free_unref_folios(struct folio_batch *folios)
 
 		trace_mm_page_free_batched(&folio->page);
 		free_unref_page_commit(zone, pcp, &folio->page, migratetype,
-				       order);
+				       order, FPI_NONE);
 	}
 
 	if (pcp) {
@@ -4853,6 +4905,17 @@ void __free_pages(struct page *page, unsigned int order)
 }
 EXPORT_SYMBOL(__free_pages);
 
+/*
+ * Can be called while holding raw_spin_lock or from IRQ and NMI,
+ * but only for pages that came from try_alloc_pages():
+ * order <= 3, !folio, etc.
+ */
+void free_pages_nolock(struct page *page, unsigned int order)
+{
+	if (put_page_testzero(page))
+		__free_unref_page(page, order, FPI_TRYLOCK);
+}
+
 void free_pages(unsigned long addr, unsigned int order)
 {
 	if (addr != 0) {
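
Patches 1 and 2 pair up as in the following sketch (hypothetical
caller, not part of the series). On PREEMPT_RT in NMI or hard-IRQ
context the free side stashes the page on zone->trylock_free_pages, to
be drained by a later free that does acquire zone->lock:

	static void demo_alloc_then_free(void)
	{
		/* Safe from any context, including NMI. */
		struct page *page = try_alloc_pages(NUMA_NO_NODE, 0);

		if (!page)
			return;		/* best effort: failure is expected */

		/* ... use the zeroed page ... */

		/* Matching free that never spins on zone->lock either. */
		free_pages_nolock(page, 0);
	}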
From patchwork Wed Jan 15 02:17:42 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13939752
X-Patchwork-Delegate: bpf@iogearbox.net
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org,
 linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v5 3/7] locking/local_lock: Introduce local_trylock_irqsave()
Date: Tue, 14 Jan 2025 18:17:42 -0800
Message-Id: <20250115021746.34691-4-alexei.starovoitov@gmail.com>
In-Reply-To: <20250115021746.34691-1-alexei.starovoitov@gmail.com>
References: <20250115021746.34691-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Similar to local_lock_irqsave() introduce local_trylock_irqsave().
This is inspired by 'struct local_tryirq_lock' in:
https://lore.kernel.org/all/20241112-slub-percpu-caches-v1-5-ddc0bdc27e05@suse.cz/

In PREEMPT_RT, use spin_trylock() when not in hard IRQ and not in NMI,
and fail instantly otherwise, since spin_trylock() is not safe from
IRQ due to PI issues.

In !PREEMPT_RT, use a simple active flag to prevent IRQs or NMIs from
re-entering the locked region.

Note there is no need to use local_inc for the active flag. If an IRQ
handler grabs the same local_lock after READ_ONCE(lock->active) has
already completed, it has to unlock it before returning. The same holds
for an NMI handler. So there is a strict nesting of scopes. It's a per
CPU lock; multiple CPUs do not access it in parallel.

Signed-off-by: Alexei Starovoitov
---
 include/linux/local_lock.h          |  9 ++++
 include/linux/local_lock_internal.h | 76 ++++++++++++++++++++++++++---
 2 files changed, 78 insertions(+), 7 deletions(-)

diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
index 091dc0b6bdfb..84ee560c4f51 100644
--- a/include/linux/local_lock.h
+++ b/include/linux/local_lock.h
@@ -30,6 +30,15 @@
 #define local_lock_irqsave(lock, flags)				\
 	__local_lock_irqsave(lock, flags)
 
+/**
+ * local_trylock_irqsave - Try to acquire a per CPU local lock, save and disable
+ *			   interrupts. Always fails in RT when in_hardirq or NMI.
+ * @lock:	The lock variable
+ * @flags:	Storage for interrupt flags
+ */
+#define local_trylock_irqsave(lock, flags)			\
+	__local_trylock_irqsave(lock, flags)
+
 /**
  * local_unlock - Release a per CPU local lock
  * @lock:	The lock variable
diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h
index 8dd71fbbb6d2..93672127c73d 100644
--- a/include/linux/local_lock_internal.h
+++ b/include/linux/local_lock_internal.h
@@ -9,6 +9,7 @@
 #ifndef CONFIG_PREEMPT_RT
 
 typedef struct {
+	int active;
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map	dep_map;
 	struct task_struct	*owner;
@@ -22,7 +23,7 @@ typedef struct {
 		.wait_type_inner = LD_WAIT_CONFIG,	\
 		.lock_type = LD_LOCK_PERCPU,		\
 	},						\
-	.owner = NULL,
+	.owner = NULL, .active = 0
 
 static inline void local_lock_acquire(local_lock_t *l)
 {
@@ -31,6 +32,13 @@ static inline void local_lock_acquire(local_lock_t *l)
 	l->owner = current;
 }
 
+static inline void local_trylock_acquire(local_lock_t *l)
+{
+	lock_map_acquire_try(&l->dep_map);
+	DEBUG_LOCKS_WARN_ON(l->owner);
+	l->owner = current;
+}
+
 static inline void local_lock_release(local_lock_t *l)
 {
 	DEBUG_LOCKS_WARN_ON(l->owner != current);
@@ -45,6 +53,7 @@ static inline void local_lock_debug_init(local_lock_t *l)
 #else /* CONFIG_DEBUG_LOCK_ALLOC */
 # define LOCAL_LOCK_DEBUG_INIT(lockname)
 static inline void local_lock_acquire(local_lock_t *l) { }
+static inline void local_trylock_acquire(local_lock_t *l) { }
 static inline void local_lock_release(local_lock_t *l) { }
 static inline void local_lock_debug_init(local_lock_t *l) { }
 #endif /* !CONFIG_DEBUG_LOCK_ALLOC */
@@ -60,6 +69,7 @@ do {								\
 			      0, LD_WAIT_CONFIG, LD_WAIT_INV,	\
 			      LD_LOCK_PERCPU);			\
 	local_lock_debug_init(lock);				\
+	(lock)->active = 0;					\
 } while (0)
 
 #define __spinlock_nested_bh_init(lock)				\
@@ -75,37 +85,73 @@ do {								\
 
 #define __local_lock(lock)					\
 	do {							\
+		local_lock_t *l;				\
 		preempt_disable();				\
-		local_lock_acquire(this_cpu_ptr(lock));		\
+		l = this_cpu_ptr(lock);				\
+		lockdep_assert(l->active == 0);			\
+		WRITE_ONCE(l->active, 1);			\
+		local_lock_acquire(l);				\
 	} while (0)
 
 #define __local_lock_irq(lock)					\
	do {							\
+		local_lock_t *l;				\
 		local_irq_disable();				\
-		local_lock_acquire(this_cpu_ptr(lock));		\
+		l = this_cpu_ptr(lock);				\
+		lockdep_assert(l->active == 0);			\
+		WRITE_ONCE(l->active, 1);			\
+		local_lock_acquire(l);				\
 	} while (0)
 
 #define __local_lock_irqsave(lock, flags)			\
 	do {							\
+		local_lock_t *l;				\
 		local_irq_save(flags);				\
-		local_lock_acquire(this_cpu_ptr(lock));		\
+		l = this_cpu_ptr(lock);				\
+		lockdep_assert(l->active == 0);			\
+		WRITE_ONCE(l->active, 1);			\
+		local_lock_acquire(l);				\
 	} while (0)
 
+#define __local_trylock_irqsave(lock, flags)			\
+	({							\
+		local_lock_t *l;				\
+		local_irq_save(flags);				\
+		l = this_cpu_ptr(lock);				\
+		if (READ_ONCE(l->active) == 1) {		\
+			local_irq_restore(flags);		\
+			l = NULL;				\
+		} else {					\
+			WRITE_ONCE(l->active, 1);		\
+			local_trylock_acquire(l);		\
+		}						\
+		!!l;						\
+	})
+
 #define __local_unlock(lock)					\
 	do {							\
-		local_lock_release(this_cpu_ptr(lock));		\
+		local_lock_t *l = this_cpu_ptr(lock);		\
+		lockdep_assert(l->active == 1);			\
+		WRITE_ONCE(l->active, 0);			\
+		local_lock_release(l);				\
 		preempt_enable();				\
 	} while (0)
 
 #define __local_unlock_irq(lock)				\
 	do {							\
-		local_lock_release(this_cpu_ptr(lock));		\
+		local_lock_t *l = this_cpu_ptr(lock);		\
+		lockdep_assert(l->active == 1);			\
+		WRITE_ONCE(l->active, 0);			\
+		local_lock_release(l);				\
 		local_irq_enable();				\
 	} while (0)
 
 #define __local_unlock_irqrestore(lock, flags)			\
 	do {							\
-		local_lock_release(this_cpu_ptr(lock));		\
+		local_lock_t *l = this_cpu_ptr(lock);		\
+		lockdep_assert(l->active == 1);			\
+		WRITE_ONCE(l->active, 0);			\
+		local_lock_release(l);				\
 		local_irq_restore(flags);			\
 	} while (0)
 
@@ -148,6 +194,22 @@ typedef spinlock_t local_lock_t;
 		__local_lock(lock);				\
 	} while (0)
 
+#define __local_trylock_irqsave(lock, flags)			\
+	({							\
+		__label__ out;					\
+		int ret = 0;					\
+		typecheck(unsigned long, flags);		\
+		flags = 0;					\
+		if (in_nmi() || in_hardirq())			\
+			goto out;				\
+		migrate_disable();				\
+		ret = spin_trylock(this_cpu_ptr((lock)));	\
+		if (!ret)					\
+			migrate_enable();			\
+	out:							\
+		ret;						\
+	})
+
 #define __local_unlock(__lock)					\
 	do {							\
 		spin_unlock(this_cpu_ptr((__lock)));		\
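
An illustrative usage pattern for the new primitive (demo_lock and
demo_update() are hypothetical; the real user is the memcg patch that
follows): try the lock first and let the caller fall back when
re-entrancy makes it unavailable:

	static DEFINE_PER_CPU(local_lock_t, demo_lock) = INIT_LOCAL_LOCK(demo_lock);

	static bool demo_update(void)
	{
		unsigned long flags;

		if (!local_trylock_irqsave(&demo_lock, flags))
			return false;	/* re-entered from IRQ/NMI, or RT hard-IRQ/NMI */

		/* ... safely modify this CPU's state ... */

		local_unlock_irqrestore(&demo_lock, flags);
		return true;
	}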
From patchwork Wed Jan 15 02:17:43 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13939753
X-Patchwork-Delegate: bpf@iogearbox.net
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org,
 linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v5 4/7] memcg: Use trylock to access memcg stock_lock.
Date: Tue, 14 Jan 2025 18:17:43 -0800
Message-Id: <20250115021746.34691-5-alexei.starovoitov@gmail.com>
In-Reply-To: <20250115021746.34691-1-alexei.starovoitov@gmail.com>
References: <20250115021746.34691-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Teach memcg to operate under trylock conditions when spinning locks
cannot be used. local_trylock might fail, and this would lead to a
charge cache bypass if the calling context doesn't allow spinning
(gfpflags_allow_spinning). In those cases charge the memcg counter
directly and fail early if that is not possible. This might cause a
premature charge failure, but it allows an opportunistic charging that
is safe from the try_alloc_pages() path.

Acked-by: Michal Hocko
Signed-off-by: Alexei Starovoitov
---
 mm/memcontrol.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7b3503d12aaf..e4c7049465e0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1756,7 +1756,8 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
  *
  * returns true if successful, false otherwise.
  */
-static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
+			  gfp_t gfp_mask)
 {
 	struct memcg_stock_pcp *stock;
 	unsigned int stock_pages;
@@ -1766,7 +1767,11 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 	if (nr_pages > MEMCG_CHARGE_BATCH)
 		return ret;
 
-	local_lock_irqsave(&memcg_stock.stock_lock, flags);
+	if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
+		if (!gfpflags_allow_spinning(gfp_mask))
+			return ret;
+		local_lock_irqsave(&memcg_stock.stock_lock, flags);
+	}
 
 	stock = this_cpu_ptr(&memcg_stock);
 	stock_pages = READ_ONCE(stock->nr_pages);
@@ -1851,7 +1856,14 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	unsigned long flags;
 
-	local_lock_irqsave(&memcg_stock.stock_lock, flags);
+	if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
+		/*
+		 * In case of unlikely failure to lock percpu stock_lock
+		 * uncharge memcg directly.
+		 */
+		mem_cgroup_cancel_charge(memcg, nr_pages);
+		return;
+	}
 	__refill_stock(memcg, nr_pages);
 	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 }
@@ -2196,9 +2208,13 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	unsigned long pflags;
 
 retry:
-	if (consume_stock(memcg, nr_pages))
+	if (consume_stock(memcg, nr_pages, gfp_mask))
 		return 0;
 
+	if (!gfpflags_allow_spinning(gfp_mask))
+		/* Avoid the refill and flush of the older stock */
+		batch = nr_pages;
+
 	if (!do_memsw_account() ||
 	    page_counter_try_charge(&memcg->memsw, batch, &counter)) {
 		if (page_counter_try_charge(&memcg->memory, batch, &counter))
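
For context, a sketch of what a non-spinning charge attempt looks like
from the caller side (demo mask; the real call site is added to
try_alloc_pages() by the next patch):

	/*
	 * Neither __GFP_DIRECT_RECLAIM nor __GFP_KSWAPD_RECLAIM is set,
	 * so gfpflags_allow_spinning() is false all the way down and
	 * consume_stock()/refill_stock() stay on their trylock paths.
	 */
	gfp_t gfp = __GFP_NOWARN | __GFP_ACCOUNT;
	int err;

	err = __memcg_kmem_charge_page(page, gfp, 0);	/* may fail prematurely */

Note also the batch clamp above: when spinning is not allowed, batch is
reduced from MEMCG_CHARGE_BATCH to nr_pages so a successful charge
neither refills nor flushes the older per-CPU stock.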
From patchwork Wed Jan 15 02:17:44 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13939754
X-Patchwork-Delegate: bpf@iogearbox.net
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org,
 linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v5 5/7] mm, bpf: Use memcg in try_alloc_pages().
Date: Tue, 14 Jan 2025 18:17:44 -0800
Message-Id: <20250115021746.34691-6-alexei.starovoitov@gmail.com>
In-Reply-To: <20250115021746.34691-1-alexei.starovoitov@gmail.com>
References: <20250115021746.34691-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Unconditionally use __GFP_ACCOUNT in try_alloc_pages(). The caller is
responsible for setting up the memcg correctly. All BPF memory
accounting is memcg based.
Signed-off-by: Alexei Starovoitov
---
 mm/page_alloc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a9c639e3db91..c87fd6cc3909 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7131,7 +7131,8 @@ struct page *try_alloc_pages_noprof(int nid, unsigned int order)
 	 * specify it here to highlight that try_alloc_pages()
 	 * doesn't want to deplete reserves.
 	 */
-	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC;
+	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC
+			| __GFP_ACCOUNT;
 	unsigned int alloc_flags = ALLOC_TRYLOCK;
 	struct alloc_context ac = { };
 	struct page *page;
@@ -7174,6 +7175,11 @@ struct page *try_alloc_pages_noprof(int nid, unsigned int order)
 
 	/* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */
 
+	if (memcg_kmem_online() && page &&
+	    unlikely(__memcg_kmem_charge_page(page, alloc_gfp, order) != 0)) {
+		free_pages_nolock(page, order);
+		page = NULL;
+	}
 	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
 	kmsan_alloc_page(page, order, alloc_gfp);
 	return page;
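
Since the commit message puts the memcg setup on the caller, a sketch
of that responsibility (assumes CONFIG_MEMCG; target_memcg is a
hypothetical pointer the caller already holds — compare
bpf_map_alloc_pages() in the last patch):

	struct mem_cgroup *old_memcg;
	struct page *page;

	old_memcg = set_active_memcg(target_memcg);
	page = try_alloc_pages(nid, 0);		/* charged to target_memcg */
	set_active_memcg(old_memcg);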
From patchwork Wed Jan 15 02:17:45 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13939755
X-Patchwork-Delegate: bpf@iogearbox.net
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org,
 linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v5 6/7] mm: Make failslab, kfence, kmemleak aware of trylock mode
Date: Tue, 14 Jan 2025 18:17:45 -0800
Message-Id: <20250115021746.34691-7-alexei.starovoitov@gmail.com>
In-Reply-To: <20250115021746.34691-1-alexei.starovoitov@gmail.com>
References: <20250115021746.34691-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

When gfpflags_allow_spinning() == false spin_locks cannot be taken.
Make failslab, kfence, and kmemleak compliant by bailing out of their
lock-taking paths early in that mode.
Signed-off-by: Alexei Starovoitov
---
 mm/failslab.c    | 3 +++
 mm/kfence/core.c | 4 ++++
 mm/kmemleak.c    | 3 +++
 3 files changed, 10 insertions(+)

diff --git a/mm/failslab.c b/mm/failslab.c
index c3901b136498..86c7304ef25a 100644
--- a/mm/failslab.c
+++ b/mm/failslab.c
@@ -27,6 +27,9 @@ int should_failslab(struct kmem_cache *s, gfp_t gfpflags)
 	if (gfpflags & __GFP_NOFAIL)
 		return 0;
 
+	if (!gfpflags_allow_spinning(gfpflags))
+		return 0;
+
 	if (failslab.ignore_gfp_reclaim &&
 	    (gfpflags & __GFP_DIRECT_RECLAIM))
 		return 0;
 
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 67fc321db79b..e5f2d63f3220 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -1096,6 +1096,10 @@ void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
 	if (s->flags & SLAB_SKIP_KFENCE)
 		return NULL;
 
+	/* Bailout, since kfence_guarded_alloc() needs to take a lock */
+	if (!gfpflags_allow_spinning(flags))
+		return NULL;
+
 	allocation_gate = atomic_inc_return(&kfence_allocation_gate);
 	if (allocation_gate > 1)
 		return NULL;
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 2a945c07ae99..64cb44948e9e 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -648,6 +648,9 @@ static struct kmemleak_object *__alloc_object(gfp_t gfp)
 {
 	struct kmemleak_object *object;
 
+	if (!gfpflags_allow_spinning(gfp))
+		return NULL;
+
 	object = mem_pool_alloc(gfp);
 	if (!object) {
 		pr_warn("Cannot allocate a kmemleak_object structure\n");
From patchwork Wed Jan 15 02:17:46 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13939756
X-Patchwork-Delegate: bpf@iogearbox.net
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org,
 linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v5 7/7] bpf: Use try_alloc_pages() to allocate pages for bpf needs.
Date: Tue, 14 Jan 2025 18:17:46 -0800
Message-Id: <20250115021746.34691-8-alexei.starovoitov@gmail.com>
In-Reply-To: <20250115021746.34691-1-alexei.starovoitov@gmail.com>
References: <20250115021746.34691-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Use try_alloc_pages() and free_pages_nolock() in bpf_map_alloc_pages().

Signed-off-by: Alexei Starovoitov
---
 kernel/bpf/syscall.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0daf098e3207..8bcf48e31a5a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -582,14 +582,14 @@ int bpf_map_alloc_pages(const struct bpf_map *map, gfp_t gfp, int nid,
 	old_memcg = set_active_memcg(memcg);
 #endif
 	for (i = 0; i < nr_pages; i++) {
-		pg = alloc_pages_node(nid, gfp | __GFP_ACCOUNT, 0);
+		pg = try_alloc_pages(nid, 0);
 
 		if (pg) {
 			pages[i] = pg;
 			continue;
 		}
 		for (j = 0; j < i; j++)
-			__free_page(pages[j]);
+			free_pages_nolock(pages[j], 0);
 		ret = -ENOMEM;
 		break;
 	}