From patchwork Fri Jan 24 03:56:50 2025
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org, peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de, rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org, shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org, tglx@linutronix.de, jannh@google.com, tj@kernel.org, linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v6 1/6] mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
Date: Thu, 23 Jan 2025 19:56:50 -0800
Message-Id: <20250124035655.78899-2-alexei.starovoitov@gmail.com>
In-Reply-To: <20250124035655.78899-1-alexei.starovoitov@gmail.com>
References: <20250124035655.78899-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Tracing BPF programs execute from tracepoints and kprobes where the
running context is unknown, but they need to request additional memory.
The prior workarounds were using pre-allocated memory and BPF-specific
freelists to satisfy such allocation requests. Instead, introduce a
gfpflags_allow_spinning() condition that signals to the allocator that
the running context is unknown. Then rely on the percpu free list of
pages to allocate a page.

try_alloc_pages() -> get_page_from_freelist() -> rmqueue() ->
rmqueue_pcplist() will spin_trylock to grab the page from the percpu
free list. If that fails (due to re-entrancy or the list being empty)
then rmqueue_bulk()/rmqueue_buddy() will attempt to spin_trylock
zone->lock and grab the page from there. spin_trylock() is not safe in
PREEMPT_RT when in NMI or in hard IRQ; bail out early in that case.

The support for gfpflags_allow_spinning() mode for free_page and memcg
comes in the next patches.

This is a first step towards supporting BPF requirements in SLUB and
getting rid of bpf_mem_alloc.
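The trylock-with-fallback pattern described above can be sketched in plain userspace C with pthreads. This is only an illustrative analogy of the rmqueue_bulk()/rmqueue_buddy() change, not kernel code; ALLOC_TRYLOCK's value is taken from the patch, while take_zone_lock() and the pthread mutex are stand-ins:

```c
#include <pthread.h>

#define ALLOC_TRYLOCK 0x400 /* same value the patch adds to mm/internal.h */

/*
 * Userspace analogy of the rmqueue_bulk()/rmqueue_buddy() change:
 * opportunistic callers (ALLOC_TRYLOCK) must never spin or block on
 * the lock, so they bail out when the trylock fails; regular callers
 * fall back to the normal blocking acquisition.
 * Returns 1 when the lock was taken, 0 when the caller must give up.
 */
int take_zone_lock(pthread_mutex_t *lock, unsigned int alloc_flags)
{
    if (pthread_mutex_trylock(lock) == 0)
        return 1;                 /* fast path: lock was uncontended */
    if (alloc_flags & ALLOC_TRYLOCK)
        return 0;                 /* opportunistic caller: fail, never wait */
    pthread_mutex_lock(lock);     /* regular caller: block as before */
    return 1;
}
```

When take_zone_lock() returns 0 the allocation simply fails, which is exactly the trade-off the patch makes: best-effort allocation instead of risking a deadlock on re-entrancy.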
That goal was discussed at LSFMM: https://lwn.net/Articles/974138/

Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
Acked-by: Sebastian Andrzej Siewior
Signed-off-by: Alexei Starovoitov
---
 include/linux/gfp.h |  22 ++++++++++
 lib/stackdepot.c    |   5 ++-
 mm/internal.h       |   1 +
 mm/page_alloc.c     | 104 ++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b0fe9f62d15b..82bfb65b8d15 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -39,6 +39,25 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 	return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
 }
 
+static inline bool gfpflags_allow_spinning(const gfp_t gfp_flags)
+{
+	/*
+	 * !__GFP_DIRECT_RECLAIM -> direct claim is not allowed.
+	 * !__GFP_KSWAPD_RECLAIM -> it's not safe to wake up kswapd.
+	 * All GFP_* flags including GFP_NOWAIT use one or both flags.
+	 * try_alloc_pages() is the only API that doesn't specify either flag.
+	 *
+	 * This is stronger than GFP_NOWAIT or GFP_ATOMIC because
+	 * those are guaranteed to never block on a sleeping lock.
+	 * Here we are enforcing that the allocation doesn't ever spin
+	 * on any locks (i.e. only trylocks). There is no high level
+	 * GFP_$FOO flag for this use in try_alloc_pages() as the
+	 * regular page allocator doesn't fully support this
+	 * allocation mode.
+	 */
+	return !(gfp_flags & __GFP_RECLAIM);
+}
+
 #ifdef CONFIG_HIGHMEM
 #define OPT_ZONE_HIGHMEM ZONE_HIGHMEM
 #else
@@ -347,6 +366,9 @@ static inline struct page *alloc_page_vma_noprof(gfp_t gfp,
 }
 #define alloc_page_vma(...)	alloc_hooks(alloc_page_vma_noprof(__VA_ARGS__))
 
+struct page *try_alloc_pages_noprof(int nid, unsigned int order);
+#define try_alloc_pages(...)	alloc_hooks(try_alloc_pages_noprof(__VA_ARGS__))
+
 extern unsigned long get_free_pages_noprof(gfp_t gfp_mask, unsigned int order);
 #define __get_free_pages(...)	alloc_hooks(get_free_pages_noprof(__VA_ARGS__))
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 245d5b416699..377194969e61 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -591,7 +591,8 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
 	depot_stack_handle_t handle = 0;
 	struct page *page = NULL;
 	void *prealloc = NULL;
-	bool can_alloc = depot_flags & STACK_DEPOT_FLAG_CAN_ALLOC;
+	bool allow_spin = gfpflags_allow_spinning(alloc_flags);
+	bool can_alloc = (depot_flags & STACK_DEPOT_FLAG_CAN_ALLOC) && allow_spin;
 	unsigned long flags;
 	u32 hash;
@@ -630,7 +631,7 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
 		prealloc = page_address(page);
 	}
 
-	if (in_nmi()) {
+	if (in_nmi() || !allow_spin) {
 		/* We can never allocate in NMI context. */
 		WARN_ON_ONCE(can_alloc);
 		/* Best effort; bail if we fail to take the lock. */
diff --git a/mm/internal.h b/mm/internal.h
index 9826f7dce607..6c3c664aa346 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1174,6 +1174,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_NOFRAGMENT	  0x0
 #endif
 #define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
+#define ALLOC_TRYLOCK		0x400 /* Only use spin_trylock in allocation path */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 /* Flags that allow allocations below the min watermark. */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 01eab25edf89..a82bc67abbdb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2306,7 +2306,11 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 	unsigned long flags;
 	int i;
 
-	spin_lock_irqsave(&zone->lock, flags);
+	if (!spin_trylock_irqsave(&zone->lock, flags)) {
+		if (unlikely(alloc_flags & ALLOC_TRYLOCK))
+			return 0;
+		spin_lock_irqsave(&zone->lock, flags);
+	}
 	for (i = 0; i < count; ++i) {
 		struct page *page = __rmqueue(zone, order, migratetype,
 								alloc_flags);
@@ -2906,7 +2910,11 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 
 	do {
 		page = NULL;
-		spin_lock_irqsave(&zone->lock, flags);
+		if (!spin_trylock_irqsave(&zone->lock, flags)) {
+			if (unlikely(alloc_flags & ALLOC_TRYLOCK))
+				return NULL;
+			spin_lock_irqsave(&zone->lock, flags);
+		}
 		if (alloc_flags & ALLOC_HIGHATOMIC)
 			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
 		if (!page) {
@@ -4511,7 +4519,12 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 
 	might_alloc(gfp_mask);
 
-	if (should_fail_alloc_page(gfp_mask, order))
+	/*
+	 * Don't invoke should_fail logic, since it may call
+	 * get_random_u32() and printk() which need to spin_lock.
+	 */
+	if (!(*alloc_flags & ALLOC_TRYLOCK) &&
+	    should_fail_alloc_page(gfp_mask, order))
 		return false;
 
 	*alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, *alloc_flags);
@@ -7028,3 +7041,88 @@ static bool __free_unaccepted(struct page *page)
 }
 #endif /* CONFIG_UNACCEPTED_MEMORY */
 
+/**
+ * try_alloc_pages_noprof - opportunistic reentrant allocation from any context
+ * @nid: node to allocate from
+ * @order: allocation order size
+ *
+ * Allocates pages of a given order from the given node. This is safe to
+ * call from any context (from atomic, NMI, and also reentrant
+ * allocator -> tracepoint -> try_alloc_pages_noprof).
+ * Allocation is best effort and to be expected to fail easily so nobody should
+ * rely on the success. Failures are not reported via warn_alloc().
+ * See always fail conditions below.
+ *
+ * Return: allocated page or NULL on failure.
+ */
+struct page *try_alloc_pages_noprof(int nid, unsigned int order)
+{
+	/*
+	 * Do not specify __GFP_DIRECT_RECLAIM, since direct claim is not allowed.
+	 * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
+	 * is not safe in arbitrary context.
+	 *
+	 * These two are the conditions for gfpflags_allow_spinning() being true.
+	 *
+	 * Specify __GFP_NOWARN since failing try_alloc_pages() is not a reason
+	 * to warn. Also warn would trigger printk() which is unsafe from
+	 * various contexts. We cannot use printk_deferred_enter() to mitigate,
+	 * since the running context is unknown.
+	 *
+	 * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
+	 * is safe in any context. Also zeroing the page is mandatory for
+	 * BPF use cases.
+	 *
+	 * Though __GFP_NOMEMALLOC is not checked in the code path below,
+	 * specify it here to highlight that try_alloc_pages()
+	 * doesn't want to deplete reserves.
+	 */
+	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC;
+	unsigned int alloc_flags = ALLOC_TRYLOCK;
+	struct alloc_context ac = { };
+	struct page *page;
+
+	/*
+	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
+	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
+	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
+	 * mark the task as the owner of another rt_spin_lock which will
+	 * confuse PI logic, so return immediately if called from hard IRQ or
+	 * NMI.
+	 *
+	 * Note, irqs_disabled() case is ok. This function can be called
+	 * from raw_spin_lock_irqsave region.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
+		return NULL;
+	if (!pcp_allowed_order(order))
+		return NULL;
+
+#ifdef CONFIG_UNACCEPTED_MEMORY
+	/* Bailout, since try_to_accept_memory_one() needs to take a lock */
+	if (has_unaccepted_memory())
+		return NULL;
+#endif
+	/* Bailout, since _deferred_grow_zone() needs to take a lock */
+	if (deferred_pages_enabled())
+		return NULL;
+
+	if (nid == NUMA_NO_NODE)
+		nid = numa_node_id();
+
+	prepare_alloc_pages(alloc_gfp, order, nid, NULL, &ac,
+			    &alloc_gfp, &alloc_flags);
+
+	/*
+	 * Best effort allocation from percpu free list.
+	 * If it's empty attempt to spin_trylock zone->lock.
+	 */
+	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
+
+	/* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */
+
+	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
+	kmsan_alloc_page(page, order, alloc_gfp);
+	return page;
+}

From patchwork Fri Jan 24 03:56:51 2025
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org, peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de, rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org, shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org, tglx@linutronix.de, jannh@google.com, tj@kernel.org, linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v6 2/6] mm, bpf: Introduce free_pages_nolock()
Date: Thu, 23 Jan 2025 19:56:51 -0800
Message-Id: <20250124035655.78899-3-alexei.starovoitov@gmail.com>
In-Reply-To: <20250124035655.78899-1-alexei.starovoitov@gmail.com>
References: <20250124035655.78899-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Introduce free_pages_nolock() that can free pages without taking locks.
It relies on trylock and can be called from any context.
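The lock-free stash this patch builds on can be sketched with C11 atomics in userspace. This is only an analogy of the kernel's llist_add()/llist_del_all() semantics as used for zone->trylock_free_pages; the struct names here are illustrative stand-ins:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for zone->trylock_free_pages: a lock-free singly linked
 * list that any context may push onto; a later caller that does take
 * the zone lock detaches the whole list at once and frees the entries. */
struct stash_node {
    struct stash_node *next;
    unsigned int order;           /* mirrors page->order in the patch */
};

struct stash_list {
    _Atomic(struct stash_node *) first;
};

/* llist_add() analogy: lock-free push, safe from IRQ/NMI-like contexts.
 * On CAS failure `old` is reloaded, so n->next is retried with the
 * current head until the push succeeds. */
void stash_push(struct stash_list *head, struct stash_node *n)
{
    struct stash_node *old = atomic_load(&head->first);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&head->first, &old, n));
}

/* llist_del_all() analogy: detach every deferred entry in one swap. */
struct stash_node *stash_del_all(struct stash_list *head)
{
    return atomic_exchange(&head->first, NULL);
}
```

The drain side corresponds to free_one_page() below: once the real lock is held, a single stash_del_all() hands the whole deferred batch to the normal freeing path.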
Since spin_trylock() cannot be used in PREEMPT_RT from hard IRQ or NMI it uses lockless link list to stash the pages which will be freed by subsequent free_pages() from good context. Do not use llist unconditionally. BPF maps continuously allocate/free, so we cannot unconditionally delay the freeing to llist. When the memory becomes free make it available to the kernel and BPF users right away if possible, and fallback to llist as the last resort. Acked-by: Vlastimil Babka Acked-by: Sebastian Andrzej Siewior Signed-off-by: Alexei Starovoitov --- include/linux/gfp.h | 1 + include/linux/mm_types.h | 4 ++ include/linux/mmzone.h | 3 ++ lib/stackdepot.c | 5 ++- mm/page_alloc.c | 90 +++++++++++++++++++++++++++++++++++----- mm/page_owner.c | 8 +++- 6 files changed, 98 insertions(+), 13 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 82bfb65b8d15..a8233d09acfa 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -391,6 +391,7 @@ __meminit void *alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mas __get_free_pages((gfp_mask) | GFP_DMA, (order)) extern void __free_pages(struct page *page, unsigned int order); +extern void free_pages_nolock(struct page *page, unsigned int order); extern void free_pages(unsigned long addr, unsigned int order); #define __free_page(page) __free_pages((page), 0) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 825c04b56403..583bf59e2627 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -99,6 +99,10 @@ struct page { /* Or, free page */ struct list_head buddy_list; struct list_head pcp_list; + struct { + struct llist_node pcp_llist; + unsigned int order; + }; }; /* See page-flags.h for PAGE_MAPPING_FLAGS */ struct address_space *mapping; diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index b36124145a16..1a854e0a9e3b 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -953,6 +953,9 @@ struct zone { /* Primarily protects free_area 
*/ spinlock_t lock; + /* Pages to be freed when next trylock succeeds */ + struct llist_head trylock_free_pages; + /* Write-intensive fields used by compaction and vmstats. */ CACHELINE_PADDING(_pad2_); diff --git a/lib/stackdepot.c b/lib/stackdepot.c index 377194969e61..73d7b50924ef 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -672,7 +672,10 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries, exit: if (prealloc) { /* Stack depot didn't use this memory, free it. */ - free_pages((unsigned long)prealloc, DEPOT_POOL_ORDER); + if (!allow_spin) + free_pages_nolock(virt_to_page(prealloc), DEPOT_POOL_ORDER); + else + free_pages((unsigned long)prealloc, DEPOT_POOL_ORDER); } if (found) handle = found->handle.handle; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a82bc67abbdb..fa750c46e0fc 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -88,6 +88,9 @@ typedef int __bitwise fpi_t; */ #define FPI_TO_TAIL ((__force fpi_t)BIT(1)) +/* Free the page without taking locks. Rely on trylock only. 
*/ +#define FPI_TRYLOCK ((__force fpi_t)BIT(2)) + /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */ static DEFINE_MUTEX(pcp_batch_high_lock); #define MIN_PERCPU_PAGELIST_HIGH_FRACTION (8) @@ -1249,13 +1252,44 @@ static void split_large_buddy(struct zone *zone, struct page *page, } while (1); } +static void add_page_to_zone_llist(struct zone *zone, struct page *page, + unsigned int order) +{ + /* Remember the order */ + page->order = order; + /* Add the page to the free list */ + llist_add(&page->pcp_llist, &zone->trylock_free_pages); +} + static void free_one_page(struct zone *zone, struct page *page, unsigned long pfn, unsigned int order, fpi_t fpi_flags) { + struct llist_head *llhead; unsigned long flags; - spin_lock_irqsave(&zone->lock, flags); + if (!spin_trylock_irqsave(&zone->lock, flags)) { + if (unlikely(fpi_flags & FPI_TRYLOCK)) { + add_page_to_zone_llist(zone, page, order); + return; + } + spin_lock_irqsave(&zone->lock, flags); + } + + /* The lock succeeded. Process deferred pages. 
	 */
+	llhead = &zone->trylock_free_pages;
+	if (unlikely(!llist_empty(llhead) && !(fpi_flags & FPI_TRYLOCK))) {
+		struct llist_node *llnode;
+		struct page *p, *tmp;
+
+		llnode = llist_del_all(llhead);
+		llist_for_each_entry_safe(p, tmp, llnode, pcp_llist) {
+			unsigned int p_order = p->order;
+
+			split_large_buddy(zone, p, page_to_pfn(p), p_order, fpi_flags);
+			__count_vm_events(PGFREE, 1 << p_order);
+		}
+	}
 	split_large_buddy(zone, page, pfn, order, fpi_flags);
 	spin_unlock_irqrestore(&zone->lock, flags);
@@ -2598,7 +2632,7 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
 
 static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 				   struct page *page, int migratetype,
-				   unsigned int order)
+				   unsigned int order, fpi_t fpi_flags)
 {
 	int high, batch;
 	int pindex;
@@ -2633,6 +2667,14 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	}
 	if (pcp->free_count < (batch << CONFIG_PCP_BATCH_SCALE_MAX))
 		pcp->free_count += (1 << order);
+
+	if (unlikely(fpi_flags & FPI_TRYLOCK)) {
+		/*
+		 * Do not attempt to take a zone lock. Let pcp->count get
+		 * over high mark temporarily.
+		 */
+		return;
+	}
 	high = nr_pcp_high(pcp, zone, batch, free_high);
 	if (pcp->count >= high) {
 		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
@@ -2647,7 +2689,8 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 /*
  * Free a pcp page
  */
-void free_unref_page(struct page *page, unsigned int order)
+static void __free_unref_page(struct page *page, unsigned int order,
+			      fpi_t fpi_flags)
 {
 	unsigned long __maybe_unused UP_flags;
 	struct per_cpu_pages *pcp;
@@ -2656,7 +2699,7 @@ void free_unref_page(struct page *page, unsigned int order)
 	int migratetype;
 
 	if (!pcp_allowed_order(order)) {
-		__free_pages_ok(page, order, FPI_NONE);
+		__free_pages_ok(page, order, fpi_flags);
 		return;
 	}
@@ -2673,24 +2716,34 @@ void free_unref_page(struct page *page, unsigned int order)
 	migratetype = get_pfnblock_migratetype(page, pfn);
 	if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
 		if (unlikely(is_migrate_isolate(migratetype))) {
-			free_one_page(page_zone(page), page, pfn, order, FPI_NONE);
+			free_one_page(page_zone(page), page, pfn, order, fpi_flags);
 			return;
 		}
 		migratetype = MIGRATE_MOVABLE;
 	}
 
 	zone = page_zone(page);
+	if (unlikely((fpi_flags & FPI_TRYLOCK) && IS_ENABLED(CONFIG_PREEMPT_RT)
+	    && (in_nmi() || in_hardirq()))) {
+		add_page_to_zone_llist(zone, page, order);
+		return;
+	}
 	pcp_trylock_prepare(UP_flags);
 	pcp = pcp_spin_trylock(zone->per_cpu_pageset);
 	if (pcp) {
-		free_unref_page_commit(zone, pcp, page, migratetype, order);
+		free_unref_page_commit(zone, pcp, page, migratetype, order, fpi_flags);
 		pcp_spin_unlock(pcp);
 	} else {
-		free_one_page(zone, page, pfn, order, FPI_NONE);
+		free_one_page(zone, page, pfn, order, fpi_flags);
 	}
 	pcp_trylock_finish(UP_flags);
 }
 
+void free_unref_page(struct page *page, unsigned int order)
+{
+	__free_unref_page(page, order, FPI_NONE);
+}
+
 /*
  * Free a batch of folios
  */
@@ -2779,7 +2832,7 @@ void free_unref_folios(struct folio_batch *folios)
 
 		trace_mm_page_free_batched(&folio->page);
 		free_unref_page_commit(zone, pcp, &folio->page, migratetype,
-				order);
+				order, FPI_NONE);
 	}
 
 	if (pcp) {
@@ -4843,22 +4896,37 @@ EXPORT_SYMBOL(get_zeroed_page_noprof);
 * Context: May be called in interrupt context or while holding a normal
 * spinlock, but not in NMI context or while holding a raw spinlock.
 */
-void __free_pages(struct page *page, unsigned int order)
+static void ___free_pages(struct page *page, unsigned int order,
+			  fpi_t fpi_flags)
 {
 	/* get PageHead before we drop reference */
 	int head = PageHead(page);
 	struct alloc_tag *tag = pgalloc_tag_get(page);
 
 	if (put_page_testzero(page))
-		free_unref_page(page, order);
+		__free_unref_page(page, order, fpi_flags);
 	else if (!head) {
 		pgalloc_tag_sub_pages(tag, (1 << order) - 1);
 		while (order-- > 0)
-			free_unref_page(page + (1 << order), order);
+			__free_unref_page(page + (1 << order), order,
+					  fpi_flags);
 	}
 }
+
+void __free_pages(struct page *page, unsigned int order)
+{
+	___free_pages(page, order, FPI_NONE);
+}
 EXPORT_SYMBOL(__free_pages);
 
+/*
+ * Can be called while holding raw_spin_lock or from IRQ and NMI for any
+ * page type (not only those that came from try_alloc_pages)
+ */
+void free_pages_nolock(struct page *page, unsigned int order)
+{
+	___free_pages(page, order, FPI_TRYLOCK);
+}
+
 void free_pages(unsigned long addr, unsigned int order)
 {
 	if (addr != 0) {
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 2d6360eaccbb..90e31d0e3ed7 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -294,7 +294,13 @@ void __reset_page_owner(struct page *page, unsigned short order)
 	page_owner = get_page_owner(page_ext);
 	alloc_handle = page_owner->handle;
 
-	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
+	/*
+	 * Do not specify GFP_NOWAIT to make gfpflags_allow_spinning() == false
+	 * to prevent issues in stack_depot_save().
+	 * This is similar to try_alloc_pages() gfp flags, but only used
+	 * to signal stack_depot to avoid spin_locks.
+	 */
+	handle = save_stack(__GFP_NOWARN);
 	__update_page_owner_free_handle(page_ext, handle, order, current->pid,
 					current->tgid, free_ts_nsec);
 	page_ext_put(page_ext);

From patchwork Fri Jan 24 03:56:52 2025
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13948881
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
    peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
    rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
    shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
    tglx@linutronix.de, jannh@google.com, tj@kernel.org,
    linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v6 3/6] locking/local_lock: Introduce local_trylock_t and local_trylock_irqsave()
Date: Thu, 23 Jan 2025 19:56:52 -0800
Message-Id: <20250124035655.78899-4-alexei.starovoitov@gmail.com>
In-Reply-To: <20250124035655.78899-1-alexei.starovoitov@gmail.com>
References: <20250124035655.78899-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

In !PREEMPT_RT, local_lock_irqsave() disables interrupts to protect the
critical section, but it does not prevent NMIs, so fully reentrant code
cannot use local_lock_irqsave() for exclusive access.

Introduce local_trylock_t and local_trylock_irqsave(), which disables
interrupts and sets active=1, so that a local_trylock_irqsave() on the
same lock from NMI context returns false.

In PREEMPT_RT, local_lock_irqsave() maps to a preemptible spin_lock().
Map local_trylock_irqsave() to a preemptible spin_trylock(). When in
hard IRQ or NMI context, return false right away, since spin_trylock()
is not safe there due to PI issues.

Note there is no need to use local_inc() for the active variable, since
it is a per-CPU variable with strict nesting scopes.
Usage:

  local_lock_t lock;                     // sizeof(lock) == 0 in !RT
  local_lock_irqsave(&lock, ...);        // irqsave as before
  if (local_trylock_irqsave(&lock, ...)) // compilation error

  local_trylock_t lock;                  // sizeof(lock) == 4 in !RT
  local_lock_irqsave(&lock, ...);        // irqsave and active = 1
  if (local_trylock_irqsave(&lock, ...)) // if (!active) irqsave

Signed-off-by: Alexei Starovoitov
---
 include/linux/local_lock.h          |  9 ++++
 include/linux/local_lock_internal.h | 79 ++++++++++++++++++++++++++++-
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
index 091dc0b6bdfb..f4bc3e9b2b20 100644
--- a/include/linux/local_lock.h
+++ b/include/linux/local_lock.h
@@ -30,6 +30,15 @@
 #define local_lock_irqsave(lock, flags)				\
 	__local_lock_irqsave(lock, flags)
 
+/**
+ * local_trylock_irqsave - Try to acquire a per CPU local lock, save and disable
+ *			   interrupts. Fails in PREEMPT_RT when in hard IRQ or NMI.
+ * @lock:	The lock variable
+ * @flags:	Storage for interrupt flags
+ */
+#define local_trylock_irqsave(lock, flags)			\
+	__local_trylock_irqsave(lock, flags)
+
 /**
  * local_unlock - Release a per CPU local lock
  * @lock:	The lock variable
diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h
index 8dd71fbbb6d2..14757b7aea99 100644
--- a/include/linux/local_lock_internal.h
+++ b/include/linux/local_lock_internal.h
@@ -15,6 +15,19 @@ typedef struct {
 #endif
 } local_lock_t;
 
+typedef struct {
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map	dep_map;
+	struct task_struct	*owner;
+#endif
+	/*
+	 * Same layout as local_lock_t with 'active' field
+	 * at the end, since (local_trylock_t *) will be
+	 * casted to (local_lock_t *).
+	 */
+	int active;
+} local_trylock_t;
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 # define LOCAL_LOCK_DEBUG_INIT(lockname)		\
 	.dep_map = {					\
@@ -31,6 +44,13 @@ static inline void local_lock_acquire(local_lock_t *l)
 	l->owner = current;
 }
 
+static inline void local_trylock_acquire(local_lock_t *l)
+{
+	lock_map_acquire_try(&l->dep_map);
+	DEBUG_LOCKS_WARN_ON(l->owner);
+	l->owner = current;
+}
+
 static inline void local_lock_release(local_lock_t *l)
 {
 	DEBUG_LOCKS_WARN_ON(l->owner != current);
@@ -45,6 +65,7 @@ static inline void local_lock_debug_init(local_lock_t *l)
 #else /* CONFIG_DEBUG_LOCK_ALLOC */
 # define LOCAL_LOCK_DEBUG_INIT(lockname)
 static inline void local_lock_acquire(local_lock_t *l) { }
+static inline void local_trylock_acquire(local_lock_t *l) { }
 static inline void local_lock_release(local_lock_t *l) { }
 static inline void local_lock_debug_init(local_lock_t *l) { }
 #endif /* !CONFIG_DEBUG_LOCK_ALLOC */
@@ -87,10 +108,37 @@ do {								\
 
 #define __local_lock_irqsave(lock, flags)			\
 	do {							\
+		local_trylock_t *tl;				\
+		local_lock_t *l;				\
 		local_irq_save(flags);				\
-		local_lock_acquire(this_cpu_ptr(lock));		\
+		l = (local_lock_t *)this_cpu_ptr(lock);		\
+		tl = (local_trylock_t *)l;			\
+		_Generic((lock),				\
+			local_trylock_t *: ({			\
+				lockdep_assert(tl->active == 0);\
+				WRITE_ONCE(tl->active, 1);	\
+			}),					\
+			default:(void)0);			\
+		local_lock_acquire(l);				\
 	} while (0)
 
+#define __local_trylock_irqsave(lock, flags)			\
+	({							\
+		local_trylock_t *tl;				\
+		local_irq_save(flags);				\
+		tl = this_cpu_ptr(lock);			\
+		if (READ_ONCE(tl->active) == 1) {		\
+			local_irq_restore(flags);		\
+			tl = NULL;				\
+		} else {					\
+			WRITE_ONCE(tl->active, 1);		\
+			local_trylock_acquire(			\
+				(local_lock_t *)tl);		\
+		}						\
+		!!tl;						\
+	})
+
 #define __local_unlock(lock)					\
 	do {							\
 		local_lock_release(this_cpu_ptr(lock));		\
@@ -105,7 +153,17 @@ do {								\
 
 #define __local_unlock_irqrestore(lock, flags)			\
 	do {							\
-		local_lock_release(this_cpu_ptr(lock));		\
+		local_trylock_t *tl;				\
+		local_lock_t *l;				\
+		l = (local_lock_t *)this_cpu_ptr(lock);		\
+		tl = (local_trylock_t *)l;			\
+		_Generic((lock),				\
+			local_trylock_t *: ({			\
+				lockdep_assert(tl->active == 1);\
+				WRITE_ONCE(tl->active, 0);	\
+			}),					\
+			default:(void)0);			\
+		local_lock_release(l);				\
 		local_irq_restore(flags);			\
 	} while (0)
 
@@ -125,6 +183,7 @@ do {								\
  * critical section while staying preemptible.
  */
 typedef spinlock_t local_lock_t;
+typedef spinlock_t local_trylock_t;
 
 #define INIT_LOCAL_LOCK(lockname) __LOCAL_SPIN_LOCK_UNLOCKED((lockname))
 
@@ -148,6 +207,22 @@ typedef spinlock_t local_lock_t;
 		__local_lock(lock);				\
 	} while (0)
 
+#define __local_trylock_irqsave(lock, flags)			\
+	({							\
+		__label__ out;					\
+		int ret = 0;					\
+		typecheck(unsigned long, flags);		\
+		flags = 0;					\
+		if (in_nmi() || in_hardirq())			\
+			goto out;				\
+		migrate_disable();				\
+		ret = spin_trylock(this_cpu_ptr((lock)));	\
+		if (!ret)					\
+			migrate_enable();			\
+	out:							\
+		ret;						\
+	})
+
 #define __local_unlock(__lock)					\
 	do {							\
 		spin_unlock(this_cpu_ptr((__lock)));		\

From patchwork Fri Jan 24 03:56:53 2025
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13948882
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
    peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
    rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
    shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
    tglx@linutronix.de, jannh@google.com, tj@kernel.org,
    linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v6 4/6] memcg: Use trylock to access memcg stock_lock.
Date: Thu, 23 Jan 2025 19:56:53 -0800
Message-Id: <20250124035655.78899-5-alexei.starovoitov@gmail.com>
In-Reply-To: <20250124035655.78899-1-alexei.starovoitov@gmail.com>
References: <20250124035655.78899-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Teach memcg to operate under trylock conditions when spinning locks
cannot be used. local_trylock might fail, and that would lead to a
charge-cache bypass if the calling context doesn't allow spinning
(gfpflags_allow_spinning). In those cases, charge the memcg counter
directly and fail early if that is not possible. This might cause a
premature charge failure, but it allows opportunistic charging that
is safe from the try_alloc_pages path.

Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
Acked-by: Shakeel Butt
Signed-off-by: Alexei Starovoitov
---
 mm/memcontrol.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7b3503d12aaf..9caca00cb7de 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1722,7 +1722,7 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg)
 }
 
 struct memcg_stock_pcp {
-	local_lock_t stock_lock;
+	local_trylock_t stock_lock;
 	struct mem_cgroup *cached; /* this never be root cgroup */
 	unsigned int nr_pages;
@@ -1756,7 +1756,8 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
  *
  * returns true if successful, false otherwise.
*/ -static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages) +static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages, + gfp_t gfp_mask) { struct memcg_stock_pcp *stock; unsigned int stock_pages; @@ -1766,7 +1767,11 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages) if (nr_pages > MEMCG_CHARGE_BATCH) return ret; - local_lock_irqsave(&memcg_stock.stock_lock, flags); + if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags)) { + if (!gfpflags_allow_spinning(gfp_mask)) + return ret; + local_lock_irqsave(&memcg_stock.stock_lock, flags); + } stock = this_cpu_ptr(&memcg_stock); stock_pages = READ_ONCE(stock->nr_pages); @@ -1851,7 +1856,18 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages) { unsigned long flags; - local_lock_irqsave(&memcg_stock.stock_lock, flags); + if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags)) { + /* + * In case of unlikely failure to lock percpu stock_lock + * uncharge memcg directly. 
+ */ + if (mem_cgroup_is_root(memcg)) + return; + page_counter_uncharge(&memcg->memory, nr_pages); + if (do_memsw_account()) + page_counter_uncharge(&memcg->memsw, nr_pages); + return; + } __refill_stock(memcg, nr_pages); local_unlock_irqrestore(&memcg_stock.stock_lock, flags); } @@ -2196,9 +2212,13 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, unsigned long pflags; retry: - if (consume_stock(memcg, nr_pages)) + if (consume_stock(memcg, nr_pages, gfp_mask)) return 0; + if (!gfpflags_allow_spinning(gfp_mask)) + /* Avoid the refill and flush of the older stock */ + batch = nr_pages; + if (!do_memsw_account() || page_counter_try_charge(&memcg->memsw, batch, &counter)) { if (page_counter_try_charge(&memcg->memory, batch, &counter)) From patchwork Fri Jan 24 03:56:54 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexei Starovoitov X-Patchwork-Id: 13948883 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 72CE8C02181 for ; Fri, 24 Jan 2025 03:57:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 01CB7280034; Thu, 23 Jan 2025 22:57:18 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id F0E7B28002E; Thu, 23 Jan 2025 22:57:17 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D3970280034; Thu, 23 Jan 2025 22:57:17 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id AD73628002E for ; Thu, 23 Jan 2025 22:57:17 -0500 (EST) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 67BF8120BA5 for ; Fri, 24 Jan 2025 03:57:17 +0000 
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org,
 linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v6 5/6] mm, bpf: Use memcg in try_alloc_pages().
Date: Thu, 23 Jan 2025 19:56:54 -0800
Message-Id: <20250124035655.78899-6-alexei.starovoitov@gmail.com>
In-Reply-To: <20250124035655.78899-1-alexei.starovoitov@gmail.com>
References: <20250124035655.78899-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Unconditionally use __GFP_ACCOUNT in try_alloc_pages().
The caller is responsible for setting up memcg correctly.
All BPF memory accounting is memcg based.
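The ordering this patch introduces (allocate first, then try to charge the memcg, and undo the allocation via the lock-free free path if the charge fails) can be illustrated with a small userspace sketch. This is not kernel code: `memcg_budget`, `memcg_kmem_charge_page()` and `try_alloc_page()` below are hypothetical stand-ins for the memcg limit, `__memcg_kmem_charge_page()` and `try_alloc_pages()`/`free_pages_nolock()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the memcg limit: pages still chargeable. */
static long memcg_budget;

/* Stand-in for __memcg_kmem_charge_page(): 0 on success, -1 over limit. */
static int memcg_kmem_charge_page(void)
{
	if (memcg_budget <= 0)
		return -1;
	memcg_budget--;
	return 0;
}

/* Stand-in for try_alloc_pages(nid, 0): the page is obtained first,
 * then charged; a failed charge frees the page (as free_pages_nolock()
 * does in the patch) and the whole allocation is reported as failure. */
static void *try_alloc_page(void)
{
	void *page = calloc(1, 4096);	/* __GFP_ZERO: zeroed page */

	if (page && memcg_kmem_charge_page() != 0) {
		free(page);
		page = NULL;
	}
	return page;
}
```

The caller never sees a page that was allocated but not charged, which is what lets try_alloc_pages() use __GFP_ACCOUNT unconditionally.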
Acked-by: Vlastimil Babka
Acked-by: Shakeel Butt
Signed-off-by: Alexei Starovoitov
---
 mm/page_alloc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fa750c46e0fc..931cedcda788 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7146,7 +7146,8 @@ struct page *try_alloc_pages_noprof(int nid, unsigned int order)
	 * specify it here to highlight that try_alloc_pages()
	 * doesn't want to deplete reserves.
	 */
-	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC;
+	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC
+			| __GFP_ACCOUNT;
	unsigned int alloc_flags = ALLOC_TRYLOCK;
	struct alloc_context ac = { };
	struct page *page;
@@ -7190,6 +7191,11 @@ struct page *try_alloc_pages_noprof(int nid, unsigned int order)

	/* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */

+	if (memcg_kmem_online() && page &&
+	    unlikely(__memcg_kmem_charge_page(page, alloc_gfp, order) != 0)) {
+		free_pages_nolock(page, order);
+		page = NULL;
+	}
	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
	kmsan_alloc_page(page, order, alloc_gfp);
	return page;

From patchwork Fri Jan 24 03:56:55 2025
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13948884
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org,
 linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v6 6/6] bpf: Use try_alloc_pages() to allocate pages
 for bpf needs.
Date: Thu, 23 Jan 2025 19:56:55 -0800
Message-Id: <20250124035655.78899-7-alexei.starovoitov@gmail.com>
In-Reply-To: <20250124035655.78899-1-alexei.starovoitov@gmail.com>
References: <20250124035655.78899-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Use try_alloc_pages() and free_pages_nolock() for BPF needs
when context doesn't allow using normal alloc_pages.
This is a prerequisite for further work.

Signed-off-by: Alexei Starovoitov
---
 include/linux/bpf.h  |  2 +-
 kernel/bpf/arena.c   |  5 ++---
 kernel/bpf/syscall.c | 23 ++++++++++++++++++++---
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f3f50e29d639..e1838a341817 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2348,7 +2348,7 @@ int generic_map_delete_batch(struct bpf_map *map,
 struct bpf_map *bpf_map_get_curr_or_next(u32 *id);
 struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id);

-int bpf_map_alloc_pages(const struct bpf_map *map, gfp_t gfp, int nid,
+int bpf_map_alloc_pages(const struct bpf_map *map, int nid,
			unsigned long nr_pages, struct page **page_array);

 #ifdef CONFIG_MEMCG
 void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
index 4b22a651b5d5..642399a5fd9f 100644
--- a/kernel/bpf/arena.c
+++ b/kernel/bpf/arena.c
@@ -287,7 +287,7 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
		return VM_FAULT_SIGSEGV;

	/* Account into memcg of the process that created bpf_arena */
-	ret = bpf_map_alloc_pages(map, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE, 1, &page);
+	ret = bpf_map_alloc_pages(map, NUMA_NO_NODE, 1, &page);
	if (ret) {
		range_tree_set(&arena->rt, vmf->pgoff, 1);
		return VM_FAULT_SIGSEGV;
@@ -465,8 +465,7 @@ static long arena_alloc_pages(struct bpf_arena *arena, long uaddr, long page_cnt
	if (ret)
		goto out_free_pages;

-	ret = bpf_map_alloc_pages(&arena->map, GFP_KERNEL | __GFP_ZERO,
-				  node_id, page_cnt, pages);
+	ret = bpf_map_alloc_pages(&arena->map, node_id, page_cnt, pages);
	if (ret)
		goto out;

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0daf098e3207..55588dbd2fce 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -569,7 +569,24 @@ static void bpf_map_release_memcg(struct bpf_map *map)
 }
 #endif

-int bpf_map_alloc_pages(const struct bpf_map *map, gfp_t gfp, int nid,
+static bool can_alloc_pages(void)
+{
+	return preempt_count() == 0 && !irqs_disabled() &&
+		!IS_ENABLED(CONFIG_PREEMPT_RT);
+}
+
+static struct page *__bpf_alloc_page(int nid)
+{
+	if (!can_alloc_pages())
+		return try_alloc_pages(nid, 0);
+
+	return alloc_pages_node(nid,
+				GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT
+				| __GFP_NOWARN,
+				0);
+}
+
+int bpf_map_alloc_pages(const struct bpf_map *map, int nid,
			unsigned long nr_pages, struct page **pages)
 {
	unsigned long i, j;
@@ -582,14 +599,14 @@ int bpf_map_alloc_pages(const struct bpf_map *map, gfp_t gfp, int nid,
	old_memcg = set_active_memcg(memcg);
 #endif
	for (i = 0; i < nr_pages; i++) {
-		pg = alloc_pages_node(nid, gfp | __GFP_ACCOUNT, 0);
+		pg = __bpf_alloc_page(nid);
		if (pg) {
			pages[i] = pg;
			continue;
		}
		for (j = 0; j < i; j++)
-			__free_page(pages[j]);
+			free_pages_nolock(pages[j], 0);
		ret = -ENOMEM;
		break;
	}
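The allocator choice in the patch's __bpf_alloc_page() can be sketched as a pure decision function: the normal GFP_KERNEL path is only safe from fully preemptible task context; otherwise the trylock-based try_alloc_pages() must be used. This is a userspace sketch, not kernel code: the three parameters are hypothetical stand-ins for preempt_count(), irqs_disabled() and IS_ENABLED(CONFIG_PREEMPT_RT).

```c
#include <assert.h>
#include <stdbool.h>

enum alloc_path { ALLOC_PAGES_NODE, TRY_ALLOC_PAGES };

/* Mirrors can_alloc_pages() from the patch, with the kernel predicates
 * passed in as plain values instead of being queried. */
static bool can_alloc_pages(int preempt_count, bool irqs_disabled,
			    bool preempt_rt)
{
	return preempt_count == 0 && !irqs_disabled && !preempt_rt;
}

/* Mirrors the branch in __bpf_alloc_page(): fall back to the
 * any-context, trylock-based allocator when sleeping is not allowed. */
static enum alloc_path bpf_alloc_path(int preempt_count, bool irqs_disabled,
				      bool preempt_rt)
{
	if (!can_alloc_pages(preempt_count, irqs_disabled, preempt_rt))
		return TRY_ALLOC_PAGES;
	return ALLOC_PAGES_NODE;
}
```

Note that on PREEMPT_RT the sketch always picks TRY_ALLOC_PAGES, matching the patch's use of IS_ENABLED(CONFIG_PREEMPT_RT), since preempt_count() alone does not prove the context is sleepable there.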