From patchwork Thu Feb 13 03:35:51 2025
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13972748
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org, peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de, rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org, shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org, tglx@linutronix.de, jannh@google.com, tj@kernel.org, linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v8 1/6] mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
Date: Wed, 12 Feb 2025 19:35:51 -0800
Message-Id: <20250213033556.9534-2-alexei.starovoitov@gmail.com>
In-Reply-To: <20250213033556.9534-1-alexei.starovoitov@gmail.com>
References: <20250213033556.9534-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Tracing BPF programs execute from tracepoints and kprobes where the
running context is unknown, but they need to request additional memory.
The prior workarounds were using pre-allocated memory and BPF-specific
freelists to satisfy such allocation requests.

Instead, introduce a gfpflags_allow_spinning() condition that signals to
the allocator that the running context is unknown. Then rely on the
percpu free list of pages to allocate a page.

try_alloc_pages() -> get_page_from_freelist() -> rmqueue() ->
rmqueue_pcplist() will spin_trylock to grab the page from the percpu
free list. If that fails (due to re-entrancy or the list being empty),
then rmqueue_bulk()/rmqueue_buddy() will attempt to spin_trylock
zone->lock and grab the page from there.

spin_trylock() is not safe in PREEMPT_RT when in NMI or in hard IRQ.
Bail out early in such cases.

The support for gfpflags_allow_spinning() mode in free_pages() and memcg
comes in the next patches.

This is a first step towards supporting BPF requirements in SLUB and
getting rid of bpf_mem_alloc. That goal was discussed at LSFMM:
https://lwn.net/Articles/974138/
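As an illustration of the new predicate (an editor's sketch, not part of
the patch; it assumes only the gfpflags_allow_spinning() definition added
below, and the wrapper function is hypothetical):

/*
 * Expected results per the definition below: any mask carrying a reclaim
 * bit may spin on locks; only a mask with neither bit, which is what
 * try_alloc_pages() builds internally, is restricted to trylocks.
 */
static void gfpflags_spinning_examples(void)
{
        WARN_ON(!gfpflags_allow_spinning(GFP_KERNEL));  /* __GFP_DIRECT_RECLAIM set */
        WARN_ON(!gfpflags_allow_spinning(GFP_NOWAIT));  /* __GFP_KSWAPD_RECLAIM set */
        WARN_ON(!gfpflags_allow_spinning(GFP_ATOMIC));  /* kswapd wakeup allowed */
        WARN_ON(gfpflags_allow_spinning(__GFP_NOWARN | __GFP_ZERO)); /* no reclaim bits */
}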
Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
Acked-by: Sebastian Andrzej Siewior
Reviewed-by: Shakeel Butt
Signed-off-by: Alexei Starovoitov
---
 include/linux/gfp.h |  22 ++++++++++
 lib/stackdepot.c    |   5 ++-
 mm/internal.h       |   1 +
 mm/page_alloc.c     | 104 ++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 6bb1a5a7a4ae..5d9ee78c74e4 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -39,6 +39,25 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 	return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
 }
 
+static inline bool gfpflags_allow_spinning(const gfp_t gfp_flags)
+{
+	/*
+	 * !__GFP_DIRECT_RECLAIM -> direct reclaim is not allowed.
+	 * !__GFP_KSWAPD_RECLAIM -> it's not safe to wake up kswapd.
+	 * All GFP_* flags including GFP_NOWAIT use one or both flags.
+	 * try_alloc_pages() is the only API that doesn't specify either flag.
+	 *
+	 * This is stronger than GFP_NOWAIT or GFP_ATOMIC because
+	 * those are guaranteed to never block on a sleeping lock.
+	 * Here we are enforcing that the allocation doesn't ever spin
+	 * on any locks (i.e. only trylocks). There is no high level
+	 * GFP_$FOO flag for this use in try_alloc_pages() as the
+	 * regular page allocator doesn't fully support this
+	 * allocation mode.
+	 */
+	return !!(gfp_flags & __GFP_RECLAIM);
+}
+
 #ifdef CONFIG_HIGHMEM
 #define OPT_ZONE_HIGHMEM ZONE_HIGHMEM
 #else
@@ -335,6 +354,9 @@ static inline struct page *alloc_page_vma_noprof(gfp_t gfp,
 }
 #define alloc_page_vma(...)	alloc_hooks(alloc_page_vma_noprof(__VA_ARGS__))
 
+struct page *try_alloc_pages_noprof(int nid, unsigned int order);
+#define try_alloc_pages(...)	alloc_hooks(try_alloc_pages_noprof(__VA_ARGS__))
+
 extern unsigned long get_free_pages_noprof(gfp_t gfp_mask, unsigned int order);
 #define __get_free_pages(...)	alloc_hooks(get_free_pages_noprof(__VA_ARGS__))
 
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 245d5b416699..377194969e61 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -591,7 +591,8 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
 	depot_stack_handle_t handle = 0;
 	struct page *page = NULL;
 	void *prealloc = NULL;
-	bool can_alloc = depot_flags & STACK_DEPOT_FLAG_CAN_ALLOC;
+	bool allow_spin = gfpflags_allow_spinning(alloc_flags);
+	bool can_alloc = (depot_flags & STACK_DEPOT_FLAG_CAN_ALLOC) && allow_spin;
 	unsigned long flags;
 	u32 hash;
 
@@ -630,7 +631,7 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
 			prealloc = page_address(page);
 	}
 
-	if (in_nmi()) {
+	if (in_nmi() || !allow_spin) {
 		/* We can never allocate in NMI context. */
 		WARN_ON_ONCE(can_alloc);
 		/* Best effort; bail if we fail to take the lock. */
diff --git a/mm/internal.h b/mm/internal.h
index 109ef30fee11..10a8b4b3b86e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1187,6 +1187,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_NOFRAGMENT	  0x0
 #endif
 #define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
+#define ALLOC_TRYLOCK		0x400 /* Only use spin_trylock in allocation path */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 /* Flags that allow allocations below the min watermark. */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 579789600a3c..0404c4c0dfc7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2307,7 +2307,11 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 	unsigned long flags;
 	int i;
 
-	spin_lock_irqsave(&zone->lock, flags);
+	if (!spin_trylock_irqsave(&zone->lock, flags)) {
+		if (unlikely(alloc_flags & ALLOC_TRYLOCK))
+			return 0;
+		spin_lock_irqsave(&zone->lock, flags);
+	}
 	for (i = 0; i < count; ++i) {
 		struct page *page = __rmqueue(zone, order, migratetype,
 								alloc_flags);
@@ -2907,7 +2911,11 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 
 	do {
 		page = NULL;
-		spin_lock_irqsave(&zone->lock, flags);
+		if (!spin_trylock_irqsave(&zone->lock, flags)) {
+			if (unlikely(alloc_flags & ALLOC_TRYLOCK))
+				return NULL;
+			spin_lock_irqsave(&zone->lock, flags);
+		}
 		if (alloc_flags & ALLOC_HIGHATOMIC)
 			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
 		if (!page) {
@@ -4511,7 +4519,12 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 
 	might_alloc(gfp_mask);
 
-	if (should_fail_alloc_page(gfp_mask, order))
+	/*
+	 * Don't invoke should_fail logic, since it may call
+	 * get_random_u32() and printk() which need to spin_lock.
+	 */
+	if (!(*alloc_flags & ALLOC_TRYLOCK) &&
+	    should_fail_alloc_page(gfp_mask, order))
 		return false;
 
 	*alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, *alloc_flags);
@@ -7071,3 +7084,88 @@ static bool __free_unaccepted(struct page *page)
 }
 
 #endif /* CONFIG_UNACCEPTED_MEMORY */
+
+/**
+ * try_alloc_pages_noprof - opportunistic reentrant allocation from any context
+ * @nid: node to allocate from
+ * @order: allocation order size
+ *
+ * Allocates pages of a given order from the given node. This is safe to
+ * call from any context (from atomic, NMI, and also reentrant
+ * allocator -> tracepoint -> try_alloc_pages_noprof).
+ * Allocation is best effort and expected to fail easily, so nobody should
+ * rely on the success. Failures are not reported via warn_alloc().
+ * See the always-fail conditions below.
+ *
+ * Return: allocated page or NULL on failure.
+ */
+struct page *try_alloc_pages_noprof(int nid, unsigned int order)
+{
+	/*
+	 * Do not specify __GFP_DIRECT_RECLAIM, since direct reclaim is not allowed.
+	 * Do not specify __GFP_KSWAPD_RECLAIM either, since wake up of kswapd
+	 * is not safe in arbitrary context.
+	 *
+	 * These two are the conditions for gfpflags_allow_spinning() being true.
+	 *
+	 * Specify __GFP_NOWARN since failing try_alloc_pages() is not a reason
+	 * to warn. Also warn would trigger printk() which is unsafe from
+	 * various contexts. We cannot use printk_deferred_enter() to mitigate,
+	 * since the running context is unknown.
+	 *
+	 * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
+	 * is safe in any context. Also zeroing the page is mandatory for
+	 * BPF use cases.
+	 *
+	 * Though __GFP_NOMEMALLOC is not checked in the code path below,
+	 * specify it here to highlight that try_alloc_pages()
+	 * doesn't want to deplete reserves.
+	 */
+	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC;
+	unsigned int alloc_flags = ALLOC_TRYLOCK;
+	struct alloc_context ac = { };
+	struct page *page;
+
+	/*
+	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
+	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
+	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
+	 * mark the task as the owner of another rt_spin_lock which will
+	 * confuse PI logic, so return immediately if called from hard IRQ or
+	 * NMI.
+	 *
+	 * Note, irqs_disabled() case is ok. This function can be called
+	 * from raw_spin_lock_irqsave region.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
+		return NULL;
+	if (!pcp_allowed_order(order))
+		return NULL;
+
+#ifdef CONFIG_UNACCEPTED_MEMORY
+	/* Bailout, since try_to_accept_memory_one() needs to take a lock */
+	if (has_unaccepted_memory())
+		return NULL;
+#endif
+	/* Bailout, since _deferred_grow_zone() needs to take a lock */
+	if (deferred_pages_enabled())
+		return NULL;
+
+	if (nid == NUMA_NO_NODE)
+		nid = numa_node_id();
+
+	prepare_alloc_pages(alloc_gfp, order, nid, NULL, &ac,
+			    &alloc_gfp, &alloc_flags);
+
+	/*
+	 * Best effort allocation from percpu free list.
+	 * If it's empty attempt to spin_trylock zone->lock.
+	 */
+	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
+
+	/* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */
+
+	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
+	kmsan_alloc_page(page, order, alloc_gfp);
+	return page;
+}
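A minimal caller-side sketch (editor's illustration, not part of the
patch): try_alloc_pages() is the API added above and page_address() is an
existing kernel API; the helper wrapping them is hypothetical. The only
contract is that NULL must be tolerated:

static void *grab_scratch_page(void)
{
        struct page *page = try_alloc_pages(NUMA_NO_NODE, 0);

        if (!page)
                return NULL;    /* best effort: caller falls back or drops the event */
        return page_address(page);      /* page arrives already zeroed */
}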
From patchwork Thu Feb 13 03:35:52 2025
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13972749
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org, peterz@infradead.org, vbabka@suse.cz,
	bigeasy@linutronix.de, rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org, shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org, tglx@linutronix.de, jannh@google.com, tj@kernel.org, linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v8 2/6] mm, bpf: Introduce free_pages_nolock()
Date: Wed, 12 Feb 2025 19:35:52 -0800
Message-Id: <20250213033556.9534-3-alexei.starovoitov@gmail.com>
In-Reply-To: <20250213033556.9534-1-alexei.starovoitov@gmail.com>
References: <20250213033556.9534-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Introduce free_pages_nolock() that can free pages without taking locks.
It relies on trylock only and can be called from any context.
Since spin_trylock() cannot be used from hard IRQ or NMI in PREEMPT_RT,
it uses a lockless linked list to stash the pages, which will be freed
by a subsequent free_pages() from a good context.

Do not use llist unconditionally. BPF maps continuously allocate and
free, so we cannot unconditionally delay the freeing to llist. When the
memory becomes free make it available to the kernel and BPF users right
away if possible, and fall back to llist as the last resort.
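The deferral scheme reads, in sketch form (editor's illustration only;
page->pcp_llist and zone->trylock_free_pages are the fields this patch
adds, while the helper itself is hypothetical):

static void free_or_defer(struct zone *zone, struct page *page,
                          unsigned int order)
{
        unsigned long flags;

        if (!spin_trylock_irqsave(&zone->lock, flags)) {
                /* Lock contended or reentered: stash for a later freer. */
                page->order = order;
                llist_add(&page->pcp_llist, &zone->trylock_free_pages);
                return;
        }
        /* Got the lock: do the normal buddy free, then drain the llist. */
        spin_unlock_irqrestore(&zone->lock, flags);
}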
Acked-by: Vlastimil Babka
Acked-by: Sebastian Andrzej Siewior
Reviewed-by: Shakeel Butt
Signed-off-by: Alexei Starovoitov
---
 include/linux/gfp.h      |  1 +
 include/linux/mm_types.h |  4 ++
 include/linux/mmzone.h   |  3 ++
 lib/stackdepot.c         |  5 ++-
 mm/page_alloc.c          | 90 +++++++++++++++++++++++++++++++++++-----
 mm/page_owner.c          |  8 +++-
 6 files changed, 98 insertions(+), 13 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 5d9ee78c74e4..ceb226c2e25c 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -379,6 +379,7 @@ __meminit void *alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mas
 	__get_free_pages((gfp_mask) | GFP_DMA, (order))
 
 extern void __free_pages(struct page *page, unsigned int order);
+extern void free_pages_nolock(struct page *page, unsigned int order);
 extern void free_pages(unsigned long addr, unsigned int order);
 
 #define __free_page(page) __free_pages((page), 0)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6b27db7f9496..483aa90242cd 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -99,6 +99,10 @@ struct page {
 				/* Or, free page */
 				struct list_head buddy_list;
 				struct list_head pcp_list;
+				struct {
+					struct llist_node pcp_llist;
+					unsigned int order;
+				};
 			};
 			/* See page-flags.h for PAGE_MAPPING_FLAGS */
 			struct address_space *mapping;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9540b41894da..e16939553930 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -972,6 +972,9 @@ struct zone {
 	/* Primarily protects free_area */
 	spinlock_t		lock;
 
+	/* Pages to be freed when next trylock succeeds */
+	struct llist_head	trylock_free_pages;
+
 	/* Write-intensive fields used by compaction and vmstats. */
 	CACHELINE_PADDING(_pad2_);
 
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 377194969e61..73d7b50924ef 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -672,7 +672,10 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
 exit:
 	if (prealloc) {
 		/* Stack depot didn't use this memory, free it. */
-		free_pages((unsigned long)prealloc, DEPOT_POOL_ORDER);
+		if (!allow_spin)
+			free_pages_nolock(virt_to_page(prealloc), DEPOT_POOL_ORDER);
+		else
+			free_pages((unsigned long)prealloc, DEPOT_POOL_ORDER);
 	}
 	if (found)
 		handle = found->handle.handle;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0404c4c0dfc7..3fbcbeb7de8e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -88,6 +88,9 @@ typedef int __bitwise fpi_t;
  */
 #define FPI_TO_TAIL		((__force fpi_t)BIT(1))
 
+/* Free the page without taking locks. Rely on trylock only. */
+#define FPI_TRYLOCK		((__force fpi_t)BIT(2))
+
 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
 static DEFINE_MUTEX(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_HIGH_FRACTION (8)
@@ -1249,13 +1252,44 @@ static void split_large_buddy(struct zone *zone, struct page *page,
 	} while (1);
 }
 
+static void add_page_to_zone_llist(struct zone *zone, struct page *page,
+				   unsigned int order)
+{
+	/* Remember the order */
+	page->order = order;
+	/* Add the page to the free list */
+	llist_add(&page->pcp_llist, &zone->trylock_free_pages);
+}
+
 static void free_one_page(struct zone *zone, struct page *page,
 			  unsigned long pfn, unsigned int order,
 			  fpi_t fpi_flags)
 {
+	struct llist_head *llhead;
 	unsigned long flags;
 
-	spin_lock_irqsave(&zone->lock, flags);
+	if (!spin_trylock_irqsave(&zone->lock, flags)) {
+		if (unlikely(fpi_flags & FPI_TRYLOCK)) {
+			add_page_to_zone_llist(zone, page, order);
+			return;
+		}
+		spin_lock_irqsave(&zone->lock, flags);
+	}
+
+	/* The lock succeeded. Process deferred pages. */
+	llhead = &zone->trylock_free_pages;
+	if (unlikely(!llist_empty(llhead) && !(fpi_flags & FPI_TRYLOCK))) {
+		struct llist_node *llnode;
+		struct page *p, *tmp;
+
+		llnode = llist_del_all(llhead);
+		llist_for_each_entry_safe(p, tmp, llnode, pcp_llist) {
+			unsigned int p_order = p->order;
+
+			split_large_buddy(zone, p, page_to_pfn(p), p_order, fpi_flags);
+			__count_vm_events(PGFREE, 1 << p_order);
+		}
+	}
 	split_large_buddy(zone, page, pfn, order, fpi_flags);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
@@ -2599,7 +2633,7 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
 
 static void free_frozen_page_commit(struct zone *zone,
 		struct per_cpu_pages *pcp, struct page *page, int migratetype,
-		unsigned int order)
+		unsigned int order, fpi_t fpi_flags)
 {
 	int high, batch;
 	int pindex;
@@ -2634,6 +2668,14 @@ static void free_frozen_page_commit(struct zone *zone,
 	}
 	if (pcp->free_count < (batch << CONFIG_PCP_BATCH_SCALE_MAX))
 		pcp->free_count += (1 << order);
+
+	if (unlikely(fpi_flags & FPI_TRYLOCK)) {
+		/*
+		 * Do not attempt to take a zone lock. Let pcp->count get
+		 * over high mark temporarily.
+		 */
+		return;
+	}
 	high = nr_pcp_high(pcp, zone, batch, free_high);
 	if (pcp->count >= high) {
 		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
@@ -2648,7 +2690,8 @@ static void free_frozen_page_commit(struct zone *zone,
 /*
  * Free a pcp page
  */
-void free_frozen_pages(struct page *page, unsigned int order)
+static void __free_frozen_pages(struct page *page, unsigned int order,
+				fpi_t fpi_flags)
 {
 	unsigned long __maybe_unused UP_flags;
 	struct per_cpu_pages *pcp;
@@ -2657,7 +2700,7 @@ void free_frozen_pages(struct page *page, unsigned int order)
 	int migratetype;
 
 	if (!pcp_allowed_order(order)) {
-		__free_pages_ok(page, order, FPI_NONE);
+		__free_pages_ok(page, order, fpi_flags);
 		return;
 	}
 
@@ -2675,23 +2718,33 @@ void free_frozen_pages(struct page *page, unsigned int order)
 	migratetype = get_pfnblock_migratetype(page, pfn);
 	if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
 		if (unlikely(is_migrate_isolate(migratetype))) {
-			free_one_page(zone, page, pfn, order, FPI_NONE);
+			free_one_page(zone, page, pfn, order, fpi_flags);
 			return;
 		}
 		migratetype = MIGRATE_MOVABLE;
 	}
+	if (unlikely((fpi_flags & FPI_TRYLOCK) && IS_ENABLED(CONFIG_PREEMPT_RT)
+	    && (in_nmi() || in_hardirq()))) {
+		add_page_to_zone_llist(zone, page, order);
+		return;
+	}
 	pcp_trylock_prepare(UP_flags);
 	pcp = pcp_spin_trylock(zone->per_cpu_pageset);
 	if (pcp) {
-		free_frozen_page_commit(zone, pcp, page, migratetype, order);
+		free_frozen_page_commit(zone, pcp, page, migratetype, order, fpi_flags);
 		pcp_spin_unlock(pcp);
 	} else {
-		free_one_page(zone, page, pfn, order, FPI_NONE);
+		free_one_page(zone, page, pfn, order, fpi_flags);
 	}
 	pcp_trylock_finish(UP_flags);
 }
 
+void free_frozen_pages(struct page *page, unsigned int order)
+{
+	__free_frozen_pages(page, order, FPI_NONE);
+}
+
 /*
  * Free a batch of folios
  */
@@ -2780,7 +2833,7 @@ void free_unref_folios(struct folio_batch *folios)
 
 		trace_mm_page_free_batched(&folio->page);
 		free_frozen_page_commit(zone, pcp, &folio->page, migratetype,
-				order);
+				order, FPI_NONE);
 	}
 
 	if (pcp) {
@@ -4841,22 +4894,37 @@ EXPORT_SYMBOL(get_zeroed_page_noprof);
  * Context: May be called in interrupt context or while holding a normal
  * spinlock, but not in NMI context or while holding a raw spinlock.
  */
-void __free_pages(struct page *page, unsigned int order)
+static void ___free_pages(struct page *page, unsigned int order,
+			  fpi_t fpi_flags)
 {
 	/* get PageHead before we drop reference */
 	int head = PageHead(page);
 	struct alloc_tag *tag = pgalloc_tag_get(page);
 
 	if (put_page_testzero(page))
-		free_frozen_pages(page, order);
+		__free_frozen_pages(page, order, fpi_flags);
 	else if (!head) {
 		pgalloc_tag_sub_pages(tag, (1 << order) - 1);
 		while (order-- > 0)
-			free_frozen_pages(page + (1 << order), order);
+			__free_frozen_pages(page + (1 << order), order,
+					    fpi_flags);
 	}
 }
+
+void __free_pages(struct page *page, unsigned int order)
+{
+	___free_pages(page, order, FPI_NONE);
+}
 EXPORT_SYMBOL(__free_pages);
 
+/*
+ * Can be called while holding raw_spin_lock or from IRQ and NMI for any
+ * page type (not only those that came from try_alloc_pages)
+ */
+void free_pages_nolock(struct page *page, unsigned int order)
+{
+	___free_pages(page, order, FPI_TRYLOCK);
+}
+
 void free_pages(unsigned long addr, unsigned int order)
 {
 	if (addr != 0) {
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 2d6360eaccbb..90e31d0e3ed7 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -294,7 +294,13 @@ void __reset_page_owner(struct page *page, unsigned short order)
 
 	page_owner = get_page_owner(page_ext);
 	alloc_handle = page_owner->handle;
-	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
+	/*
+	 * Do not specify GFP_NOWAIT to make gfpflags_allow_spinning() == false
+	 * to prevent issues in stack_depot_save().
+	 * This is similar to try_alloc_pages() gfp flags, but only used
+	 * to signal stack_depot to avoid spin_locks.
+	 */
+	handle = save_stack(__GFP_NOWARN);
 	__update_page_owner_free_handle(page_ext, handle, order, current->pid,
 					current->tgid, free_ts_nsec);
 	page_ext_put(page_ext);
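An illustrative round trip (editor's sketch, not part of the patch): a
page obtained with try_alloc_pages() from patch 1 can be returned from
the same unknown context with the free_pages_nolock() added here:

static void any_context_roundtrip(void)
{
        struct page *page = try_alloc_pages(NUMA_NO_NODE, 0);

        if (page)
                free_pages_nolock(page, 0);     /* never spins; may defer to llist */
}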
From patchwork Thu Feb 13 03:35:53 2025
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13972750
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org, peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de, rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org, shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org, tglx@linutronix.de, jannh@google.com, tj@kernel.org, linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v8 3/6] locking/local_lock: Introduce localtry_lock_t
Date: Wed, 12 Feb 2025 19:35:53 -0800
Message-Id: <20250213033556.9534-4-alexei.starovoitov@gmail.com>
In-Reply-To: <20250213033556.9534-1-alexei.starovoitov@gmail.com>
References: <20250213033556.9534-1-alexei.starovoitov@gmail.com>

From: Sebastian Andrzej Siewior

In !PREEMPT_RT local_lock_irqsave() disables interrupts to protect the
critical section, but it doesn't prevent NMI, so fully reentrant code
cannot use local_lock_irqsave() for exclusive access.

Introduce localtry_lock_t and localtry_lock_irqsave(), which disable
interrupts and set acquired=1, so that localtry_trylock_irqsave() from
an NMI attempting to acquire the same lock will return false.

In PREEMPT_RT local_lock_irqsave() maps to a preemptible spin_lock().
Map localtry_trylock_irqsave() to a preemptible spin_trylock().
When in hard IRQ or NMI return false right away, since spin_trylock()
is not safe due to PI issues.

Note there is no need to use local_inc for the acquired variable, since
it's a percpu variable with strict nesting scopes.
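In sketch form, the usage pattern this enables (editor's illustration
only; the percpu structure and helper are hypothetical, the localtry_*
API is the one introduced below, mirroring how the next patch converts
memcg_stock):

struct my_pcp_cache {
        localtry_lock_t lock;
        int value;
};
static DEFINE_PER_CPU(struct my_pcp_cache, my_cache) = {
        .lock = INIT_LOCALTRY_LOCK(lock),
};

static bool update_cache_any_context(int v)
{
        struct my_pcp_cache *cache;
        unsigned long flags;

        if (!localtry_trylock_irqsave(&my_cache.lock, flags))
                return false;   /* an NMI hit the middle of an update: bail out */
        cache = this_cpu_ptr(&my_cache);
        cache->value = v;
        localtry_unlock_irqrestore(&my_cache.lock, flags);
        return true;
}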
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Alexei Starovoitov
Signed-off-by: Vlastimil Babka
Acked-by: Vlastimil Babka
---
 include/linux/local_lock.h          |  59 +++++++++++++
 include/linux/local_lock_internal.h | 123 ++++++++++++++++++++++++++++
 2 files changed, 182 insertions(+)

diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
index 091dc0b6bdfb..05c254a5d7d3 100644
--- a/include/linux/local_lock.h
+++ b/include/linux/local_lock.h
@@ -51,6 +51,65 @@
 #define local_unlock_irqrestore(lock, flags)			\
 	__local_unlock_irqrestore(lock, flags)
 
+/**
+ * localtry_lock_init - Runtime initialize a lock instance
+ */
+#define localtry_lock_init(lock)	__localtry_lock_init(lock)
+
+/**
+ * localtry_lock - Acquire a per CPU local lock
+ * @lock:	The lock variable
+ */
+#define localtry_lock(lock)		__localtry_lock(lock)
+
+/**
+ * localtry_lock_irq - Acquire a per CPU local lock and disable interrupts
+ * @lock:	The lock variable
+ */
+#define localtry_lock_irq(lock)		__localtry_lock_irq(lock)
+
+/**
+ * localtry_lock_irqsave - Acquire a per CPU local lock, save and disable
+ *			   interrupts
+ * @lock:	The lock variable
+ * @flags:	Storage for interrupt flags
+ */
+#define localtry_lock_irqsave(lock, flags)			\
+	__localtry_lock_irqsave(lock, flags)
+
+/**
+ * localtry_trylock_irqsave - Try to acquire a per CPU local lock, save and
+ *			      disable interrupts if acquired
+ * @lock:	The lock variable
+ * @flags:	Storage for interrupt flags
+ *
+ * The function can be used in any context such as NMI or HARDIRQ. Due to
+ * locking constraints it will _always_ fail to acquire the lock in NMI or
+ * HARDIRQ context on PREEMPT_RT.
+ */
+#define localtry_trylock_irqsave(lock, flags)			\
+	__localtry_trylock_irqsave(lock, flags)
+
+/**
+ * localtry_unlock - Release a per CPU local lock
+ * @lock:	The lock variable
+ */
+#define localtry_unlock(lock)		__localtry_unlock(lock)
+
+/**
+ * localtry_unlock_irq - Release a per CPU local lock and enable interrupts
+ * @lock:	The lock variable
+ */
+#define localtry_unlock_irq(lock)	__localtry_unlock_irq(lock)
+
+/**
+ * localtry_unlock_irqrestore - Release a per CPU local lock and restore
+ *				interrupt flags
+ * @lock:	The lock variable
+ * @flags:	Interrupt flags to restore
+ */
+#define localtry_unlock_irqrestore(lock, flags)			\
+	__localtry_unlock_irqrestore(lock, flags)
+
 DEFINE_GUARD(local_lock, local_lock_t __percpu*,
 	     local_lock(_T),
 	     local_unlock(_T))
diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h
index 8dd71fbbb6d2..c1369b300777 100644
--- a/include/linux/local_lock_internal.h
+++ b/include/linux/local_lock_internal.h
@@ -15,6 +15,11 @@ typedef struct {
 #endif
 } local_lock_t;
 
+typedef struct {
+	local_lock_t	llock;
+	unsigned int	acquired;
+} localtry_lock_t;
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 # define LOCAL_LOCK_DEBUG_INIT(lockname)		\
 	.dep_map = {					\
@@ -31,6 +36,13 @@ static inline void local_lock_acquire(local_lock_t *l)
 	l->owner = current;
 }
 
+static inline void local_trylock_acquire(local_lock_t *l)
+{
+	lock_map_acquire_try(&l->dep_map);
+	DEBUG_LOCKS_WARN_ON(l->owner);
+	l->owner = current;
+}
+
 static inline void local_lock_release(local_lock_t *l)
 {
 	DEBUG_LOCKS_WARN_ON(l->owner != current);
@@ -45,11 +57,13 @@ static inline void local_lock_debug_init(local_lock_t *l)
 #else /* CONFIG_DEBUG_LOCK_ALLOC */
 # define LOCAL_LOCK_DEBUG_INIT(lockname)
 static inline void local_lock_acquire(local_lock_t *l) { }
+static inline void local_trylock_acquire(local_lock_t *l) { }
 static inline void local_lock_release(local_lock_t *l) { }
 static inline void local_lock_debug_init(local_lock_t *l) { }
 #endif /* !CONFIG_DEBUG_LOCK_ALLOC */
 
 #define INIT_LOCAL_LOCK(lockname)	{ LOCAL_LOCK_DEBUG_INIT(lockname) }
+#define INIT_LOCALTRY_LOCK(lockname)	{ .llock = { LOCAL_LOCK_DEBUG_INIT(lockname.llock) }}
 
 #define __local_lock_init(lock)				\
 do {							\
@@ -118,6 +132,86 @@ do {							\
 #define __local_unlock_nested_bh(lock)			\
 	local_lock_release(this_cpu_ptr(lock))
 
+/* localtry_lock_t variants */
+
+#define __localtry_lock_init(lock)			\
+do {							\
+	__local_lock_init(&(lock)->llock);		\
+	WRITE_ONCE((lock)->acquired, 0);		\
+} while (0)
+
+#define __localtry_lock(lock)				\
+	do {						\
+		localtry_lock_t *lt;			\
+		preempt_disable();			\
+		lt = this_cpu_ptr(lock);		\
+		local_lock_acquire(&lt->llock);		\
+		WRITE_ONCE(lt->acquired, 1);		\
+	} while (0)
+
+#define __localtry_lock_irq(lock)			\
+	do {						\
+		localtry_lock_t *lt;			\
+		local_irq_disable();			\
+		lt = this_cpu_ptr(lock);		\
+		local_lock_acquire(&lt->llock);		\
+		WRITE_ONCE(lt->acquired, 1);		\
+	} while (0)
+
+#define __localtry_lock_irqsave(lock, flags)		\
+	do {						\
+		localtry_lock_t *lt;			\
+		local_irq_save(flags);			\
+		lt = this_cpu_ptr(lock);		\
+		local_lock_acquire(&lt->llock);		\
+		WRITE_ONCE(lt->acquired, 1);		\
+	} while (0)
+
+#define __localtry_trylock_irqsave(lock, flags)		\
+	({						\
+		localtry_lock_t *lt;			\
+		bool _ret;				\
+							\
+		local_irq_save(flags);			\
+		lt = this_cpu_ptr(lock);		\
+		if (!READ_ONCE(lt->acquired)) {		\
+			WRITE_ONCE(lt->acquired, 1);	\
+			local_trylock_acquire(&lt->llock); \
+			_ret = true;			\
+		} else {				\
+			_ret = false;			\
+			local_irq_restore(flags);	\
+		}					\
+		_ret;					\
+	})
+
+#define __localtry_unlock(lock)				\
+	do {						\
+		localtry_lock_t *lt;			\
+		lt = this_cpu_ptr(lock);		\
+		WRITE_ONCE(lt->acquired, 0);		\
+		local_lock_release(&lt->llock);		\
+		preempt_enable();			\
+	} while (0)
+
+#define __localtry_unlock_irq(lock)			\
+	do {						\
+		localtry_lock_t *lt;			\
+		lt = this_cpu_ptr(lock);		\
+		WRITE_ONCE(lt->acquired, 0);		\
+		local_lock_release(&lt->llock);		\
+		local_irq_enable();			\
+	} while (0)
+
+#define __localtry_unlock_irqrestore(lock, flags)	\
+	do {						\
+		localtry_lock_t *lt;			\
+		lt = this_cpu_ptr(lock);		\
+		WRITE_ONCE(lt->acquired, 0);		\
+		local_lock_release(&lt->llock);		\
+		local_irq_restore(flags);		\
+	} while (0)
+
 #else /* !CONFIG_PREEMPT_RT */
 
 /*
@@ -125,8 +219,10 @@ do {							\
  * critical section while staying preemptible.
  */
 typedef spinlock_t local_lock_t;
+typedef spinlock_t localtry_lock_t;
 
 #define INIT_LOCAL_LOCK(lockname) __LOCAL_SPIN_LOCK_UNLOCKED((lockname))
+#define INIT_LOCALTRY_LOCK(lockname) INIT_LOCAL_LOCK(lockname)
 
 #define __local_lock_init(l)				\
 do {							\
@@ -169,4 +265,31 @@ do {							\
 	spin_unlock(this_cpu_ptr((lock)));		\
 } while (0)
 
+/* localtry_lock_t variants */
+
+#define __localtry_lock_init(lock)			__local_lock_init(lock)
+#define __localtry_lock(lock)				__local_lock(lock)
+#define __localtry_lock_irq(lock)			__local_lock(lock)
+#define __localtry_lock_irqsave(lock, flags)		__local_lock_irqsave(lock, flags)
+#define __localtry_unlock(lock)				__local_unlock(lock)
+#define __localtry_unlock_irq(lock)			__local_unlock(lock)
+#define __localtry_unlock_irqrestore(lock, flags)	__local_unlock_irqrestore(lock, flags)
+
+#define __localtry_trylock_irqsave(lock, flags)		\
+	({						\
+		int __locked;				\
+							\
+		typecheck(unsigned long, flags);	\
+		flags = 0;				\
+		if (in_nmi() | in_hardirq()) {		\
+			__locked = 0;			\
+		} else {				\
+			migrate_disable();		\
+			__locked = spin_trylock(this_cpu_ptr((lock))); \
+			if (!__locked)			\
+				migrate_enable();	\
+		}					\
+		__locked;				\
+	})
+
 #endif /* CONFIG_PREEMPT_RT */
From patchwork Thu Feb 13 03:35:54 2025
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13972751
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org, peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de, rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org, shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org, tglx@linutronix.de,
    jannh@google.com, tj@kernel.org, linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v8 4/6] memcg: Use trylock to access memcg stock_lock.
Date: Wed, 12 Feb 2025 19:35:54 -0800
Message-Id: <20250213033556.9534-5-alexei.starovoitov@gmail.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)
In-Reply-To: <20250213033556.9534-1-alexei.starovoitov@gmail.com>
References: <20250213033556.9534-1-alexei.starovoitov@gmail.com>
MIME-Version: 1.0

From: Alexei Starovoitov

Teach memcg to operate under trylock conditions when spinning locks
cannot be used.

localtry_trylock might fail, which would bypass the charge cache when
the calling context doesn't allow spinning (gfpflags_allow_spinning).
In those cases charge the memcg counter directly and fail early if that
is not possible. This might cause a premature charge failure, but it
allows opportunistic charging that is safe to call from the
try_alloc_pages() path.
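For reference, gfpflags_allow_spinning() comes from patch 1/6 of this series (not shown here); the authoritative version lives in include/linux/gfp.h. Condensed, it keys off the reclaim bits, which try_alloc_pages() alone leaves clear:

	static inline bool gfpflags_allow_spinning(const gfp_t gfp_flags)
	{
		/*
		 * Every ordinary GFP_* mask, including GFP_NOWAIT and
		 * GFP_ATOMIC, sets __GFP_KSWAPD_RECLAIM and/or
		 * __GFP_DIRECT_RECLAIM. try_alloc_pages() is the one
		 * caller that sets neither, signalling that the
		 * allocator may only ever trylock.
		 */
		return !!(gfp_flags & __GFP_RECLAIM);
	}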
Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
Acked-by: Shakeel Butt
Signed-off-by: Alexei Starovoitov
---
 mm/memcontrol.c | 52 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 46f8b372d212..7587511b92cc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1739,7 +1739,7 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg)
 }

 struct memcg_stock_pcp {
-	local_lock_t stock_lock;
+	localtry_lock_t stock_lock;
 	struct mem_cgroup *cached; /* this never be root cgroup */
 	unsigned int nr_pages;

@@ -1754,7 +1754,7 @@ struct memcg_stock_pcp {
 #define FLUSHING_CACHED_CHARGE	0
 };
 static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock) = {
-	.stock_lock = INIT_LOCAL_LOCK(stock_lock),
+	.stock_lock = INIT_LOCALTRY_LOCK(stock_lock),
 };
 static DEFINE_MUTEX(percpu_charge_mutex);

@@ -1773,7 +1773,8 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
  *
  * returns true if successful, false otherwise.
  */
-static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
+			  gfp_t gfp_mask)
 {
 	struct memcg_stock_pcp *stock;
 	unsigned int stock_pages;
@@ -1783,7 +1784,11 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 	if (nr_pages > MEMCG_CHARGE_BATCH)
 		return ret;

-	local_lock_irqsave(&memcg_stock.stock_lock, flags);
+	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
+		if (!gfpflags_allow_spinning(gfp_mask))
+			return ret;
+		localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
+	}

 	stock = this_cpu_ptr(&memcg_stock);
 	stock_pages = READ_ONCE(stock->nr_pages);
@@ -1792,7 +1797,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 		ret = true;
 	}

-	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);

 	return ret;
 }
@@ -1831,14 +1836,14 @@ static void drain_local_stock(struct work_struct *dummy)
 	 * drain_stock races is that we always operate on local CPU stock
 	 * here with IRQ disabled
 	 */
-	local_lock_irqsave(&memcg_stock.stock_lock, flags);
+	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);

 	stock = this_cpu_ptr(&memcg_stock);
 	old = drain_obj_stock(stock);
 	drain_stock(stock);
 	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);

-	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 	obj_cgroup_put(old);
 }

@@ -1868,9 +1873,20 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	unsigned long flags;

-	local_lock_irqsave(&memcg_stock.stock_lock, flags);
+	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
+		/*
+		 * In case of unlikely failure to lock percpu stock_lock
+		 * uncharge memcg directly.
+		 */
+		if (mem_cgroup_is_root(memcg))
+			return;
+		page_counter_uncharge(&memcg->memory, nr_pages);
+		if (do_memsw_account())
+			page_counter_uncharge(&memcg->memsw, nr_pages);
+		return;
+	}
 	__refill_stock(memcg, nr_pages);
-	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 }

 /*
@@ -2213,9 +2229,13 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	unsigned long pflags;

 retry:
-	if (consume_stock(memcg, nr_pages))
+	if (consume_stock(memcg, nr_pages, gfp_mask))
 		return 0;

+	if (!gfpflags_allow_spinning(gfp_mask))
+		/* Avoid the refill and flush of the older stock */
+		batch = nr_pages;
+
 	if (!do_memsw_account() ||
 	    page_counter_try_charge(&memcg->memsw, batch, &counter)) {
 		if (page_counter_try_charge(&memcg->memory, batch, &counter))
@@ -2699,7 +2719,7 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 	unsigned long flags;
 	int *bytes;

-	local_lock_irqsave(&memcg_stock.stock_lock, flags);
+	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
 	stock = this_cpu_ptr(&memcg_stock);

 	/*
@@ -2752,7 +2772,7 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 	if (nr)
 		__mod_objcg_mlstate(objcg, pgdat, idx, nr);

-	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 	obj_cgroup_put(old);
 }

@@ -2762,7 +2782,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 	unsigned long flags;
 	bool ret = false;

-	local_lock_irqsave(&memcg_stock.stock_lock, flags);
+	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);

 	stock = this_cpu_ptr(&memcg_stock);
 	if (objcg == READ_ONCE(stock->cached_objcg) && stock->nr_bytes >= nr_bytes) {
@@ -2770,7 +2790,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 		ret = true;
 	}

-	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);

 	return ret;
 }
@@ -2862,7 +2882,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 	unsigned long flags;
 	unsigned int nr_pages = 0;

-	local_lock_irqsave(&memcg_stock.stock_lock, flags);
+	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);

 	stock = this_cpu_ptr(&memcg_stock);
 	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
@@ -2880,7 +2900,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 		stock->nr_bytes &= (PAGE_SIZE - 1);
 	}

-	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 	obj_cgroup_put(old);

 	if (nr_pages)
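Taken together, every stock_lock site follows one discipline: try the lock first; if that fails and the gfp mask forbids spinning, skip the per-CPU cache entirely and operate on the page_counter directly. Distilled to a sketch (a hypothetical wrapper, simplified from the hunks above):

	/* Sketch: the trylock-first discipline this patch applies. */
	static bool stock_op(gfp_t gfp_mask)		/* hypothetical */
	{
		unsigned long flags;

		if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
			if (!gfpflags_allow_spinning(gfp_mask))
				return false;	/* bypass the cache; caller
						 * charges the counter directly */
			/* spinning allowed: waiting for the lock is fine too */
			localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
		}
		/* ... operate on this_cpu_ptr(&memcg_stock) as before ... */
		localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
		return true;
	}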
From patchwork Thu Feb 13 03:35:55 2025
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
    peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
    rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
    shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
    tglx@linutronix.de, jannh@google.com, tj@kernel.org,
    linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v8 5/6] mm, bpf: Use memcg in try_alloc_pages().
Date: Wed, 12 Feb 2025 19:35:55 -0800
Message-Id: <20250213033556.9534-6-alexei.starovoitov@gmail.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)
In-Reply-To: <20250213033556.9534-1-alexei.starovoitov@gmail.com>
References: <20250213033556.9534-1-alexei.starovoitov@gmail.com>
MIME-Version: 1.0
From: Alexei Starovoitov

Unconditionally use __GFP_ACCOUNT in try_alloc_pages(). The caller is
responsible for setting up the memcg correctly; all BPF memory
accounting is memcg based.

Acked-by: Vlastimil Babka
Acked-by: Shakeel Butt
Signed-off-by: Alexei Starovoitov
---
 mm/page_alloc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3fbcbeb7de8e..c8068fd2da42 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7189,7 +7189,8 @@ struct page *try_alloc_pages_noprof(int nid, unsigned int order)
 	 * specify it here to highlight that try_alloc_pages()
 	 * doesn't want to deplete reserves.
 	 */
-	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC;
+	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC
+			  | __GFP_ACCOUNT;
 	unsigned int alloc_flags = ALLOC_TRYLOCK;
 	struct alloc_context ac = { };
 	struct page *page;
@@ -7233,6 +7234,11 @@ struct page *try_alloc_pages_noprof(int nid, unsigned int order)

 	/* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */

+	if (memcg_kmem_online() && page &&
+	    unlikely(__memcg_kmem_charge_page(page, alloc_gfp, order) != 0)) {
+		free_pages_nolock(page, order);
+		page = NULL;
+	}
 	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
 	kmsan_alloc_page(page, order, alloc_gfp);
 	return page;
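With the charge made unconditional, a caller directs it to a specific memcg the usual way, by bracketing the allocation with set_active_memcg(); patch 6/6 below does exactly this inside bpf_map_alloc_pages(). A sketch (assumes CONFIG_MEMCG and a memcg reference the caller already holds; the wrapper name is hypothetical):

	/* Sketch: charging the opportunistic allocation to a chosen memcg. */
	static struct page *alloc_page_for(struct mem_cgroup *memcg)
	{
		struct mem_cgroup *old_memcg;
		struct page *page;

		old_memcg = set_active_memcg(memcg);
		page = try_alloc_pages(NUMA_NO_NODE, 0);	/* may be NULL:
								 * no reclaim,
								 * trylocks only */
		set_active_memcg(old_memcg);
		return page;
	}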
From patchwork Thu Feb 13 03:35:56 2025
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
    peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
    rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
    shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
    tglx@linutronix.de, jannh@google.com, tj@kernel.org,
    linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v8 6/6] bpf: Use try_alloc_pages() to allocate pages for bpf needs.
Date: Wed, 12 Feb 2025 19:35:56 -0800
Message-Id: <20250213033556.9534-7-alexei.starovoitov@gmail.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)
In-Reply-To: <20250213033556.9534-1-alexei.starovoitov@gmail.com>
References: <20250213033556.9534-1-alexei.starovoitov@gmail.com>
MIME-Version: 1.0

From: Alexei Starovoitov

Use try_alloc_pages() and free_pages_nolock() for BPF needs when the
context doesn't allow using normal alloc_pages(). This is a
prerequisite for further work.
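The diff below also encodes a pairing rule: a page that may have come from try_alloc_pages() must be released with free_pages_nolock(), which is safe in the same restricted contexts (both helpers come from earlier patches in this series). A sketch, with a hypothetical function name:

	/* Sketch: opportunistic allocate/free pairing, any-context safe. */
	static void scratch_roundtrip(void)
	{
		struct page *page = try_alloc_pages(NUMA_NO_NODE, 0);

		if (!page)
			return;			/* callers must tolerate failure */
		/* ... use the zeroed page ... */
		free_pages_nolock(page, 0);	/* usable where __free_page() is not */
	}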
Signed-off-by: Alexei Starovoitov
---
 include/linux/bpf.h  |  2 +-
 kernel/bpf/arena.c   |  5 ++---
 kernel/bpf/syscall.c | 23 ++++++++++++++++++++---
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f3f50e29d639..e1838a341817 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2348,7 +2348,7 @@ int generic_map_delete_batch(struct bpf_map *map,
 struct bpf_map *bpf_map_get_curr_or_next(u32 *id);
 struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id);

-int bpf_map_alloc_pages(const struct bpf_map *map, gfp_t gfp, int nid,
+int bpf_map_alloc_pages(const struct bpf_map *map, int nid,
 			unsigned long nr_pages, struct page **page_array);
 #ifdef CONFIG_MEMCG
 void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
index 0975d7f22544..8ecc62e6b1a2 100644
--- a/kernel/bpf/arena.c
+++ b/kernel/bpf/arena.c
@@ -287,7 +287,7 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
 		return VM_FAULT_SIGSEGV;

 	/* Account into memcg of the process that created bpf_arena */
-	ret = bpf_map_alloc_pages(map, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE, 1, &page);
+	ret = bpf_map_alloc_pages(map, NUMA_NO_NODE, 1, &page);
 	if (ret) {
 		range_tree_set(&arena->rt, vmf->pgoff, 1);
 		return VM_FAULT_SIGSEGV;
@@ -465,8 +465,7 @@ static long arena_alloc_pages(struct bpf_arena *arena, long uaddr, long page_cnt
 	if (ret)
 		goto out_free_pages;

-	ret = bpf_map_alloc_pages(&arena->map, GFP_KERNEL | __GFP_ZERO,
-				  node_id, page_cnt, pages);
+	ret = bpf_map_alloc_pages(&arena->map, node_id, page_cnt, pages);
 	if (ret)
 		goto out;

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c420edbfb7c8..a7af8d0185d0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -569,7 +569,24 @@ static void bpf_map_release_memcg(struct bpf_map *map)
 }
 #endif

-int bpf_map_alloc_pages(const struct bpf_map *map, gfp_t gfp, int nid,
+static bool can_alloc_pages(void)
+{
+	return preempt_count() == 0 && !irqs_disabled() &&
+		!IS_ENABLED(CONFIG_PREEMPT_RT);
+}
+
+static struct page *__bpf_alloc_page(int nid)
+{
+	if (!can_alloc_pages())
+		return try_alloc_pages(nid, 0);
+
+	return alloc_pages_node(nid,
+				GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT
+				| __GFP_NOWARN,
+				0);
+}
+
+int bpf_map_alloc_pages(const struct bpf_map *map, int nid,
 			unsigned long nr_pages, struct page **pages)
 {
 	unsigned long i, j;
@@ -582,14 +599,14 @@ int bpf_map_alloc_pages(const struct bpf_map *map, gfp_t gfp, int nid,
 	old_memcg = set_active_memcg(memcg);
 #endif
 	for (i = 0; i < nr_pages; i++) {
-		pg = alloc_pages_node(nid, gfp | __GFP_ACCOUNT, 0);
+		pg = __bpf_alloc_page(nid);

 		if (pg) {
 			pages[i] = pg;
 			continue;
 		}
 		for (j = 0; j < i; j++)
-			__free_page(pages[j]);
+			free_pages_nolock(pages[j], 0);
 		ret = -ENOMEM;
 		break;
 	}