From patchwork Sat Feb 22 02:44:22 2025
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13986513
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org,
 linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v9 1/6] locking/local_lock: Introduce localtry_lock_t
Date: Fri, 21 Feb 2025 18:44:22 -0800
Message-Id: <20250222024427.30294-2-alexei.starovoitov@gmail.com>
In-Reply-To: <20250222024427.30294-1-alexei.starovoitov@gmail.com>
References: <20250222024427.30294-1-alexei.starovoitov@gmail.com>
From: Sebastian Andrzej Siewior

In !PREEMPT_RT local_lock_irqsave() disables interrupts to protect the
critical section, but it doesn't prevent NMI, so fully reentrant code
cannot use local_lock_irqsave() for exclusive access.

Introduce localtry_lock_t and localtry_lock_irqsave(), which disables
interrupts and sets acquired=1, so localtry_trylock_irqsave() from NMI
attempting to acquire the same lock will return false.

In PREEMPT_RT local_lock_irqsave() maps to a preemptible spin_lock().
Map localtry_lock_irqsave() to a preemptible spin_trylock(). When in
hard IRQ or NMI return false right away, since spin_trylock() is not
safe there due to explicit locking in the underlying rt_spin_trylock()
implementation. Removing this explicit locking and attempting only
"trylock" is undesired due to PI implications.

Note there is no need to use local_inc for the acquired variable, since
it's a percpu variable with strict nesting scopes.
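A minimal usage sketch (hypothetical percpu structure; none of the foo_*
names below are part of this patch). Task context may take the lock
unconditionally, while NMI context must use the trylock variant and
handle failure:

  struct foo_pcpu {
  	localtry_lock_t lock;
  	unsigned int cnt;
  };
  static DEFINE_PER_CPU(struct foo_pcpu, foo_pcpu) = {
  	.lock = INIT_LOCALTRY_LOCK(lock),
  };

  static void foo_task_context(void)
  {
  	unsigned long flags;

  	/* Always succeeds; an NMI hitting this section trylock-fails. */
  	localtry_lock_irqsave(&foo_pcpu.lock, flags);
  	__this_cpu_inc(foo_pcpu.cnt);
  	localtry_unlock_irqrestore(&foo_pcpu.lock, flags);
  }

  static bool foo_nmi_context(void)
  {
  	unsigned long flags;

  	/*
  	 * Fails if this CPU already holds the lock, and always fails
  	 * in NMI/hardirq on PREEMPT_RT.
  	 */
  	if (!localtry_trylock_irqsave(&foo_pcpu.lock, flags))
  		return false;
  	__this_cpu_inc(foo_pcpu.cnt);
  	localtry_unlock_irqrestore(&foo_pcpu.lock, flags);
  	return true;
  }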
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Vlastimil Babka
Signed-off-by: Alexei Starovoitov
---
 include/linux/local_lock.h          |  70 +++++++++++++
 include/linux/local_lock_internal.h | 146 ++++++++++++++++++++++++++++
 2 files changed, 216 insertions(+)

diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
index 091dc0b6bdfb..1a0bc35839e3 100644
--- a/include/linux/local_lock.h
+++ b/include/linux/local_lock.h
@@ -51,6 +51,76 @@
 #define local_unlock_irqrestore(lock, flags)			\
 	__local_unlock_irqrestore(lock, flags)
 
+/**
+ * localtry_lock_init - Runtime initialize a lock instance
+ */
+#define localtry_lock_init(lock)	__localtry_lock_init(lock)
+
+/**
+ * localtry_lock - Acquire a per CPU local lock
+ * @lock:	The lock variable
+ */
+#define localtry_lock(lock)		__localtry_lock(lock)
+
+/**
+ * localtry_lock_irq - Acquire a per CPU local lock and disable interrupts
+ * @lock:	The lock variable
+ */
+#define localtry_lock_irq(lock)		__localtry_lock_irq(lock)
+
+/**
+ * localtry_lock_irqsave - Acquire a per CPU local lock, save and disable
+ *			   interrupts
+ * @lock:	The lock variable
+ * @flags:	Storage for interrupt flags
+ */
+#define localtry_lock_irqsave(lock, flags)			\
+	__localtry_lock_irqsave(lock, flags)
+
+/**
+ * localtry_trylock - Try to acquire a per CPU local lock.
+ * @lock:	The lock variable
+ *
+ * The function can be used in any context such as NMI or HARDIRQ. Due to
+ * locking constraints it will _always_ fail to acquire the lock in NMI or
+ * HARDIRQ context on PREEMPT_RT.
+ */
+#define localtry_trylock(lock)		__localtry_trylock(lock)
+
+/**
+ * localtry_trylock_irqsave - Try to acquire a per CPU local lock, save and
+ *			      disable interrupts if acquired
+ * @lock:	The lock variable
+ * @flags:	Storage for interrupt flags
+ *
+ * The function can be used in any context such as NMI or HARDIRQ. Due to
+ * locking constraints it will _always_ fail to acquire the lock in NMI or
+ * HARDIRQ context on PREEMPT_RT.
+ */
+#define localtry_trylock_irqsave(lock, flags)			\
+	__localtry_trylock_irqsave(lock, flags)
+
+/**
+ * localtry_unlock - Release a per CPU local lock
+ * @lock:	The lock variable
+ */
+#define localtry_unlock(lock)		__localtry_unlock(lock)
+
+/**
+ * localtry_unlock_irq - Release a per CPU local lock and enable interrupts
+ * @lock:	The lock variable
+ */
+#define localtry_unlock_irq(lock)	__localtry_unlock_irq(lock)
+
+/**
+ * localtry_unlock_irqrestore - Release a per CPU local lock and restore
+ *				interrupt flags
+ * @lock:	The lock variable
+ * @flags:	Interrupt flags to restore
+ */
+#define localtry_unlock_irqrestore(lock, flags)			\
+	__localtry_unlock_irqrestore(lock, flags)
+
 DEFINE_GUARD(local_lock, local_lock_t __percpu*,
 	     local_lock(_T),
 	     local_unlock(_T))
diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h
index 8dd71fbbb6d2..67bd13d142fa 100644
--- a/include/linux/local_lock_internal.h
+++ b/include/linux/local_lock_internal.h
@@ -15,6 +15,11 @@ typedef struct {
 #endif
 } local_lock_t;
 
+typedef struct {
+	local_lock_t	llock;
+	unsigned int	acquired;
+} localtry_lock_t;
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 # define LOCAL_LOCK_DEBUG_INIT(lockname)		\
 	.dep_map = {					\
@@ -31,6 +36,13 @@ static inline void local_lock_acquire(local_lock_t *l)
 	l->owner = current;
 }
 
+static inline void local_trylock_acquire(local_lock_t *l)
+{
+	lock_map_acquire_try(&l->dep_map);
+	DEBUG_LOCKS_WARN_ON(l->owner);
+	l->owner = current;
+}
+
 static inline void local_lock_release(local_lock_t *l)
 {
 	DEBUG_LOCKS_WARN_ON(l->owner != current);
@@ -45,11 +57,13 @@ static inline void local_lock_debug_init(local_lock_t *l)
 #else /* CONFIG_DEBUG_LOCK_ALLOC */
 # define LOCAL_LOCK_DEBUG_INIT(lockname)
 static inline void local_lock_acquire(local_lock_t *l) { }
+static inline void local_trylock_acquire(local_lock_t *l) { }
 static inline void local_lock_release(local_lock_t *l) { }
 static inline void local_lock_debug_init(local_lock_t *l) { }
 #endif /* !CONFIG_DEBUG_LOCK_ALLOC */
 
 #define INIT_LOCAL_LOCK(lockname)	{ LOCAL_LOCK_DEBUG_INIT(lockname) }
+#define INIT_LOCALTRY_LOCK(lockname)	{ .llock = { LOCAL_LOCK_DEBUG_INIT(lockname.llock) }}
 
 #define __local_lock_init(lock)					\
 do {								\
@@ -118,6 +132,104 @@ do {								\
 #define __local_unlock_nested_bh(lock)				\
 	local_lock_release(this_cpu_ptr(lock))
 
+/* localtry_lock_t variants */
+
+#define __localtry_lock_init(lock)				\
+do {								\
+	__local_lock_init(&(lock)->llock);			\
+	WRITE_ONCE((lock)->acquired, 0);			\
+} while (0)
+
+#define __localtry_lock(lock)					\
+	do {							\
+		localtry_lock_t *lt;				\
+		preempt_disable();				\
+		lt = this_cpu_ptr(lock);			\
+		local_lock_acquire(&lt->llock);			\
+		WRITE_ONCE(lt->acquired, 1);			\
+	} while (0)
+
+#define __localtry_lock_irq(lock)				\
+	do {							\
+		localtry_lock_t *lt;				\
+		local_irq_disable();				\
+		lt = this_cpu_ptr(lock);			\
+		local_lock_acquire(&lt->llock);			\
+		WRITE_ONCE(lt->acquired, 1);			\
+	} while (0)
+
+#define __localtry_lock_irqsave(lock, flags)			\
+	do {							\
+		localtry_lock_t *lt;				\
+		local_irq_save(flags);				\
+		lt = this_cpu_ptr(lock);			\
+		local_lock_acquire(&lt->llock);			\
+		WRITE_ONCE(lt->acquired, 1);			\
+	} while (0)
+
+#define __localtry_trylock(lock)				\
+	({							\
+		localtry_lock_t *lt;				\
+		bool _ret;					\
+								\
+		preempt_disable();				\
+		lt = this_cpu_ptr(lock);			\
+		if (!READ_ONCE(lt->acquired)) {			\
+			WRITE_ONCE(lt->acquired, 1);		\
+			local_trylock_acquire(&lt->llock);	\
+			_ret = true;				\
+		} else {					\
+			_ret = false;				\
+			preempt_enable();			\
+		}						\
+		_ret;						\
+	})
+
+#define __localtry_trylock_irqsave(lock, flags)			\
+	({							\
+		localtry_lock_t *lt;				\
+		bool _ret;					\
+								\
+		local_irq_save(flags);				\
+		lt = this_cpu_ptr(lock);			\
+		if (!READ_ONCE(lt->acquired)) {			\
+			WRITE_ONCE(lt->acquired, 1);		\
+			local_trylock_acquire(&lt->llock);	\
+			_ret = true;				\
+		} else {					\
+			_ret = false;				\
+			local_irq_restore(flags);		\
+		}						\
+		_ret;						\
+	})
+
+#define __localtry_unlock(lock)					\
+	do {							\
+		localtry_lock_t *lt;				\
+		lt = this_cpu_ptr(lock);			\
+		WRITE_ONCE(lt->acquired, 0);			\
+		local_lock_release(&lt->llock);			\
+		preempt_enable();				\
+	} while (0)
+
+#define __localtry_unlock_irq(lock)				\
+	do {							\
+		localtry_lock_t *lt;				\
+		lt = this_cpu_ptr(lock);			\
+		WRITE_ONCE(lt->acquired, 0);			\
+		local_lock_release(&lt->llock);			\
+		local_irq_enable();				\
+	} while (0)
+
+#define __localtry_unlock_irqrestore(lock, flags)		\
+	do {							\
+		localtry_lock_t *lt;				\
+		lt = this_cpu_ptr(lock);			\
+		WRITE_ONCE(lt->acquired, 0);			\
+		local_lock_release(&lt->llock);			\
+		local_irq_restore(flags);			\
+	} while (0)
+
 #else /* !CONFIG_PREEMPT_RT */
 
 /*
@@ -125,8 +237,10 @@ do {								\
  * critical section while staying preemptible.
  */
 typedef spinlock_t local_lock_t;
+typedef spinlock_t localtry_lock_t;
 
 #define INIT_LOCAL_LOCK(lockname) __LOCAL_SPIN_LOCK_UNLOCKED((lockname))
+#define INIT_LOCALTRY_LOCK(lockname) INIT_LOCAL_LOCK(lockname)
 
 #define __local_lock_init(l)					\
 do {								\
@@ -169,4 +283,36 @@ do {								\
 	spin_unlock(this_cpu_ptr((lock)));			\
 } while (0)
 
+/* localtry_lock_t variants */
+
+#define __localtry_lock_init(lock)			__local_lock_init(lock)
+#define __localtry_lock(lock)				__local_lock(lock)
+#define __localtry_lock_irq(lock)			__local_lock(lock)
+#define __localtry_lock_irqsave(lock, flags)		__local_lock_irqsave(lock, flags)
+#define __localtry_unlock(lock)				__local_unlock(lock)
+#define __localtry_unlock_irq(lock)			__local_unlock(lock)
+#define __localtry_unlock_irqrestore(lock, flags)	__local_unlock_irqrestore(lock, flags)
+
+#define __localtry_trylock(lock)				\
+	({							\
+		int __locked;					\
+								\
+		if (in_nmi() | in_hardirq()) {			\
+			__locked = 0;				\
+		} else {					\
+			migrate_disable();			\
+			__locked = spin_trylock(this_cpu_ptr((lock)));	\
+			if (!__locked)				\
+				migrate_enable();		\
+		}						\
+		__locked;					\
+	})
+
+#define __localtry_trylock_irqsave(lock, flags)			\
+	({							\
+		typecheck(unsigned long, flags);		\
+		flags = 0;					\
+		__localtry_trylock(lock);			\
+	})
+
 #endif /* CONFIG_PREEMPT_RT */

From patchwork Sat Feb 22 02:44:23 2025
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13986514
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org,
 linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v9 2/6] mm, bpf: Introduce try_alloc_pages() for
 opportunistic page allocation
Date: Fri, 21 Feb 2025 18:44:23 -0800
Message-Id: <20250222024427.30294-3-alexei.starovoitov@gmail.com>
In-Reply-To: <20250222024427.30294-1-alexei.starovoitov@gmail.com>
References: <20250222024427.30294-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Tracing BPF programs execute from tracepoints and kprobes where the
running context is unknown, but they need to request additional memory.
The prior workarounds were using pre-allocated memory and BPF-specific
freelists to satisfy such allocation requests.

Instead, introduce a gfpflags_allow_spinning() condition that signals
to the allocator that the running context is unknown. Then rely on the
percpu free list of pages to allocate a page.
try_alloc_pages() -> get_page_from_freelist() -> rmqueue() ->
rmqueue_pcplist() will spin_trylock to grab the page from the percpu
free list. If that fails (due to re-entrancy or the list being empty)
then rmqueue_bulk()/rmqueue_buddy() will attempt to spin_trylock
zone->lock and grab the page from there. spin_trylock() is not safe in
PREEMPT_RT when in NMI or in hard IRQ. Bail out early in such a case.

The support for gfpflags_allow_spinning() mode for free_page and memcg
comes in the next patches.

This is a first step towards supporting BPF requirements in SLUB and
getting rid of bpf_mem_alloc. That goal was discussed at LSFMM:
https://lwn.net/Articles/974138/
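A sketch of the intended call pattern (hypothetical helper, not part of
this patch; the BPF wiring comes later in the series). try_alloc_pages()
may fail at any time; a caller whose gfp mask permits spinning can fall
back to the regular allocator:

  static struct page *get_page_any_context(gfp_t gfp)
  {
  	/* Never spins, safe from NMI/IRQ/tracing; page comes back zeroed */
  	struct page *page = try_alloc_pages(NUMA_NO_NODE, 0);

  	if (!page && gfpflags_allow_spinning(gfp))
  		page = alloc_pages(gfp | __GFP_ZERO, 0);
  	return page;
  }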
Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
Acked-by: Sebastian Andrzej Siewior
Reviewed-by: Shakeel Butt
Signed-off-by: Alexei Starovoitov
---
 include/linux/gfp.h |  22 ++++++++++
 lib/stackdepot.c    |   5 ++-
 mm/internal.h       |   1 +
 mm/page_alloc.c     | 104 ++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 6bb1a5a7a4ae..5d9ee78c74e4 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -39,6 +39,25 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 	return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
 }
 
+static inline bool gfpflags_allow_spinning(const gfp_t gfp_flags)
+{
+	/*
+	 * !__GFP_DIRECT_RECLAIM -> direct reclaim is not allowed.
+	 * !__GFP_KSWAPD_RECLAIM -> it's not safe to wake up kswapd.
+	 * All GFP_* flags including GFP_NOWAIT use one or both flags.
+	 * try_alloc_pages() is the only API that doesn't specify either flag.
+	 *
+	 * This is stronger than GFP_NOWAIT or GFP_ATOMIC because
+	 * those are guaranteed to never block on a sleeping lock.
+	 * Here we are enforcing that the allocation doesn't ever spin
+	 * on any locks (i.e. only trylocks). There is no high level
+	 * GFP_$FOO flag for this use in try_alloc_pages() as the
+	 * regular page allocator doesn't fully support this
+	 * allocation mode.
+	 */
+	return !(gfp_flags & __GFP_RECLAIM);
+}
+
 #ifdef CONFIG_HIGHMEM
 #define OPT_ZONE_HIGHMEM ZONE_HIGHMEM
 #else
@@ -335,6 +354,9 @@ static inline struct page *alloc_page_vma_noprof(gfp_t gfp,
 }
 #define alloc_page_vma(...)	alloc_hooks(alloc_page_vma_noprof(__VA_ARGS__))
 
+struct page *try_alloc_pages_noprof(int nid, unsigned int order);
+#define try_alloc_pages(...)	alloc_hooks(try_alloc_pages_noprof(__VA_ARGS__))
+
 extern unsigned long get_free_pages_noprof(gfp_t gfp_mask, unsigned int order);
 #define __get_free_pages(...)	alloc_hooks(get_free_pages_noprof(__VA_ARGS__))
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 245d5b416699..377194969e61 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -591,7 +591,8 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
 	depot_stack_handle_t handle = 0;
 	struct page *page = NULL;
 	void *prealloc = NULL;
-	bool can_alloc = depot_flags & STACK_DEPOT_FLAG_CAN_ALLOC;
+	bool allow_spin = gfpflags_allow_spinning(alloc_flags);
+	bool can_alloc = (depot_flags & STACK_DEPOT_FLAG_CAN_ALLOC) && allow_spin;
 	unsigned long flags;
 	u32 hash;
 
@@ -630,7 +631,7 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
 		prealloc = page_address(page);
 	}
 
-	if (in_nmi()) {
+	if (in_nmi() || !allow_spin) {
 		/* We can never allocate in NMI context. */
 		WARN_ON_ONCE(can_alloc);
 		/* Best effort; bail if we fail to take the lock. */
diff --git a/mm/internal.h b/mm/internal.h
index 109ef30fee11..10a8b4b3b86e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1187,6 +1187,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_NOFRAGMENT	  0x0
 #endif
 #define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
+#define ALLOC_TRYLOCK		0x400 /* Only use spin_trylock in allocation path */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 /* Flags that allow allocations below the min watermark. */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 579789600a3c..1f2a4e1c70ae 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2307,7 +2307,11 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 	unsigned long flags;
 	int i;
 
-	spin_lock_irqsave(&zone->lock, flags);
+	if (!spin_trylock_irqsave(&zone->lock, flags)) {
+		if (unlikely(alloc_flags & ALLOC_TRYLOCK))
+			return 0;
+		spin_lock_irqsave(&zone->lock, flags);
+	}
 	for (i = 0; i < count; ++i) {
 		struct page *page = __rmqueue(zone, order, migratetype,
 								alloc_flags);
@@ -2907,7 +2911,11 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 
 	do {
 		page = NULL;
-		spin_lock_irqsave(&zone->lock, flags);
+		if (!spin_trylock_irqsave(&zone->lock, flags)) {
+			if (unlikely(alloc_flags & ALLOC_TRYLOCK))
+				return NULL;
+			spin_lock_irqsave(&zone->lock, flags);
+		}
 		if (alloc_flags & ALLOC_HIGHATOMIC)
 			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
 		if (!page) {
@@ -4511,7 +4519,12 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 
 	might_alloc(gfp_mask);
 
-	if (should_fail_alloc_page(gfp_mask, order))
+	/*
+	 * Don't invoke should_fail logic, since it may call
+	 * get_random_u32() and printk() which need to spin_lock.
+	 */
+	if (!(*alloc_flags & ALLOC_TRYLOCK) &&
+	    should_fail_alloc_page(gfp_mask, order))
 		return false;
 
 	*alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, *alloc_flags);
@@ -7071,3 +7084,88 @@ static bool __free_unaccepted(struct page *page)
 }
 
 #endif /* CONFIG_UNACCEPTED_MEMORY */
+
+/**
+ * try_alloc_pages - opportunistic reentrant allocation from any context
+ * @nid: node to allocate from
+ * @order: allocation order size
+ *
+ * Allocates pages of a given order from the given node. This is safe to
+ * call from any context (from atomic, NMI, and also reentrant
+ * allocator -> tracepoint -> try_alloc_pages_noprof).
+ * Allocation is best effort and should be expected to fail easily,
+ * so nobody should rely on success. Failures are not reported via
+ * warn_alloc(). See the always-fail conditions below.
+ *
+ * Return: allocated page or NULL on failure.
+ */
+struct page *try_alloc_pages_noprof(int nid, unsigned int order)
+{
+	/*
+	 * Do not specify __GFP_DIRECT_RECLAIM, since direct reclaim is not
+	 * allowed. Do not specify __GFP_KSWAPD_RECLAIM either, since wake up
+	 * of kswapd is not safe in arbitrary context.
+	 *
+	 * These two are the conditions for gfpflags_allow_spinning() being true.
+	 *
+	 * Specify __GFP_NOWARN since failing try_alloc_pages() is not a reason
+	 * to warn. Also warn would trigger printk() which is unsafe from
+	 * various contexts. We cannot use printk_deferred_enter() to mitigate,
+	 * since the running context is unknown.
+	 *
+	 * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
+	 * is safe in any context. Also zeroing the page is mandatory for
+	 * BPF use cases.
+	 *
+	 * Though __GFP_NOMEMALLOC is not checked in the code path below,
+	 * specify it here to highlight that try_alloc_pages()
+	 * doesn't want to deplete reserves.
+	 */
+	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC;
+	unsigned int alloc_flags = ALLOC_TRYLOCK;
+	struct alloc_context ac = { };
+	struct page *page;
+
+	/*
+	 * In PREEMPT_RT spin_trylock() will call raw_spin_lock() which is
+	 * unsafe in NMI. If spin_trylock() is called from hard IRQ the current
+	 * task may be waiting for one rt_spin_lock, but rt_spin_trylock() will
+	 * mark the task as the owner of another rt_spin_lock which will
+	 * confuse PI logic, so return immediately if called from hard IRQ or
+	 * NMI.
+	 *
+	 * Note, irqs_disabled() case is ok. This function can be called
+	 * from a raw_spin_lock_irqsave region.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
+		return NULL;
+	if (!pcp_allowed_order(order))
+		return NULL;
+
+#ifdef CONFIG_UNACCEPTED_MEMORY
+	/* Bailout, since try_to_accept_memory_one() needs to take a lock */
+	if (has_unaccepted_memory())
+		return NULL;
+#endif
+	/* Bailout, since _deferred_grow_zone() needs to take a lock */
+	if (deferred_pages_enabled())
+		return NULL;
+
+	if (nid == NUMA_NO_NODE)
+		nid = numa_node_id();
+
+	prepare_alloc_pages(alloc_gfp, order, nid, NULL, &ac,
+			    &alloc_gfp, &alloc_flags);
+
+	/*
+	 * Best effort allocation from percpu free list.
+	 * If it's empty attempt to spin_trylock zone->lock.
+	 */
+	page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
+
+	/* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */
+
+	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
+	kmsan_alloc_page(page, order, alloc_gfp);
+	return page;
+}
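Beyond the page allocator itself, gfpflags_allow_spinning() is the hook
that other caching layers key off (lib/stackdepot.c above, memcg later
in this series). A sketch of the resulting idiom, with a hypothetical
cache and lock (on PREEMPT_RT an NMI-safe primitive such as
localtry_lock_t from the previous patch would be needed instead of a
bare trylock):

  static DEFINE_SPINLOCK(cache_lock);
  static void *cache_slot;	/* hypothetical single-entry cache */

  static void *cache_pop(gfp_t gfp)
  {
  	void *obj;

  	if (gfpflags_allow_spinning(gfp))
  		spin_lock(&cache_lock);		/* known-safe context: may spin */
  	else if (!spin_trylock(&cache_lock))
  		return NULL;			/* unknown context: trylock only */
  	obj = cache_slot;
  	cache_slot = NULL;
  	spin_unlock(&cache_lock);
  	return obj;
  }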
From patchwork Sat Feb 22 02:44:24 2025
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13986515
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org,
 linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v9 3/6] mm, bpf: Introduce free_pages_nolock()
Date: Fri, 21 Feb 2025 18:44:24 -0800
Message-Id: <20250222024427.30294-4-alexei.starovoitov@gmail.com>
In-Reply-To: <20250222024427.30294-1-alexei.starovoitov@gmail.com>
References: <20250222024427.30294-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Introduce free_pages_nolock() that can free pages without taking locks.
It relies on trylock and can be called from any context. Since
spin_trylock() cannot be used in PREEMPT_RT from hard IRQ or NMI, it
uses a lockless link list to stash the pages, which will be freed by a
subsequent free_pages() from a good context.

Do not use llist unconditionally. BPF maps continuously allocate/free,
so we cannot unconditionally delay the freeing to llist. When the
memory becomes free make it available to the kernel and BPF users right
away if possible, and fall back to llist as the last resort.
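Together with try_alloc_pages() from the previous patch this completes
a reentrant alloc/free pair. A sketch (hypothetical function; order-0
page assumed):

  static void demo_any_context(void)
  {
  	struct page *page = try_alloc_pages(NUMA_NO_NODE, 0);

  	if (!page)
  		return;	/* expected failure mode, no warning */
  	/* ... use the zeroed page ... */
  	free_pages_nolock(page, 0);	/* trylock only; defers to llist if contended */
  }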
Acked-by: Vlastimil Babka
Acked-by: Sebastian Andrzej Siewior
Reviewed-by: Shakeel Butt
Signed-off-by: Alexei Starovoitov
---
 include/linux/gfp.h      |  1 +
 include/linux/mm_types.h |  4 ++
 include/linux/mmzone.h   |  3 ++
 lib/stackdepot.c         |  5 ++-
 mm/page_alloc.c          | 90 +++++++++++++++++++++++++++++++++++-----
 mm/page_owner.c          |  8 +++-
 6 files changed, 98 insertions(+), 13 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 5d9ee78c74e4..ceb226c2e25c 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -379,6 +379,7 @@ __meminit void *alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mask
 	__get_free_pages((gfp_mask) | GFP_DMA, (order))
 
 extern void __free_pages(struct page *page, unsigned int order);
+extern void free_pages_nolock(struct page *page, unsigned int order);
 extern void free_pages(unsigned long addr, unsigned int order);
 
 #define __free_page(page) __free_pages((page), 0)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6b27db7f9496..483aa90242cd 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -99,6 +99,10 @@ struct page {
 				/* Or, free page */
 				struct list_head buddy_list;
 				struct list_head pcp_list;
+				struct {
+					struct llist_node pcp_llist;
+					unsigned int	order;
+				};
 			};
 			/* See page-flags.h for PAGE_MAPPING_FLAGS */
 			struct address_space *mapping;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9540b41894da..e16939553930 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -972,6 +972,9 @@ struct zone {
 	/* Primarily protects free_area */
 	spinlock_t		lock;
 
+	/* Pages to be freed when next trylock succeeds */
+	struct llist_head	trylock_free_pages;
+
 	/* Write-intensive fields used by compaction and vmstats. */
 	CACHELINE_PADDING(_pad2_);
 
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 377194969e61..73d7b50924ef 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -672,7 +672,10 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
 exit:
 	if (prealloc) {
 		/* Stack depot didn't use this memory, free it. */
-		free_pages((unsigned long)prealloc, DEPOT_POOL_ORDER);
+		if (!allow_spin)
+			free_pages_nolock(virt_to_page(prealloc), DEPOT_POOL_ORDER);
+		else
+			free_pages((unsigned long)prealloc, DEPOT_POOL_ORDER);
 	}
 	if (found)
 		handle = found->handle.handle;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1f2a4e1c70ae..79b39ad4bb1b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -88,6 +88,9 @@ typedef int __bitwise fpi_t;
  */
 #define FPI_TO_TAIL		((__force fpi_t)BIT(1))
 
+/* Free the page without taking locks. Rely on trylock only. */
+#define FPI_TRYLOCK		((__force fpi_t)BIT(2))
+
 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
 static DEFINE_MUTEX(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_HIGH_FRACTION (8)
@@ -1249,13 +1252,44 @@ static void split_large_buddy(struct zone *zone, struct page *page,
 	} while (1);
 }
 
+static void add_page_to_zone_llist(struct zone *zone, struct page *page,
+				   unsigned int order)
+{
+	/* Remember the order */
+	page->order = order;
+	/* Add the page to the free list */
+	llist_add(&page->pcp_llist, &zone->trylock_free_pages);
+}
+
 static void free_one_page(struct zone *zone, struct page *page,
 			  unsigned long pfn, unsigned int order,
 			  fpi_t fpi_flags)
 {
+	struct llist_head *llhead;
 	unsigned long flags;
 
-	spin_lock_irqsave(&zone->lock, flags);
+	if (!spin_trylock_irqsave(&zone->lock, flags)) {
+		if (unlikely(fpi_flags & FPI_TRYLOCK)) {
+			add_page_to_zone_llist(zone, page, order);
+			return;
+		}
+		spin_lock_irqsave(&zone->lock, flags);
+	}
+
+	/* The lock succeeded. Process deferred pages. */
+	llhead = &zone->trylock_free_pages;
+	if (unlikely(!llist_empty(llhead) && !(fpi_flags & FPI_TRYLOCK))) {
+		struct llist_node *llnode;
+		struct page *p, *tmp;
+
+		llnode = llist_del_all(llhead);
+		llist_for_each_entry_safe(p, tmp, llnode, pcp_llist) {
+			unsigned int p_order = p->order;
+
+			split_large_buddy(zone, p, page_to_pfn(p), p_order, fpi_flags);
+			__count_vm_events(PGFREE, 1 << p_order);
+		}
+	}
 	split_large_buddy(zone, page, pfn, order, fpi_flags);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
@@ -2599,7 +2633,7 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
 
 static void free_frozen_page_commit(struct zone *zone,
 		struct per_cpu_pages *pcp, struct page *page, int migratetype,
-		unsigned int order)
+		unsigned int order, fpi_t fpi_flags)
 {
 	int high, batch;
 	int pindex;
@@ -2634,6 +2668,14 @@ static void free_frozen_page_commit(struct zone *zone,
 	}
 	if (pcp->free_count < (batch << CONFIG_PCP_BATCH_SCALE_MAX))
 		pcp->free_count += (1 << order);
+
+	if (unlikely(fpi_flags & FPI_TRYLOCK)) {
+		/*
+		 * Do not attempt to take a zone lock. Let pcp->count get
+		 * over high mark temporarily.
+		 */
+		return;
+	}
 	high = nr_pcp_high(pcp, zone, batch, free_high);
 	if (pcp->count >= high) {
 		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
@@ -2648,7 +2690,8 @@ static void free_frozen_page_commit(struct zone *zone,
 /*
  * Free a pcp page
  */
-void free_frozen_pages(struct page *page, unsigned int order)
+static void __free_frozen_pages(struct page *page, unsigned int order,
+				fpi_t fpi_flags)
 {
 	unsigned long __maybe_unused UP_flags;
 	struct per_cpu_pages *pcp;
@@ -2657,7 +2700,7 @@ void free_frozen_pages(struct page *page, unsigned int order)
 	int migratetype;
 
 	if (!pcp_allowed_order(order)) {
-		__free_pages_ok(page, order, FPI_NONE);
+		__free_pages_ok(page, order, fpi_flags);
 		return;
 	}
 
@@ -2675,23 +2718,33 @@ void free_frozen_pages(struct page *page, unsigned int order)
 	migratetype = get_pfnblock_migratetype(page, pfn);
 	if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
 		if (unlikely(is_migrate_isolate(migratetype))) {
-			free_one_page(zone, page, pfn, order, FPI_NONE);
+			free_one_page(zone, page, pfn, order, fpi_flags);
 			return;
 		}
 		migratetype = MIGRATE_MOVABLE;
 	}
 
+	if (unlikely((fpi_flags & FPI_TRYLOCK) && IS_ENABLED(CONFIG_PREEMPT_RT)
+		     && (in_nmi() || in_hardirq()))) {
+		add_page_to_zone_llist(zone, page, order);
+		return;
+	}
 	pcp_trylock_prepare(UP_flags);
 	pcp = pcp_spin_trylock(zone->per_cpu_pageset);
 	if (pcp) {
-		free_frozen_page_commit(zone, pcp, page, migratetype, order);
+		free_frozen_page_commit(zone, pcp, page, migratetype, order, fpi_flags);
 		pcp_spin_unlock(pcp);
 	} else {
-		free_one_page(zone, page, pfn, order, FPI_NONE);
+		free_one_page(zone, page, pfn, order, fpi_flags);
 	}
 	pcp_trylock_finish(UP_flags);
 }
 
+void free_frozen_pages(struct page *page, unsigned int order)
+{
+	__free_frozen_pages(page, order, FPI_NONE);
+}
+
 /*
  * Free a batch of folios
  */
@@ -2780,7 +2833,7 @@ void free_unref_folios(struct folio_batch *folios)
 
 		trace_mm_page_free_batched(&folio->page);
 		free_frozen_page_commit(zone, pcp, &folio->page, migratetype,
-				order);
+				order, FPI_NONE);
 	}
 
 	if (pcp) {
@@ -4841,22 +4894,37 @@ EXPORT_SYMBOL(get_zeroed_page_noprof);
  * Context: May be called in interrupt context or while holding a normal
  * spinlock, but not in NMI context or while holding a raw spinlock.
  */
-void __free_pages(struct page *page, unsigned int order)
+static void ___free_pages(struct page *page, unsigned int order,
+			  fpi_t fpi_flags)
 {
 	/* get PageHead before we drop reference */
 	int head = PageHead(page);
 	struct alloc_tag *tag = pgalloc_tag_get(page);
 
 	if (put_page_testzero(page))
-		free_frozen_pages(page, order);
+		__free_frozen_pages(page, order, fpi_flags);
 	else if (!head) {
 		pgalloc_tag_sub_pages(tag, (1 << order) - 1);
 		while (order-- > 0)
-			free_frozen_pages(page + (1 << order), order);
+			__free_frozen_pages(page + (1 << order), order,
+					    fpi_flags);
 	}
 }
+
+void __free_pages(struct page *page, unsigned int order)
+{
+	___free_pages(page, order, FPI_NONE);
+}
 EXPORT_SYMBOL(__free_pages);
 
+/*
+ * Can be called while holding raw_spin_lock or from IRQ and NMI for any
+ * page type (not only those that came from try_alloc_pages)
+ */
+void free_pages_nolock(struct page *page, unsigned int order)
+{
+	___free_pages(page, order, FPI_TRYLOCK);
+}
+
 void free_pages(unsigned long addr, unsigned int order)
 {
 	if (addr != 0) {
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 2d6360eaccbb..90e31d0e3ed7 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -294,7 +294,13 @@ void __reset_page_owner(struct page *page, unsigned short order)
 	page_owner = get_page_owner(page_ext);
 	alloc_handle = page_owner->handle;
 
-	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
+	/*
+	 * Do not specify GFP_NOWAIT to make gfpflags_allow_spinning() == false
+	 * to prevent issues in stack_depot_save().
+	 * This is similar to try_alloc_pages() gfp flags, but only used
+	 * to signal stack_depot to avoid spin_locks.
+	 */
+	handle = save_stack(__GFP_NOWARN);
 	__update_page_owner_free_handle(page_ext, handle, order, current->pid,
 					current->tgid, free_ts_nsec);
 	page_ext_put(page_ext);

From patchwork Sat Feb 22 02:44:25 2025
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13986516
From patchwork Sat Feb 22 02:44:25 2025
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13986516
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org, linux-mm@kvack.org,
 kernel-team@fb.com
Subject: [PATCH bpf-next v9 4/6] memcg: Use trylock to access memcg stock_lock.
Date: Fri, 21 Feb 2025 18:44:25 -0800
Message-Id: <20250222024427.30294-5-alexei.starovoitov@gmail.com>
In-Reply-To: <20250222024427.30294-1-alexei.starovoitov@gmail.com>
References: <20250222024427.30294-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Teach memcg to operate under trylock conditions when spinning locks
cannot be used.

localtry_trylock() might fail, in which case the charge cache has to be
bypassed if the calling context doesn't allow spinning
(gfpflags_allow_spinning() == false). In those cases charge the memcg
counters directly and fail early if that is not possible. This may cause
a premature charge failure, but it allows opportunistic charging that is
safe from the try_alloc_pages() path.
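For illustration, a minimal sketch of the gating predicate: the real
gfpflags_allow_spinning() is introduced earlier in this series, and the body
below reflects its intended semantics as an assumption, not a copy of the
patch:

static inline bool gfpflags_allow_spinning_sketch(const gfp_t gfp_flags)
{
	/*
	 * Assumed semantics: only allocations carrying reclaim bits may
	 * spin on locks. GFP_KERNEL and GFP_NOWAIT both include
	 * __GFP_RECLAIM bits; try_alloc_pages() and the page_owner
	 * change above deliberately pass none.
	 */
	return !!(gfp_flags & __GFP_RECLAIM);
}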
Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
Acked-by: Shakeel Butt
Signed-off-by: Alexei Starovoitov
---
 mm/memcontrol.c | 52 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4de6acb9b8ec..97a7307d4932 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1739,7 +1739,7 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg)
 }
 
 struct memcg_stock_pcp {
-	local_lock_t stock_lock;
+	localtry_lock_t stock_lock;
 	struct mem_cgroup *cached; /* this never be root cgroup */
 	unsigned int nr_pages;
@@ -1754,7 +1754,7 @@ struct memcg_stock_pcp {
 #define FLUSHING_CACHED_CHARGE	0
 };
 static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock) = {
-	.stock_lock = INIT_LOCAL_LOCK(stock_lock),
+	.stock_lock = INIT_LOCALTRY_LOCK(stock_lock),
 };
 static DEFINE_MUTEX(percpu_charge_mutex);
@@ -1773,7 +1773,8 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
  *
  * returns true if successful, false otherwise.
  */
-static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
+			  gfp_t gfp_mask)
 {
 	struct memcg_stock_pcp *stock;
 	unsigned int stock_pages;
@@ -1783,7 +1784,11 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 	if (nr_pages > MEMCG_CHARGE_BATCH)
 		return ret;
 
-	local_lock_irqsave(&memcg_stock.stock_lock, flags);
+	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
+		if (!gfpflags_allow_spinning(gfp_mask))
+			return ret;
+		localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
+	}
 
 	stock = this_cpu_ptr(&memcg_stock);
 	stock_pages = READ_ONCE(stock->nr_pages);
@@ -1792,7 +1797,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 		ret = true;
 	}
 
-	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 
 	return ret;
 }
@@ -1831,14 +1836,14 @@ static void drain_local_stock(struct work_struct *dummy)
 	 * drain_stock races is that we always operate on local CPU stock
 	 * here with IRQ disabled
 	 */
-	local_lock_irqsave(&memcg_stock.stock_lock, flags);
+	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
 	old = drain_obj_stock(stock);
 	drain_stock(stock);
 	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
 
-	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 	obj_cgroup_put(old);
 }
@@ -1868,9 +1873,20 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	unsigned long flags;
 
-	local_lock_irqsave(&memcg_stock.stock_lock, flags);
+	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
+		/*
+		 * In case of unlikely failure to lock percpu stock_lock
+		 * uncharge memcg directly.
+		 */
+		if (mem_cgroup_is_root(memcg))
+			return;
+		page_counter_uncharge(&memcg->memory, nr_pages);
+		if (do_memsw_account())
+			page_counter_uncharge(&memcg->memsw, nr_pages);
+		return;
+	}
 	__refill_stock(memcg, nr_pages);
-	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 }
 
 /*
@@ -2213,9 +2229,13 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	unsigned long pflags;
 
 retry:
-	if (consume_stock(memcg, nr_pages))
+	if (consume_stock(memcg, nr_pages, gfp_mask))
 		return 0;
 
+	if (!gfpflags_allow_spinning(gfp_mask))
+		/* Avoid the refill and flush of the older stock */
+		batch = nr_pages;
+
 	if (!do_memsw_account() ||
 	    page_counter_try_charge(&memcg->memsw, batch, &counter)) {
 		if (page_counter_try_charge(&memcg->memory, batch, &counter))
@@ -2699,7 +2719,7 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 	unsigned long flags;
 	int *bytes;
 
-	local_lock_irqsave(&memcg_stock.stock_lock, flags);
+	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
 	stock = this_cpu_ptr(&memcg_stock);
 
 	/*
@@ -2752,7 +2772,7 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 	if (nr)
 		__mod_objcg_mlstate(objcg, pgdat, idx, nr);
 
-	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 	obj_cgroup_put(old);
 }
 
@@ -2762,7 +2782,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 	unsigned long flags;
 	bool ret = false;
 
-	local_lock_irqsave(&memcg_stock.stock_lock, flags);
+	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
 	if (objcg == READ_ONCE(stock->cached_objcg) && stock->nr_bytes >= nr_bytes) {
@@ -2770,7 +2790,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 		ret = true;
 	}
 
-	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 
 	return ret;
 }
@@ -2862,7 +2882,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 	unsigned long flags;
 	unsigned int nr_pages = 0;
 
-	local_lock_irqsave(&memcg_stock.stock_lock, flags);
+	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
 	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
@@ -2880,7 +2900,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 		stock->nr_bytes &= (PAGE_SIZE - 1);
 	}
 
-	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 	obj_cgroup_put(old);
 
 	if (nr_pages)
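Every converted site above follows the same trylock-first idiom. As a
condensed sketch, with a hypothetical function name standing in for the
real stock operations:

static bool stock_op_sketch(gfp_t gfp_mask)
{
	unsigned long flags;

	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
		/* Lock is busy, e.g. we interrupted the current holder. */
		if (!gfpflags_allow_spinning(gfp_mask))
			return false;	/* caller falls back to page counters */
		/* Spinning is allowed: acquire the lock normally. */
		localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
	}
	/* ... operate on this_cpu_ptr(&memcg_stock) here ... */
	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
	return true;
}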
From patchwork Sat Feb 22 02:44:26 2025
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13986517
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org, linux-mm@kvack.org,
 kernel-team@fb.com
Subject: [PATCH bpf-next v9 5/6] mm, bpf: Use memcg in try_alloc_pages().
Date: Fri, 21 Feb 2025 18:44:26 -0800
Message-Id: <20250222024427.30294-6-alexei.starovoitov@gmail.com>
In-Reply-To: <20250222024427.30294-1-alexei.starovoitov@gmail.com>
References: <20250222024427.30294-1-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov

Unconditionally use __GFP_ACCOUNT in try_alloc_pages(). The caller is
responsible for setting up the memcg correctly. All BPF memory accounting
is memcg based.

Acked-by: Vlastimil Babka
Acked-by: Shakeel Butt
Signed-off-by: Alexei Starovoitov
---
 mm/page_alloc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 79b39ad4bb1b..4ad0ba7801a8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7189,7 +7189,8 @@ struct page *try_alloc_pages_noprof(int nid, unsigned int order)
 	 * specify it here to highlight that try_alloc_pages()
 	 * doesn't want to deplete reserves.
 	 */
-	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC;
+	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_NOMEMALLOC
+			| __GFP_ACCOUNT;
 	unsigned int alloc_flags = ALLOC_TRYLOCK;
 	struct alloc_context ac = { };
 	struct page *page;
@@ -7233,6 +7234,11 @@ struct page *try_alloc_pages_noprof(int nid, unsigned int order)
 
 	/* Unlike regular alloc_pages() there is no __alloc_pages_slowpath(). */
 
+	if (memcg_kmem_online() && page &&
+	    unlikely(__memcg_kmem_charge_page(page, alloc_gfp, order) != 0)) {
+		free_pages_nolock(page, order);
+		page = NULL;
+	}
 	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
 	kmsan_alloc_page(page, order, alloc_gfp);
 	return page;
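Since the memcg now comes from the caller's context, a minimal sketch of
the expected call pattern, assuming CONFIG_MEMCG. The wrapper name is
hypothetical; set_active_memcg() is the existing helper that
bpf_map_alloc_pages() already uses this way in the next patch:

static struct page *alloc_page_charged_to(struct mem_cgroup *memcg, int nid)
{
	struct mem_cgroup *old_memcg;
	struct page *page;

	old_memcg = set_active_memcg(memcg);
	/* Charged to @memcg via __GFP_ACCOUNT; may return NULL. */
	page = try_alloc_pages(nid, 0);
	set_active_memcg(old_memcg);
	return page;
}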
From patchwork Sat Feb 22 02:44:27 2025
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13986518
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org, linux-mm@kvack.org,
 kernel-team@fb.com
Subject: [PATCH bpf-next v9 6/6] bpf: Use try_alloc_pages() to allocate pages
 for bpf needs.
Date: Fri, 21 Feb 2025 18:44:27 -0800
Message-Id: <20250222024427.30294-7-alexei.starovoitov@gmail.com>
In-Reply-To: <20250222024427.30294-1-alexei.starovoitov@gmail.com>
References: <20250222024427.30294-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Use try_alloc_pages() and free_pages_nolock() for BPF needs when the
context doesn't allow using normal alloc_pages(). This is a prerequisite
for further work.
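A minimal usage sketch under the new signature from the diff below; the
wrapper name is hypothetical, and note the gfp argument is gone because
the helper now picks the safe allocator itself:

static int prefill_one_page(const struct bpf_map *map, struct page **pagep)
{
	/* Charged to the map's memcg; falls back to try_alloc_pages()
	 * when the calling context is atomic. */
	return bpf_map_alloc_pages(map, NUMA_NO_NODE, 1, pagep);
}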
Signed-off-by: Alexei Starovoitov
---
 include/linux/bpf.h  |  2 +-
 kernel/bpf/arena.c   |  5 ++---
 kernel/bpf/syscall.c | 23 ++++++++++++++++++++---
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 15164787ce7f..aec102868b93 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2354,7 +2354,7 @@ int generic_map_delete_batch(struct bpf_map *map,
 struct bpf_map *bpf_map_get_curr_or_next(u32 *id);
 struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id);
 
-int bpf_map_alloc_pages(const struct bpf_map *map, gfp_t gfp, int nid,
+int bpf_map_alloc_pages(const struct bpf_map *map, int nid,
 			unsigned long nr_pages, struct page **page_array);
 #ifdef CONFIG_MEMCG
 void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
index 647b709d7d77..0d56cea71602 100644
--- a/kernel/bpf/arena.c
+++ b/kernel/bpf/arena.c
@@ -287,7 +287,7 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
 		return VM_FAULT_SIGSEGV;
 
 	/* Account into memcg of the process that created bpf_arena */
-	ret = bpf_map_alloc_pages(map, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE, 1, &page);
+	ret = bpf_map_alloc_pages(map, NUMA_NO_NODE, 1, &page);
 	if (ret) {
 		range_tree_set(&arena->rt, vmf->pgoff, 1);
 		return VM_FAULT_SIGSEGV;
@@ -465,8 +465,7 @@ static long arena_alloc_pages(struct bpf_arena *arena, long uaddr, long page_cnt
 	if (ret)
 		goto out_free_pages;
 
-	ret = bpf_map_alloc_pages(&arena->map, GFP_KERNEL | __GFP_ZERO,
-				  node_id, page_cnt, pages);
+	ret = bpf_map_alloc_pages(&arena->map, node_id, page_cnt, pages);
 	if (ret)
 		goto out;
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index dbd89c13dd32..28680896c6a4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -569,7 +569,24 @@ static void bpf_map_release_memcg(struct bpf_map *map)
 }
 #endif
 
-int bpf_map_alloc_pages(const struct bpf_map *map, gfp_t gfp, int nid,
+static bool can_alloc_pages(void)
+{
+	return preempt_count() == 0 && !irqs_disabled() &&
+		!IS_ENABLED(CONFIG_PREEMPT_RT);
+}
+
+static struct page *__bpf_alloc_page(int nid)
+{
+	if (!can_alloc_pages())
+		return try_alloc_pages(nid, 0);
+
+	return alloc_pages_node(nid,
+				GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT
+				| __GFP_NOWARN,
+				0);
+}
+
+int bpf_map_alloc_pages(const struct bpf_map *map, int nid,
 			unsigned long nr_pages, struct page **pages)
 {
 	unsigned long i, j;
@@ -582,14 +599,14 @@ int bpf_map_alloc_pages(const struct bpf_map *map, gfp_t gfp, int nid,
 	old_memcg = set_active_memcg(memcg);
 #endif
 	for (i = 0; i < nr_pages; i++) {
-		pg = alloc_pages_node(nid, gfp | __GFP_ACCOUNT, 0);
+		pg = __bpf_alloc_page(nid);
 
 		if (pg) {
 			pages[i] = pg;
 			continue;
 		}
 		for (j = 0; j < i; j++)
-			__free_page(pages[j]);
+			free_pages_nolock(pages[j], 0);
 		ret = -ENOMEM;
 		break;
 	}