From patchwork Fri Jan 24 03:56:49 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13948878
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org,
 linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v6 0/6] bpf, mm: Introduce try_alloc_pages()
Date: Thu, 23 Jan 2025 19:56:49 -0800
Message-Id: <20250124035655.78899-1-alexei.starovoitov@gmail.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)

From: Alexei Starovoitov

Hi All,

The main motivation is to make the page and slab allocators reentrant
and remove bpf_mem_alloc.

v5->v6:
- Addressed comments from Sebastian, Vlastimil.
- New approach for local_lock_t in patch 3. Instead of unconditionally
  increasing local_lock_t size to 4 bytes introduce local_trylock_t
  and use _Generic() tricks to manipulate active field.
- Address stackdepot reentrance issues. alloc part in patch 1 and
  free part in patch 2.
- Inlined mem_cgroup_cancel_charge() in patch 4 since this helper
  is being removed.
- Added Acks.
- Dropped failslab, kfence, kmemleak patch.
- Improved bpf_map_alloc_pages() in patch 6 a bit to demo intended usage.
  It will be refactored further.
- Considered using __GFP_COMP in try_alloc_pages to simplify
  free_pages_nolock a bit, but then decided to make it work
  for all types of pages, since free_pages_nolock() is used by
  stackdepot and currently it's using non-compound order 2.
  I felt it's best to leave it as-is and make free_pages_nolock()
  support all pages.

v5:
https://lore.kernel.org/all/20250115021746.34691-1-alexei.starovoitov@gmail.com/

v4->v5:
- Fixed patch 1 and 4 commit logs and comments per Michal suggestions.
  Added Acks.
- Added patch 6 to make failslab, kfence, kmemleak compliant
  with trylock mode. It's a prerequisite for reentrant slab patches.

v4:
https://lore.kernel.org/bpf/20250114021922.92609-1-alexei.starovoitov@gmail.com/

v3->v4:
Addressed feedback from Michal and Shakeel:
- GFP_TRYLOCK flag is gone. gfpflags_allow_spinning() is used instead.
- Improved comments and commit logs.

v3:
https://lore.kernel.org/bpf/20241218030720.1602449-1-alexei.starovoitov@gmail.com/

v2->v3:
To address the issues spotted by Sebastian, Vlastimil, Steven:
- Made GFP_TRYLOCK internal to mm/internal.h.
  try_alloc_pages() and free_pages_nolock() are the only interfaces.
- Since spin_trylock() is not safe in RT from hard IRQ and NMI
  disable such usage in lock_trylock and in try_alloc_pages().
  In such case free_pages_nolock() falls back to llist right away.
- Process trylock_free_pages llist when preemptible.
- Check for things like unaccepted memory and order <= 3 early.
- Don't call into __alloc_pages_slowpath() at all.
- Inspired by Vlastimil's struct local_tryirq_lock adopted it in
  local_lock_t. Extra 4 bytes in !RT in local_lock_t shouldn't
  affect any of the current local_lock_t users. This is patch 3.
- Tested with bpf selftests in RT and !RT and realized how much
  more work is necessary on bpf side to play nice with RT.
  The urgency of this work got higher.
  The alternative is to convert bpf bits left and right to bpf_mem_alloc.

v2:
https://lore.kernel.org/bpf/20241210023936.46871-1-alexei.starovoitov@gmail.com/

v1->v2:
- fixed buggy try_alloc_pages_noprof() in PREEMPT_RT. Thanks Peter.
- optimize all paths by doing spin_trylock_irqsave() first
  and only then check for gfp_flags & __GFP_TRYLOCK.
  Then spin_lock_irqsave() if it's a regular mode.
  So new gfp flag will not add performance overhead.
- patches 2-5 are new. They introduce lockless and/or trylock
  free_pages_nolock() and memcg support. So it's in usable shape
  for bpf in patch 6.

v1:
https://lore.kernel.org/bpf/20241116014854.55141-1-alexei.starovoitov@gmail.com/

Alexei Starovoitov (6):
  mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
  mm, bpf: Introduce free_pages_nolock()
  locking/local_lock: Introduce local_trylock_t and
    local_trylock_irqsave()
  memcg: Use trylock to access memcg stock_lock.
  mm, bpf: Use memcg in try_alloc_pages().
  bpf: Use try_alloc_pages() to allocate pages for bpf needs.

 include/linux/bpf.h                 |   2 +-
 include/linux/gfp.h                 |  23 ++++
 include/linux/local_lock.h          |   9 ++
 include/linux/local_lock_internal.h |  79 ++++++++++-
 include/linux/mm_types.h            |   4 +
 include/linux/mmzone.h              |   3 +
 kernel/bpf/arena.c                  |   5 +-
 kernel/bpf/syscall.c                |  23 +++-
 lib/stackdepot.c                    |  10 +-
 mm/internal.h                       |   1 +
 mm/memcontrol.c                     |  30 ++++-
 mm/page_alloc.c                     | 200 ++++++++++++++++++++++++++--
 mm/page_owner.c                     |   8 +-
 13 files changed, 365 insertions(+), 32 deletions(-)
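
For readers unfamiliar with the _Generic() approach mentioned in the
v5->v6 notes, below is a minimal userspace sketch of the general idea:
keep the plain lock type small, give only the trylock type an extra
state field, and let the same macro pick the right helper from the
pointer type at compile time. Everything here is an assumption made for
illustration: the demo_* types, helpers and macros are hypothetical,
the field name "active" is borrowed from the wording above, and none of
this is the code in patch 3 or the kernel's local_lock API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; NOT the kernel's types. */
typedef struct {
	int dummy;		/* placeholder so the struct is non-empty */
} demo_lock_t;

typedef struct {
	demo_lock_t llock;	/* plain lock embedded first */
	int active;		/* extra state only the trylock type carries */
} demo_trylock_t;

/* Plain lock: there is no state to record. */
static inline void demo_mark_plain(demo_lock_t *l) { (void)l; }
static inline void demo_clear_plain(demo_lock_t *l) { (void)l; }

/* Trylock type: record and clear ownership in the extra field. */
static inline void demo_mark_try(demo_trylock_t *l) { l->active = 1; }
static inline void demo_clear_try(demo_trylock_t *l) { l->active = 0; }

/* One macro for both types; _Generic selects the helper at compile time. */
#define demo_lock(l)						\
	_Generic((l),						\
		 demo_lock_t *: demo_mark_plain,		\
		 demo_trylock_t *: demo_mark_try)(l)

#define demo_unlock(l)						\
	_Generic((l),						\
		 demo_lock_t *: demo_clear_plain,		\
		 demo_trylock_t *: demo_clear_try)(l)

/* Only defined for the trylock type: fail instead of re-entering. */
static inline bool demo_trylock(demo_trylock_t *l)
{
	if (l->active)
		return false;
	l->active = 1;
	return true;
}

/* Returns 0 when the sketch behaves as described above. */
static inline int demo_selfcheck(void)
{
	demo_lock_t plain = { 0 };
	demo_trylock_t t = { { 0 }, 0 };

	/* The plain type stays smaller than the trylock type. */
	assert(sizeof(demo_lock_t) < sizeof(demo_trylock_t));

	demo_lock(&plain);		/* resolves to demo_mark_plain() */
	demo_unlock(&plain);

	demo_lock(&t);			/* resolves to demo_mark_try() */
	if (demo_trylock(&t))		/* reentrant attempt must fail */
		return 1;
	demo_unlock(&t);
	if (!demo_trylock(&t))		/* lock is free again, must succeed */
		return 2;
	demo_unlock(&t);
	return 0;
}
```

Calling demo_selfcheck() from a small main() is enough to exercise it.
The sketch has no real locking, preemption or IRQ handling; it only
shows how type-based dispatch lets existing users of the plain type keep
their size while the trylock type gains the state needed to refuse
reentry.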