From patchwork Fri Jul 19 09:16:57 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13737039
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, cgroups@vger.kernel.org,
 linux-mm@kvack.org
Subject: [PATCH v6 1/3] memcontrol: define root_mem_cgroup for CONFIG_MEMCG=n cases
Date: Fri, 19 Jul 2024 18:46:57 +0930
Message-ID: <299298648bc5689b2f163c7876936179338301ba.1721380449.git.wqu@suse.com>
X-Mailer: git-send-email 2.45.2

There is an incoming btrfs patchset which will use @root_mem_cgroup as the
active cgroup when attaching metadata folios to its internal btree inode, so
that btrfs can skip the possibly costly charge for an internal inode that is
only accessible by btrfs itself.

However, @root_mem_cgroup is not always defined (it is not defined in the
CONFIG_MEMCG=n case), so all such callers would need extra handling for the
different CONFIG_MEMCG settings.

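As a rough illustration of the extra handling described above, and of how a
NULL fallback can remove it, here is a standalone C sketch (not kernel code).
DEMO_CONFIG_MEMCG, struct demo_cgroup, demo_root_cgroup and
demo_pick_cgroup() are hypothetical names used only for this example.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for CONFIG_MEMCG. It is left undefined here, so
 * this file builds the "=n" variant; define it to build the other one.
 */
/* #define DEMO_CONFIG_MEMCG 1 */

struct demo_cgroup {
	int id;
};

#ifdef DEMO_CONFIG_MEMCG
/* Feature enabled: a real global pointer, defined in some other file. */
extern struct demo_cgroup *demo_root_cgroup;
#else
/* Feature disabled: no storage at all, the name simply expands to NULL. */
#define demo_root_cgroup (NULL)
#endif

/* A caller that needs no #ifdef of its own in either configuration. */
static struct demo_cgroup *demo_pick_cgroup(void)
{
	return demo_root_cgroup;
}
```

With the stand-in option left undefined, demo_pick_cgroup() returns NULL and
no global variable is emitted for the name.
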
So here we add a special macro definition of root_mem_cgroup, making it
always NULL.

The advantage of this, other than pulling the pointer definition out, is
that we avoid wasting global data section space on such a pointer.

Signed-off-by: Qu Wenruo
---
 include/linux/memcontrol.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 030d34e9d117..a268585babdc 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -329,8 +329,6 @@ struct mem_cgroup {
  */
 #define MEMCG_CHARGE_BATCH 64U
 
-extern struct mem_cgroup *root_mem_cgroup;
-
 enum page_memcg_data_flags {
 	/* page->memcg_data is a pointer to an slabobj_ext vector */
 	MEMCG_DATA_OBJEXTS = (1UL << 0),
@@ -346,6 +344,12 @@ enum page_memcg_data_flags {
 
 #define __FIRST_OBJEXT_FLAG (1UL << 0)
 
+/*
+ * For CONFIG_MEMCG=n case, still define a root_mem_cgroup, but that will
+ * always be NULL and not taking any global data section space.
+ */
+#define root_mem_cgroup (NULL)
+
 #endif /* CONFIG_MEMCG */
 
 enum objext_flags {

From patchwork Fri Jul 19 09:16:58 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13737040
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, cgroups@vger.kernel.org,
 linux-mm@kvack.org, Michal Hocko, Vlastimil Babka
Subject: [PATCH v6 2/3] btrfs: always uses root memcgroup for filemap_add_folio()
Date: Fri, 19 Jul 2024 18:46:58 +0930
Message-ID: <8a425904c03623790d2ffd2ff5ea4944cc6fe876.1721380449.git.wqu@suse.com>
X-Mailer: git-send-email 2.45.2

[BACKGROUND]
The function filemap_add_folio() charges the memory cgroup, as we assume
all page caches are accessible by user space processes and thus need
cgroup accounting.

However, btrfs is a special case: it has very large metadata thanks to its
support of data csum (by default 4 bytes per 4K of data, and as large as
32 bytes per 4K of data).

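To put those figures in perspective, here is a small standalone C sketch
(not btrfs code) of the arithmetic: the raw checksum bytes needed for a
given amount of data. csum_bytes_for() is a hypothetical helper, and the
result ignores btree item and node overhead, so real metadata usage would
be somewhat higher.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Raw checksum bytes needed to cover data_bytes of data, with one checksum
 * of csum_size bytes per block_size bytes of data (partial blocks round up).
 */
static uint64_t csum_bytes_for(uint64_t data_bytes, uint32_t block_size,
			       uint32_t csum_size)
{
	uint64_t blocks;

	assert(block_size > 0);
	blocks = (data_bytes + block_size - 1) / block_size;
	return blocks * csum_size;
}
```

For 1 TiB of data in 4K blocks this gives 1 GiB of checksums at 4 bytes per
block, and 8 GiB at 32 bytes per block.
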
This means btrfs has to go through the page cache for its metadata pages, to
take advantage of both the caching and the reclaim ability of filemap.

This has a small problem: all btrfs metadata pages have to go through the
memcgroup charge, even though those metadata pages are not accessible by
user space, and doing the charging can introduce some latency if there is a
memory limit set.

Btrfs currently uses the __GFP_NOFAIL flag as a workaround for this cgroup
charge situation, so that metadata pages won't really be limited by
memcgroup.

[ENHANCEMENT]
Instead of relying on __GFP_NOFAIL to avoid charge failure, use the root
memory cgroup to attach metadata pages.

With the root memory cgroup, we directly skip the charging part, and only
rely on __GFP_NOFAIL for the real memory allocation part.

Suggested-by: Michal Hocko
Suggested-by: Vlastimil Babka (SUSE)
Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index aa7f8148cd0d..cfeed7673009 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2971,6 +2971,7 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
 
 	struct btrfs_fs_info *fs_info = eb->fs_info;
 	struct address_space *mapping = fs_info->btree_inode->i_mapping;
+	struct mem_cgroup *old_memcg;
 	const unsigned long index = eb->start >> PAGE_SHIFT;
 	struct folio *existing_folio = NULL;
 	int ret;
@@ -2981,8 +2982,17 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
 	ASSERT(eb->folios[i]);
 
 retry:
+	/*
+	 * Btree inode is a btrfs internal inode, and not exposed to any
+	 * user.
+	 * Furthermore we do not want any cgroup limits on this inode.
+	 * So we always use root_mem_cgroup as our active memcg when attaching
+	 * the folios.
+	 */
+	old_memcg = set_active_memcg(root_mem_cgroup);
 	ret = filemap_add_folio(mapping, eb->folios[i], index + i,
 				GFP_NOFS | __GFP_NOFAIL);
+	set_active_memcg(old_memcg);
 	if (!ret)
 		goto finish;
 

From patchwork Fri Jul 19 09:16:59 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13737041
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, cgroups@vger.kernel.org,
 linux-mm@kvack.org
Subject: [PATCH v6 3/3] btrfs: prefer to allocate larger folio for metadata
Date: Fri, 19 Jul 2024 18:46:59 +0930
Message-ID: <559b93a99b9640b5857bbb93c35b5f361c941964.1721380449.git.wqu@suse.com>
X-Mailer: git-send-email 2.45.2

For btrfs metadata, the high order folios are only utilized when all the
following conditions are met:

- The extent buffer start is aligned to nodesize
  This should be the common case for any btrfs in the last 5 years.

- The nodesize is larger than page size
  Or there is no need to use larger folios at all.

- MM layer can fulfill our folio allocation request

- The larger folio must exactly cover the extent buffer
  No larger, no smaller; it must be an exact fit.

  This is to make extent buffer accessors much easier.
  They only need to check the first slot in eb->folios[] to determine
  their access unit (per-page handling, or a large folio covering the
  whole eb).

There is another small blockage: filemap APIs can not guarantee the folio
size.
For example, by default we go 16K nodesize on x86_64, meaning the larger
folio we expect would be order 2 (size 16K).
We don't accept 2 order 1 (size 8K) folios; it is either one order 2 folio,
or we fall back to 4 order 0 (page sized) folios.

So here we go with a different workaround: allocate an order 2 folio first,
then attach it to the filemap of metadata.

Thus there are several possible results of the attempt to attach eb folios:

1) We can attach the pre-allocated eb folio to filemap
   This is the simplest case and the hot path; we just continue our work
   setting up the extent buffer.

2) There is an existing folio in the filemap

   2.0) Subpage case
        We would reuse the folio no matter what; subpage handles
        folio->private differently (a bitmap rather than a pointer to an
        existing eb).

   2.1) There is already a live extent buffer attached to the filemap
        folio
        This should be more or less a hot path; we grab the existing eb
        and free the current one.

   2.2) No live eb.

   2.2.1) The filemap folio is larger than eb folio
          This is a better case; we can reuse the filemap folio, but
          we need to clean up all the pre-allocated folios of the
          new eb before reusing.
          Later code should take the folio size change into
          consideration.

   2.2.2) The filemap folio is the same size as eb folio
          We just free the current folio, and reuse the filemap one.
          No other special handling needed.

   2.2.3) The filemap folio is smaller than eb folio
          This is the trickiest corner case; we can not easily replace
          the folio in filemap using our eb folio.

          Thus here we return -EAGAIN, to inform our caller to retry
          with order 0 (of course with our larger folio freed).

Otherwise all the needed infrastructure is already here, we only need to
try allocating a larger folio as our first try in alloc_eb_folio_array().

For now, the higher order allocation is only a preferable attempt for debug
builds, until we have enough test coverage to push it to end users.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c | 102 ++++++++++++++++++++++++++++---------------
 1 file changed, 68 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index cfeed7673009..d7824644d593 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -719,12 +719,28 @@ int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
  *
  * For now, the folios populated are always in order 0 (aka, single page).
  */
-static int alloc_eb_folio_array(struct extent_buffer *eb, bool nofail)
+static int alloc_eb_folio_array(struct extent_buffer *eb, int order,
+				bool nofail)
 {
 	struct page *page_array[INLINE_EXTENT_BUFFER_PAGES] = { 0 };
 	int num_pages = num_extent_pages(eb);
 	int ret;
 
+	if (order) {
+		gfp_t gfp;
+
+		if (order > 0)
+			gfp = GFP_NOFS | __GFP_NORETRY | __GFP_NOWARN;
+		else
+			gfp = nofail ? (GFP_NOFS | __GFP_NOFAIL) : GFP_NOFS;
+		eb->folios[0] = folio_alloc(gfp, order);
+		if (likely(eb->folios[0])) {
+			eb->folio_size = folio_size(eb->folios[0]);
+			eb->folio_shift = folio_shift(eb->folios[0]);
+			return 0;
+		}
+		/* Fallback to 0 order (single page) allocation. */
+	}
 	ret = btrfs_alloc_page_array(num_pages, page_array, nofail);
 	if (ret < 0)
 		return ret;
@@ -2707,7 +2723,7 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 	 */
 	set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);
 
-	ret = alloc_eb_folio_array(new, false);
+	ret = alloc_eb_folio_array(new, 0, false);
 	if (ret) {
 		btrfs_release_extent_buffer(new);
 		return NULL;
@@ -2740,7 +2756,7 @@ struct extent_buffer *__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info,
 	if (!eb)
 		return NULL;
 
-	ret = alloc_eb_folio_array(eb, false);
+	ret = alloc_eb_folio_array(eb, 0, false);
 	if (ret)
 		goto err;
 
@@ -2955,6 +2971,14 @@ static int check_eb_alignment(struct btrfs_fs_info *fs_info, u64 start)
 	return 0;
 }
 
+static void free_all_eb_folios(struct extent_buffer *eb)
+{
+	for (int i = 0; i < INLINE_EXTENT_BUFFER_PAGES; i++) {
+		if (eb->folios[i])
+			folio_put(eb->folios[i]);
+		eb->folios[i] = NULL;
+	}
+}
 
 /*
  * Return 0 if eb->folios[i] is attached to btree inode successfully.
@@ -2974,6 +2998,7 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
 	struct mem_cgroup *old_memcg;
 	const unsigned long index = eb->start >> PAGE_SHIFT;
 	struct folio *existing_folio = NULL;
+	const int eb_order = folio_order(eb->folios[0]);
 	int ret;
 
 	ASSERT(found_eb_ret);
@@ -3003,15 +3028,6 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
 		goto retry;
 	}
 
-	/* For now, we should only have single-page folios for btree inode. */
-	ASSERT(folio_nr_pages(existing_folio) == 1);
-
-	if (folio_size(existing_folio) != eb->folio_size) {
-		folio_unlock(existing_folio);
-		folio_put(existing_folio);
-		return -EAGAIN;
-	}
-
 finish:
 	spin_lock(&mapping->i_private_lock);
 	if (existing_folio && fs_info->nodesize < PAGE_SIZE) {
@@ -3020,6 +3036,7 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
 		eb->folios[i] = existing_folio;
 	} else if (existing_folio) {
 		struct extent_buffer *existing_eb;
+		int existing_order = folio_order(existing_folio);
 
 		existing_eb = grab_extent_buffer(fs_info,
 					folio_page(existing_folio, 0));
@@ -3031,9 +3048,34 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
 			folio_put(existing_folio);
 			return 1;
 		}
-		/* The extent buffer no longer exists, we can reuse the folio. */
-		__free_page(folio_page(eb->folios[i], 0));
-		eb->folios[i] = existing_folio;
+		if (existing_order > eb_order) {
+			/*
+			 * The existing one has higher order, we need to drop
+			 * all eb folios before reusing it.
+			 * And this should only happen for the first folio.
+			 */
+			ASSERT(i == 0);
+			free_all_eb_folios(eb);
+			eb->folios[i] = existing_folio;
+		} else if (existing_order == eb_order) {
+			/*
+			 * Can safely reuse the filemap folio, just
+			 * release the eb one.
+			 */
+			folio_put(eb->folios[i]);
+			eb->folios[i] = existing_folio;
+		} else {
+			/*
+			 * The existing one has lower order.
+			 *
+			 * Just retry and fallback to order 0.
+			 */
+			ASSERT(i == 0);
+			folio_unlock(existing_folio);
+			folio_put(existing_folio);
+			spin_unlock(&mapping->i_private_lock);
+			return -EAGAIN;
+		}
 	}
 	eb->folio_size = folio_size(eb->folios[i]);
 	eb->folio_shift = folio_shift(eb->folios[i]);
@@ -3066,6 +3108,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	u64 lockdep_owner = owner_root;
 	bool page_contig = true;
 	int uptodate = 1;
+	int order = 0;
 	int ret;
 
 	if (check_eb_alignment(fs_info, start))
@@ -3082,6 +3125,10 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		btrfs_warn_32bit_limit(fs_info);
 #endif
 
+	if (IS_ENABLED(CONFIG_BTRFS_DEBUG) && fs_info->nodesize > PAGE_SIZE &&
+	    IS_ALIGNED(start, fs_info->nodesize))
+		order = ilog2(fs_info->nodesize >> PAGE_SHIFT);
+
 	eb = find_extent_buffer(fs_info, start);
 	if (eb)
 		return eb;
@@ -3116,7 +3163,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 
 reallocate:
 	/* Allocate all pages first. */
-	ret = alloc_eb_folio_array(eb, true);
+	ret = alloc_eb_folio_array(eb, order, true);
 	if (ret < 0) {
 		btrfs_free_subpage(prealloc);
 		goto out;
@@ -3134,26 +3181,12 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		}
 
 		/*
-		 * TODO: Special handling for a corner case where the order of
-		 * folios mismatch between the new eb and filemap.
-		 *
-		 * This happens when:
-		 *
-		 * - the new eb is using higher order folio
-		 *
-		 * - the filemap is still using 0-order folios for the range
-		 *   This can happen at the previous eb allocation, and we don't
-		 *   have higher order folio for the call.
-		 *
-		 * - the existing eb has already been freed
-		 *
-		 * In this case, we have to free the existing folios first, and
-		 * re-allocate using the same order.
-		 * Thankfully this is not going to happen yet, as we're still
-		 * using 0-order folios.
+		 * Got a corner case where the existing folio is lower order,
+		 * fallback to 0 order and retry.
 		 */
 		if (unlikely(ret == -EAGAIN)) {
-			ASSERT(0);
+			order = 0;
+			free_all_eb_folios(eb);
 			goto reallocate;
 		}
 		attached++;
@@ -3164,6 +3197,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		 * and free the allocated page.
 		 */
 		folio = eb->folios[i];
+		num_folios = num_extent_folios(eb);
 		WARN_ON(btrfs_folio_test_dirty(fs_info, folio, eb->start, eb->len));
 
 		/*