From patchwork Fri Jul 19 10:28:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13737165
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, cgroups@vger.kernel.org,
 linux-mm@kvack.org, Michal Hocko, Vlastimil Babka
Subject: [PATCH v7 2/3] btrfs: always uses root memcgroup for
 filemap_add_folio()
Date: Fri, 19 Jul 2024 19:58:40 +0930
Message-ID: <6a9ba2c8e70c7b5c4316404612f281a031f847da.1721384771.git.wqu@suse.com>
X-Mailer: git-send-email 2.45.2

[BACKGROUND]
The function filemap_add_folio() charges the memory cgroup, as we assume
all page cache pages are accessible by user space processes and thus need
cgroup accounting.

However btrfs is a special case: it has very large metadata thanks to its
support of data csum (by default it's 4 bytes per 4K data, and can be as
large as 32 bytes per 4K data).
This means btrfs has to use the page cache for its metadata pages, to take
advantage of both the caching and the reclaim ability of filemap.

This has a small problem: all btrfs metadata pages have to go through the
memcgroup charge, even though those metadata pages are not accessible by
user space, and doing the charging can introduce some latency if a memory
limit is set.

Btrfs currently uses the __GFP_NOFAIL flag as a workaround for this cgroup
charge situation so that metadata pages won't really be limited by
memcgroup.

[ENHANCEMENT]
Instead of relying on __GFP_NOFAIL to avoid charge failure, use the root
memory cgroup to attach metadata pages.

With the root memory cgroup, we directly skip the charging part, and only
rely on __GFP_NOFAIL for the real memory allocation part.

Suggested-by: Michal Hocko
Suggested-by: Vlastimil Babka (SUSE)
Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index aa7f8148cd0d..cfeed7673009 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2971,6 +2971,7 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
 
 	struct btrfs_fs_info *fs_info = eb->fs_info;
 	struct address_space *mapping = fs_info->btree_inode->i_mapping;
+	struct mem_cgroup *old_memcg;
 	const unsigned long index = eb->start >> PAGE_SHIFT;
 	struct folio *existing_folio = NULL;
 	int ret;
@@ -2981,8 +2982,17 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
 	ASSERT(eb->folios[i]);
 
 retry:
+	/*
+	 * Btree inode is a btrfs internal inode, and not exposed to any
+	 * user.
+	 * Furthermore we do not want any cgroup limits on this inode.
+	 * So we always use root_mem_cgroup as our active memcg when attaching
+	 * the folios.
+	 */
+	old_memcg = set_active_memcg(root_mem_cgroup);
 	ret = filemap_add_folio(mapping, eb->folios[i], index + i,
 				GFP_NOFS | __GFP_NOFAIL);
+	set_active_memcg(old_memcg);
 	if (!ret)
 		goto finish;
 
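
For readers less familiar with the active-memcg override, the save/override/restore bracket around filemap_add_folio() can be modelled in plain userspace C. This is only a rough sketch and not kernel code: memcg_model, task_model, root_model and the model_*() helpers are hypothetical names made up for illustration, and the limit check is a deliberately simplified stand-in for the real charge path. It only shows the shape the patch relies on, namely that the setter returns the previous value so the caller can restore it, and that a charge made while the root group is active is skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a memory cgroup. */
struct memcg_model {
	const char *name;
	size_t limit_pages;	/* 0 means "no limit" */
	size_t charged_pages;
};

/* Stand-in for the root group: never charged, never limited. */
static struct memcg_model root_model = { "root", 0, 0 };

/* Hypothetical per-task state: the task's own group plus an override. */
struct task_model {
	struct memcg_model *own;
	struct memcg_model *active;	/* NULL when no override is set */
};

/* Install an override and hand back the previous one for later restore. */
static struct memcg_model *model_set_active(struct task_model *t,
					    struct memcg_model *memcg)
{
	struct memcg_model *old = t->active;

	t->active = memcg;
	return old;
}

/* Charge one page. Returns 0 on success, -1 when the limit is hit. */
static int model_charge_page(struct task_model *t)
{
	struct memcg_model *memcg = t->active ? t->active : t->own;

	assert(memcg != NULL);
	if (memcg == &root_model)
		return 0;	/* root group: charging is skipped */
	if (memcg->limit_pages && memcg->charged_pages >= memcg->limit_pages)
		return -1;
	memcg->charged_pages++;
	return 0;
}

/* Same bracket as in the hunk: override, add the page, restore. */
static int model_attach_metadata_page(struct task_model *t)
{
	struct memcg_model *old = model_set_active(t, &root_model);
	int ret = model_charge_page(t);

	model_set_active(t, old);
	return ret;
}
```

In this model, a task whose own group is already at its limit fails model_charge_page() but still succeeds in model_attach_metadata_page(), and its override is back to the previous value afterwards.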