From patchwork Sat Sep 28 04:45:56 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13814626
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org,
 cgroups@vger.kernel.org, linux-mm@kvack.org, Michal Hocko,
 "Vlastimil Babka (SUSE)"
Subject: [PATCH] btrfs: root memcgroup for metadata filemap_add_folio()
Date: Sat, 28 Sep 2024 14:15:56 +0930
X-Mailer: git-send-email 2.46.1

[BACKGROUND]
The function filemap_add_folio() charges the memory cgroup, because we
assume all page cache is accessible by user space processes and thus
needs cgroup accounting.

However, btrfs is a special case: it has a very large amount of metadata
thanks to its support for data checksums (by default 4 bytes per 4K of
data, and as large as 32 bytes per 4K of data).
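
To put the checksum sizes above in perspective, here is a rough sketch of
the arithmetic. The helper name csum_bytes_for_data() is hypothetical (it is
not a btrfs function), and the result is only a lower bound: it counts the
raw checksum bytes and ignores b-tree item and node overhead.

```c
#include <stdint.h>

/* Assumption: one checksum per 4 KiB of data, as stated in the text. */
#define DATA_BLOCK_SIZE 4096ULL

/*
 * Hypothetical helper, not a btrfs function: raw checksum bytes needed to
 * cover data_bytes of file data, ignoring b-tree item and node overhead.
 */
static uint64_t csum_bytes_for_data(uint64_t data_bytes, uint64_t csum_size)
{
	/* Round up so a partial block still needs one checksum. */
	uint64_t blocks = (data_bytes + DATA_BLOCK_SIZE - 1) / DATA_BLOCK_SIZE;

	return blocks * csum_size;
}
```

For 1 TiB of data this gives 1 GiB of checksums at 4 bytes per block, and
8 GiB at 32 bytes per block.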
This means btrfs has to use the page cache for its metadata pages, to take
advantage of both the caching and the reclaim ability of filemap.

This has a small problem: all btrfs metadata pages have to go through the
memcgroup charge, even though those metadata pages are not accessible by
user space, and the charging can introduce some latency if a memory limit
is set.

Btrfs currently uses the __GFP_NOFAIL flag as a workaround for this cgroup
charge situation, so that metadata pages won't really be limited by the
memcgroup.

[ENHANCEMENT]
Instead of relying on __GFP_NOFAIL to avoid charge failure, use the root
memory cgroup to attach metadata pages.

This needs to export the symbol root_mem_cgroup for CONFIG_MEMCG, and to
define root_mem_cgroup as NULL for !CONFIG_MEMCG.

With the root memory cgroup, we directly skip the charging part, and only
rely on __GFP_NOFAIL for the real memory allocation part.

Suggested-by: Michal Hocko
Suggested-by: Vlastimil Babka (SUSE)
Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c       | 9 +++++++++
 include/linux/memcontrol.h | 2 ++
 mm/memcontrol.c            | 1 +
 3 files changed, 12 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9302fde9c464..a3a3fb825a47 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2919,6 +2919,7 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
 	struct address_space *mapping = fs_info->btree_inode->i_mapping;
 	const unsigned long index = eb->start >> PAGE_SHIFT;
 	struct folio *existing_folio = NULL;
+	struct mem_cgroup *old_memcg;
 	int ret;
 
 	ASSERT(found_eb_ret);
@@ -2927,8 +2928,16 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
 	ASSERT(eb->folios[i]);
 
 retry:
+	/*
+	 * Btree inode is a btrfs internal inode, and not exposed to any user.
+	 *
+	 * We do not want any cgroup limit on this inode, thus using
+	 * root_mem_cgroup for metadata filemap.
+	 */
+	old_memcg = set_active_memcg(root_mem_cgroup);
 	ret = filemap_add_folio(mapping, eb->folios[i], index + i,
 				GFP_NOFS | __GFP_NOFAIL);
+	set_active_memcg(old_memcg);
 	if (!ret)
 		goto finish;
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0e5bf25d324f..efec74344a4d 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1067,6 +1067,8 @@ void split_page_memcg(struct page *head, int old_order, int new_order);
 
 #define MEM_CGROUP_ID_SHIFT	0
 
+#define root_mem_cgroup		(NULL)
+
 static inline struct mem_cgroup *folio_memcg(struct folio *folio)
 {
 	return NULL;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d563fb515766..2dd1f286364d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -75,6 +75,7 @@ struct cgroup_subsys memory_cgrp_subsys __read_mostly;
 EXPORT_SYMBOL(memory_cgrp_subsys);
 
 struct mem_cgroup *root_mem_cgroup __read_mostly;
+EXPORT_SYMBOL(root_mem_cgroup);
 
 /* Active memory cgroup to use from an interrupt context */
 DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg);
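
For readers less familiar with the save/override/restore pattern used around
filemap_add_folio() in this patch, the following is a userspace toy model of
it, not kernel code. Every name in it (toy_memcg, toy_set_active(),
toy_charge_page(), toy_add_metadata_page()) is invented for illustration. It
assumes two things taken from the description above: the active override
takes precedence over the task's own cgroup, and charges against the root
cgroup are skipped entirely.

```c
#include <stddef.h>

/* Toy stand-in for struct mem_cgroup: a page counter and a page limit. */
struct toy_memcg {
	long charged;	/* pages charged so far */
	long limit;	/* maximum pages this group may hold */
};

static struct toy_memcg toy_root;			/* never charged or limited */
static struct toy_memcg *toy_current = &toy_root;	/* cgroup of the running task */
static struct toy_memcg *toy_active;			/* temporary override, or NULL */

/* Install a new override and return the previous one so it can be restored. */
static struct toy_memcg *toy_set_active(struct toy_memcg *memcg)
{
	struct toy_memcg *old = toy_active;

	toy_active = memcg;
	return old;
}

/* Charge one page to the override if set, else to the task's own cgroup. */
static int toy_charge_page(void)
{
	struct toy_memcg *memcg = toy_active ? toy_active : toy_current;

	if (memcg == &toy_root)
		return 0;	/* assumption: root charges are skipped */
	if (memcg->charged >= memcg->limit)
		return -1;	/* over limit: the charge fails */
	memcg->charged++;
	return 0;
}

/* Mirrors the shape of the patched call site: override, charge, restore. */
static int toy_add_metadata_page(void)
{
	struct toy_memcg *old = toy_set_active(&toy_root);
	int ret = toy_charge_page();

	toy_set_active(old);
	return ret;
}
```

In this model a task in a cgroup that has already hit its limit fails an
ordinary toy_charge_page() call, while toy_add_metadata_page() still
succeeds and leaves both the task's counter and the previous override
untouched.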