From patchwork Wed Sep 25 22:47:15 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shakeel Butt
X-Patchwork-Id: 13812501
From: Shakeel Butt
To: Andrew Morton
Cc: Johannes Weiner, Matthew Wilcox, Omar Sandoval, Chris Mason,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Meta kernel team
Subject: [PATCH v2 1/2] mm: optimize truncation of shadow entries
Date: Wed, 25 Sep 2024 15:47:15 -0700
Message-ID: <20240925224716.2904498-2-shakeel.butt@linux.dev>
In-Reply-To: <20240925224716.2904498-1-shakeel.butt@linux.dev>
References: <20240925224716.2904498-1-shakeel.butt@linux.dev>

The kernel truncates the page cache in batches of PAGEVEC_SIZE. For each
batch, it traverses the page cache tree and collects the entries (folio
and shadow entries) in the struct folio_batch. For the shadow entries
present in the folio_batch, it has to traverse the page cache tree for
each individual entry to remove them. This patch optimizes this by
removing them in a single tree traversal.

On large machines in our production which run workloads manipulating
large amounts of data, we have observed that a large amount of CPU time
is spent on truncation of very large files (file sizes in the 100s of
GiB).
More specifically, most of the time was spent on shadow entries cleanup,
so optimizing the shadow entries cleanup, even a little bit, has a good
impact.

To evaluate the changes, we created a 200GiB file on a fuse fs and in a
memcg. We created the shadow entries by triggering reclaim through
memory.reclaim in that specific memcg and measured the simple truncation
operation.

 # time truncate -s 0 file

              time (sec)
Without       5.164 +- 0.059
With-patch    4.21  +- 0.066 (18.47% decrease)

Acked-by: Johannes Weiner
Signed-off-by: Shakeel Butt
---
Changes since v1:
- Added a comment on the assumption of indices array (Johannes)

 mm/truncate.c | 53 +++++++++++++++++++++++++--------------------------
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 0668cd340a46..1d51c023d9c5 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -68,54 +68,53 @@ static void clear_shadow_entries(struct address_space *mapping,
  * Unconditionally remove exceptional entries. Usually called from truncate
  * path. Note that the folio_batch may be altered by this function by removing
  * exceptional entries similar to what folio_batch_remove_exceptionals() does.
+ * Please note that indices[] has entries in ascending order as guaranteed by
+ * either find_get_entries() or find_lock_entries().
  */
 static void truncate_folio_batch_exceptionals(struct address_space *mapping,
 				struct folio_batch *fbatch, pgoff_t *indices)
 {
+	XA_STATE(xas, &mapping->i_pages, indices[0]);
+	int nr = folio_batch_count(fbatch);
+	struct folio *folio;
 	int i, j;
-	bool dax;

 	/* Handled by shmem itself */
 	if (shmem_mapping(mapping))
 		return;

-	for (j = 0; j < folio_batch_count(fbatch); j++)
+	for (j = 0; j < nr; j++)
 		if (xa_is_value(fbatch->folios[j]))
 			break;

-	if (j == folio_batch_count(fbatch))
+	if (j == nr)
 		return;

-	dax = dax_mapping(mapping);
-	if (!dax) {
-		spin_lock(&mapping->host->i_lock);
-		xa_lock_irq(&mapping->i_pages);
+	if (dax_mapping(mapping)) {
+		for (i = j; i < nr; i++) {
+			if (xa_is_value(fbatch->folios[i]))
+				dax_delete_mapping_entry(mapping, indices[i]);
+		}
+		goto out;
 	}

-	for (i = j; i < folio_batch_count(fbatch); i++) {
-		struct folio *folio = fbatch->folios[i];
-		pgoff_t index = indices[i];
-
-		if (!xa_is_value(folio)) {
-			fbatch->folios[j++] = folio;
-			continue;
-		}
+	xas_set(&xas, indices[j]);
+	xas_set_update(&xas, workingset_update_node);

-		if (unlikely(dax)) {
-			dax_delete_mapping_entry(mapping, index);
-			continue;
-		}
+	spin_lock(&mapping->host->i_lock);
+	xas_lock_irq(&xas);

-		__clear_shadow_entry(mapping, index, folio);
+	xas_for_each(&xas, folio, indices[nr-1]) {
+		if (xa_is_value(folio))
+			xas_store(&xas, NULL);
 	}

-	if (!dax) {
-		xa_unlock_irq(&mapping->i_pages);
-		if (mapping_shrinkable(mapping))
-			inode_add_lru(mapping->host);
-		spin_unlock(&mapping->host->i_lock);
-	}
-	fbatch->nr = j;
+	xas_unlock_irq(&xas);
+	if (mapping_shrinkable(mapping))
+		inode_add_lru(mapping->host);
+	spin_unlock(&mapping->host->i_lock);
+out:
+	folio_batch_remove_exceptionals(fbatch);
 }

 /**
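
For readers unfamiliar with the xarray cursor API, the difference between
the per-entry removal and the single traversal described in the commit
message can be sketched in userspace C. This is only an illustrative toy,
not kernel code: struct toy_slot, toy_lookup(), toy_clear_each() and
toy_clear_range() are hypothetical names, and a sorted array with binary
search stands in for the page cache tree. Like the xas_for_each() loop in
the patch, the range variant clears every shadow entry it meets between
the first and last index, which assumes indices[] is in ascending order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one page cache slot; not a kernel type. */
struct toy_slot {
	unsigned long index;	/* page offset; slots[] is sorted ascending */
	bool is_shadow;		/* true: shadow entry, false: folio */
	bool present;		/* false once the entry has been removed */
};

/* Binary search for the first slot whose index is >= target. */
static size_t toy_lookup(const struct toy_slot *slots, size_t n,
			 unsigned long target)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (slots[mid].index < target)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Per-entry shape: one lookup per index in indices[]. Returns lookups. */
static size_t toy_clear_each(struct toy_slot *slots, size_t n,
			     const unsigned long *indices, size_t nr)
{
	size_t lookups = 0;

	for (size_t i = 0; i < nr; i++) {
		size_t pos = toy_lookup(slots, n, indices[i]);

		lookups++;
		if (pos < n && slots[pos].index == indices[i] &&
		    slots[pos].is_shadow)
			slots[pos].present = false;
	}
	return lookups;
}

/* Single-traversal shape: one lookup, then a forward walk. Returns lookups. */
static size_t toy_clear_range(struct toy_slot *slots, size_t n,
			      const unsigned long *indices, size_t nr)
{
	if (nr == 0)
		return 0;
	/* Relies on ascending order, as the comment on indices[] notes. */
	assert(indices[0] <= indices[nr - 1]);

	size_t pos = toy_lookup(slots, n, indices[0]);

	for (; pos < n && slots[pos].index <= indices[nr - 1]; pos++)
		if (slots[pos].is_shadow)
			slots[pos].present = false;
	return 1;
}
```

Calling both helpers on identical copies of a slot array with the same
ascending indices[] should leave the same slots present, while the range
variant performs one lookup instead of one per index. In the kernel the
saving comes from not re-descending the tree for each entry; the toy only
counts lookups and makes no claim about the measured speedup above.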