From patchwork Fri Jan 10 15:46:57 2025
X-Patchwork-Submitter: Nikita Kalyazin
X-Patchwork-Id: 13935058
From: Nikita Kalyazin
Subject: [RFC PATCH 0/2] mm: filemap: add filemap_grab_folios
Date: Fri, 10 Jan 2025 15:46:57 +0000
Message-ID: <20250110154659.95464-1-kalyazin@amazon.com>
X-Mailer: git-send-email 2.40.1
MIME-Version: 1.0

Based on David's suggestion for speeding up guest_memfd memory population [1], made at the guest_memfd upstream call on 5 Dec 2024 [2], this series adds `filemap_grab_folios`, which grabs multiple folios at a time.

Motivation

When profiling guest_memfd population and comparing the results with population of anonymous memory via UFFDIO_COPY, I observed that the former was up to 20% slower, mainly due to adding newly allocated pages to the pagecache. As far as I can see, the two main contributors are pagecache locking and the tree traversals needed for every folio. The RFC attempts to partially mitigate those by adding multiple folios at a time to the pagecache.

Testing

With the change applied, I observed a 10.3% speedup (708 ms to 635 ms) in a selftest that populated a 3 GiB guest_memfd, and a 9.5% speedup (990 ms to 904 ms) when restoring a 3 GiB guest_memfd VM snapshot using a custom Firecracker version, both on Intel Ice Lake.

Limitations

While `filemap_grab_folios` would need to handle THP/large folios internally and deal with reclaim artifacts in the pagecache (shadows), for simplicity reasons the RFC does not support those, as it demonstrates the optimisation applied to guest_memfd, which only uses small folios and does not support reclaim at the moment.

Implementation

I am aware of existing filemap APIs operating on folio batches; however, I was not able to find one that fits the use case in question. I also considered making use of the `folio_batch` struct, but was not able to convince myself that it was useful here. Instead, a plain array of folio pointers is allocated on the stack and passed down the call chain. A bitmap is used to keep track of the indexes whose folios were already present in the pagecache, to prevent allocations for them. This does not look very clean to me, and I am more than open to hearing about better approaches.

Not being an expert in xarray, I do not know an idiomatic way to advance the index when `xas_next` is called directly after instantiating a state that has never been walked, so I used a call to `xas_set`.

While the series focuses on optimising _adding_ folios to the pagecache, I also experimented with batching the pagecache _querying_. Specifically, I tried to make use of `filemap_get_folios` instead of `filemap_get_entry`, but could not observe any visible speedup.

The series is applied on top of [1]: the 1st patch implements `filemap_grab_folios`, while the 2nd patch makes use of it in guest_memfd's write syscall as a first user.
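To make the intended shape more concrete, below is a rough caller-side sketch. The signature, the batch size, and the `populate_range` helper are hypothetical and only mirror the description above, not the actual patches; it also assumes that, like `filemap_grab_folio`, the grabbed folios come back locked.

#include <linux/pagemap.h>
#include <linux/bitmap.h>

/*
 * Hypothetical signature, based on the description in this cover letter:
 * grab up to @nr folios starting at @index, storing pointers in @folios.
 * Bits set in @present mark slots whose folios were already in the
 * pagecache (no allocation was needed for them).  Returns the number of
 * folios grabbed, or a negative error code.
 */
long filemap_grab_folios(struct address_space *mapping, pgoff_t index,
                         struct folio **folios, unsigned long *present,
                         unsigned int nr);

#define GRAB_BATCH 16 /* illustrative batch size, not taken from the patches */

static int populate_range(struct address_space *mapping, pgoff_t start,
                          unsigned long nr_pages)
{
        pgoff_t index = start;

        while (nr_pages) {
                /* plain on-stack array of folio pointers, as in the RFC */
                struct folio *folios[GRAB_BATCH];
                /* bitmap of slots that were already present in the pagecache */
                DECLARE_BITMAP(present, GRAB_BATCH);
                unsigned int want = min_t(unsigned long, nr_pages, GRAB_BATCH);
                long got, i;

                bitmap_zero(present, GRAB_BATCH);
                got = filemap_grab_folios(mapping, index, folios, present, want);
                if (got <= 0)
                        return got;

                for (i = 0; i < got; i++) {
                        if (!test_bit(i, present)) {
                                /* newly allocated folio: initialise/fill it here */
                        }
                        folio_unlock(folios[i]);
                        folio_put(folios[i]);
                }

                index += got;
                nr_pages -= got;
        }

        return 0;
}

The actual win is expected to come from the helper taking the pagecache lock and walking the tree once per batch rather than once per folio; the sketch only shows how the on-stack array and the bitmap travel together.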
Questions:

- Does the approach look reasonable in general?
- Can the API be kept specialised to the non-reclaim-supported case, or does it need to be generic?
- Would it be sensible to add a specialised small-folio-only version of `filemap_grab_folios` at the beginning and extend it to large folios later on?
- Are there better ways to implement batching, or even achieve the optimisation goal in another way?

[1]: https://lore.kernel.org/kvm/20241129123929.64790-1-kalyazin@amazon.com/T/
[2]: https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqVR8B8mG3cr-cxTxOrAosPOk/edit?tab=t.0

Thanks

Nikita

Nikita Kalyazin (2):
  mm: filemap: add filemap_grab_folios
  KVM: guest_memfd: use filemap_grab_folios in write

 include/linux/pagemap.h |  31 +++++
 mm/filemap.c            | 263 ++++++++++++++++++++++++++++++++++++++++
 virt/kvm/guest_memfd.c  | 176 ++++++++++++++++++++++-----
 3 files changed, 437 insertions(+), 33 deletions(-)


base-commit: 643cff38ebe84c39fbd5a0fc3ab053cd941b9f94