From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, "Matthew Wilcox (Oracle)", Hugh Dickins, Ryan Roberts, Yin Fengwei, Mike Kravetz, Muchun Song, Peter Xu
Subject: [PATCH v2 00/40] mm/rmap: interface overhaul
Date: Wed, 20 Dec 2023 23:44:24 +0100
Message-ID: <20231220224504.646757-1-david@redhat.com>
This series overhauls the rmap interface to get rid of the "bool compound" / RMAP_COMPOUND parameter, with the goal of making the interface less error-prone, more future-proof, and more natural to extend to "batching". It also converts the interface to always consume folio+subpage, which speeds up operations on large folios.

Further, this series adds PTE-batching variants for 4 rmap functions. Of these, only folio_add_anon_rmap_ptes() is used for batching in this series, when PTE-remapping a PMD-mapped THP. folio_remove_rmap_ptes(), folio_try_dup_anon_rmap_ptes() and folio_dup_file_rmap_ptes() will soon come in handy [1,2].

This series performs a lot of folio conversion along the way. Most of the lines added in the diff are documentation.

As we're moving to a pte/pmd interface where we clearly express the mapping granularity we are dealing with, we first get the remainder of hugetlb out of the way, as it is special and expected to remain special: it treats everything as a "single logical PTE" and currently only allows entire mappings.

Even if we ever support partial mappings, I strongly assume the interface and implementation will still differ heavily: hopefully we can avoid working on subpages/subpage mapcounts completely and only add a "count" parameter for them to enable batching.
New (extended) hugetlb interface that operates on the entire folio:
* hugetlb_add_new_anon_rmap() -> Already existed
* hugetlb_add_anon_rmap() -> Already existed
* hugetlb_try_dup_anon_rmap()
* hugetlb_try_share_anon_rmap()
* hugetlb_add_file_rmap()
* hugetlb_remove_rmap()

New "ordinary" interface for small folios / THP:
* folio_add_new_anon_rmap() -> Already existed
* folio_add_anon_rmap_[pte|ptes|pmd]()
* folio_try_dup_anon_rmap_[pte|ptes|pmd]()
* folio_try_share_anon_rmap_[pte|pmd]()
* folio_add_file_rmap_[pte|ptes|pmd]()
* folio_dup_file_rmap_[pte|ptes|pmd]()
* folio_remove_rmap_[pte|ptes|pmd]()

folio_add_new_anon_rmap() will always map at the largest granularity possible (currently, a single PMD to cover a PMD-sized THP). This could be extended if ever required.

In the future, we might want "_pud" variants and eventually "_pmds" variants for batching.

I ran some simple microbenchmarks on an Intel(R) Xeon(R) Silver 4210R, measuring munmap(), fork(), COW, MADV_DONTNEED on each PTE ... and PTE-remapping PMD-mapped THPs on 1 GiB of memory.

For small folios, there is barely a change (< 1% improvement for me).

For PTE-mapped THP:
* PTE-remapping a PMD-mapped THP is more than 10% faster.
* fork() is more than 4% faster.
* MADV_DONTNEED is 2% faster.
* COW when writing only a single byte on a COW-shared PTE is 1% faster.
* munmap() barely changes (< 1%).

[1] https://lkml.kernel.org/r/20230810103332.3062143-1-ryan.roberts@arm.com
[2] https://lkml.kernel.org/r/20231204105440.61448-1-ryan.roberts@arm.com

---

If we pull this into mm/unstable in 2023, I'll have my notebook ready to debug next to the Christmas tree. ;)

Based on current mm/mm-unstable. Compile-tested with/without THP on x86-64 and with defconfig on a bunch more. Tested on x86-64.

v1 -> v2:
* Rebased on top of mm-unstable (minor conflicts)
* Move some sanity checks from #6 into #2 -> #5 and leave the remainder in #6
* Call it "rmap_level" instead of "rmap_mode".
* Consistently use "int" instead of "unsigned int" in rmap code
* Drop some stale comments
* Minor comment/description fixups + additions
* Spotted one last comment leftover, addressed in the (new) last patch
* Added RBs

RFC -> v1:
* Rebased on top of mm-unstable (containing mTHP)
* Use switch()-case and __always_inline for helper functions
* Fixed some (intermittent) compile issues and some smaller stuff
* folio_try_dup_anon_rmap_[pte|ptes|pmd]() rewrite
* Pass nr_pages consistently as "int"
* Simplify sanity checks
* Added RBs

Cc: Andrew Morton
Cc: "Matthew Wilcox (Oracle)"
Cc: Hugh Dickins
Cc: Ryan Roberts
Cc: Yin Fengwei
Cc: Mike Kravetz
Cc: Muchun Song
Cc: Peter Xu

David Hildenbrand (40):
  mm/rmap: rename hugepage_add* to hugetlb_add*
  mm/rmap: introduce and use hugetlb_remove_rmap()
  mm/rmap: introduce and use hugetlb_add_file_rmap()
  mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()
  mm/rmap: introduce and use hugetlb_try_share_anon_rmap()
  mm/rmap: add hugetlb sanity checks for anon rmap handling
  mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()
  mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()
  mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()
  mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()
  mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()
  mm/rmap: remove page_add_file_rmap()
  mm/rmap: factor out adding folio mappings into __folio_add_rmap()
  mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()
  mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()
  mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()
  mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/rmap: remove page_add_anon_rmap()
  mm/rmap: remove RMAP_COMPOUND
  mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()
  kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()
  mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()
  Documentation: stop referring to page_remove_rmap()
  mm/rmap: remove page_remove_rmap()
  mm/rmap: convert page_dup_file_rmap() to folio_dup_file_rmap_[pte|ptes|pmd]()
  mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()
  mm/huge_memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pmd()
  mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()
  mm/rmap: remove page_try_dup_anon_rmap()
  mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]()
  mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED
  mm: remove one last reference to page_add_*_rmap()

 Documentation/mm/transhuge.rst       |   4 +-
 Documentation/mm/unevictable-lru.rst |   4 +-
 include/linux/mm.h                   |   6 +-
 include/linux/rmap.h                 | 397 +++++++++++++++++++-----
 kernel/events/uprobes.c              |   2 +-
 mm/filemap.c                         |  10 +-
 mm/gup.c                             |   2 +-
 mm/huge_memory.c                     |  85 +++---
 mm/hugetlb.c                         |  21 +-
 mm/internal.h                        |  14 +-
 mm/khugepaged.c                      |  17 +-
 mm/ksm.c                             |  15 +-
 mm/memory-failure.c                  |   4 +-
 mm/memory.c                          |  60 ++--
 mm/migrate.c                         |  12 +-
 mm/migrate_device.c                  |  41 +--
 mm/mmu_gather.c                      |   2 +-
 mm/rmap.c                            | 433 ++++++++++++++++-----------
 mm/swapfile.c                        |   2 +-
 mm/userfaultfd.c                     |   2 +-
 20 files changed, 739 insertions(+), 394 deletions(-)

base-commit: 2072407a394d0b3a3056f78a5630903da9471db0