From patchwork Mon Dec 4 14:21:07 2023
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton,
 "Matthew Wilcox (Oracle)", Hugh Dickins, Ryan Roberts, Yin Fengwei,
 Mike Kravetz, Muchun Song, Peter Xu
Subject: [PATCH RFC 00/39] mm/rmap: interface overhaul
Date: Mon, 4 Dec 2023 15:21:07 +0100
Message-ID: <20231204142146.91437-1-david@redhat.com>
Based on mm-stable from a couple of days ago.

This series proposes an overhaul of our rmap interface, getting rid of the
"bool compound" / RMAP_COMPOUND parameter with the goal of making the
interface less error-prone, more future-proof, and more natural to extend
to "batching". It also converts the interface to always consume
folio+subpage, which speeds up operations on large folios.

Further, this series adds PTE-batching variants for 4 rmap functions,
whereby only folio_add_anon_rmap_ptes() is used for batching in this
series when PTE-remapping a PMD-mapped THP.

Ryan has a series where we would make use of folio_remove_rmap_ptes() [1]
-- he carries his own batching variant right now -- and
folio_try_dup_anon_rmap_ptes()/folio_dup_file_rmap_ptes() [2]. There is
some overlap with both series (and some other work, like multi-size
THP [3]), so that will need some coordination, and likely a stepwise
inclusion. I got that started [4], but it made sense to show the whole
picture. The patches of [4] are contained in here, with one additional
patch added ("mm/rmap: introduce and use hugetlb_try_share_anon_rmap()")
and some slight patch-description changes.
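To illustrate what "PTE-batching" buys us, here is a hedged sketch of the
shape of the change when PTE-remapping a PMD-mapped THP. This is a
kernel-context fragment, not standalone-compilable code, and the loop/call
sites are paraphrased for illustration rather than copied from the patches:

```c
/*
 * Sketch only: parameter order and flag names are paraphrased from this
 * series and may differ in detail from the actual patches.
 *
 * Before: one rmap call per subpage when splitting a PMD mapping.
 */
for (i = 0; i < HPAGE_PMD_NR; i++)
	page_add_anon_rmap(page + i, vma, addr + i * PAGE_SIZE, RMAP_NONE);

/* After: a single batched call covering all PTEs of the folio. */
folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, vma, addr, RMAP_NONE);
```

The batched variant walks the folio once (one round of mapcount/statistics
updates) instead of doing per-subpage work HPAGE_PMD_NR times.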
In general, RMAP batching is an important optimization for PTE-mapped THP,
especially once we want to move towards a total mapcount or further, as
shown with my WIP patches on "mapped shared vs. mapped exclusively" [5].
The rmap batching part of [5] is also contained here in a slightly
reworked form [and I found a bug due to the "compound" parameter handling
in these patches that should be fixed here :) ].

This series performs a lot of folio conversion that could be separated out
if there is a good reason. Most of the added LOC in the diff are only due
to documentation.

As we're moving to a pte/pmd interface where we clearly express the
mapping granularity we are dealing with, we first get the remainder of
hugetlb out of the way, as it is special and expected to remain special:
it treats everything as a "single logical PTE" and currently only allows
entire mappings. Even if we'd ever support partial mappings, I strongly
assume the interface and implementation will still differ heavily:
hopefully we can avoid working on subpages/subpage mapcounts completely
and only add a "count" parameter for them to enable batching.

New (extended) hugetlb interface that operates on entire folios:
 * hugetlb_add_new_anon_rmap() -> Already existed
 * hugetlb_add_anon_rmap() -> Already existed
 * hugetlb_try_dup_anon_rmap()
 * hugetlb_try_share_anon_rmap()
 * hugetlb_add_file_rmap()
 * hugetlb_remove_rmap()

New "ordinary" interface for small folios / THP:
 * folio_add_new_anon_rmap() -> Already existed
 * folio_add_anon_rmap_[pte|ptes|pmd]()
 * folio_try_dup_anon_rmap_[pte|ptes|pmd]()
 * folio_try_share_anon_rmap_[pte|pmd]()
 * folio_add_file_rmap_[pte|ptes|pmd]()
 * folio_dup_file_rmap_[pte|ptes|pmd]()
 * folio_remove_rmap_[pte|ptes|pmd]()

folio_add_new_anon_rmap() will always map at the largest granularity
possible (currently, a single PMD to cover a PMD-sized THP). Could be
extended if ever required.

In the future, we might want "_pud" variants and eventually "_pmds"
variants for batching.
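The naming scheme above implies that the single-PTE variants can be thin
wrappers around the batched _ptes variants. A hedged sketch of that
relationship (kernel-context, paraphrased; the actual definitions in the
patches may use a macro or differ in parameter details):

```c
/*
 * Sketch only: the _pte variant is expected to be a trivial nr=1
 * wrapper around the _ptes batching variant.
 */
static inline void folio_add_anon_rmap_pte(struct folio *folio,
		struct page *page, struct vm_area_struct *vma,
		unsigned long address, rmap_t flags)
{
	folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags);
}
```

This keeps a single implementation per operation while letting callers
state the mapping granularity (_pte/_ptes/_pmd) explicitly at the call
site.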
Further, if hugepd is ever a thing outside hugetlb code, we might want
some variants for that. All stuff for the distant future.

I ran some simple microbenchmarks from [5] on an Intel(R) Xeon(R) Silver
4210R: munmap(), fork(), cow, MADV_DONTNEED on each PTE ... and PTE
remapping PMD-mapped THPs on 1 GiB of memory.

For small folios, there is barely a change (< 1 % performance
improvement), whereby fork() still stands out with a 0.74% performance
improvement, but it might just be noise. Folio optimizations don't help
that much with small folios.

For PTE-mapped THP:
* PTE-remapping a PMD-mapped THP is more than 10% faster.
  -> RMAP batching
* fork() is more than 4% faster.
  -> folio conversion
* MADV_DONTNEED is 2% faster.
  -> folio conversion
* COW by writing only a single byte on a COW-shared PTE is faster.
  -> folio conversion
* munmap() is only slightly faster (< 1%).

[1] https://lkml.kernel.org/r/20230810103332.3062143-1-ryan.roberts@arm.com
[2] https://lkml.kernel.org/r/20231204105440.61448-1-ryan.roberts@arm.com
[3] https://lkml.kernel.org/r/20231204102027.57185-1-ryan.roberts@arm.com
[4] https://lkml.kernel.org/r/20231128145205.215026-1-david@redhat.com
[5] https://lkml.kernel.org/r/20231124132626.235350-1-david@redhat.com

Cc: Andrew Morton
Cc: "Matthew Wilcox (Oracle)"
Cc: Hugh Dickins
Cc: Ryan Roberts
Cc: Yin Fengwei
Cc: Mike Kravetz
Cc: Muchun Song
Cc: Peter Xu

David Hildenbrand (39):
  mm/rmap: rename hugepage_add* to hugetlb_add*
  mm/rmap: introduce and use hugetlb_remove_rmap()
  mm/rmap: introduce and use hugetlb_add_file_rmap()
  mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()
  mm/rmap: introduce and use hugetlb_try_share_anon_rmap()
  mm/rmap: add hugetlb sanity checks
  mm/rmap: convert folio_add_file_rmap_range() into
    folio_add_file_rmap_[pte|ptes|pmd]()
  mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()
  mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()
  mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()
  mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()
  mm/rmap: remove page_add_file_rmap()
  mm/rmap: factor out adding folio mappings into __folio_add_rmap()
  mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()
  mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()
  mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()
  mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
  mm/rmap: remove page_add_anon_rmap()
  mm/rmap: remove RMAP_COMPOUND
  mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()
  kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()
  mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()
  mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()
  Documentation: stop referring to page_remove_rmap()
  mm/rmap: remove page_remove_rmap()
  mm/rmap: convert page_dup_file_rmap() to
    folio_dup_file_rmap_[pte|ptes|pmd]()
  mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()
  mm/huge_memory: page_try_dup_anon_rmap() ->
    folio_try_dup_anon_rmap_pmd()
  mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()
  mm/rmap: remove page_try_dup_anon_rmap()
  mm: convert page_try_share_anon_rmap() to
    folio_try_share_anon_rmap_[pte|pmd]()
  mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED

 Documentation/mm/transhuge.rst       |   4 +-
 Documentation/mm/unevictable-lru.rst |   4 +-
 include/linux/mm.h                   |   6 +-
 include/linux/rmap.h                 | 380 +++++++++++++++++++-----
 kernel/events/uprobes.c              |   2 +-
 mm/gup.c                             |   2 +-
 mm/huge_memory.c                     |  85 +++---
 mm/hugetlb.c                         |  21 +-
 mm/internal.h                        |  12 +-
 mm/khugepaged.c                      |  17 +-
 mm/ksm.c                             |  15 +-
 mm/memory-failure.c                  |   4 +-
 mm/memory.c                          |  60 ++--
 mm/migrate.c                         |  12 +-
 mm/migrate_device.c                  |  41 +--
 mm/mmu_gather.c                      |   2 +-
 mm/rmap.c                            | 422 ++++++++++++++++-----------
 mm/swapfile.c                        |   2 +-
 mm/userfaultfd.c                     |   2 +-
 19 files changed, 709 insertions(+), 384 deletions(-)