From patchwork Fri Nov 24 13:26:06 2023
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13467654
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Linus Torvalds,
 Ryan Roberts, Matthew Wilcox, Hugh Dickins, Yin Fengwei, Yang Shi,
 Ying Huang, Zi Yan, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
 "Paul E. McKenney"
Subject: [PATCH WIP v1 01/20] mm/rmap: factor out adding folio range into
 __folio_add_rmap_range()
Date: Fri, 24 Nov 2023 14:26:06 +0100
Message-ID: <20231124132626.235350-2-david@redhat.com>
In-Reply-To: <20231124132626.235350-1-david@redhat.com>
References: <20231124132626.235350-1-david@redhat.com>

Let's factor it out, optimize for small folios, and add some more
sanity checks.

Signed-off-by: David Hildenbrand
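[Editor's note] To make the effect of the refactoring easier to see before
reading the diff, here is a condensed sketch (illustrative only, not part of
the patch) of the caller shape that results: the mapcount bookkeeping moves
into the new helper, and the caller merely translates its results into stat
updates. The helper name and return semantics are taken from the diff below;
the surrounding "sketch_" function is a simplified stand-in for
folio_add_file_rmap_range().

/*
 * Illustrative sketch: __folio_add_rmap_range() returns how many pages of
 * the folio went from unmapped to mapped and reports a PMD mapping via
 * *nr_pmdmapped; the caller only has to adjust the statistics.
 */
static void sketch_add_file_rmap_range(struct folio *folio, struct page *page,
		unsigned int nr_pages, struct vm_area_struct *vma,
		bool compound)
{
	unsigned int nr, nr_pmdmapped = 0;

	nr = __folio_add_rmap_range(folio, page, nr_pages, compound,
				    &nr_pmdmapped);
	if (nr_pmdmapped)
		__lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
			NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);
	if (nr)
		__lruvec_stat_mod_folio(folio, NR_FILE_MAPPED, nr);
}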
---
 mm/rmap.c | 119 ++++++++++++++++++++++++------------------------
 1 file changed, 53 insertions(+), 66 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 7a27a2b41802..afddf3d82a8f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1127,6 +1127,54 @@ int folio_total_mapcount(struct folio *folio)
 	return mapcount;
 }
 
+static unsigned int __folio_add_rmap_range(struct folio *folio,
+		struct page *page, unsigned int nr_pages, bool compound,
+		int *nr_pmdmapped)
+{
+	atomic_t *mapped = &folio->_nr_pages_mapped;
+	int first, nr = 0;
+
+	VM_WARN_ON_FOLIO(compound && page != &folio->page, folio);
+	VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
+	VM_WARN_ON_FOLIO(compound && nr_pages != folio_nr_pages(folio), folio);
+	VM_WARN_ON_FOLIO(!folio_test_large(folio) && nr_pages != 1, folio);
+
+	if (likely(!folio_test_large(folio)))
+		return atomic_inc_and_test(&page->_mapcount);
+
+	/* Is page being mapped by PTE? Is this its first map to be added? */
+	if (!compound) {
+		do {
+			first = atomic_inc_and_test(&page->_mapcount);
+			if (first) {
+				first = atomic_inc_return_relaxed(mapped);
+				if (first < COMPOUND_MAPPED)
+					nr++;
+			}
+		} while (page++, --nr_pages > 0);
+	} else if (folio_test_pmd_mappable(folio)) {
+		/* That test is redundant: it's for safety or to optimize out */
+
+		first = atomic_inc_and_test(&folio->_entire_mapcount);
+		if (first) {
+			nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
+			if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
+				*nr_pmdmapped = folio_nr_pages(folio);
+				nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
+				/* Raced ahead of a remove and another add? */
+				if (unlikely(nr < 0))
+					nr = 0;
+			} else {
+				/* Raced ahead of a remove of COMPOUND_MAPPED */
+				nr = 0;
+			}
+		}
+	} else {
+		VM_WARN_ON_ONCE_FOLIO(true, folio);
+	}
+	return nr;
+}
+
 /**
  * folio_move_anon_rmap - move a folio to our anon_vma
  * @folio: The folio to move to our anon_vma
@@ -1227,38 +1275,10 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
 		unsigned long address, rmap_t flags)
 {
 	struct folio *folio = page_folio(page);
-	atomic_t *mapped = &folio->_nr_pages_mapped;
-	int nr = 0, nr_pmdmapped = 0;
+	unsigned int nr, nr_pmdmapped = 0;
 	bool compound = flags & RMAP_COMPOUND;
-	bool first;
-
-	/* Is page being mapped by PTE? Is this its first map to be added? */
-	if (likely(!compound)) {
-		first = atomic_inc_and_test(&page->_mapcount);
-		nr = first;
-		if (first && folio_test_large(folio)) {
-			nr = atomic_inc_return_relaxed(mapped);
-			nr = (nr < COMPOUND_MAPPED);
-		}
-	} else if (folio_test_pmd_mappable(folio)) {
-		/* That test is redundant: it's for safety or to optimize out */
-
-		first = atomic_inc_and_test(&folio->_entire_mapcount);
-		if (first) {
-			nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
-			if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
-				nr_pmdmapped = folio_nr_pages(folio);
-				nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
-				/* Raced ahead of a remove and another add? */
-				if (unlikely(nr < 0))
-					nr = 0;
-			} else {
-				/* Raced ahead of a remove of COMPOUND_MAPPED */
-				nr = 0;
-			}
-		}
-	}
+
+	nr = __folio_add_rmap_range(folio, page, 1, compound, &nr_pmdmapped);
 
 	if (nr_pmdmapped)
 		__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr_pmdmapped);
 	if (nr)
@@ -1349,43 +1369,10 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
 			unsigned int nr_pages, struct vm_area_struct *vma,
 			bool compound)
 {
-	atomic_t *mapped = &folio->_nr_pages_mapped;
-	unsigned int nr_pmdmapped = 0, first;
-	int nr = 0;
-
-	VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
-
-	/* Is page being mapped by PTE? Is this its first map to be added? */
-	if (likely(!compound)) {
-		do {
-			first = atomic_inc_and_test(&page->_mapcount);
-			if (first && folio_test_large(folio)) {
-				first = atomic_inc_return_relaxed(mapped);
-				first = (first < COMPOUND_MAPPED);
-			}
-
-			if (first)
-				nr++;
-		} while (page++, --nr_pages > 0);
-	} else if (folio_test_pmd_mappable(folio)) {
-		/* That test is redundant: it's for safety or to optimize out */
-
-		first = atomic_inc_and_test(&folio->_entire_mapcount);
-		if (first) {
-			nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
-			if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
-				nr_pmdmapped = folio_nr_pages(folio);
-				nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
-				/* Raced ahead of a remove and another add? */
-				if (unlikely(nr < 0))
-					nr = 0;
-			} else {
-				/* Raced ahead of a remove of COMPOUND_MAPPED */
-				nr = 0;
-			}
-		}
-	}
+	unsigned int nr, nr_pmdmapped = 0;
+
+	nr = __folio_add_rmap_range(folio, page, nr_pages, compound,
+				    &nr_pmdmapped);
 
 	if (nr_pmdmapped)
 		__lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
 			NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);
From patchwork Fri Nov 24 13:26:07 2023
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13467655
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Linus Torvalds,
 Ryan Roberts, Matthew Wilcox, Hugh Dickins, Yin Fengwei, Yang Shi,
 Ying Huang, Zi Yan, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
 "Paul E. McKenney"
Subject: [PATCH WIP v1 02/20] mm: add a total mapcount for large folios
Date: Fri, 24 Nov 2023 14:26:07 +0100
Message-ID: <20231124132626.235350-3-david@redhat.com>
In-Reply-To: <20231124132626.235350-1-david@redhat.com>
References: <20231124132626.235350-1-david@redhat.com>

Let's track the total mapcount for all large folios in the first subpage.

The total mapcount is what we actually want to know in folio_mapcount()
and it is also sufficient for implementing folio_mapped(). With PTE-mapped
THP becoming more important and soon more widely used, we want to avoid
looping over all pages of a folio just to calculate the total mapcount.
Further, we might soon want to use the total mapcount in other contexts
more frequently, so prepare for reading it efficiently and atomically.

Maintain the total mapcount also for hugetlb pages. Use the total mapcount
to implement folio_mapcount(), and make folio_mapped() simply check it via
folio_mapcount(). We can now get rid of folio_large_is_mapped() and move
folio_total_mapcount() to mm.h. Similarly, get rid of
folio_nr_pages_mapped() and stop dumping that value in __dump_page().

While at it, simplify total_mapcount() by calling folio_mapcount() and
page_mapped() by calling folio_mapped(): it seems to add only one more MOV
instruction on x86-64 to the compiled code, which we shouldn't have to
worry about.

_nr_pages_mapped is now only used in rmap code, so it cannot accidentally
be used externally, where it might be used on arbitrary order-1 pages. The
remaining usage is:

(1) Detect how to adjust stats: NR_ANON_MAPPED and NR_FILE_MAPPED
    -> If we would account the total folio as mapped when mapping a page
       (based on the total mapcount), we could remove that usage.
       We'll have to be careful about memory-sensitive applications that
       also adjust /sys/kernel/debug/fault_around_bytes to not get a large
       folio completely mapped on page fault.

(2) Detect when to add an anon folio to the deferred split queue
    -> If we would apply a different heuristic, or scan using the rmap on
       the memory reclaim path for partially mapped anon folios to split
       them, we could remove that usage as well.

For now, these things remain as they are; they need more thought. Hugh
really did a fantastic job implementing that tracking after all.

Note that before the total mapcount would overflow, our refcount would
already overflow: each distinct mapping requires a distinct reference.
Probably, in the future, we want a 64bit refcount+mapcount for larger
folios.

Reviewed-by: Zi Yan
Reviewed-by: Ryan Roberts
Reviewed-by: Yin Fengwei
Signed-off-by: David Hildenbrand
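[Editor's note] The core of the change is easiest to see side by side. The
sketch below is illustrative only and not part of the patch; the "sketch_"
names are made up. It contrasts the O(nr_pages) summation that the old
folio_total_mapcount() in mm/rmap.c had to perform with the single atomic
read that the new folio->_total_mapcount field allows, as introduced in the
include/linux/mm.h hunk further down.

/* Illustrative comparison of the old and new read paths. */
static inline int sketch_total_mapcount_new(struct folio *folio)
{
	/* Like the other mapcounts, _total_mapcount is based at -1. */
	return atomic_read(&folio->_total_mapcount) + 1;
}

static int sketch_total_mapcount_old(struct folio *folio)
{
	int i, nr_pages = folio_nr_pages(folio);
	int mapcount = folio_entire_mapcount(folio);

	/* Walk every subpage: 512 atomic reads for a 2 MiB THP on x86-64. */
	for (i = 0; i < nr_pages; i++)
		mapcount += atomic_read(&folio_page(folio, i)->_mapcount) + 1;
	return mapcount;
}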
---
 Documentation/mm/transhuge.rst | 12 +++++------
 include/linux/mm.h             | 37 +++++++++-----------------------
 include/linux/mm_types.h       |  5 +++--
 include/linux/rmap.h           | 15 ++++++++-----
 mm/debug.c                     |  4 ++--
 mm/hugetlb.c                   |  4 ++--
 mm/internal.h                  | 10 +--------
 mm/page_alloc.c                |  4 ++++
 mm/rmap.c                      | 39 ++++++++++++----------------------
 9 files changed, 52 insertions(+), 78 deletions(-)

diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst
index 9a607059ea11..b0d3b1d3e8ea 100644
--- a/Documentation/mm/transhuge.rst
+++ b/Documentation/mm/transhuge.rst
@@ -116,14 +116,14 @@ pages:
     succeeds on tail pages.
 
   - map/unmap of a PMD entry for the whole THP increment/decrement
-    folio->_entire_mapcount and also increment/decrement
-    folio->_nr_pages_mapped by COMPOUND_MAPPED when _entire_mapcount
-    goes from -1 to 0 or 0 to -1.
+    folio->_entire_mapcount, increment/decrement folio->_total_mapcount
+    and also increment/decrement folio->_nr_pages_mapped by COMPOUND_MAPPED
+    when _entire_mapcount goes from -1 to 0 or 0 to -1.
 
   - map/unmap of individual pages with PTE entry increment/decrement
-    page->_mapcount and also increment/decrement folio->_nr_pages_mapped
-    when page->_mapcount goes from -1 to 0 or 0 to -1 as this counts
-    the number of pages mapped by PTE.
+    page->_mapcount, increment/decrement folio->_total_mapcount and also
+    increment/decrement folio->_nr_pages_mapped when page->_mapcount goes
+    from -1 to 0 or 0 to -1 as this counts the number of pages mapped by PTE.
 
 split_huge_page internally has to distribute the refcounts in the head
 page to the tail pages before clearing all PG_head/tail bits from the page

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 418d26608ece..fe91aaefa3db 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1207,17 +1207,16 @@ static inline int page_mapcount(struct page *page)
 	return mapcount;
 }
 
-int folio_total_mapcount(struct folio *folio);
+static inline int folio_total_mapcount(struct folio *folio)
+{
+	VM_WARN_ON_FOLIO(!folio_test_large(folio), folio);
+	return atomic_read(&folio->_total_mapcount) + 1;
+}
 
 /**
- * folio_mapcount() - Calculate the number of mappings of this folio.
+ * folio_mapcount() - Number of mappings of this folio.
  * @folio: The folio.
  *
- * A large folio tracks both how many times the entire folio is mapped,
- * and how many times each individual page in the folio is mapped.
- * This function calculates the total number of times the folio is
- * mapped.
- *
  * Return: The number of times this folio is mapped.
  */
 static inline int folio_mapcount(struct folio *folio)
@@ -1229,19 +1228,7 @@ static inline int folio_mapcount(struct folio *folio)
 
 static inline int total_mapcount(struct page *page)
 {
-	if (likely(!PageCompound(page)))
-		return atomic_read(&page->_mapcount) + 1;
-	return folio_total_mapcount(page_folio(page));
-}
-
-static inline bool folio_large_is_mapped(struct folio *folio)
-{
-	/*
-	 * Reading _entire_mapcount below could be omitted if hugetlb
-	 * participated in incrementing nr_pages_mapped when compound mapped.
-	 */
-	return atomic_read(&folio->_nr_pages_mapped) > 0 ||
-	       atomic_read(&folio->_entire_mapcount) >= 0;
+	return folio_mapcount(page_folio(page));
 }
 
 /**
@@ -1252,9 +1239,7 @@ static inline bool folio_large_is_mapped(struct folio *folio)
  */
 static inline bool folio_mapped(struct folio *folio)
 {
-	if (likely(!folio_test_large(folio)))
-		return atomic_read(&folio->_mapcount) >= 0;
-	return folio_large_is_mapped(folio);
+	return folio_mapcount(folio) > 0;
 }
 
 /*
@@ -1264,9 +1249,7 @@ static inline bool folio_mapped(struct folio *folio)
  */
 static inline bool page_mapped(struct page *page)
 {
-	if (likely(!PageCompound(page)))
-		return atomic_read(&page->_mapcount) >= 0;
-	return folio_large_is_mapped(page_folio(page));
+	return folio_mapped(page_folio(page));
 }
 
 static inline struct page *virt_to_head_page(const void *x)
@@ -2139,7 +2122,7 @@ static inline size_t folio_size(struct folio *folio)
  * looking at the precise mapcount of the first subpage in the folio, and
  * assuming the other subpages are the same. This may not be true for large
  * folios. If you want exact mapcounts for exact calculations, look at
- * page_mapcount() or folio_total_mapcount().
+ * page_mapcount() or folio_mapcount().
  *
  * Return: The estimated number of processes sharing a folio.
  */

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 957ce38768b2..99b84b4797b9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -264,7 +264,8 @@ typedef struct {
 * @virtual: Virtual address in the kernel direct map.
 * @_last_cpupid: IDs of last CPU and last process that accessed the folio.
 * @_entire_mapcount: Do not use directly, call folio_entire_mapcount().
- * @_nr_pages_mapped: Do not use directly, call folio_mapcount().
+ * @_total_mapcount: Do not use directly, call folio_mapcount().
+ * @_nr_pages_mapped: Do not use outside of rmap code.
 * @_pincount: Do not use directly, call folio_maybe_dma_pinned().
 * @_folio_nr_pages: Do not use directly, call folio_nr_pages().
 * @_hugetlb_subpool: Do not use directly, use accessor in hugetlb.h.
@@ -323,8 +324,8 @@ struct folio {
 		struct {
 			unsigned long _flags_1;
 			unsigned long _head_1;
-			unsigned long _folio_avail;
 	/* public: */
+			atomic_t _total_mapcount;
 			atomic_t _entire_mapcount;
 			atomic_t _nr_pages_mapped;
 			atomic_t _pincount;

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b26fe858fd44..42e2c74d4d6e 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -210,14 +210,19 @@ void hugepage_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
 
 static inline void __page_dup_rmap(struct page *page, bool compound)
 {
-	if (compound) {
-		struct folio *folio = (struct folio *)page;
+	struct folio *folio = page_folio(page);
 
-		VM_BUG_ON_PAGE(compound && !PageHead(page), page);
-		atomic_inc(&folio->_entire_mapcount);
-	} else {
+	VM_BUG_ON_PAGE(compound && !PageHead(page), page);
+	if (likely(!folio_test_large(folio))) {
 		atomic_inc(&page->_mapcount);
+		return;
 	}
+
+	if (compound)
+		atomic_inc(&folio->_entire_mapcount);
+	else
+		atomic_inc(&page->_mapcount);
+	atomic_inc(&folio->_total_mapcount);
 }
 
 static inline void page_dup_file_rmap(struct page *page, bool compound)

diff --git a/mm/debug.c b/mm/debug.c
index ee533a5ceb79..97f6f6b32ae7 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -99,10 +99,10 @@ static void __dump_page(struct page *page)
 			page, page_ref_count(head), mapcount, mapping,
 			page_to_pgoff(page), page_to_pfn(page));
 	if (compound) {
-		pr_warn("head:%p order:%u entire_mapcount:%d nr_pages_mapped:%d pincount:%d\n",
+		pr_warn("head:%p order:%u entire_mapcount:%d total_mapcount:%d pincount:%d\n",
 				head, compound_order(head),
 				folio_entire_mapcount(folio),
-				folio_nr_pages_mapped(folio),
+				folio_mapcount(folio),
 				atomic_read(&folio->_pincount));
 	}

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1169ef2f2176..cf84784064c7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1509,7 +1509,7 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
 	struct page *p;
 
 	atomic_set(&folio->_entire_mapcount, 0);
-	atomic_set(&folio->_nr_pages_mapped, 0);
+	atomic_set(&folio->_total_mapcount, 0);
 	atomic_set(&folio->_pincount, 0);
 
 	for (i = 1; i < nr_pages; i++) {
@@ -2119,7 +2119,7 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 	/* we rely on prep_new_hugetlb_folio to set the destructor */
 	folio_set_order(folio, order);
 	atomic_set(&folio->_entire_mapcount, -1);
-	atomic_set(&folio->_nr_pages_mapped, 0);
+	atomic_set(&folio->_total_mapcount, -1);
 	atomic_set(&folio->_pincount, 0);
 	return true;

diff --git a/mm/internal.h b/mm/internal.h
index b61034bd50f5..bb2e55c402e7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -67,15 +67,6 @@ void page_writeback_init(void);
  */
 #define SHOW_MEM_FILTER_NODES		(0x0001u)	/* disallowed nodes */
 
-/*
- * How many individual pages have an elevated _mapcount.  Excludes
- * the folio's entire_mapcount.
- */
-static inline int folio_nr_pages_mapped(struct folio *folio)
-{
-	return atomic_read(&folio->_nr_pages_mapped) & FOLIO_PAGES_MAPPED;
-}
-
 static inline void *folio_raw_mapping(struct folio *folio)
 {
 	unsigned long mapping = (unsigned long)folio->mapping;
@@ -429,6 +420,7 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
 	struct folio *folio = (struct folio *)page;
 
 	folio_set_order(folio, order);
+	atomic_set(&folio->_total_mapcount, -1);
 	atomic_set(&folio->_entire_mapcount, -1);
 	atomic_set(&folio->_nr_pages_mapped, 0);
 	atomic_set(&folio->_pincount, 0);

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 733732e7e0ba..aad45758c0c7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -988,6 +988,10 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 			bad_page(page, "nonzero entire_mapcount");
 			goto out;
 		}
+		if (unlikely(atomic_read(&folio->_total_mapcount) + 1)) {
+			bad_page(page, "nonzero total_mapcount");
+			goto out;
+		}
 		if (unlikely(atomic_read(&folio->_nr_pages_mapped))) {
 			bad_page(page, "nonzero nr_pages_mapped");
 			goto out;
 		}

diff --git a/mm/rmap.c b/mm/rmap.c
index afddf3d82a8f..38765796dca8 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1104,35 +1104,12 @@ int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
 	return page_vma_mkclean_one(&pvmw);
 }
 
-int folio_total_mapcount(struct folio *folio)
-{
-	int mapcount = folio_entire_mapcount(folio);
-	int nr_pages;
-	int i;
-
-	/* In the common case, avoid the loop when no pages mapped by PTE */
-	if (folio_nr_pages_mapped(folio) == 0)
-		return mapcount;
-	/*
-	 * Add all the PTE mappings of those pages mapped by PTE.
-	 * Limit the loop to folio_nr_pages_mapped()?
-	 * Perhaps: given all the raciness, that may be a good or a bad idea.
-	 */
-	nr_pages = folio_nr_pages(folio);
-	for (i = 0; i < nr_pages; i++)
-		mapcount += atomic_read(&folio_page(folio, i)->_mapcount);
-
-	/* But each of those _mapcounts was based on -1 */
-	mapcount += nr_pages;
-	return mapcount;
-}
-
 static unsigned int __folio_add_rmap_range(struct folio *folio,
 		struct page *page, unsigned int nr_pages, bool compound,
 		int *nr_pmdmapped)
 {
 	atomic_t *mapped = &folio->_nr_pages_mapped;
-	int first, nr = 0;
+	int first, count, nr = 0;
 
 	VM_WARN_ON_FOLIO(compound && page != &folio->page, folio);
 	VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
@@ -1144,6 +1121,7 @@ static unsigned int __folio_add_rmap_range(struct folio *folio,
 
 	/* Is page being mapped by PTE? Is this its first map to be added? */
 	if (!compound) {
+		count = nr_pages;
 		do {
 			first = atomic_inc_and_test(&page->_mapcount);
 			if (first) {
@@ -1151,7 +1129,8 @@ static unsigned int __folio_add_rmap_range(struct folio *folio,
 				if (first < COMPOUND_MAPPED)
 					nr++;
 			}
-		} while (page++, --nr_pages > 0);
+		} while (page++, --count > 0);
+		atomic_add(nr_pages, &folio->_total_mapcount);
 	} else if (folio_test_pmd_mappable(folio)) {
 		/* That test is redundant: it's for safety or to optimize out */
 
@@ -1169,6 +1148,7 @@ static unsigned int __folio_add_rmap_range(struct folio *folio,
 				nr = 0;
 			}
 		}
+		atomic_inc(&folio->_total_mapcount);
 	} else {
 		VM_WARN_ON_ONCE_FOLIO(true, folio);
 	}
@@ -1348,6 +1328,10 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 		__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
 	}
 
+	if (folio_test_large(folio))
+		/* increment count (starts at -1) */
+		atomic_set(&folio->_total_mapcount, 0);
+
 	__lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
 	__folio_set_anon(folio, vma, address, true);
 	SetPageAnonExclusive(&folio->page);
@@ -1427,6 +1411,9 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
 
 	VM_BUG_ON_PAGE(compound && !PageHead(page), page);
 
+	if (folio_test_large(folio))
+		atomic_dec(&folio->_total_mapcount);
+
 	/* Hugetlb pages are not counted in NR_*MAPPED */
 	if (unlikely(folio_test_hugetlb(folio))) {
 		/* hugetlb pages are always mapped with pmds */
@@ -2576,6 +2563,7 @@ void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
 
 	atomic_inc(&folio->_entire_mapcount);
+	atomic_inc(&folio->_total_mapcount);
 	if (flags & RMAP_EXCLUSIVE)
 		SetPageAnonExclusive(&folio->page);
 	VM_WARN_ON_FOLIO(folio_entire_mapcount(folio) > 1 &&
@@ -2588,6 +2576,7 @@ void hugepage_add_new_anon_rmap(struct folio *folio,
 	BUG_ON(address < vma->vm_start || address >= vma->vm_end);
 	/* increment count (starts at -1) */
 	atomic_set(&folio->_entire_mapcount, 0);
+	atomic_set(&folio->_total_mapcount, 0);
 	folio_clear_hugetlb_restore_reserve(folio);
 	__folio_set_anon(folio, vma, address, true);
 	SetPageAnonExclusive(&folio->page);
From patchwork Fri Nov 24 13:26:08 2023
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13467656
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Linus Torvalds,
 Ryan Roberts, Matthew Wilcox, Hugh Dickins, Yin Fengwei, Yang Shi,
 Ying Huang, Zi Yan, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
 "Paul E. McKenney"
Subject: [PATCH WIP v1 03/20] mm: convert folio_estimated_sharers() to
 folio_mapped_shared() and improve it
Date: Fri, 24 Nov 2023 14:26:08 +0100
Message-ID: <20231124132626.235350-4-david@redhat.com>
In-Reply-To: <20231124132626.235350-1-david@redhat.com>
References: <20231124132626.235350-1-david@redhat.com>

Callers of folio_estimated_sharers() only care about "mapped shared vs.
mapped exclusively". Let's rename the function and improve our detection
for partially-mappable folios (i.e., PTE-mapped THPs).

For now we can only implement, based on our guess, "certainly mapped
shared vs. maybe mapped exclusively". Ideally, we'd have something like
"maybe mapped shared vs. certainly mapped exclusively" -- or even better
"certainly mapped shared vs. certainly mapped exclusively" instead. But
these semantics are currently impossible with the guess-based heuristic
we apply for partially-mappable folios.

Naming the function "folio_certainly_mapped_shared" could be possible,
but let's just keep it simple and call it "folio_mapped_shared" and
document the fuzziness that applies for now.

As we can now read the total mapcount of large folios very efficiently,
use that to improve our implementation, falling back to making a guess
only in case the folio is not "obviously mapped shared".

Signed-off-by: David Hildenbrand
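[Editor's note] Before the diff, a worked example of the new semantics may
help (illustrative only; the numbers assume a 2 MiB THP with 512 subpages):
if such a folio is fully PTE-mapped by two MMs, its total mapcount is 1024,
which exceeds folio_nr_pages(), so it is certainly mapped shared and no
guessing is needed; if it is fully PTE-mapped by a single MM, the total
mapcount is 512 and the code falls back to the first-subpage guess. The
sketch below condenses the decision ladder of folio_mapped_shared() from the
hunk that follows; the "sketch_" name is made up.

/* Condensed restatement of the decision ladder (illustrative only). */
static inline bool sketch_mapped_shared(struct folio *folio)
{
	unsigned int total_mapcount;

	if (!folio_test_large(folio))		/* small folios: exact answer */
		return atomic_read(&folio->page._mapcount) != 0;

	total_mapcount = folio_total_mapcount(folio);
	if (total_mapcount == 1)		/* a single mapping: exclusive */
		return false;
	if (folio_entire_mapcount(folio) || folio_test_hugetlb(folio))
		return total_mapcount != 1;	/* entire mappings: exact answer */
	if (total_mapcount > folio_nr_pages(folio))
		return true;			/* more PTEs than pages: shared */
	/* Otherwise, guess based on the first subpage, as before. */
	return atomic_read(&folio->page._mapcount) > 0;
}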
---
 include/linux/mm.h | 68 +++++++++++++++++++++++++++++++++++++++-------
 mm/huge_memory.c   |  2 +-
 mm/madvise.c       |  6 ++--
 mm/memory.c        |  2 +-
 mm/mempolicy.c     | 14 ++++------
 mm/migrate.c       |  2 +-
 6 files changed, 70 insertions(+), 24 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fe91aaefa3db..17dac913f367 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2114,21 +2114,69 @@ static inline size_t folio_size(struct folio *folio)
 }
 
 /**
- * folio_estimated_sharers - Estimate the number of sharers of a folio.
+ * folio_mapped_shared - Report if a folio is certainly mapped by
+ *			 multiple entities in their page tables
  * @folio: The folio.
  *
- * folio_estimated_sharers() aims to serve as a function to efficiently
- * estimate the number of processes sharing a folio. This is done by
- * looking at the precise mapcount of the first subpage in the folio, and
- * assuming the other subpages are the same. This may not be true for large
- * folios. If you want exact mapcounts for exact calculations, look at
- * page_mapcount() or folio_mapcount().
+ * This function checks if a folio is certainly *currently* mapped by
+ * multiple entities in their page table ("mapped shared") or if the folio
+ * may be mapped exclusively by a single entity ("mapped exclusively").
  *
- * Return: The estimated number of processes sharing a folio.
+ * Usually, we consider a single entity to be a single MM. However, some
+ * folios (KSM, pagecache) can be mapped multiple times into the same MM.
+ *
+ * For KSM folios, each individual page table mapping is considered a
+ * separate entity. So if a KSM folio is mapped multiple times into the
+ * same process, it is considered "mapped shared".
+ *
+ * For pagecache folios that are entirely mapped multiple times into the
+ * same MM (i.e., multiple VMAs in the same MM cover the same
+ * file range), we traditionally (and for simplicity) consider them
+ * "mapped shared". For partially-mapped folios (e.g., PTE-mapped THP), we
+ * might detect them either as "mapped shared" or "mapped exclusively" --
+ * whatever is simpler.
+ *
+ * For small folios and entirely mapped large folios (e.g., hugetlb,
+ * PMD-mapped PMD-sized THP), the result will be exactly correct.
+ *
+ * For all other (partially-mappable) folios, such as PTE-mapped THP, the
+ * return value is partially fuzzy: true is not fuzzy, because it means
+ * "certainly mapped shared", but false means "maybe mapped exclusively".
+ *
+ * Note that this function only considers *current* page table mappings
+ * tracked via rmap -- that properly adjusts the folio mapcount(s) -- and
+ * does not consider:
+ * (1) any way the folio might get mapped in the (near) future (e.g.,
+ *     swapcache, pagecache, temporary unmapping for migration).
+ * (2) any way a folio might be mapped besides using the rmap (PFN mappings).
+ * (3) any form of page table sharing.
+ *
+ * Return: Whether the folio is certainly mapped by multiple entities.
  */
-static inline int folio_estimated_sharers(struct folio *folio)
+static inline bool folio_mapped_shared(struct folio *folio)
 {
-	return page_mapcount(folio_page(folio, 0));
+	unsigned int total_mapcount;
+
+	if (likely(!folio_test_large(folio)))
+		return atomic_read(&folio->page._mapcount) != 0;
+	total_mapcount = folio_total_mapcount(folio);
+
+	/* A single mapping implies "mapped exclusively". */
+	if (total_mapcount == 1)
+		return false;
+
+	/* If there is an entire mapping, it must be the only mapping. */
+	if (folio_entire_mapcount(folio) || unlikely(folio_test_hugetlb(folio)))
+		return total_mapcount != 1;
+	/*
+	 * Partially-mappable folios are tricky ... but some are "obviously
+	 * mapped shared": if we have more (PTE) mappings than we have pages
+	 * in the folio, some other entity is certainly involved.
+	 */
+	if (total_mapcount > folio_nr_pages(folio))
+		return true;
+	/* ... guess based on the mapcount of the first page of the folio. */
+	return atomic_read(&folio->page._mapcount) > 0;
 }
 
 #ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f31f02472396..874eeeb90e0b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1638,7 +1638,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	 * If other processes are mapping this folio, we couldn't discard
 	 * the folio unless they all do MADV_FREE so let's skip the folio.
 	 */
-	if (folio_estimated_sharers(folio) != 1)
+	if (folio_mapped_shared(folio))
 		goto out;
 
 	if (!folio_trylock(folio))

diff --git a/mm/madvise.c b/mm/madvise.c
index cf4d694280e9..1a82867c8c2e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -365,7 +365,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		folio = pfn_folio(pmd_pfn(orig_pmd));
 
 		/* Do not interfere with other mappings of this folio */
-		if (folio_estimated_sharers(folio) != 1)
+		if (folio_mapped_shared(folio))
 			goto huge_unlock;
 
 		if (pageout_anon_only_filter && !folio_test_anon(folio))
@@ -441,7 +441,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		if (folio_test_large(folio)) {
 			int err;
 
-			if (folio_estimated_sharers(folio) != 1)
+			if (folio_mapped_shared(folio))
 				break;
 			if (pageout_anon_only_filter && !folio_test_anon(folio))
 				break;
@@ -665,7 +665,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 		if (folio_test_large(folio)) {
 			int err;
 
-			if (folio_estimated_sharers(folio) != 1)
+			if (folio_mapped_shared(folio))
 				break;
 			if (!folio_trylock(folio))
 				break;

diff --git a/mm/memory.c b/mm/memory.c
index 1f18ed4a5497..6bcfa763a146 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4848,7 +4848,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	 * Flag if the folio is shared between multiple address spaces. This
 	 * is later used when determining whether to group tasks together
 	 */
-	if (folio_estimated_sharers(folio) > 1 && (vma->vm_flags & VM_SHARED))
+	if (folio_mapped_shared(folio) && (vma->vm_flags & VM_SHARED))
 		flags |= TNF_SHARED;
 
 	nid = folio_nid(folio);

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 10a590ee1c89..0492113497cc 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -605,12 +605,11 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask,
 	 * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
 	 * Choosing not to migrate a shared folio is not counted as a failure.
 	 *
-	 * To check if the folio is shared, ideally we want to make sure
-	 * every page is mapped to the same process. Doing that is very
-	 * expensive, so check the estimated sharers of the folio instead.
+	 * See folio_mapped_shared() on possible imprecision when we cannot
+	 * easily detect if a folio is shared.
 	 */
 	if ((flags & MPOL_MF_MOVE_ALL) ||
-	    (folio_estimated_sharers(folio) == 1 && !hugetlb_pmd_shared(pte)))
+	    (!folio_mapped_shared(folio) && !hugetlb_pmd_shared(pte)))
 		if (!isolate_hugetlb(folio, qp->pagelist))
 			qp->nr_failed++;
 unlock:
@@ -988,11 +987,10 @@ static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist,
 	 * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
 	 * Choosing not to migrate a shared folio is not counted as a failure.
 	 *
-	 * To check if the folio is shared, ideally we want to make sure
-	 * every page is mapped to the same process. Doing that is very
-	 * expensive, so check the estimated sharers of the folio instead.
+	 * See folio_mapped_shared() on possible imprecision when we cannot
+	 * easily detect if a folio is shared.
 	 */
-	if ((flags & MPOL_MF_MOVE_ALL) || folio_estimated_sharers(folio) == 1) {
+	if ((flags & MPOL_MF_MOVE_ALL) || !folio_mapped_shared(folio)) {
 		if (folio_isolate_lru(folio)) {
 			list_add_tail(&folio->lru, foliolist);
 			node_stat_mod_folio(folio,

diff --git a/mm/migrate.c b/mm/migrate.c
index 35a88334bb3c..fda41bc09903 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2559,7 +2559,7 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
 	 * every page is mapped to the same process. Doing that is very
 	 * expensive, so check the estimated mapcount of the folio instead.
 	 */
-	if (folio_estimated_sharers(folio) != 1 && folio_is_file_lru(folio) &&
+	if (folio_mapped_shared(folio) && folio_is_file_lru(folio) &&
 	    (vma->vm_flags & VM_EXEC))
 		goto out;
From patchwork Fri Nov 24 13:26:09 2023
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13467657
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Linus Torvalds,
 Ryan Roberts, Matthew Wilcox, Hugh Dickins, Yin Fengwei, Yang Shi,
 Ying Huang, Zi Yan, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
 "Paul E. McKenney"
Subject: [PATCH WIP v1 04/20] mm/rmap: pass dst_vma to
 page_try_dup_anon_rmap() and page_dup_file_rmap()
Date: Fri, 24 Nov 2023 14:26:09 +0100
Message-ID: <20231124132626.235350-5-david@redhat.com>
In-Reply-To: <20231124132626.235350-1-david@redhat.com>
References: <20231124132626.235350-1-david@redhat.com>

We'll need access to the destination MM when modifying the total mapcount
of a partially-mappable folio next. So pass in the destination VMA for
consistency.

While at it, change the parameter order for page_try_dup_anon_rmap() such
that the "bool compound" parameter is last, to match the other rmap
functions.

Signed-off-by: David Hildenbrand
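[Editor's note] For reference, a condensed sketch of the resulting call-site
shape in the fork path, mirroring the mm/memory.c hunk below; the wrapper
function and its error handling are made up for illustration. The destination
VMA is now passed explicitly, and "compound" moves to the last position.

/* Illustrative only: condensed from the copy_present_pte() hunk below. */
static int sketch_dup_anon_mapping(struct page *page,
		struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
{
	/* Old call: page_try_dup_anon_rmap(page, false, src_vma). */
	if (unlikely(page_try_dup_anon_rmap(page, dst_vma, src_vma, false)))
		return -EBUSY;	/* page may be pinned; caller must copy it */
	return 0;
}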
Signed-off-by: David Hildenbrand --- include/linux/rmap.h | 21 +++++++++++++-------- mm/huge_memory.c | 2 +- mm/hugetlb.c | 9 +++++---- mm/memory.c | 6 +++--- mm/migrate.c | 2 +- 5 files changed, 23 insertions(+), 17 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 42e2c74d4d6e..6cb497f6feab 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -208,7 +208,8 @@ void hugepage_add_anon_rmap(struct folio *, struct vm_area_struct *, void hugepage_add_new_anon_rmap(struct folio *, struct vm_area_struct *, unsigned long address); -static inline void __page_dup_rmap(struct page *page, bool compound) +static inline void __page_dup_rmap(struct page *page, + struct vm_area_struct *dst_vma, bool compound) { struct folio *folio = page_folio(page); @@ -225,17 +226,19 @@ static inline void __page_dup_rmap(struct page *page, bool compound) atomic_inc(&folio->_total_mapcount); } -static inline void page_dup_file_rmap(struct page *page, bool compound) +static inline void page_dup_file_rmap(struct page *page, + struct vm_area_struct *dst_vma, bool compound) { - __page_dup_rmap(page, compound); + __page_dup_rmap(page, dst_vma, compound); } /** * page_try_dup_anon_rmap - try duplicating a mapping of an already mapped * anonymous page * @page: the page to duplicate the mapping for + * @dst_vma: the destination vma + * @src_vma: the source vma * @compound: the page is mapped as compound or as a small page - * @vma: the source vma * * The caller needs to hold the PT lock and the vma->vma_mm->write_protect_seq. * @@ -247,8 +250,10 @@ static inline void page_dup_file_rmap(struct page *page, bool compound) * * Returns 0 if duplicating the mapping succeeded. Returns -EBUSY otherwise. */ -static inline int page_try_dup_anon_rmap(struct page *page, bool compound, - struct vm_area_struct *vma) +static inline int page_try_dup_anon_rmap(struct page *page, + struct vm_area_struct *dst_vma, + struct vm_area_struct *src_vma, + bool compound) { VM_BUG_ON_PAGE(!PageAnon(page), page); @@ -267,7 +272,7 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound, * future on write faults. */ if (likely(!is_device_private_page(page) && - unlikely(page_needs_cow_for_dma(vma, page)))) + unlikely(page_needs_cow_for_dma(src_vma, page)))) return -EBUSY; ClearPageAnonExclusive(page); @@ -276,7 +281,7 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound, * the page R/O into both processes. */ dup: - __page_dup_rmap(page, compound); + __page_dup_rmap(page, dst_vma, compound); return 0; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 874eeeb90e0b..51a878efca0e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1166,7 +1166,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, VM_BUG_ON_PAGE(!PageHead(src_page), src_page); get_page(src_page); - if (unlikely(page_try_dup_anon_rmap(src_page, true, src_vma))) { + if (unlikely(page_try_dup_anon_rmap(src_page, dst_vma, src_vma, true))) { /* Page maybe pinned: split and retry the fault on PTEs. */ put_page(src_page); pte_free(dst_mm, pgtable); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index cf84784064c7..1ddef4082cad 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5401,9 +5401,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, * sleep during the process. 
*/ if (!folio_test_anon(pte_folio)) { - page_dup_file_rmap(&pte_folio->page, true); + page_dup_file_rmap(&pte_folio->page, dst_vma, + true); } else if (page_try_dup_anon_rmap(&pte_folio->page, - true, src_vma)) { + dst_vma, src_vma, true)) { pte_t src_pte_old = entry; struct folio *new_folio; @@ -6272,7 +6273,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, if (anon_rmap) hugepage_add_new_anon_rmap(folio, vma, haddr); else - page_dup_file_rmap(&folio->page, true); + page_dup_file_rmap(&folio->page, vma, true); new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE) && (vma->vm_flags & VM_SHARED))); /* @@ -6723,7 +6724,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte, goto out_release_unlock; if (folio_in_pagecache) - page_dup_file_rmap(&folio->page, true); + page_dup_file_rmap(&folio->page, dst_vma, true); else hugepage_add_new_anon_rmap(folio, dst_vma, dst_addr); diff --git a/mm/memory.c b/mm/memory.c index 6bcfa763a146..14416d05e1b6 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -836,7 +836,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, get_page(page); rss[mm_counter(page)]++; /* Cannot fail as these pages cannot get pinned. */ - BUG_ON(page_try_dup_anon_rmap(page, false, src_vma)); + BUG_ON(page_try_dup_anon_rmap(page, dst_vma, src_vma, false)); /* * We do not preserve soft-dirty information, because so @@ -950,7 +950,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, * future. */ folio_get(folio); - if (unlikely(page_try_dup_anon_rmap(page, false, src_vma))) { + if (unlikely(page_try_dup_anon_rmap(page, dst_vma, src_vma, false))) { /* Page may be pinned, we have to copy. */ folio_put(folio); return copy_present_page(dst_vma, src_vma, dst_pte, src_pte, @@ -959,7 +959,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, rss[MM_ANONPAGES]++; } else if (page) { folio_get(folio); - page_dup_file_rmap(page, false); + page_dup_file_rmap(page, dst_vma, false); rss[mm_counter_file(page)]++; } diff --git a/mm/migrate.c b/mm/migrate.c index fda41bc09903..341a84c3e8e4 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -252,7 +252,7 @@ static bool remove_migration_pte(struct folio *folio, hugepage_add_anon_rmap(folio, vma, pvmw.address, rmap_flags); else - page_dup_file_rmap(new, true); + page_dup_file_rmap(new, vma, true); set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte, psize); } else From patchwork Fri Nov 24 13:26:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13467658 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44EECC61D97 for ; Fri, 24 Nov 2023 13:26:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CEA118D0079; Fri, 24 Nov 2023 08:26:56 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C988D8D006E; Fri, 24 Nov 2023 08:26:56 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B38568D0079; Fri, 24 Nov 2023 08:26:56 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 9FDC78D006E for ; Fri, 24 Nov 2023 08:26:56 -0500 (EST) Received: from smtpin16.hostedemail.com 
From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, David Hildenbrand , Andrew Morton , Linus Torvalds , Ryan Roberts , Matthew Wilcox , Hugh Dickins , Yin Fengwei , Yang Shi , Ying Huang , Zi Yan , Peter Zijlstra , Ingo Molnar , Will Deacon , Waiman Long , "Paul E.
McKenney" Subject: [PATCH WIP v1 05/20] mm/rmap: abstract total mapcount operations for partially-mappable folios Date: Fri, 24 Nov 2023 14:26:10 +0100 Message-ID: <20231124132626.235350-6-david@redhat.com> In-Reply-To: <20231124132626.235350-1-david@redhat.com> References: <20231124132626.235350-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 X-Rspamd-Queue-Id: A2EF4180016 X-Rspam-User: X-Stat-Signature: np14jojjwjm65wuqd6cne13bktkfsqh4 X-Rspamd-Server: rspam01 X-HE-Tag: 1700832414-930217 X-HE-Meta: U2FsdGVkX18xnMHRV6EVdOUrm8KawUHnSfu7rPX4IuEJlnwxZxeYS8ZA0k3cn4SqcVvOfyVHzMkHSzj9VuLnVlC7Hjp8VpFKXlkf8BKtTUN8snRYsGtHTBmC61Buv5FaWsUzt1WnwyGMz6tj0kPjs55IOn6ZA5+LgFSNAa/25MQeONJwiJEfzYXFeJ9Ana47F06wg4RsyqEXSEzrmYkxxCHX41Qyjue8sg8LzvCDIMDQS5VFrloM+IjSaZuKRQyVBViy4dwHOMSpQeMMa72CRbm1c50ydCxwxxPyQx+ILr3ru4DQQg4uvw1cU6ZO2nriqBzg/XmraKW0eYDOqNVmm4vfx4q3zKY+yqzkdVSmvyJ323yjZkoeM7gvTwLkPvMwebcogdbNoOkuwv9gUixrhvIpHgsWc0mcaySKP4avatodsfSldg0wBXAs9tw+18QTp+49lX05cWySRDVBTSG4yzq6XSRyvCyE+p4n0gQz1FPQTRnW6wAIY/kB1pAi473K1vI15cckQ1p7jajkOJ1wshvX6sF3L+GQSLe/NJtTwAPBQe3QA2gSDBA/AwwbGuJNjGqvDV6Wp94tKgkI4FXIDudzvLR3j/Co4/qKqC1iL+LjiD4OcvkiknWhwmA5ec1XN7Rm+si3nMwdZ/47j3Cml2hlEUVYteRZb2Sq298T8z0kfmagEfkFfNlyiym42YbYpp1d1pZvULmeg15SGB7HHWMEPpt+ESfxsbNZJySmjkkQByaW5ZExSG+Q+G10dkjCi+a0khtnpV3yDocmKBelTph3kqfqMp4jXVVL25HPnfJWwwW6piaRmT2644S84sKwDHQEqIvdCZD+J9ydWCJ/4Ro8davGk1Bqy9pQm2bTnz+GeSBmkLHlQTRx4fhPDgTuOpFdI2TtH3xZe5WXfqQ4Ncgu6yie4BaRi6+/Bhf7qwwgCed5qhE1Txk6VWNnpQOfILoouxdDWKMrnRSyQxB jdu+ZjQP SIF4eNgSJQ9cujwcRdVDOvpt6lKJM+BHxXwcudsywPIXM2RTgkydOyXznT8AVxf0zMI8Cqp9ynbOWww++u1lkUJNDldYU2h56EAianXtHA0ph2I/Q9ceq1sF2gcOkoen+tV1sR0l45nG4QfwvnfAyLH0QBnBVe0TQsQJr5QHHoNOYPkI= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Let's prepare for doing additional accounting whenever modifying the total mapcount of partially-mappable (!hugetlb) folios. Pass the VMA as well. 
Signed-off-by: David Hildenbrand --- include/linux/rmap.h | 41 ++++++++++++++++++++++++++++++++++++++++- mm/rmap.c | 23 ++++++++++++----------- 2 files changed, 52 insertions(+), 12 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 6cb497f6feab..9d5c2ed6ced5 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -168,6 +168,39 @@ static inline void anon_vma_merge(struct vm_area_struct *vma, struct anon_vma *folio_get_anon_vma(struct folio *folio); +static inline void folio_set_large_mapcount(struct folio *folio, + int count, struct vm_area_struct *vma) +{ + VM_WARN_ON_FOLIO(!folio_test_large_rmappable(folio), folio); + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); + /* increment count (starts at -1) */ + atomic_set(&folio->_total_mapcount, count - 1); +} + +static inline void folio_inc_large_mapcount(struct folio *folio, + struct vm_area_struct *vma) +{ + VM_WARN_ON_FOLIO(!folio_test_large_rmappable(folio), folio); + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); + atomic_inc(&folio->_total_mapcount); +} + +static inline void folio_add_large_mapcount(struct folio *folio, + int count, struct vm_area_struct *vma) +{ + VM_WARN_ON_FOLIO(!folio_test_large_rmappable(folio), folio); + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); + atomic_add(count, &folio->_total_mapcount); +} + +static inline void folio_dec_large_mapcount(struct folio *folio, + struct vm_area_struct *vma) +{ + VM_WARN_ON_FOLIO(!folio_test_large_rmappable(folio), folio); + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); + atomic_dec(&folio->_total_mapcount); +} + /* RMAP flags, currently only relevant for some anon rmap operations. */ typedef int __bitwise rmap_t; @@ -219,11 +252,17 @@ static inline void __page_dup_rmap(struct page *page, return; } + if (unlikely(folio_test_hugetlb(folio))) { + atomic_inc(&folio->_entire_mapcount); + atomic_inc(&folio->_total_mapcount); + return; + } + if (compound) atomic_inc(&folio->_entire_mapcount); else atomic_inc(&page->_mapcount); - atomic_inc(&folio->_total_mapcount); + folio_inc_large_mapcount(folio, dst_vma); } static inline void page_dup_file_rmap(struct page *page, diff --git a/mm/rmap.c b/mm/rmap.c index 38765796dca8..689ad85cf87e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1105,8 +1105,8 @@ int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff, } static unsigned int __folio_add_rmap_range(struct folio *folio, - struct page *page, unsigned int nr_pages, bool compound, - int *nr_pmdmapped) + struct page *page, unsigned int nr_pages, + struct vm_area_struct *vma, bool compound, int *nr_pmdmapped) { atomic_t *mapped = &folio->_nr_pages_mapped; int first, count, nr = 0; @@ -1130,7 +1130,7 @@ static unsigned int __folio_add_rmap_range(struct folio *folio, nr++; } } while (page++, --count > 0); - atomic_add(nr_pages, &folio->_total_mapcount); + folio_add_large_mapcount(folio, nr_pages, vma); } else if (folio_test_pmd_mappable(folio)) { /* That test is redundant: it's for safety or to optimize out */ @@ -1148,7 +1148,7 @@ static unsigned int __folio_add_rmap_range(struct folio *folio, nr = 0; } } - atomic_inc(&folio->_total_mapcount); + folio_inc_large_mapcount(folio, vma); } else { VM_WARN_ON_ONCE_FOLIO(true, folio); } @@ -1258,7 +1258,8 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, unsigned int nr, nr_pmdmapped = 0; bool compound = flags & RMAP_COMPOUND; - nr = __folio_add_rmap_range(folio, page, 1, compound, &nr_pmdmapped); + nr = __folio_add_rmap_range(folio, page, 1, vma, 
compound, + &nr_pmdmapped); if (nr_pmdmapped) __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr_pmdmapped); if (nr) @@ -1329,8 +1330,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, } if (folio_test_large(folio)) - /* increment count (starts at -1) */ - atomic_set(&folio->_total_mapcount, 0); + folio_set_large_mapcount(folio, 1, vma); __lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr); __folio_set_anon(folio, vma, address, true); @@ -1355,7 +1355,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page, { unsigned int nr, nr_pmdmapped = 0; - nr = __folio_add_rmap_range(folio, page, nr_pages, compound, + nr = __folio_add_rmap_range(folio, page, nr_pages, vma, compound, &nr_pmdmapped); if (nr_pmdmapped) __lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ? @@ -1411,16 +1411,17 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma, VM_BUG_ON_PAGE(compound && !PageHead(page), page); - if (folio_test_large(folio)) - atomic_dec(&folio->_total_mapcount); - /* Hugetlb pages are not counted in NR_*MAPPED */ if (unlikely(folio_test_hugetlb(folio))) { /* hugetlb pages are always mapped with pmds */ atomic_dec(&folio->_entire_mapcount); + atomic_dec(&folio->_total_mapcount); return; } + if (folio_test_large(folio)) + folio_dec_large_mapcount(folio, vma); + /* Is page being unmapped by PTE? Is this its last map to be removed? */ if (likely(!compound)) { last = atomic_add_negative(-1, &page->_mapcount); From patchwork Fri Nov 24 13:26:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13467659 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 337D0C61DF4 for ; Fri, 24 Nov 2023 13:27:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C08E38D007A; Fri, 24 Nov 2023 08:27:00 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B3A6B8D006E; Fri, 24 Nov 2023 08:27:00 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 93F758D007A; Fri, 24 Nov 2023 08:27:00 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 72CC78D006E for ; Fri, 24 Nov 2023 08:27:00 -0500 (EST) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 4F1BB16096D for ; Fri, 24 Nov 2023 13:27:00 +0000 (UTC) X-FDA: 81492923400.28.49CBB59 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf16.hostedemail.com (Postfix) with ESMTP id 884C3180019 for ; Fri, 24 Nov 2023 13:26:58 +0000 (UTC) Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=fspcQFqT; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf16.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700832418; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: 
From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, David Hildenbrand , Andrew Morton , Linus Torvalds , Ryan Roberts , Matthew Wilcox , Hugh Dickins , Yin Fengwei , Yang Shi , Ying Huang , Zi Yan , Peter Zijlstra , Ingo Molnar , Will Deacon , Waiman Long , "Paul E.
McKenney" Subject: [PATCH WIP v1 06/20] atomic_seqcount: new (raw) seqcount variant to support concurrent writers Date: Fri, 24 Nov 2023 14:26:11 +0100 Message-ID: <20231124132626.235350-7-david@redhat.com> In-Reply-To: <20231124132626.235350-1-david@redhat.com> References: <20231124132626.235350-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 884C3180019 X-Stat-Signature: peskoogzf3t7s73pj97sutf3pj6494hb X-Rspam-User: X-HE-Tag: 1700832418-942902 X-HE-Meta: U2FsdGVkX1/Qp5l+XGQEBMlBvsXuchOKTIafhzT6N9erPUTasyX4lfTeNe5aLCZZEnLa+uO/i0ct7seUmxO2yfGLuxdAHaHH0A+7ZKN9DBUacu9VyBwH3YEPxZaknQqTf8bs8zOPvsUmQcnkkykx3TlDzK+doPpX8OGlxUptJrAOV8oW8LzzJUDVBUhQe8uRVp2aqQWOltycHHbHDocTNYYElPCvv5ldGkCNtJvWDzgA0OBduacKHaJFmInuCeE3XNWVdbS69CMNV5/YINYK4Hv0fI6NCglcSfBMSdpc+NIazmSpMjZMolvAvh/Ub19V9jwjZj/VFHaV6s/+tSXBQd/shtzvc+8q8MMDWDlIVYPAiV+QVkW92ps7eA5EWMEI60obiCMI1gAGMgIgezzYDbnZCNl5ox8F5aQ43tUTSkhzGMP0dFLNF+QRLWN1J7KIWFbBjDurO5gawizdEBNqi2XFOvZqxyzrZkWY9qwMN8CchbbM2GwJc5RdC5aeElhGtN8nduWnHYHUyQcHQ3SyfDwhRtL/LIw7ilGjW9RdSEYJzygGymtK5B42NtxQgVpNAge2wihr6aE6APq8g1vh1u20Jso+5r4Fl6m7ub6An51EBjshnS2mr2AxwQfLd6fG1ZSUbR//n7ElbhblNIrRkSGj2i69WF2KlF6jSvjxwlSsSMmzDd7xIkMma9i0Jq34OtGy0ts6UNyrpz9ZQAs40xD+gmTiTtSl4JeVeHsvcJrJkMwalxwwvka9CBIlWMxpSzV1ScKs2Hff5LLp9OkNrSEuAnqVQxREVrksKcvZLfKVyVuQoCx6+jfG1z7cUc5mPWdlryLXV6oVZOZ78jWwD3gJKoIbPWGFenWlmzjb5Fa1rAVBbaZA/XbKnZAeu1LcgYNZ0/5Kvg8TvjxrYDwsmUiUgwkufQIEjE2Io9S7HmMxGK0FVHAg4Whiuw2AIwH4cJXbFDGptmk48m4LGuA ul7fRfYF JSxOCev4U+nebmkj6lHyGclY+W/QfeGYKGCFelw85tCTpUByWJma5fgXrXntUCNaSwiger6D8HyUb80lYoFxiovsqy6+/JTdqK+jQlaeZT53zxlZwXttdm4TEzfa1gnowkPvliBvdw1xpyuyHqCL0gX7pwH22SANvX+YPWqGgaIfzgwnmzyVFIm8dVprPsguClV3qmb1JNmMVANnF/gFfDAZP2zry5cFfz87Q X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Assume we have a writer side that is fairly simple and only updates some counters by adding some values: folio->counter_a += diff_a; folio->counter_b += diff_b; folio->counter_c += diff_c; ... Further, assume that our readers want to always read consistent set of counters. That is, they not only want to read each counter atomically, but also get a consistent/atomic view across *all* counters, detecting the case where there are concurrent modifications of the counters. Traditionally, we'd use a seqcount protected by some locking on the writer side. The readers can run lockless, detect when there were concurrent updates, to simply retry again to re-read all values. However, a seqcount requires to serialize all writers to only allow for a single writer at a time. Alternatives might include per-cpu counters / local atomics, but for the target use cases, both primitives are not applicable: We want to store counters (2 to 7 for now, depending on the folio size) in the "struct folio" of some larger folios (order >=2 ) whereby the counters get adjusted whenever we (un)map part of a folio. (a) The reader side must be able to get a consistent view of the counters and be able to detect concurrent changes (i.e., concurrent (un)mapping), as described above. In some cases we can simply stop immediately if we detect any concurrent writer -- any concurrent (un)map activity. (b) The writer side updates the counters as described above and should ideally run completely lockless. In many cases, we always have a single write at a time. 
But in some scenarios, we can trigger a lot of concurrent writers. We want the writer side to be able to make progress instead of repeatedly spinning, waiting for possibly many other writers.

(c) Space in the "struct folio", especially for smallish folios, is very limited, and the "struct page" layout imposes various restrictions on where we can even put new data; growing the size of the "struct page" is not desired because it can result in serious metadata overhead and easily has performance implications (cache-line). So we cannot place ordinary spinlocks in there (especially also because they change their size based on lockdep and the actual implementation), and the only real alternative is a bit spinlock, which is really undesired.

If we want to allow concurrent writers, we can use atomic RMW operations when updating the counters:

  atomic_add(diff_a, &folio->counter_a);
  atomic_add(diff_b, &folio->counter_b);
  atomic_add(diff_c, &folio->counter_c);
  ...

But the existing seqcount that makes the reader side detect concurrent updates is not capable of handling concurrent writers. So let's add a new atomic seqcount for exactly that purpose.

Instead of using a single LSB in the seqcount to detect a single concurrent writer, it uses multiple LSBs to detect multiple concurrent writers. As the seqcount can be modified concurrently, it ends up being an atomic type.

In theory, each CPU can participate, so we have to steal quite some LSBs on 64bit. As that reduces the bits available for the actual sequence quite drastically especially on 64bit, and there is the concern that 16bit for the sequence might not be sufficient, just use an atomic_long_t for now. For the use case discussed, we will place the new atomic seqcount into the "struct folio"/"struct page", where the limitations as described above apply. For that use case, the "raw" variant -- raw_atomic_seqcount_t -- is required, so we only add that.

For the normal seqcount on the writer side, we have the following memory ordering:

  s->sequence++
  smp_wmb();
  [critical section]
  smp_wmb();
  s->sequence++

It's important that other CPUs don't observe stores to the sequence to be reordered with stores in the critical section.

For the atomic seqcount, we could have similarly used:

  atomic_long_add(SHARED, &s->sequence);
  smp_wmb();
  [critical section]
  smp_wmb();
  atomic_long_add(STEP - SHARED, &s->sequence);

But especially on x86_64, the atomic_long_add() already implies a full memory barrier. So instead, we can do:

  atomic_long_add(SHARED, &s->sequence);
  __smp_mb__after_atomic();
  [critical section]
  __smp_mb__before_atomic();
  atomic_long_add(STEP - SHARED, &s->sequence);

Or alternatively:

  atomic_long_add_return(SHARED, &s->sequence);
  [critical section]
  atomic_long_add_return(STEP - SHARED, &s->sequence);

Could we use acquire-release semantics? Like the following:

  atomic_long_add_return_acquire(SHARED, &s->sequence)
  [critical section]
  atomic_long_add_return_release(STEP - SHARED, &s->sequence)

Maybe, but (a) it would make it different from normal seqcounts, because stores before/after the atomic_long_add_*() could now be reordered; and (b) memory-barriers.txt might indicate that the sequence counter store might be reordered: "For compound atomics performing both a load and a store, ACQUIRE semantics apply only to the load and RELEASE semantics apply only to the store portion of the operation.". So let's keep it simple for now.
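Putting the API together, a usage sketch (the structure and counters below are made up; only the raw_*_atomic_seqcount_*() calls come from this patch) could look like:

  #include <linux/atomic.h>
  #include <linux/atomic_seqcount.h>

  struct sketch_counters {
  	raw_atomic_seqcount_t seqcount;
  	atomic_long_t counter_a;
  	atomic_long_t counter_b;
  };

  /* Writer: lockless; any number of concurrent writers is allowed. */
  static void sketch_add(struct sketch_counters *s, long diff_a, long diff_b)
  {
  	raw_write_atomic_seqcount_begin(&s->seqcount);
  	atomic_long_add(diff_a, &s->counter_a);
  	atomic_long_add(diff_b, &s->counter_b);
  	raw_write_atomic_seqcount_end(&s->seqcount);
  }

  /* Reader: wait for all writers to finish, retry if the sequence changed. */
  static void sketch_read(struct sketch_counters *s, long *a, long *b)
  {
  	unsigned long seq;

  	do {
  		seq = raw_read_atomic_seqcount_begin(&s->seqcount);
  		*a = atomic_long_read(&s->counter_a);
  		*b = atomic_long_read(&s->counter_b);
  	} while (raw_read_atomic_seqcount_retry(&s->seqcount, seq));
  }

Readers never block writers and writers never block each other; readers simply retry until they observe a snapshot that no writer touched.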
Effectively, with the atomic seqcount We end up with more atomic RMW operations in the critical section but get no writer starvation / lock contention in return. We'll limit the implementation to !PREEMPT_RT and disallowing readers/writers from interrupt context. Signed-off-by: David Hildenbrand --- include/linux/atomic_seqcount.h | 170 ++++++++++++++++++++++++++++++++ lib/Kconfig.debug | 11 +++ 2 files changed, 181 insertions(+) create mode 100644 include/linux/atomic_seqcount.h diff --git a/include/linux/atomic_seqcount.h b/include/linux/atomic_seqcount.h new file mode 100644 index 000000000000..109447b663a1 --- /dev/null +++ b/include/linux/atomic_seqcount.h @@ -0,0 +1,170 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef __LINUX_ATOMIC_SEQLOCK_H +#define __LINUX_ATOMIC_SEQLOCK_H + +#include +#include +#include + +/* + * raw_atomic_seqcount_t -- a reader-writer consistency mechanism with + * lockless readers (read-only retry loops), and lockless writers. + * The writers must use atomic RMW operations in the critical section. + * + * This locking mechanism is applicable when all individual operations + * performed by writers can be expressed using atomic RMW operations + * (so they can run lockless) and readers only need a way to get an atomic + * view over all individual atomic values: like writers atomically updating + * multiple counters, and readers wanting to observe a consistent state + * across all these counters. + * + * For now, only the raw variant is implemented, that doesn't perform any + * lockdep checks. + * + * Copyright Red Hat, Inc. 2023 + * + * Author(s): David Hildenbrand + */ + +typedef struct raw_atomic_seqcount { + atomic_long_t sequence; +} raw_atomic_seqcount_t; + +#define raw_seqcount_init(s) atomic_long_set(&((s)->sequence), 0) + +#ifdef CONFIG_64BIT + +#define ATOMIC_SEQCOUNT_SHARED_WRITER 0x0000000000000001ul +/* 65536 CPUs */ +#define ATOMIC_SEQCOUNT_SHARED_WRITERS_MAX 0x0000000000008000ul +#define ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK 0x000000000000fffful +#define ATOMIC_SEQCOUNT_WRITERS_MASK 0x000000000000fffful +/* We have 48bit for the actual sequence. */ +#define ATOMIC_SEQCOUNT_SEQUENCE_STEP 0x0000000000010000ul + +#else /* CONFIG_64BIT */ + +#define ATOMIC_SEQCOUNT_SHARED_WRITER 0x00000001ul +/* 64 CPUs */ +#define ATOMIC_SEQCOUNT_SHARED_WRITERS_MAX 0x00000040ul +#define ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK 0x0000007ful +#define ATOMIC_SEQCOUNT_WRITERS_MASK 0x0000007ful +/* We have 25bit for the actual sequence. */ +#define ATOMIC_SEQCOUNT_SEQUENCE_STEP 0x00000080ul + +#endif /* CONFIG_64BIT */ + +#if CONFIG_NR_CPUS > ATOMIC_SEQCOUNT_SHARED_WRITERS_MAX +#error "raw_atomic_seqcount_t does not support such large CONFIG_NR_CPUS" +#endif + +/** + * raw_read_atomic_seqcount() - read the raw_atomic_seqcount_t counter value + * @s: Pointer to the raw_atomic_seqcount_t + * + * raw_read_atomic_seqcount() opens a read critical section of the given + * raw_atomic_seqcount_t, and without checking or masking the sequence counter + * LSBs (using ATOMIC_SEQCOUNT_WRITERS_MASK). Calling code is responsible for + * handling that. 
+ * + * Return: count to be passed to raw_read_atomic_seqcount_retry() + */ +static inline unsigned long raw_read_atomic_seqcount(raw_atomic_seqcount_t *s) +{ + unsigned long seq = atomic_long_read(&s->sequence); + + /* Read the sequence before anything in the critical section */ + smp_rmb(); + return seq; +} + +/** + * raw_read_atomic_seqcount_begin() - begin a raw_seqcount_t read section + * @s: Pointer to the raw_atomic_seqcount_t + * + * raw_read_atomic_seqcount_begin() opens a read critical section of the + * given raw_seqcount_t. This function must not be used in interrupt context. + * + * Return: count to be passed to raw_read_atomic_seqcount_retry() + */ +static inline unsigned long raw_read_atomic_seqcount_begin(raw_atomic_seqcount_t *s) +{ + unsigned long seq; + + BUILD_BUG_ON(IS_ENABLED(CONFIG_PREEMPT_RT)); +#ifdef CONFIG_DEBUG_ATOMIC_SEQCOUNT + DEBUG_LOCKS_WARN_ON(in_interrupt()); +#endif /* CONFIG_DEBUG_ATOMIC_SEQCOUNT */ + while ((seq = atomic_long_read(&s->sequence)) & + ATOMIC_SEQCOUNT_WRITERS_MASK) + cpu_relax(); + + /* Load the sequence before any load in the critical section. */ + smp_rmb(); + return seq; +} + +/** + * raw_read_atomic_seqcount_retry() - end a raw_seqcount_t read critical section + * @s: Pointer to the raw_atomic_seqcount_t + * @start: count, for example from raw_read_atomic_seqcount_begin() + * + * raw_read_atomic_seqcount_retry() closes the read critical section of the + * given raw_seqcount_t. If the critical section was invalid, it must be ignored + * (and typically retried). + * + * Return: true if a read section retry is required, else false + */ +static inline bool raw_read_atomic_seqcount_retry(raw_atomic_seqcount_t *s, + unsigned long start) +{ + /* Load the sequence after any load in the critical section. */ + smp_rmb(); + return unlikely(atomic_long_read(&s->sequence) != start); +} + +/** + * raw_write_seqcount_begin() - start a raw_seqcount_t write critical section + * @s: Pointer to the raw_atomic_seqcount_t + * + * raw_write_seqcount_begin() opens the write critical section of the + * given raw_seqcount_t. This function must not be used in interrupt context. + */ +static inline void raw_write_atomic_seqcount_begin(raw_atomic_seqcount_t *s) +{ + BUILD_BUG_ON(IS_ENABLED(CONFIG_PREEMPT_RT)); +#ifdef CONFIG_DEBUG_ATOMIC_SEQCOUNT + DEBUG_LOCKS_WARN_ON(in_interrupt()); +#endif /* CONFIG_DEBUG_ATOMIC_SEQCOUNT */ + preempt_disable(); + atomic_long_add(ATOMIC_SEQCOUNT_SHARED_WRITER, &s->sequence); + /* Store the sequence before any store in the critical section. */ + smp_mb__after_atomic(); +#ifdef CONFIG_DEBUG_ATOMIC_SEQCOUNT + DEBUG_LOCKS_WARN_ON((atomic_long_read(&s->sequence) & + ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK) > + ATOMIC_SEQCOUNT_SHARED_WRITERS_MAX); +#endif /* CONFIG_DEBUG_ATOMIC_SEQCOUNT */ +} + +/** + * raw_write_seqcount_end() - end a raw_seqcount_t write critical section + * @s: Pointer to the raw_atomic_seqcount_t + * + * raw_write_seqcount_end() closes the write critical section of the + * given raw_seqcount_t. + */ +static inline void raw_write_atomic_seqcount_end(raw_atomic_seqcount_t *s) +{ +#ifdef CONFIG_DEBUG_ATOMIC_SEQCOUNT + DEBUG_LOCKS_WARN_ON(!(atomic_long_read(&s->sequence) & + ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK)); +#endif /* CONFIG_DEBUG_ATOMIC_SEQCOUNT */ + /* Store the sequence after any store in the critical section. 
*/ + smp_mb__before_atomic(); + atomic_long_add(ATOMIC_SEQCOUNT_SEQUENCE_STEP - + ATOMIC_SEQCOUNT_SHARED_WRITER, &s->sequence); + preempt_enable(); +} + +#endif /* __LINUX_ATOMIC_SEQLOCK_H */ diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index cc7d53d9dc01..569c2c6ed47f 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1298,6 +1298,7 @@ config PROVE_LOCKING select DEBUG_MUTEXES if !PREEMPT_RT select DEBUG_RT_MUTEXES if RT_MUTEXES select DEBUG_RWSEMS + select DEBUG_ATOMIC_SEQCOUNT if !PREEMPT_RT select DEBUG_WW_MUTEX_SLOWPATH select DEBUG_LOCK_ALLOC select PREEMPT_COUNT if !ARCH_NO_PREEMPT @@ -1425,6 +1426,16 @@ config DEBUG_RWSEMS This debugging feature allows mismatched rw semaphore locks and unlocks to be detected and reported. +config DEBUG_ATOMIC_SEQCOUNT + bool "Atomic seqcount debugging: basic checks" + depends on DEBUG_KERNEL && !PREEMPT_RT + help + This feature allows some atomic seqcount semantics violations to be + detected and reported. + + The debug checks are only performed when running code that actively + uses atomic seqcounts; there are no dedicated test cases yet. + config DEBUG_LOCK_ALLOC bool "Lock debugging: detect incorrect freeing of live locks" depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT From patchwork Fri Nov 24 13:26:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13467660 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 339BBC61D97 for ; Fri, 24 Nov 2023 13:27:04 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C2B178D007B; Fri, 24 Nov 2023 08:27:03 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id BAE5F8D006E; Fri, 24 Nov 2023 08:27:03 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9402E8D007B; Fri, 24 Nov 2023 08:27:03 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 7B4878D006E for ; Fri, 24 Nov 2023 08:27:03 -0500 (EST) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 4B0F51A06CE for ; Fri, 24 Nov 2023 13:27:03 +0000 (UTC) X-FDA: 81492923526.25.F7844AE Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf16.hostedemail.com (Postfix) with ESMTP id 6FDC318002B for ; Fri, 24 Nov 2023 13:27:01 +0000 (UTC) Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=blneaO6V; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf16.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700832421; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=a5yfSJmvw+walwOxM5XR2U+SrOM6iOWFmuLu46mrnEU=; b=hBRlH11yxUhj6wLXeQYC50d2BoODqg8lsJ1L5SWL0yaWVFgJ8Su+BEskAtw7d2HevQPiDf 
From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, David Hildenbrand , Andrew Morton , Linus Torvalds , Ryan Roberts , Matthew Wilcox , Hugh Dickins , Yin Fengwei , Yang Shi , Ying Huang , Zi Yan , Peter Zijlstra , Ingo Molnar , Will Deacon , Waiman Long , "Paul E.
McKenney" Subject: [PATCH WIP v1 07/20] mm/rmap_id: track if one ore multiple MMs map a partially-mappable folio Date: Fri, 24 Nov 2023 14:26:12 +0100 Message-ID: <20231124132626.235350-8-david@redhat.com> In-Reply-To: <20231124132626.235350-1-david@redhat.com> References: <20231124132626.235350-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 X-Rspamd-Queue-Id: 6FDC318002B X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: 6mysa5asecny5ef6e8ojsu564rq3sgdo X-HE-Tag: 1700832421-403653 X-HE-Meta: U2FsdGVkX1/mf5BHo3tMHS1snJHBc6yU5GJ69wWHeQYBVA0kXSd336JRNJ/wZ/Tik92wJXUs1Y6godoXid9aUucxOtdu/iYbaUE76nGGHUvJaWzyFunMK0f1MKctEMT96SddfMJR8NZb/0kDe4SrAIGSYLAyB9sIX3l7K6SfnpHEm62z0Vpo8D9MiUiSS7K0+9Y3ff3nZzV/FuqgORy8MlvdzfoYFi56U93SjIFis1za9BXv5Mg3uA4huj4mNkDo7ENdxrRaT05qzaTpbWq194Zn7eWeYxEosmzSloSFEup6wYiGaq2R4mER76pC5KKoV/6Akn5kCn/2JrH2mHRdjRDPr292QiSSuWAJQGAacm/J60aQlbJHqAVD9VKXk+Kbip22tcTDtCxPCLYMPnNMTJdu3sO6aQfVnWSag1gUEFnt9acC+237Wt77gW7/P4WzZhTujXqIq3G91wqodW3SWLs2tpQ7BeCwYLf6wPX04RiavAJGlG6NeDz880Ee8Jww378apZdHRxmt5DyWlcV1/O2emt+FMKcy/JNpWBBBrlR6tqnQ2nHEUCw4XRr9iOYbNq56gAa7eJ5311pzDzHgG5U1C7RdM6XCfi9aZcLqIsp2R7tJ8Bt+6la+Fkc25pjtXA1Vdw84gk3lKVgdaYbpeBLaniGb0TigzPXYGrYetFRo70Ge4g7CBcIQ8rBjzOOkZFZmenkbHbEK6xkB5a1FQWS0+WqRFckk0wNrQjyA7iqkIuoM3p6Gzz4WRnMoELgCWdk14hx+mPDpVcxAX4iGCJvoaBsNskf4B1Wu7HKiaD2Qp9TJi6+m6nPg82Fj65tNga1Pr0ZtNKgODxeOL2phmpfVB19y7NAZSDrFALbvyYOUkQoJxmDK/8FH6+jZ2dKeCdx1rZ3paQ7HtNq3qLKtRsHXhP9CDNr6gLHk060RZg1bQx9lF47zkrTpbDHLqP5TjX6QVFWMlhurU97w4rN ps7BWgJT aP1z1n9TMPcm5FZHCpMDsKRJNSHv6fg/EnneZnMTnCjDPutbkOp/nwiM3YRDL9ZISIpsiqolLnwdDHDRzNe7qFhGKIn3XY5Rlna+9pcGtGSVra/Uqvtz0Ms12c0PaxzdbSG1EAxFn/wXPf63AzaU+6X5lNDmB9vq8/2LLS8kmuK6wDJHSHWUTmmdwgbWbeqHXsC1VWAYDpWKyNEGDBxU2PfjnQvVIXwfAQWv/ X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: In contrast to small folios and hugetlb folios, for a partially-mappable folio (i.e., THP), the total mapcount is often not expressive to identify whether such a folio is "mapped shared" or "mapped exclusively". For small folios and hugetlb folios that are always entirely mapped, the single mapcount is traditionally used for that purpose: is it 1? Then the folio is currently mapped exclusively; is it bigger than 1? Then it's mapped at least twice, and, therefore, considered "mapped shared". For a partially-mappable folio, each individual PTE/PMD/... mapping requires exactly one folio reference and one folio mapcount; folio_mapcount() > 1 does not imply that the folio is "mapped shared". While there are some obvious cases when we can conclude that partially-mappable folios are "mapped shared" -- see folio_mapped_shared() -- but it is currently not always possible to precisely tell whether a folio is "mapped exclusively". For implementing a precise variant of folio_mapped_shared() and for COW-reuse support of PTE-mapped anon THP, we need an efficient and precise way to identify "mapped shared" vs. "mapped exclusively". So how could we track if more than one MM is currently mapping a folio in its page tables? Having a list of MMs per folio, or even a counter for each MM for each folio is clearly not feasible. ... but what if we could play some fun math games to perform this tracking while requiring a handful of counters per folio, the exact number of counters depending on the size of the folio? 1. !!! Experimental Feature !!! 
===============================

We'll only support CONFIG_64BIT and !CONFIG_PREEMPT_RT (implied by THP support) for now. As we currently never get partially-mappable folios without CONFIG_TRANSPARENT_HUGEPAGE, let's limit it to that, to avoid unnecessary rmap ID allocations for setups without THP.

32bit support might be possible if there is demand, limiting it to 64k rmap IDs and reasonable folio sizes (e.g., <= order-15). Similarly, RT might be possible if there is ever real demand for it.

The feature will be experimental initially and is, therefore, disabled by default. Once the involved math is considered solid, the implementation has seen extended testing, and the performance implications are clear and have either been optimized (e.g., rmap batching) or mitigated (e.g., do we really have to perform this tracking for folios that are always assumed shared, like folios mapping executables or shared libraries? Is some hardware problematic?), we can consider always enabling it by default.

2. Per-mm rmap IDs
==================

We'll have to assign each MM an rmap ID that is smaller than 16*1024*1024 on 64bit. Note that these are significantly more than the maximum number of processes we can possibly have in the system. There isn't really a difference between supporting 16M IDs and 2M/4M IDs.

Due to the ID size limitation, we cannot use the MM pointer value and need a separate ID allocator. Maybe we want to cache some rmap IDs per CPU? Maybe we want to improve the allocation path? We can add such improvements when deemed necessary.

In the distant future, we might want to allocate rmap IDs for selected VMAs: for example, imagine a system call that does something like fork (COW-sharing of pages) within a process for a range of anonymous memory, ending up with a new VMA that wants a separate rmap ID. For now, per-MM is simple and sufficient.

3. Tracking Overview
====================

We derive a sequence of special sub-IDs from our MM rmap ID. Any time we map/unmap a part (e.g., PTE, PMD) of a partially-mappable folio to/from a MM, we:

(1) Adjust (increment/decrement) the mapcount of the folio
(2) Adjust (add/remove) the folio rmap values using the MM sub-IDs

So the rmap values are always linked to the folio mapcount. Consequently, we know that a single rmap value in the folio is the sum of exactly #folio_mapcount() rmap sub-IDs.

To identify whether a single MM is responsible for all folio_mapcount() mappings of a folio ("mapped exclusively") or whether other MMs are involved ("mapped shared"), we perform the following checks:

(1) Do we have more mappings than the folio has pages? Then the folio is certainly shared. That is, when "folio_mapcount() > folio_nr_pages()".
(2) For each rmap value X, does that rmap value folio->_rmap_valX correspond to "folio_mapcount() * sub-ID[X]" of the MM? Then the folio is certainly exclusive. Note that we only check that when "folio_mapcount() <= folio_nr_pages()".

4. Synchronization
==================

We're using an atomic seqcount, stored in the folio, to allow readers to detect concurrent (un)mapping, whereby they could obtain a wrong snapshot of the mapcount+rmap values and make a wrong decision. Further, the mapcount and all rmap values are updated using RMW atomics, to allow for concurrent updates.

5. sub-IDs
==========

To achieve (2), we generate sub-IDs that have the following property, assuming that our folio has P=folio_nr_pages() pages.
"2 * sub-ID" cannot be represented by the sum of any other *2* sub-IDs "3 * sub-ID" cannot be represented by the sum of any other *3* sub-IDs "4 * sub-ID" cannot be represented by the sum of any other *4* sub-IDs ... "P * sub-ID" cannot be represented by the sum of any other *P* sub-IDs The sub-IDs are generated in generations, whereby (1) Generation #0 is the number 0 (2) Generation #N takes all numbers from generations #0..#N-1 and adds (P + 1)^(N - 1), effectively doubling the number of sub-IDs Consequently, the smallest number S in gen #N is: S[#N] = (P + 1)^(N - 1) The largest number L in gen #N is: L[#N] = (P + 1)^(N - 1) + (P + 1)^(N - 2) + ... (P + 1)^0 + 0. -> [geometric sum with "P + 1 != 1"] = (1 - (P + 1)^N) / (1 - (P + 1)) = (1 - (P + 1)^N) / (-P) = ((P + 1)^N - 1) / P Example with P=4 (order-2 folio): Generation #0: 0 ------------------------ + (4 + 1)^0 = 1 Generation #1: 1 ------------------------ + (4 + 1)^1 = 5 Generation #2: 5 6 ------------------------ + (4 + 1)^2 = 25 Generation #3: 25 26 30 31 ------------------------ + (4 + 1)^3 = 125 [...] Intuitively, we are working with sub-counters that cannot overflow as long as we have <= P components. Let's consider the simple case of P=3, whereby our sub-counters are exactly 2-bit wide. Subid | Bits | Sub-counters -------------------------------- 0 | 0000 0000 | 0,0,0,0 1 | 0000 0001 | 0,0,0,1 4 | 0000 0100 | 0,0,1,0 5 | 0000 0101 | 0,0,1,1 16 | 0001 0000 | 0,1,0,0 17 | 0001 0001 | 0,1,0,1 20 | 0001 0100 | 0,1,1,0 21 | 0001 0101 | 0,1,1,1 64 | 0100 0000 | 1,0,0,0 65 | 0100 0001 | 1,0,0,1 68 | 0100 0100 | 1,0,1,0 69 | 0100 0101 | 1,0,1,1 80 | 0101 0100 | 1,1,0,0 81 | 0101 0001 | 1,1,0,1 84 | 0101 0100 | 1,1,1,0 85 | 0101 0101 | 1,1,1,1 So if we, say, have: 3 * 17 = 0,3,0,3 how could we possible get to that number by using 3 other subids? It's impossible, because the sub-counters won't overflow as long as we stay <= 3. Interesting side note that might come in handy at some point: we also cannot get to 0,3,0,3 by using 1 or 2 other subids. But, we could get to 1 * 17 = 0,1,0,1 by using 2 subids (16 and 1) or similarly to 2 * 17 = 0,2,0,2 by using 4 subids (2x16 and 2x1). Looks like we cannot get to X * subid using any 1..X other subids. Note 1: we'll add the actual detection logic used to be used by folio_mapped_shared() and wp_can_reuse_anon_folio() separately. Note 2: we might want to use that infrastructure for hugetlb as well in the future: there is nothing THP-specific about rmap ID handling. Signed-off-by: David Hildenbrand --- include/linux/mm_types.h | 58 +++++++ include/linux/rmap.h | 126 +++++++++++++- kernel/fork.c | 26 +++ mm/Kconfig | 21 +++ mm/Makefile | 1 + mm/huge_memory.c | 16 +- mm/init-mm.c | 4 + mm/page_alloc.c | 9 + mm/rmap_id.c | 351 +++++++++++++++++++++++++++++++++++++++ 9 files changed, 604 insertions(+), 8 deletions(-) create mode 100644 mm/rmap_id.c diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 99b84b4797b9..75305c57ef64 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -273,6 +274,14 @@ typedef struct { * @_hugetlb_cgroup_rsvd: Do not use directly, use accessor in hugetlb_cgroup.h. * @_hugetlb_hwpoison: Do not use directly, call raw_hwp_list_head(). * @_deferred_list: Folios to be split under memory pressure. + * @_rmap_atomic_seqcount: Seqcount protecting _total_mapcount and _rmapX. + * Does not apply to hugetlb. + * @_rmap_val0 Do not use outside of rmap code. 
Does not apply to hugetlb. + * @_rmap_val1 Do not use outside of rmap code. Does not apply to hugetlb. + * @_rmap_val2 Do not use outside of rmap code. Does not apply to hugetlb. + * @_rmap_val3 Do not use outside of rmap code. Does not apply to hugetlb. + * @_rmap_val4 Do not use outside of rmap code. Does not apply to hugetlb. + * @_rmap_val5 Do not use outside of rmap code. Does not apply to hugetlb. * * A folio is a physically, virtually and logically contiguous set * of bytes. It is a power-of-two in size, and it is aligned to that @@ -331,6 +340,9 @@ struct folio { atomic_t _pincount; #ifdef CONFIG_64BIT unsigned int _folio_nr_pages; +#ifdef CONFIG_RMAP_ID + raw_atomic_seqcount_t _rmap_atomic_seqcount; +#endif /* CONFIG_RMAP_ID */ #endif /* private: the union with struct page is transitional */ }; @@ -356,6 +368,34 @@ struct folio { }; struct page __page_2; }; + union { + struct { + unsigned long _flags_3; + unsigned long _head_3; + /* public: */ +#ifdef CONFIG_RMAP_ID + atomic_long_t _rmap_val0; + atomic_long_t _rmap_val1; + atomic_long_t _rmap_val2; + atomic_long_t _rmap_val3; +#endif /* CONFIG_RMAP_ID */ + /* private: the union with struct page is transitional */ + }; + struct page __page_3; + }; + union { + struct { + unsigned long _flags_4; + unsigned long _head_4; + /* public: */ +#ifdef CONFIG_RMAP_ID + atomic_long_t _rmap_val4; + atomic_long_t _rmap_val5; +#endif /* CONFIG_RMAP_ID */ + /* private: the union with struct page is transitional */ + }; + struct page __page_4; + }; }; #define FOLIO_MATCH(pg, fl) \ @@ -392,6 +432,20 @@ FOLIO_MATCH(compound_head, _head_2); FOLIO_MATCH(flags, _flags_2a); FOLIO_MATCH(compound_head, _head_2a); #undef FOLIO_MATCH +#define FOLIO_MATCH(pg, fl) \ + static_assert(offsetof(struct folio, fl) == \ + offsetof(struct page, pg) + 3 * sizeof(struct page)) +FOLIO_MATCH(flags, _flags_3); +FOLIO_MATCH(compound_head, _head_3); +#undef FOLIO_MATCH +#undef FOLIO_MATCH +#define FOLIO_MATCH(pg, fl) \ + static_assert(offsetof(struct folio, fl) == \ + offsetof(struct page, pg) + 4 * sizeof(struct page)) +FOLIO_MATCH(flags, _flags_4); +FOLIO_MATCH(compound_head, _head_4); +#undef FOLIO_MATCH + /** * struct ptdesc - Memory descriptor for page tables. @@ -975,6 +1029,10 @@ struct mm_struct { #endif } lru_gen; #endif /* CONFIG_LRU_GEN */ + +#ifdef CONFIG_RMAP_ID + int mm_rmap_id; +#endif /* CONFIG_RMAP_ID */ } __randomize_layout; /* diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 9d5c2ed6ced5..19c9dc3216df 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -168,6 +168,116 @@ static inline void anon_vma_merge(struct vm_area_struct *vma, struct anon_vma *folio_get_anon_vma(struct folio *folio); +#ifdef CONFIG_RMAP_ID +/* + * For init_mm and friends, we don't actually expect to ever rmap pages. So + * we use a reserved dummy ID that we'll never hand out the normal way. + */ +#define RMAP_ID_DUMMY 0 +#define RMAP_ID_MIN (RMAP_ID_DUMMY + 1) +#define RMAP_ID_MAX (16 * 1024 * 1024u - 1) + +void free_rmap_id(int id); +int alloc_rmap_id(void); + +#define RMAP_SUBID_4_MAX_ORDER 10 +#define RMAP_SUBID_5_MIN_ORDER 11 +#define RMAP_SUBID_5_MAX_ORDER 12 +#define RMAP_SUBID_6_MIN_ORDER 13 +#define RMAP_SUBID_6_MAX_ORDER 15 + +static inline void __folio_prep_large_rmap(struct folio *folio) +{ + const unsigned int order = folio_order(folio); + + raw_seqcount_init(&folio->_rmap_atomic_seqcount); + switch (order) { +#if MAX_ORDER >= RMAP_SUBID_6_MIN_ORDER + case RMAP_SUBID_6_MIN_ORDER ... 
RMAP_SUBID_6_MAX_ORDER: + atomic_long_set(&folio->_rmap_val5, 0); + fallthrough; +#endif +#if MAX_ORDER >= RMAP_SUBID_5_MIN_ORDER + case RMAP_SUBID_5_MIN_ORDER ... RMAP_SUBID_5_MAX_ORDER: + atomic_long_set(&folio->_rmap_val4, 0); + fallthrough; +#endif + default: + atomic_long_set(&folio->_rmap_val3, 0); + atomic_long_set(&folio->_rmap_val2, 0); + atomic_long_set(&folio->_rmap_val1, 0); + atomic_long_set(&folio->_rmap_val0, 0); + break; + } +} + +static inline void __folio_undo_large_rmap(struct folio *folio) +{ +#ifdef CONFIG_DEBUG_VM + const unsigned int order = folio_order(folio); + + switch (order) { +#if MAX_ORDER >= RMAP_SUBID_6_MIN_ORDER + case RMAP_SUBID_6_MIN_ORDER ... RMAP_SUBID_6_MAX_ORDER: + VM_WARN_ON_ONCE(atomic_long_read(&folio->_rmap_val5)); + fallthrough; +#endif +#if MAX_ORDER >= RMAP_SUBID_5_MIN_ORDER + case RMAP_SUBID_5_MIN_ORDER ... RMAP_SUBID_5_MAX_ORDER: + VM_WARN_ON_ONCE(atomic_long_read(&folio->_rmap_val4)); + fallthrough; +#endif + default: + VM_WARN_ON_ONCE(atomic_long_read(&folio->_rmap_val3)); + VM_WARN_ON_ONCE(atomic_long_read(&folio->_rmap_val2)); + VM_WARN_ON_ONCE(atomic_long_read(&folio->_rmap_val1)); + VM_WARN_ON_ONCE(atomic_long_read(&folio->_rmap_val0)); + break; + } +#endif +} + +static inline void __folio_write_large_rmap_begin(struct folio *folio) +{ + VM_WARN_ON_FOLIO(!folio_test_large_rmappable(folio), folio); + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); + raw_write_atomic_seqcount_begin(&folio->_rmap_atomic_seqcount); +} + +static inline void __folio_write_large_rmap_end(struct folio *folio) +{ + raw_write_atomic_seqcount_end(&folio->_rmap_atomic_seqcount); +} + +void __folio_set_large_rmap_val(struct folio *folio, int count, + struct mm_struct *mm); +void __folio_add_large_rmap_val(struct folio *folio, int count, + struct mm_struct *mm); +#else +static inline void __folio_prep_large_rmap(struct folio *folio) +{ +} +static inline void __folio_undo_large_rmap(struct folio *folio) +{ +} +static inline void __folio_write_large_rmap_begin(struct folio *folio) +{ + VM_WARN_ON_FOLIO(!folio_test_large_rmappable(folio), folio); + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); +} +static inline void __folio_write_large_rmap_end(struct folio *folio) +{ +} +static inline void __folio_set_large_rmap_val(struct folio *folio, int count, + struct mm_struct *mm) +{ +} +static inline void __folio_add_large_rmap_val(struct folio *folio, int count, + struct mm_struct *mm) +{ +} +#endif /* CONFIG_RMAP_ID */ + static inline void folio_set_large_mapcount(struct folio *folio, int count, struct vm_area_struct *vma) { @@ -175,30 +285,34 @@ static inline void folio_set_large_mapcount(struct folio *folio, VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); /* increment count (starts at -1) */ atomic_set(&folio->_total_mapcount, count - 1); + __folio_set_large_rmap_val(folio, count, vma->vm_mm); } static inline void folio_inc_large_mapcount(struct folio *folio, struct vm_area_struct *vma) { - VM_WARN_ON_FOLIO(!folio_test_large_rmappable(folio), folio); - VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); + __folio_write_large_rmap_begin(folio); atomic_inc(&folio->_total_mapcount); + __folio_add_large_rmap_val(folio, 1, vma->vm_mm); + __folio_write_large_rmap_end(folio); } static inline void folio_add_large_mapcount(struct folio *folio, int count, struct vm_area_struct *vma) { - VM_WARN_ON_FOLIO(!folio_test_large_rmappable(folio), folio); - VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); + __folio_write_large_rmap_begin(folio); atomic_add(count, 
&folio->_total_mapcount); + __folio_add_large_rmap_val(folio, count, vma->vm_mm); + __folio_write_large_rmap_end(folio); } static inline void folio_dec_large_mapcount(struct folio *folio, struct vm_area_struct *vma) { - VM_WARN_ON_FOLIO(!folio_test_large_rmappable(folio), folio); - VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); + __folio_write_large_rmap_begin(folio); atomic_dec(&folio->_total_mapcount); + __folio_add_large_rmap_val(folio, -1, vma->vm_mm); + __folio_write_large_rmap_end(folio); } /* RMAP flags, currently only relevant for some anon rmap operations. */ diff --git a/kernel/fork.c b/kernel/fork.c index 10917c3e1f03..773c93613ca2 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -814,6 +814,26 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm) #define mm_free_pgd(mm) #endif /* CONFIG_MMU */ +#ifdef CONFIG_RMAP_ID +static inline int mm_alloc_rmap_id(struct mm_struct *mm) +{ + int id = alloc_rmap_id(); + + if (id < 0) + return id; + mm->mm_rmap_id = id; + return 0; +} + +static inline void mm_free_rmap_id(struct mm_struct *mm) +{ + free_rmap_id(mm->mm_rmap_id); +} +#else +#define mm_alloc_rmap_id(mm) (0) +#define mm_free_rmap_id(mm) +#endif /* CONFIG_RMAP_ID */ + static void check_mm(struct mm_struct *mm) { int i; @@ -917,6 +937,7 @@ void __mmdrop(struct mm_struct *mm) WARN_ON_ONCE(mm == current->active_mm); mm_free_pgd(mm); + mm_free_rmap_id(mm); destroy_context(mm); mmu_notifier_subscriptions_destroy(mm); check_mm(mm); @@ -1298,6 +1319,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, if (mm_alloc_pgd(mm)) goto fail_nopgd; + if (mm_alloc_rmap_id(mm)) + goto fail_normapid; + if (init_new_context(p, mm)) goto fail_nocontext; @@ -1317,6 +1341,8 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, fail_cid: destroy_context(mm); fail_nocontext: + mm_free_rmap_id(mm); +fail_normapid: mm_free_pgd(mm); fail_nopgd: free_mm(mm); diff --git a/mm/Kconfig b/mm/Kconfig index 89971a894b60..bb0b7b885ada 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -861,6 +861,27 @@ choice benefit. endchoice +menuconfig RMAP_ID + bool "Rmap ID tracking (EXPERIMENTAL)" + depends on TRANSPARENT_HUGEPAGE && 64BIT + help + Use per-MM rmap IDs and the unleashed power of math to track + whether partially-mappable hugepages (i.e., THPs for now) are + "mapped shared" or "mapped exclusively". + + This tracking allow for efficiently and precisely detecting + whether a PTE-mapped THP is mapped by a single process + ("mapped exclusively") or mapped by multiple ones ("mapped + shared"), with the cost of additional tracking when (un)mapping + (parts of) such a THP. + + If this configuration is not enabled, an heuristic is used + instead that might result in false "mapped exclusively" + detection; some features relying on this information might + operate slightly imprecise (e.g., MADV_PAGEOUT succeeds although + it should fail) or might not be available at all (e.g., + Copy-on-Write reuse support). 
+ config THP_SWAP def_bool y depends on TRANSPARENT_HUGEPAGE && ARCH_WANTS_THP_SWAP && SWAP && 64BIT diff --git a/mm/Makefile b/mm/Makefile index 33873c8aedb3..b0cf2563f33a 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -138,3 +138,4 @@ obj-$(CONFIG_IO_MAPPING) += io-mapping.o obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o +obj-$(CONFIG_RMAP_ID) += rmap_id.o diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 51a878efca0e..0228b04c4053 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -614,6 +614,7 @@ void folio_prep_large_rmappable(struct folio *folio) { VM_BUG_ON_FOLIO(folio_order(folio) < 2, folio); INIT_LIST_HEAD(&folio->_deferred_list); + __folio_prep_large_rmap(folio); folio_set_large_rmappable(folio); } @@ -2478,8 +2479,8 @@ static void __split_huge_page_tail(struct folio *folio, int tail, (1L << PG_dirty) | LRU_GEN_MASK | LRU_REFS_MASK)); - /* ->mapping in first and second tail page is replaced by other uses */ - VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING, + /* ->mapping in some tail page is replaced by other uses */ + VM_BUG_ON_PAGE(tail > 4 && page_tail->mapping != TAIL_MAPPING, page_tail); page_tail->mapping = head->mapping; page_tail->index = head->index + tail; @@ -2550,6 +2551,16 @@ static void __split_huge_page(struct page *page, struct list_head *list, ClearPageHasHWPoisoned(head); +#ifdef CONFIG_RMAP_ID + /* + * Make sure folio->_rmap_atomic_seqcount, which overlays + * tail->private, is 0. All other folio->_rmap_valX should be 0 + * after unmapping the folio. + */ + if (likely(nr >= 4)) + raw_seqcount_init(&folio->_rmap_atomic_seqcount); +#endif /* CONFIG_RMAP_ID */ + for (i = nr - 1; i >= 1; i--) { __split_huge_page_tail(folio, i, lruvec, list); /* Some pages can be beyond EOF: drop them from page cache */ @@ -2809,6 +2820,7 @@ void folio_undo_large_rmappable(struct folio *folio) struct deferred_split *ds_queue; unsigned long flags; + __folio_undo_large_rmap(folio); /* * At this point, there is no one trying to add the folio to * deferred_list. If folio is not in deferred_list, it's safe diff --git a/mm/init-mm.c b/mm/init-mm.c index cfd367822cdd..8890271b50c6 100644 --- a/mm/init-mm.c +++ b/mm/init-mm.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include @@ -46,6 +47,9 @@ struct mm_struct init_mm = { .cpu_bitmap = CPU_BITS_NONE, #ifdef CONFIG_IOMMU_SVA .pasid = IOMMU_PASID_INVALID, +#endif +#ifdef CONFIG_RMAP_ID + .mm_rmap_id = RMAP_ID_DUMMY, #endif INIT_MM_CONTEXT(init_mm) }; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index aad45758c0c7..c1dd039801e7 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1007,6 +1007,15 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page) * deferred_list.next -- ignore value. */ break; +#ifdef CONFIG_RMAP_ID + case 3: + case 4: + /* + * the third and fourth tail page: ->mapping may be + * used to store RMAP values for RMAP ID tracking. + */ + break; +#endif /* CONFIG_RMAP_ID */ default: if (page->mapping != TAIL_MAPPING) { bad_page(page, "corrupted mapping in tail page"); diff --git a/mm/rmap_id.c b/mm/rmap_id.c new file mode 100644 index 000000000000..e66b0f5aea2d --- /dev/null +++ b/mm/rmap_id.c @@ -0,0 +1,351 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * rmap ID tracking for precise "mapped shared" vs. "mapped exclusively" + * detection of partially-mappable folios (e.g., PTE-mapped THP). + * + * Copyright Red Hat, Inc. 
2023 + * + * Author(s): David Hildenbrand + */ + +#include +#include +#include + +#include "internal.h" + +static DEFINE_SPINLOCK(rmap_id_lock); +static DEFINE_IDA(rmap_ida); + +/* For now we only expect folios from the buddy, not hugetlb folios. */ +#if MAX_ORDER > RMAP_SUBID_6_MAX_ORDER +#error "rmap ID tracking does not support such large MAX_ORDER" +#endif + +/* + * We assign each MM a unique rmap ID and derive from it a sequence of + * special sub-IDs. We add/remove these sub-IDs to/from the corresponding + * folio rmap values (folio->rmap_valX) whenever (un)mapping (parts of) a + * partially mappable folio. + * + * With 24bit rmap IDs, and a folio size that is compatible with 4 + * rmap values (more below), we calculate the sub-ID sequence like this: + * + * rmap ID : | 3 3 3 3 3 3 | 2 2 2 2 2 2 | 1 1 1 1 1 1 | 0 0 0 0 0 0 | + * sub-ID IDX : | IDX #3 | IDX #2 | IDX #1 | IDX #0 | + * + * sub-IDs : [ subid_4(#3), subid_4(#2), subid_4(#1), subid_4(#0) ] + * rmap value : [ _rmap_val3, _rmap_val2, _rmap_val1, _rmap_val0 ] + * + * Any time we map/unmap a part (e.g., PTE, PMD) of a partially-mappable + * folio to/from a MM, we: + * (1) Adjust (increment/decrement) the mapcount of the folio + * (2) Adjust (add/remove) the folio rmap values using the MM sub-IDs + * + * So the rmap values are always linked to the folio mapcount. + * Consequently, we know that a single rmap value in the folio is the sum + * of exactly #folio_mapcount() rmap sub-IDs. As one example, if the folio + * is completely unmapped, the rmap values must be 0. As another example, + * if the folio is mapped exactly once, the rmap values correspond to the + * MM sub-IDs. + * + * To identify whether a given MM is responsible for all #folio_mapcount() + * mappings of a folio ("mapped exclusively") or whether other MMs are + * involved ("mapped shared"), we perform the following checks: + * (1) Do we have more mappings than the folio has pages? Then the folio + * is mapped shared. So when "folio_mapcount() > folio_nr_pages()". + * (2) Do the rmap values corresond to "#folio_mapcount() * sub-IDs" of + * the MM? Then the folio is mapped exclusive. + * + * To achieve (2), we generate sub-IDs that have the following property, + * assuming that our folio has P=folio_nr_pages() pages. + * "2 * sub-ID" cannot be represented by the sum of any other *2* sub-IDs + * "3 * sub-ID" cannot be represented by the sum of any other *3* sub-IDs + * "4 * sub-ID" cannot be represented by the sum of any other *4* sub-IDs + * ... + * "P * sub-ID" cannot be represented by the sum of any other *P* sub-IDs + * + * Further, we want "P * sub-ID" (the maximum number we will ever look at) + * to not overflow. If we overflow with " > P" mappings, we don't care as + * we won't be looking at the numbers until theya re fully expressive + * again. + * + * Consequently, to not overflow 64bit values with "P * sub-ID", folios + * with large P require more rmap values (we cannot generate that many sub + * IDs), whereby folios with smaller P can get away with less rmap values + * (we can generate more sub-IDs). + * + * The sub-IDs are generated in generations, whereby + * (1) Generation #0 is the number 0 + * (2) Generation #N takes all numbers from generations #0..#N-1 and adds + * (P + 1)^(N - 1), effectively doubling the number of sub-IDs + * + * Note: a PMD-sized THP can, for a short time while PTE-mapping it, be + * mapped using PTEs and a single PMD, resulting in "P + 1" mappings. 
+ * For now, we don't consider this case, as we are ususally not + * looking at such folios while they being remapped, because the + * involved page tables are locked and stop any page table walkers. + */ + +/* + * With 1024 (order-10) possible exclusive mappings per folio, we can have 64 + * sub-IDs per 64bit value. + * + * With 4 such 64bit values, we can support 64^4 == 16M IDs. + */ +static const unsigned long rmap_subids_4[64] = { + 0ul, + 1ul, + 1025ul, + 1026ul, + 1050625ul, + 1050626ul, + 1051650ul, + 1051651ul, + 1076890625ul, + 1076890626ul, + 1076891650ul, + 1076891651ul, + 1077941250ul, + 1077941251ul, + 1077942275ul, + 1077942276ul, + 1103812890625ul, + 1103812890626ul, + 1103812891650ul, + 1103812891651ul, + 1103813941250ul, + 1103813941251ul, + 1103813942275ul, + 1103813942276ul, + 1104889781250ul, + 1104889781251ul, + 1104889782275ul, + 1104889782276ul, + 1104890831875ul, + 1104890831876ul, + 1104890832900ul, + 1104890832901ul, + 1131408212890625ul, + 1131408212890626ul, + 1131408212891650ul, + 1131408212891651ul, + 1131408213941250ul, + 1131408213941251ul, + 1131408213942275ul, + 1131408213942276ul, + 1131409289781250ul, + 1131409289781251ul, + 1131409289782275ul, + 1131409289782276ul, + 1131409290831875ul, + 1131409290831876ul, + 1131409290832900ul, + 1131409290832901ul, + 1132512025781250ul, + 1132512025781251ul, + 1132512025782275ul, + 1132512025782276ul, + 1132512026831875ul, + 1132512026831876ul, + 1132512026832900ul, + 1132512026832901ul, + 1132513102671875ul, + 1132513102671876ul, + 1132513102672900ul, + 1132513102672901ul, + 1132513103722500ul, + 1132513103722501ul, + 1132513103723525ul, + 1132513103723526ul, +}; + +static unsigned long get_rmap_subid_4(struct mm_struct *mm, int nr) +{ + const unsigned int rmap_id = mm->mm_rmap_id; + + VM_WARN_ON_ONCE(rmap_id < RMAP_ID_MIN || rmap_id > RMAP_ID_MAX || nr > 3); + return rmap_subids_4[(rmap_id >> (nr * 6)) & 0x3f]; +} + +#if MAX_ORDER >= RMAP_SUBID_5_MIN_ORDER +/* + * With 4096 (order-12) possible exclusive mappings per folio, we can have + * 32 sub-IDs per 64bit value. + * + * With 5 such 64bit values, we can support 32^5 > 16M IDs. + */ +static const unsigned long rmap_subids_5[32] = { + 0ul, + 1ul, + 4097ul, + 4098ul, + 16785409ul, + 16785410ul, + 16789506ul, + 16789507ul, + 68769820673ul, + 68769820674ul, + 68769824770ul, + 68769824771ul, + 68786606082ul, + 68786606083ul, + 68786610179ul, + 68786610180ul, + 281749955297281ul, + 281749955297282ul, + 281749955301378ul, + 281749955301379ul, + 281749972082690ul, + 281749972082691ul, + 281749972086787ul, + 281749972086788ul, + 281818725117954ul, + 281818725117955ul, + 281818725122051ul, + 281818725122052ul, + 281818741903363ul, + 281818741903364ul, + 281818741907460ul, + 281818741907461ul, +}; + +static unsigned long get_rmap_subid_5(struct mm_struct *mm, int nr) +{ + const unsigned int rmap_id = mm->mm_rmap_id; + + VM_WARN_ON_ONCE(rmap_id < RMAP_ID_MIN || rmap_id > RMAP_ID_MAX || nr > 4); + return rmap_subids_5[(rmap_id >> (nr * 5)) & 0x1f]; +} +#endif + +#if MAX_ORDER >= RMAP_SUBID_6_MIN_ORDER +/* + * With 32768 (order-15) possible exclusive mappings per folio, we can have + * 16 sub-IDs per 64bit value. + * + * With 6 such 64bit values, we can support 8^6 == 16M IDs. 
+ */ +static const unsigned long rmap_subids_6[16] = { + 0ul, + 1ul, + 32769ul, + 32770ul, + 1073807361ul, + 1073807362ul, + 1073840130ul, + 1073840131ul, + 35187593412609ul, + 35187593412610ul, + 35187593445378ul, + 35187593445379ul, + 35188667219970ul, + 35188667219971ul, + 35188667252739ul, + 35188667252740ul, +}; + +static unsigned long get_rmap_subid_6(struct mm_struct *mm, int nr) +{ + const unsigned int rmap_id = mm->mm_rmap_id; + + VM_WARN_ON_ONCE(rmap_id < RMAP_ID_MIN || rmap_id > RMAP_ID_MAX || nr > 15); + return rmap_subids_6[(rmap_id >> (nr * 4)) & 0xf]; +} +#endif + +void __folio_set_large_rmap_val(struct folio *folio, int count, + struct mm_struct *mm) +{ + const unsigned int order = folio_order(folio); + + switch (order) { +#if MAX_ORDER >= RMAP_SUBID_6_MIN_ORDER + case RMAP_SUBID_6_MIN_ORDER ... RMAP_SUBID_6_MAX_ORDER: + atomic_long_set(&folio->_rmap_val0, get_rmap_subid_6(mm, 0) * count); + atomic_long_set(&folio->_rmap_val1, get_rmap_subid_6(mm, 1) * count); + atomic_long_set(&folio->_rmap_val2, get_rmap_subid_6(mm, 2) * count); + atomic_long_set(&folio->_rmap_val3, get_rmap_subid_6(mm, 3) * count); + atomic_long_set(&folio->_rmap_val4, get_rmap_subid_6(mm, 4) * count); + atomic_long_set(&folio->_rmap_val5, get_rmap_subid_6(mm, 5) * count); + break; +#endif +#if MAX_ORDER >= RMAP_SUBID_5_MIN_ORDER + case RMAP_SUBID_5_MIN_ORDER ... RMAP_SUBID_5_MAX_ORDER: + atomic_long_set(&folio->_rmap_val0, get_rmap_subid_5(mm, 0) * count); + atomic_long_set(&folio->_rmap_val1, get_rmap_subid_5(mm, 1) * count); + atomic_long_set(&folio->_rmap_val2, get_rmap_subid_5(mm, 2) * count); + atomic_long_set(&folio->_rmap_val3, get_rmap_subid_5(mm, 3) * count); + atomic_long_set(&folio->_rmap_val4, get_rmap_subid_5(mm, 4) * count); + break; +#endif + default: + atomic_long_set(&folio->_rmap_val0, get_rmap_subid_4(mm, 0) * count); + atomic_long_set(&folio->_rmap_val1, get_rmap_subid_4(mm, 1) * count); + atomic_long_set(&folio->_rmap_val2, get_rmap_subid_4(mm, 2) * count); + atomic_long_set(&folio->_rmap_val3, get_rmap_subid_4(mm, 3) * count); + break; + } +} + +void __folio_add_large_rmap_val(struct folio *folio, int count, + struct mm_struct *mm) +{ + const unsigned int order = folio_order(folio); + + switch (order) { +#if MAX_ORDER >= RMAP_SUBID_6_MIN_ORDER + case RMAP_SUBID_6_MIN_ORDER ... RMAP_SUBID_6_MAX_ORDER: + atomic_long_add(get_rmap_subid_6(mm, 0) * count, &folio->_rmap_val0); + atomic_long_add(get_rmap_subid_6(mm, 1) * count, &folio->_rmap_val1); + atomic_long_add(get_rmap_subid_6(mm, 2) * count, &folio->_rmap_val2); + atomic_long_add(get_rmap_subid_6(mm, 3) * count, &folio->_rmap_val3); + atomic_long_add(get_rmap_subid_6(mm, 4) * count, &folio->_rmap_val4); + atomic_long_add(get_rmap_subid_6(mm, 5) * count, &folio->_rmap_val5); + break; +#endif +#if MAX_ORDER >= RMAP_SUBID_5_MIN_ORDER + case RMAP_SUBID_5_MIN_ORDER ... 
RMAP_SUBID_5_MAX_ORDER: + atomic_long_add(get_rmap_subid_5(mm, 0) * count, &folio->_rmap_val0); + atomic_long_add(get_rmap_subid_5(mm, 1) * count, &folio->_rmap_val1); + atomic_long_add(get_rmap_subid_5(mm, 2) * count, &folio->_rmap_val2); + atomic_long_add(get_rmap_subid_5(mm, 3) * count, &folio->_rmap_val3); + atomic_long_add(get_rmap_subid_5(mm, 4) * count, &folio->_rmap_val4); + break; +#endif + default: + atomic_long_add(get_rmap_subid_4(mm, 0) * count, &folio->_rmap_val0); + atomic_long_add(get_rmap_subid_4(mm, 1) * count, &folio->_rmap_val1); + atomic_long_add(get_rmap_subid_4(mm, 2) * count, &folio->_rmap_val2); + atomic_long_add(get_rmap_subid_4(mm, 3) * count, &folio->_rmap_val3); + break; + } +} + +int alloc_rmap_id(void) +{ + int id; + + /* + * We cannot use a mutex, because free_rmap_id() might get called + * when we are not allowed to sleep. + * + * TODO: do we need something like idr_preload()? + */ + spin_lock(&rmap_id_lock); + id = ida_alloc_range(&rmap_ida, RMAP_ID_MIN, RMAP_ID_MAX, GFP_ATOMIC); + spin_unlock(&rmap_id_lock); + + return id; +} + +void free_rmap_id(int id) +{ + if (id == RMAP_ID_DUMMY) + return; + if (WARN_ON_ONCE(id < RMAP_ID_MIN || id > RMAP_ID_MAX)) + return; + spin_lock(&rmap_id_lock); + ida_free(&rmap_ida, id); + spin_unlock(&rmap_id_lock); +} From patchwork Fri Nov 24 13:26:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13467661 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C7D07C624B4 for ; Fri, 24 Nov 2023 13:27:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 98E898D007C; Fri, 24 Nov 2023 08:27:05 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 916B98D006E; Fri, 24 Nov 2023 08:27:05 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 76C228D007C; Fri, 24 Nov 2023 08:27:05 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 58B9A8D006E for ; Fri, 24 Nov 2023 08:27:05 -0500 (EST) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 2E7E340482 for ; Fri, 24 Nov 2023 13:27:05 +0000 (UTC) X-FDA: 81492923610.05.AD3C32E Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf19.hostedemail.com (Postfix) with ESMTP id 79DF91A0022 for ; Fri, 24 Nov 2023 13:27:03 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="hLUcy/My"; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf19.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700832423; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=hWu/0kIJvur5UANFtettn6VuzgBBSt2XXjypdyFb6aY=; 
b=YJMrw0tQ73mPoqKpu0VA7pzTHCeFNTvuEUwU8UYmVhp/dGr6RZNAtRglIiKXAhENGC2tyI ulKvIKYHo+8G3OikpGdTwvJp1iNZ7QRaubBypJz///2x+79t6etoFApDkCQ8lAiL4C1EAK 4ZNAbePVjrZstg8lvGWPt183WnTV3a8= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="hLUcy/My"; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf19.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1700832423; a=rsa-sha256; cv=none; b=8bJwzGSPBcsuF5MvMCdZuDG+wPNfr/KmqS8CRtyt1tT4Wx2ZB0c7SCds/g+vH6YMKqvw1O 8bGNJiLjHONnmtJuQD4PU6f3zMuw2RfZ3cGd2Fv68a/OQ41/m5fAk0tMWR34GaoyxgvLHY 8KPcsCJuKc2/aGCv39bQ+0Nv7rtOfqg= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1700832422; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hWu/0kIJvur5UANFtettn6VuzgBBSt2XXjypdyFb6aY=; b=hLUcy/MydxovAxTKzzFOKt6Ez64RzW3C2AyX027+G+CPQxQZ7t9HJXT3kh/Au40sbx9MQL O/CmgI2erkJQuqHZJ4bCBdbEnod3bNVCgRZTOGZtDkEZgEWenXSYfOcNfDCtU/ypHh3Qpo CubW1/PhrVz4eIK8TT1rivjZIndikJQ= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-351-57hRR764P9m-_xbjczRUpw-1; Fri, 24 Nov 2023 08:27:01 -0500 X-MC-Unique: 57hRR764P9m-_xbjczRUpw-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id DCBB085A5BD; Fri, 24 Nov 2023 13:26:59 +0000 (UTC) Received: from t14s.fritz.box (unknown [10.39.194.71]) by smtp.corp.redhat.com (Postfix) with ESMTP id 003162166B2A; Fri, 24 Nov 2023 13:26:56 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, David Hildenbrand , Andrew Morton , Linus Torvalds , Ryan Roberts , Matthew Wilcox , Hugh Dickins , Yin Fengwei , Yang Shi , Ying Huang , Zi Yan , Peter Zijlstra , Ingo Molnar , Will Deacon , Waiman Long , "Paul E. 
McKenney" Subject: [PATCH WIP v1 08/20] mm: pass MM to folio_mapped_shared() Date: Fri, 24 Nov 2023 14:26:13 +0100 Message-ID: <20231124132626.235350-9-david@redhat.com> In-Reply-To: <20231124132626.235350-1-david@redhat.com> References: <20231124132626.235350-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 79DF91A0022 X-Stat-Signature: 9gik88xd6kad7bbe8zq48pkadfq1ukkz X-Rspam-User: X-HE-Tag: 1700832423-382015 X-HE-Meta: U2FsdGVkX1/Bk5C8xNa/gMweCcJCr3NdkeFqAK25m+gdtwZqAKvElq/cxe50CCsqa4AQblCmZmgQg0G/oISIAo3n4jxkWNPO4oRwc2XUhFRwl3bVebXTVP1wY78rveoPwpCwnSKu2A6bpKBauwn9ays6kIX63E/6vgpPKnlaVqxCW+QVx/0QKYBEVSbtu50fLWDTqcEVaNRBQdRWRqjdGjruqwmXQhJTNSPwe1S0+NH2J/qUBlcaSo0pwgAv0ItsObSza/KaVhlQtP6hBJwp/Qxj15e7mr1TTsxlrbLyzGB7nEU+rVAEVm65fQA81KasWBIVshyGmFbVAt0l39DGa7JBW0bhCMcWBthLVaxyzWVZxXPgYVmL8KL4O8MDvl56ihAZMMYzWsThyF8mCOb/H4YMZs4/Q/hPNLJI2opHE7Pp8HVHZQ8efhzMMYFojfl18vZNa9epGJKxow5VH0p4upfCmg/Bce7P2FodsrsTJT8cP3vLG5lI+lZseQWLbF2nYiYTo5eWDmu6FOgeiccum1hLk52UGvf0Ln/5TWYfKREC/Ba0oP0n0hjTjcCQvdTcddXmI8YQkOjBfsz748YSGPRWFn459MZKv5ncmxSFXXoMEKhOiAqrh+4Xs+k7PgB8vYkyWNwBlsQ6xg+IapHs1Gu+FtBId7aSPLDV7uUf2VYEOZxcG+YCV5Wr2e1PAHGYDtFG2gqZ9VRIAr3eaI0jlOF9/0Bk3oniueYZH6Z/VvJpBk4VH04LHvYMdkwFbs/b+4iEQnkk3t9pL3/cXIQBEVYyOmaEi0JcI7UisZYNsjH9UaXeClamb+PqdL4shdxzPC8exUNXG1lQQhpqch1SmrpiXe01SaQbOrpDvg/Ezw32ynjyFEHsxd2ro+06vhmlqU3fboYAVOhRWLIulvcMJ/yivRMTz/UACDojRh60u4shAZ+vDsntbYj914I2E/qq3+qGaXH4Tevc7x/McX8 /yhbbcfn r3sB+mBep4LrhC4ZaW6Won+b1zmW2dvbWqmVKNQNQnDdhpiHhy3G6a3iE2ScB5PYBmz3d3HcAhXGiE7CBp8FWqnXIpPht+fd1EUoivlpZ+YfWNJgx9EJ4spJliS0wrLupyy1sENe243+OPcMbSDUspMmUpgMwONhS8ceR0IoA5FIsyWYCV1mtVkTQC/Oqmk137ONGNVsrXYL0R9Yj9K2GG9ccVp3Xo8hJH5bt X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: We'll need the MM next to make a better decision regarding partially-mappable folios (e.g., PTE-mapped THP) using per-MM rmap IDs. Signed-off-by: David Hildenbrand --- include/linux/mm.h | 4 +++- mm/huge_memory.c | 2 +- mm/madvise.c | 6 +++--- mm/memory.c | 2 +- mm/mempolicy.c | 14 +++++++------- mm/migrate.c | 2 +- 6 files changed, 16 insertions(+), 14 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 17dac913f367..765e688690f1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2117,6 +2117,7 @@ static inline size_t folio_size(struct folio *folio) * folio_mapped_shared - Report if a folio is certainly mapped by * multiple entities in their page tables * @folio: The folio. + * @mm: The mm the folio is mapped into. * * This function checks if a folio is certainly *currently* mapped by * multiple entities in their page table ("mapped shared") or if the folio @@ -2153,7 +2154,8 @@ static inline size_t folio_size(struct folio *folio) * * Return: Whether the folio is certainly mapped by multiple entities. */ -static inline bool folio_mapped_shared(struct folio *folio) +static inline bool folio_mapped_shared(struct folio *folio, + struct mm_struct *mm) { unsigned int total_mapcount; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 0228b04c4053..fd7251923557 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1639,7 +1639,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, * If other processes are mapping this folio, we couldn't discard * the folio unless they all do MADV_FREE so let's skip the folio. 
*/ - if (folio_mapped_shared(folio)) + if (folio_mapped_shared(folio, mm)) goto out; if (!folio_trylock(folio)) diff --git a/mm/madvise.c b/mm/madvise.c index 1a82867c8c2e..e3e4f3ea5f6d 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -365,7 +365,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, folio = pfn_folio(pmd_pfn(orig_pmd)); /* Do not interfere with other mappings of this folio */ - if (folio_mapped_shared(folio)) + if (folio_mapped_shared(folio, mm)) goto huge_unlock; if (pageout_anon_only_filter && !folio_test_anon(folio)) @@ -441,7 +441,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, if (folio_test_large(folio)) { int err; - if (folio_mapped_shared(folio)) + if (folio_mapped_shared(folio, mm)) break; if (pageout_anon_only_filter && !folio_test_anon(folio)) break; @@ -665,7 +665,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, if (folio_test_large(folio)) { int err; - if (folio_mapped_shared(folio)) + if (folio_mapped_shared(folio, mm)) break; if (!folio_trylock(folio)) break; diff --git a/mm/memory.c b/mm/memory.c index 14416d05e1b6..5048d58d6174 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4848,7 +4848,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) * Flag if the folio is shared between multiple address spaces. This * is later used when determining whether to group tasks together */ - if (folio_mapped_shared(folio) && (vma->vm_flags & VM_SHARED)) + if (folio_mapped_shared(folio, vma->vm_mm) && (vma->vm_flags & VM_SHARED)) flags |= TNF_SHARED; nid = folio_nid(folio); diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 0492113497cc..bd0243da26bf 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -418,7 +418,7 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = { }; static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist, - unsigned long flags); + struct mm_struct *mm, unsigned long flags); static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol, pgoff_t ilx, int *nid); @@ -481,7 +481,7 @@ static void queue_folios_pmd(pmd_t *pmd, struct mm_walk *walk) return; if (!(qp->flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) || !vma_migratable(walk->vma) || - !migrate_folio_add(folio, qp->pagelist, qp->flags)) + !migrate_folio_add(folio, qp->pagelist, walk->mm, qp->flags)) qp->nr_failed++; } @@ -561,7 +561,7 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr, } if (!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) || !vma_migratable(vma) || - !migrate_folio_add(folio, qp->pagelist, flags)) { + !migrate_folio_add(folio, qp->pagelist, walk->mm, flags)) { qp->nr_failed++; if (strictly_unmovable(flags)) break; @@ -609,7 +609,7 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask, * easily detect if a folio is shared. */ if ((flags & MPOL_MF_MOVE_ALL) || - (!folio_mapped_shared(folio) && !hugetlb_pmd_shared(pte))) + (!folio_mapped_shared(folio, walk->mm) && !hugetlb_pmd_shared(pte))) if (!isolate_hugetlb(folio, qp->pagelist)) qp->nr_failed++; unlock: @@ -981,7 +981,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, #ifdef CONFIG_MIGRATION static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist, - unsigned long flags) + struct mm_struct *mm, unsigned long flags) { /* * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio. 
@@ -990,7 +990,7 @@ static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist, * See folio_mapped_shared() on possible imprecision when we cannot * easily detect if a folio is shared. */ - if ((flags & MPOL_MF_MOVE_ALL) || !folio_mapped_shared(folio)) { + if ((flags & MPOL_MF_MOVE_ALL) || !folio_mapped_shared(folio, mm)) { if (folio_isolate_lru(folio)) { list_add_tail(&folio->lru, foliolist); node_stat_mod_folio(folio, @@ -1195,7 +1195,7 @@ static struct folio *alloc_migration_target_by_mpol(struct folio *src, #else static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist, - unsigned long flags) + struct mm_struct *mm, unsigned long flags) { return false; } diff --git a/mm/migrate.c b/mm/migrate.c index 341a84c3e8e4..8a1d75ff2dc6 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2559,7 +2559,7 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma, * every page is mapped to the same process. Doing that is very * expensive, so check the estimated mapcount of the folio instead. */ - if (folio_mapped_shared(folio) && folio_is_file_lru(folio) && + if (folio_mapped_shared(folio, vma->vm_mm) && folio_is_file_lru(folio) && (vma->vm_flags & VM_EXEC)) goto out; From patchwork Fri Nov 24 13:26:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13467662 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49290C61D97 for ; Fri, 24 Nov 2023 13:27:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D37D68D007D; Fri, 24 Nov 2023 08:27:12 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id CBFFE8D006E; Fri, 24 Nov 2023 08:27:12 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B130E8D007D; Fri, 24 Nov 2023 08:27:12 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 95DF98D006E for ; Fri, 24 Nov 2023 08:27:12 -0500 (EST) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 5337912057C for ; Fri, 24 Nov 2023 13:27:12 +0000 (UTC) X-FDA: 81492923904.24.1A0C00A Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf04.hostedemail.com (Postfix) with ESMTP id 7B6D040028 for ; Fri, 24 Nov 2023 13:27:10 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=Ci6nzm9h; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf04.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700832430; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=qOqXc/Lc47PjYVBDjn5tNISm7M7uwGrIMk+t0cHF8N0=; b=HcjJmqS8Z0WxR3wRaj2D05mkHYLbI0mOpCceKzl830dgd4fV9RH/K5RS5RQZFux0O0mDfn 
gmUcRfYptJ8JG01aNX4xYgW3HAQlRDIv0wAOQ91hU19JCBdaxye71l/p4e1tfMZlO62TA3 SO2Xzbj7UnwHhdzFQu9bTNKaIYkcfs8= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=Ci6nzm9h; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf04.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1700832430; a=rsa-sha256; cv=none; b=bUMdptXwg5ATpNhOT0zfRcLfvx+LIB6UcEVoEviSKXNbwmoL5uMbIYuKjeMOYaGqhFHztF PornqY7YLdm7Wl9Iw/t8GlFYXMEiegMYjOeHjNi9+Uwozh9BXyChi8OK2RXT94LYFD/WPN ZQqHuzB8pK1O+O3q+GwdB8hXvXVvU5c= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1700832429; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qOqXc/Lc47PjYVBDjn5tNISm7M7uwGrIMk+t0cHF8N0=; b=Ci6nzm9hcHMWZuOapqbpBSXFSXWl0TnzZyMigA7IRrFl1gTanwBv+Yxr8wUCSiBJO071Bt MEK/fQHM2ZYEnjZIVNPCV7ntYVs3F6f0Gge60gEQr9Gbs9WlQj3RSxMpCtrZZEzpBXX0pM er3r07Kwxy5b54n2RQrliMdEl3Nzk/E= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-85-HfF3hCRgOCiCl6qXEnqmsA-1; Fri, 24 Nov 2023 08:27:04 -0500 X-MC-Unique: HfF3hCRgOCiCl6qXEnqmsA-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id AD25B811E86; Fri, 24 Nov 2023 13:27:03 +0000 (UTC) Received: from t14s.fritz.box (unknown [10.39.194.71]) by smtp.corp.redhat.com (Postfix) with ESMTP id 477E52166B2A; Fri, 24 Nov 2023 13:27:00 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, David Hildenbrand , Andrew Morton , Linus Torvalds , Ryan Roberts , Matthew Wilcox , Hugh Dickins , Yin Fengwei , Yang Shi , Ying Huang , Zi Yan , Peter Zijlstra , Ingo Molnar , Will Deacon , Waiman Long , "Paul E. 
McKenney" Subject: [PATCH WIP v1 09/20] mm: improve folio_mapped_shared() for partially-mappable folios using rmap IDs Date: Fri, 24 Nov 2023 14:26:14 +0100 Message-ID: <20231124132626.235350-10-david@redhat.com> In-Reply-To: <20231124132626.235350-1-david@redhat.com> References: <20231124132626.235350-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 X-Rspamd-Queue-Id: 7B6D040028 X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: i3t3tme5xqxx81bwqejhub836fxmebh6 X-HE-Tag: 1700832430-767845 X-HE-Meta: U2FsdGVkX19ai0+lHL3XWCnoQ3xEBUQiuiD1mJcgOEx8/mtQ9n5Atd/i/ARFRebKmE3FSQtLMM+N2qZXDSF2bdHoqYLnME1tWlPg4USksxBkLSPU14QES7AAwSe/xPkLqBaUQBYkHU2M+tILIvKmf/pJDS2YP+3fPrrToRFyApNSJRI/P3HFmUWnHfvaY6rlupMEU/4fpR4Yp8DA5smLAQ2nbxN6wm62eqhNPcmgfejIzyx7Klh3DnZc/7Qj768xQuZeLTXK02U8eAZwOH5mX3ao3+U88zYEwlX4fKSEYIkE5u1LZCFOEjeB0AMuKxs4AdQao840JH7r1YU48AnH7fK14cxu+4V8BbjlLeu786lPgGISp2+OBWxlA1XyGrwAKmQlNzN6dFkwHvFr/HrW9fpfnvlOTlMMPECrNOvmB9khREOTcQyR5+XL3AGFZjEySViQz8eQ1gd7rpbZN/58i2vPlNKwy8jYRY3EaRELGkgDpO/BS/XHdhAR+C8FdG9Q88j3QnFSXULikWSTXCzXhVOpEaCy976iPczRSfNKy5ijtt5AbqjPLkWIAnUr8MsejL8do5By+nTEN/xFAiLOHKsukx853yyeuDWQUD/YqUW3s6r0GGfivGESGNT+RAcMZI0JkoT7AiW5FqYVAtjaJE2rxexp7FImbUChB4QQUQv+v0hSapyr4+kw9XcLH1V1+Qk8K6ATwHPRKj1Enw3gDBMmG2fG/AKCUa2Ra3wZ+c5lSFKSz/SFlefzu1ZR9M9KTu7kWHsLCdgx/WImdw0I68UWhZGG8459NsojadMsiAXeuwT1xNXGuMtSCbAxLc8jtGCsEKwwXMujrKzL9ksZk4TD8hIq5MINPchuLiYahGofXltXU2/sknoiIq2sjtzQjo2W/7NFo85Q0loLjOgjUbKOnUr/RaDYBwo+Xu/yJIQ1y3fYRnN4qI+uXfdJbWiYSZnlNgd9ZYteiSY/waw o6thgMFH Wnbj3tN5adqwAehOL3AuHMxOxUxRRpPwV+pe+888LRUkJAjmrth5ITWrgeVOag42XtKbnhZro4pjJ6BpHVIhRgdEVtao/gFtGktLqxcoI6aWV/20MhD7w+Zq9KNp0/S2MwDKUg7F54a+ICqRXAasm4B9lyzHcTRZqbc84LtgDkRj3xR4risNnFkcVIzfrOfuSZm62sWknPZCAu9i2OJ8KYSbtQS1bDdDeh232 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Let's make folio_mapped_shared() precise by using or rmap ID magic to identify if a single MM is responsible for all mappings. If there is a lot of concurrent (un)map activity, we could theoretically spin for quite a while. But we're only looking at the rmap values in case we didn't already identify the folio as "obviously shared". In most cases, there should only be one or a handful of page tables involved. For current THPs with ~512 .. 2048 subpages, we really shouldn't see a lot of concurrent updates that keep us spinning for a long time. Anyhow, if ever a problem this can be optimized later if there is real demand. Signed-off-by: David Hildenbrand --- include/linux/mm.h | 21 ++++++++++++--- include/linux/rmap.h | 2 ++ mm/rmap_id.c | 63 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 82 insertions(+), 4 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 765e688690f1..1081a8faa1a3 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2113,6 +2113,17 @@ static inline size_t folio_size(struct folio *folio) return PAGE_SIZE << folio_order(folio); } +#ifdef CONFIG_RMAP_ID +bool __folio_large_mapped_shared(struct folio *folio, struct mm_struct *mm); +#else +static inline bool __folio_large_mapped_shared(struct folio *folio, + struct mm_struct *mm) +{ + /* ... guess based on the mapcount of the first page of the folio. 
*/ + return atomic_read(&folio->page._mapcount) > 0; +} +#endif + /** * folio_mapped_shared - Report if a folio is certainly mapped by * multiple entities in their page tables @@ -2141,8 +2152,11 @@ static inline size_t folio_size(struct folio *folio) * PMD-mapped PMD-sized THP), the result will be exactly correct. * * For all other (partially-mappable) folios, such as PTE-mapped THP, the - * return value is partially fuzzy: true is not fuzzy, because it means - * "certainly mapped shared", but false means "maybe mapped exclusively". + * return value is partially fuzzy without CONFIG_RMAP_ID: true is not fuzzy, + * because it means "certainly mapped shared", but false means + * "maybe mapped exclusively". + * + * With CONFIG_RMAP_ID, the result will be exactly correct. * * Note that this function only considers *current* page table mappings * tracked via rmap -- that properly adjusts the folio mapcount(s) -- and @@ -2177,8 +2191,7 @@ static inline bool folio_mapped_shared(struct folio *folio, */ if (total_mapcount > folio_nr_pages(folio)) return true; - /* ... guess based on the mapcount of the first page of the folio. */ - return atomic_read(&folio->page._mapcount) > 0; + return __folio_large_mapped_shared(folio, mm); } #ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 19c9dc3216df..a73e146d82d1 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -253,6 +253,8 @@ void __folio_set_large_rmap_val(struct folio *folio, int count, struct mm_struct *mm); void __folio_add_large_rmap_val(struct folio *folio, int count, struct mm_struct *mm); +bool __folio_has_large_matching_rmap_val(struct folio *folio, int count, + struct mm_struct *mm); #else static inline void __folio_prep_large_rmap(struct folio *folio) { diff --git a/mm/rmap_id.c b/mm/rmap_id.c index e66b0f5aea2d..85a61c830f19 100644 --- a/mm/rmap_id.c +++ b/mm/rmap_id.c @@ -322,6 +322,69 @@ void __folio_add_large_rmap_val(struct folio *folio, int count, } } +bool __folio_has_large_matching_rmap_val(struct folio *folio, int count, + struct mm_struct *mm) +{ + const unsigned int order = folio_order(folio); + unsigned long diff = 0; + + switch (order) { +#if MAX_ORDER >= RMAP_SUBID_6_MIN_ORDER + case RMAP_SUBID_6_MIN_ORDER .. RMAP_SUBID_6_MAX_ORDER: + diff |= atomic_long_read(&folio->_rmap_val0) ^ (get_rmap_subid_6(mm, 0) * count); + diff |= atomic_long_read(&folio->_rmap_val1) ^ (get_rmap_subid_6(mm, 1) * count); + diff |= atomic_long_read(&folio->_rmap_val2) ^ (get_rmap_subid_6(mm, 2) * count); + diff |= atomic_long_read(&folio->_rmap_val3) ^ (get_rmap_subid_6(mm, 3) * count); + diff |= atomic_long_read(&folio->_rmap_val4) ^ (get_rmap_subid_6(mm, 4) * count); + diff |= atomic_long_read(&folio->_rmap_val5) ^ (get_rmap_subid_6(mm, 5) * count); + break; +#endif +#if MAX_ORDER >= RMAP_SUBID_5_MIN_ORDER + case RMAP_SUBID_5_MIN_ORDER .. 
RMAP_SUBID_5_MAX_ORDER: + diff |= atomic_long_read(&folio->_rmap_val0) ^ (get_rmap_subid_5(mm, 0) * count); + diff |= atomic_long_read(&folio->_rmap_val1) ^ (get_rmap_subid_5(mm, 1) * count); + diff |= atomic_long_read(&folio->_rmap_val2) ^ (get_rmap_subid_5(mm, 2) * count); + diff |= atomic_long_read(&folio->_rmap_val3) ^ (get_rmap_subid_5(mm, 3) * count); + diff |= atomic_long_read(&folio->_rmap_val4) ^ (get_rmap_subid_5(mm, 4) * count); + break; +#endif + default: + diff |= atomic_long_read(&folio->_rmap_val0) ^ (get_rmap_subid_4(mm, 0) * count); + diff |= atomic_long_read(&folio->_rmap_val1) ^ (get_rmap_subid_4(mm, 1) * count); + diff |= atomic_long_read(&folio->_rmap_val2) ^ (get_rmap_subid_4(mm, 2) * count); + diff |= atomic_long_read(&folio->_rmap_val3) ^ (get_rmap_subid_4(mm, 3) * count); + break; + } + return !diff; +} + +bool __folio_large_mapped_shared(struct folio *folio, struct mm_struct *mm) +{ + unsigned long start; + bool exclusive; + int mapcount; + + VM_WARN_ON_ONCE(!folio_test_large_rmappable(folio)); + VM_WARN_ON_ONCE(folio_test_hugetlb(folio)); + + /* + * Livelocking here is unlikely, as the caller already handles the + * "obviously shared" cases. If ever an issue and there is too much + * concurrent (un)mapping happening (using different page tables), we + * could stop earlier and just return "shared". + */ + do { + start = raw_read_atomic_seqcount_begin(&folio->_rmap_atomic_seqcount); + mapcount = folio_mapcount(folio); + if (unlikely(mapcount > folio_nr_pages(folio))) + return true; + exclusive = __folio_has_large_matching_rmap_val(folio, mapcount, mm); + } while (raw_read_atomic_seqcount_retry(&folio->_rmap_atomic_seqcount, + start)); + + return !exclusive; +} + int alloc_rmap_id(void) { int id; From patchwork Fri Nov 24 13:26:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13467663 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 16484C624B4 for ; Fri, 24 Nov 2023 13:27:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 91AAB8D007E; Fri, 24 Nov 2023 08:27:14 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 87BEE8D006E; Fri, 24 Nov 2023 08:27:14 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6CE088D007E; Fri, 24 Nov 2023 08:27:14 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 4E75D8D006E for ; Fri, 24 Nov 2023 08:27:14 -0500 (EST) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 2A32940410 for ; Fri, 24 Nov 2023 13:27:14 +0000 (UTC) X-FDA: 81492923988.04.34BD6AC Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf30.hostedemail.com (Postfix) with ESMTP id 6EFA88000A for ; Fri, 24 Nov 2023 13:27:12 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=f2v4Nd9M; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf30.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com 
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700832432; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=QUP8GR2DnkC5694xV/qGgIseF+xugXnf+IpocEcF75Q=; b=y0GwYH0otRp3kVhYAeo9gX51h4AIAlMtAwUse2q56DVu/bozsMW/5sy2/wVNS+kq8Ucc0b J9GWtHIgGvppdjLEx2vh625+Ie2ymHyZhAcxzcCAiW6I8g8x6+CY4DZkxhNYBwGW9FGLMo SAEOVXNTpHYMvgNoPEubb0Gtr2hhdUk= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=f2v4Nd9M; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf30.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1700832432; a=rsa-sha256; cv=none; b=csuAGugzZfqcFdOfshUmGqqUeKoGpZG+/WT5jq3YkBBOZ+IazRBZj2yMObIyBMylhbvWXu DbQh9lKOliKwmXDWtCMAiLNgiE4+UJ9EHfujePajlzRCccrZIKqtdUDzUZi2hu1Q5haywv XNbV1Az8RghEi8Q6u9Fszwwb9K5DcDM= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1700832431; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QUP8GR2DnkC5694xV/qGgIseF+xugXnf+IpocEcF75Q=; b=f2v4Nd9MWqwmzWHvH54xxheGVyGAyt5tAxJIMVpfBd0QdgkTuJFTFXpNhTI0bmsb+XtBuw kKskpr0b9PT+eBiQ0HxvdQMgwW0ZzTWmPnqV9y9qoUWFdhaGf7jJhOjxbtF0fnoulpXgZl +uGVUW0gJA7SwIMywIZvNl0TsromnDE= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-508-EVwlOeefM6O30sNZp-JA9g-1; Fri, 24 Nov 2023 08:27:08 -0500 X-MC-Unique: EVwlOeefM6O30sNZp-JA9g-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id BC99C811E93; Fri, 24 Nov 2023 13:27:07 +0000 (UTC) Received: from t14s.fritz.box (unknown [10.39.194.71]) by smtp.corp.redhat.com (Postfix) with ESMTP id ECEBC2166B2B; Fri, 24 Nov 2023 13:27:03 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, David Hildenbrand , Andrew Morton , Linus Torvalds , Ryan Roberts , Matthew Wilcox , Hugh Dickins , Yin Fengwei , Yang Shi , Ying Huang , Zi Yan , Peter Zijlstra , Ingo Molnar , Will Deacon , Waiman Long , "Paul E. 
McKenney" Subject: [PATCH WIP v1 10/20] mm/memory: COW reuse support for PTE-mapped THP with rmap IDs Date: Fri, 24 Nov 2023 14:26:15 +0100 Message-ID: <20231124132626.235350-11-david@redhat.com> In-Reply-To: <20231124132626.235350-1-david@redhat.com> References: <20231124132626.235350-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 6EFA88000A X-Stat-Signature: 3xjq5at5u1gjqgozkwczfkjxbogaa4ji X-HE-Tag: 1700832432-721164 X-HE-Meta: U2FsdGVkX1+rRP8r/eUZO/nx9MQnSYnwoxZgUrGYqUzev/7QRLWO3KA8lLw5zJPoJeNb8BpiHOQS+BSOnuiqfvzOQpKFrv0Zhxm8lHRaL8/o5BGsTAZ3vec6/amJnTgD8kK/FdcCxoSSWyFQiY6eapVg3im+/0YGkLgJclXfZnbnov1RSzRips1yJ+jQwA/gdVm/4mVUjj5sjD11SHXIM0hEwkdDb5r1FHtgQbYwVjynq5YBNCpMoi5wKrEnwen4t4t5S+k0phQwIHBZm5H0Hifp+X1+j+AIN4/lXBq9NXS9ai2Fh/dgSjh8VNwZX8agKjnDJU7fuFX+07qKj0ZuA8jDIpD9Su5xHUQpBirl5onDOHL2slrmGSpuJh9H2gN+jR++omnLGLtx2HdkMgY5DfnnzBULlp/2H6JpGYz1NXR1IR9dmX4TpbvRfkQIg4rCa9EygTWqc+o2dYVp2YhSH4v0YJiDhuX7RYdCuPxfh5Bduk0sexU2PQFzcbiY//fiPgqTnJmTh/Lk8tpMFlec5FXlgN1ke9nI5TqHnbmdocqHCT1n4daGc7tb1F68j7pf9J9a0u/+fB2ltmspmc3DDS728Mz9qe4b6QkBGNLp7s7MOCVKSd7anWKd0OfLUmy82lh7ylxPs/IAb9E1X4wnZJ60gwE0l0bZoO67p9TVBEybjDS3J7FpTjuomLs/yCWah2CLZC1RdC11q9LYL5WmGlraChXBvo5GAfq8ZK+6LLbS4owMDakMkNIQgmA2SAJJHKxONZThRoTZS9LVbXliTWuSsr269W7/dUc44rZ+ytmKmi2HRlL0PxqvnHKucj9JCDPcE2l2Wk+fOYUUhcmjl9pYTqqaLbqQlXTVUAF8GgZIQbx8HlPcBJDETnWUT9geae98gqLJipc7RJNbMqQxVZwJkzPELG4Z5BRRJz1aat7P+2/fI3sum91a72QM35lfkdntQ/sgHhNKvNNTPKI wFoG2xxs gOTV2PbxePGNVfJzgUSfFuXrRGBNudJq/N+N7ogqOea4vEO6MnjLkDdvAK669AqLsZjxD4AK2dc0+YR9lo/F5IG1tYQn+AbkIFkPdvIn4EdKMZ0FseqhcHX/C9IW6eeljGNOE/v/zNPUr58NkyRiq6Y1Pd9/CWXU3UVdr0tat85ilX62eJcuPvUeIH505Ed4TMaQ5KBUm6SuschZ4w7QkV70qRg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: For now, we only end up reusing small folios and PMD-mapped large folios (i.e., THP) after fork(); PTE-mapped THPs are never reused, except when only a single page of the folio remains mapped. Instead, we end up copying each subpage even though the THP might be exclusive to the MM. The logic we're using for small folios and PMD-mapped THPs is the following: Is the only reference to the folio from a single page table mapping? Then: (a) There are no other references to the folio from other MMs (e.g., page table mapping, GUP) (b) There are no other references to the folio from page migration/ swapout/swapcache that might temporarily unmap the folio. Consequently, the folio is exclusive to that process and can be reused. In that case, we end up with folio_refcount(folio) == 1 and an implied folio_mapcount(folio) == 1, while holding the page table lock and the page lock to protect against possible races. For PTE-mapped THP, however, we have not one, but multiple references from page tables, whereby such THPs can be mapped into multiple page tables in the MM. Reusing the logic that we use for small folios and PMD-mapped THPs means, that when reusing a PTE-mapped THP, we want to make sure that: (1) All folio references are from page table mappings. (2) All page table mappings belong to the same MM. (3) We didn't race with (un)mapping of the page related to other page tables, such that the mapcount and refcount are stable. For (1), we can check folio_refcount(folio) == folio_mapcount(folio) For (2) and (3), we can use our new rmap ID infrastructure. 
We won't bother with the swapcache and LRU cache for now. Add some sanity checks under CONFIG_DEBUG_VM, to identify any obvious problems early. Signed-off-by: David Hildenbrand --- mm/memory.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 89 insertions(+) diff --git a/mm/memory.c b/mm/memory.c index 5048d58d6174..fb533995ff68 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3360,6 +3360,95 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf, struct folio *folio) static bool wp_can_reuse_anon_folio(struct folio *folio, struct vm_area_struct *vma) { +#ifdef CONFIG_RMAP_ID + if (folio_test_large(folio)) { + bool retried = false; + unsigned long start; + int mapcount, i; + + /* + * The assumption for anonymous folios is that each page can + * only get mapped once into a MM. This also holds for + * small folios -- except when KSM is involved. KSM does + * currently not apply to large folios. + * + * Further, each taken mapcount must be paired with exactly one + * taken reference, whereby references must be incremented + * before the mapcount when mapping a page, and references must + * be decremented after the mapcount when unmapping a page. + * + * So if all references to a folio are from mappings, and all + * mappings are due to our (MM) page tables, and there was no + * concurrent (un)mapping, this folio is certainly exclusive. + * + * We currently don't optimize for: + * (a) folio is mapped into multiple page tables in this + * MM (e.g., mremap) and other page tables are + * concurrently (un)mapping the folio. + * (b) the folio is in the swapcache. Likely the other PTEs + * are still swap entries and folio_free_swap() would fail. + * (c) the folio is in the LRU cache. + */ +retry: + start = raw_read_atomic_seqcount(&folio->_rmap_atomic_seqcount); + if (start & ATOMIC_SEQCOUNT_WRITERS_MASK) + return false; + mapcount = folio_mapcount(folio); + + /* Is this folio possibly exclusive ... */ + if (mapcount > folio_nr_pages(folio) || folio_entire_mapcount(folio)) + return false; + + /* ... and are all references from mappings ... */ + if (folio_ref_count(folio) != mapcount) + return false; + + /* ... and do all mappings belong to us ... */ + if (!__folio_has_large_matching_rmap_val(folio, mapcount, vma->vm_mm)) + return false; + + /* ... and was there no concurrent (un)mapping ? */ + if (raw_read_atomic_seqcount_retry(&folio->_rmap_atomic_seqcount, + start)) + return false; + + /* Safety checks we might want to drop in the future. */ + if (IS_ENABLED(CONFIG_DEBUG_VM)) { + unsigned int mapcount; + + if (WARN_ON_ONCE(folio_test_ksm(folio))) + return false; + /* + * We might have raced against swapout code adding + * the folio to the swapcache (which, by itself, is not + * problematic). Let's simply check again if we would + * properly detect the additional reference now and + * properly fail. + */ + if (unlikely(folio_test_swapcache(folio))) { + if (WARN_ON_ONCE(retried)) + return false; + retried = true; + goto retry; + } + for (i = 0; i < folio_nr_pages(folio); i++) { + mapcount = page_mapcount(folio_page(folio, i)); + if (WARN_ON_ONCE(mapcount > 1)) + return false; + } + } + + /* + * This folio is exclusive to us. Do we need the page lock? + * Likely not, and a trylock would be unfortunate if this + * folio is mapped into multiple page tables and we get + * concurrent page faults. If there would be references from + * page migration/swapout/swapcache, we would have detected + * an additional reference and never ended up here. 
+ */ + return true; + } +#endif /* CONFIG_RMAP_ID */ /* * We have to verify under folio lock: these early checks are * just an optimization to avoid locking the folio and freeing From patchwork Fri Nov 24 13:26:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13467664 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7B781C61DF4 for ; Fri, 24 Nov 2023 13:27:17 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 044038D007F; Fri, 24 Nov 2023 08:27:16 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E95748D006E; Fri, 24 Nov 2023 08:27:15 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C99A78D007F; Fri, 24 Nov 2023 08:27:15 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id A8E028D006E for ; Fri, 24 Nov 2023 08:27:15 -0500 (EST) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 7E128C01D0 for ; Fri, 24 Nov 2023 13:27:15 +0000 (UTC) X-FDA: 81492924030.21.188049B Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf29.hostedemail.com (Postfix) with ESMTP id 9D1FA120019 for ; Fri, 24 Nov 2023 13:27:13 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=W8rcl+FE; spf=pass (imf29.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700832433; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=tnmd+tFa3nPIVK41hUx6D+f5FKMgV+UNk16uwK5iEbI=; b=EagMKq/6jLhe/qyDQHT6r/mDTa0a51ZbIBHkuZ/oR3JdoKBLuXinoRQbowvKjWJw8upms5 EcR3pA5a+6EfW0LxqvTRe1cYv8CA+QS1GvBY0pX6YSt3ZrLLpnjat46Tngl/gdoKY7NRoq /+JyoRjKWlu424OmfvpINNB1qNGMWXQ= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1700832433; a=rsa-sha256; cv=none; b=u6Z33rqe8oA0Rew1KJu4XAzjl0U69eyvsXxjrZEWwoleG5KPu/5By4WlJOvuZJwm1m3vvf E6/Z4Yxv9WvXYrAQpHOXvOU0r/UGOp2BsKHbwnubMd0yvyrBQa8M0PQYkg7fiWIIUDybc4 5vAozd44josfp1EhUMH7WUbR4SHx0Uk= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=W8rcl+FE; spf=pass (imf29.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1700832432; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tnmd+tFa3nPIVK41hUx6D+f5FKMgV+UNk16uwK5iEbI=; 
b=W8rcl+FEAblkz91paxNS587GZI3bA4FlrwNi5tf3HGti1990LJVaXQml2IsAVRnpPStdlU JUdXdiW4G76G/hc0kMxACrlKbdzh705lBQG2NyajZ3MXnIqthc2QD+Wy5JYScAgby4mwsK FrgDozJzPtD/GsmFLjltgBXylILvgGY= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-554-n03FLBEbPOa7jDqGNMEHgQ-1; Fri, 24 Nov 2023 08:27:11 -0500 X-MC-Unique: n03FLBEbPOa7jDqGNMEHgQ-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id EACB085A58C; Fri, 24 Nov 2023 13:27:10 +0000 (UTC) Received: from t14s.fritz.box (unknown [10.39.194.71]) by smtp.corp.redhat.com (Postfix) with ESMTP id 29FE22166B2A; Fri, 24 Nov 2023 13:27:07 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, David Hildenbrand , Andrew Morton , Linus Torvalds , Ryan Roberts , Matthew Wilcox , Hugh Dickins , Yin Fengwei , Yang Shi , Ying Huang , Zi Yan , Peter Zijlstra , Ingo Molnar , Will Deacon , Waiman Long , "Paul E. McKenney" Subject: [PATCH WIP v1 11/20] mm/rmap_id: support for 1, 2 and 3 values by manual calculation Date: Fri, 24 Nov 2023 14:26:16 +0100 Message-ID: <20231124132626.235350-12-david@redhat.com> In-Reply-To: <20231124132626.235350-1-david@redhat.com> References: <20231124132626.235350-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 X-Stat-Signature: mi61ipy1wp3ytgx8xjsmcw1z9466gt9k X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 9D1FA120019 X-Rspam-User: X-HE-Tag: 1700832433-930926 X-HE-Meta: U2FsdGVkX18E3JOxKfhG+dmB+9Drmg01MiqQrx5mYLGSVJtk880kJpi/M7Hhn9OjDw95/hGlQWQpav4sUeHaZFhtzvzYmFahgQo26/J0rQ9PhVqkkEL322RGP83zHbKVlrGZGH7NsbSsUenExPlscrnhnDQjqtn9rVHeHpvv+xIlMf7lMJiLfXcQ3sVoHNL72RfgAUYp6KtwITl+Mp/M2Ldga4Xx3/1qI4Fa5awRpzbcrlSSj8FH7vFkjcgPvIVREo88UiBbIOBABshR+eggX2qlHrBFIeW7KEfrohqhj24N9YYZWELN1yj8RY0lH3DzTgzORMW/qUV4uiL8KEMIntrLDV+cYh4/d22HLBWnvHCOuoE+sunO1FLSvdyCdcte7KhhN/d5W2acbUCNjUcj+W+/iw2VUf++rnF7s4bKhNk9MARuLdOUXC8XHZzp6UcQDEy0aGhs9XAjitkUwZCTt9VdBv0traPmbepWVmFIUy/5jti2yMJXOv0BcFrjz7LTsXwWXhp2b1XH/YbliQjirFS13Hs292ClesKcJ2nVYnPugSnKN9wSRLOedY3F35HPGu+cnmm0wkev2P4Y09eLp3xWFI+avn0cJZtgeQ6KOwQC2bUqGb6VQmzN1A7qBwNrvunvl+NYB8sWx9/9QdOOdu4CNjRMjb9vymJ8pBzsbSoJEQ4SWxczGBf0wGsmbAW0oPqavMGGa6o0ROqkyngXfjBIrHYDsnoNE4QhTGsOeeZraEPyfCIWwsmo1hQ9Y2lmizmfkao5On5iBtz7+nXkLWpAjfl08DOC9Ibv5qtSxIsT2gs86c9MMOqKjAfugBdewpSAZZUBNvSyy/s6yQKquyFEcQzsvLwUE7c2XMwb6LRiOyeWhnpV4T/b1tvlepAZadwzi6uQZXQGDAI4xZ2TKEYFB1v6kvtYL38t2NkDdpKAJ7AZ9kA4byOU0uNQ5LszL3Foo0U2SUltC/49jEA 44Y9Fd06 HLBCExVMohrbvQKWdahmEQkrbS7WcjFo7eUt7sKP2T2f04y2o2notLz8r7OGh85k6ZgkU2azDq5qmHrtrh336xAjMWQU1N7V1gBSTbqtk3AiFnVX+qCkgbcO97g2RQpoY2kukGY1yhzram6MUGWtnZS7UQf//R7uOWDrO9/j0lLtV3WqRBG0v0SNJUhDUdIKnQ8J9pK5Uh+8JU3kdVWHNjw3FWJan3bvFyLX6 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: For smaller folios, we can use less rmap values: * <= order-2: 1x 64bit value * <= order-5: 2x 64bit values * <= order-9: 3x 64bit values We end up with a lot of subids, so we cannot really use lookup tables. Pre-calculate the subids per MM. 
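To illustrate the construction (purely illustrative: a standalone userspace copy of the calc_rmap_subid() helper added below, not kernel code): for order-2 folios we allow at most 4 exclusive mappings, so each bit of the sub-ID index selects one power of 4 + 1 = 5, and the first sub-IDs come out as 0, 1, 5, 6, 25, 26, 30, 31, ... -- the same construction that produces the precalculated rmap_subids_4[] table earlier in the series, there with base 1025.

	#include <stdio.h>

	/* Illustration only: userspace copy of calc_rmap_subid() from this patch. */
	static unsigned long calc_rmap_subid(unsigned int n, unsigned int i)
	{
		unsigned long nr = 0, mult = 1;

		/* Each set bit of i contributes one power of (n + 1). */
		while (i) {
			if (i & 1)
				nr += mult;
			mult *= (n + 1);
			i >>= 1;
		}
		return nr;
	}

	int main(void)
	{
		unsigned int i;

		/* order-2: at most 4 exclusive mappings per folio, so n = 4. */
		for (i = 0; i < 8; i++)
			printf("sub-ID index %u -> %lu\n", i, calc_rmap_subid(4, i));
		/* prints 0, 1, 5, 6, 25, 26, 30, 31 */
		return 0;
	}
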
For order-9 we could think about having a lookup table with 128bit entries. Further, we could calcualte them only when really required. With 2 MiB THP this now implies only 3 instead of 4 values. Signed-off-by: David Hildenbrand --- include/linux/mm_types.h | 3 ++ include/linux/rmap.h | 58 ++++++++++++++++++++++++++++- kernel/fork.c | 6 +++ mm/rmap_id.c | 79 +++++++++++++++++++++++++++++++++++++--- 4 files changed, 139 insertions(+), 7 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 75305c57ef64..0ca5004e8f4a 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1032,6 +1032,9 @@ struct mm_struct { #ifdef CONFIG_RMAP_ID int mm_rmap_id; + unsigned long mm_rmap_subid_1; + unsigned long mm_rmap_subid_2[2]; + unsigned long mm_rmap_subid_3[3]; #endif /* CONFIG_RMAP_ID */ } __randomize_layout; diff --git a/include/linux/rmap.h b/include/linux/rmap.h index a73e146d82d1..39aeab457f4a 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -180,12 +180,54 @@ struct anon_vma *folio_get_anon_vma(struct folio *folio); void free_rmap_id(int id); int alloc_rmap_id(void); +#define RMAP_SUBID_1_MAX_ORDER 2 +#define RMAP_SUBID_2_MIN_ORDER 3 +#define RMAP_SUBID_2_MAX_ORDER 5 +#define RMAP_SUBID_3_MIN_ORDER 6 +#define RMAP_SUBID_3_MAX_ORDER 9 +#define RMAP_SUBID_4_MIN_ORDER 10 #define RMAP_SUBID_4_MAX_ORDER 10 #define RMAP_SUBID_5_MIN_ORDER 11 #define RMAP_SUBID_5_MAX_ORDER 12 #define RMAP_SUBID_6_MIN_ORDER 13 #define RMAP_SUBID_6_MAX_ORDER 15 +static inline unsigned long calc_rmap_subid(unsigned int n, unsigned int i) +{ + unsigned long nr = 0, mult = 1; + + while (i) { + if (i & 1) + nr += mult; + mult *= (n + 1); + i >>= 1; + } + return nr; +} + +static inline unsigned long calc_rmap_subid_1(int rmap_id) +{ + VM_WARN_ON_ONCE(rmap_id < RMAP_ID_MIN || rmap_id > RMAP_ID_MAX); + + return calc_rmap_subid(1u << RMAP_SUBID_1_MAX_ORDER, rmap_id); +} + +static inline unsigned long calc_rmap_subid_2(int rmap_id, int nr) +{ + VM_WARN_ON_ONCE(rmap_id < RMAP_ID_MIN || rmap_id > RMAP_ID_MAX || nr > 1); + + return calc_rmap_subid(1u << RMAP_SUBID_2_MAX_ORDER, + (rmap_id >> (nr * 12)) & 0xfff); +} + +static inline unsigned long calc_rmap_subid_3(int rmap_id, int nr) +{ + VM_WARN_ON_ONCE(rmap_id < RMAP_ID_MIN || rmap_id > RMAP_ID_MAX || nr > 2); + + return calc_rmap_subid(1u << RMAP_SUBID_3_MAX_ORDER, + (rmap_id >> (nr * 8)) & 0xff); +} + static inline void __folio_prep_large_rmap(struct folio *folio) { const unsigned int order = folio_order(folio); @@ -202,10 +244,16 @@ static inline void __folio_prep_large_rmap(struct folio *folio) atomic_long_set(&folio->_rmap_val4, 0); fallthrough; #endif - default: + case RMAP_SUBID_4_MIN_ORDER ... RMAP_SUBID_4_MAX_ORDER: atomic_long_set(&folio->_rmap_val3, 0); + fallthrough; + case RMAP_SUBID_3_MIN_ORDER ... RMAP_SUBID_3_MAX_ORDER: atomic_long_set(&folio->_rmap_val2, 0); + fallthrough; + case RMAP_SUBID_2_MIN_ORDER ... RMAP_SUBID_2_MAX_ORDER: atomic_long_set(&folio->_rmap_val1, 0); + fallthrough; + default: atomic_long_set(&folio->_rmap_val0, 0); break; } @@ -227,10 +275,16 @@ static inline void __folio_undo_large_rmap(struct folio *folio) VM_WARN_ON_ONCE(atomic_long_read(&folio->_rmap_val4)); fallthrough; #endif - default: + case RMAP_SUBID_4_MIN_ORDER ... RMAP_SUBID_4_MAX_ORDER: VM_WARN_ON_ONCE(atomic_long_read(&folio->_rmap_val3)); + fallthrough; + case RMAP_SUBID_3_MIN_ORDER ... RMAP_SUBID_3_MAX_ORDER: VM_WARN_ON_ONCE(atomic_long_read(&folio->_rmap_val2)); + fallthrough; + case RMAP_SUBID_2_MIN_ORDER ... 
RMAP_SUBID_2_MAX_ORDER: VM_WARN_ON_ONCE(atomic_long_read(&folio->_rmap_val1)); + fallthrough; + default: VM_WARN_ON_ONCE(atomic_long_read(&folio->_rmap_val0)); break; } diff --git a/kernel/fork.c b/kernel/fork.c index 773c93613ca2..1d2f6248c83e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -822,6 +822,12 @@ static inline int mm_alloc_rmap_id(struct mm_struct *mm) if (id < 0) return id; mm->mm_rmap_id = id; + mm->mm_rmap_subid_1 = calc_rmap_subid_1(id); + mm->mm_rmap_subid_2[0] = calc_rmap_subid_2(id, 0); + mm->mm_rmap_subid_2[1] = calc_rmap_subid_2(id, 1); + mm->mm_rmap_subid_3[0] = calc_rmap_subid_3(id, 0); + mm->mm_rmap_subid_3[1] = calc_rmap_subid_3(id, 1); + mm->mm_rmap_subid_3[2] = calc_rmap_subid_3(id, 2); return 0; } diff --git a/mm/rmap_id.c b/mm/rmap_id.c index 85a61c830f19..6c3187547741 100644 --- a/mm/rmap_id.c +++ b/mm/rmap_id.c @@ -87,6 +87,39 @@ static DEFINE_IDA(rmap_ida); * involved page tables are locked and stop any page table walkers. */ +/* + * With 4 (order-2) possible exclusive mappings per folio, we can have + * 16777216 = 16M sub-IDs per 64bit value. + */ +static unsigned long get_rmap_subid_1(struct mm_struct *mm) +{ + return mm->mm_rmap_subid_1; +} + +/* + * With 32 (order-5) possible exclusive mappings per folio, we can have + * 4096 sub-IDs per 64bit value. + * + * With 2 such 64bit values, we can support 4096^2 == 16M IDs. + */ +static unsigned long get_rmap_subid_2(struct mm_struct *mm, int nr) +{ + VM_WARN_ON_ONCE(nr > 1); + return mm->mm_rmap_subid_2[nr]; +} + +/* + * With 512 (order-9) possible exclusive mappings per folio, we can have + * 128 sub-IDs per 64bit value. + * + * With 3 such 64bit values, we can support 128^3 == 16M IDs. + */ +static unsigned long get_rmap_subid_3(struct mm_struct *mm, int nr) +{ + VM_WARN_ON_ONCE(nr > 2); + return mm->mm_rmap_subid_3[nr]; +} + /* * With 1024 (order-10) possible exclusive mappings per folio, we can have 64 * sub-IDs per 64bit value. @@ -279,12 +312,24 @@ void __folio_set_large_rmap_val(struct folio *folio, int count, atomic_long_set(&folio->_rmap_val4, get_rmap_subid_5(mm, 4) * count); break; #endif - default: + case RMAP_SUBID_4_MIN_ORDER ... RMAP_SUBID_4_MAX_ORDER: atomic_long_set(&folio->_rmap_val0, get_rmap_subid_4(mm, 0) * count); atomic_long_set(&folio->_rmap_val1, get_rmap_subid_4(mm, 1) * count); atomic_long_set(&folio->_rmap_val2, get_rmap_subid_4(mm, 2) * count); atomic_long_set(&folio->_rmap_val3, get_rmap_subid_4(mm, 3) * count); break; + case RMAP_SUBID_3_MIN_ORDER ... RMAP_SUBID_3_MAX_ORDER: + atomic_long_set(&folio->_rmap_val0, get_rmap_subid_3(mm, 0) * count); + atomic_long_set(&folio->_rmap_val1, get_rmap_subid_3(mm, 1) * count); + atomic_long_set(&folio->_rmap_val2, get_rmap_subid_3(mm, 2) * count); + break; + case RMAP_SUBID_2_MIN_ORDER ... RMAP_SUBID_2_MAX_ORDER: + atomic_long_set(&folio->_rmap_val0, get_rmap_subid_2(mm, 0) * count); + atomic_long_set(&folio->_rmap_val1, get_rmap_subid_2(mm, 1) * count); + break; + default: + atomic_long_set(&folio->_rmap_val0, get_rmap_subid_1(mm) * count); + break; } } @@ -313,12 +358,24 @@ void __folio_add_large_rmap_val(struct folio *folio, int count, atomic_long_add(get_rmap_subid_5(mm, 4) * count, &folio->_rmap_val4); break; #endif - default: + case RMAP_SUBID_4_MIN_ORDER ... 
RMAP_SUBID_4_MAX_ORDER: atomic_long_add(get_rmap_subid_4(mm, 0) * count, &folio->_rmap_val0); atomic_long_add(get_rmap_subid_4(mm, 1) * count, &folio->_rmap_val1); atomic_long_add(get_rmap_subid_4(mm, 2) * count, &folio->_rmap_val2); atomic_long_add(get_rmap_subid_4(mm, 3) * count, &folio->_rmap_val3); break; + case RMAP_SUBID_3_MIN_ORDER ... RMAP_SUBID_3_MAX_ORDER: + atomic_long_add(get_rmap_subid_3(mm, 0) * count, &folio->_rmap_val0); + atomic_long_add(get_rmap_subid_3(mm, 1) * count, &folio->_rmap_val1); + atomic_long_add(get_rmap_subid_3(mm, 2) * count, &folio->_rmap_val2); + break; + case RMAP_SUBID_2_MIN_ORDER ... RMAP_SUBID_2_MAX_ORDER: + atomic_long_add(get_rmap_subid_2(mm, 0) * count, &folio->_rmap_val0); + atomic_long_add(get_rmap_subid_2(mm, 1) * count, &folio->_rmap_val1); + break; + default: + atomic_long_add(get_rmap_subid_1(mm) * count, &folio->_rmap_val0); + break; } } @@ -330,7 +387,7 @@ bool __folio_has_large_matching_rmap_val(struct folio *folio, int count, switch (order) { #if MAX_ORDER >= RMAP_SUBID_6_MIN_ORDER - case RMAP_SUBID_6_MIN_ORDER .. RMAP_SUBID_6_MAX_ORDER: + case RMAP_SUBID_6_MIN_ORDER ... RMAP_SUBID_6_MAX_ORDER: diff |= atomic_long_read(&folio->_rmap_val0) ^ (get_rmap_subid_6(mm, 0) * count); diff |= atomic_long_read(&folio->_rmap_val1) ^ (get_rmap_subid_6(mm, 1) * count); diff |= atomic_long_read(&folio->_rmap_val2) ^ (get_rmap_subid_6(mm, 2) * count); @@ -340,7 +397,7 @@ bool __folio_has_large_matching_rmap_val(struct folio *folio, int count, break; #endif #if MAX_ORDER >= RMAP_SUBID_5_MIN_ORDER - case RMAP_SUBID_5_MIN_ORDER .. RMAP_SUBID_5_MAX_ORDER: + case RMAP_SUBID_5_MIN_ORDER ... RMAP_SUBID_5_MAX_ORDER: diff |= atomic_long_read(&folio->_rmap_val0) ^ (get_rmap_subid_5(mm, 0) * count); diff |= atomic_long_read(&folio->_rmap_val1) ^ (get_rmap_subid_5(mm, 1) * count); diff |= atomic_long_read(&folio->_rmap_val2) ^ (get_rmap_subid_5(mm, 2) * count); @@ -348,12 +405,24 @@ bool __folio_has_large_matching_rmap_val(struct folio *folio, int count, diff |= atomic_long_read(&folio->_rmap_val4) ^ (get_rmap_subid_5(mm, 4) * count); break; #endif - default: + case RMAP_SUBID_4_MIN_ORDER ... RMAP_SUBID_4_MAX_ORDER: diff |= atomic_long_read(&folio->_rmap_val0) ^ (get_rmap_subid_4(mm, 0) * count); diff |= atomic_long_read(&folio->_rmap_val1) ^ (get_rmap_subid_4(mm, 1) * count); diff |= atomic_long_read(&folio->_rmap_val2) ^ (get_rmap_subid_4(mm, 2) * count); diff |= atomic_long_read(&folio->_rmap_val3) ^ (get_rmap_subid_4(mm, 3) * count); break; + case RMAP_SUBID_3_MIN_ORDER ... RMAP_SUBID_3_MAX_ORDER: + diff |= atomic_long_read(&folio->_rmap_val0) ^ (get_rmap_subid_3(mm, 0) * count); + diff |= atomic_long_read(&folio->_rmap_val1) ^ (get_rmap_subid_3(mm, 1) * count); + diff |= atomic_long_read(&folio->_rmap_val2) ^ (get_rmap_subid_3(mm, 2) * count); + break; + case RMAP_SUBID_2_MIN_ORDER ... 
RMAP_SUBID_2_MAX_ORDER: + diff |= atomic_long_read(&folio->_rmap_val0) ^ (get_rmap_subid_2(mm, 0) * count); + diff |= atomic_long_read(&folio->_rmap_val1) ^ (get_rmap_subid_2(mm, 1) * count); + break; + default: + diff |= atomic_long_read(&folio->_rmap_val0) ^ (get_rmap_subid_1(mm) * count); + break; } return !diff; } From patchwork Fri Nov 24 13:26:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13467665 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D28ABC624B4 for ; Fri, 24 Nov 2023 13:27:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 659178D0080; Fri, 24 Nov 2023 08:27:24 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5E1F18D006E; Fri, 24 Nov 2023 08:27:24 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 436428D0080; Fri, 24 Nov 2023 08:27:24 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 2B8DA8D006E for ; Fri, 24 Nov 2023 08:27:24 -0500 (EST) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id EAE0712015D for ; Fri, 24 Nov 2023 13:27:23 +0000 (UTC) X-FDA: 81492924366.26.9E70E4C Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf15.hostedemail.com (Postfix) with ESMTP id 3681DA001B for ; Fri, 24 Nov 2023 13:27:21 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=PvrINXIi; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf15.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700832442; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=XtSW4MwxrPZp0IyU2r08B+zJ2gAs9ZHqKJrFZADgzdE=; b=o6M7rlOlrZpB8SrQKuXsThDWA0kbbaJNEY9obiWwPmJE/yFI9mPMPX2qZovvNoTNT4Dzuj IJiy4kRY7Jg2i49Yd3+kpJvnLLmU+r/YFTkgY6HtIFvhRa8Mpz1UGT2tQEl3CM66zPUqv6 +BvbFONVUNZUr3q1PlkINzYydeC13fM= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=PvrINXIi; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf15.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1700832442; a=rsa-sha256; cv=none; b=sesr/wiZrdUQX8Z4P/4lCs3IUXvbX71yDnkJOkGHPbtuc9YYUiaXmu2PtTpf+E97DZKQUw ioa/tNBH9rG3ZCORQ0qmOs37TXa7TldBFIWpo0d0mpULCpeHS8whs6Tmm1OoDPoCv0hHZM Zhf0eV/5dBJiYDh/urCeSZizQ6pK4Zg= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1700832441; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references; bh=XtSW4MwxrPZp0IyU2r08B+zJ2gAs9ZHqKJrFZADgzdE=; b=PvrINXIi+HkZ7i12u9n/LQMKb8TwExu7SJXhWFy/IQ2YsFwegLyOqW6/ewE6EpYd/A5da7 UuKPW8eUmHy1/qqpqsTos9jgYEaoMHI5Y/ex6bEjAZGeMnH/vs9wDGrAFNoKXsyFi6pvAa O8YZKOyFQVRwnWUh1f2pSor63+lEJZs= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-624-S2Vde6e2MxqrSRyf_jNx7w-1; Fri, 24 Nov 2023 08:27:15 -0500 X-MC-Unique: S2Vde6e2MxqrSRyf_jNx7w-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B241E185A784; Fri, 24 Nov 2023 13:27:14 +0000 (UTC) Received: from t14s.fritz.box (unknown [10.39.194.71]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3636A2166B2A; Fri, 24 Nov 2023 13:27:11 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, David Hildenbrand , Andrew Morton , Linus Torvalds , Ryan Roberts , Matthew Wilcox , Hugh Dickins , Yin Fengwei , Yang Shi , Ying Huang , Zi Yan , Peter Zijlstra , Ingo Molnar , Will Deacon , Waiman Long , "Paul E. McKenney" Subject: [PATCH WIP v1 12/20] mm/rmap: introduce folio_add_anon_rmap_range() Date: Fri, 24 Nov 2023 14:26:17 +0100 Message-ID: <20231124132626.235350-13-david@redhat.com> In-Reply-To: <20231124132626.235350-1-david@redhat.com> References: <20231124132626.235350-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 X-Rspamd-Queue-Id: 3681DA001B X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: pjhn14gsiooe67s6zd1xbkjz7bunik5t X-HE-Tag: 1700832441-938669 X-HE-Meta: U2FsdGVkX1/J1sIodycpTSU1usgUmJW7EKnN7PdX8DVR6IPoskwBg+wLEgF+NZFnCrEWBiQnmQ+SXXCv9qMwGVyPJR9B1JXG6y4MhPGiTr2KSz6g1cuYA9vZXZd0Nk5qUQu7y2aNeU8Tdbu5e0PFF93jy6dKJcSee+FfdttQ3WqEh0zIrEDtfCzZMt809nCPWM/5gwsrPkwJyzvrQ065WOEnXnxJrK40/9Uiijl0Fmebwtp0T5N52qneJOi5XkwkDIN57UbBrW/F+I139h+2hJNbVIaNtcPfKy7F0Anq6yvV3c+F08cexLpkpbr2JG65PEeI/RlAdD6VrMDlXK2nkX1VmgPqVwY7REfOjqs+E7/S64GNxGDBvfRovrAeMC7d1uv5VMDh8b4In4KduTUGmKwGgGCCImCP6ltZoAeeI32DwOmOKb5F7J6d/BWS72h4XmZJHITKGpCB5m/UyM+ARKK0LrthKdWfrsZnsAcQ0WLP+HnGVdO21qDDXnF3ThnhLanIA4JMJiKsFHOk5IV8iepOWGVGQcDK5owE1c4McE7ds48yO8mS5BdclvvkR4fe7MMD8JjtaI2n6TQLFP3n0gunxGFCIKPuK+nJ4g0uO7XxcOTtHKVw9s43ah6X8le5sHJWcTxiYDsQ1YHAAYraW4lFr+C+mbKDNeX34CnrngVm2jk+BxMHiZTiASRum3lbu/1lctUbLk/v6BawJKEQTiPiTNm++SF8b/982mlzGbq0k+eKAj6HmUsd9dTwOxF9KqbMQ7c0k50r5T+hoFNNqsdgP5uDgweh39dLUqLy/xPzN4I4jGDM1cUBEuWnV2ojITiYK/lfrrcsO+oP7T6JehGcMavuurlvUjk7z1lOumxCbZC5yF2HsuUf48XPmMQL2EzVxnCbDj1/dyfkLQNqvDnYIvHNamZCui7bJKaD+5EQ6OZ8o3GG2tns3srzWkRi7ZDMIiPCAKQW/GDfxsC R8xNqrF4 O8pNXV46XNy0Ikc9iq6n36i0YbiLA+blZ4EW+UAu/cirwSPhVNwojkTCjaufOCk23MSfs6Y3+g4Ffe8j0pPy7gXTFXQUO+CM0tRrjhVdwojdR6KDt1uRdffP3EovfYg8ssmUNhHZa4xXVH9zBgiMVYRfflXREUi5ylvZHrlwm8tYch8KE/w6h05o131BGD93hOLztfTRCDQjqWX1TxZyZpeGRT4TlqbG6EuqA X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: There are probably ways to have an even cleaner interface (e.g., pass the mapping granularity instead of "compound"). For now, let's handle it like folio_add_file_rmap_range(). 
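To sketch the contract (illustrative only; "addr" is a placeholder): the range [page, page + nr_pages) must lie within the folio, and the single-page, non-compound case behaves exactly like the existing per-page call:

	/* old */
	page_add_anon_rmap(page, vma, addr, flags);

	/* equivalent via the new interface */
	folio_add_anon_rmap_range(page_folio(page), page, 1, vma, addr, flags);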
Use separate loops for handling the "SetPageAnonExclusive()" case and performing debug checks. The latter should get optimized out automatically without CONFIG_DEBUG_VM. We'll use this function to batch rmap operations when PTE-remapping a PMD-mapped THP next. Signed-off-by: David Hildenbrand --- include/linux/rmap.h | 3 ++ mm/rmap.c | 69 +++++++++++++++++++++++++++++++++----------- 2 files changed, 55 insertions(+), 17 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 39aeab457f4a..76e6fb1dad5c 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -393,6 +393,9 @@ typedef int __bitwise rmap_t; * rmap interfaces called when adding or removing pte of page */ void folio_move_anon_rmap(struct folio *, struct vm_area_struct *); +void folio_add_anon_rmap_range(struct folio *, struct page *, + unsigned int nr_pages, struct vm_area_struct *, + unsigned long address, rmap_t flags); void page_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address, rmap_t flags); void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, diff --git a/mm/rmap.c b/mm/rmap.c index 689ad85cf87e..da7fa46a18fc 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1240,25 +1240,29 @@ static void __page_check_anon_rmap(struct folio *folio, struct page *page, } /** - * page_add_anon_rmap - add pte mapping to an anonymous page - * @page: the page to add the mapping to - * @vma: the vm area in which the mapping is added - * @address: the user virtual address mapped - * @flags: the rmap flags + * folio_add_anon_rmap_range - add mappings to a page range of an anon folio + * @folio: The folio to add the mapping to + * @page: The first page to add + * @nr_pages: The number of pages which will be mapped + * @vma: The vm area in which the mapping is added + * @address: The user virtual address of the first page to map + * @flags: The rmap flags + * + * The page range of folio is defined by [first_page, first_page + nr_pages) * * The caller needs to hold the pte lock, and the page must be locked in * the anon_vma case: to serialize mapping,index checking after setting, - * and to ensure that PageAnon is not being upgraded racily to PageKsm - * (but PageKsm is never downgraded to PageAnon). + * and to ensure that an anon folio is not being upgraded racily to a KSM folio + * (but KSM folios are never downgraded). */ -void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, +void folio_add_anon_rmap_range(struct folio *folio, struct page *page, + unsigned int nr_pages, struct vm_area_struct *vma, unsigned long address, rmap_t flags) { - struct folio *folio = page_folio(page); - unsigned int nr, nr_pmdmapped = 0; + unsigned int i, nr, nr_pmdmapped = 0; bool compound = flags & RMAP_COMPOUND; - nr = __folio_add_rmap_range(folio, page, 1, vma, compound, + nr = __folio_add_rmap_range(folio, page, nr_pages, vma, compound, &nr_pmdmapped); if (nr_pmdmapped) __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr_pmdmapped); @@ -1279,12 +1283,20 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, } else if (likely(!folio_test_ksm(folio))) { __page_check_anon_rmap(folio, page, vma, address); } - if (flags & RMAP_EXCLUSIVE) - SetPageAnonExclusive(page); - /* While PTE-mapping a THP we have a PMD and a PTE mapping. 
*/ - VM_WARN_ON_FOLIO((atomic_read(&page->_mapcount) > 0 || - (folio_test_large(folio) && folio_entire_mapcount(folio) > 1)) && - PageAnonExclusive(page), folio); + + if (flags & RMAP_EXCLUSIVE) { + for (i = 0; i < nr_pages; i++) + SetPageAnonExclusive(page + i); + } + for (i = 0; i < nr_pages; i++) { + struct page *cur_page = page + i; + + /* While PTE-mapping a THP we have a PMD and a PTE mapping. */ + VM_WARN_ON_FOLIO((atomic_read(&cur_page->_mapcount) > 0 || + (folio_test_large(folio) && + folio_entire_mapcount(folio) > 1)) && + PageAnonExclusive(cur_page), folio); + } /* * For large folio, only mlock it if it's fully mapped to VMA. It's @@ -1296,6 +1308,29 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, mlock_vma_folio(folio, vma); } +/** + * page_add_anon_rmap - add mappings to an anonymous page + * @page: The page to add the mapping to + * @vma: The vm area in which the mapping is added + * @address: The user virtual address of the page to map + * @flags: The rmap flags + * + * See folio_add_anon_rmap_range(). + */ +void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, + unsigned long address, rmap_t flags) +{ + struct folio *folio = page_folio(page); + unsigned int nr_pages; + + if (likely(!(flags & RMAP_COMPOUND))) + nr_pages = 1; + else + nr_pages = folio_nr_pages(folio); + + folio_add_anon_rmap_range(folio, page, nr_pages, vma, address, flags); +} + /** * folio_add_new_anon_rmap - Add mapping to a new anonymous folio. * @folio: The folio to add the mapping to. From patchwork Fri Nov 24 13:26:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13467666 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9A1DDC61DF4 for ; Fri, 24 Nov 2023 13:27:26 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 12A528D0081; Fri, 24 Nov 2023 08:27:25 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 08BAA8D006E; Fri, 24 Nov 2023 08:27:25 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E208C8D0081; Fri, 24 Nov 2023 08:27:24 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id C96118D006E for ; Fri, 24 Nov 2023 08:27:24 -0500 (EST) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id A48AEA0643 for ; Fri, 24 Nov 2023 13:27:24 +0000 (UTC) X-FDA: 81492924408.18.F561411 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf14.hostedemail.com (Postfix) with ESMTP id DF8DD10000A for ; Fri, 24 Nov 2023 13:27:22 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=d7EWUSPs; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf14.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1700832442; a=rsa-sha256; cv=none; b=7/dsQ57uRE3GHEimSDkFLRKk6kiQBmW9eMORyVnZolbCYJiJkdA2f4pQ4NAbo2KqbcwBjA 
Aqsp6KJs4LCPJMwf+FGd2Mutw53eY0kTXSepNFwJvlb0LfRi/s198vNR2qnx3RHX02WkAq +Mf0YjrRnw73Muz8Y74M0NfFQloKvcI= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=d7EWUSPs; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf14.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700832442; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=vCiq1mHEXhReKPGNK7ma5q/JWzLE5up94nYDqOWMaw0=; b=HGTaes5DckR9iEAfIah5w95TU8LtG5xOyNANDUbdGd/a5R6VJAj/69pMeFa/bYuUcAPIzg c4fWs3RH2QYwmtfzw806buovLrzpyZ6M8VfKM8602iojf4++Uzkygom0qL/DqmZ6DkSpmF ba/j2/5H0nJEUp/qAeRBaj0tmD8EdSc= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1700832442; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vCiq1mHEXhReKPGNK7ma5q/JWzLE5up94nYDqOWMaw0=; b=d7EWUSPsWGhkf2ycaVid5Yl+kZTACJr0EQKXwcLfO/F+mRmJ/r2YbDkXE3vMG7WHzIu0ro QTvngX7bjHq/kOmAfj1DRpqNgtUqrFwDSlrbMbpFzEwi9MVjE3nstj0KdlYyvmuBK4axmu lq0hBt7/HHqPjgsqV0CpMSIIQzw6YSc= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-654-EfryieQzPKuzYd766I-Umw-1; Fri, 24 Nov 2023 08:27:19 -0500 X-MC-Unique: EfryieQzPKuzYd766I-Umw-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 805CE1C05142; Fri, 24 Nov 2023 13:27:18 +0000 (UTC) Received: from t14s.fritz.box (unknown [10.39.194.71]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0717E2166B2A; Fri, 24 Nov 2023 13:27:14 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, David Hildenbrand , Andrew Morton , Linus Torvalds , Ryan Roberts , Matthew Wilcox , Hugh Dickins , Yin Fengwei , Yang Shi , Ying Huang , Zi Yan , Peter Zijlstra , Ingo Molnar , Will Deacon , Waiman Long , "Paul E. 
McKenney" Subject: [PATCH WIP v1 13/20] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked() Date: Fri, 24 Nov 2023 14:26:18 +0100 Message-ID: <20231124132626.235350-14-david@redhat.com> In-Reply-To: <20231124132626.235350-1-david@redhat.com> References: <20231124132626.235350-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: DF8DD10000A X-Stat-Signature: jpboqoibuh76959no3j7zou8s5otg1ru X-HE-Tag: 1700832442-188710 X-HE-Meta: U2FsdGVkX18g4yzXK37VdRFVzQSJWasA6xdS58AzyhT6+IlfmiVWZWfd18OlhgFek8UG/OPHYflmHpT76/b17a0hY28C+apOL1gytHRmrgXLMnBcAG3v2mfzkPPo8s3nghcjt01oQrx8u9r5EA5DBQdB8nE/T1kvTnDqWBV2FEIdvrz629cUP0yemiuDIuYVKQawIjr4v9UAu3a9XzM4dI9SIkabPVrec+B9jATuMFlYJgNdFAL2Rlbo7mK/ohfMeXefJWduFVXi1Jj/HT69pLPFt4zxpZEkQiHScFNiSspX29EZxGhyxoQgdHHWZSFuZrghBqIIaWiCvj/9p5p3T1K9FGamQBuUjJUFYht1IXTCk/hRLQq83nomSHEuN4I9lksw2Mxxsbzl4TvOmw5fFo8RYlNj2X0kWuMKbVWjMWsVw4tfyo0KMmpAHXE5ZuOT9B7fwiVOSFKn0DHNGR9uGyWPmX61GSH+FJ91T+p4GkNxUZIa8kl9R9dVbuaHjG/v5Wc3MQ4PrNGk2tmUNVzjfe74jaFRG807tZBf2N6h4OxhI1PnQvMJgYWJdLycGkOK7DDW11ipL+sXR7i/DhIVUQ1siTEpcyFVJM5JLgFzUjZ4rM3o+q9Wed7b6p+p/jYTFUvx3t+vFS5V9Exu47y7Y/MhvT0ZRuHr6H3vGpMHVPNV7OPOKsbJUA35fPiQwpXrt6hRVglgtEiKJ/QDX5W+g0rE9OECbOUYmh6UWrCzYl9URtaQtAVOAsiiNAyeWprU5bHsShv0mJ8mJ7f7p1L7vAuHmn5m5poXxmGO5ysNyPw6Gb4RAejjO3+RpCUvmdTh8XrPg2aLcUlUK1gtTlV5PmjS7nNL2P5ibUh0KNMOszWVAAs8WKgJUpak7c3Y21zFquc5MdvT8TS0vmn8a0mKnYq7UFE+Z7SpL4ysH9PP1uYIu/ziSyd0gVZzNInrV99n3KLOaZ0ShX77SjmXeCR ePU8dopm Ouq06Q58XTNjR1L+SVlVWzVzlI/8C3caTsWAnFoD/LylQWvgrl95CapfZtJR+OkqHsBq5S8OZv+WemCIWkN6LsT2HnL/JkiAg+35fuzG42LQV6MAijUfunn5NSlqgnZc0PmP+5dsottoucNPyPkPSOBeITw8mXxw5xu0IeO4P9PjkoCSHSORQhmXfAUmRASuD+YBoSH11jee+Lz9uizyR85zB+BiEgCox9K/1 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Let's batch the rmap operations, as a preparation to making individual page_add_anon_rmap() calls more expensive. While at it, use more folio operations (but only in the code branch we're touching), use VM_WARN_ON_FOLIO(), and pass RMAP_COMPOUND instead of manually setting PageAnonExclusive. We should never see non-anon pages on that branch: otherwise, the existing page_add_anon_rmap() call would have been flawed already. 
Signed-off-by: David Hildenbrand --- mm/huge_memory.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index fd7251923557..f47971d1afbf 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2100,6 +2100,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, unsigned long haddr, bool freeze) { struct mm_struct *mm = vma->vm_mm; + struct folio *folio; struct page *page; pgtable_t pgtable; pmd_t old_pmd, _pmd; @@ -2195,16 +2196,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, uffd_wp = pmd_swp_uffd_wp(old_pmd); } else { page = pmd_page(old_pmd); + folio = page_folio(page); if (pmd_dirty(old_pmd)) { dirty = true; - SetPageDirty(page); + folio_set_dirty(folio); } write = pmd_write(old_pmd); young = pmd_young(old_pmd); soft_dirty = pmd_soft_dirty(old_pmd); uffd_wp = pmd_uffd_wp(old_pmd); - VM_BUG_ON_PAGE(!page_count(page), page); + VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio); + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); /* * Without "freeze", we'll simply split the PMD, propagating the @@ -2221,11 +2224,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, * * See page_try_share_anon_rmap(): invalidate PMD first. */ - anon_exclusive = PageAnon(page) && PageAnonExclusive(page); + anon_exclusive = PageAnonExclusive(page); if (freeze && anon_exclusive && page_try_share_anon_rmap(page)) freeze = false; - if (!freeze) - page_ref_add(page, HPAGE_PMD_NR - 1); + if (!freeze) { + rmap_t rmap_flags = RMAP_NONE; + + folio_ref_add(folio, HPAGE_PMD_NR - 1); + if (anon_exclusive) + rmap_flags = RMAP_EXCLUSIVE; + folio_add_anon_rmap_range(folio, page, HPAGE_PMD_NR, + vma, haddr, rmap_flags); + } } /* @@ -2268,8 +2278,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot)); if (write) entry = pte_mkwrite(entry, vma); - if (anon_exclusive) - SetPageAnonExclusive(page + i); if (!young) entry = pte_mkold(entry); /* NOTE: this may set soft-dirty too on some archs */ @@ -2279,7 +2287,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, entry = pte_mksoft_dirty(entry); if (uffd_wp) entry = pte_mkuffd_wp(entry); - page_add_anon_rmap(page + i, vma, addr, RMAP_NONE); } VM_BUG_ON(!pte_none(ptep_get(pte))); set_pte_at(mm, addr, pte, entry); From patchwork Fri Nov 24 13:26:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13467667 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7F30C61D97 for ; Fri, 24 Nov 2023 13:27:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5481D8D0082; Fri, 24 Nov 2023 08:27:30 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 4D18C8D006E; Fri, 24 Nov 2023 08:27:30 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2FD388D0082; Fri, 24 Nov 2023 08:27:30 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 1874A8D006E for ; Fri, 24 Nov 2023 08:27:30 -0500 (EST) Received: from smtpin14.hostedemail.com (a10.router.float.18 
[10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id E2461A0379 for ; Fri, 24 Nov 2023 13:27:29 +0000 (UTC) X-FDA: 81492924618.14.5070BEF Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf30.hostedemail.com (Postfix) with ESMTP id 29B0F8001E for ; Fri, 24 Nov 2023 13:27:27 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=TYfau4I+; spf=pass (imf30.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700832448; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=8wkdTzr4ovr6JqvWUuBFMPrmTwPV2fAdlmREPfnLqjk=; b=g01SjQ1ZgOi+s/JAk3KfTSmBlUw+sUJ+CZqAuG2kwx/IYIl1VdkJxOcZolLJnNfqx7zaMZ PNuqjSi8FSyThfpYOEfI9FinDKGNoF/WIIgub6w+Ftftpvfh+SPVauJoEb+DCwwARNCGi7 pJIAqXpPz+zT8jKbe8wHWgkct2sn4Ww= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1700832448; a=rsa-sha256; cv=none; b=4vUTmp2XjgDdXGNcI7PLuowBLW9W0VHa12LIqp5L4nCT0NZ+NaWT1ZLYKAOHV7XcoNTSAT LNEx43EPBDnXJvAIbgsa+Uu7+TIN2hW1noJpg7dOT5rkmgOuEaLEf48UpnqabEWZzPPyj1 am3+MYPbHgrjhfoPflROayM9p7cceP4= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=TYfau4I+; spf=pass (imf30.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1700832447; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8wkdTzr4ovr6JqvWUuBFMPrmTwPV2fAdlmREPfnLqjk=; b=TYfau4I+oMaNyqwKaqIB55GH8yEKtlFTQ4NL9nDH7yNom0puCSrQ6hoIl05J4g3ooDXu5J PAL7k+qDMVfRG+0z7BnltbPpsSr/y+ni2OzXde5wrpBVW20OvLy+ycsL9TdhgnBK5NtxZ0 FNe4+zlhtLvspvY5SO2EqRpuM6Sk2pY= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-631-Mzyg-U9HMR2q8VnSaYE9mw-1; Fri, 24 Nov 2023 08:27:23 -0500 X-MC-Unique: Mzyg-U9HMR2q8VnSaYE9mw-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7F79C85A58C; Fri, 24 Nov 2023 13:27:22 +0000 (UTC) Received: from t14s.fritz.box (unknown [10.39.194.71]) by smtp.corp.redhat.com (Postfix) with ESMTP id E251F2166B2B; Fri, 24 Nov 2023 13:27:18 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, David Hildenbrand , Andrew Morton , Linus Torvalds , Ryan Roberts , Matthew Wilcox , Hugh Dickins , Yin Fengwei , Yang Shi , Ying Huang , Zi Yan , Peter Zijlstra , Ingo Molnar , Will Deacon , Waiman Long , "Paul E. 
McKenney" Subject: [PATCH WIP v1 14/20] mm/huge_memory: avoid folio_refcount() < folio_mapcount() in __split_huge_pmd_locked() Date: Fri, 24 Nov 2023 14:26:19 +0100 Message-ID: <20231124132626.235350-15-david@redhat.com> In-Reply-To: <20231124132626.235350-1-david@redhat.com> References: <20231124132626.235350-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 X-Rspamd-Queue-Id: 29B0F8001E X-Rspam-User: X-Stat-Signature: uzfdowpy8qyk5h8w5tipogig8bf4tt9t X-Rspamd-Server: rspam03 X-HE-Tag: 1700832447-434140 X-HE-Meta: U2FsdGVkX19s2qY7T91buCTlr84J6cIRWnmCeNwTxLm9HMonqLuuflou+5oou1GGCqX0qVeZ7YaM9pjQwzz6a+6zP62ugIUr1TE1iuVipSW98IZ3iPUPkgZBbOk/KFAsy8+ZKi8bt5hO+VpoxofR73US7r8lFFICzwjb/NmlNTXR+Wciw6DM0lhuOkHeorSykVfRVzGMclP5EcdPNuNWo4qPxPwh39RDyt0qtJFIWm0BtXSLxjLA78wsvySjuLFi0ezxUcf48oCzvBCem9s0idOKZGd0sowSOv45phFFiS0ZW8Y57TJqv5NaxIk84RKGk9Mqg0DfUwuRgv+Zq2znAqyKKJH1YK0Q4gBm/vEGH8sWojqP4TgH76ArbOTsE3WVhi9O5VZ1rLSEAqNO8I/iZwboCa3tjLOT+696sUKLVPaJXnDHGYp6kJFPib2jAlJDigNemp1TtGIf3N1tZewwqo+f3aCBDOi8HvnGXYB0vqW0RUU7cu1EKAfi/EDnvocW7jmYrY5qNN13jUN81ehajs6xUUoU0tjDqf/UN7xF+q/UNjKx+g4yQnm8dxe7tj+zdkwPd58fhcWh51gMQh56sFxqv/EYXwFlIQ23H5Ni+WAQ24GepCSdNUuiWz0LwZVs/TbGTFc0S6b6XYp9A/fpPfN8/Tbqmb6H2T2zjVA37IXFBJ0Kxbf941qoRb48mpxR9daV6C9C4kKd4xhVJhUGZtltb0KdOmcQ0YbjfNLMzG1WEuQPJOalYApI8h0Cr5rgqUOOgvhIeTF2QqeKUH+Obyg7LP5Rr31zviobgtRnUileP8iZUsx1P28hKv7nO4JU7TPw57z8dEDtMMKFB4le5KZosynuOhMK3Dh7GbFbdB+YUHIz4h4iGCNHArcfqCXEv9bQiWrhO7oHXXIBgtf96P9z0ze+sQXuGE21h4Wpjb0/EQTBxLNJ7dFT9y00ooTVjPNXlWep1Fuwt7/qBK0 TJ5VY6PS sM67IgzC7m/BTjOnGyreEykUMMRY+ao8Tv8GMSWRKv3Us++/5qPIAGukzFZaji94Q9RnAd7mf2X7b/CwAb+k4vqnqugrtyWHFag/9yPlHz58yW/il837K3TeJTJumgQswdtd9BTARtVqnMWYOYa0CgU4qLnr+h+ZBHNXHPkPZRdHmk/CJdRkXJxz5+LKFawSRA7jvXPWlWe+ZNTB1cZm0qNO1wLO8Q3Agma58hLkeoaNQ4bkSCwDuR26+lr/Iys6M6dMedqgMm8CsTgM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Currently, there is a short period in time where the refcount is smaller than the mapcount. Let's just make sure we obey the rules of refcount vs. mapcount: increment the refcount before incrementing the mapcount and decrement the refcount after decrementing the mapcount. While this could make code like can_split_folio() fail to detect other folio references, such code is (currently) racy already and this change shouldn't actually be considered a real fix but rather an improvement/ cleanup. The refcount vs. mapcount changes are now well balanced in the code, with the cost of one additional refcount change, which really shouldn't matter here that much -- we're usually touching >= 512 subpage mapcounts and much more after all. Found while playing with some sanity checks to detect such cases, which we might add at some later point. 
Signed-off-by: David Hildenbrand --- mm/huge_memory.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index f47971d1afbf..9639b4edc8a5 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2230,7 +2230,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, if (!freeze) { rmap_t rmap_flags = RMAP_NONE; - folio_ref_add(folio, HPAGE_PMD_NR - 1); + folio_ref_add(folio, HPAGE_PMD_NR); if (anon_exclusive) rmap_flags = RMAP_EXCLUSIVE; folio_add_anon_rmap_range(folio, page, HPAGE_PMD_NR, @@ -2294,10 +2294,10 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, } pte_unmap(pte - 1); - if (!pmd_migration) + if (!pmd_migration) { page_remove_rmap(page, vma, true); - if (freeze) put_page(page); + } smp_wmb(); /* make pte visible before pmd */ pmd_populate(mm, pmd, pgtable); From patchwork Fri Nov 24 13:26:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13467669 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78059C61D97 for ; Fri, 24 Nov 2023 13:27:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 104178D0085; Fri, 24 Nov 2023 08:27:40 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 08CBC8D0084; Fri, 24 Nov 2023 08:27:40 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E54778D0085; Fri, 24 Nov 2023 08:27:39 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id C952D8D006E for ; Fri, 24 Nov 2023 08:27:39 -0500 (EST) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 8FC991CBC09 for ; Fri, 24 Nov 2023 13:27:39 +0000 (UTC) X-FDA: 81492925038.16.8BA4D2E Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf03.hostedemail.com (Postfix) with ESMTP id 8FBBE20016 for ; Fri, 24 Nov 2023 13:27:37 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=dQWeGBfK; spf=pass (imf03.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1700832457; a=rsa-sha256; cv=none; b=LrNyyFL8YDIzImqamUqeyRG+S2z5qEqpRg/ws9Un+nxi3yHN45KCiRJKMkLhXitb/kmOmk 1seSykXfFsHr+yVk3i3/O/i/bFu8+7U23szwOTOa5XvkIkcc//CzT9crfPq6OkGUT/l+V7 QrTCvuKXTSu0124LJ6Hr3OdPYoUN8RE= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=dQWeGBfK; spf=pass (imf03.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700832457; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references:dkim-signature; bh=qX2dA3aqMotkmkgBnT9s/2Q/bW+jQxPjHk8bIRl2TXE=; b=c4gSpZzkww3Pr5egrIGKYadrHI+24GGdBSwMPpXLZdWlpazG7/IDzhsIQKoTpEHD2BV3QC 9QJGiRrJRFWP3IF5sdAVXNgFJcm/ggFU/7ieIFm8o4hIOzD3gGSQbcOoU/Ps+dmktZSHoQ Y5Em6R5MZPHXAMRYS4a4QQrRLWz7Fc8= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1700832456; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qX2dA3aqMotkmkgBnT9s/2Q/bW+jQxPjHk8bIRl2TXE=; b=dQWeGBfKXMPea9Kj/9qHdaRFhs6cbWq9G+QHWZkqxUminddsyiLqg5ILBCnQ+Wo/rkcETm zXzv++mcKEnuIMWkojOJbWVHLodNAEnCEXh1dke6pZ4PN0lLN3o9uK0a0GvPYKHe2/G6SM 4Lpjtx7snDBxf0B4qa2on9d+IaitCvM= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-198-yyv_70-8PeWkL3Q6JW939Q-1; Fri, 24 Nov 2023 08:27:26 -0500 X-MC-Unique: yyv_70-8PeWkL3Q6JW939Q-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id CD8A52806053; Fri, 24 Nov 2023 13:27:25 +0000 (UTC) Received: from t14s.fritz.box (unknown [10.39.194.71]) by smtp.corp.redhat.com (Postfix) with ESMTP id DE44C2166B2B; Fri, 24 Nov 2023 13:27:22 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, David Hildenbrand , Andrew Morton , Linus Torvalds , Ryan Roberts , Matthew Wilcox , Hugh Dickins , Yin Fengwei , Yang Shi , Ying Huang , Zi Yan , Peter Zijlstra , Ingo Molnar , Will Deacon , Waiman Long , "Paul E. 
McKenney" Subject: [PATCH WIP v1 15/20] mm/rmap_id: verify precalculated subids with CONFIG_DEBUG_VM Date: Fri, 24 Nov 2023 14:26:20 +0100 Message-ID: <20231124132626.235350-16-david@redhat.com> In-Reply-To: <20231124132626.235350-1-david@redhat.com> References: <20231124132626.235350-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 8FBBE20016 X-Stat-Signature: txwjf15a31mkgeesxzie6kja6y8anmkh X-Rspam-User: X-HE-Tag: 1700832457-917296 X-HE-Meta: U2FsdGVkX181jlwCNJ9Nix1p/7CxduG5JeF7eAmcQlT0UxCk32mVgdLyd5i0igdCouct8PvVO41wz8VOWjyDidqZcYyKOTGppgDiwyhR+f+Ym2fDpNRb1Cc3LGXKcinLe/7SqRZA5sqGBUa/Gg9TwqekSPewojktTeXbIaMtsASHCa5Zyb6BoMba3Q4gQYWRbF8kbXbdjuIe7KoFqCPlJOdsQkPHIjjpuazluGQcUzoZjTZ940DX85U4Q639/AYlry0hjDGykblLj9ASodGqrednFK5+BWuThvsOqNBdgMyhMHDqOs7KiDKwRBqsD5wNHqucDFfBShWHFa+AparS59ctWkSUR4qJWEMJn5ODxmPA8cI+wTuyZmSt07XUeDNqpmy8w3VPkzvS8Dhbe9FxO/YCeiXZudM8yWOwQKe6hWF/GgqKEQyhsuKCBlSX9RaHiTp5ZIrKGxBiBaqKeUMIVHso93cF10yTT6nzUbM7Fci7uLKALVXsXw8SbHhnb9Ly3rTNdKQDgIIwHuHDQYVS1NsOEnDUxWmlCVC0chCsQMy8lyE2z0Eq6vFN/HkwNuc4+EwnBAkSAgBM0jFGKApoQ5XlYqDlzHCN3k4OawH2Y3F/K4xrgOC5a7ExfwaDnKJ6M/s6zwKyWhg8n/mNKJf0h2whEn5wX/DtwXTfECUFOP8Hpf1K1UCm5G2HWmHI9R3X8fuscDdbleKMPfvoZ+cvzGO4GgBtNYIK2ObRXMH/D81L5XVf1rUP6T9Jd1tAKzvRWxXEvEyROePCICRxV9rFxQoEjunqV5oJmNhdKxU6R1xYfBG5f5fG9l6d1foPtBW5hbiUwfEbhZ6AYXUgav2wbq2a/SQwGdYecs2E9xvwwfukuwfsSE1Octgn2nK8lmnlH5S8SVOOYV1pf0+aDs3AF6nZ9NG8u9yLP/ZtMrxxDawtwf4FekICnfHHenNhaiv+UjxXIJ+T7lPW0VY1Jdj 2ylUcaCR cuNW4U45Z1gWWmIjoOEhY8JZZNpy8/qFYDl/l9DH58Xpxzv78RcyTwN7QS1uUWknbyVRcZNduXoVVP/i/1EP9591pE9aOo1Jhb2iWCWjWTyEZCzuIoZkHkalhE8A4y1Yiz5oQ17GyA2XuOamNje/pdCe2w7xxbyrbjIE3CFiHQjdFpGV22XFbqC3gAo67kgNVT1DW3TpXpucxWi5Tyf2usNaHAw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Let's verify the precalculated subids for 4/5/6 values. 
Signed-off-by: David Hildenbrand --- mm/rmap_id.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/mm/rmap_id.c b/mm/rmap_id.c index 6c3187547741..421d8d2b646c 100644 --- a/mm/rmap_id.c +++ b/mm/rmap_id.c @@ -481,3 +481,29 @@ void free_rmap_id(int id) ida_free(&rmap_ida, id); spin_unlock(&rmap_id_lock); } + +#ifdef CONFIG_DEBUG_VM +static int __init rmap_id_init(void) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(rmap_subids_4); i++) + WARN_ON_ONCE(calc_rmap_subid(1u << RMAP_SUBID_4_MAX_ORDER, i) != + rmap_subids_4[i]); + +#if MAX_ORDER >= RMAP_SUBID_5_MIN_ORDER + for (i = 0; i < ARRAY_SIZE(rmap_subids_5); i++) + WARN_ON_ONCE(calc_rmap_subid(1u << RMAP_SUBID_5_MAX_ORDER, i) != + rmap_subids_5[i]); +#endif + +#if MAX_ORDER >= RMAP_SUBID_6_MIN_ORDER + for (i = 0; i < ARRAY_SIZE(rmap_subids_6); i++) + WARN_ON_ONCE(calc_rmap_subid(1u << RMAP_SUBID_6_MAX_ORDER, i) != + rmap_subids_6[i]); +#endif + + return 0; +} +module_init(rmap_id_init) +#endif /* CONFIG_DEBUG_VM */ From patchwork Fri Nov 24 13:26:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13467668 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 72F09C624B4 for ; Fri, 24 Nov 2023 13:27:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 08D528D0083; Fri, 24 Nov 2023 08:27:37 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 015C88D006E; Fri, 24 Nov 2023 08:27:36 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DD2078D0083; Fri, 24 Nov 2023 08:27:36 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id C87BC8D006E for ; Fri, 24 Nov 2023 08:27:36 -0500 (EST) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id A9EC9A062A for ; Fri, 24 Nov 2023 13:27:36 +0000 (UTC) X-FDA: 81492924912.21.CB2EEA5 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf07.hostedemail.com (Postfix) with ESMTP id 0660D4001E for ; Fri, 24 Nov 2023 13:27:34 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=evv8hcdE; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf07.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700832455; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=IWxh+srF4eII3yzN2Ch68mHrKm0yO3+Xlp6snuZfYi4=; b=c2sf20GvgrQE5L2MpuGE6sW3MHTiZ20v5NmKmI10SXeLvWcGHkW2Cm7ObY92UevNNAXy+Y 1hE0l438RgRPaIjxnFbi4d7NI3G9q2HQBtHe+p/HEa6QMd5BzHa5cq+AXbWfX1ghHQmTkN tfNpuFrFBzcBLs9bnCmjU26RvAzQOpU= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=evv8hcdE; dmarc=pass (policy=none) header.from=redhat.com; 
spf=pass (imf07.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1700832455; a=rsa-sha256; cv=none; b=1mZZDHQDOQBQXnJCRqIEe2eojat2peZQGclFM2LMZp704VBsw/lU90/HxceNBGOLvye5S9 bSj4JiOZGlfGTTXxE0cmorJKnyy8+z2xFegdNXPRschlLFTNpBGip4UxnBzeeyWrcsWzef RbGr6Ih/Cd2ndDZLqMcgrSb10Tzm5Wg= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1700832454; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IWxh+srF4eII3yzN2Ch68mHrKm0yO3+Xlp6snuZfYi4=; b=evv8hcdEYQIPUHMbWsl2JwW7m3xwWCTjaiLvCF8VyeLh7IMXw/waukZSIJioNWnHcA7s2d Rryh+p1yySbYSr2ZuSCKvWa3DD6BzLfOSQEpfV3e62v5gdtKHTqis64kr5tbe8eaqEpO3a Yvwacs+sdWlfWA+G5B8sFDBDRhHiz/c= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-344-yC66DjjfM-2TnstsUc80dA-1; Fri, 24 Nov 2023 08:27:30 -0500 X-MC-Unique: yC66DjjfM-2TnstsUc80dA-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5C903185A780; Fri, 24 Nov 2023 13:27:29 +0000 (UTC) Received: from t14s.fritz.box (unknown [10.39.194.71]) by smtp.corp.redhat.com (Postfix) with ESMTP id 150282166B2A; Fri, 24 Nov 2023 13:27:25 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, David Hildenbrand , Andrew Morton , Linus Torvalds , Ryan Roberts , Matthew Wilcox , Hugh Dickins , Yin Fengwei , Yang Shi , Ying Huang , Zi Yan , Peter Zijlstra , Ingo Molnar , Will Deacon , Waiman Long , "Paul E. 
McKenney" Subject: [PATCH WIP v1 16/20] atomic_seqcount: support a single exclusive writer in the absence of other writers Date: Fri, 24 Nov 2023 14:26:21 +0100 Message-ID: <20231124132626.235350-17-david@redhat.com> In-Reply-To: <20231124132626.235350-1-david@redhat.com> References: <20231124132626.235350-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 X-Rspamd-Queue-Id: 0660D4001E X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: dsdqrby4hz4pzzyre6cyurcqx6azcrgy X-HE-Tag: 1700832454-705984 X-HE-Meta: U2FsdGVkX1/H7o15MOZminpO40WzIjdSWiHoP1ymK+ytDGpx5IU50F16raaRbEVvWuQhLGGq/Erh5cL3Jw0uFCHDCDx4yHO5b4yZv3nY4vNl+IYVvxSw4PFpdbU3fDs3OjFVOyK/NGZxyvzQbg7O6TLNhuKt4VcOvMEurVEQbrYwvC0PuEXLgw0bERKvYX9w5SBOq4TuGB9tUnhv1bWttLXx/nl0Lgv88LS1hfE/oouLofIzNSL5jXYsLSCwTUZSjW4FzwUi1PUOeVXRHwOb2sG0E1TsFIiGWp/AGGqVoRGkVRB1Zv5Qx4/5t8GgToI0RbpmyIb7GmW6fibyATVaz9xCaXvQmEC+IPwZIN/jqkAJ8WyvFoj5GnMdrAWifUyhkrM1OM5pS54NFG9a84TOnQgQsQt+QibpW1nUGOfjGF3oQ04cxww1Xz/dwz/rpwi3trkngZD4DK8NGS4CNeZiBlerP/HfPFIo9ipV5ReSbi0SYTe7WI2wu73cglF55v5VOjiVgrOvRLLIQkEO7LEaNrM/JsjF8YDIevvinnReCRBqslJUAPEZ22tGRnXGEW7hD8uwUjrGkulO0wep94+44ZzOjuvpH9AdAj0TNLOdunCzme1nDjIatyxSB/92vAqpCacpXkjcWGXxQDpR57Aql10FUJbua8WsY4Enc8Afin8PMm0IMervCLIJw/OawzKiNlmHAN9K76QU1yzA6EcZr5RO9GYI9HT82rsxJx83aZ3YKSfQymaEvRuoe5vIxMGF/aYGTUcSiNLav8YkBlpT89HxdQDGORaybkHTAjFBzd/5vusCdNT1mLkkZ2PMlKH1Vh+5+liI07d+k8qc85qyfWen4GAasPkwptTj0c6PHtPY3AFnEw4gBDXW0hbR980Az/+mMx3wL0vc5R4m81CBR8RCF8eFslSewjbqW5RyY9wiXhwoyIkuafheSO+AdtjlLHz9ChoM0qSOZseKIAX TEggiX7z 5zNSPmG1T3WjHNgcqPED1YoyHdWFbpeqCdez2mNkCdjgpUzJSv6ae4k/QcCXHBRfYXkTEq5KFrHqlFyIaLPk54kSE9dBgo/EWPlwTmSc3u+FjwrhHjBmnWrKfeYoqsAitSTtoxu+a5etHip9rbfyCJFEbhkIYqNDi0VO956t+Tc0yjvT/wgEvJfSWMSF/t8VLTbKn2n8LePc8bx8+VSyVAtCa77QgyNBRaJR+/1Zmj3TJgJRHoJqGja03JHsWa7jlBJzS1po6hp4negmx/1OElsUvMLTa/wbdOVrV X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: The current atomic seqcount requires that all writers must use atomic RMW operations in the critical section, which can result in quite some overhead on some platforms. In the common case, there is only a single writer, and ideally we'd be able to not use atomic RMW operations in that case, to reduce the overall number of atomic RMW operations on the fast path. So let's add support for a single exclusive writer. If there are no other writers, a writer can become the single exclusive writer by using an atomic cmpxchg on the atomic seqcount. However, if there is any concurrent writer (shared or exclusive), the writers become shared and only have to wait for a single exclusive writer to finish. So shared writers might be delayed a bit by the single exclusive writer, but they don't starve as they are guaranteed to make progress after the exclusive writer finished (that ideally runs faster than any shared writer due to no atomic RMW operations in the critical section). The exclusive path now effectively acts as a lock: if the trylock fails, we fallback to the shared path. We need acquire-release semantics that are implied by the full memory barriers that we are enforcing. Instead of the atomic_long_add_return(), we could keep using an atomic_long_add() + atomic_long_read(). But I suspect that doesn't really matter. If it ever matters, if will be easy to optimize. 
Signed-off-by: David Hildenbrand --- include/linux/atomic_seqcount.h | 101 ++++++++++++++++++++++++++------ include/linux/rmap.h | 5 +- 2 files changed, 85 insertions(+), 21 deletions(-) diff --git a/include/linux/atomic_seqcount.h b/include/linux/atomic_seqcount.h index 109447b663a1..00286a9da221 100644 --- a/include/linux/atomic_seqcount.h +++ b/include/linux/atomic_seqcount.h @@ -8,8 +8,11 @@ /* * raw_atomic_seqcount_t -- a reader-writer consistency mechanism with - * lockless readers (read-only retry loops), and lockless writers. - * The writers must use atomic RMW operations in the critical section. + * lockless readers (read-only retry loops), and (almost) lockless writers. + * Shared writers must use atomic RMW operations in the critical section, + * a single exclusive writer can avoid atomic RMW operations in the critical + * section. Shared writers will always have to wait for at most one exclusive + * writer to finish in order to make progress. * * This locking mechanism is applicable when all individual operations * performed by writers can be expressed using atomic RMW operations @@ -38,9 +41,10 @@ typedef struct raw_atomic_seqcount { /* 65536 CPUs */ #define ATOMIC_SEQCOUNT_SHARED_WRITERS_MAX 0x0000000000008000ul #define ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK 0x000000000000fffful -#define ATOMIC_SEQCOUNT_WRITERS_MASK 0x000000000000fffful +#define ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER 0x0000000000010000ul +#define ATOMIC_SEQCOUNT_WRITERS_MASK 0x000000000001fffful /* We have 48bit for the actual sequence. */ -#define ATOMIC_SEQCOUNT_SEQUENCE_STEP 0x0000000000010000ul +#define ATOMIC_SEQCOUNT_SEQUENCE_STEP 0x0000000000020000ul #else /* CONFIG_64BIT */ @@ -48,9 +52,10 @@ typedef struct raw_atomic_seqcount { /* 64 CPUs */ #define ATOMIC_SEQCOUNT_SHARED_WRITERS_MAX 0x00000040ul #define ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK 0x0000007ful -#define ATOMIC_SEQCOUNT_WRITERS_MASK 0x0000007ful -/* We have 25bit for the actual sequence. */ -#define ATOMIC_SEQCOUNT_SEQUENCE_STEP 0x00000080ul +#define ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER 0x00000080ul +#define ATOMIC_SEQCOUNT_WRITERS_MASK 0x000000fful +/* We have 24bit for the actual sequence. */ +#define ATOMIC_SEQCOUNT_SEQUENCE_STEP 0x00000100ul #endif /* CONFIG_64BIT */ @@ -126,44 +131,102 @@ static inline bool raw_read_atomic_seqcount_retry(raw_atomic_seqcount_t *s, /** * raw_write_seqcount_begin() - start a raw_seqcount_t write critical section * @s: Pointer to the raw_atomic_seqcount_t + * @try_exclusive: Whether to try becoming the exclusive writer. * * raw_write_seqcount_begin() opens the write critical section of the * given raw_seqcount_t. This function must not be used in interrupt context. + * + * Return: "true" when we are the exclusive writer and can avoid atomic RMW + * operations in the critical section. Otherwise, we are a shared + * writer and have to use atomic RMW operations in the critical + * section. Will always return "false" if @try_exclusive is not "true". */ -static inline void raw_write_atomic_seqcount_begin(raw_atomic_seqcount_t *s) +static inline bool raw_write_atomic_seqcount_begin(raw_atomic_seqcount_t *s, + bool try_exclusive) { + unsigned long seqcount, seqcount_new; + BUILD_BUG_ON(IS_ENABLED(CONFIG_PREEMPT_RT)); #ifdef CONFIG_DEBUG_ATOMIC_SEQCOUNT DEBUG_LOCKS_WARN_ON(in_interrupt()); #endif /* CONFIG_DEBUG_ATOMIC_SEQCOUNT */ preempt_disable(); - atomic_long_add(ATOMIC_SEQCOUNT_SHARED_WRITER, &s->sequence); - /* Store the sequence before any store in the critical section. 
*/ - smp_mb__after_atomic(); + + /* If requested, can we just become the exclusive writer? */ + if (!try_exclusive) + goto shared; + + seqcount = atomic_long_read(&s->sequence); + if (unlikely(seqcount & ATOMIC_SEQCOUNT_WRITERS_MASK)) + goto shared; + + seqcount_new = seqcount | ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER; + /* + * Store the sequence before any store in the critical section. Further, + * this implies an acquire so loads within the critical section are + * not reordered to be outside the critical section. + */ + if (atomic_long_try_cmpxchg(&s->sequence, &seqcount, seqcount_new)) + return true; +shared: + /* + * Indicate that there is a shared writer, and spin until the exclusive + * writer is done. This avoids writer starvation, because we'll always + * have to wait for at most one writer. + * + * We spin with preemption disabled to not reschedule to a reader that + * cannot make any progress either way. + * + * Store the sequence before any store in the critical section. + */ + seqcount = atomic_long_add_return(ATOMIC_SEQCOUNT_SHARED_WRITER, + &s->sequence); #ifdef CONFIG_DEBUG_ATOMIC_SEQCOUNT - DEBUG_LOCKS_WARN_ON((atomic_long_read(&s->sequence) & - ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK) > + DEBUG_LOCKS_WARN_ON((seqcount & ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK) > ATOMIC_SEQCOUNT_SHARED_WRITERS_MAX); #endif /* CONFIG_DEBUG_ATOMIC_SEQCOUNT */ + if (likely(!(seqcount & ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER))) + return false; + + while (atomic_long_read(&s->sequence) & ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER) + cpu_relax(); + return false; } /** * raw_write_seqcount_end() - end a raw_seqcount_t write critical section * @s: Pointer to the raw_atomic_seqcount_t + * @exclusive: Return value of raw_write_atomic_seqcount_begin(). * * raw_write_seqcount_end() closes the write critical section of the * given raw_seqcount_t. */ -static inline void raw_write_atomic_seqcount_end(raw_atomic_seqcount_t *s) +static inline void raw_write_atomic_seqcount_end(raw_atomic_seqcount_t *s, + bool exclusive) { + unsigned long val = ATOMIC_SEQCOUNT_SEQUENCE_STEP; + + if (likely(exclusive)) { +#ifdef CONFIG_DEBUG_ATOMIC_SEQCOUNT + DEBUG_LOCKS_WARN_ON(!(atomic_long_read(&s->sequence) & + ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER)); +#endif /* CONFIG_DEBUG_ATOMIC_SEQCOUNT */ + val -= ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER; + } else { #ifdef CONFIG_DEBUG_ATOMIC_SEQCOUNT - DEBUG_LOCKS_WARN_ON(!(atomic_long_read(&s->sequence) & - ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK)); + DEBUG_LOCKS_WARN_ON(!(atomic_long_read(&s->sequence) & + ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK)); #endif /* CONFIG_DEBUG_ATOMIC_SEQCOUNT */ - /* Store the sequence after any store in the critical section. */ + val -= ATOMIC_SEQCOUNT_SHARED_WRITER; + } + /* + * Store the sequence after any store in the critical section. For + * the exclusive path, this further implies a release, so loads + * within the critical section are not reordered to be outside the + * cricial section. 
+ */ smp_mb__before_atomic(); - atomic_long_add(ATOMIC_SEQCOUNT_SEQUENCE_STEP - - ATOMIC_SEQCOUNT_SHARED_WRITER, &s->sequence); + atomic_long_add(val, &s->sequence); preempt_enable(); } diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 76e6fb1dad5c..0758dddc5528 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -295,12 +295,13 @@ static inline void __folio_write_large_rmap_begin(struct folio *folio) { VM_WARN_ON_FOLIO(!folio_test_large_rmappable(folio), folio); VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); - raw_write_atomic_seqcount_begin(&folio->_rmap_atomic_seqcount); + raw_write_atomic_seqcount_begin(&folio->_rmap_atomic_seqcount, + false); } static inline void __folio_write_large_rmap_end(struct folio *folio) { - raw_write_atomic_seqcount_end(&folio->_rmap_atomic_seqcount); + raw_write_atomic_seqcount_end(&folio->_rmap_atomic_seqcount, false); } void __folio_set_large_rmap_val(struct folio *folio, int count,
From patchwork Fri Nov 24 13:26:22 2023
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13467670
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Linus Torvalds, Ryan Roberts, Matthew Wilcox, Hugh Dickins, Yin Fengwei, Yang Shi, Ying Huang, Zi Yan, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, "Paul E. McKenney"
McKenney" Subject: [PATCH WIP v1 17/20] mm/rmap_id: reduce atomic RMW operations when we are the exclusive writer Date: Fri, 24 Nov 2023 14:26:22 +0100 Message-ID: <20231124132626.235350-18-david@redhat.com> In-Reply-To: <20231124132626.235350-1-david@redhat.com> References: <20231124132626.235350-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 X-Stat-Signature: na9be34g8y1rxn98i3qrobmsod69nq35 X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 81F3240025 X-Rspam-User: X-HE-Tag: 1700832457-147066 X-HE-Meta: U2FsdGVkX19YouEND1wCuqDvam3LhcI8MAUGCaBjLGblsd2e7Shl6aqxlVD9XbSJnFdHpMFqgKxE6ds+ebwH3shktuww0tytU9iKgNnejgTNWcFtRMXRvxMbitn0IFh9olzyXGKg5AS8UXBnXcp94kDuoMs/BANPZTvT4rkFEO6gn4rAouUqxmS3FIa8hnXf5gwlKFAxdDW6bv1aQwi5rSfUSNg4qLypn5/++bDbi1r4EAwDMn7mPgMRsxkjoLtIyQHNWBldgL3Si635NuSeoDbmheMPQ3bgwVkTjp0NYhI/UsQRrbWVE5MWlwGSvQHhou/3pG6GCk5fwze20+CrstjQoRshgTFRs+SE+sU2eFyh8eVFDYVMmrXMsftEGoES2pETeB/2WZsl2/Vrl+NttALG5R3tfqF8VgKtJPQgjT9SNHcfcUPum1E8ZUnqhBqGrNyfMShDqJu2zjqXWBf4RSnLJQmhAmTthe7QertZ3tKY9Q6ceL46HNP6eh8uwWHvfDMMRwtqD7mFr5QvDz/vIxK/RVJFsrOCE3MUNkfh/XbYY+yLrTNJKSkTGoNXwKNw43nYR7/JYVq+pe4gxYhY/DyP7TLJaUpmkzF5Vk4DcgGpnKFZI1DuhtoY5HgwAvWijwP9RnWnprKF2xfpku2WwVzgAmuOGJ6zc+Jlps6df4mE9Wx8E2SBd5IpTpDO2mJLX4Ba1D3U8aIh2l2P4Vyahgoomyi7NEwLHpW4DcHQcfJ2ycK62HQkA3saWyz3EAPml6gajt+xb4VCQdKBKsYoYOYGcLKqN6i97CAOwaM371RgcK+SNrEXpew+uWBdX4zA2fcFVdVpjCBuBgqM+rhwQOPyu/ncPAFXN87XSCVDsqGitaNeV4cCDTnj762lkLZ/xeNBN0arCIrxb7YyDNayBk61wqgMLLWhUvlRQge5U43JAzg7TEs4aiBrOm7qghL7O/tnip4S3Cf+o7ts6/V WaFl9Rm8 WlBZlQ/Cgu5On/ByoFnm7nSSZz+2nP7yicCJgWG+aF9gEHO5W6lQr0utjnJqKYrSbUzKedpdy8WDq10HhYmxOgJ6g9LPDYkFSmWOV1lcs71paq6fdCIB4iACLgWYi2L8aLJ0Y3EW6vSzi79Qq0f66mPyNRwyS670n2IETp/ssnJR2fGlVWWlVRhv9LQ3qYIaO840do86Ex5wlexRl2lWgk4O7FvkdO2M/mA8r X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: We can reduce the number of atomic RMW operations when we are the single exclusive writer -- the common case. So instead of always requiring (1) 2 atomic RMW operations for adjusting the atomic seqcount (2) 1 atomic RMW operation for adjusting the total mapcount (3) 1 to 6 atomic RMW operation for adjusting the rmap values We can avoid (2) and (3) if we are the exclusive writer and limit it to the 2 atomic RMW operations from (1). 
Signed-off-by: David Hildenbrand --- include/linux/rmap.h | 81 +++++++++++++++++++++++++++++++++----------- mm/rmap_id.c | 52 ++++++++++++++++++++++++++++ 2 files changed, 114 insertions(+), 19 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 0758dddc5528..538c23d3c0c9 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -291,23 +291,36 @@ static inline void __folio_undo_large_rmap(struct folio *folio) #endif } -static inline void __folio_write_large_rmap_begin(struct folio *folio) +static inline bool __folio_write_large_rmap_begin(struct folio *folio) { + bool exclusive; + VM_WARN_ON_FOLIO(!folio_test_large_rmappable(folio), folio); VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); - raw_write_atomic_seqcount_begin(&folio->_rmap_atomic_seqcount, - false); + + exclusive = raw_write_atomic_seqcount_begin(&folio->_rmap_atomic_seqcount, + true); + if (likely(exclusive)) { + prefetchw(&folio->_rmap_val0); + if (unlikely(folio_order(folio) > RMAP_SUBID_4_MAX_ORDER)) + prefetchw(&folio->_rmap_val4); + } + return exclusive; } -static inline void __folio_write_large_rmap_end(struct folio *folio) +static inline void __folio_write_large_rmap_end(struct folio *folio, + bool exclusive) { - raw_write_atomic_seqcount_end(&folio->_rmap_atomic_seqcount, false); + raw_write_atomic_seqcount_end(&folio->_rmap_atomic_seqcount, + exclusive); } void __folio_set_large_rmap_val(struct folio *folio, int count, struct mm_struct *mm); void __folio_add_large_rmap_val(struct folio *folio, int count, struct mm_struct *mm); +void __folio_add_large_rmap_val_exclusive(struct folio *folio, int count, + struct mm_struct *mm); bool __folio_has_large_matching_rmap_val(struct folio *folio, int count, struct mm_struct *mm); #else @@ -317,12 +330,14 @@ static inline void __folio_prep_large_rmap(struct folio *folio) static inline void __folio_undo_large_rmap(struct folio *folio) { } -static inline void __folio_write_large_rmap_begin(struct folio *folio) +static inline bool __folio_write_large_rmap_begin(struct folio *folio) { VM_WARN_ON_FOLIO(!folio_test_large_rmappable(folio), folio); VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); + return false; } -static inline void __folio_write_large_rmap_end(struct folio *folio) +static inline void __folio_write_large_rmap_end(struct folio *folio, + bool exclusive) { } static inline void __folio_set_large_rmap_val(struct folio *folio, int count, @@ -333,6 +348,10 @@ static inline void __folio_add_large_rmap_val(struct folio *folio, int count, struct mm_struct *mm) { } +static inline void __folio_add_large_rmap_val_exclusive(struct folio *folio, + int count, struct mm_struct *mm) +{ +} #endif /* CONFIG_RMAP_ID */ static inline void folio_set_large_mapcount(struct folio *folio, @@ -348,28 +367,52 @@ static inline void folio_set_large_mapcount(struct folio *folio, static inline void folio_inc_large_mapcount(struct folio *folio, struct vm_area_struct *vma) { - __folio_write_large_rmap_begin(folio); - atomic_inc(&folio->_total_mapcount); - __folio_add_large_rmap_val(folio, 1, vma->vm_mm); - __folio_write_large_rmap_end(folio); + bool exclusive; + + exclusive = __folio_write_large_rmap_begin(folio); + if (likely(exclusive)) { + atomic_set(&folio->_total_mapcount, + atomic_read(&folio->_total_mapcount) + 1); + __folio_add_large_rmap_val_exclusive(folio, 1, vma->vm_mm); + } else { + atomic_inc(&folio->_total_mapcount); + __folio_add_large_rmap_val(folio, 1, vma->vm_mm); + } + __folio_write_large_rmap_end(folio, exclusive); } static inline void 
folio_add_large_mapcount(struct folio *folio, int count, struct vm_area_struct *vma) { - __folio_write_large_rmap_begin(folio); - atomic_add(count, &folio->_total_mapcount); - __folio_add_large_rmap_val(folio, count, vma->vm_mm); - __folio_write_large_rmap_end(folio); + bool exclusive; + + exclusive = __folio_write_large_rmap_begin(folio); + if (likely(exclusive)) { + atomic_set(&folio->_total_mapcount, + atomic_read(&folio->_total_mapcount) + count); + __folio_add_large_rmap_val_exclusive(folio, count, vma->vm_mm); + } else { + atomic_add(count, &folio->_total_mapcount); + __folio_add_large_rmap_val(folio, count, vma->vm_mm); + } + __folio_write_large_rmap_end(folio, exclusive); } static inline void folio_dec_large_mapcount(struct folio *folio, struct vm_area_struct *vma) { - __folio_write_large_rmap_begin(folio); - atomic_dec(&folio->_total_mapcount); - __folio_add_large_rmap_val(folio, -1, vma->vm_mm); - __folio_write_large_rmap_end(folio); + bool exclusive; + + exclusive = __folio_write_large_rmap_begin(folio); + if (likely(exclusive)) { + atomic_set(&folio->_total_mapcount, + atomic_read(&folio->_total_mapcount) - 1); + __folio_add_large_rmap_val_exclusive(folio, -1, vma->vm_mm); + } else { + atomic_dec(&folio->_total_mapcount); + __folio_add_large_rmap_val(folio, -1, vma->vm_mm); + } + __folio_write_large_rmap_end(folio, exclusive); } /* RMAP flags, currently only relevant for some anon rmap operations. */ diff --git a/mm/rmap_id.c b/mm/rmap_id.c index 421d8d2b646c..5009c6e43965 100644 --- a/mm/rmap_id.c +++ b/mm/rmap_id.c @@ -379,6 +379,58 @@ void __folio_add_large_rmap_val(struct folio *folio, int count, } } +void __folio_add_large_rmap_val_exclusive(struct folio *folio, int count, + struct mm_struct *mm) +{ + const unsigned int order = folio_order(folio); + + /* + * Concurrent rmap value modifications are impossible. We don't care + * about store tearing because readers will realize the concurrent + * updates using the seqcount and simply retry. So adjust the bare + * atomic counter instead. + */ + switch (order) { +#if MAX_ORDER >= RMAP_SUBID_6_MIN_ORDER + case RMAP_SUBID_6_MIN_ORDER ... RMAP_SUBID_6_MAX_ORDER: + folio->_rmap_val0.counter += get_rmap_subid_6(mm, 0) * count; + folio->_rmap_val1.counter += get_rmap_subid_6(mm, 1) * count; + folio->_rmap_val2.counter += get_rmap_subid_6(mm, 2) * count; + folio->_rmap_val3.counter += get_rmap_subid_6(mm, 3) * count; + folio->_rmap_val4.counter += get_rmap_subid_6(mm, 4) * count; + folio->_rmap_val5.counter += get_rmap_subid_6(mm, 5) * count; + break; +#endif +#if MAX_ORDER >= RMAP_SUBID_5_MIN_ORDER + case RMAP_SUBID_5_MIN_ORDER ... RMAP_SUBID_5_MAX_ORDER: + folio->_rmap_val0.counter += get_rmap_subid_5(mm, 0) * count; + folio->_rmap_val1.counter += get_rmap_subid_5(mm, 1) * count; + folio->_rmap_val2.counter += get_rmap_subid_5(mm, 2) * count; + folio->_rmap_val3.counter += get_rmap_subid_5(mm, 3) * count; + folio->_rmap_val4.counter += get_rmap_subid_5(mm, 4) * count; + break; +#endif + case RMAP_SUBID_4_MIN_ORDER ... RMAP_SUBID_4_MAX_ORDER: + folio->_rmap_val0.counter += get_rmap_subid_4(mm, 0) * count; + folio->_rmap_val1.counter += get_rmap_subid_4(mm, 1) * count; + folio->_rmap_val2.counter += get_rmap_subid_4(mm, 2) * count; + folio->_rmap_val3.counter += get_rmap_subid_4(mm, 3) * count; + break; + case RMAP_SUBID_3_MIN_ORDER ... 
RMAP_SUBID_3_MAX_ORDER: + folio->_rmap_val0.counter += get_rmap_subid_3(mm, 0) * count; + folio->_rmap_val1.counter += get_rmap_subid_3(mm, 1) * count; + folio->_rmap_val2.counter += get_rmap_subid_3(mm, 2) * count; + break; + case RMAP_SUBID_2_MIN_ORDER ... RMAP_SUBID_2_MAX_ORDER: + folio->_rmap_val0.counter += get_rmap_subid_2(mm, 0) * count; + folio->_rmap_val1.counter += get_rmap_subid_2(mm, 1) * count; + break; + default: + folio->_rmap_val0.counter += get_rmap_subid_1(mm); + break; + } +} + bool __folio_has_large_matching_rmap_val(struct folio *folio, int count, struct mm_struct *mm) {
From patchwork Fri Nov 24 13:26:23 2023
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13467671
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Linus Torvalds, Ryan Roberts, Matthew Wilcox, Hugh Dickins, Yin Fengwei, Yang Shi, Ying Huang, Zi Yan, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, "Paul E. McKenney"
Subject: [PATCH WIP v1 18/20] atomic_seqcount: use atomic add-return instead of atomic cmpxchg on 64bit
Date: Fri, 24 Nov 2023 14:26:23 +0100
Message-ID: <20231124132626.235350-19-david@redhat.com>
In-Reply-To: <20231124132626.235350-1-david@redhat.com>
References: <20231124132626.235350-1-david@redhat.com>
MIME-Version: 1.0
Turns out that it can be beneficial on some HW to use an atomic add-return instead of an atomic cmpxchg. However, we have to deal with more possible races now: in the worst case, each and every CPU might try becoming the exclusive writer at the same time, so we need the same number of bits as for the shared writer case. In case we detect that we didn't end up being the exclusive writer, simply back off and convert to a shared writer. Only implement this optimization on 64bit, where we can steal more bits from the actual sequence without sorrow. Signed-off-by: David Hildenbrand --- include/linux/atomic_seqcount.h | 43 +++++++++++++++++++++++++++------ 1 file changed, 36 insertions(+), 7 deletions(-) diff --git a/include/linux/atomic_seqcount.h b/include/linux/atomic_seqcount.h index 00286a9da221..9cd40903863d 100644 --- a/include/linux/atomic_seqcount.h +++ b/include/linux/atomic_seqcount.h @@ -42,9 +42,10 @@ typedef struct raw_atomic_seqcount { #define ATOMIC_SEQCOUNT_SHARED_WRITERS_MAX 0x0000000000008000ul #define ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK 0x000000000000fffful #define ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER 0x0000000000010000ul -#define ATOMIC_SEQCOUNT_WRITERS_MASK 0x000000000001fffful -/* We have 48bit for the actual sequence. */ -#define ATOMIC_SEQCOUNT_SEQUENCE_STEP 0x0000000000020000ul +#define ATOMIC_SEQCOUNT_EXCLUSIVE_WRITERS_MASK 0x00000000ffff0000ul +#define ATOMIC_SEQCOUNT_WRITERS_MASK 0x00000000fffffffful +/* We have 32bit for the actual sequence. */ +#define ATOMIC_SEQCOUNT_SEQUENCE_STEP 0x0000000100000000ul #else /* CONFIG_64BIT */ @@ -53,6 +54,7 @@ typedef struct raw_atomic_seqcount { #define ATOMIC_SEQCOUNT_SHARED_WRITERS_MAX 0x00000040ul #define ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK 0x0000007ful #define ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER 0x00000080ul +#define ATOMIC_SEQCOUNT_EXCLUSIVE_WRITERS_MASK 0x00000080ul #define ATOMIC_SEQCOUNT_WRITERS_MASK 0x000000fful /* We have 24bit for the actual sequence. */ #define ATOMIC_SEQCOUNT_SEQUENCE_STEP 0x00000100ul @@ -144,7 +146,7 @@ static inline bool raw_write_atomic_seqcount_begin(raw_atomic_seqcount_t *s, bool try_exclusive) { - unsigned long seqcount, seqcount_new; + unsigned long __maybe_unused seqcount, seqcount_new; BUILD_BUG_ON(IS_ENABLED(CONFIG_PREEMPT_RT)); #ifdef CONFIG_DEBUG_ATOMIC_SEQCOUNT @@ -160,6 +162,32 @@ static inline bool raw_write_atomic_seqcount_begin(raw_atomic_seqcount_t *s, if (unlikely(seqcount & ATOMIC_SEQCOUNT_WRITERS_MASK)) goto shared; +#ifdef CONFIG_64BIT + BUILD_BUG_ON(__builtin_popcount(ATOMIC_SEQCOUNT_EXCLUSIVE_WRITERS_MASK) != + __builtin_popcount(ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK)); + + /* See comment for atomic_long_try_cmpxchg() below. */ + seqcount = atomic_long_add_return(ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER, + &s->sequence); + if (likely((seqcount & ATOMIC_SEQCOUNT_WRITERS_MASK) == + ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER)) + return true; + + /* + * Whoops, we raced with another writer. Back off, converting ourselves + * to a shared writer and wait for any exclusive writers. + */ + atomic_long_add(ATOMIC_SEQCOUNT_SHARED_WRITER - ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER, + &s->sequence); + /* + * No need for __smp_mb__after_atomic(): the reader side already + * realizes that it has to retry and the memory barrier from + * atomic_long_add_return() is sufficient for that.
+ */ + while (atomic_long_read(&s->sequence) & ATOMIC_SEQCOUNT_EXCLUSIVE_WRITERS_MASK) + cpu_relax(); + return false; +#else seqcount_new = seqcount | ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER; /* * Store the sequence before any store in the critical section. Further, @@ -168,6 +196,7 @@ static inline bool raw_write_atomic_seqcount_begin(raw_atomic_seqcount_t *s, */ if (atomic_long_try_cmpxchg(&s->sequence, &seqcount, seqcount_new)) return true; +#endif shared: /* * Indicate that there is a shared writer, and spin until the exclusive @@ -185,10 +214,10 @@ static inline bool raw_write_atomic_seqcount_begin(raw_atomic_seqcount_t *s, DEBUG_LOCKS_WARN_ON((seqcount & ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK) > ATOMIC_SEQCOUNT_SHARED_WRITERS_MAX); #endif /* CONFIG_DEBUG_ATOMIC_SEQCOUNT */ - if (likely(!(seqcount & ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER))) + if (likely(!(seqcount & ATOMIC_SEQCOUNT_EXCLUSIVE_WRITERS_MASK))) return false; - while (atomic_long_read(&s->sequence) & ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER) + while (atomic_long_read(&s->sequence) & ATOMIC_SEQCOUNT_EXCLUSIVE_WRITERS_MASK) cpu_relax(); return false; } @@ -209,7 +238,7 @@ static inline void raw_write_atomic_seqcount_end(raw_atomic_seqcount_t *s, if (likely(exclusive)) { #ifdef CONFIG_DEBUG_ATOMIC_SEQCOUNT DEBUG_LOCKS_WARN_ON(!(atomic_long_read(&s->sequence) & - ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER)); + ATOMIC_SEQCOUNT_EXCLUSIVE_WRITERS_MASK)); #endif /* CONFIG_DEBUG_ATOMIC_SEQCOUNT */ val -= ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER; } else { From patchwork Fri Nov 24 13:26:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13467672 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 08AB0C636CB for ; Fri, 24 Nov 2023 13:27:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 177BB8D0087; Fri, 24 Nov 2023 08:27:45 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0DCA48D0084; Fri, 24 Nov 2023 08:27:45 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E70C98D0087; Fri, 24 Nov 2023 08:27:44 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id CAA548D0084 for ; Fri, 24 Nov 2023 08:27:44 -0500 (EST) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id A48BF8025B for ; Fri, 24 Nov 2023 13:27:44 +0000 (UTC) X-FDA: 81492925248.16.83EADA8 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf05.hostedemail.com (Postfix) with ESMTP id 05341100009 for ; Fri, 24 Nov 2023 13:27:42 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=Hizn47st; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf05.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700832463; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: 
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Linus Torvalds, Ryan Roberts, Matthew Wilcox, Hugh Dickins, Yin Fengwei, Yang Shi, Ying Huang, Zi Yan, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, "Paul E. McKenney"
McKenney" Subject: [PATCH WIP v1 19/20] mm/rmap: factor out removing folio range into __folio_remove_rmap_range() Date: Fri, 24 Nov 2023 14:26:24 +0100 Message-ID: <20231124132626.235350-20-david@redhat.com> In-Reply-To: <20231124132626.235350-1-david@redhat.com> References: <20231124132626.235350-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.6 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 05341100009 X-Stat-Signature: rooe8jmwpmz1e7wgutdjhk7zz5xczse3 X-Rspam-User: X-HE-Tag: 1700832462-979557 X-HE-Meta: U2FsdGVkX19636OgypDEZDzTBk/FG/t/McmHtJ9Uh4nD+MyyvGWaUhVxmZE7F7wlSUtaDZpqnBE25NOEy53CA82wKO0AyHqwDEwjRi3iniyBNMhzPq151+UTqYM3LFQWtkBBlfC7SQZmDgdL07tMijAzLjHEsLhWsXVgqu6xBC0b95dWljV9dLWNqh4YqkUcdaOEAQpkwIdjivqMk0JpqI/TejJ91gYx3MOTn/HPB8e38tEx1yxeWaUq/BX40pQ5pGBPC2kmGzjEFwiEWJ7ONSGjTiyDhNcNXCfC3njT5lLFD80r7LSeqaY9guRE8Qd6KCiNB3sMW3hrcrzIa0/kG5HJ3D6jL6xXooBoWRBVxzQA4EoOfJJ4V6zcWqOHOMMGEgFOY92raW8NMK+WWR1NU2phOPIqa0wOLJb5KXiEoMh8snsZBHt0gSSW91N2/hQWDVBfhaHljaRqzN9BDsonEBvZEXC0/q/1cf1nqIf8oUTU3lcoJchA6g6J5pxByodalQh0gCDPh6qXrceKXekzZ/H7uxMjcaIVz9tvAfFHMvCIRurb82udDI2tT2aqI5s0mpxJCBeB1OtIMuOYMJig4F8H6s+9QWuhxpqlznyajiRZE8JaVxe2grY00pjcWxBGkAY/BWW/9ZLQLo15NTqehru5clwnnnZqbx26l7PK920/5WI+X3RaK+60eqZxglgECBac0J0K5wuu1RPQ2ZFfXnoc40Wz7/GpM4EbFNyGhbmQ+AED9BGDSD8JKl5fWESAwkSRsFTU71RGgzkAGkWIXokVdCQz5Qcjjy6tQEDPsU01LUrjZYRJfecTxnfk9UbRUZrU/c16pklgprNY+O7UeDyMx7RPnTakmC+MAScyfVaIG7O5uXaIc38mk3kMvjswtDxwaJCPMVuaPGWvOwu2jq+ZKFaqclypDyoLtSAEpEmx2iomt65xBUMKlfRUn8YZE6pp7LT6D/p7cNJkxex TjXsN9xH 4gsvIcuCi6ph6D9/XUxwN3pkBaGDdDEhRFCOoIoBCDzfQ6xcMKOPR5Stf+m6D/xCQifFVVHuyNDW9/m5l5nsRfbBbdazIoUDaulebRTOa6naDi8xX/wlO1W46zSQkIwnNvNgeMTXjDBNq2d9b/gdZpFvDjEYRetsUxPVrAzmXv0tEC9dmHW9O9uPYAkvSUMR7NciaDu1D0liRqiVEoELH5Zb72XFQKOKscuK7 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Let's factor it out, optimize for small folios, and compact it a bit. Well, we're adding the range part, but that will surely come in handy soon -- and it's now wasier to compare it with __folio_add_rmap_range(). Signed-off-by: David Hildenbrand --- mm/rmap.c | 90 +++++++++++++++++++++++++++++++++---------------------- 1 file changed, 55 insertions(+), 35 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index da7fa46a18fc..80ac53633332 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1155,6 +1155,57 @@ static unsigned int __folio_add_rmap_range(struct folio *folio, return nr; } +static unsigned int __folio_remove_rmap_range(struct folio *folio, + struct page *page, unsigned int nr_pages, + struct vm_area_struct *vma, bool compound, int *nr_pmdmapped) +{ + atomic_t *mapped = &folio->_nr_pages_mapped; + int last, count, nr = 0; + + VM_WARN_ON_FOLIO(compound && page != &folio->page, folio); + VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio); + VM_WARN_ON_FOLIO(compound && nr_pages != folio_nr_pages(folio), folio); + VM_WARN_ON_FOLIO(!folio_test_large(folio) && nr_pages != 1, folio); + + if (likely(!folio_test_large(folio))) + return atomic_add_negative(-1, &page->_mapcount); + + /* Is page being unmapped by PTE? Is this its last map to be removed? 
*/ + if (!compound) { + folio_add_large_mapcount(folio, -nr_pages, vma); + count = nr_pages; + do { + last = atomic_add_negative(-1, &page->_mapcount); + if (last) { + last = atomic_dec_return_relaxed(mapped); + if (last < COMPOUND_MAPPED) + nr++; + } + } while (page++, --count > 0); + } else if (folio_test_pmd_mappable(folio)) { + /* That test is redundant: it's for safety or to optimize out */ + + folio_dec_large_mapcount(folio, vma); + last = atomic_add_negative(-1, &folio->_entire_mapcount); + if (last) { + nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped); + if (likely(nr < COMPOUND_MAPPED)) { + *nr_pmdmapped = folio_nr_pages(folio); + nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED); + /* Raced ahead of another remove and an add? */ + if (unlikely(nr < 0)) + nr = 0; + } else { + /* An add of COMPOUND_MAPPED raced ahead */ + nr = 0; + } + } + } else { + VM_WARN_ON_ONCE_FOLIO(true, folio); + } + return nr; +} + /** * folio_move_anon_rmap - move a folio to our anon_vma * @folio: The folio to move to our anon_vma @@ -1439,13 +1490,10 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma, bool compound) { struct folio *folio = page_folio(page); - atomic_t *mapped = &folio->_nr_pages_mapped; - int nr = 0, nr_pmdmapped = 0; - bool last; + unsigned long nr_pages = compound ? folio_nr_pages(folio) : 1; + unsigned int nr, nr_pmdmapped = 0; enum node_stat_item idx; - VM_BUG_ON_PAGE(compound && !PageHead(page), page); - /* Hugetlb pages are not counted in NR_*MAPPED */ if (unlikely(folio_test_hugetlb(folio))) { /* hugetlb pages are always mapped with pmds */ @@ -1454,36 +1502,8 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma, return; } - if (folio_test_large(folio)) - folio_dec_large_mapcount(folio, vma); - - /* Is page being unmapped by PTE? Is this its last map to be removed? */ - if (likely(!compound)) { - last = atomic_add_negative(-1, &page->_mapcount); - nr = last; - if (last && folio_test_large(folio)) { - nr = atomic_dec_return_relaxed(mapped); - nr = (nr < COMPOUND_MAPPED); - } - } else if (folio_test_pmd_mappable(folio)) { - /* That test is redundant: it's for safety or to optimize out */ - - last = atomic_add_negative(-1, &folio->_entire_mapcount); - if (last) { - nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped); - if (likely(nr < COMPOUND_MAPPED)) { - nr_pmdmapped = folio_nr_pages(folio); - nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED); - /* Raced ahead of another remove and an add? 
*/ - if (unlikely(nr < 0)) - nr = 0; - } else { - /* An add of COMPOUND_MAPPED raced ahead */ - nr = 0; - } - } - } - + nr = __folio_remove_rmap_range(folio, page, nr_pages, vma, compound, + &nr_pmdmapped); if (nr_pmdmapped) { if (folio_test_anon(folio)) idx = NR_ANON_THPS; From patchwork Fri Nov 24 13:26:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13467673 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7923C61DF4 for ; Fri, 24 Nov 2023 13:27:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 373D18D0088; Fri, 24 Nov 2023 08:27:55 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 3082C8D0084; Fri, 24 Nov 2023 08:27:55 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 14EE18D0088; Fri, 24 Nov 2023 08:27:55 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id EF58E8D0084 for ; Fri, 24 Nov 2023 08:27:54 -0500 (EST) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id BEE4316099E for ; Fri, 24 Nov 2023 13:27:54 +0000 (UTC) X-FDA: 81492925668.18.CA45684 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf11.hostedemail.com (Postfix) with ESMTP id F144440011 for ; Fri, 24 Nov 2023 13:27:52 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=gZzzNnV9; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf11.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1700832473; a=rsa-sha256; cv=none; b=jc22TCnSjvJpLeHVriq8dYsKVurlfb9RBp6hSYAZ1fvS6HX3vUxrHj662Ke3plwpZUIJLd 8sDGqZN0w8ixu/Qjr753DNGhkWDQsADN/cGTrh0seTqocLaelQkMBb6szAUDRkImPdXVzM 1mFogUBHlAPB0F5bIbPtZZcSzmjXE8U= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=gZzzNnV9; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf11.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1700832473; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=3nZeFmARPNGvb6d3sXI2ciVXy/9aT4qOMIkiPA0/sq4=; b=zfKOgiWjNHHEJBxG6WRV6EsN/LjZTNHFOh12pLBCMAX+vulbLXyK+jpuAtt/g440pATtDM a0Fh8knyTJ+3yIMXgo+b7GYw5qGHcSH6Aec+Q3ItWkh8tAWwGaRkcBDzEEgqEjeH18ze2m QVd3LLuobFyF2cOGmohMbRt9sByDcCU= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1700832472; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; 
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Linus Torvalds, Ryan Roberts, Matthew Wilcox, Hugh Dickins, Yin Fengwei, Yang Shi, Ying Huang, Zi Yan, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, "Paul E. McKenney"
Subject: [PATCH WIP v1 20/20] mm/rmap: perform all mapcount operations of large folios under the rmap seqcount
Date: Fri, 24 Nov 2023 14:26:25 +0100
Message-ID: <20231124132626.235350-21-david@redhat.com>
In-Reply-To: <20231124132626.235350-1-david@redhat.com>
References: <20231124132626.235350-1-david@redhat.com>
MIME-Version: 1.0

Let's extend the atomic seqcount to also protect modifications of:
* The subpage mapcounts
* The entire mapcount
* folio->_nr_pages_mapped
This way, we can avoid another 1/2 atomic RMW operations on the fast path (and significantly more when patching): When we are the exclusive writer, we only need two
atomic RMW operations to manage the atomic seqcount. Let's document how the existing atomic seqcount memory barriers keep the old behavior unmodified: especially, how it makes sure that folio refcount updates cannot be reordered with folio mapcount updates. Signed-off-by: David Hildenbrand --- include/linux/rmap.h | 95 ++++++++++++++++++++++++++------------------ mm/rmap.c | 84 +++++++++++++++++++++++++++++++++++++-- 2 files changed, 137 insertions(+), 42 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 538c23d3c0c9..3cff4aa71393 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -301,6 +301,12 @@ static inline bool __folio_write_large_rmap_begin(struct folio *folio) exclusive = raw_write_atomic_seqcount_begin(&folio->_rmap_atomic_seqcount, true); if (likely(exclusive)) { + /* + * Note: raw_write_atomic_seqcount_begin() implies a full + * memory barrier like non-exclusive mapcount operations + * will. Any refcount updates that happened before this call + * are visible before any mapcount updates on other CPUs. + */ prefetchw(&folio->_rmap_val0); if (unlikely(folio_order(folio) > RMAP_SUBID_4_MAX_ORDER)) prefetchw(&folio->_rmap_val4); @@ -311,6 +317,12 @@ static inline bool __folio_write_large_rmap_begin(struct folio *folio) static inline void __folio_write_large_rmap_end(struct folio *folio, bool exclusive) { + /* + * Note: raw_write_atomic_seqcount_end() implies a full memory + * barrier like non-exclusive mapcount operations will. Any + * refcount updates happening after this call are visible after any + * mapcount updates on other CPUs. + */ raw_write_atomic_seqcount_end(&folio->_rmap_atomic_seqcount, exclusive); } @@ -367,52 +379,46 @@ static inline void folio_set_large_mapcount(struct folio *folio, static inline void folio_inc_large_mapcount(struct folio *folio, struct vm_area_struct *vma) { - bool exclusive; + atomic_inc(&folio->_total_mapcount); + __folio_add_large_rmap_val(folio, 1, vma->vm_mm); +} - exclusive = __folio_write_large_rmap_begin(folio); - if (likely(exclusive)) { - atomic_set(&folio->_total_mapcount, - atomic_read(&folio->_total_mapcount) + 1); - __folio_add_large_rmap_val_exclusive(folio, 1, vma->vm_mm); - } else { - atomic_inc(&folio->_total_mapcount); - __folio_add_large_rmap_val(folio, 1, vma->vm_mm); - } - __folio_write_large_rmap_end(folio, exclusive); +static inline void folio_inc_large_mapcount_exclusive(struct folio *folio, + struct vm_area_struct *vma) +{ + atomic_set(&folio->_total_mapcount, + atomic_read(&folio->_total_mapcount) + 1); + __folio_add_large_rmap_val_exclusive(folio, 1, vma->vm_mm); } static inline void folio_add_large_mapcount(struct folio *folio, int count, struct vm_area_struct *vma) { - bool exclusive; + atomic_add(count, &folio->_total_mapcount); + __folio_add_large_rmap_val(folio, count, vma->vm_mm); +} - exclusive = __folio_write_large_rmap_begin(folio); - if (likely(exclusive)) { - atomic_set(&folio->_total_mapcount, - atomic_read(&folio->_total_mapcount) + count); - __folio_add_large_rmap_val_exclusive(folio, count, vma->vm_mm); - } else { - atomic_add(count, &folio->_total_mapcount); - __folio_add_large_rmap_val(folio, count, vma->vm_mm); - } - __folio_write_large_rmap_end(folio, exclusive); +static inline void folio_add_large_mapcount_exclusive(struct folio *folio, + int count, struct vm_area_struct *vma) +{ + atomic_set(&folio->_total_mapcount, + atomic_read(&folio->_total_mapcount) + count); + __folio_add_large_rmap_val_exclusive(folio, count, vma->vm_mm); } static inline void 
folio_dec_large_mapcount(struct folio *folio, struct vm_area_struct *vma) { - bool exclusive; + atomic_dec(&folio->_total_mapcount); + __folio_add_large_rmap_val(folio, -1, vma->vm_mm); +} - exclusive = __folio_write_large_rmap_begin(folio); - if (likely(exclusive)) { - atomic_set(&folio->_total_mapcount, - atomic_read(&folio->_total_mapcount) - 1); - __folio_add_large_rmap_val_exclusive(folio, -1, vma->vm_mm); - } else { - atomic_dec(&folio->_total_mapcount); - __folio_add_large_rmap_val(folio, -1, vma->vm_mm); - } - __folio_write_large_rmap_end(folio, exclusive); +static inline void folio_dec_large_mapcount_exclusive(struct folio *folio, + struct vm_area_struct *vma) +{ + atomic_set(&folio->_total_mapcount, + atomic_read(&folio->_total_mapcount) - 1); + __folio_add_large_rmap_val_exclusive(folio, -1, vma->vm_mm); } /* RMAP flags, currently only relevant for some anon rmap operations. */ @@ -462,6 +468,7 @@ static inline void __page_dup_rmap(struct page *page, struct vm_area_struct *dst_vma, bool compound) { struct folio *folio = page_folio(page); + bool exclusive; VM_BUG_ON_PAGE(compound && !PageHead(page), page); if (likely(!folio_test_large(folio))) { @@ -475,11 +482,23 @@ static inline void __page_dup_rmap(struct page *page, return; } - if (compound) - atomic_inc(&folio->_entire_mapcount); - else - atomic_inc(&page->_mapcount); - folio_inc_large_mapcount(folio, dst_vma); + exclusive = __folio_write_large_rmap_begin(folio); + if (likely(exclusive)) { + if (compound) + atomic_set(&folio->_entire_mapcount, + atomic_read(&folio->_entire_mapcount) + 1); + else + atomic_set(&page->_mapcount, + atomic_read(&page->_mapcount) + 1); + folio_inc_large_mapcount_exclusive(folio, dst_vma); + } else { + if (compound) + atomic_inc(&folio->_entire_mapcount); + else + atomic_inc(&page->_mapcount); + folio_inc_large_mapcount(folio, dst_vma); + } + __folio_write_large_rmap_end(folio, exclusive); } static inline void page_dup_file_rmap(struct page *page, diff --git a/mm/rmap.c b/mm/rmap.c index 80ac53633332..755a62b046e2 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1109,7 +1109,8 @@ static unsigned int __folio_add_rmap_range(struct folio *folio, struct vm_area_struct *vma, bool compound, int *nr_pmdmapped) { atomic_t *mapped = &folio->_nr_pages_mapped; - int first, count, nr = 0; + int first, val, count, nr = 0; + bool exclusive; VM_WARN_ON_FOLIO(compound && page != &folio->page, folio); VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio); @@ -1119,8 +1120,23 @@ static unsigned int __folio_add_rmap_range(struct folio *folio, if (likely(!folio_test_large(folio))) return atomic_inc_and_test(&page->_mapcount); + exclusive = __folio_write_large_rmap_begin(folio); + /* Is page being mapped by PTE? Is this its first map to be added? 
*/ - if (!compound) { + if (likely(exclusive) && !compound) { + count = nr_pages; + do { + val = atomic_read(&page->_mapcount) + 1; + atomic_set(&page->_mapcount, val); + if (!val) { + val = atomic_read(mapped) + 1; + atomic_set(mapped, val); + if (val < COMPOUND_MAPPED) + nr++; + } + } while (page++, --count > 0); + folio_add_large_mapcount_exclusive(folio, nr_pages, vma); + } else if (!compound) { count = nr_pages; do { first = atomic_inc_and_test(&page->_mapcount); @@ -1131,6 +1147,26 @@ static unsigned int __folio_add_rmap_range(struct folio *folio, } } while (page++, --count > 0); folio_add_large_mapcount(folio, nr_pages, vma); + } else if (likely(exclusive) && folio_test_pmd_mappable(folio)) { + /* That test is redundant: it's for safety or to optimize out */ + + val = atomic_read(&folio->_entire_mapcount) + 1; + atomic_set(&folio->_entire_mapcount, val); + if (!val) { + nr = atomic_read(mapped) + COMPOUND_MAPPED; + atomic_set(mapped, nr); + if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) { + *nr_pmdmapped = folio_nr_pages(folio); + nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED); + /* Raced ahead of a remove and another add? */ + if (unlikely(nr < 0)) + nr = 0; + } else { + /* Raced ahead of a remove of COMPOUND_MAPPED */ + nr = 0; + } + } + folio_inc_large_mapcount_exclusive(folio, vma); } else if (folio_test_pmd_mappable(folio)) { /* That test is redundant: it's for safety or to optimize out */ @@ -1152,6 +1188,8 @@ static unsigned int __folio_add_rmap_range(struct folio *folio, } else { VM_WARN_ON_ONCE_FOLIO(true, folio); } + + __folio_write_large_rmap_end(folio, exclusive); return nr; } @@ -1160,7 +1198,8 @@ static unsigned int __folio_remove_rmap_range(struct folio *folio, struct vm_area_struct *vma, bool compound, int *nr_pmdmapped) { atomic_t *mapped = &folio->_nr_pages_mapped; - int last, count, nr = 0; + int last, val, count, nr = 0; + bool exclusive; VM_WARN_ON_FOLIO(compound && page != &folio->page, folio); VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio); @@ -1170,8 +1209,23 @@ static unsigned int __folio_remove_rmap_range(struct folio *folio, if (likely(!folio_test_large(folio))) return atomic_add_negative(-1, &page->_mapcount); + exclusive = __folio_write_large_rmap_begin(folio); + /* Is page being unmapped by PTE? Is this its last map to be removed? */ - if (!compound) { + if (likely(exclusive) && !compound) { + folio_add_large_mapcount_exclusive(folio, -nr_pages, vma); + count = nr_pages; + do { + val = atomic_read(&page->_mapcount) - 1; + atomic_set(&page->_mapcount, val); + if (val < 0) { + val = atomic_read(mapped) - 1; + atomic_set(mapped, val); + if (val < COMPOUND_MAPPED) + nr++; + } + } while (page++, --count > 0); + } else if (!compound) { folio_add_large_mapcount(folio, -nr_pages, vma); count = nr_pages; do { @@ -1182,6 +1236,26 @@ static unsigned int __folio_remove_rmap_range(struct folio *folio, nr++; } } while (page++, --count > 0); + } else if (likely(exclusive) && folio_test_pmd_mappable(folio)) { + /* That test is redundant: it's for safety or to optimize out */ + + folio_dec_large_mapcount_exclusive(folio, vma); + val = atomic_read(&folio->_entire_mapcount) - 1; + atomic_set(&folio->_entire_mapcount, val); + if (val < 0) { + nr = atomic_read(mapped) - COMPOUND_MAPPED; + atomic_set(mapped, nr); + if (likely(nr < COMPOUND_MAPPED)) { + *nr_pmdmapped = folio_nr_pages(folio); + nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED); + /* Raced ahead of another remove and an add? 
*/ + if (unlikely(nr < 0)) + nr = 0; + } else { + /* An add of COMPOUND_MAPPED raced ahead */ + nr = 0; + } + } } else if (folio_test_pmd_mappable(folio)) { /* That test is redundant: it's for safety or to optimize out */ @@ -1203,6 +1277,8 @@ static unsigned int __folio_remove_rmap_range(struct folio *folio, } else { VM_WARN_ON_ONCE_FOLIO(true, folio); } + + __folio_write_large_rmap_end(folio, exclusive); return nr; }
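Taken together, the writer-side fast path for large-folio mapcount updates ends up looking roughly like the sketch below (condensed from the __page_dup_rmap() hunk above, showing only the PTE-mapped case and omitting the debug checks). When the exclusive path is taken, the only atomic RMW operations left are the two that open and close the atomic seqcount:

	exclusive = __folio_write_large_rmap_begin(folio);	/* atomic RMW #1 */
	if (likely(exclusive)) {
		/* Plain updates; concurrent readers retry via the seqcount. */
		atomic_set(&page->_mapcount,
			   atomic_read(&page->_mapcount) + 1);
		folio_inc_large_mapcount_exclusive(folio, dst_vma);
	} else {
		/* Concurrent writers: fall back to atomic RMW operations. */
		atomic_inc(&page->_mapcount);
		folio_inc_large_mapcount(folio, dst_vma);
	}
	__folio_write_large_rmap_end(folio, exclusive);		/* atomic RMW #2 */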