From patchwork Mon Jan 31 16:29:31 2022
From: David Hildenbrand <david@redhat.com>
Shutemov" , Matthew Wilcox , Vlastimil Babka , Jann Horn , Michal Hocko , Nadav Amit , Rik van Riel , Roman Gushchin , Andrea Arcangeli , Peter Xu , Donald Dutile , Christoph Hellwig , Oleg Nesterov , Jan Kara , Liang Zhang , linux-mm@kvack.org, David Hildenbrand , Nadav Amit Subject: [PATCH v3 1/9] mm: optimize do_wp_page() for exclusive pages in the swapcache Date: Mon, 31 Jan 2022 17:29:31 +0100 Message-Id: <20220131162940.210846-2-david@redhat.com> In-Reply-To: <20220131162940.210846-1-david@redhat.com> References: <20220131162940.210846-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Stat-Signature: rn8oo9kzcnfmjihgp6odth39pkyyey4g X-Rspam-User: nil Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=JZGkgZ7g; spf=none (imf22.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: B8B95C000B X-HE-Tag: 1643646738-959500 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Liang Zhang reported [1] that the current COW logic in do_wp_page() is sub-optimal when it comes to swap+read fault+write fault of anonymous pages that have a single user, visible via a performance degradation in the redis benchmark. Something similar was previously reported [2] by Nadav with a simple reproducer. After we put an anon page into the swapcache and unmapped it from a single process, that process might read that page again and refault it read-only. If that process then writes to that page, the process is actually the exclusive user of the page, however, the COW logic in do_co_page() won't be able to reuse it due to the additional reference from the swapcache. Let's optimize for pages that have been added to the swapcache but only have an exclusive user. Try removing the swapcache reference if there is hope that we're the exclusive user. We will fail removing the swapcache reference in two scenarios: (1) There are additional swap entries referencing the page: copying instead of reusing is the right thing to do. (2) The page is under writeback: theoretically we might be able to reuse in some cases, however, we cannot remove the additional reference and will have to copy. Note that we'll only try removing the page from the swapcache when it's highly likely that we'll be the exclusive owner after removing the page from the swapache. As we're about to map that page writable and redirty it, that should not affect reclaim but is rather the right thing to do. Further, we might have additional references from the LRU pagevecs, which will force us to copy instead of being able to reuse. We'll try handling such references for some scenarios next. Concurrent writeback cannot be handled easily and we'll always have to copy. While at it, remove the superfluous page_mapcount() check: it's implicitly covered by the page_count() for ordinary anon pages. 
[1] https://lkml.kernel.org/r/20220113140318.11117-1-zhangliang5@huawei.com
[2] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com

Reported-by: Liang Zhang
Reported-by: Nadav Amit
Reviewed-by: Matthew Wilcox (Oracle)
Acked-by: Vlastimil Babka
Signed-off-by: David Hildenbrand
---
 mm/memory.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..bcd3b7c50891 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3291,19 +3291,27 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 	if (PageAnon(vmf->page)) {
 		struct page *page = vmf->page;
 
-		/* PageKsm() doesn't necessarily raise the page refcount */
-		if (PageKsm(page) || page_count(page) != 1)
+		/*
+		 * We have to verify under page lock: these early checks are
+		 * just an optimization to avoid locking the page and freeing
+		 * the swapcache if there is little hope that we can reuse.
+		 *
+		 * PageKsm() doesn't necessarily raise the page refcount.
+		 */
+		if (PageKsm(page) || page_count(page) > 1 + PageSwapCache(page))
 			goto copy;
 		if (!trylock_page(page))
 			goto copy;
-		if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) {
+		if (PageSwapCache(page))
+			try_to_free_swap(page);
+		if (PageKsm(page) || page_count(page) != 1) {
 			unlock_page(page);
 			goto copy;
 		}
 		/*
-		 * Ok, we've got the only map reference, and the only
-		 * page count reference, and the page is locked,
-		 * it's dark out, and we're wearing sunglasses. Hit it.
+		 * Ok, we've got the only page reference from our mapping
+		 * and the page is locked, it's dark out, and we're wearing
+		 * sunglasses. Hit it.
 		 */
 		unlock_page(page);
 		wp_page_reuse(vmf);

From patchwork Mon Jan 31 16:29:32 2022
From: David Hildenbrand <david@redhat.com>
Subject: [PATCH v3 2/9] mm: optimize do_wp_page() for fresh pages in local LRU pagevecs
Date: Mon, 31 Jan 2022 17:29:32 +0100
Message-Id: <20220131162940.210846-3-david@redhat.com>
In-Reply-To: <20220131162940.210846-1-david@redhat.com>

For example, if a page just got swapped in via a read fault, the LRU
pagevecs might still hold a reference to the page. If we trigger a write
fault on such a page, the additional reference from the LRU pagevecs
will prohibit reusing the page.

Let's conditionally drain the local LRU pagevecs when we stumble over a
!PageLRU() page. We cannot easily drain remote LRU pagevecs and it might
not be desirable performance-wise. Consequently, this will only avoid
copying in some cases.

Add a simple "page_count(page) > 3" check first, but keep the
"page_count(page) > 1 + PageSwapCache(page)" check in place, as we want
to minimize cases where we remove a page from the swapcache but won't be
able to reuse it, for example, because another process has it mapped
R/O, to not affect reclaim.

We cannot easily handle the following cases and we will always have to
copy:

(1) The page is referenced in the LRU pagevecs of other CPUs. We really
    would have to drain the LRU pagevecs of all CPUs -- most probably
    copying is much cheaper.
(2) The page is already PageLRU() but is getting moved between LRU
    lists, for example, for activation (e.g., mark_page_accessed()),
    deactivation (MADV_COLD), or lazyfree (MADV_FREE). We'd have to
    drain mostly unconditionally, which might be bad performance-wise.
    Most probably this won't happen too often in practice.

Note that there are other reasons why an anon page might temporarily not
be PageLRU(): for example, compaction and migration have to isolate LRU
pages from the LRU lists first (isolate_lru_page()), moving them to
temporary local lists and clearing PageLRU() and holding an additional
reference on the page. In that case, we'll always copy.

This change seems to be fairly effective with the reproducer [1] shared
by Nadav, as long as writeback is done synchronously, for example, using
zram. However, with asynchronous writeback, we'll usually fail to free
the swapcache because the page is still under writeback: something we
cannot easily optimize for, and maybe it's not really relevant in
practice.

[1] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com

Signed-off-by: David Hildenbrand
Acked-by: Vlastimil Babka
---
 mm/memory.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index bcd3b7c50891..923165b4c27e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3298,7 +3298,15 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 		 *
 		 * PageKsm() doesn't necessarily raise the page refcount.
 		 */
-		if (PageKsm(page) || page_count(page) > 1 + PageSwapCache(page))
+		if (PageKsm(page) || page_count(page) > 3)
+			goto copy;
+		if (!PageLRU(page))
+			/*
+			 * Note: We cannot easily detect+handle references from
+			 * remote LRU pagevecs or references to PageLRU() pages.
+			 */
+			lru_add_drain();
+		if (page_count(page) > 1 + PageSwapCache(page))
 			goto copy;
 		if (!trylock_page(page))
 			goto copy;
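Roughly, the reference budget these checks assume for an exclusively
owned anon page that was just swapped in via a read fault is: one
reference from our page table mapping, plus one from the swapcache, plus
one from the local LRU pagevec, i.e., three. A page_count() above 3
therefore indicates some other, unhandled reference and we copy right
away; once lru_add_drain() has put the page onto the LRU list, the
pagevec reference is gone and "page_count(page) > 1 + PageSwapCache(page)"
is again the precise exclusivity check.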
From patchwork Mon Jan 31 16:29:33 2022
From: David Hildenbrand <david@redhat.com>
Subject: [PATCH v3 3/9] mm: slightly clarify KSM logic in do_swap_page()
Date: Mon, 31 Jan 2022 17:29:33 +0100
Message-Id: <20220131162940.210846-4-david@redhat.com>
In-Reply-To: <20220131162940.210846-1-david@redhat.com>

Let's make it clearer that KSM might only have to copy a page in case we
have a page in the swapcache, not if we allocated a fresh page and
bypassed the swapcache. While at it, add a comment why this is usually
necessary and merge the two swapcache conditions.

Signed-off-by: David Hildenbrand
Acked-by: Vlastimil Babka
---
 mm/memory.c | 38 +++++++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 923165b4c27e..3c91294cca98 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3615,21 +3615,29 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out_release;
 	}
 
-	/*
-	 * Make sure try_to_free_swap or reuse_swap_page or swapoff did not
-	 * release the swapcache from under us. The page pin, and pte_same
-	 * test below, are not enough to exclude that. Even if it is still
-	 * swapcache, we need to check that the page's swap has not changed.
-	 */
-	if (unlikely((!PageSwapCache(page) || page_private(page) != entry.val)) && swapcache)
-		goto out_page;
-
-	page = ksm_might_need_to_copy(page, vma, vmf->address);
-	if (unlikely(!page)) {
-		ret = VM_FAULT_OOM;
-		page = swapcache;
-		goto out_page;
+	if (swapcache) {
+		/*
+		 * Make sure try_to_free_swap or reuse_swap_page or swapoff did
+		 * not release the swapcache from under us. The page pin, and
+		 * pte_same test below, are not enough to exclude that. Even if
+		 * it is still swapcache, we need to check that the page's swap
+		 * has not changed.
+		 */
+		if (unlikely(!PageSwapCache(page) ||
+			     page_private(page) != entry.val))
+			goto out_page;
+
+		/*
+		 * KSM sometimes has to copy on read faults, for example, if
+		 * page->index of !PageKSM() pages would be nonlinear inside the
+		 * anon VMA -- PageKSM() is lost on actual swapout.
+		 */
+		page = ksm_might_need_to_copy(page, vma, vmf->address);
+		if (unlikely(!page)) {
+			ret = VM_FAULT_OOM;
+			page = swapcache;
+			goto out_page;
+		}
 	}
 
 	cgroup_throttle_swaprate(page, GFP_KERNEL);

From patchwork Mon Jan 31 16:29:34 2022
From: David Hildenbrand <david@redhat.com>
Subject: [PATCH v3 4/9] mm: streamline COW logic in do_swap_page()
Date: Mon, 31 Jan 2022 17:29:34 +0100
Message-Id: <20220131162940.210846-5-david@redhat.com>
In-Reply-To: <20220131162940.210846-1-david@redhat.com>

Currently we have a different COW logic when:
* triggering a read-fault to swapin first and then trigger a write-fault
  -> do_swap_page() + do_wp_page()
* triggering a write-fault to swapin
  -> do_swap_page() + do_wp_page() only if we fail reuse in do_swap_page()

The COW logic in do_swap_page() is different than our reuse logic in
do_wp_page(). The COW logic in do_wp_page() -- page_count() == 1 --
makes currently sure that we certainly don't have a remaining reference,
e.g., via GUP, on the target page we want to reuse: if there is any
unexpected reference, we have to copy to avoid information leaks.

As do_swap_page() behaves differently, in environments with swap enabled
we can currently have an unintended information leak from the parent to
the child, similar as known from CVE-2020-29374:

1. Parent writes to anonymous page
   -> Page is mapped writable and modified
2. Page is swapped out
   -> Page is unmapped and replaced by swap entry
3. fork()
   -> Swap entries are copied to child
4. Child pins page R/O
   -> Page is mapped R/O into child
5. Child unmaps page
   -> Child still holds GUP reference
6. Parent writes to page
   -> Page is reused in do_swap_page()
   -> Child can observe changes

Exchanging 2. and 3. should have the same effect.

Let's apply the same COW logic as in do_wp_page(), conditionally trying
to remove the page from the swapcache after freeing the swap entry,
however, before actually mapping our page. We can change the order now
that we use try_to_free_swap(), which doesn't care about the mapcount,
instead of reuse_swap_page().

To handle references from the LRU pagevecs, conditionally drain the
local LRU pagevecs when required, however, don't consider the
page_count() when deciding whether to drain to keep it simple for now.
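The R/O pin in step 4 can, for example, be taken from user space via
vmsplice(). A rough sketch of steps 1-6 (crude sleeps instead of proper
synchronization; whether the stale content is actually observable
depends on timing and on the page remaining in the swapcache):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int pipefd[2];

	pipe(pipefd);
	memset(p, 'A', 4096);                   /* 1. parent writes */
	madvise(p, 4096, MADV_PAGEOUT);         /* 2. page gets swapped out */

	if (fork() == 0) {                      /* 3. swap entries copied to child */
		struct iovec iov = { .iov_base = p, .iov_len = 4096 };
		char buf[4096];

		vmsplice(pipefd[1], &iov, 1, 0);/* 4. child takes a R/O pin */
		munmap(p, 4096);                /* 5. child unmaps, pin remains */
		sleep(2);                       /* wait for the parent's write */
		read(pipefd[0], buf, 4096);
		printf("child sees '%c'\n", buf[0]); /* 'B' would be the leak */
		_exit(0);
	}

	sleep(1);                               /* let the child pin + unmap */
	*p = 'B';                               /* 6. parent write fault */
	wait(NULL);
	return 0;
}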
Signed-off-by: David Hildenbrand
Acked-by: Vlastimil Babka
---
 mm/memory.c | 55 +++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 12 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 3c91294cca98..c6177d897964 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3497,6 +3497,25 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	return 0;
 }
 
+static inline bool should_try_to_free_swap(struct page *page,
+					   struct vm_area_struct *vma,
+					   unsigned int fault_flags)
+{
+	if (!PageSwapCache(page))
+		return false;
+	if (mem_cgroup_swap_full(page) || (vma->vm_flags & VM_LOCKED) ||
+	    PageMlocked(page))
+		return true;
+	/*
+	 * If we want to map a page that's in the swapcache writable, we
+	 * have to detect via the refcount if we're really the exclusive
+	 * user. Try freeing the swapcache to get rid of the swapcache
+	 * reference only in case it's likely that we'll be the exlusive user.
+	 */
+	return (fault_flags & FAULT_FLAG_WRITE) && !PageKsm(page) &&
+		page_count(page) == 2;
+}
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -3638,6 +3657,16 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			page = swapcache;
 			goto out_page;
 		}
+
+		/*
+		 * If we want to map a page that's in the swapcache writable, we
+		 * have to detect via the refcount if we're really the exclusive
+		 * owner. Try removing the extra reference from the local LRU
+		 * pagevecs if required.
+		 */
+		if ((vmf->flags & FAULT_FLAG_WRITE) && page == swapcache &&
+		    !PageKsm(page) && !PageLRU(page))
+			lru_add_drain();
 	}
 
 	cgroup_throttle_swaprate(page, GFP_KERNEL);
@@ -3656,19 +3685,25 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	}
 
 	/*
-	 * The page isn't present yet, go ahead with the fault.
-	 *
-	 * Be careful about the sequence of operations here.
-	 * To get its accounting right, reuse_swap_page() must be called
-	 * while the page is counted on swap but not yet in mapcount i.e.
-	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
-	 * must be called after the swap_free(), or it will never succeed.
+	 * Remove the swap entry and conditionally try to free up the swapcache.
+	 * We're already holding a reference on the page but haven't mapped it
+	 * yet.
 	 */
+	swap_free(entry);
+	if (should_try_to_free_swap(page, vma, vmf->flags))
+		try_to_free_swap(page);
 	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
 	dec_mm_counter_fast(vma->vm_mm, MM_SWAPENTS);
 	pte = mk_pte(page, vma->vm_page_prot);
-	if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
+
+	/*
+	 * Same logic as in do_wp_page(); however, optimize for fresh pages
+	 * that are certainly not shared because we just allocated them without
+	 * exposing them to the swapcache.
+	 */
+	if ((vmf->flags & FAULT_FLAG_WRITE) && !PageKsm(page) &&
+	    (page != swapcache || page_count(page) == 1)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		vmf->flags &= ~FAULT_FLAG_WRITE;
 		ret |= VM_FAULT_WRITE;
@@ -3694,10 +3729,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
 	arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
 
-	swap_free(entry);
-	if (mem_cgroup_swap_full(page) ||
-	    (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
-		try_to_free_swap(page);
 	unlock_page(page);
 	if (page != swapcache && swapcache) {
 		/*

From patchwork Mon Jan 31 16:29:35 2022
From: David Hildenbrand <david@redhat.com>
Subject: [PATCH v3 5/9] mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()
Date: Mon, 31 Jan 2022 17:29:35 +0100
Message-Id: <20220131162940.210846-6-david@redhat.com>
In-Reply-To: <20220131162940.210846-1-david@redhat.com>

We currently have a different COW logic for anon THP than we have for
ordinary anon pages in do_wp_page(): the effect is that the issue
reported in CVE-2020-29374 is currently still possible for anon THP: an
unintended information leak from the parent to the child.

Let's apply the same logic (page_count() == 1), with similar
optimizations to remove additional references first as we really want to
avoid PTE-mapping the THP and copying individual pages best we can.

If we end up with a page that has page_count() != 1, we'll have to
PTE-map the THP and fall back to do_wp_page(), which will always copy
the page.

Note that KSM does not apply to THP.

I. Interaction with the swapcache and writeback

While a THP is in the swapcache, the swapcache holds one reference on
each subpage of the THP. So with PageSwapCache() set, we expect as many
additional references as we have subpages. If we manage to remove the
THP from the swapcache, all these references will be gone.

Usually, a THP is not split when entered into the swapcache and stays a
compound page. However, try_to_unmap() will PTE-map the THP and use PTE
swap entries. There are no PMD swap entries for that purpose;
consequently, we always only swap in subpages into PTEs.

Removing a page from the swapcache can fail either when there are
remaining swap entries (in which case COW is the right thing to do) or
if the page is currently under writeback.

Having a locked, R/O PMD-mapped THP that is in the swapcache seems to be
possible only in corner cases, for example, if try_to_unmap() failed
after adding the page to the swapcache. However, it's comparatively easy
to handle.

As we have to fully unmap a THP before starting writeback, and swapin is
always done on the PTE level, we shouldn't find a R/O PMD-mapped THP in
the swapcache that is under writeback. This should at least leave
writeback out of the picture.

II. Interaction with GUP references

Having a R/O PMD-mapped THP with GUP references (i.e., R/O references)
will result in PTE-mapping the THP on a write fault. Similar to ordinary
anon pages, do_wp_page() will have to copy sub-pages and result in a
disconnect between the GUP references and the pages actually mapped into
the page tables.
To improve the situation in the future, we'll need additional handling
to mark anonymous pages as definitely exclusive to a single process,
only allow GUP pins on exclusive anon pages, and disallow sharing of
exclusive anon pages with GUP pins, e.g., during fork().

III. Interaction with references from LRU pagevecs

There is no need to try draining the (local) LRU pagevecs in case we
would stumble over a !PageLRU() page: folio_add_lru() and friends will
always flush the affected pagevec after adding a compound page to it
immediately -- pagevec_add_and_need_flush() always returns "true" for
them. Note that the LRU pagevecs will hold a reference on the compound
page for a very short time, between adding the page to the pagevec and
draining it immediately afterwards.

IV. Interaction with speculative/temporary references

Similar to ordinary anon pages, other speculative/temporary references
on the THP, for example, from the pagecache or page migration code, will
disallow exclusive reuse of the page. We'll have to PTE-map the THP.

Signed-off-by: David Hildenbrand
Acked-by: Vlastimil Babka
---
 mm/huge_memory.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 406a3c28c026..f34ebc5cb827 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1303,7 +1303,6 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
 	page = pmd_page(orig_pmd);
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 
-	/* Lock page for reuse_swap_page() */
 	if (!trylock_page(page)) {
 		get_page(page);
 		spin_unlock(vmf->ptl);
@@ -1319,10 +1318,15 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
 	}
 
 	/*
-	 * We can only reuse the page if nobody else maps the huge page or it's
-	 * part.
+	 * See do_wp_page(): we can only map the page writable if there are
+	 * no additional references. Note that we always drain the LRU
+	 * pagevecs immediately after adding a THP.
 	 */
-	if (reuse_swap_page(page)) {
+	if (page_count(page) > 1 + PageSwapCache(page) * thp_nr_pages(page))
+		goto unlock_fallback;
+	if (PageSwapCache(page))
+		try_to_free_swap(page);
+	if (page_count(page) == 1) {
 		pmd_t entry;
 		entry = pmd_mkyoung(orig_pmd);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
@@ -1333,6 +1337,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
 		return VM_FAULT_WRITE;
 	}
 
+unlock_fallback:
 	unlock_page(page);
 	spin_unlock(vmf->ptl);
 fallback:

From patchwork Mon Jan 31 16:29:36 2022
From: David Hildenbrand <david@redhat.com>
Shutemov" , Matthew Wilcox , Vlastimil Babka , Jann Horn , Michal Hocko , Nadav Amit , Rik van Riel , Roman Gushchin , Andrea Arcangeli , Peter Xu , Donald Dutile , Christoph Hellwig , Oleg Nesterov , Jan Kara , Liang Zhang , linux-mm@kvack.org, David Hildenbrand Subject: [PATCH v3 6/9] mm/khugepaged: remove reuse_swap_page() usage Date: Mon, 31 Jan 2022 17:29:36 +0100 Message-Id: <20220131162940.210846-7-david@redhat.com> In-Reply-To: <20220131162940.210846-1-david@redhat.com> References: <20220131162940.210846-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Stat-Signature: 8ojn3mfiaqzsxrnbcdwh3z8nbp7pe3no X-Rspam-User: nil Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=SdKlk7xd; spf=none (imf25.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.129.124) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 40211A0009 X-HE-Tag: 1643646802-33820 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: reuse_swap_page() currently indicates if we can write to an anon page without COW. A COW is required if the page is shared by multiple processes (either already mapped or via swap entries) or if there is concurrent writeback that cannot tolerate concurrent page modifications. However, in the context of khugepaged we're not actually going to write to a read-only mapped page, we'll copy the page content to our newly allocated THP and map that THP writable. All we have to make sure is that the read-only mapped page we're about to copy won't get reused by another process sharing the page, otherwise, page content would get modified. But that is already guaranteed via multiple mechanisms (e.g., holding a reference, holding the page lock, removing the rmap after copying the page). The swapcache handling was introduced in commit 10359213d05a ("mm: incorporate read-only pages into transparent huge pages") and it sounds like it merely wanted to mimic what do_swap_page() would do when trying to map a page obtained via the swapcache writable. As that logic is unnecessary, let's just remove it, removing the last user of reuse_swap_page(). 
Signed-off-by: David Hildenbrand
Acked-by: Vlastimil Babka
---
 include/trace/events/huge_memory.h |  1 -
 mm/khugepaged.c                    | 11 -----------
 2 files changed, 12 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 4fdb14a81108..d651f3437367 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -29,7 +29,6 @@
 	EM( SCAN_VMA_NULL,		"vma_null")			\
 	EM( SCAN_VMA_CHECK,		"vma_check_failed")		\
 	EM( SCAN_ADDRESS_RANGE,		"not_suitable_address_range")	\
-	EM( SCAN_SWAP_CACHE_PAGE,	"page_swap_cache")		\
 	EM( SCAN_DEL_PAGE_LRU,		"could_not_delete_page_from_lru")\
 	EM( SCAN_ALLOC_HUGE_PAGE_FAIL,	"alloc_huge_page_failed")	\
 	EM( SCAN_CGROUP_CHARGE_FAIL,	"ccgroup_charge_failed")	\
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 35f14d0a00a6..9da9325ab4d4 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -45,7 +45,6 @@ enum scan_result {
 	SCAN_VMA_NULL,
 	SCAN_VMA_CHECK,
 	SCAN_ADDRESS_RANGE,
-	SCAN_SWAP_CACHE_PAGE,
 	SCAN_DEL_PAGE_LRU,
 	SCAN_ALLOC_HUGE_PAGE_FAIL,
 	SCAN_CGROUP_CHARGE_FAIL,
@@ -682,16 +681,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			result = SCAN_PAGE_COUNT;
 			goto out;
 		}
-		if (!pte_write(pteval) && PageSwapCache(page) &&
-		    !reuse_swap_page(page)) {
-			/*
-			 * Page is in the swap cache and cannot be re-used.
-			 * It cannot be collapsed into a THP.
-			 */
-			unlock_page(page);
-			result = SCAN_SWAP_CACHE_PAGE;
-			goto out;
-		}
 
 		/*
 		 * Isolate the page to avoid collapsing an hugepage

From patchwork Mon Jan 31 16:29:37 2022
From: David Hildenbrand <david@redhat.com>
Subject: [PATCH v3 7/9] mm/swapfile: remove stale reuse_swap_page()
Date: Mon, 31 Jan 2022 17:29:37 +0100
Message-Id: <20220131162940.210846-8-david@redhat.com>
In-Reply-To: <20220131162940.210846-1-david@redhat.com>

All users are gone, let's remove it. We'll let SWP_STABLE_WRITES stick
around for now, as it might come in handy in the near future.
Signed-off-by: David Hildenbrand
Acked-by: Vlastimil Babka
---
 include/linux/swap.h |   4 --
 mm/swapfile.c        | 104 -------------------------------------------
 2 files changed, 108 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1d38d9475c4d..b546e4bd5c5a 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -514,7 +514,6 @@ extern int __swp_swapcount(swp_entry_t entry);
 extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
 extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
-extern bool reuse_swap_page(struct page *);
 extern int try_to_free_swap(struct page *);
 struct backing_dev_info;
 extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
@@ -680,9 +679,6 @@ static inline int swp_swapcount(swp_entry_t entry)
 	return 0;
 }
 
-#define reuse_swap_page(page) \
-	(page_trans_huge_mapcount(page) == 1)
-
 static inline int try_to_free_swap(struct page *page)
 {
 	return 0;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index bf0df7aa7158..a5183315dc58 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1167,16 +1167,6 @@ static struct swap_info_struct *_swap_info_get(swp_entry_t entry)
 	return NULL;
 }
 
-static struct swap_info_struct *swap_info_get(swp_entry_t entry)
-{
-	struct swap_info_struct *p;
-
-	p = _swap_info_get(entry);
-	if (p)
-		spin_lock(&p->lock);
-	return p;
-}
-
 static struct swap_info_struct *swap_info_get_cont(swp_entry_t entry,
 					struct swap_info_struct *q)
 {
@@ -1601,100 +1591,6 @@ static bool page_swapped(struct page *page)
 	return false;
 }
 
-static int page_trans_huge_map_swapcount(struct page *page,
-					 int *total_swapcount)
-{
-	int i, map_swapcount, _total_swapcount;
-	unsigned long offset = 0;
-	struct swap_info_struct *si;
-	struct swap_cluster_info *ci = NULL;
-	unsigned char *map = NULL;
-	int swapcount = 0;
-
-	/* hugetlbfs shouldn't call it */
-	VM_BUG_ON_PAGE(PageHuge(page), page);
-
-	if (!IS_ENABLED(CONFIG_THP_SWAP) || likely(!PageTransCompound(page))) {
-		if (PageSwapCache(page))
-			swapcount = page_swapcount(page);
-		if (total_swapcount)
-			*total_swapcount = swapcount;
-		return swapcount + page_trans_huge_mapcount(page);
-	}
-
-	page = compound_head(page);
-
-	_total_swapcount = map_swapcount = 0;
-	if (PageSwapCache(page)) {
-		swp_entry_t entry;
-
-		entry.val = page_private(page);
-		si = _swap_info_get(entry);
-		if (si) {
-			map = si->swap_map;
-			offset = swp_offset(entry);
-		}
-	}
-	if (map)
-		ci = lock_cluster(si, offset);
-	for (i = 0; i < HPAGE_PMD_NR; i++) {
-		int mapcount = atomic_read(&page[i]._mapcount) + 1;
-		if (map) {
-			swapcount = swap_count(map[offset + i]);
-			_total_swapcount += swapcount;
-		}
-		map_swapcount = max(map_swapcount, mapcount + swapcount);
-	}
-	unlock_cluster(ci);
-
-	if (PageDoubleMap(page))
-		map_swapcount -= 1;
-
-	if (total_swapcount)
-		*total_swapcount = _total_swapcount;
-
-	return map_swapcount + compound_mapcount(page);
-}
-
-/*
- * We can write to an anon page without COW if there are no other references
- * to it. And as a side-effect, free up its swap: because the old content
- * on disk will never be read, and seeking back there to write new content
- * later would only waste time away from clustering.
- */
-bool reuse_swap_page(struct page *page)
-{
-	int count, total_swapcount;
-
-	VM_BUG_ON_PAGE(!PageLocked(page), page);
-	if (unlikely(PageKsm(page)))
-		return false;
-	count = page_trans_huge_map_swapcount(page, &total_swapcount);
-	if (count == 1 && PageSwapCache(page) &&
-	    (likely(!PageTransCompound(page)) ||
-	     /* The remaining swap count will be freed soon */
-	     total_swapcount == page_swapcount(page))) {
-		if (!PageWriteback(page)) {
-			page = compound_head(page);
-			delete_from_swap_cache(page);
-			SetPageDirty(page);
-		} else {
-			swp_entry_t entry;
-			struct swap_info_struct *p;
-
-			entry.val = page_private(page);
-			p = swap_info_get(entry);
-			if (p->flags & SWP_STABLE_WRITES) {
-				spin_unlock(&p->lock);
-				return false;
-			}
-			spin_unlock(&p->lock);
-		}
-	}
-
-	return count <= 1;
-}
-
 /*
  * If swap is getting full, or if there are no more mappings of this page,
  * then try_to_free_swap is called to free its swap space.

From patchwork Mon Jan 31 16:29:38 2022
From: David Hildenbrand <david@redhat.com>
From patchwork Mon Jan 31 16:29:38 2022
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 12730870
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes, Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz, Mike Rapoport, Yang Shi, "Kirill A . Shutemov", Matthew Wilcox, Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit, Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara, Liang Zhang, linux-mm@kvack.org, David Hildenbrand
Subject: [PATCH v3 8/9] mm/huge_memory: remove stale page_trans_huge_mapcount()
Date: Mon, 31 Jan 2022 17:29:38 +0100
Message-Id: <20220131162940.210846-9-david@redhat.com>
In-Reply-To: <20220131162940.210846-1-david@redhat.com>
References: <20220131162940.210846-1-david@redhat.com>

All users are gone, let's remove it.

Signed-off-by: David Hildenbrand
Acked-by: Vlastimil Babka
---
 include/linux/mm.h |  5 -----
 mm/huge_memory.c   | 48 ----------------------------------------------
 2 files changed, 53 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 213cc569b192..a12291cfe5dd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -820,16 +820,11 @@ static inline int page_mapcount(struct page *page)
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 int total_mapcount(struct page *page);
-int page_trans_huge_mapcount(struct page *page);
 #else
 static inline int total_mapcount(struct page *page)
 {
         return page_mapcount(page);
 }
-static inline int page_trans_huge_mapcount(struct page *page)
-{
-        return page_mapcount(page);
-}
 #endif
 
 static inline struct page *virt_to_head_page(const void *x)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f34ebc5cb827..a6dc5af1a763 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2523,54 +2523,6 @@ int total_mapcount(struct page *page)
         return ret;
 }
 
-/*
- * This calculates accurately how many mappings a transparent hugepage
- * has (unlike page_mapcount() which isn't fully accurate). This full
- * accuracy is primarily needed to know if copy-on-write faults can
- * reuse the page and change the mapping to read-write instead of
- * copying them. At the same time this returns the total_mapcount too.
- *
- * The function returns the highest mapcount any one of the subpages
- * has. If the return value is one, even if different processes are
- * mapping different subpages of the transparent hugepage, they can
- * all reuse it, because each process is reusing a different subpage.
- *
- * The total_mapcount is instead counting all virtual mappings of the
- * subpages. If the total_mapcount is equal to "one", it tells the
- * caller all mappings belong to the same "mm" and in turn the
- * anon_vma of the transparent hugepage can become the vma->anon_vma
- * local one as no other process may be mapping any of the subpages.
- *
- * It would be more accurate to replace page_mapcount() with
- * page_trans_huge_mapcount(), however we only use
- * page_trans_huge_mapcount() in the copy-on-write faults where we
- * need full accuracy to avoid breaking page pinning, because
- * page_trans_huge_mapcount() is slower than page_mapcount().
- */
-int page_trans_huge_mapcount(struct page *page)
-{
-        int i, ret;
-
-        /* hugetlbfs shouldn't call it */
-        VM_BUG_ON_PAGE(PageHuge(page), page);
-
-        if (likely(!PageTransCompound(page)))
-                return atomic_read(&page->_mapcount) + 1;
-
-        page = compound_head(page);
-
-        ret = 0;
-        for (i = 0; i < thp_nr_pages(page); i++) {
-                int mapcount = atomic_read(&page[i]._mapcount) + 1;
-                ret = max(ret, mapcount);
-        }
-
-        if (PageDoubleMap(page))
-                ret -= 1;
-
-        return ret + compound_mapcount(page);
-}
-
 /* Racy check whether the huge page can be split */
 bool can_split_huge_page(struct page *page, int *pextra_pins)
 {
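As a reference for what the helper removed in patch 8/9 used to report, here is a small standalone model of its computation (a sketch, not kernel code; the array contents and names are illustrative):

/*
 * Standalone model (not kernel code) of the value computed by the removed
 * page_trans_huge_mapcount(): the highest per-subpage mapcount, adjusted by
 * one if the THP is "double mapped", plus the compound (PMD) mapcount.
 */
#include <stdbool.h>
#include <stdio.h>

static int thp_mapcount_model(const int *subpage_mapcount, int nr_subpages,
                              bool double_map, int compound_mapcount)
{
        int ret = 0;

        for (int i = 0; i < nr_subpages; i++)
                if (subpage_mapcount[i] > ret)
                        ret = subpage_mapcount[i];
        if (double_map)
                ret -= 1;       /* the PMD mapping is already counted per subpage */
        return ret + compound_mapcount;
}

int main(void)
{
        int pte_only[4]    = { 1, 0, 1, 0 };    /* PTE-mapped subpages only */
        int pmd_and_pte[4] = { 2, 1, 1, 1 };    /* PMD-mapped plus one extra PTE mapping */

        printf("%d\n", thp_mapcount_model(pte_only, 4, false, 0));     /* 1 */
        printf("%d\n", thp_mapcount_model(pmd_and_pte, 4, true, 1));   /* 2 */
        return 0;
}

Taking the maximum over subpages is what distinguished this helper from total_mapcount(), which, as the removed comment notes, counts all virtual mappings of the subpages instead.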
From patchwork Mon Jan 31 16:29:39 2022
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 12730874
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes, Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz, Mike Rapoport, Yang Shi, "Kirill A . Shutemov", Matthew Wilcox, Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit, Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara, Liang Zhang, linux-mm@kvack.org, David Hildenbrand
Subject: [PATCH v3 9/9] mm/huge_memory: remove stale locking logic from __split_huge_pmd()
Date: Mon, 31 Jan 2022 17:29:39 +0100
Message-Id: <20220131162940.210846-10-david@redhat.com>
In-Reply-To: <20220131162940.210846-1-david@redhat.com>
References: <20220131162940.210846-1-david@redhat.com>

Let's remove the stale logic that was required for reuse_swap_page().

Signed-off-by: David Hildenbrand
Acked-by: Vlastimil Babka
---
 mm/huge_memory.c | 32 +-------------------------------
 1 file changed, 1 insertion(+), 31 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a6dc5af1a763..cda88d8ac1bd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2152,8 +2152,6 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 {
         spinlock_t *ptl;
         struct mmu_notifier_range range;
-        bool do_unlock_page = false;
-        pmd_t _pmd;
 
         mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
                                 address & HPAGE_PMD_MASK,
@@ -2172,35 +2170,9 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
                 goto out;
         }
 
-repeat:
         if (pmd_trans_huge(*pmd)) {
-                if (!page) {
+                if (!page)
                         page = pmd_page(*pmd);
-                        /*
-                         * An anonymous page must be locked, to ensure that a
-                         * concurrent reuse_swap_page() sees stable mapcount;
-                         * but reuse_swap_page() is not used on shmem or file,
-                         * and page lock must not be taken when zap_pmd_range()
-                         * calls __split_huge_pmd() while i_mmap_lock is held.
-                         */
-                        if (PageAnon(page)) {
-                                if (unlikely(!trylock_page(page))) {
-                                        get_page(page);
-                                        _pmd = *pmd;
-                                        spin_unlock(ptl);
-                                        lock_page(page);
-                                        spin_lock(ptl);
-                                        if (unlikely(!pmd_same(*pmd, _pmd))) {
-                                                unlock_page(page);
-                                                put_page(page);
-                                                page = NULL;
-                                                goto repeat;
-                                        }
-                                        put_page(page);
-                                }
-                                do_unlock_page = true;
-                        }
-                }
                 if (PageMlocked(page))
                         clear_page_mlock(page);
         } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
@@ -2208,8 +2180,6 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
         __split_huge_pmd_locked(vma, pmd, range.start, freeze);
 out:
         spin_unlock(ptl);
-        if (do_unlock_page)
-                unlock_page(page);
         /*
          * No need to double call mmu_notifier->invalidate_range() callback.
          * They are 3 cases to consider inside __split_huge_pmd_locked():