From patchwork Wed Jan 26 09:55:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 12724857 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17282C2BA4C for ; Wed, 26 Jan 2022 09:57:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 988D86B0074; Wed, 26 Jan 2022 04:57:48 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 90FB06B0075; Wed, 26 Jan 2022 04:57:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7D8016B0078; Wed, 26 Jan 2022 04:57:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0211.hostedemail.com [216.40.44.211]) by kanga.kvack.org (Postfix) with ESMTP id 6503E6B0074 for ; Wed, 26 Jan 2022 04:57:48 -0500 (EST) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 1E95F8DF96 for ; Wed, 26 Jan 2022 09:57:48 +0000 (UTC) X-FDA: 79071986616.07.1917896 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf29.hostedemail.com (Postfix) with ESMTP id 695D5120007 for ; Wed, 26 Jan 2022 09:57:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643191065; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NAAIIFATEqd80W11R9L9MXwM4GCrPUNUZiPXyoAGr5M=; b=jMKqRpRmJE0M0ohH7TnybmXwLZcmtoGC/wCpoIK4wczvUXFsRXTbz4VMUTuuDLM7XzIyAd S2b4ZmR+J/1MhypSmVKL8nLjes0JMciMmc1uj9VYsgqz9cbS1WyUHiv+XHxL7dmNac1mcw C+2D06Ki61gO3jPv+2+hG4wBCsUtFA4= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-672-B2JjIRltNDiDbXZcfa4JDQ-1; Wed, 26 Jan 2022 04:57:42 -0500 X-MC-Unique: B2JjIRltNDiDbXZcfa4JDQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E5DE2100C62A; Wed, 26 Jan 2022 09:57:39 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.194.241]) by smtp.corp.redhat.com (Postfix) with ESMTP id B3BA21F2FB; Wed, 26 Jan 2022 09:56:57 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: Andrew Morton , Hugh Dickins , Linus Torvalds , David Rientjes , Shakeel Butt , John Hubbard , Jason Gunthorpe , Mike Kravetz , Mike Rapoport , Yang Shi , "Kirill A . 
Shutemov" , Matthew Wilcox , Vlastimil Babka , Jann Horn , Michal Hocko , Nadav Amit , Rik van Riel , Roman Gushchin , Andrea Arcangeli , Peter Xu , Donald Dutile , Christoph Hellwig , Oleg Nesterov , Jan Kara , Liang Zhang , linux-mm@kvack.org, David Hildenbrand , Nadav Amit Subject: [PATCH RFC v2 1/9] mm: optimize do_wp_page() for exclusive pages in the swapcache Date: Wed, 26 Jan 2022 10:55:49 +0100 Message-Id: <20220126095557.32392-2-david@redhat.com> In-Reply-To: <20220126095557.32392-1-david@redhat.com> References: <20220126095557.32392-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Rspam-User: nil X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 695D5120007 X-Stat-Signature: xcekitk7t3jqrkd4p9ar3pm1xj3mgqfc Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=jMKqRpRm; dmarc=pass (policy=none) header.from=redhat.com; spf=none (imf29.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=david@redhat.com X-HE-Tag: 1643191066-722577 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Liang Zhang reported [1] that the current COW logic in do_wp_page() is sub-optimal when it comes to swap+read fault+write fault of anonymous pages that have a single user, visible via a performance degradation in the redis benchmark. Something similar was previously reported [2] by Nadav with a simple reproducer. Let's optimize for pages that have been added to the swapcache but only have an exclusive owner. Try removing the swapcache reference if there is hope that we're the exclusive user. We will fail removing the swapcache reference in two scenarios: (1) There are additional swap entries referencing the page: copying instead of reusing is the right thing to do. (2) The page is under writeback: theoretically we might be able to reuse in some cases, however, we cannot remove the additional reference and will have to copy. Further, we might have additional references from the LRU pagevecs, which will force us to copy instead of being able to reuse. We'll try handling such references for some scenarios next. Concurrent writeback cannot be handled easily and we'll always have to copy. While at it, remove the superfluous page_mapcount() check: it's implicitly covered by the page_count() for ordinary anon pages. [1] https://lkml.kernel.org/r/20220113140318.11117-1-zhangliang5@huawei.com [2] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com Reported-by: Liang Zhang Reported-by: Nadav Amit Signed-off-by: David Hildenbrand Reviewed-by: Matthew Wilcox (Oracle) Acked-by: Vlastimil Babka --- mm/memory.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index c125c4969913..bcd3b7c50891 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3291,19 +3291,27 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) if (PageAnon(vmf->page)) { struct page *page = vmf->page; - /* PageKsm() doesn't necessarily raise the page refcount */ - if (PageKsm(page) || page_count(page) != 1) + /* + * We have to verify under page lock: these early checks are + * just an optimization to avoid locking the page and freeing + * the swapcache if there is little hope that we can reuse. + * + * PageKsm() doesn't necessarily raise the page refcount. 
+ */ + if (PageKsm(page) || page_count(page) > 1 + PageSwapCache(page)) goto copy; if (!trylock_page(page)) goto copy; - if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) { + if (PageSwapCache(page)) + try_to_free_swap(page); + if (PageKsm(page) || page_count(page) != 1) { unlock_page(page); goto copy; } /* - * Ok, we've got the only map reference, and the only - * page count reference, and the page is locked, - * it's dark out, and we're wearing sunglasses. Hit it. + * Ok, we've got the only page reference from our mapping + * and the page is locked, it's dark out, and we're wearing + * sunglasses. Hit it. */ unlock_page(page); wp_page_reuse(vmf); From patchwork Wed Jan 26 09:55:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 12724858 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B757C28CF5 for ; Wed, 26 Jan 2022 09:58:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B81F06B0072; Wed, 26 Jan 2022 04:58:27 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B0A976B0075; Wed, 26 Jan 2022 04:58:27 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9AD176B0078; Wed, 26 Jan 2022 04:58:27 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0165.hostedemail.com [216.40.44.165]) by kanga.kvack.org (Postfix) with ESMTP id 877E96B0072 for ; Wed, 26 Jan 2022 04:58:27 -0500 (EST) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 3FB5C95192 for ; Wed, 26 Jan 2022 09:58:27 +0000 (UTC) X-FDA: 79071988254.12.99C80CD Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf26.hostedemail.com (Postfix) with ESMTP id C40EC140005 for ; Wed, 26 Jan 2022 09:58:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643191106; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fIx2OqkV5g2J0YsQllaAKYz2gQUQCnh10CIndxq7uoU=; b=CHzekelLAM/jKHnnLo6LmUOwdeP2b2VU4AatQ50CDhbNLZeIoQ/Nc8iGbBkXj6J3Rb/Srn Tj5WxTxQ8y0l7pgbZWZ5TWl615HRtvFy9gUrUi02w2ERUCs6DRddFZB4PsSQXKbRA3v8G7 9x1lzZ+EOajt9O0aG72ZW+NyBmarIFM= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-422-6fqFOokFNRunatfjIhy5Bw-1; Wed, 26 Jan 2022 04:58:22 -0500 X-MC-Unique: 6fqFOokFNRunatfjIhy5Bw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id D6E481091DA5; Wed, 26 Jan 2022 09:58:19 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.194.241]) by smtp.corp.redhat.com (Postfix) with ESMTP id 684AE1F2E8; Wed, 26 Jan 2022 09:57:40 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: Andrew Morton , Hugh Dickins , Linus 
Torvalds , David Rientjes , Shakeel Butt , John Hubbard , Jason Gunthorpe , Mike Kravetz , Mike Rapoport , Yang Shi , "Kirill A . Shutemov" , Matthew Wilcox , Vlastimil Babka , Jann Horn , Michal Hocko , Nadav Amit , Rik van Riel , Roman Gushchin , Andrea Arcangeli , Peter Xu , Donald Dutile , Christoph Hellwig , Oleg Nesterov , Jan Kara , Liang Zhang , linux-mm@kvack.org, David Hildenbrand Subject: [PATCH RFC v2 2/9] mm: optimize do_wp_page() for fresh pages in local LRU pagevecs Date: Wed, 26 Jan 2022 10:55:50 +0100 Message-Id: <20220126095557.32392-3-david@redhat.com> In-Reply-To: <20220126095557.32392-1-david@redhat.com> References: <20220126095557.32392-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Stat-Signature: 6sfbhx6x14188pr1ejp9hhcpnm3yqnc8 X-Rspam-User: nil Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=CHzekelL; spf=none (imf26.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: C40EC140005 X-HE-Tag: 1643191106-456038 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: For example, if a page just got swapped in via a read fault, the LRU pagevecs might still hold a reference to the page. If we trigger a write fault on such a page, the additional reference from the LRU pagevecs will prohibit reusing the page. Let's conditionally drain the local LRU pagevecs when we stumble over a !PageLRU() page. We cannot easily drain remote LRU pagevecs and it might not be desirable performance-wise. Consequently, this will only avoid copying in some cases. We cannot easily handle the following cases and we will always have to copy: (1) The page is referenced in the LRU pagevecs of other CPUs. We really would have to drain the LRU pagevecs of all CPUs -- most probably copying is much cheaper. (2) The page is already PageLRU() but is getting moved between LRU lists, for example, for activation (e.g., mark_page_accessed()), deactivation (MADV_COLD), or lazyfree (MADV_FREE). We'd have to drain mostly unconditionally, which might be bad performance-wise. Most probably this won't happen too often in practice. Note that there are other reasons why an anon page might temporarily not be PageLRU(): for example, compaction and migration have to isolate LRU pages from the LRU lists first (isolate_lru_page()), moving them to temporary local lists and clearing PageLRU() and holding an additional reference on the page. In that case, we'll always copy. This change seems to be fairly effective with the reproducer [1] shared by Nadav, as long as writeback is done synchronously, for example, using zram. However, with asynchronous writeback, we'll usually fail to free the swapcache because the page is still under writeback: something we cannot easily optimize for, and maybe it's not really relevant in practice. [1] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com Signed-off-by: David Hildenbrand --- mm/memory.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/mm/memory.c b/mm/memory.c index bcd3b7c50891..61d67ceef734 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3298,7 +3298,17 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) * * PageKsm() doesn't necessarily raise the page refcount. 
*/ - if (PageKsm(page) || page_count(page) > 1 + PageSwapCache(page)) + if (PageKsm(page)) + goto copy; + if (page_count(page) > 1 + PageSwapCache(page) + !PageLRU(page)) + goto copy; + if (!PageLRU(page)) + /* + * Note: We cannot easily detect+handle references from + * remote LRU pagevecs or references to PageLRU() pages. + */ + lru_add_drain(); + if (page_count(page) > 1 + PageSwapCache(page)) goto copy; if (!trylock_page(page)) goto copy; From patchwork Wed Jan 26 09:55:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 12724879 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B8823C28CF5 for ; Wed, 26 Jan 2022 09:59:10 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 566AE6B0074; Wed, 26 Jan 2022 04:59:10 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 4EF706B0075; Wed, 26 Jan 2022 04:59:10 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3B7DA6B0078; Wed, 26 Jan 2022 04:59:10 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0059.hostedemail.com [216.40.44.59]) by kanga.kvack.org (Postfix) with ESMTP id 2C9B06B0074 for ; Wed, 26 Jan 2022 04:59:10 -0500 (EST) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id ECC5B9528C for ; Wed, 26 Jan 2022 09:59:09 +0000 (UTC) X-FDA: 79071990018.24.05BCC90 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf06.hostedemail.com (Postfix) with ESMTP id 8C9FF180005 for ; Wed, 26 Jan 2022 09:59:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643191149; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rwvlrvac61lF3gQ6F6uZri0xKYAPflsQBOJgJdyNeH0=; b=UTXMtdGSs/8bh4nKI94w0xAW1AB8363Ein9qtOysvfyfQEeAeCB+KsAeHO7hpMfqAqyGAX gfKBHQ4TL8ZlYksLbhSJHJ8vCxkFQxkGscUWjU4Z4il7ZsIl9MgeBE6uMeZiWYZqEhWCEY 2YHNWK/LYkbXDLUofk0yk9HjigPGEw4= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-416-_NkElsNjPvOI0gEAXXjxxg-1; Wed, 26 Jan 2022 04:59:05 -0500 X-MC-Unique: _NkElsNjPvOI0gEAXXjxxg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 2F163100C663; Wed, 26 Jan 2022 09:59:03 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.194.241]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3E2211F2FD; Wed, 26 Jan 2022 09:58:20 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: Andrew Morton , Hugh Dickins , Linus Torvalds , David Rientjes , Shakeel Butt , John Hubbard , Jason Gunthorpe , Mike Kravetz , Mike Rapoport , Yang Shi , "Kirill A . 
Shutemov" , Matthew Wilcox , Vlastimil Babka , Jann Horn , Michal Hocko , Nadav Amit , Rik van Riel , Roman Gushchin , Andrea Arcangeli , Peter Xu , Donald Dutile , Christoph Hellwig , Oleg Nesterov , Jan Kara , Liang Zhang , linux-mm@kvack.org, David Hildenbrand Subject: [PATCH RFC v2 3/9] mm: slightly clarify KSM logic in do_swap_page() Date: Wed, 26 Jan 2022 10:55:51 +0100 Message-Id: <20220126095557.32392-4-david@redhat.com> In-Reply-To: <20220126095557.32392-1-david@redhat.com> References: <20220126095557.32392-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=UTXMtdGS; spf=none (imf06.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Rspam-User: nil X-Rspamd-Queue-Id: 8C9FF180005 X-Stat-Signature: ho91m17b5y3a1fiat1mjwhd7c9futiwn X-Rspamd-Server: rspam12 X-HE-Tag: 1643191149-332875 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Let's make it clearer that KSM might only have to copy a page in case we have a page in the swapcache, not if we allocated a fresh page and bypassed the swapcache. While at it, add a comment why this is usually necessary and merge the two swapcache conditions. Signed-off-by: David Hildenbrand --- mm/memory.c | 38 +++++++++++++++++++++++--------------- 1 file changed, 23 insertions(+), 15 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 61d67ceef734..ab3153252cfe 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3617,21 +3617,29 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) goto out_release; } - /* - * Make sure try_to_free_swap or reuse_swap_page or swapoff did not - * release the swapcache from under us. The page pin, and pte_same - * test below, are not enough to exclude that. Even if it is still - * swapcache, we need to check that the page's swap has not changed. - */ - if (unlikely((!PageSwapCache(page) || - page_private(page) != entry.val)) && swapcache) - goto out_page; - - page = ksm_might_need_to_copy(page, vma, vmf->address); - if (unlikely(!page)) { - ret = VM_FAULT_OOM; - page = swapcache; - goto out_page; + if (swapcache) { + /* + * Make sure try_to_free_swap or reuse_swap_page or swapoff did + * not release the swapcache from under us. The page pin, and + * pte_same test below, are not enough to exclude that. Even if + * it is still swapcache, we need to check that the page's swap + * has not changed. + */ + if (unlikely(!PageSwapCache(page) || + page_private(page) != entry.val)) + goto out_page; + + /* + * KSM sometimes has to copy on read faults, for example, if + * page->index of !PageKSM() pages would be nonlinear inside the + * anon VMA -- PageKSM() is lost on actual swapout. 
+ */ + page = ksm_might_need_to_copy(page, vma, vmf->address); + if (unlikely(!page)) { + ret = VM_FAULT_OOM; + page = swapcache; + goto out_page; + } } cgroup_throttle_swaprate(page, GFP_KERNEL); From patchwork Wed Jan 26 09:55:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 12724880 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6F731C2BA4C for ; Wed, 26 Jan 2022 10:00:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E68DF6B0072; Wed, 26 Jan 2022 05:00:08 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E16666B0075; Wed, 26 Jan 2022 05:00:08 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CDE0F6B0078; Wed, 26 Jan 2022 05:00:08 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0247.hostedemail.com [216.40.44.247]) by kanga.kvack.org (Postfix) with ESMTP id BD3326B0072 for ; Wed, 26 Jan 2022 05:00:08 -0500 (EST) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 7FF5B8E3E1 for ; Wed, 26 Jan 2022 10:00:08 +0000 (UTC) X-FDA: 79071992496.10.50232F2 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf25.hostedemail.com (Postfix) with ESMTP id 2FA53A0014 for ; Wed, 26 Jan 2022 10:00:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643191207; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8lZCUoGF7vMZxd2uBv8vuAjoIcLxZX5AOgXwi6hd1oU=; b=hmaCM0ds68UaddZt6jLEks8LLfDWcfVmdoewQ6e90gCf63TIKmK7MtrQpFFyaQGGcIrsot 2vWjLQc9JcKT9jXjXvGGbX3c0pXvH8Qtzx9TShcjafLybjCxNZiQ1eDqljIxIZJRIT23vF mouO4i0peB6QKaZoyaqKlt9jmnYyH2Y= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-49-LrhBAHJkN12DwCXYCzWIZA-1; Wed, 26 Jan 2022 05:00:03 -0500 X-MC-Unique: LrhBAHJkN12DwCXYCzWIZA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B88B286A061; Wed, 26 Jan 2022 10:00:00 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.194.241]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8E0E31F2E8; Wed, 26 Jan 2022 09:59:03 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: Andrew Morton , Hugh Dickins , Linus Torvalds , David Rientjes , Shakeel Butt , John Hubbard , Jason Gunthorpe , Mike Kravetz , Mike Rapoport , Yang Shi , "Kirill A . 
Shutemov" , Matthew Wilcox , Vlastimil Babka , Jann Horn , Michal Hocko , Nadav Amit , Rik van Riel , Roman Gushchin , Andrea Arcangeli , Peter Xu , Donald Dutile , Christoph Hellwig , Oleg Nesterov , Jan Kara , Liang Zhang , linux-mm@kvack.org, David Hildenbrand Subject: [PATCH RFC v2 4/9] mm: streamline COW logic in do_swap_page() Date: Wed, 26 Jan 2022 10:55:52 +0100 Message-Id: <20220126095557.32392-5-david@redhat.com> In-Reply-To: <20220126095557.32392-1-david@redhat.com> References: <20220126095557.32392-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 2FA53A0014 X-Rspam-User: nil Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=hmaCM0ds; dmarc=pass (policy=none) header.from=redhat.com; spf=none (imf25.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.129.124) smtp.mailfrom=david@redhat.com X-Stat-Signature: 8f58axqakau6efu1eohnnukwh1curai5 X-HE-Tag: 1643191208-41405 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently we have a different COW logic when: * triggering a read-fault to swapin first and then trigger a write-fault -> do_swap_page() + do_wp_page() * triggering a write-fault to swapin -> do_swap_page() + do_wp_page() only if we fail reuse in do_swap_page() The COW logic in do_swap_page() is different than our reuse logic in do_wp_page(). The COW logic in do_wp_page() -- page_count() == 1 -- makes currently sure that we certainly don't have a remaining reference, e.g., via GUP, on the target page we want to reuse: if there is any unexpected reference, we have to copy to avoid information leaks. As do_swap_page() behaves differently, in environments with swap enabled we can currently have an unintended information leak from the parent to the child, similar as known from CVE-2020-29374: 1. Parent writes to anonymous page -> Page is mapped writable and modified 2. Page is swapped out -> Page is unmapped and replaced by swap entry 3. fork() -> Swap entries are copied to child 4. Child pins page R/O -> Page is mapped R/O into child 5. Child unmaps page -> Child still holds GUP reference 6. Parent writes to page -> Page is reused in do_swap_page() -> Child can observe changes Exchanging 2. and 3. should have the same effect. Let's apply the same COW logic as in do_wp_page(), conditionally trying to remove the page from the swapcache after freeing the swap entry, however, before actually mapping our page. We can change the order now that we use try_to_free_swap(), which doesn't care about the mapcount, instead of reuse_swap_page(). To handle references from the LRU pagevecs, conditionally drain the local LRU pagevecs when required, however, don't consider the page_count() when deciding whether to drain to keep it simple for now. 
Signed-off-by: David Hildenbrand --- mm/memory.c | 55 +++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 43 insertions(+), 12 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index ab3153252cfe..ba23d13b8410 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3499,6 +3499,25 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf) return 0; } +static inline bool should_try_to_free_swap(struct page *page, + struct vm_area_struct *vma, + unsigned int fault_flags) +{ + if (!PageSwapCache(page)) + return false; + if (mem_cgroup_swap_full(page) || (vma->vm_flags & VM_LOCKED) || + PageMlocked(page)) + return true; + /* + * If we want to map a page that's in the swapcache writable, we + * have to detect via the refcount if we're really the exclusive + * owner. Try freeing the swapcache to get rid of the swapcache + * reference in case it's likely that we will succeed. + */ + return (fault_flags & FAULT_FLAG_WRITE) && !PageKsm(page) && + page_count(page) == 2; +} + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -3640,6 +3659,16 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) page = swapcache; goto out_page; } + + /* + * If we want to map a page that's in the swapcache writable, we + * have to detect via the refcount if we're really the exclusive + * owner. Try removing the extra reference from the local LRU + * pagevecs if required. + */ + if ((vmf->flags & FAULT_FLAG_WRITE) && page == swapcache && + !PageKsm(page) && !PageLRU(page)) + lru_add_drain(); } cgroup_throttle_swaprate(page, GFP_KERNEL); @@ -3658,19 +3687,25 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) } /* - * The page isn't present yet, go ahead with the fault. - * - * Be careful about the sequence of operations here. - * To get its accounting right, reuse_swap_page() must be called - * while the page is counted on swap but not yet in mapcount i.e. - * before page_add_anon_rmap() and swap_free(); try_to_free_swap() - * must be called after the swap_free(), or it will never succeed. + * Remove the swap entry and conditionally try to free up the swapcache. + * We're already holding a reference on the page but haven't mapped it + * yet. */ + swap_free(entry); + if (should_try_to_free_swap(page, vma, vmf->flags)) + try_to_free_swap(page); inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES); dec_mm_counter_fast(vma->vm_mm, MM_SWAPENTS); pte = mk_pte(page, vma->vm_page_prot); - if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) { + + /* + * Same logic as in do_wp_page(); however, optimize for fresh pages + * that are certainly not shared because we just allocated them without + * exposing them to the swapcache. 
+ */ + if ((vmf->flags & FAULT_FLAG_WRITE) && !PageKsm(page) && + (page != swapcache || page_count(page) == 1)) { pte = maybe_mkwrite(pte_mkdirty(pte), vma); vmf->flags &= ~FAULT_FLAG_WRITE; ret |= VM_FAULT_WRITE; @@ -3696,10 +3731,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte); arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte); - swap_free(entry); - if (mem_cgroup_swap_full(page) || - (vma->vm_flags & VM_LOCKED) || PageMlocked(page)) - try_to_free_swap(page); unlock_page(page); if (page != swapcache && swapcache) { /* From patchwork Wed Jan 26 09:55:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 12724881 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 05842C28CF5 for ; Wed, 26 Jan 2022 10:00:54 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 910306B0074; Wed, 26 Jan 2022 05:00:53 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 8BEF96B0078; Wed, 26 Jan 2022 05:00:53 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 785FA6B007B; Wed, 26 Jan 2022 05:00:53 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0191.hostedemail.com [216.40.44.191]) by kanga.kvack.org (Postfix) with ESMTP id 67C446B0074 for ; Wed, 26 Jan 2022 05:00:53 -0500 (EST) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 2348080C7BC5 for ; Wed, 26 Jan 2022 10:00:53 +0000 (UTC) X-FDA: 79071994386.20.552FA3E Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf06.hostedemail.com (Postfix) with ESMTP id 2D339180023 for ; Wed, 26 Jan 2022 10:00:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643191251; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ZADr9hVxfhKENIr/VRsnRFwtDwaPlig2N4uKOY4hLWc=; b=KU8TfMlq2f/VN3GuCxrm9gQhQ6bj222Ez6PMo9TsAEWo0IQU4j5i1H4h+pXbaTAgOuJWXV samtPWAZ6p2vEsnYGBvH5r7GXNrZpb+TpevCmm2S9lWCpoUNhGm0nAx7Kxw4JZYy3kN1ru Jfj66ZNeXuzEhB/xIdnHFvegauitm9Q= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-589-doMA_xzJMmKguE09hrMqCQ-1; Wed, 26 Jan 2022 05:00:45 -0500 X-MC-Unique: doMA_xzJMmKguE09hrMqCQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 9B87486A060; Wed, 26 Jan 2022 10:00:42 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.194.241]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5324D1F2E8; Wed, 26 Jan 2022 10:00:01 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: Andrew Morton , Hugh Dickins , Linus Torvalds , David Rientjes , Shakeel Butt , John Hubbard , Jason Gunthorpe , 
Mike Kravetz , Mike Rapoport , Yang Shi , "Kirill A . Shutemov" , Matthew Wilcox , Vlastimil Babka , Jann Horn , Michal Hocko , Nadav Amit , Rik van Riel , Roman Gushchin , Andrea Arcangeli , Peter Xu , Donald Dutile , Christoph Hellwig , Oleg Nesterov , Jan Kara , Liang Zhang , linux-mm@kvack.org, David Hildenbrand Subject: [PATCH RFC v2 5/9] mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page() Date: Wed, 26 Jan 2022 10:55:53 +0100 Message-Id: <20220126095557.32392-6-david@redhat.com> In-Reply-To: <20220126095557.32392-1-david@redhat.com> References: <20220126095557.32392-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Rspam-User: nil X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 2D339180023 X-Stat-Signature: iqendy89cypx1qunnomz414zqu1swgh6 Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=KU8TfMlq; dmarc=pass (policy=none) header.from=redhat.com; spf=none (imf06.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.129.124) smtp.mailfrom=david@redhat.com X-HE-Tag: 1643191251-998679 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: We currently have a different COW logic for anon THP than we have for ordinary anon pages in do_wp_page(): the effect is that the issue reported in CVE-2020-29374 is currently still possible for anon THP: an unintended information leak from the parent to the child. Let's apply the same logic (page_count() == 1), with similar optimizations to remove additional references first as we really want to avoid PTE-mapping the THP and copying individual pages best we can. If we end up with a page that has page_count() != 1, we'll have to PTE-map the THP and fallback to do_wp_page(), which will always copy the page. Note that KSM does not apply to THP. I. Interaction with the swapcache and writeback While a THP is in the swapcache, the swapcache holds one reference on each subpage of the THP. So with PageSwapCache() set, we expect as many additional references as we have subpages. If we manage to remove the THP from the swapcache, all these references will be gone. Usually, a THP is not split when entered into the swapcache and stays a compound page. However, try_to_unmap() will PTE-map the THP and use PTE swap entries. There are no PMD swap entries for that purpose, consequently, we always only swapin subpages into PTEs. Removing a page from the swapcache can fail either when there are remaining swap entries (in which case COW is the right thing to do) or if the page is currently under writeback. Having a locked, R/O PMD-mapped THP that is in the swapcache seems to be possible only in corner cases, for example, if try_to_unmap() failed after adding the page to the swapcache. However, it's comparatively easy to handle. As we have to fully unmap a THP before starting writeback, and swapin is always done on the PTE level, we shouldn't find a R/O PMD-mapped THP in the swapcache that is under writeback. This should at least leave writeback out of the picture. II. Interaction with GUP references Having a R/O PMD-mapped THP with GUP references (i.e., R/O references) will result in PTE-mapping the THP on a write fault. Similar to ordinary anon pages, do_wp_page() will have to copy sub-pages and result in a disconnect between the GUP references and the pages actually mapped into the page tables. 
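As a purely illustrative calculation of how many references the check tolerates for a THP (the numbers below are hypothetical; in the kernel the subpage count comes from thp_nr_pages() and the comparison is against page_count()):

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	int nr_subpages = 512;		/* e.g., a 2 MiB THP with 4 KiB base pages */
	bool in_swapcache = true;	/* PageSwapCache() */
	bool on_lru = false;		/* still sitting in a local LRU pagevec */

	/*
	 * One reference from our PMD mapping, one per subpage from the
	 * swapcache, and possibly one from a local LRU pagevec.
	 */
	int swapcache_refs = in_swapcache ? nr_subpages : 0;
	int tolerated = 1 + swapcache_refs + !on_lru;

	printf("more than %d references means someone else holds one -> PTE-map and copy\n",
	       tolerated);
	return 0;
}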
To improve the situation in the future, we'll need additional handling to mark anonymous pages as definitely exclusive to a single process, only allow GUP pins on exclusive anon pages, and disallow sharing of exclusive anon pages with GUP pins e.g., during fork(). III. Interaction with references from LRU pagevecs Similar to ordinary anon pages, we can have LRU pagevecs referencing our THP. Reliably removing such references requires draining LRU pagevecs on all CPUs -- lru_add_drain_all() -- a possibly expensive operation that can sleep. For now, similar do do_wp_page(), let's conditionally drain the local LRU pagevecs only if we detect !PageLRU(). IV. Interaction with speculative/temporary references Similar to ordinary anon pages, other speculative/temporary references on the THP, for example, from the pagecache or page migration code, will disallow exclusive reuse of the page. We'll have to PTE-map the THP. Signed-off-by: David Hildenbrand --- mm/huge_memory.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 406a3c28c026..b6ba88a98266 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1286,6 +1286,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf) struct page *page; unsigned long haddr = vmf->address & HPAGE_PMD_MASK; pmd_t orig_pmd = vmf->orig_pmd; + int swapcache_refs = 0; vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd); VM_BUG_ON_VMA(!vma->anon_vma, vma); @@ -1303,7 +1304,6 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf) page = pmd_page(orig_pmd); VM_BUG_ON_PAGE(!PageHead(page), page); - /* Lock page for reuse_swap_page() */ if (!trylock_page(page)) { get_page(page); spin_unlock(vmf->ptl); @@ -1319,10 +1319,20 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf) } /* - * We can only reuse the page if nobody else maps the huge page or it's - * part. + * See do_wp_page(): we can only map the page writable if there are + * no additional references. 
*/ - if (reuse_swap_page(page)) { + if (PageSwapCache(page)) + swapcache_refs = thp_nr_pages(page); + if (page_count(page) > 1 + swapcache_refs + !PageLRU(page)) + goto unlock_fallback; + if (!PageLRU(page)) + lru_add_drain(); + if (page_count(page) > 1 + swapcache_refs) + goto unlock_fallback; + if (swapcache_refs) + try_to_free_swap(page); + if (page_count(page) == 1) { pmd_t entry; entry = pmd_mkyoung(orig_pmd); entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); @@ -1333,6 +1343,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf) return VM_FAULT_WRITE; } +unlock_fallback: unlock_page(page); spin_unlock(vmf->ptl); fallback: From patchwork Wed Jan 26 09:55:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 12724882 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A5A7EC28CF5 for ; Wed, 26 Jan 2022 10:00:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 406EA6B0078; Wed, 26 Jan 2022 05:00:58 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 3B7966B007B; Wed, 26 Jan 2022 05:00:58 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 27E266B007D; Wed, 26 Jan 2022 05:00:58 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0176.hostedemail.com [216.40.44.176]) by kanga.kvack.org (Postfix) with ESMTP id 18EEF6B0078 for ; Wed, 26 Jan 2022 05:00:58 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id D1DBB95AF0 for ; Wed, 26 Jan 2022 10:00:57 +0000 (UTC) X-FDA: 79071994554.13.0E5ECFE Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf08.hostedemail.com (Postfix) with ESMTP id 8A645160006 for ; Wed, 26 Jan 2022 10:00:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643191256; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uz7WrEefhkRzVFGmcsfP6Zp/Ap9shXYEfGhZft5IAs0=; b=YqvlLzUfQHsqzFehari2wAbKLXh9PLhDQPwTNpJ6ZQTdMhYcVPBUiAy4ek+qYzE33mdBZJ PjjeXqtAhUjcp9Lsb0TT57ICNWxf7O0dg1wQhlERCx6LtZKzNvrFY4daqSVKk2hnhw38Wm DtaEoJf1OoCyuTvBKkvSP8WC7K9WHTU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-455-Z5P5QTJXMmC-UmZSLEVHQg-1; Wed, 26 Jan 2022 05:00:52 -0500 X-MC-Unique: Z5P5QTJXMmC-UmZSLEVHQg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 914831006AA6; Wed, 26 Jan 2022 10:00:49 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.194.241]) by smtp.corp.redhat.com (Postfix) with ESMTP id 199521F2F3; Wed, 26 Jan 2022 10:00:42 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: Andrew Morton , Hugh Dickins , Linus Torvalds , David Rientjes , Shakeel 
Butt , John Hubbard , Jason Gunthorpe , Mike Kravetz , Mike Rapoport , Yang Shi , "Kirill A . Shutemov" , Matthew Wilcox , Vlastimil Babka , Jann Horn , Michal Hocko , Nadav Amit , Rik van Riel , Roman Gushchin , Andrea Arcangeli , Peter Xu , Donald Dutile , Christoph Hellwig , Oleg Nesterov , Jan Kara , Liang Zhang , linux-mm@kvack.org, David Hildenbrand Subject: [PATCH RFC v2 6/9] mm/khugepaged: remove reuse_swap_page() usage Date: Wed, 26 Jan 2022 10:55:54 +0100 Message-Id: <20220126095557.32392-7-david@redhat.com> In-Reply-To: <20220126095557.32392-1-david@redhat.com> References: <20220126095557.32392-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 8A645160006 X-Rspam-User: nil Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=YqvlLzUf; dmarc=pass (policy=none) header.from=redhat.com; spf=none (imf08.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.129.124) smtp.mailfrom=david@redhat.com X-Stat-Signature: seappyktxsadrcypbfygx6ms8b9i6c8r X-HE-Tag: 1643191256-406546 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: reuse_swap_page() currently indicates if we can write to an anon page without COW. A COW is required if the page is shared by multiple processes (either already mapped or via swap entries) or if there is concurrent writeback that cannot tolerate concurrent page modifications. reuse_swap_page() doesn't check for pending references from other processes that already unmapped the page, however, is_refcount_suitable() essentially does the same thing in the context of khugepaged. khugepaged is the last remaining user of reuse_swap_page() and we want to remove that function. In the context of khugepaged, we are not actually going to write to the page and we don't really care about other processes mapping the page: for example, without swap, we don't care about shared pages at all. The current logic seems to be: * Writable: -> Not shared, but might be in the swapcache. Nobody can fault it in from the swapcache as there are no other swap entries. * Readable and not in the swapcache: Might be shared (but nobody can fault it in from the swapcache). * Readable and in the swapcache: Might be shared and someone might be able to fault it in from the swapcache. Make sure we're the exclusive owner via reuse_swap_page(). Having to guess due to lack of comments and documentation, the current logic really only wants to make sure that a page that might be shared cannot be faulted in from the swapcache while khugepaged is active. It's hard to guess why that is that case and if it's really still required, but let's try keeping that logic unmodified. Instead of relying on reuse_swap_page(), let's unconditionally try_to_free_swap(), special casing PageKsm(). try_to_free_swap() will fail if there are still swap entries targeting the page or if the page is under writeback. After a successful try_to_free_swap() that page cannot be readded to the swapcache because we're keeping the page locked and removed from the LRU until we actually perform the copy. So once we succeeded removing a page from the swapcache, it cannot be re-added until we're done copying. Add a comment stating that. 
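A minimal userspace sketch of the replacement check (struct fake_page and the freeable flag are invented stand-ins; in __collapse_huge_page_isolate() the real condition is !pte_write(pteval) && PageSwapCache(page) && (PageKsm(page) || !try_to_free_swap(page))):

#include <stdbool.h>
#include <stdio.h>

/* Invented stand-in for the page state khugepaged inspects. */
struct fake_page {
	bool pte_writable;	/* pte_write(pteval) */
	bool in_swapcache;	/* PageSwapCache() */
	bool ksm;		/* PageKsm() */
	bool freeable;		/* would try_to_free_swap() succeed, i.e. no
				   remaining swap entries and no writeback? */
};

/*
 * Collapse is refused only for a R/O swapcache page whose swapcache
 * reference cannot be dropped (or which is KSM).
 */
static bool refuse_collapse(const struct fake_page *p)
{
	return !p->pte_writable && p->in_swapcache &&
	       (p->ksm || !p->freeable);
}

int main(void)
{
	struct fake_page shared = { .in_swapcache = true, .freeable = false };
	struct fake_page ours = { .in_swapcache = true, .freeable = true };

	printf("page with remaining swap entries: %s\n",
	       refuse_collapse(&shared) ? "SCAN_SWAP_CACHE_PAGE" : "keep isolating");
	printf("exclusively owned page: %s\n",
	       refuse_collapse(&ours) ? "SCAN_SWAP_CACHE_PAGE" : "keep isolating");
	return 0;
}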
Signed-off-by: David Hildenbrand --- mm/khugepaged.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 35f14d0a00a6..bc0ff598e98f 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -683,10 +683,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, goto out; } if (!pte_write(pteval) && PageSwapCache(page) && - !reuse_swap_page(page)) { + (PageKsm(page) || !try_to_free_swap(page))) { /* - * Page is in the swap cache and cannot be re-used. - * It cannot be collapsed into a THP. + * Possibly shared page cannot be removed from the + * swapache. It cannot be collapsed into a THP. */ unlock_page(page); result = SCAN_SWAP_CACHE_PAGE; @@ -702,6 +702,16 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, result = SCAN_DEL_PAGE_LRU; goto out; } + + /* + * We're holding the page lock and removed the page from the + * LRU. Once done copying, we'll unlock and readd to the + * LRU via release_pte_page(). If the page is still in the + * swapcache, we're the exclusive owner. Due to the page lock + * the page cannot be added to the swapcache until we're done + * and consequently it cannot be faulted in from the swapcache + * into another process. + */ mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_is_file_lru(page), compound_nr(page)); From patchwork Wed Jan 26 09:55:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 12724883 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1802FC2BA4C for ; Wed, 26 Jan 2022 10:01:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 983876B007B; Wed, 26 Jan 2022 05:01:10 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 9333D6B007D; Wed, 26 Jan 2022 05:01:10 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7FB9F6B0080; Wed, 26 Jan 2022 05:01:10 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay040.a.hostedemail.com [64.99.140.40]) by kanga.kvack.org (Postfix) with ESMTP id 6F56A6B007B for ; Wed, 26 Jan 2022 05:01:10 -0500 (EST) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id E0BF4102D for ; Wed, 26 Jan 2022 10:01:09 +0000 (UTC) X-FDA: 79071995058.03.D6AA34B Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf24.hostedemail.com (Postfix) with ESMTP id 6A294180009 for ; Wed, 26 Jan 2022 10:01:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643191268; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KaiTUjRYTOXVgHeAUn6VE+wj9NpgU3CxyxXikV/W8rE=; b=A197hYgCWCDTB0SSiQt5tVQVU9AhYuRktoL2M5VW7i5N7HH/89L1V+3b8CKvoTx91/+eNt LrqdWSTv3VVOhRJ+koyeIDpY27U2b0D20aL2OC7hfP8x0LvQIF6DoNUB+zxtgw/UzXvdxV Q2CDYfZCjtDzXsF0jkRG2dM7x9GsBYA= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, 
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-214-uqqGwpPKPKy7Aa-6tyVRdw-1; Wed, 26 Jan 2022 05:00:59 -0500 X-MC-Unique: uqqGwpPKPKy7Aa-6tyVRdw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id AB08D51090; Wed, 26 Jan 2022 10:00:56 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.194.241]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2B4391F2F8; Wed, 26 Jan 2022 10:00:49 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: Andrew Morton , Hugh Dickins , Linus Torvalds , David Rientjes , Shakeel Butt , John Hubbard , Jason Gunthorpe , Mike Kravetz , Mike Rapoport , Yang Shi , "Kirill A . Shutemov" , Matthew Wilcox , Vlastimil Babka , Jann Horn , Michal Hocko , Nadav Amit , Rik van Riel , Roman Gushchin , Andrea Arcangeli , Peter Xu , Donald Dutile , Christoph Hellwig , Oleg Nesterov , Jan Kara , Liang Zhang , linux-mm@kvack.org, David Hildenbrand Subject: [PATCH RFC v2 7/9] mm/swapfile: remove reuse_swap_page() Date: Wed, 26 Jan 2022 10:55:55 +0100 Message-Id: <20220126095557.32392-8-david@redhat.com> In-Reply-To: <20220126095557.32392-1-david@redhat.com> References: <20220126095557.32392-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Rspamd-Queue-Id: 6A294180009 X-Rspam-User: nil Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=A197hYgC; dmarc=pass (policy=none) header.from=redhat.com; spf=none (imf24.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=david@redhat.com X-Stat-Signature: 5m3xym6ttd9kq6iuntkrzze9mpyz3b3r X-Rspamd-Server: rspam08 X-HE-Tag: 1643191269-215744 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: All users are gone, let's remove it. We'll let SWP_STABLE_WRITES stick around for now, as it might come in handy in the near future. 
Signed-off-by: David Hildenbrand --- include/linux/swap.h | 4 -- mm/swapfile.c | 104 ------------------------------------------- 2 files changed, 108 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 1d38d9475c4d..b546e4bd5c5a 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -514,7 +514,6 @@ extern int __swp_swapcount(swp_entry_t entry); extern int swp_swapcount(swp_entry_t entry); extern struct swap_info_struct *page_swap_info(struct page *); extern struct swap_info_struct *swp_swap_info(swp_entry_t entry); -extern bool reuse_swap_page(struct page *); extern int try_to_free_swap(struct page *); struct backing_dev_info; extern int init_swap_address_space(unsigned int type, unsigned long nr_pages); @@ -680,9 +679,6 @@ static inline int swp_swapcount(swp_entry_t entry) return 0; } -#define reuse_swap_page(page) \ - (page_trans_huge_mapcount(page) == 1) - static inline int try_to_free_swap(struct page *page) { return 0; diff --git a/mm/swapfile.c b/mm/swapfile.c index bf0df7aa7158..a5183315dc58 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1167,16 +1167,6 @@ static struct swap_info_struct *_swap_info_get(swp_entry_t entry) return NULL; } -static struct swap_info_struct *swap_info_get(swp_entry_t entry) -{ - struct swap_info_struct *p; - - p = _swap_info_get(entry); - if (p) - spin_lock(&p->lock); - return p; -} - static struct swap_info_struct *swap_info_get_cont(swp_entry_t entry, struct swap_info_struct *q) { @@ -1601,100 +1591,6 @@ static bool page_swapped(struct page *page) return false; } -static int page_trans_huge_map_swapcount(struct page *page, - int *total_swapcount) -{ - int i, map_swapcount, _total_swapcount; - unsigned long offset = 0; - struct swap_info_struct *si; - struct swap_cluster_info *ci = NULL; - unsigned char *map = NULL; - int swapcount = 0; - - /* hugetlbfs shouldn't call it */ - VM_BUG_ON_PAGE(PageHuge(page), page); - - if (!IS_ENABLED(CONFIG_THP_SWAP) || likely(!PageTransCompound(page))) { - if (PageSwapCache(page)) - swapcount = page_swapcount(page); - if (total_swapcount) - *total_swapcount = swapcount; - return swapcount + page_trans_huge_mapcount(page); - } - - page = compound_head(page); - - _total_swapcount = map_swapcount = 0; - if (PageSwapCache(page)) { - swp_entry_t entry; - - entry.val = page_private(page); - si = _swap_info_get(entry); - if (si) { - map = si->swap_map; - offset = swp_offset(entry); - } - } - if (map) - ci = lock_cluster(si, offset); - for (i = 0; i < HPAGE_PMD_NR; i++) { - int mapcount = atomic_read(&page[i]._mapcount) + 1; - if (map) { - swapcount = swap_count(map[offset + i]); - _total_swapcount += swapcount; - } - map_swapcount = max(map_swapcount, mapcount + swapcount); - } - unlock_cluster(ci); - - if (PageDoubleMap(page)) - map_swapcount -= 1; - - if (total_swapcount) - *total_swapcount = _total_swapcount; - - return map_swapcount + compound_mapcount(page); -} - -/* - * We can write to an anon page without COW if there are no other references - * to it. And as a side-effect, free up its swap: because the old content - * on disk will never be read, and seeking back there to write new content - * later would only waste time away from clustering. 
- */ -bool reuse_swap_page(struct page *page) -{ - int count, total_swapcount; - - VM_BUG_ON_PAGE(!PageLocked(page), page); - if (unlikely(PageKsm(page))) - return false; - count = page_trans_huge_map_swapcount(page, &total_swapcount); - if (count == 1 && PageSwapCache(page) && - (likely(!PageTransCompound(page)) || - /* The remaining swap count will be freed soon */ - total_swapcount == page_swapcount(page))) { - if (!PageWriteback(page)) { - page = compound_head(page); - delete_from_swap_cache(page); - SetPageDirty(page); - } else { - swp_entry_t entry; - struct swap_info_struct *p; - - entry.val = page_private(page); - p = swap_info_get(entry); - if (p->flags & SWP_STABLE_WRITES) { - spin_unlock(&p->lock); - return false; - } - spin_unlock(&p->lock); - } - } - - return count <= 1; -} - /* * If swap is getting full, or if there are no more mappings of this page, * then try_to_free_swap is called to free its swap space. From patchwork Wed Jan 26 09:55:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 12724884 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73872C2BA4C for ; Wed, 26 Jan 2022 10:01:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 11F4A6B0072; Wed, 26 Jan 2022 05:01:15 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0CFB36B0075; Wed, 26 Jan 2022 05:01:15 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EFF9D6B007D; Wed, 26 Jan 2022 05:01:14 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0044.hostedemail.com [216.40.44.44]) by kanga.kvack.org (Postfix) with ESMTP id DF4C86B0072 for ; Wed, 26 Jan 2022 05:01:14 -0500 (EST) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 98CD41849A83D for ; Wed, 26 Jan 2022 10:01:14 +0000 (UTC) X-FDA: 79071995268.15.1E3052E Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf22.hostedemail.com (Postfix) with ESMTP id 2414FC0007 for ; Wed, 26 Jan 2022 10:01:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643191273; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TtFF3IJUOAziUOkk7Dy8cWYxEVf/NaN8ObVsm7wL2Kc=; b=Wu3+V49BpiFo2g1/HJ2b9jY3yVuLp46afO2I6qb0jkvNXilQMBaKbbsSG5j4bNv0uJ4Djs aPFyO+rUBhR0G60JYedE5o4IbOrIhbit4xs7xaY+2eBApZfuZz6KRX3IhPqC4q7kPPi+Yv V+K/CC0sI8Kj0bY3sYlashv7Rvk1K8A= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-452-DT7c-FHEOW-IwDRh5--odA-1; Wed, 26 Jan 2022 05:01:07 -0500 X-MC-Unique: DT7c-FHEOW-IwDRh5--odA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1F173190B2A0; Wed, 26 Jan 2022 10:01:04 +0000 
From patchwork Wed Jan 26 09:55:56 2022
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 12724884
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes, Shakeel Butt,
 John Hubbard, Jason Gunthorpe, Mike Kravetz, Mike Rapoport, Yang Shi,
 "Kirill A . Shutemov", Matthew Wilcox, Vlastimil Babka, Jann Horn,
 Michal Hocko, Nadav Amit, Rik van Riel, Roman Gushchin, Andrea Arcangeli,
 Peter Xu, Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
 Liang Zhang, linux-mm@kvack.org, David Hildenbrand
Subject: [PATCH RFC v2 8/9] mm/huge_memory: remove stale page_trans_huge_mapcount()
Date: Wed, 26 Jan 2022 10:55:56 +0100
Message-Id: <20220126095557.32392-9-david@redhat.com>
In-Reply-To: <20220126095557.32392-1-david@redhat.com>
References: <20220126095557.32392-1-david@redhat.com>

All users are gone, let's remove it.

Signed-off-by: David Hildenbrand
---
 include/linux/mm.h |  5 -----
 mm/huge_memory.c   | 48 ------------------------------------------------
 2 files changed, 53 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e1a84b1e6787..df6b34d66a9b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -820,16 +820,11 @@ static inline int page_mapcount(struct page *page)
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 int total_mapcount(struct page *page);
-int page_trans_huge_mapcount(struct page *page);
 #else
 static inline int total_mapcount(struct page *page)
 {
 	return page_mapcount(page);
 }
-static inline int page_trans_huge_mapcount(struct page *page)
-{
-	return page_mapcount(page);
-}
 #endif
 
 static inline struct page *virt_to_head_page(const void *x)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b6ba88a98266..863c933b3b1e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2529,54 +2529,6 @@ int total_mapcount(struct page *page)
 	return ret;
 }
 
-/*
- * This calculates accurately how many mappings a transparent hugepage
- * has (unlike page_mapcount() which isn't fully accurate). This full
- * accuracy is primarily needed to know if copy-on-write faults can
- * reuse the page and change the mapping to read-write instead of
- * copying them. At the same time this returns the total_mapcount too.
- *
- * The function returns the highest mapcount any one of the subpages
- * has. If the return value is one, even if different processes are
- * mapping different subpages of the transparent hugepage, they can
- * all reuse it, because each process is reusing a different subpage.
- *
- * The total_mapcount is instead counting all virtual mappings of the
- * subpages. If the total_mapcount is equal to "one", it tells the
- * caller all mappings belong to the same "mm" and in turn the
- * anon_vma of the transparent hugepage can become the vma->anon_vma
- * local one as no other process may be mapping any of the subpages.
- *
- * It would be more accurate to replace page_mapcount() with
- * page_trans_huge_mapcount(), however we only use
- * page_trans_huge_mapcount() in the copy-on-write faults where we
- * need full accuracy to avoid breaking page pinning, because
- * page_trans_huge_mapcount() is slower than page_mapcount().
- */
-int page_trans_huge_mapcount(struct page *page)
-{
-	int i, ret;
-
-	/* hugetlbfs shouldn't call it */
-	VM_BUG_ON_PAGE(PageHuge(page), page);
-
-	if (likely(!PageTransCompound(page)))
-		return atomic_read(&page->_mapcount) + 1;
-
-	page = compound_head(page);
-
-	ret = 0;
-	for (i = 0; i < thp_nr_pages(page); i++) {
-		int mapcount = atomic_read(&page[i]._mapcount) + 1;
-		ret = max(ret, mapcount);
-	}
-
-	if (PageDoubleMap(page))
-		ret -= 1;
-
-	return ret + compound_mapcount(page);
-}
-
 /* Racy check whether the huge page can be split */
 bool can_split_huge_page(struct page *page, int *pextra_pins)
 {
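To make the distinction the removed comment draws more concrete: for a PTE-mapped THP whose 512 subpages are split between two processes, every individual subpage has a single user, yet the compound page as a whole carries hundreds of mappings. A toy userspace model of the two counts (plain C, nothing from the kernel tree; the mapcount array simply stands in for the per-subpage _mapcount values):

	#include <stdio.h>

	#define NR_SUBPAGES 512	/* subpages of a 2 MiB THP with 4 KiB base pages */

	int main(void)
	{
		int mapcount[NR_SUBPAGES];
		int highest = 0, total = 0;

		/* Process A maps subpages 0..255, process B maps 256..511:
		 * each subpage ends up with a per-subpage mapcount of 1. */
		for (int i = 0; i < NR_SUBPAGES; i++)
			mapcount[i] = 1;

		for (int i = 0; i < NR_SUBPAGES; i++) {
			total += mapcount[i];
			if (mapcount[i] > highest)
				highest = mapcount[i];
		}

		/* Prints "highest 1, total 512": the removed helper would have
		 * reported 1 (each subpage is exclusive to one mm), while
		 * total_mapcount() reports 512 for the compound page. */
		printf("highest %d, total %d\n", highest, total);
		return 0;
	}

With the COW path keyed off page_count() instead, that per-subpage view is no longer needed anywhere, which is what makes the removal possible.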
From patchwork Wed Jan 26 09:55:57 2022
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 12724885
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes, Shakeel Butt,
 John Hubbard, Jason Gunthorpe, Mike Kravetz, Mike Rapoport, Yang Shi,
 "Kirill A . Shutemov", Matthew Wilcox, Vlastimil Babka, Jann Horn,
 Michal Hocko, Nadav Amit, Rik van Riel, Roman Gushchin, Andrea Arcangeli,
 Peter Xu, Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
 Liang Zhang, linux-mm@kvack.org, David Hildenbrand
Subject: [PATCH RFC v2 9/9] mm/huge_memory: remove stale locking logic from __split_huge_pmd()
Date: Wed, 26 Jan 2022 10:55:57 +0100
Message-Id: <20220126095557.32392-10-david@redhat.com>
In-Reply-To: <20220126095557.32392-1-david@redhat.com>
References: <20220126095557.32392-1-david@redhat.com>

Let's remove the stale logic that was required for reuse_swap_page().

Signed-off-by: David Hildenbrand
---
 mm/huge_memory.c | 32 +-------------------------------
 1 file changed, 1 insertion(+), 31 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 863c933b3b1e..5cc438f92548 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2158,8 +2158,6 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 {
 	spinlock_t *ptl;
 	struct mmu_notifier_range range;
-	bool do_unlock_page = false;
-	pmd_t _pmd;
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
 				address & HPAGE_PMD_MASK,
@@ -2178,35 +2176,9 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		goto out;
 	}
 
-repeat:
 	if (pmd_trans_huge(*pmd)) {
-		if (!page) {
+		if (!page)
 			page = pmd_page(*pmd);
-			/*
-			 * An anonymous page must be locked, to ensure that a
-			 * concurrent reuse_swap_page() sees stable mapcount;
-			 * but reuse_swap_page() is not used on shmem or file,
-			 * and page lock must not be taken when zap_pmd_range()
-			 * calls __split_huge_pmd() while i_mmap_lock is held.
-			 */
-			if (PageAnon(page)) {
-				if (unlikely(!trylock_page(page))) {
-					get_page(page);
-					_pmd = *pmd;
-					spin_unlock(ptl);
-					lock_page(page);
-					spin_lock(ptl);
-					if (unlikely(!pmd_same(*pmd, _pmd))) {
-						unlock_page(page);
-						put_page(page);
-						page = NULL;
-						goto repeat;
-					}
-					put_page(page);
-				}
-				do_unlock_page = true;
-			}
-		}
 		if (PageMlocked(page))
 			clear_page_mlock(page);
 	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
@@ -2214,8 +2186,6 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
 out:
 	spin_unlock(ptl);
-	if (do_unlock_page)
-		unlock_page(page);
 	/*
 	 * No need to double call mmu_notifier->invalidate_range() callback.
 	 * They are 3 cases to consider inside __split_huge_pmd_locked():