From patchwork Thu Aug 29 16:56:13 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13783475
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, x86@kernel.org,
    linux-fsdevel@vger.kernel.org, David Hildenbrand, Andrew Morton,
    "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li, Johannes Weiner,
    Michal Koutný, Jonathan Corbet, Andy Lutomirski, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Dave Hansen
Subject: [PATCH v1 10/17] mm: COW reuse support for PTE-mapped THP with CONFIG_MM_ID
Date: Thu, 29 Aug 2024 18:56:13 +0200
Message-ID: <20240829165627.2256514-11-david@redhat.com>
In-Reply-To: <20240829165627.2256514-1-david@redhat.com>
References: <20240829165627.2256514-1-david@redhat.com>
MIME-Version: 1.0

Let's add support for CONFIG_MM_ID.
The implementation is fairly straightforward: if the folio is marked as
exclusively mapped, make sure that all folio references are from mappings.

There are plenty of things we can optimize in the future: for example, we
could remember that the folio is fully exclusive so we could speed up the
next fault further. Also, we could try "faulting around", turning
surrounding PTEs that map the same folio writable. But especially the
latter might increase COW latency, so it would need further investigation.

Signed-off-by: David Hildenbrand
---
 mm/memory.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 79 insertions(+), 8 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c2143c40a134b..3803d4aa952ed 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3564,19 +3564,90 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf, struct folio *folio)
         return ret;
 }
 
-static bool wp_can_reuse_anon_folio(struct folio *folio,
-                                    struct vm_area_struct *vma)
+#ifdef CONFIG_MM_ID
+static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
+                struct vm_area_struct *vma)
 {
+        bool exclusive = false;
+
+        /* Let's just free up a large folio if only a single page is mapped. */
+        if (folio_large_mapcount(folio) <= 1)
+                return false;
+
         /*
-         * We could currently only reuse a subpage of a large folio if no
-         * other subpages of the large folios are still mapped. However,
-         * let's just consistently not reuse subpages even if we could
-         * reuse in that scenario, and give back a large folio a bit
-         * sooner.
+         * The assumption for anonymous folios is that each page can only get
+         * mapped once into each MM. The only exception are KSM folios, which
+         * are always small.
+         *
+         * Each taken mapcount must be paired with exactly one taken reference,
+         * whereby the refcount must be incremented before the mapcount when
+         * mapping a page, and the refcount must be decremented after the
+         * mapcount when unmapping a page.
+         *
+         * If all folio references are from mappings, and all mappings are in
+         * the page tables of this MM, then this folio is exclusive to this MM.
          */
-        if (folio_test_large(folio))
+        if (!folio_test_large_mapped_exclusively(folio))
+                return false;
+
+        VM_WARN_ON_ONCE(folio_test_ksm(folio));
+        VM_WARN_ON_ONCE(folio_mapcount(folio) > folio_nr_pages(folio));
+        VM_WARN_ON_ONCE(folio_entire_mapcount(folio));
+
+        if (unlikely(folio_test_swapcache(folio))) {
+                /*
+                 * Note: freeing up the swapcache will fail if some PTEs are
+                 * still swap entries.
+                 */
+                if (!folio_trylock(folio))
+                        return false;
+                folio_free_swap(folio);
+                folio_unlock(folio);
+        }
+
+        if (folio_large_mapcount(folio) != folio_ref_count(folio))
                 return false;
+
+        /* Stabilize the mapcount vs. refcount and recheck. */
+        folio_lock_large_mapcount_data(folio);
+        VM_WARN_ON_ONCE(folio_large_mapcount(folio) > folio_ref_count(folio));
+
+        if (!folio_test_large_mapped_exclusively(folio))
+                goto unlock;
+        if (folio_large_mapcount(folio) != folio_ref_count(folio))
+                goto unlock;
+
+        VM_WARN_ON_ONCE(folio_mm0_id(folio) != vma->vm_mm->mm_id &&
+                        folio_mm1_id(folio) != vma->vm_mm->mm_id);
+
+        /*
+         * Do we need the folio lock? Likely not. If there would have been
+         * references from page migration/swapout, we would have detected
+         * an additional folio reference and never ended up here.
+         */
+        exclusive = true;
+unlock:
+        folio_unlock_large_mapcount_data(folio);
+        return exclusive;
+}
+#else /* !CONFIG_MM_ID */
+static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
+                struct vm_area_struct *vma)
+{
+        /*
+         * We could reuse the last mapped page of a large folio, but let's
+         * just free up this large folio.
+         */
+        return false;
+}
+#endif /* !CONFIG_MM_ID */
+
+static bool wp_can_reuse_anon_folio(struct folio *folio,
+                                    struct vm_area_struct *vma)
+{
+        if (folio_test_large(folio))
+                return __wp_can_reuse_large_anon_folio(folio, vma);
+
         /*
          * We have to verify under folio lock: these early checks are
          * just an optimization to avoid locking the folio and freeing