From patchwork Wed Dec 4 11:09:40 2024
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 13893581
From: Qi Zheng
To: david@redhat.com, jannh@google.com, hughd@google.com,
    willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org,
    peterx@redhat.com, akpm@linux-foundation.org
Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org,
    dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
    x86@kernel.org, lorenzo.stoakes@oracle.com, zokeefe@google.com,
    rientjes@google.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Qi Zheng
Subject: [PATCH v4 00/11] synchronously scan and reclaim empty user PTE pages
Date: Wed, 4 Dec 2024 19:09:40 +0800

Changes in v4:
 - update the process_addrs.rst in [PATCH v4 01/11]
   (suggested by Lorenzo Stoakes)
 - fix [PATCH v3 4/9] and move it after [PATCH v3 5/9]
   (pointed out by David Hildenbrand)
 - change to use any_skipped instead of rechecking pte_none() to detect empty
   user PTE pages (suggested by David Hildenbrand)
 - rebase onto the next-20241203

Changes in v3:
 - recheck pmd state instead of pmd_same() in retract_page_tables()
   (suggested by Jann Horn)
 - recheck dst_pmd entry in move_pages_pte() (pointed out by Jann Horn)
 - introduce new skip_none_ptes() (suggested by David Hildenbrand)
 - minor changes in [PATCH v2 5/7]
 - remove tlb_remove_table_sync_one() if CONFIG_PT_RECLAIM is enabled.
 - use put_page() instead of free_page_and_swap_cache() in
   __tlb_remove_table_one_rcu() (pointed out by Jann Horn)
 - collect the Reviewed-bys and Acked-bys
 - rebase onto the next-20241112

Changes in v2:
 - fix [PATCH v1 1/7] (Jann Horn)
 - reset force_flush and force_break to false in [PATCH v1 2/7] (Jann Horn)
 - introduce zap_nonpresent_ptes() and do_zap_pte_range()
 - check pte_none() instead of can_reclaim_pt after the processing of PTEs
   (remove [PATCH v1 3/7] and [PATCH v1 4/7])
 - reorder patches
 - rebase onto the next-20241031

Changes in v1:
 - replace [RFC PATCH 1/7] with a separate series (already merged into
   mm-unstable):
   https://lore.kernel.org/lkml/cover.1727332572.git.zhengqi.arch@bytedance.com/
   (suggested by David Hildenbrand)
 - squash [RFC PATCH 2/7] into [RFC PATCH 4/7]
   (suggested by David Hildenbrand)
 - change to scan and reclaim empty user PTE pages in zap_pte_range()
   (suggested by David Hildenbrand)
 - sent a separate RFC patch to track the tlb flushing issue, and remove
   that part from this series ([RFC PATCH 3/7] and [RFC PATCH 6/7]).
   link: https://lore.kernel.org/lkml/20240815120715.14516-1-zhengqi.arch@bytedance.com/
 - add [PATCH v1 1/7] into this series
 - drop RFC tag
 - rebase onto the next-20241011

Changes in RFC v2:
 - fix compilation errors in [RFC PATCH 5/7] and [RFC PATCH 7/7] reported by
   kernel test robot
 - use pte_offset_map_nolock() + pmd_same() instead of check_pmd_still_valid()
   in retract_page_tables() (in [RFC PATCH 4/7])
 - rebase onto the next-20240805

Hi all,

Previously, we tried to use a completely asynchronous method to reclaim empty
user PTE pages [1]. After discussing with David Hildenbrand, we decided to
implement synchronous reclamation in the case of madvise(MADV_DONTNEED) as the
first step.

So this series aims to synchronously free the empty PTE pages in the
madvise(MADV_DONTNEED) case. We will detect and free empty PTE pages in
zap_pte_range(), and will add zap_details.reclaim_pt to exclude cases other
than madvise(MADV_DONTNEED).

In zap_pte_range(), mmu_gather is used to perform batch tlb flushing and page
freeing operations. Therefore, if we want to free the empty PTE page in this
path, the most natural way is to add it to mmu_gather as well. Now, if
CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, mmu_gather will free page table
pages by semi RCU:

 - batch table freeing: asynchronous free by RCU
 - single table freeing: IPI + synchronous free

But this is not enough to free the empty PTE page table pages in paths other
than the munmap and exit_mmap paths, because IPI cannot be synchronized with
rcu_read_lock() in pte_offset_map{_lock}(). So we should let single table also
be freed by RCU like batch table freeing.

As a first step, we supported this feature on x86_64 and selected the newly
introduced CONFIG_ARCH_SUPPORTS_PT_RECLAIM.

For other cases such as madvise(MADV_FREE), consider scanning and freeing empty
PTE pages asynchronously in the future.

This series is based on next-20241112 (which contains the series [2]).
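
For anyone who wants to watch the page-table footprint from user space, here
is a minimal sketch. It is not part of this series and is only an illustration
under stated assumptions: x86_64 with 4 KiB base pages (so one PTE page maps
2 MiB), and the VmPTE field of /proc/self/status as the measurement. The names
read_vmpte_kb(), run_demo() and PTE_PAGE_SPAN are made up for this example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Assumption: x86_64, 4 KiB pages, so one PTE page maps 512 * 4 KiB = 2 MiB. */
#define PTE_PAGE_SPAN (2UL << 20)

/* Return the VmPTE value (in kB) from /proc/self/status, or -1 on error. */
static long read_vmpte_kb(void)
{
	FILE *f = fopen("/proc/self/status", "r");
	char line[256];
	long kb = -1;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "VmPTE: %ld kB", &kb) == 1)
			break;
	}
	fclose(f);
	return kb;
}

/*
 * Hypothetical helper: map len bytes of anonymous memory, touch one byte in
 * every 2 MiB slice so that each touch lands in a different PTE page, then
 * zap the whole range with MADV_DONTNEED.
 *
 * out[0] = VmPTE before mmap, out[1] = after touching, out[2] = after madvise.
 * Returns 0 on success, -1 if mmap() or madvise() fails.
 */
int run_demo(size_t len, long out[3])
{
	char *buf;
	size_t off;

	assert(len > 0 && len % PTE_PAGE_SPAN == 0);

	out[0] = read_vmpte_kb();

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Best effort: avoid THP so the range is backed by PTE pages. */
	(void)madvise(buf, len, MADV_NOHUGEPAGE);

	for (off = 0; off < len; off += PTE_PAGE_SPAN)
		buf[off] = 1;
	out[1] = read_vmpte_kb();

	if (madvise(buf, len, MADV_DONTNEED) != 0) {
		munmap(buf, len);
		return -1;
	}
	out[2] = read_vmpte_kb();

	munmap(buf, len);
	return 0;
}
```

Calling run_demo(64 * PTE_PAGE_SPAN, v) from a main() and printing v[0..2]
gives three samples to compare. On a kernel with this series applied and
CONFIG_PT_RECLAIM enabled, the third sample would be expected to be lower than
the second; on other kernels the two should be roughly equal. The sketch only
reports the numbers and makes no claim about exact values, which depend on
mapping alignment and on whatever else the process has mapped.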
Note: issues related to TLB flushing are not new to this series and are tracked
in the separate RFC patch [3]. For more context, please refer to this
thread [4].

Comments and suggestions are welcome!

Thanks,
Qi

[1]. https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/
[2]. https://lore.kernel.org/lkml/cover.1727332572.git.zhengqi.arch@bytedance.com/
[3]. https://lore.kernel.org/lkml/20240815120715.14516-1-zhengqi.arch@bytedance.com/
[4]. https://lore.kernel.org/lkml/6f38cb19-9847-4f70-bbe7-06881bb016be@bytedance.com/

Qi Zheng (11):
  mm: khugepaged: recheck pmd state in retract_page_tables()
  mm: userfaultfd: recheck dst_pmd entry in move_pages_pte()
  mm: introduce zap_nonpresent_ptes()
  mm: introduce do_zap_pte_range()
  mm: skip over all consecutive none ptes in do_zap_pte_range()
  mm: zap_install_uffd_wp_if_needed: return whether uffd-wp pte has been
    re-installed
  mm: do_zap_pte_range: return any_skipped information to the caller
  mm: make zap_pte_range() handle full within-PMD range
  mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)
  x86: mm: free page table pages by RCU instead of semi RCU
  x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64

 Documentation/mm/process_addrs.rst |   4 +
 arch/x86/Kconfig                   |   1 +
 arch/x86/include/asm/tlb.h         |  20 +++
 arch/x86/kernel/paravirt.c         |   7 +
 arch/x86/mm/pgtable.c              |  10 +-
 include/linux/mm.h                 |   1 +
 include/linux/mm_inline.h          |  11 +-
 include/linux/mm_types.h           |   4 +-
 mm/Kconfig                         |  15 ++
 mm/Makefile                        |   1 +
 mm/internal.h                      |  19 +++
 mm/khugepaged.c                    |  45 +++--
 mm/madvise.c                       |   7 +-
 mm/memory.c                        | 253 ++++++++++++++++++-----------
 mm/mmu_gather.c                    |   9 +-
 mm/pt_reclaim.c                    |  71 ++++++++
 mm/userfaultfd.c                   |  51 ++++--
 17 files changed, 397 insertions(+), 132 deletions(-)
 create mode 100644 mm/pt_reclaim.c