From patchwork Thu Mar 7 06:14:25 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lance Yang
X-Patchwork-Id: 13585104
From: Lance Yang
To: akpm@linux-foundation.org
Cc: zokeefe@google.com, ryan.roberts@arm.com, 21cnbao@gmail.com,
 shy828301@gmail.com, david@redhat.com, mhocko@suse.com,
 fengwei.yin@intel.com, xiehuan09@gmail.com, wangkefeng.wang@huawei.com,
 songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Lance Yang
Subject: [PATCH v2 1/1] mm/madvise: enhance lazyfreeing with mTHP in madvise_free
Date: Thu, 7 Mar 2024 14:14:25 +0800
Message-Id: <20240307061425.21013-1-ioworker0@gmail.com>
X-Mailer: git-send-email 2.33.1

This patch optimizes lazyfreeing with PTE-mapped mTHP[1] (Inspired by David
Hildenbrand[2]).
We aim to avoid unnecessary folio splitting if the large folio is entirely
within the given range.

On an Intel I5 CPU, lazyfreeing a 1GiB VMA backed by PTE-mapped folios of
the same size results in the following runtimes for madvise(MADV_FREE) in
seconds (shorter is better):

Folio Size |   Old    |   New    | Change
------------------------------------------
      4KiB | 0.590251 | 0.590259 |     0%
     16KiB | 2.990447 | 0.185655 |   -94%
     32KiB | 2.547831 | 0.104870 |   -95%
     64KiB | 2.457796 | 0.052812 |   -97%
    128KiB | 2.281034 | 0.032777 |   -99%
    256KiB | 2.230387 | 0.017496 |   -99%
    512KiB | 2.189106 | 0.010781 |   -99%
   1024KiB | 2.183949 | 0.007753 |   -99%
   2048KiB | 0.002799 | 0.002804 |     0%

[1] https://lkml.kernel.org/r/20231207161211.2374093-5-ryan.roberts@arm.com
[2] https://lore.kernel.org/linux-mm/20240214204435.167852-1-david@redhat.com/

Signed-off-by: Lance Yang
---
v1 -> v2:
 * Update the performance numbers
 * Update the changelog, suggested by Ryan Roberts
 * Check the COW folio, suggested by Yin Fengwei
 * Check if we are mapping all subpages, suggested by Barry Song,
   David Hildenbrand, Ryan Roberts
 * https://lore.kernel.org/linux-mm/20240225123215.86503-1-ioworker0@gmail.com/

 mm/madvise.c | 85 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 74 insertions(+), 11 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 44a498c94158..1437ac6eb25e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -616,6 +616,20 @@ static long madvise_pageout(struct vm_area_struct *vma,
 	return 0;
 }
 
+static inline bool can_mark_large_folio_lazyfree(unsigned long addr,
+						 struct folio *folio, pte_t *start_pte)
+{
+	int nr_pages = folio_nr_pages(folio);
+	fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
+
+	for (int i = 0; i < nr_pages; i++)
+		if (page_mapcount(folio_page(folio, i)) != 1)
+			return false;
+
+	return nr_pages == folio_pte_batch(folio, addr, start_pte,
+					 ptep_get(start_pte), nr_pages, flags, NULL);
+}
+
 static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 
@@ -676,11 +690,45 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 		 */
 		if (folio_test_large(folio)) {
 			int err;
+			unsigned long next_addr, align;
 
-			if (folio_estimated_sharers(folio) != 1)
-				break;
-			if (!folio_trylock(folio))
-				break;
+			if (folio_estimated_sharers(folio) != 1 ||
+			    !folio_trylock(folio))
+				goto skip_large_folio;
+
+			align = folio_nr_pages(folio) * PAGE_SIZE;
+			next_addr = ALIGN_DOWN(addr + align, align);
+
+			/*
+			 * If we mark only the subpages as lazyfree, or
+			 * cannot mark the entire large folio as lazyfree,
+			 * then just split it.
+			 */
+			if (next_addr > end || next_addr - addr != align ||
+			    !can_mark_large_folio_lazyfree(addr, folio, pte))
+				goto split_large_folio;
+
+			/*
+			 * Avoid unnecessary folio splitting if the large
+			 * folio is entirely within the given range.
+			 */
+			folio_clear_dirty(folio);
+			folio_unlock(folio);
+			for (; addr != next_addr; pte++, addr += PAGE_SIZE) {
+				ptent = ptep_get(pte);
+				if (pte_young(ptent) || pte_dirty(ptent)) {
+					ptent = ptep_get_and_clear_full(
+						mm, addr, pte, tlb->fullmm);
+					ptent = pte_mkold(ptent);
+					ptent = pte_mkclean(ptent);
+					set_pte_at(mm, addr, pte, ptent);
+					tlb_remove_tlb_entry(tlb, pte, addr);
+				}
+			}
+			folio_mark_lazyfree(folio);
+			goto next_folio;
+
+split_large_folio:
 			folio_get(folio);
 			arch_leave_lazy_mmu_mode();
 			pte_unmap_unlock(start_pte, ptl);
@@ -688,13 +736,28 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			err = split_folio(folio);
 			folio_unlock(folio);
 			folio_put(folio);
-			if (err)
-				break;
-			start_pte = pte =
-				pte_offset_map_lock(mm, pmd, addr, &ptl);
-			if (!start_pte)
-				break;
-			arch_enter_lazy_mmu_mode();
+
+			/*
+			 * If the large folio is locked or cannot be split,
+			 * we just skip it.
+			 */
+			if (err) {
+skip_large_folio:
+				if (next_addr >= end)
+					break;
+				pte += (next_addr - addr) / PAGE_SIZE;
+				addr = next_addr;
+			}
+
+			if (!start_pte) {
+				start_pte = pte = pte_offset_map_lock(
+					mm, pmd, addr, &ptl);
+				if (!start_pte)
+					break;
+				arch_enter_lazy_mmu_mode();
+			}
+
+next_folio:
 			pte--;
 			addr -= PAGE_SIZE;
 			continue;