From patchwork Sun Feb 25 12:32:15 2024
X-Patchwork-Submitter: Lance Yang
X-Patchwork-Id: 13570809
From: Lance Yang
To: akpm@linux-foundation.org
Cc: zokeefe@google.com, shy828301@gmail.com, david@redhat.com,
 mhocko@suse.com, ryan.roberts@arm.com, wangkefeng.wang@huawei.com,
 songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Lance Yang
Subject: [PATCH 1/1] mm/madvise: enhance lazyfreeing with mTHP in madvise_free
Date: Sun, 25 Feb 2024 20:32:15 +0800
Message-Id: <20240225123215.86503-1-ioworker0@gmail.com>

This patch improves madvise_free_pte_range() to correctly handle a large
folio that is smaller than PMD-size (for example, 16KiB to 1024KiB[1]).
It's probably part of the preparation to support anonymous multi-size THP.

Additionally, when consecutive PTEs map consecutive pages of the same
large folio (mTHP) and the folio is locked before madvise(MADV_FREE) or
cannot be split, all subsequent PTEs within the same PMD are currently
skipped, even though they should have been MADV_FREEed.

Moreover, this patch also optimizes lazyfreeing with PTE-mapped mTHP
(inspired by David Hildenbrand[2]). We aim to avoid unnecessary folio
splitting if the large folio is entirely within the given range.

On an Intel i5 CPU, lazyfreeing a 1GiB VMA backed by PTE-mapped folios of
the same size results in the following runtimes for madvise(MADV_FREE) in
seconds (shorter is better):

Folio Size |   Old    |   New    | Change
------------------------------------------
      4KiB | 0.590251 | 0.590264 |     0%
     16KiB | 2.990447 | 0.182167 |   -94%
     32KiB | 2.547831 | 0.101622 |   -96%
     64KiB | 2.457796 | 0.049726 |   -98%
    128KiB | 2.281034 | 0.030109 |   -99%
    256KiB | 2.230387 | 0.015838 |   -99%
    512KiB | 2.189106 | 0.009149 |   -99%
   1024KiB | 2.183949 | 0.006620 |   -99%
   2048KiB | 0.002799 | 0.002795 |     0%

[1] https://lkml.kernel.org/r/20231207161211.2374093-5-ryan.roberts@arm.com
[2] https://lore.kernel.org/linux-mm/20240214204435.167852-1-david@redhat.com/

Signed-off-by: Lance Yang
Signed-off-by: Barry Song
---
 mm/madvise.c | 69 +++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 58 insertions(+), 11 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index cfa5e7288261..bcbf56595a2e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -676,11 +676,43 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 		 */
 		if (folio_test_large(folio)) {
 			int err;
+			unsigned long next_addr, align;
 
-			if (folio_estimated_sharers(folio) != 1)
-				break;
-			if (!folio_trylock(folio))
-				break;
+			if (folio_estimated_sharers(folio) != 1 ||
+			    !folio_trylock(folio))
+				goto skip_large_folio;
+
+			align = folio_nr_pages(folio) * PAGE_SIZE;
+			next_addr = ALIGN_DOWN(addr + align, align);
+
+			/*
+			 * If we mark only the subpages as lazyfree,
+			 * split the large folio.
+			 */
+			if (next_addr > end || next_addr - addr != align)
+				goto split_large_folio;
+
+			/*
+			 * Avoid unnecessary folio splitting if the large
+			 * folio is entirely within the given range.
+			 */
+			folio_test_clear_dirty(folio);
+			folio_unlock(folio);
+			for (; addr != next_addr; pte++, addr += PAGE_SIZE) {
+				ptent = ptep_get(pte);
+				if (pte_young(ptent) || pte_dirty(ptent)) {
+					ptent = ptep_get_and_clear_full(
+						mm, addr, pte, tlb->fullmm);
+					ptent = pte_mkold(ptent);
+					ptent = pte_mkclean(ptent);
+					set_pte_at(mm, addr, pte, ptent);
+					tlb_remove_tlb_entry(tlb, pte, addr);
+				}
+			}
+			folio_mark_lazyfree(folio);
+			goto next_folio;
+
+split_large_folio:
 			folio_get(folio);
 			arch_leave_lazy_mmu_mode();
 			pte_unmap_unlock(start_pte, ptl);
@@ -688,13 +720,28 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			err = split_folio(folio);
 			folio_unlock(folio);
 			folio_put(folio);
-			if (err)
-				break;
-			start_pte = pte =
-				pte_offset_map_lock(mm, pmd, addr, &ptl);
-			if (!start_pte)
-				break;
-			arch_enter_lazy_mmu_mode();
+
+			/*
+			 * If the large folio is locked before madvise(MADV_FREE)
+			 * or cannot be split, we just skip it.
+			 */
+			if (err) {
+skip_large_folio:
+				if (next_addr >= end)
+					break;
+				pte += (next_addr - addr) / PAGE_SIZE;
+				addr = next_addr;
+			}
+
+			if (!start_pte) {
+				start_pte = pte = pte_offset_map_lock(
+						mm, pmd, addr, &ptl);
+				if (!start_pte)
+					break;
+				arch_enter_lazy_mmu_mode();
+			}
+
+next_folio:
 			pte--;
 			addr -= PAGE_SIZE;
 			continue;
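
The alignment test that selects between the no-split path and the split path can be tried out in userspace. The following is only a minimal sketch of that arithmetic, not kernel code: PAGE_SIZE is assumed to be 4KiB, ALIGN_DOWN() is a local stand-in for the kernel macro, and folio_fully_covered() is a hypothetical helper name introduced here for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4KiB base pages; the kernel value depends on the architecture. */
#define PAGE_SIZE 4096UL

/* Userspace stand-in for the kernel's ALIGN_DOWN(); valid for power-of-two a. */
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

/*
 * Hypothetical helper mirroring the check in the patch. It returns true
 * only when addr is aligned to the folio size and the folio-sized window
 * starting at addr ends at or before end, i.e. when the patch would take
 * the path that avoids split_folio().
 */
static bool folio_fully_covered(unsigned long addr, unsigned long end,
				unsigned long nr_pages)
{
	unsigned long align, next_addr;

	/* Folio sizes are powers of two; the mask trick relies on that. */
	assert(nr_pages != 0 && (nr_pages & (nr_pages - 1)) == 0);

	align = nr_pages * PAGE_SIZE;
	next_addr = ALIGN_DOWN(addr + align, align);

	/* Same condition as the patch, negated: no split needed. */
	return !(next_addr > end || next_addr - addr != align);
}
```

With a 64KiB folio (16 pages), folio_fully_covered(0x10000, 0x20000, 16) is true. Starting at 0x11000 makes next_addr - addr differ from the folio size, and ending the range at 0x18000 puts next_addr past end, so both of those cases would fall through to the split path.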