From patchwork Mon Jan 13 03:38:58 2025
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13936683
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: baolin.wang@linux.alibaba.com, chrisl@kernel.org, david@redhat.com,
 ioworker0@gmail.com, kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, ryan.roberts@arm.com, v-songbaohua@oppo.com,
 x86@kernel.org, linux-riscv@lists.infradead.org, ying.huang@intel.com,
 zhengtangquan@oppo.com, lorenzo.stoakes@oracle.com
Subject: [PATCH v2 1/4] mm: Set folio swapbacked iff folios are dirty in try_to_unmap_one
Date: Mon, 13 Jan 2025 16:38:58 +1300
Message-Id: <20250113033901.68951-2-21cnbao@gmail.com>
In-Reply-To: <20250113033901.68951-1-21cnbao@gmail.com>
References: <20250113033901.68951-1-21cnbao@gmail.com>

From: Barry Song

The refcount may be temporarily or long-term increased, but this does
not change the fundamental nature of the folio already being lazy-freed.
Therefore, we only reset 'swapbacked' when we are certain the folio is
dirty and not droppable.

Suggested-by: David Hildenbrand
Signed-off-by: Barry Song
Acked-by: David Hildenbrand
Reviewed-by: Baolin Wang
Reviewed-by: Lance Yang
---
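As background, a lazy-freed folio is anonymous memory marked with
MADV_FREE: it stays clean and non-swapbacked, so reclaim may discard it,
unless the process redirties it first. A minimal userspace sketch of that
lifecycle (illustrative only; not part of this patch):

#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 64 * 1024;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	memset(p, 1, len);		/* folio becomes dirty and swapbacked */
	madvise(p, len, MADV_FREE);	/* clean, !swapbacked: reclaim may drop it */

	/*
	 * Redirtying before reclaim runs means the data must be kept:
	 * try_to_unmap_one() sees the dirty state and sets swapbacked again.
	 */
	p[0] = 2;

	munmap(p, len);
	return 0;
}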
 mm/rmap.c | 49 ++++++++++++++++++++++---------------------
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7..de6b8c34e98c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1868,34 +1868,29 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 */
 				smp_rmb();
 
-				/*
-				 * The only page refs must be one from isolation
-				 * plus the rmap(s) (dropped by discard:).
-				 */
-				if (ref_count == 1 + map_count &&
-				    (!folio_test_dirty(folio) ||
-				     /*
-				      * Unlike MADV_FREE mappings, VM_DROPPABLE
-				      * ones can be dropped even if they've
-				      * been dirtied.
-				      */
-				     (vma->vm_flags & VM_DROPPABLE))) {
-					dec_mm_counter(mm, MM_ANONPAGES);
-					goto discard;
-				}
-
-				/*
-				 * If the folio was redirtied, it cannot be
-				 * discarded. Remap the page to page table.
-				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				/*
-				 * Unlike MADV_FREE mappings, VM_DROPPABLE ones
-				 * never get swap backed on failure to drop.
-				 */
-				if (!(vma->vm_flags & VM_DROPPABLE))
+				if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+					/*
+					 * redirtied either using the page table or a previously
+					 * obtained GUP reference.
+					 */
+					set_pte_at(mm, address, pvmw.pte, pteval);
 					folio_set_swapbacked(folio);
-				goto walk_abort;
+					goto walk_abort;
+				} else if (ref_count != 1 + map_count) {
+					/*
+					 * Additional reference. Could be a GUP reference or any
+					 * speculative reference. GUP users must mark the folio
+					 * dirty if there was a modification. This folio cannot be
+					 * reclaimed right now either way, so act just like nothing
+					 * happened.
+					 * We'll come back here later and detect if the folio was
+					 * dirtied when the additional reference is gone.
+					 */
+					set_pte_at(mm, address, pvmw.pte, pteval);
+					goto walk_abort;
+				}
+				dec_mm_counter(mm, MM_ANONPAGES);
+				goto discard;
 			}
 
 			if (swap_duplicate(entry) < 0) {
From patchwork Mon Jan 13 03:38:59 2025
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13936684
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: baolin.wang@linux.alibaba.com, chrisl@kernel.org, david@redhat.com,
 ioworker0@gmail.com, kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, ryan.roberts@arm.com, v-songbaohua@oppo.com,
 x86@kernel.org, linux-riscv@lists.infradead.org, ying.huang@intel.com,
 zhengtangquan@oppo.com, lorenzo.stoakes@oracle.com, Catalin Marinas,
 Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 "H. Peter Anvin", Anshuman Khandual, Shaoqin Huang, Gavin Shan,
 Kefeng Wang, Mark Rutland, "Kirill A. Shutemov", Yosry Ahmed,
 Paul Walmsley, Palmer Dabbelt, Albert Ou, Yicong Yang
Shutemov" , Yosry Ahmed , Paul Walmsley , Palmer Dabbelt , Albert Ou , Yicong Yang Subject: [PATCH v2 2/4] mm: Support tlbbatch flush for a range of PTEs Date: Mon, 13 Jan 2025 16:38:59 +1300 Message-Id: <20250113033901.68951-3-21cnbao@gmail.com> X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: <20250113033901.68951-1-21cnbao@gmail.com> References: <20250113033901.68951-1-21cnbao@gmail.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250112_193939_385674_6C4DAE90 X-CRM114-Status: GOOD ( 16.60 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org From: Barry Song This is a preparatory patch to support batch PTE unmapping in `try_to_unmap_one`. It first introduces range handling for `tlbbatch` flush. Currently, the range is always set to the size of PAGE_SIZE. Cc: Catalin Marinas Cc: Will Deacon Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Dave Hansen Cc: "H. Peter Anvin" Cc: Anshuman Khandual Cc: Ryan Roberts Cc: Shaoqin Huang Cc: Gavin Shan Cc: Kefeng Wang Cc: Mark Rutland Cc: David Hildenbrand Cc: Lance Yang Cc: "Kirill A. Shutemov" Cc: Yosry Ahmed Cc: Paul Walmsley Cc: Palmer Dabbelt Cc: Albert Ou Cc: Yicong Yang Signed-off-by: Barry Song Acked-by: Will Deacon --- arch/arm64/include/asm/tlbflush.h | 26 ++++++++++++++------------ arch/arm64/mm/contpte.c | 2 +- arch/riscv/include/asm/tlbflush.h | 3 ++- arch/riscv/mm/tlbflush.c | 3 ++- arch/x86/include/asm/tlbflush.h | 3 ++- mm/rmap.c | 12 +++++++----- 6 files changed, 28 insertions(+), 21 deletions(-) diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h index bc94e036a26b..f34e4fab5aa2 100644 --- a/arch/arm64/include/asm/tlbflush.h +++ b/arch/arm64/include/asm/tlbflush.h @@ -322,13 +322,6 @@ static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm) return true; } -static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch, - struct mm_struct *mm, - unsigned long uaddr) -{ - __flush_tlb_page_nosync(mm, uaddr); -} - /* * If mprotect/munmap/etc occurs during TLB batched flushing, we need to * synchronise all the TLBI issued with a DSB to avoid the race mentioned in @@ -448,7 +441,7 @@ static inline bool __flush_tlb_range_limit_excess(unsigned long start, return false; } -static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma, +static inline void __flush_tlb_range_nosync(struct mm_struct *mm, unsigned long start, unsigned long end, unsigned long stride, bool last_level, int tlb_level) @@ -460,12 +453,12 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma, pages = (end - start) >> PAGE_SHIFT; if (__flush_tlb_range_limit_excess(start, end, pages, stride)) { - flush_tlb_mm(vma->vm_mm); + flush_tlb_mm(mm); return; } dsb(ishst); - asid = ASID(vma->vm_mm); + asid = ASID(mm); if (last_level) __flush_tlb_range_op(vale1is, start, pages, stride, asid, @@ -474,7 +467,7 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma, __flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true, lpa2_is_enabled()); - mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end); + mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end); } static inline void __flush_tlb_range(struct 
 arch/arm64/include/asm/tlbflush.h | 26 ++++++++++++++------------
 arch/arm64/mm/contpte.c           |  2 +-
 arch/riscv/include/asm/tlbflush.h |  3 ++-
 arch/riscv/mm/tlbflush.c          |  3 ++-
 arch/x86/include/asm/tlbflush.h   |  3 ++-
 mm/rmap.c                         | 12 +++++++-----
 6 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index bc94e036a26b..f34e4fab5aa2 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -322,13 +322,6 @@ static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
 	return true;
 }
 
-static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
-					     struct mm_struct *mm,
-					     unsigned long uaddr)
-{
-	__flush_tlb_page_nosync(mm, uaddr);
-}
-
 /*
  * If mprotect/munmap/etc occurs during TLB batched flushing, we need to
  * synchronise all the TLBI issued with a DSB to avoid the race mentioned in
@@ -448,7 +441,7 @@ static inline bool __flush_tlb_range_limit_excess(unsigned long start,
 	return false;
 }
 
-static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
+static inline void __flush_tlb_range_nosync(struct mm_struct *mm,
 					    unsigned long start, unsigned long end,
 					    unsigned long stride, bool last_level,
 					    int tlb_level)
@@ -460,12 +453,12 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
 	pages = (end - start) >> PAGE_SHIFT;
 
 	if (__flush_tlb_range_limit_excess(start, end, pages, stride)) {
-		flush_tlb_mm(vma->vm_mm);
+		flush_tlb_mm(mm);
 		return;
 	}
 
 	dsb(ishst);
-	asid = ASID(vma->vm_mm);
+	asid = ASID(mm);
 
 	if (last_level)
 		__flush_tlb_range_op(vale1is, start, pages, stride, asid,
@@ -474,7 +467,7 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
 		__flush_tlb_range_op(vae1is, start, pages, stride, asid,
 				     tlb_level, true, lpa2_is_enabled());
 
-	mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end);
+	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
 }
 
 static inline void __flush_tlb_range(struct vm_area_struct *vma,
@@ -482,7 +475,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     unsigned long stride, bool last_level,
 				     int tlb_level)
 {
-	__flush_tlb_range_nosync(vma, start, end, stride,
+	__flush_tlb_range_nosync(vma->vm_mm, start, end, stride,
 				 last_level, tlb_level);
 	dsb(ish);
 }
@@ -533,6 +526,15 @@ static inline void __flush_tlb_kernel_pgtable(unsigned long kaddr)
 	dsb(ish);
 	isb();
 }
+
+static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
+					     struct mm_struct *mm,
+					     unsigned long uaddr,
+					     unsigned long size)
+{
+	__flush_tlb_range_nosync(mm, uaddr, uaddr + size,
+				 PAGE_SIZE, true, 3);
+}
 #endif
 
 #endif
diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
index 55107d27d3f8..bcac4f55f9c1 100644
--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -335,7 +335,7 @@ int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
 		 * eliding the trailing DSB applies here.
 		 */
 		addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
-		__flush_tlb_range_nosync(vma, addr, addr + CONT_PTE_SIZE,
+		__flush_tlb_range_nosync(vma->vm_mm, addr, addr + CONT_PTE_SIZE,
 					 PAGE_SIZE, true, 3);
 	}
 
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 72e559934952..7f3ea687ce33 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -61,7 +61,8 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
 bool arch_tlbbatch_should_defer(struct mm_struct *mm);
 void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 			       struct mm_struct *mm,
-			       unsigned long uaddr);
+			       unsigned long uaddr,
+			       unsigned long size);
 void arch_flush_tlb_batched_pending(struct mm_struct *mm);
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
 
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 9b6e86ce3867..aeda64a36d50 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -187,7 +187,8 @@ bool arch_tlbbatch_should_defer(struct mm_struct *mm)
 
 void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 			       struct mm_struct *mm,
-			       unsigned long uaddr)
+			       unsigned long uaddr,
+			       unsigned long size)
 {
 	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
 }
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 69e79fff41b8..4b62a6329b8f 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -279,7 +279,8 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
 
 static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 					     struct mm_struct *mm,
-					     unsigned long uaddr)
+					     unsigned long uaddr,
+					     unsigned long size)
 {
 	inc_mm_tlb_gen(mm);
 	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
diff --git a/mm/rmap.c b/mm/rmap.c
index de6b8c34e98c..365112af5291 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -672,7 +672,8 @@ void try_to_unmap_flush_dirty(void)
 	(TLB_FLUSH_BATCH_PENDING_MASK / 2)
 
 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
-				      unsigned long uaddr)
+				      unsigned long uaddr,
+				      unsigned long size)
 {
 	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
 	int batch;
@@ -681,7 +682,7 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
 	if (!pte_accessible(mm, pteval))
 		return;
 
-	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr);
+	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
 	tlb_ubc->flush_required = true;
 
 	/*
@@ -757,7 +758,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm)
 }
 #else
 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
-				      unsigned long uaddr)
+				      unsigned long uaddr,
+				      unsigned long size)
 {
 }
 
@@ -1792,7 +1794,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 */
 				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-				set_tlb_ubc_flush_pending(mm, pteval, address);
+				set_tlb_ubc_flush_pending(mm, pteval, address, PAGE_SIZE);
 			} else {
 				pteval = ptep_clear_flush(vma, address, pvmw.pte);
 			}
@@ -2164,7 +2166,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				 */
 				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-				set_tlb_ubc_flush_pending(mm, pteval, address);
+				set_tlb_ubc_flush_pending(mm, pteval, address, PAGE_SIZE);
 			} else {
 				pteval = ptep_clear_flush(vma, address, pvmw.pte);
 			}
From patchwork Mon Jan 13 03:39:00 2025
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13936688
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: baolin.wang@linux.alibaba.com, chrisl@kernel.org, david@redhat.com,
 ioworker0@gmail.com, kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, ryan.roberts@arm.com, v-songbaohua@oppo.com,
 x86@kernel.org, linux-riscv@lists.infradead.org, ying.huang@intel.com,
 zhengtangquan@oppo.com, lorenzo.stoakes@oracle.com
Subject: [PATCH v2 3/4] mm: Support batched unmap for lazyfree large folios during reclamation
Date: Mon, 13 Jan 2025 16:39:00 +1300
Message-Id: <20250113033901.68951-4-21cnbao@gmail.com>
In-Reply-To: <20250113033901.68951-1-21cnbao@gmail.com>
References: <20250113033901.68951-1-21cnbao@gmail.com>

From: Barry Song

Currently, the PTEs and rmap of a large folio are removed one at a time.
This is not only slow but also causes the large folio to be unnecessarily
added to deferred_split, which can lead to races between the
deferred_split shrinker callback and memory reclamation. This patch
releases all PTEs and rmap entries in a batch. For now, it only handles
lazyfree large folios.
The microbenchmark below tries to reclaim 128MB of lazyfree large folios
whose sizes are 64KiB:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define SIZE 128*1024*1024 // 128 MB

unsigned long read_split_deferred()
{
	FILE *file = fopen("/sys/kernel/mm/transparent_hugepage"
			"/hugepages-64kB/stats/split_deferred", "r");
	if (!file) {
		perror("Error opening file");
		return 0;
	}

	unsigned long value;
	if (fscanf(file, "%lu", &value) != 1) {
		perror("Error reading value");
		fclose(file);
		return 0;
	}

	fclose(file);
	return value;
}

int main(int argc, char *argv[])
{
	while(1) {
		volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		memset((void *)p, 1, SIZE);
		madvise((void *)p, SIZE, MADV_FREE);

		clock_t start_time = clock();
		unsigned long start_split = read_split_deferred();
		madvise((void *)p, SIZE, MADV_PAGEOUT);
		clock_t end_time = clock();
		unsigned long end_split = read_split_deferred();

		double elapsed_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
		printf("Time taken by reclamation: %f seconds, split_deferred: %ld\n",
			elapsed_time, end_split - start_split);

		munmap((void *)p, SIZE);
	}
	return 0;
}

w/o patch:
~ # ./a.out
Time taken by reclamation: 0.177418 seconds, split_deferred: 2048
Time taken by reclamation: 0.178348 seconds, split_deferred: 2048
Time taken by reclamation: 0.174525 seconds, split_deferred: 2048
Time taken by reclamation: 0.171620 seconds, split_deferred: 2048
Time taken by reclamation: 0.172241 seconds, split_deferred: 2048
Time taken by reclamation: 0.174003 seconds, split_deferred: 2048
Time taken by reclamation: 0.171058 seconds, split_deferred: 2048
Time taken by reclamation: 0.171993 seconds, split_deferred: 2048
Time taken by reclamation: 0.169829 seconds, split_deferred: 2048
Time taken by reclamation: 0.172895 seconds, split_deferred: 2048
Time taken by reclamation: 0.176063 seconds, split_deferred: 2048
Time taken by reclamation: 0.172568 seconds, split_deferred: 2048
Time taken by reclamation: 0.171185 seconds, split_deferred: 2048
Time taken by reclamation: 0.170632 seconds, split_deferred: 2048
Time taken by reclamation: 0.170208 seconds, split_deferred: 2048
Time taken by reclamation: 0.174192 seconds, split_deferred: 2048
...

w/ patch:
~ # ./a.out
Time taken by reclamation: 0.074231 seconds, split_deferred: 0
Time taken by reclamation: 0.071026 seconds, split_deferred: 0
Time taken by reclamation: 0.072029 seconds, split_deferred: 0
Time taken by reclamation: 0.071873 seconds, split_deferred: 0
Time taken by reclamation: 0.073573 seconds, split_deferred: 0
Time taken by reclamation: 0.071906 seconds, split_deferred: 0
Time taken by reclamation: 0.073604 seconds, split_deferred: 0
Time taken by reclamation: 0.075903 seconds, split_deferred: 0
Time taken by reclamation: 0.073191 seconds, split_deferred: 0
Time taken by reclamation: 0.071228 seconds, split_deferred: 0
Time taken by reclamation: 0.071391 seconds, split_deferred: 0
Time taken by reclamation: 0.071468 seconds, split_deferred: 0
Time taken by reclamation: 0.071896 seconds, split_deferred: 0
Time taken by reclamation: 0.072508 seconds, split_deferred: 0
Time taken by reclamation: 0.071884 seconds, split_deferred: 0
Time taken by reclamation: 0.072433 seconds, split_deferred: 0
Time taken by reclamation: 0.071939 seconds, split_deferred: 0
...
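The numbers above assume a kernel with 64KiB mTHP enabled; a typical
setup looks like the following (the sysfs knob is the standard mTHP
interface; the source file name is arbitrary):

echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
gcc -O2 lazyfree_bench.c -o a.out	# lazyfree_bench.c: the program above
./a.out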
Signed-off-by: Barry Song
---
 mm/rmap.c | 46 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 365112af5291..3ef659310797 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1642,6 +1642,25 @@ void folio_remove_rmap_pmd(struct folio *folio, struct page *page,
 #endif
 }
 
+/* We support batch unmapping of PTEs for lazyfree large folios */
+static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
+			struct folio *folio, pte_t *ptep)
+{
+	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
+	int max_nr = folio_nr_pages(folio);
+	pte_t pte = ptep_get(ptep);
+
+	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
+		return false;
+	if (pte_none(pte) || pte_unused(pte) || !pte_present(pte))
+		return false;
+	if (pte_pfn(pte) != folio_pfn(folio))
+		return false;
+
+	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
+			       NULL, NULL) == max_nr;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
@@ -1655,6 +1674,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	bool anon_exclusive, ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
+	int nr_pages = 1;
 	unsigned long pfn;
 	unsigned long hsz = 0;
 
@@ -1780,6 +1800,15 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				hugetlb_vma_unlock_write(vma);
 			}
 			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+		} else if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
+			   can_batch_unmap_folio_ptes(address, folio, pvmw.pte)) {
+			nr_pages = folio_nr_pages(folio);
+			flush_cache_range(vma, range.start, range.end);
+			pteval = get_and_clear_full_ptes(mm, address, pvmw.pte, nr_pages, 0);
+			if (should_defer_flush(mm, flags))
+				set_tlb_ubc_flush_pending(mm, pteval, address, folio_size(folio));
+			else
+				flush_tlb_range(vma, range.start, range.end);
 		} else {
 			flush_cache_page(vma, address, pfn);
 			/* Nuke the page table entry. */
@@ -1875,7 +1904,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 * redirtied either using the page table or a previously
 				 * obtained GUP reference.
 				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
+				set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
 				folio_set_swapbacked(folio);
 				goto walk_abort;
 			} else if (ref_count != 1 + map_count) {
@@ -1888,10 +1917,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 * We'll come back here later and detect if the folio was
 				 * dirtied when the additional reference is gone.
 				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
+				set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
 				goto walk_abort;
 			}
-			dec_mm_counter(mm, MM_ANONPAGES);
+			add_mm_counter(mm, MM_ANONPAGES, -nr_pages);
 			goto discard;
 		}
 
@@ -1943,13 +1972,18 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			dec_mm_counter(mm, mm_counter_file(folio));
 	}
 discard:
-	if (unlikely(folio_test_hugetlb(folio)))
+	if (unlikely(folio_test_hugetlb(folio))) {
 		hugetlb_remove_rmap(folio);
-	else
-		folio_remove_rmap_pte(folio, subpage, vma);
+	} else {
+		folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
+		folio_ref_sub(folio, nr_pages - 1);
+	}
 	if (vma->vm_flags & VM_LOCKED)
 		mlock_drain_local();
 	folio_put(folio);
+	/* We have already batched the entire folio */
+	if (nr_pages > 1)
+		goto walk_done;
 	continue;
 walk_abort:
 	ret = false;
From patchwork Mon Jan 13 03:39:01 2025
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13936689
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: baolin.wang@linux.alibaba.com, chrisl@kernel.org, david@redhat.com,
 ioworker0@gmail.com, kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, ryan.roberts@arm.com, v-songbaohua@oppo.com,
 x86@kernel.org, linux-riscv@lists.infradead.org, ying.huang@intel.com,
 zhengtangquan@oppo.com, lorenzo.stoakes@oracle.com
Subject: [PATCH v2 4/4] mm: Avoid splitting pmd for lazyfree pmd-mapped THP in try_to_unmap
Date: Mon, 13 Jan 2025 16:39:01 +1300
Message-Id: <20250113033901.68951-5-21cnbao@gmail.com>
In-Reply-To: <20250113033901.68951-1-21cnbao@gmail.com>
References: <20250113033901.68951-1-21cnbao@gmail.com>

From: Barry Song

The try_to_unmap_one() function currently handles PMD-mapped THPs
inefficiently. It first splits the PMD into PTEs, copies the dirty state
from the PMD to the PTEs, iterates over the PTEs to locate the dirty
state, and then marks the THP as swap-backed. This process involves
unnecessary PMD splitting and redundant iteration.
Instead, this functionality can be efficiently managed in
__discard_anon_folio_pmd_locked(), avoiding the extra steps and improving
performance.

The following microbenchmark redirties folios after invoking MADV_FREE,
then measures the time taken to perform memory reclamation (which, for
these folios, merely marks them swapbacked again) on the redirtied
folios.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define SIZE 128*1024*1024 // 128 MB

int main(int argc, char *argv[])
{
	while(1) {
		volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		memset((void *)p, 1, SIZE);
		madvise((void *)p, SIZE, MADV_FREE);
		/* redirty after MADV_FREE */
		memset((void *)p, 1, SIZE);

		clock_t start_time = clock();
		madvise((void *)p, SIZE, MADV_PAGEOUT);
		clock_t end_time = clock();

		double elapsed_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
		printf("Time taken by reclamation: %f seconds\n", elapsed_time);

		munmap((void *)p, SIZE);
	}
	return 0;
}

Testing results are as below,

w/o patch:
~ # ./a.out
Time taken by reclamation: 0.007300 seconds
Time taken by reclamation: 0.007226 seconds
Time taken by reclamation: 0.007295 seconds
Time taken by reclamation: 0.007731 seconds
Time taken by reclamation: 0.007134 seconds
Time taken by reclamation: 0.007285 seconds
Time taken by reclamation: 0.007720 seconds
Time taken by reclamation: 0.007128 seconds
Time taken by reclamation: 0.007710 seconds
Time taken by reclamation: 0.007712 seconds
Time taken by reclamation: 0.007236 seconds
Time taken by reclamation: 0.007690 seconds
Time taken by reclamation: 0.007174 seconds
Time taken by reclamation: 0.007670 seconds
Time taken by reclamation: 0.007169 seconds
Time taken by reclamation: 0.007305 seconds
Time taken by reclamation: 0.007432 seconds
Time taken by reclamation: 0.007158 seconds
Time taken by reclamation: 0.007133 seconds
…

w/ patch:
~ # ./a.out
Time taken by reclamation: 0.002124 seconds
Time taken by reclamation: 0.002116 seconds
Time taken by reclamation: 0.002150 seconds
Time taken by reclamation: 0.002261 seconds
Time taken by reclamation: 0.002137 seconds
Time taken by reclamation: 0.002173 seconds
Time taken by reclamation: 0.002063 seconds
Time taken by reclamation: 0.002088 seconds
Time taken by reclamation: 0.002169 seconds
Time taken by reclamation: 0.002124 seconds
Time taken by reclamation: 0.002111 seconds
Time taken by reclamation: 0.002224 seconds
Time taken by reclamation: 0.002297 seconds
Time taken by reclamation: 0.002260 seconds
Time taken by reclamation: 0.002246 seconds
Time taken by reclamation: 0.002272 seconds
Time taken by reclamation: 0.002277 seconds
Time taken by reclamation: 0.002462 seconds
…

This patch significantly speeds up try_to_unmap_one() by allowing it to
skip redirtied THPs without splitting the PMD.
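This benchmark assumes PMD-mapped (2MiB) THP is in use; an example setup
(the file name is arbitrary):

echo always > /sys/kernel/mm/transparent_hugepage/enabled
gcc -O2 thp_lazyfree_bench.c -o a.out	# the program above
./a.out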
Suggested-by: Baolin Wang
Suggested-by: Lance Yang
Signed-off-by: Barry Song
---
 mm/huge_memory.c | 17 ++++++++++++++---
 mm/rmap.c        | 11 ++++++++++-
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3d3ebdc002d5..aea49f7125f1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3070,8 +3070,12 @@ static bool __discard_anon_folio_pmd_locked(struct vm_area_struct *vma,
 	int ref_count, map_count;
 	pmd_t orig_pmd = *pmdp;
 
-	if (folio_test_dirty(folio) || pmd_dirty(orig_pmd))
+	if (pmd_dirty(orig_pmd))
+		folio_set_dirty(folio);
+	if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+		folio_set_swapbacked(folio);
 		return false;
+	}
 
 	orig_pmd = pmdp_huge_clear_flush(vma, addr, pmdp);
 
@@ -3098,8 +3102,15 @@ static bool __discard_anon_folio_pmd_locked(struct vm_area_struct *vma,
 	 *
 	 * The only folio refs must be one from isolation plus the rmap(s).
 	 */
-	if (folio_test_dirty(folio) || pmd_dirty(orig_pmd) ||
-	    ref_count != map_count + 1) {
+	if (pmd_dirty(orig_pmd))
+		folio_set_dirty(folio);
+	if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+		folio_set_swapbacked(folio);
+		set_pmd_at(mm, addr, pmdp, orig_pmd);
+		return false;
+	}
+
+	if (ref_count != map_count + 1) {
 		set_pmd_at(mm, addr, pmdp, orig_pmd);
 		return false;
 	}
diff --git a/mm/rmap.c b/mm/rmap.c
index 3ef659310797..02c4e4b2cd7b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1671,7 +1671,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
 	pte_t pteval;
 	struct page *subpage;
-	bool anon_exclusive, ret = true;
+	bool anon_exclusive, lazyfree, ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
 	int nr_pages = 1;
@@ -1724,9 +1724,18 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		}
 
 		if (!pvmw.pte) {
+			lazyfree = folio_test_anon(folio) && !folio_test_swapbacked(folio);
+
 			if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd,
 						  folio))
 				goto walk_done;
+			/*
+			 * unmap_huge_pmd_locked has either already marked
+			 * the folio as swap-backed or decided to retain it
+			 * due to GUP or speculative references.
+			 */
+			if (lazyfree)
+				goto walk_abort;
 
 			if (flags & TTU_SPLIT_HUGE_PMD) {
 				/*