From patchwork Mon Jan 6 03:17:09 2025
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13926922
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org, x86@kernel.org,
 linux-kernel@vger.kernel.org, ioworker0@gmail.com, david@redhat.com,
 ryan.roberts@arm.com, zhengtangquan@oppo.com, ying.huang@intel.com,
 kasong@tencent.com, chrisl@kernel.org, baolin.wang@linux.alibaba.com,
 Barry Song
Subject: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in
 try_to_unmap_one
Date: Mon, 6 Jan 2025 16:17:09 +1300
Message-Id: <20250106031711.82855-2-21cnbao@gmail.com>
In-Reply-To: <20250106031711.82855-1-21cnbao@gmail.com>
References: <20250106031711.82855-1-21cnbao@gmail.com>

From: Barry Song

The refcount may be temporarily or long-term increased, but this does
not change the fundamental nature of the folio already being lazy-freed.
Therefore, we only reset 'swapbacked' when we are certain the folio is
dirty and not droppable.

Suggested-by: David Hildenbrand
Signed-off-by: Barry Song
---
 mm/rmap.c | 49 ++++++++++++++++++++++---------------------------
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7..de6b8c34e98c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1868,34 +1868,29 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 */
 				smp_rmb();
 
-				/*
-				 * The only page refs must be one from isolation
-				 * plus the rmap(s) (dropped by discard:).
-				 */
-				if (ref_count == 1 + map_count &&
-				    (!folio_test_dirty(folio) ||
-				     /*
-				      * Unlike MADV_FREE mappings, VM_DROPPABLE
-				      * ones can be dropped even if they've
-				      * been dirtied.
-				      */
-				     (vma->vm_flags & VM_DROPPABLE))) {
-					dec_mm_counter(mm, MM_ANONPAGES);
-					goto discard;
-				}
-
-				/*
-				 * If the folio was redirtied, it cannot be
-				 * discarded. Remap the page to page table.
-				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				/*
-				 * Unlike MADV_FREE mappings, VM_DROPPABLE ones
-				 * never get swap backed on failure to drop.
-				 */
-				if (!(vma->vm_flags & VM_DROPPABLE))
+				if (folio_test_dirty(folio) &&
+				    !(vma->vm_flags & VM_DROPPABLE)) {
+					/*
+					 * redirtied either using the page table or a previously
+					 * obtained GUP reference.
+					 */
+					set_pte_at(mm, address, pvmw.pte, pteval);
 					folio_set_swapbacked(folio);
-				goto walk_abort;
+					goto walk_abort;
+				} else if (ref_count != 1 + map_count) {
+					/*
+					 * Additional reference. Could be a GUP reference or any
+					 * speculative reference. GUP users must mark the folio
+					 * dirty if there was a modification. This folio cannot be
+					 * reclaimed right now either way, so act just like nothing
+					 * happened.
+					 * We'll come back here later and detect if the folio was
+					 * dirtied when the additional reference is gone.
+					 */
+					set_pte_at(mm, address, pvmw.pte, pteval);
+					goto walk_abort;
+				}
+				dec_mm_counter(mm, MM_ANONPAGES);
+				goto discard;
 			}
 
 			if (swap_duplicate(entry) < 0) {
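For background, a minimal userspace sketch of the MADV_FREE lifecycle that
this check arbitrates (not part of the patch; the mapping size and the
redirtying write are arbitrary illustration values):

#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2 * 1024 * 1024;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return 1;

	memset(buf, 1, len);          /* pages are now dirty anonymous memory */
	madvise(buf, len, MADV_FREE); /* lazyfree: marked clean, discardable */

	/*
	 * A later write redirties the pages. On reclaim,
	 * try_to_unmap_one() sees folio_test_dirty() and must restore
	 * the PTE and set swapbacked again instead of discarding.
	 */
	buf[0] = 2;

	munmap(buf, len);
	return 0;
}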
From patchwork Mon Jan 6 03:17:10 2025
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13926923
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org, x86@kernel.org,
 linux-kernel@vger.kernel.org, ioworker0@gmail.com, david@redhat.com,
 ryan.roberts@arm.com, zhengtangquan@oppo.com, ying.huang@intel.com,
 kasong@tencent.com, chrisl@kernel.org, baolin.wang@linux.alibaba.com,
 Barry Song, Catalin Marinas, Will Deacon, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
 Anshuman Khandual, Shaoqin Huang, Gavin Shan, Kefeng Wang,
 Mark Rutland, "Kirill A. Shutemov", Yosry Ahmed
Subject: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
Date: Mon, 6 Jan 2025 16:17:10 +1300
Message-Id: <20250106031711.82855-3-21cnbao@gmail.com>
In-Reply-To: <20250106031711.82855-1-21cnbao@gmail.com>
References: <20250106031711.82855-1-21cnbao@gmail.com>

From: Barry Song

This is a preparatory patch to support batch PTE unmapping in
`try_to_unmap_one`. It first introduces range handling for `tlbbatch`
flush. Currently, the range is always set to the size of PAGE_SIZE.

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: "H. Peter Anvin"
Cc: Anshuman Khandual
Cc: Ryan Roberts
Cc: Shaoqin Huang
Cc: Gavin Shan
Cc: Kefeng Wang
Cc: Mark Rutland
Cc: David Hildenbrand
Cc: Lance Yang
Cc: "Kirill A. Shutemov"
Cc: Yosry Ahmed
Signed-off-by: Barry Song
---
 arch/arm64/include/asm/tlbflush.h | 26 ++++++++++++++------------
 arch/arm64/mm/contpte.c           |  2 +-
 arch/x86/include/asm/tlbflush.h   |  3 ++-
 mm/rmap.c                         | 12 +++++++-----
 4 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index bc94e036a26b..f34e4fab5aa2 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -322,13 +322,6 @@ static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
 	return true;
 }
 
-static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
-					     struct mm_struct *mm,
-					     unsigned long uaddr)
-{
-	__flush_tlb_page_nosync(mm, uaddr);
-}
-
 /*
  * If mprotect/munmap/etc occurs during TLB batched flushing, we need to
  * synchronise all the TLBI issued with a DSB to avoid the race mentioned in
@@ -448,7 +441,7 @@ static inline bool __flush_tlb_range_limit_excess(unsigned long start,
 	return false;
 }
 
-static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
+static inline void __flush_tlb_range_nosync(struct mm_struct *mm,
 					    unsigned long start, unsigned long end,
 					    unsigned long stride, bool last_level,
 					    int tlb_level)
@@ -460,12 +453,12 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
 	pages = (end - start) >> PAGE_SHIFT;
 
 	if (__flush_tlb_range_limit_excess(start, end, pages, stride)) {
-		flush_tlb_mm(vma->vm_mm);
+		flush_tlb_mm(mm);
 		return;
 	}
 
 	dsb(ishst);
-	asid = ASID(vma->vm_mm);
+	asid = ASID(mm);
 
 	if (last_level)
 		__flush_tlb_range_op(vale1is, start, pages, stride, asid,
@@ -474,7 +467,7 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
 		__flush_tlb_range_op(vae1is, start, pages, stride, asid,
 				     tlb_level, true, lpa2_is_enabled());
 
-	mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end);
+	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
 }
 
 static inline void __flush_tlb_range(struct vm_area_struct *vma,
@@ -482,7 +475,7 @@
 			     unsigned long stride, bool last_level,
 			     int tlb_level)
 {
-	__flush_tlb_range_nosync(vma, start, end, stride,
+	__flush_tlb_range_nosync(vma->vm_mm, start, end, stride,
 				 last_level, tlb_level);
 	dsb(ish);
 }
@@ -533,6 +526,15 @@ static inline void __flush_tlb_kernel_pgtable(unsigned long kaddr)
 	dsb(ish);
 	isb();
 }
+
+static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
+					     struct mm_struct *mm,
+					     unsigned long uaddr,
+					     unsigned long size)
+{
+	__flush_tlb_range_nosync(mm, uaddr, uaddr + size,
+				 PAGE_SIZE, true, 3);
+}
 #endif
 
 #endif
diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
index 55107d27d3f8..bcac4f55f9c1 100644
--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -335,7 +335,7 @@ int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
 		 * eliding the trailing DSB applies here.
 		 */
 		addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
-		__flush_tlb_range_nosync(vma, addr, addr + CONT_PTE_SIZE,
+		__flush_tlb_range_nosync(vma->vm_mm, addr, addr + CONT_PTE_SIZE,
 					 PAGE_SIZE, true, 3);
 	}
 
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 69e79fff41b8..cda35f53f544 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -279,7 +279,8 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
 
 static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 					     struct mm_struct *mm,
-					     unsigned long uaddr)
+					     unsigned long uaddr,
+					     unsigned long size)
 {
 	inc_mm_tlb_gen(mm);
 	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
diff --git a/mm/rmap.c b/mm/rmap.c
index de6b8c34e98c..365112af5291 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -672,7 +672,8 @@ void try_to_unmap_flush_dirty(void)
 	(TLB_FLUSH_BATCH_PENDING_MASK / 2)
 
 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
-				      unsigned long uaddr)
+				      unsigned long uaddr,
+				      unsigned long size)
 {
 	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
 	int batch;
@@ -681,7 +682,7 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
 	if (!pte_accessible(mm, pteval))
 		return;
 
-	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr);
+	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
 	tlb_ubc->flush_required = true;
 
 	/*
@@ -757,7 +758,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm)
 }
 #else
 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
-				      unsigned long uaddr)
+				      unsigned long uaddr,
+				      unsigned long size)
 {
 }
 
@@ -1792,7 +1794,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			set_tlb_ubc_flush_pending(mm, pteval, address);
+			set_tlb_ubc_flush_pending(mm, pteval, address, PAGE_SIZE);
 		} else {
 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
@@ -2164,7 +2166,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			set_tlb_ubc_flush_pending(mm, pteval, address);
+			set_tlb_ubc_flush_pending(mm, pteval, address, PAGE_SIZE);
 		} else {
 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
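To make the plumbing concrete, a small standalone model of the interface
change (not kernel code; struct tlb_batch and the function body are
stand-ins, and only the idea of passing a per-call size mirrors the patch):

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* stand-in for struct arch_tlbflush_unmap_batch */
struct tlb_batch {
	unsigned long pending_calls;
	unsigned long pending_bytes;
};

/* mirrors the new signature: one call now covers [uaddr, uaddr + size) */
static void add_pending(struct tlb_batch *batch,
			unsigned long uaddr, unsigned long size)
{
	batch->pending_calls++;
	batch->pending_bytes += size;
}

int main(void)
{
	struct tlb_batch batch = { 0, 0 };
	unsigned long folio_size = 64 * 1024; /* one 64KiB mTHP */

	/* before this series: one pending entry per PTE */
	for (unsigned long off = 0; off < folio_size; off += PAGE_SIZE)
		add_pending(&batch, 0x100000 + off, PAGE_SIZE);
	printf("per-PTE: %lu calls, %lu bytes\n",
	       batch.pending_calls, batch.pending_bytes);

	/* after patch 3/3: a single entry covers the whole folio */
	batch = (struct tlb_batch){ 0, 0 };
	add_pending(&batch, 0x200000, folio_size);
	printf("batched: %lu calls, %lu bytes\n",
	       batch.pending_calls, batch.pending_bytes);
	return 0;
}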
From patchwork Mon Jan 6 03:17:11 2025
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13926924
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org, x86@kernel.org,
 linux-kernel@vger.kernel.org, ioworker0@gmail.com, david@redhat.com,
 ryan.roberts@arm.com, zhengtangquan@oppo.com, ying.huang@intel.com,
 kasong@tencent.com, chrisl@kernel.org, baolin.wang@linux.alibaba.com,
 Barry Song
Subject: [PATCH 3/3] mm: Support batched unmap for lazyfree large folios
 during reclamation
Date: Mon, 6 Jan 2025 16:17:11 +1300
Message-Id: <20250106031711.82855-4-21cnbao@gmail.com>
In-Reply-To: <20250106031711.82855-1-21cnbao@gmail.com>
References: <20250106031711.82855-1-21cnbao@gmail.com>

From: Barry Song

Currently, the PTEs and rmap of a large folio are removed one at a time.
This is not only slow but also causes the large folio to be unnecessarily
added to deferred_split, which can lead to races between the
deferred_split shrinker callback and memory reclamation.

This patch releases all PTEs and rmap entries in a batch. Currently, it
only handles lazyfree large folios.

The microbenchmark below tries to reclaim 128MB of lazyfree large folios
whose size is 64KiB:

 #include <stdio.h>
 #include <sys/mman.h>
 #include <string.h>
 #include <time.h>

 #define SIZE 128*1024*1024 // 128 MB

 unsigned long read_split_deferred()
 {
 	FILE *file = fopen("/sys/kernel/mm/transparent_hugepage"
 			"/hugepages-64kB/stats/split_deferred", "r");
 	if (!file) {
 		perror("Error opening file");
 		return 0;
 	}

 	unsigned long value;
 	if (fscanf(file, "%lu", &value) != 1) {
 		perror("Error reading value");
 		fclose(file);
 		return 0;
 	}

 	fclose(file);
 	return value;
 }

 int main(int argc, char *argv[])
 {
 	while (1) {
 		volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
 				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

 		memset((void *)p, 1, SIZE);
 		madvise((void *)p, SIZE, MADV_FREE);

 		clock_t start_time = clock();
 		unsigned long start_split = read_split_deferred();
 		madvise((void *)p, SIZE, MADV_PAGEOUT);
 		clock_t end_time = clock();
 		unsigned long end_split = read_split_deferred();

 		double elapsed_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
 		printf("Time taken by reclamation: %f seconds, split_deferred: %ld\n",
 			elapsed_time, end_split - start_split);

 		munmap((void *)p, SIZE);
 	}
 	return 0;
 }

w/o patch:
~ # ./a.out
Time taken by reclamation: 0.177418 seconds, split_deferred: 2048
Time taken by reclamation: 0.178348 seconds, split_deferred: 2048
Time taken by reclamation: 0.174525 seconds, split_deferred: 2048
Time taken by reclamation: 0.171620 seconds, split_deferred: 2048
Time taken by reclamation: 0.172241 seconds, split_deferred: 2048
Time taken by reclamation: 0.174003 seconds, split_deferred: 2048
Time taken by reclamation: 0.171058 seconds, split_deferred: 2048
Time taken by reclamation: 0.171993 seconds, split_deferred: 2048
Time taken by reclamation: 0.169829 seconds, split_deferred: 2048
Time taken by reclamation: 0.172895 seconds, split_deferred: 2048
Time taken by reclamation: 0.176063 seconds, split_deferred: 2048
Time taken by reclamation: 0.172568 seconds, split_deferred: 2048
Time taken by reclamation: 0.171185 seconds, split_deferred: 2048
Time taken by reclamation: 0.170632 seconds, split_deferred: 2048
Time taken by reclamation: 0.170208 seconds, split_deferred: 2048
Time taken by reclamation: 0.174192 seconds, split_deferred: 2048
...
w/ patch:
~ # ./a.out
Time taken by reclamation: 0.074231 seconds, split_deferred: 0
Time taken by reclamation: 0.071026 seconds, split_deferred: 0
Time taken by reclamation: 0.072029 seconds, split_deferred: 0
Time taken by reclamation: 0.071873 seconds, split_deferred: 0
Time taken by reclamation: 0.073573 seconds, split_deferred: 0
Time taken by reclamation: 0.071906 seconds, split_deferred: 0
Time taken by reclamation: 0.073604 seconds, split_deferred: 0
Time taken by reclamation: 0.075903 seconds, split_deferred: 0
Time taken by reclamation: 0.073191 seconds, split_deferred: 0
Time taken by reclamation: 0.071228 seconds, split_deferred: 0
Time taken by reclamation: 0.071391 seconds, split_deferred: 0
Time taken by reclamation: 0.071468 seconds, split_deferred: 0
Time taken by reclamation: 0.071896 seconds, split_deferred: 0
Time taken by reclamation: 0.072508 seconds, split_deferred: 0
Time taken by reclamation: 0.071884 seconds, split_deferred: 0
Time taken by reclamation: 0.072433 seconds, split_deferred: 0
Time taken by reclamation: 0.071939 seconds, split_deferred: 0
...

Signed-off-by: Barry Song
---
 mm/rmap.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 365112af5291..9424b96f8482 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1642,6 +1642,27 @@ void folio_remove_rmap_pmd(struct folio *folio, struct page *page,
 #endif
 }
 
+/* We support batch unmapping of PTEs for lazyfree large folios */
+static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
+			struct folio *folio, pte_t *ptep)
+{
+	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
+	int max_nr = folio_nr_pages(folio);
+	pte_t pte = ptep_get(ptep);
+
+	if (pte_none(pte))
+		return false;
+	if (!pte_present(pte))
+		return false;
+	if (!folio_test_anon(folio))
+		return false;
+	if (folio_test_swapbacked(folio))
+		return false;
+
+	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
+			       NULL, NULL) == max_nr;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
@@ -1655,6 +1676,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	bool anon_exclusive, ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
+	int nr_pages = 1;
 	unsigned long pfn;
 	unsigned long hsz = 0;
 
@@ -1780,6 +1802,15 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				hugetlb_vma_unlock_write(vma);
 			}
 			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+		} else if (folio_test_large(folio) &&
+			   can_batch_unmap_folio_ptes(address, folio, pvmw.pte)) {
+			nr_pages = folio_nr_pages(folio);
+			flush_cache_range(vma, range.start, range.end);
+			pteval = get_and_clear_full_ptes(mm, address, pvmw.pte, nr_pages, 0);
+			if (should_defer_flush(mm, flags))
+				set_tlb_ubc_flush_pending(mm, pteval, address, folio_size(folio));
+			else
+				flush_tlb_range(vma, range.start, range.end);
 		} else {
 			flush_cache_page(vma, address, pfn);
 			/* Nuke the page table entry. */
@@ -1875,7 +1906,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 * redirtied either using the page table or a previously
 				 * obtained GUP reference.
 				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
+				set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
 				folio_set_swapbacked(folio);
 				goto walk_abort;
 			} else if (ref_count != 1 + map_count) {
@@ -1888,10 +1919,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 * We'll come back here later and detect if the folio was
 				 * dirtied when the additional reference is gone.
 				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
+				set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
 				goto walk_abort;
 			}
-			dec_mm_counter(mm, MM_ANONPAGES);
+			add_mm_counter(mm, MM_ANONPAGES, -nr_pages);
 			goto discard;
 		}
 
@@ -1943,13 +1974,18 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			dec_mm_counter(mm, mm_counter_file(folio));
 		}
discard:
-		if (unlikely(folio_test_hugetlb(folio)))
+		if (unlikely(folio_test_hugetlb(folio))) {
 			hugetlb_remove_rmap(folio);
-		else
-			folio_remove_rmap_pte(folio, subpage, vma);
+		} else {
+			folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
+			folio_ref_sub(folio, nr_pages - 1);
+		}
 		if (vma->vm_flags & VM_LOCKED)
 			mlock_drain_local();
 		folio_put(folio);
+		/* We have already batched the entire folio */
+		if (nr_pages > 1)
+			goto walk_done;
 		continue;
walk_abort:
 		ret = false;
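To summarize the gating logic, a minimal userspace model of the contract
that folio_pte_batch() must confirm before the batch path is taken
(struct pte and can_batch() here are illustrative stand-ins; the real
helper also honours flags such as FPB_IGNORE_DIRTY):

#include <stdbool.h>
#include <stdio.h>

/* stand-in PTE: just a present bit and a pfn */
struct pte { bool present; unsigned long pfn; };

/*
 * Model of the folio_pte_batch() == max_nr test: batching is only safe
 * when every PTE is present and the pfns run consecutively from the
 * folio's first page, i.e. the whole folio is mapped contiguously.
 */
static bool can_batch(const struct pte *ptes, int max_nr)
{
	for (int i = 0; i < max_nr; i++)
		if (!ptes[i].present || ptes[i].pfn != ptes[0].pfn + i)
			return false;
	return true;
}

int main(void)
{
	struct pte folio16[16];

	for (int i = 0; i < 16; i++)
		folio16[i] = (struct pte){ .present = true, .pfn = 0x1000 + i };

	printf("fully mapped: %d\n", can_batch(folio16, 16));    /* 1 */

	folio16[7].present = false; /* e.g. one page already unmapped */
	printf("partially mapped: %d\n", can_batch(folio16, 16)); /* 0 */
	return 0;
}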