From patchwork Mon Apr 22 05:52:13 2024
X-Patchwork-Submitter: Lance Yang
X-Patchwork-Id: 13637657
From: Lance Yang <ioworker0@gmail.com>
To: akpm@linux-foundation.org
Cc: willy@infradead.org, maskray@google.com, ziy@nvidia.com,
    ryan.roberts@arm.com, david@redhat.com, 21cnbao@gmail.com,
    mhocko@suse.com, fengwei.yin@intel.com, zokeefe@google.com,
    shy828301@gmail.com, xiehuan09@gmail.com, wangkefeng.wang@huawei.com,
    songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Lance Yang <ioworker0@gmail.com>
Subject: [PATCH v2 1/1] mm/vmscan: avoid split PMD-mapped THP during shrink_folio_list()
Date: Mon, 22 Apr 2024 13:52:13 +0800
Message-Id: <20240422055213.60231-1-ioworker0@gmail.com>
X-Mailer: git-send-email 2.33.1
MIME-Version: 1.0

When the user no longer requires the pages, they would use
madvise(MADV_FREE) to mark the pages as lazy free. Typically, they would
not write to the given range again.

At present, PMD-mapped THPs that are marked as lazyfree are unconditionally
split during shrink_folio_list(), which may be unnecessary. If the THP and
its PMD entry are both clean, and there are no unexpected references, we
can instead attempt to remove the PMD mapping directly. This change
improves the efficiency of memory reclamation in this case.

On an Intel i5 CPU, reclaiming 1GiB of PMD-mapped THPs using
mem_cgroup_force_empty() results in the following runtimes in seconds
(shorter is better):

---------------------------------------------
|     Old      |     New      |   Change    |
---------------------------------------------
|   0.683426   |   0.049197   |   -92.80%   |
---------------------------------------------

Signed-off-by: Lance Yang <ioworker0@gmail.com>
---

v1 -> v2:
- Update the changelog
- Follow the exact same logic as in try_to_unmap_one() (per David Hildenbrand)
- Remove the extra code from rmap.c (per Matthew Wilcox)
- https://lore.kernel.org/linux-mm/20240417141111.77855-1-ioworker0@gmail.com

 include/linux/huge_mm.h |  2 +
 include/linux/rmap.h    |  2 +
 mm/huge_memory.c        | 88 +++++++++++++++++++++++++++++++++++++++++
 mm/rmap.c               |  6 +++
 mm/vmscan.c             |  7 ++++
 5 files changed, 105 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7cd07b83a3d0..56c7ea73090b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -36,6 +36,8 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		    pmd_t *pmd, unsigned long addr, pgprot_t newprot,
 		    unsigned long cp_flags);
+bool discard_trans_pmd(struct vm_area_struct *vma, unsigned long addr,
+		       struct folio *folio);
 
 vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write);
 vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write);
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 0f906dc6d280..670218f762c8 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -100,6 +100,8 @@ enum ttu_flags {
 					 * do a final flush if necessary */
 	TTU_RMAP_LOCKED		= 0x80,	/* do not grab rmap lock:
 					 * caller holds it */
+	TTU_LAZYFREE_THP	= 0x100, /* avoid splitting PMD-mapped THPs
+					  * that are marked as lazyfree. */
 };
 
 #ifdef CONFIG_MMU
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 824eff9211db..63de1445feab 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1810,6 +1810,94 @@ static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
 	mm_dec_nr_ptes(mm);
 }
 
+bool discard_trans_pmd(struct vm_area_struct *vma, unsigned long addr,
+		       struct folio *folio)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_notifier_range range;
+	int ref_count, map_count;
+	struct mmu_gather tlb;
+	pmd_t *pmdp, orig_pmd;
+	struct page *page;
+	bool ret = false;
+	spinlock_t *ptl;
+
+	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+	VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
+	VM_WARN_ON_FOLIO(folio_test_swapbacked(folio), folio);
+	VM_WARN_ON_FOLIO(!folio_test_pmd_mappable(folio), folio);
+
+	/* Perform best-effort early checks before acquiring the PMD lock */
+	if (folio_ref_count(folio) != folio_mapcount(folio) + 1 ||
+	    folio_test_dirty(folio))
+		return false;
+
+	pmdp = mm_find_pmd(mm, addr);
+	if (unlikely(!pmdp))
+		return false;
+	if (pmd_dirty(*pmdp))
+		return false;
+
+	tlb_gather_mmu(&tlb, mm);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
+				addr & HPAGE_PMD_MASK,
+				(addr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE);
+	mmu_notifier_invalidate_range_start(&range);
+
+	ptl = pmd_lock(mm, pmdp);
+	orig_pmd = *pmdp;
+	if (unlikely(!pmd_present(orig_pmd) || !pmd_trans_huge(orig_pmd)))
+		goto out;
+
+	page = pmd_page(orig_pmd);
+	if (unlikely(page_folio(page) != folio))
+		goto out;
+
+	orig_pmd = pmdp_huge_get_and_clear(mm, addr, pmdp);
+	tlb_remove_pmd_tlb_entry(&tlb, pmdp, addr);
+
+	/*
+	 * Syncing against concurrent GUP-fast:
+	 * - clear PMD; barrier; read refcount
+	 * - inc refcount; barrier; read PMD
+	 */
+	smp_mb();
+
+	ref_count = folio_ref_count(folio);
+	map_count = folio_mapcount(folio);
+
+	/*
+	 * Order reads for folio refcount and dirty flag
+	 * (see comments in __remove_mapping()).
+	 */
+	smp_rmb();
+
+	/*
+	 * If the PMD or folio is redirtied at this point, or if there are
+	 * unexpected references, we will give up to discard this folio
+	 * and remap it.
+	 *
+	 * The only folio refs must be one from isolation plus the rmap(s).
+	 */
+	if (ref_count != map_count + 1 || folio_test_dirty(folio) ||
+	    pmd_dirty(orig_pmd)) {
+		set_pmd_at(mm, addr, pmdp, orig_pmd);
+		goto out;
+	}
+
+	folio_remove_rmap_pmd(folio, page, vma);
+	zap_deposited_table(mm, pmdp);
+	add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+	folio_put(folio);
+	ret = true;
+
+out:
+	spin_unlock(ptl);
+	mmu_notifier_invalidate_range_end(&range);
+
+	return ret;
+}
+
 int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 pmd_t *pmd, unsigned long addr)
 {
diff --git a/mm/rmap.c b/mm/rmap.c
index 2608c40dffad..a7913a454028 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1631,6 +1631,12 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	if (flags & TTU_SYNC)
 		pvmw.flags = PVMW_SYNC;
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	if (flags & TTU_LAZYFREE_THP)
+		if (discard_trans_pmd(vma, address, folio))
+			return true;
+#endif
+
 	if (flags & TTU_SPLIT_HUGE_PMD)
 		split_huge_pmd_address(vma, address, false, folio);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 49bd94423961..e2686cc0c037 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1277,6 +1277,13 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			if (folio_test_pmd_mappable(folio))
 				flags |= TTU_SPLIT_HUGE_PMD;
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+			if (folio_test_anon(folio) && !was_swapbacked &&
+			    (flags & TTU_SPLIT_HUGE_PMD))
+				flags |= TTU_LAZYFREE_THP;
+#endif
+
 			/*
 			 * Without TTU_SYNC, try_to_unmap will only begin to
 			 * hold PTL from the first present PTE within a large
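
For readers less familiar with the lazyfree path this patch targets, the
fragment below is a minimal, illustrative userspace sketch (not part of the
patch) of the madvise(MADV_FREE) usage pattern described in the changelog.
The 2 MiB region size, the MADV_HUGEPAGE hint, and the lack of explicit PMD
alignment are simplifying assumptions; a real reproducer would align the
range to the PMD size so it can actually be backed by a PMD-mapped THP.

/* Illustrative only: mark a (hopefully) THP-backed anonymous range lazyfree. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define REGION_SIZE (2UL << 20)	/* assumed PMD size: 2 MiB on x86-64 */

int main(void)
{
	/* Private anonymous mapping; eligible for anonymous THP. */
	char *buf = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Hint that this range should be backed by transparent huge pages. */
	madvise(buf, REGION_SIZE, MADV_HUGEPAGE);

	/* Touch the range so it is actually populated. */
	memset(buf, 0x5a, REGION_SIZE);

	/*
	 * The data is no longer needed: mark it lazy free.  Reclaim may now
	 * drop these clean anonymous pages without swapping them out, and
	 * the memory is only reinstated if the process writes to it again.
	 */
	if (madvise(buf, REGION_SIZE, MADV_FREE))
		perror("madvise(MADV_FREE)");

	munmap(buf, REGION_SIZE);
	return 0;
}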