From patchwork Fri Jul 21 09:40:43 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yin Fengwei
X-Patchwork-Id: 13321719
From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, minchan@kernel.org, yuzhao@google.com,
	willy@infradead.org, david@redhat.com, ryan.roberts@arm.com,
	shy828301@gmail.com
Cc: fengwei.yin@intel.com
Subject: [RFC PATCH v2 4/4] madvise: avoid trying to split large folio always
 in cold_pageout
Date: Fri, 21 Jul 2023 17:40:43 +0800
Message-Id: <20230721094043.2506691-5-fengwei.yin@intel.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230721094043.2506691-1-fengwei.yin@intel.com>
References: <20230721094043.2506691-1-fengwei.yin@intel.com>
MIME-Version: 1.0
Current madvise_cold_or_pageout_pte_range() always tries to split
large folios. Avoid always trying to split a large folio by:
- If the large folio is fully within the request range, don't split
  it. Leave it to page reclaim to decide whether the large folio
  needs to be split.
- If the large folio crosses the boundaries of the request range,
  skip it if it's page cache. Try to split it if it's an anonymous
  large folio; if splitting fails, just skip it.

Invoke folio_referenced() to clear the A (accessed) bit for the large
folio. As it will acquire the pte lock, do it after releasing the
pte lock.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/internal.h |  10 +++++
 mm/madvise.c  | 118 +++++++++++++++++++++++++++++++++++---------------
 2 files changed, 93 insertions(+), 35 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index c7dd15d8de3e..cd1ff348d690 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -586,6 +586,16 @@ extern long faultin_vma_page_range(struct vm_area_struct *vma,
 extern bool mlock_future_ok(struct mm_struct *mm, unsigned long flags,
 			    unsigned long bytes);
 
+static inline unsigned int
+folio_op_size(struct folio *folio, pte_t pte,
+		unsigned long addr, unsigned long end)
+{
+	unsigned int nr;
+
+	nr = folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte);
+	return min_t(unsigned int, nr, (end - addr) >> PAGE_SHIFT);
+}
+
 static inline bool
 folio_in_range(struct folio *folio, struct vm_area_struct *vma,
 		unsigned long start, unsigned long end)
diff --git a/mm/madvise.c b/mm/madvise.c
index b236e201a738..71af370c3251 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -339,6 +339,23 @@ static inline bool can_do_file_pageout(struct vm_area_struct *vma)
 	       file_permission(vma->vm_file, MAY_WRITE) == 0;
 }
 
+static inline bool skip_cur_entry(struct folio *folio, bool pageout_anon_only)
+{
+	if (!folio)
+		return true;
+
+	if (folio_is_zone_device(folio))
+		return true;
+
+	if (!folio_test_lru(folio))
+		return true;
+
+	if (pageout_anon_only && !folio_test_anon(folio))
+		return true;
+
+	return false;
+}
+
 static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 				unsigned long addr, unsigned long end,
 				struct mm_walk *walk)
@@ -352,7 +369,9 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	spinlock_t *ptl;
 	struct folio *folio = NULL;
 	LIST_HEAD(folio_list);
+	LIST_HEAD(reclaim_list);
 	bool pageout_anon_only_filter;
+	unsigned long start = addr;
 
 	if (fatal_signal_pending(current))
 		return -EINTR;
@@ -442,54 +461,90 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 			continue;
 
 		folio = vm_normal_folio(vma, addr, ptent);
-		if (!folio || folio_is_zone_device(folio))
+		if (skip_cur_entry(folio, pageout_anon_only_filter))
 			continue;
 
 		/*
-		 * Creating a THP page is expensive so split it only if we
-		 * are sure it's worth. Split it if we are only owner.
+		 * Split large folio only if it's anonymous, crosses the
+		 * boundaries of the request range and we are likely the
+		 * only owner.
 		 */
 		if (folio_test_large(folio)) {
-			int err;
+			int err, step;
 
 			if (folio_estimated_sharers(folio) != 1)
-				break;
-			if (pageout_anon_only_filter && !folio_test_anon(folio))
-				break;
-			if (!folio_trylock(folio))
-				break;
+				continue;
+			if (folio_in_range(folio, vma, start, end))
+				goto pageout_cold_folio;
+			if (!folio_test_anon(folio) || !folio_trylock(folio))
+				continue;
+
 			folio_get(folio);
+			step = folio_op_size(folio, ptent, addr, end);
 			arch_leave_lazy_mmu_mode();
 			pte_unmap_unlock(start_pte, ptl);
 			start_pte = NULL;
 			err = split_folio(folio);
 			folio_unlock(folio);
 			folio_put(folio);
-			if (err)
-				break;
+
 			start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
 			if (!start_pte)
 				break;
 			arch_enter_lazy_mmu_mode();
-			pte--;
-			addr -= PAGE_SIZE;
-			continue;
-		}
-
-		/*
-		 * Do not interfere with other mappings of this folio and
-		 * non-LRU folio.
-		 */
-		if (!folio_test_lru(folio) || folio_mapcount(folio) != 1)
+			/* split success. retry the same entry */
+			if (!err)
+				step = 0;
+
+			/*
+			 * Split failed; jump over the whole folio to avoid
+			 * grabbing the same folio and failing to split it
+			 * again and again.
+			 */
+			pte += step - 1;
+			addr += (step - 1) << PAGE_SHIFT;
 			continue;
+		}
 
-		if (pageout_anon_only_filter && !folio_test_anon(folio))
+		/* Do not interfere with other mappings of this folio */
+		if (folio_mapcount(folio) != 1)
 			continue;
 
 		VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
 
-		ptep_clear_flush_young_notify(vma, addr, pte);
+
+pageout_cold_folio:
+		if (folio_isolate_lru(folio)) {
+			if (folio_test_unevictable(folio))
+				folio_putback_lru(folio);
+			else
+				list_add(&folio->lru, &folio_list);
+		}
+	}
+
+	if (start_pte) {
+		arch_leave_lazy_mmu_mode();
+		pte_unmap_unlock(start_pte, ptl);
+	}
+
+	while (!list_empty(&folio_list)) {
+		folio = lru_to_folio(&folio_list);
+		list_del(&folio->lru);
+
+		if (folio_test_large(folio)) {
+			int refs;
+			unsigned long flags;
+			struct mem_cgroup *memcg = folio_memcg(folio);
+
+			refs = folio_referenced(folio, 0, memcg, &flags);
+			if ((flags & VM_LOCKED) || (refs == -1)) {
+				folio_putback_lru(folio);
+				continue;
+			}
+		}
+
 		/*
 		 * We are deactivating a folio for accelerating reclaiming.
 		 * VM couldn't reclaim the folio unless we clear PG_young.
@@ -501,22 +556,15 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		if (folio_test_active(folio))
 			folio_set_workingset(folio);
 		if (pageout) {
-			if (folio_isolate_lru(folio)) {
-				if (folio_test_unevictable(folio))
-					folio_putback_lru(folio);
-				else
-					list_add(&folio->lru, &folio_list);
-			}
-		} else
-			folio_deactivate(folio);
+			list_add(&folio->lru, &reclaim_list);
+		} else {
+			folio_clear_active(folio);
+			folio_putback_lru(folio);
+		}
 	}
 
-	if (start_pte) {
-		arch_leave_lazy_mmu_mode();
-		pte_unmap_unlock(start_pte, ptl);
-	}
 	if (pageout)
-		reclaim_pages(&folio_list);
+		reclaim_pages(&reclaim_list);
 	cond_resched();
 
 	return 0;