From patchwork Fri Jul 29 01:40:39 2022
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12931861
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Huang Ying, peterx@redhat.com, Andrea Arcangeli, Andrew Morton, "Kirill A . Shutemov", Nadav Amit, Hugh Dickins, David Hildenbrand, Vlastimil Babka
Subject: [PATCH RFC 2/4] mm: Remember young bit for page migrations
Date: Thu, 28 Jul 2022 21:40:39 -0400
Message-Id: <20220729014041.21292-3-peterx@redhat.com>
In-Reply-To: <20220729014041.21292-1-peterx@redhat.com>
References: <20220729014041.21292-1-peterx@redhat.com>
When page migration happens, we always ignore the young bit settings in the old pgtable and mark the page as old in the new page table using either pte_mkold() or pmd_mkold(). That is fine functionally, but it is not friendly to page reclaim, because the page being migrated can be actively accessed during the procedure.

Actually we can easily remember the young bit configuration and recover that information once the page is migrated. To achieve this, define a new bit in the migration swap offset field to record whether the old pte had the young bit set. Then, when removing/recovering the migration entry, we can restore the young bit even though the page has changed.

One thing to mention is that the whole feature relies on an arch-specific macro, __ARCH_SWP_OFFSET_BITS, which needs to be defined per-arch. The macro tells how many bits are available for the arch-specific swp offset field. When that macro is not defined, we assume there are no free bits in the migration swap entry offset, so we cannot persist the young bit.

As of now there should be no functional change at all with this patch, since no arch has defined __ARCH_SWP_OFFSET_BITS yet.
Signed-off-by: Peter Xu
---
 include/linux/swapops.h | 57 +++++++++++++++++++++++++++++++++++++++++
 mm/huge_memory.c        | 10 ++++++--
 mm/migrate.c            |  4 ++-
 mm/migrate_device.c     |  2 ++
 mm/rmap.c               |  3 ++-
 5 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 5378f77860fb..3bbb57aa6742 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -31,6 +31,28 @@
 #define SWP_PFN_BITS			(MAX_PHYSMEM_BITS - PAGE_SHIFT)
 #define SWP_PFN_MASK			((1UL << SWP_PFN_BITS) - 1)
 
+#ifdef __ARCH_SWP_OFFSET_BITS
+#define SWP_PFN_OFFSET_FREE_BITS	(__ARCH_SWP_OFFSET_BITS - SWP_PFN_BITS)
+#else
+/*
+ * If __ARCH_SWP_OFFSET_BITS not defined, assuming we don't have free bits
+ * to be on the safe side.
+ */
+#define SWP_PFN_OFFSET_FREE_BITS	0
+#endif
+
+/**
+ * Migration swap entry specific bitfield definitions.
+ *
+ * @SWP_MIG_YOUNG_BIT: Whether the page used to have young bit set
+ *
+ * Note: these bits will be used only if there're free bits in arch
+ * specific swp offset field.  Arch needs __ARCH_SWP_OFFSET_BITS defined
+ * to use the bits/features.
+ */
+#define SWP_MIG_YOUNG_BIT		(1UL << SWP_PFN_BITS)
+#define SWP_MIG_OFFSET_BITS		(SWP_PFN_BITS + 1)
+
 /* Clear all flags but only keep swp_entry_t related information */
 static inline pte_t pte_swp_clear_flags(pte_t pte)
 {
@@ -258,6 +280,30 @@ static inline swp_entry_t make_writable_migration_entry(pgoff_t offset)
 	return swp_entry(SWP_MIGRATION_WRITE, offset);
 }
 
+static inline swp_entry_t make_migration_entry_young(swp_entry_t entry)
+{
+	/*
+	 * Due to a limitation on x86_64 we can't use #ifdef, as
+	 * SWP_PFN_OFFSET_FREE_BITS value can be changed dynamically for
+	 * 4/5 level pgtables.  For all the non-x86_64 archs (where the
+	 * macro MAX_PHYSMEM_BITS is constant) this branching should be
+	 * optimized out by the compiler.
+	 */
+	if (SWP_PFN_OFFSET_FREE_BITS)
+		return swp_entry(swp_type(entry),
+				 swp_offset(entry) | SWP_MIG_YOUNG_BIT);
+	return entry;
+}
+
+static inline bool is_migration_entry_young(swp_entry_t entry)
+{
+	/* Please refer to comment in make_migration_entry_young() */
+	if (SWP_PFN_OFFSET_FREE_BITS)
+		return swp_offset(entry) & SWP_MIG_YOUNG_BIT;
+	/* Keep the old behavior of aging page after migration */
+	return false;
+}
+
 extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 					spinlock_t *ptl);
 extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
@@ -304,6 +350,16 @@ static inline int is_readable_migration_entry(swp_entry_t entry)
 	return 0;
 }
 
+static inline swp_entry_t make_migration_entry_young(swp_entry_t entry)
+{
+	return entry;
+}
+
+static inline bool is_migration_entry_young(swp_entry_t entry)
+{
+	return false;
+}
+
 #endif
 
 typedef unsigned long pte_marker;
@@ -407,6 +463,7 @@ static inline bool is_pfn_swap_entry(swp_entry_t entry)
 {
 	/* Make sure the swp offset can always store the needed fields */
 	BUILD_BUG_ON(SWP_TYPE_SHIFT < SWP_PFN_BITS);
+	BUILD_BUG_ON(SWP_TYPE_SHIFT < SWP_MIG_OFFSET_BITS);
 
 	return is_migration_entry(entry) || is_device_private_entry(entry) ||
 	       is_device_exclusive_entry(entry);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 29e3628687a6..131fe5754d8f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2088,7 +2088,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		write = is_writable_migration_entry(entry);
 		if (PageAnon(page))
 			anon_exclusive = is_readable_exclusive_migration_entry(entry);
-		young = false;
+		young = is_migration_entry_young(entry);
 		soft_dirty = pmd_swp_soft_dirty(old_pmd);
 		uffd_wp = pmd_swp_uffd_wp(old_pmd);
 	} else {
@@ -2146,6 +2146,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			else
 				swp_entry = make_readable_migration_entry(
 							page_to_pfn(page + i));
+			if (young)
+				swp_entry = make_migration_entry_young(swp_entry);
 			entry = swp_entry_to_pte(swp_entry);
 			if (soft_dirty)
 				entry = pte_swp_mksoft_dirty(entry);
@@ -3148,6 +3150,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
 		entry = make_readable_exclusive_migration_entry(page_to_pfn(page));
 	else
 		entry = make_readable_migration_entry(page_to_pfn(page));
+	if (pmd_young(pmdval))
+		entry = make_migration_entry_young(entry);
 	pmdswp = swp_entry_to_pmd(entry);
 	if (pmd_soft_dirty(pmdval))
 		pmdswp = pmd_swp_mksoft_dirty(pmdswp);
@@ -3173,13 +3177,15 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
 	entry = pmd_to_swp_entry(*pvmw->pmd);
 	get_page(new);
-	pmde = pmd_mkold(mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot)));
+	pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot));
 	if (pmd_swp_soft_dirty(*pvmw->pmd))
 		pmde = pmd_mksoft_dirty(pmde);
 	if (is_writable_migration_entry(entry))
 		pmde = maybe_pmd_mkwrite(pmde, vma);
 	if (pmd_swp_uffd_wp(*pvmw->pmd))
 		pmde = pmd_wrprotect(pmd_mkuffd_wp(pmde));
+	if (!is_migration_entry_young(entry))
+		pmde = pmd_mkold(pmde);
 
 	if (PageAnon(new)) {
 		rmap_t rmap_flags = RMAP_COMPOUND;
diff --git a/mm/migrate.c b/mm/migrate.c
index 1649270bc1a7..62cb3a9451de 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -199,7 +199,7 @@ static bool remove_migration_pte(struct folio *folio,
 #endif
 
 		folio_get(folio);
-		pte = pte_mkold(mk_pte(new, READ_ONCE(vma->vm_page_prot)));
+		pte = mk_pte(new, READ_ONCE(vma->vm_page_prot));
 		if (pte_swp_soft_dirty(*pvmw.pte))
 			pte = pte_mksoft_dirty(pte);
 
@@ -207,6 +207,8 @@ static bool remove_migration_pte(struct folio *folio,
 		 * Recheck VMA as permissions can change since migration started
 		 */
 		entry = pte_to_swp_entry(*pvmw.pte);
+		if (!is_migration_entry_young(entry))
+			pte = pte_mkold(pte);
 		if (is_writable_migration_entry(entry))
 			pte = maybe_mkwrite(pte, vma);
 		else if (pte_swp_uffd_wp(*pvmw.pte))
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 7feeb447e3b9..fd8daf45c1a6 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -221,6 +221,8 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			else
 				entry = make_readable_migration_entry(
 					page_to_pfn(page));
+			if (pte_young(pte))
+				entry = make_migration_entry_young(entry);
 			swp_pte = swp_entry_to_pte(entry);
 			if (pte_present(pte)) {
 				if (pte_soft_dirty(pte))
diff --git a/mm/rmap.c b/mm/rmap.c
index af775855e58f..605fb37ae95e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2065,7 +2065,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			else
 				entry = make_readable_migration_entry(
 						page_to_pfn(subpage));
-
+			if (pte_young(pteval))
+				entry = make_migration_entry_young(entry);
 			swp_pte = swp_entry_to_pte(entry);
 			if (pte_soft_dirty(pteval))
 				swp_pte = pte_swp_mksoft_dirty(swp_pte);