From patchwork Wed Oct 10 07:19:23 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10634137
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V6 20/21] swap: create PMD swap mapping when unmap the THP
Date: Wed, 10 Oct 2018 15:19:23 +0800
Message-Id: <20181010071924.18767-21-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

This is the final step of the THP swapin support.  When reclaiming an
anonymous THP, after allocating a huge swap cluster and adding the THP
into the swap cache, the PMD page mapping is changed into a mapping to
the swap space.  Previously, the PMD page mapping was split before being
changed.  In this patch, the unmap code is enhanced not to split the PMD
mapping, but to create a PMD swap mapping to replace it instead.

So later, when the SWAP_HAS_CACHE flag is cleared in the last step of
swapout, the huge swap cluster will be kept instead of being split, and
during swapin the huge swap cluster will be read in one piece into a
THP.  That is, the THP will not be split during swapout/swapin.  This
eliminates the splitting/collapsing overhead, reduces the page fault
count, etc.  More importantly, THP utilization is greatly improved: many
more THPs will be kept when swapping is used, so that we can take full
advantage of THP, including its high swapout/swapin performance.
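To make the swapin side concrete, below is a minimal sketch (illustration
only, not part of this patch) of how a later page fault recognizes the PMD
swap mapping installed by the unmap code below.  It assumes the
is_swap_pmd()/is_pmd_migration_entry()/pmd_migration_entry_wait() helpers
that already exist for THP migration, plus the do_huge_pmd_swap_page()
entry point added earlier in this series; the wrapper function name is
made up for the example.

/*
 * Illustration only: dispatching on a non-present PMD in the fault path.
 * Relevant declarations live in <linux/huge_mm.h> and <linux/swapops.h>.
 */
static int handle_swap_pmd_example(struct vm_fault *vmf, pmd_t orig_pmd)
{
	if (!is_swap_pmd(orig_pmd))
		return VM_FAULT_FALLBACK;	/* not a non-present swap PMD */

	if (is_pmd_migration_entry(orig_pmd)) {
		/* THP migration entry: just wait for migration to finish. */
		pmd_migration_entry_wait(vmf->vma->vm_mm, vmf->pmd);
		return 0;
	}

	/* Huge swap entry: read the whole swap cluster back as one THP. */
	return do_huge_pmd_swap_page(vmf, orig_pmd);
}
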
Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 include/linux/huge_mm.h | 11 +++++++++++
 mm/huge_memory.c        | 30 ++++++++++++++++++++++++++++++
 mm/rmap.c               | 43 ++++++++++++++++++++++++++++++++++++++++++-
 mm/vmscan.c             |  6 +-----
 4 files changed, 84 insertions(+), 6 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e573774f9014..f6370e8c7742 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -375,6 +375,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+struct page_vma_mapped_walk;
+
 #ifdef CONFIG_THP_SWAP
 extern void __split_huge_swap_pmd(struct vm_area_struct *vma,
 				  unsigned long haddr,
@@ -382,6 +384,8 @@ extern void __split_huge_swap_pmd(struct vm_area_struct *vma,
 extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			       unsigned long address, pmd_t orig_pmd);
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+extern bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw,
+			       struct page *page, unsigned long address, pmd_t pmdval);
 
 static inline bool transparent_hugepage_swapin_enabled(
 	struct vm_area_struct *vma)
@@ -423,6 +427,13 @@ static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	return 0;
 }
 
+static inline bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw,
+				      struct page *page, unsigned long address,
+				      pmd_t pmdval)
+{
+	return false;
+}
+
 static inline bool transparent_hugepage_swapin_enabled(
 	struct vm_area_struct *vma)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index abefc50b08b7..87795529c547 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1931,6 +1931,36 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	count_vm_event(THP_SWPIN_FALLBACK);
 	goto fallback;
 }
+
+bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, struct page *page,
+			unsigned long address, pmd_t pmdval)
+{
+	struct vm_area_struct *vma = pvmw->vma;
+	struct mm_struct *mm = vma->vm_mm;
+	pmd_t swp_pmd;
+	swp_entry_t entry = { .val = page_private(page) };
+
+	if (swap_duplicate(&entry, HPAGE_PMD_NR) < 0) {
+		set_pmd_at(mm, address, pvmw->pmd, pmdval);
+		return false;
+	}
+	if (list_empty(&mm->mmlist)) {
+		spin_lock(&mmlist_lock);
+		if (list_empty(&mm->mmlist))
+			list_add(&mm->mmlist, &init_mm.mmlist);
+		spin_unlock(&mmlist_lock);
+	}
+	add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+	add_mm_counter(mm, MM_SWAPENTS, HPAGE_PMD_NR);
+	swp_pmd = swp_entry_to_pmd(entry);
+	if (pmd_soft_dirty(pmdval))
+		swp_pmd = pmd_swp_mksoft_dirty(swp_pmd);
+	set_pmd_at(mm, address, pvmw->pmd, swp_pmd);
+
+	page_remove_rmap(page, true);
+	put_page(page);
+	return true;
+}
 #endif
 
 static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
diff --git a/mm/rmap.c b/mm/rmap.c
index 3bb4be720bc0..a180cb1fe2db 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1413,11 +1413,52 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			continue;
 		}
 
+		address = pvmw.address;
+
+#ifdef CONFIG_THP_SWAP
+		/* PMD-mapped THP swap entry */
+		if (IS_ENABLED(CONFIG_THP_SWAP) &&
+		    !pvmw.pte && PageAnon(page)) {
+			pmd_t pmdval;
+
+			VM_BUG_ON_PAGE(PageHuge(page) ||
+				       !PageTransCompound(page), page);
+
+			flush_cache_range(vma, address,
+					  address + HPAGE_PMD_SIZE);
+			mmu_notifier_invalidate_range_start(mm, address,
+					address + HPAGE_PMD_SIZE);
+			if (should_defer_flush(mm, flags)) {
+				/* check comments for PTE below */
+				pmdval = pmdp_huge_get_and_clear(mm, address,
+								 pvmw.pmd);
+				set_tlb_ubc_flush_pending(mm,
+							  pmd_dirty(pmdval));
+			} else
+				pmdval = pmdp_huge_clear_flush(vma, address,
+							       pvmw.pmd);
+
+			/*
+			 * Move the dirty bit to the page.  Now the pmd
+			 * is gone.
+			 */
+			if (pmd_dirty(pmdval))
+				set_page_dirty(page);
+
+			/* Update high watermark before we lower rss */
+			update_hiwater_rss(mm);
+
+			ret = set_pmd_swap_entry(&pvmw, page, address, pmdval);
+			mmu_notifier_invalidate_range_end(mm, address,
+					address + HPAGE_PMD_SIZE);
+			continue;
+		}
+#endif
+
 		/* Unexpected PMD-mapped THP? */
 		VM_BUG_ON_PAGE(!pvmw.pte, page);
 
 		subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
-		address = pvmw.address;
 
 		if (PageHuge(page)) {
 			if (huge_pmd_unshare(mm, &address, pvmw.pte)) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a859f64a2166..017e9060082f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1318,11 +1318,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * processes. Try to unmap it here.
 		 */
 		if (page_mapped(page)) {
-			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
-
-			if (unlikely(PageTransHuge(page)))
-				flags |= TTU_SPLIT_HUGE_PMD;
-			if (!try_to_unmap(page, flags)) {
+			if (!try_to_unmap(page, ttu_flags | TTU_BATCH_FLUSH)) {
 				nr_unmap_fail++;
 				goto activate_locked;
 			}
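
For readers unfamiliar with the huge swap cluster layout that
set_pmd_swap_entry() above relies on: the head page's swap entry (kept in
page_private()) is the first of HPAGE_PMD_NR consecutive swap slots, so
the entry of any subpage follows by simple offset arithmetic.  The helper
below is illustration only and not part of the patch; swp_type(),
swp_offset() and swp_entry() are existing helpers from <linux/swapops.h>,
while the function name is made up for the example.

/*
 * Illustration only: with a huge swap cluster, subpage i of the THP maps
 * to swap slot (offset of the head entry) + i.  This contiguity is what
 * lets the cluster be swapped out and read back in as one unit.
 */
static swp_entry_t thp_subpage_swap_entry(struct page *head, int i)
{
	/* Swap entry of the head page, set when the THP entered the swap cache. */
	swp_entry_t head_entry = { .val = page_private(head) };

	return swp_entry(swp_type(head_entry), swp_offset(head_entry) + i);
}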