From patchwork Fri Dec 7 05:41:20 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10717479
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 20/21] swap: create PMD swap mapping when unmapping the THP
Date: Fri, 7 Dec 2018 13:41:20 +0800
Message-Id: <20181207054122.27822-21-ying.huang@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

This is the final step of the THP swapin support.  When reclaiming an
anonymous THP, after allocating the huge swap cluster and adding the THP
to the swap cache, the PMD page mapping is changed to a mapping to the
swap space.  Previously, the PMD page mapping was split before being
changed.  With this patch, the unmap code is enhanced not to split the
PMD mapping, but to create a PMD swap mapping to replace it instead.  So
later, when the SWAP_HAS_CACHE flag is cleared in the last step of
swapout, the huge swap cluster is kept instead of being split, and
during swapin, the huge swap cluster is read into a THP in one piece.
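To make "kept instead of being split" concrete, below is a minimal
userspace sketch of the contrast.  The struct and helper names here are
invented for illustration only; in the kernel, the cluster-wide
reference is taken by swap_duplicate(&entry, HPAGE_PMD_NR), and the old
behavior came from TTU_SPLIT_HUGE_PMD in the reclaim path.

#include <stdbool.h>
#include <stdio.h>

/* 2MB THP / 4KB base pages on x86-64; matches the kernel's HPAGE_PMD_NR */
#define HPAGE_PMD_NR 512

/*
 * Toy model of a huge swap cluster: HPAGE_PMD_NR contiguous swap slots
 * backing one THP.  It can be swapped back in as a THP only while it is
 * still managed as a single unit.
 */
struct cluster_model {
	unsigned int map_count[HPAGE_PMD_NR];
	bool huge;
};

/* New path: one PMD swap mapping references all slots at once,
 * analogous to swap_duplicate(&entry, HPAGE_PMD_NR) in the patch. */
static void map_cluster_as_pmd(struct cluster_model *c)
{
	for (int i = 0; i < HPAGE_PMD_NR; i++)
		c->map_count[i]++;
}

/* Old path: the PMD was split into PTEs first, after which the cluster
 * is managed slot by slot and cannot be read back in one piece. */
static void split_then_map(struct cluster_model *c)
{
	c->huge = false;
	for (int i = 0; i < HPAGE_PMD_NR; i++)
		c->map_count[i]++;
}

int main(void)
{
	struct cluster_model c = { .huge = true };
	struct cluster_model d = { .huge = true };

	map_cluster_as_pmd(&c);
	printf("PMD swap mapping: huge=%d (swapin as THP possible)\n", c.huge);
	split_then_map(&d);
	printf("split first:      huge=%d (swapin falls back to 4KB)\n", d.huge);
	return 0;
}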
The net effect is that the THP is not split during swapout/swapin.
This eliminates the overhead of splitting and collapsing and reduces
the page fault count.  More importantly, THP utilization improves
greatly: many more THPs are kept when swapping is used, so we can take
full advantage of THP, including its high swapout/swapin performance.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 include/linux/huge_mm.h | 11 +++++++++++
 mm/huge_memory.c        | 30 ++++++++++++++++++++++++++++
 mm/rmap.c               | 43 ++++++++++++++++++++++++++++++++++++++++-
 mm/vmscan.c             |  6 +-----
 4 files changed, 84 insertions(+), 6 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 260357fc9d76..06e4fde57a0f 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -375,12 +375,16 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+struct page_vma_mapped_walk;
+
 #ifdef CONFIG_THP_SWAP
 extern void __split_huge_swap_pmd(struct vm_area_struct *vma,
 				  unsigned long addr, pmd_t *pmd);
 extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			       unsigned long address, pmd_t orig_pmd);
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+extern bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw,
+	struct page *page, unsigned long address, pmd_t pmdval);
 
 static inline bool transparent_hugepage_swapin_enabled(
 	struct vm_area_struct *vma)
@@ -421,6 +425,13 @@ static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	return 0;
 }
 
+static inline bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw,
+				      struct page *page, unsigned long address,
+				      pmd_t pmdval)
+{
+	return false;
+}
+
 static inline bool transparent_hugepage_swapin_enabled(
 	struct vm_area_struct *vma)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b75af88c505a..27de3e547dc0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1938,6 +1938,36 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	count_vm_event(THP_SWPIN_FALLBACK);
 	goto fallback;
 }
+
+bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, struct page *page,
+			unsigned long address, pmd_t pmdval)
+{
+	struct vm_area_struct *vma = pvmw->vma;
+	struct mm_struct *mm = vma->vm_mm;
+	pmd_t swp_pmd;
+	swp_entry_t entry = { .val = page_private(page) };
+
+	if (swap_duplicate(&entry, HPAGE_PMD_NR) < 0) {
+		set_pmd_at(mm, address, pvmw->pmd, pmdval);
+		return false;
+	}
+	if (list_empty(&mm->mmlist)) {
+		spin_lock(&mmlist_lock);
+		if (list_empty(&mm->mmlist))
+			list_add(&mm->mmlist, &init_mm.mmlist);
+		spin_unlock(&mmlist_lock);
+	}
+	add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+	add_mm_counter(mm, MM_SWAPENTS, HPAGE_PMD_NR);
+	swp_pmd = swp_entry_to_pmd(entry);
+	if (pmd_soft_dirty(pmdval))
+		swp_pmd = pmd_swp_mksoft_dirty(swp_pmd);
+	set_pmd_at(mm, address, pvmw->pmd, swp_pmd);
+
+	page_remove_rmap(page, true);
+	put_page(page);
+	return true;
+}
 #endif
 
 static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
diff --git a/mm/rmap.c b/mm/rmap.c
index a488d325946d..b7ea50d563a3 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1413,11 +1413,52 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			continue;
 		}
 
+		address = pvmw.address;
+
+#ifdef CONFIG_THP_SWAP
+		/* PMD-mapped THP swap entry */
+		if (IS_ENABLED(CONFIG_THP_SWAP) &&
+		    !pvmw.pte && PageAnon(page)) {
+			pmd_t pmdval;
+
+			VM_BUG_ON_PAGE(PageHuge(page) ||
+				       !PageTransCompound(page), page);
+
+			flush_cache_range(vma, address,
+					  address + HPAGE_PMD_SIZE);
+			mmu_notifier_invalidate_range_start(mm, address,
+					address + HPAGE_PMD_SIZE);
+			if (should_defer_flush(mm, flags)) {
+				/* check comments for PTE below */
+				pmdval = pmdp_huge_get_and_clear(mm, address,
+								 pvmw.pmd);
+				set_tlb_ubc_flush_pending(mm,
+							  pmd_dirty(pmdval));
+			} else
+				pmdval = pmdp_huge_clear_flush(vma, address,
+							       pvmw.pmd);
+
+			/*
+			 * Move the dirty bit to the page. Now the pmd
+			 * is gone.
+			 */
+			if (pmd_dirty(pmdval))
+				set_page_dirty(page);
+
+			/* Update high watermark before we lower rss */
+			update_hiwater_rss(mm);
+
+			ret = set_pmd_swap_entry(&pvmw, page, address, pmdval);
+			mmu_notifier_invalidate_range_end(mm, address,
+					address + HPAGE_PMD_SIZE);
+			continue;
+		}
+#endif
+
 		/* Unexpected PMD-mapped THP? */
 		VM_BUG_ON_PAGE(!pvmw.pte, page);
 
 		subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
-		address = pvmw.address;
 
 		if (PageHuge(page)) {
 			if (huge_pmd_unshare(mm, &address, pvmw.pte)) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7a67923c09b3..c6af95025fb1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1340,11 +1340,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * processes. Try to unmap it here.
 		 */
 		if (page_mapped(page)) {
-			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
-
-			if (unlikely(PageTransHuge(page)))
-				flags |= TTU_SPLIT_HUGE_PMD;
-			if (!try_to_unmap(page, flags)) {
+			if (!try_to_unmap(page, ttu_flags | TTU_BATCH_FLUSH)) {
 				nr_unmap_fail++;
 				goto activate_locked;
 			}
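
For readers unfamiliar with non-present PMDs, here is a standalone
sketch of the kind of entry set_pmd_swap_entry() installs.  The bit
layout below is an assumption made up for the example; each
architecture defines its own, via swp_entry_to_pmd() and
pmd_swp_mksoft_dirty() as used in the patch above.

#include <stdint.h>
#include <stdio.h>

/*
 * Standalone model of a PMD swap mapping.  Bit positions are invented
 * for the sketch.  The key invariant is real: the present bit stays
 * clear, so any access faults and the fault handler can recover the
 * swap entry from the PMD.
 */
typedef uint64_t pmd_model_t;

#define PMD_MODEL_PRESENT	(1ULL << 0)
#define PMD_MODEL_SOFT_DIRTY	(1ULL << 1)
#define PMD_MODEL_SWP_SHIFT	8

static pmd_model_t model_swp_entry_to_pmd(uint64_t entry)
{
	return entry << PMD_MODEL_SWP_SHIFT;	/* present bit left clear */
}

static pmd_model_t model_pmd_swp_mksoft_dirty(pmd_model_t pmd)
{
	/* Carries over the soft-dirty state of the old mapping, as the
	 * patch does with pmd_swp_mksoft_dirty(). */
	return pmd | PMD_MODEL_SOFT_DIRTY;
}

int main(void)
{
	uint64_t entry = 0x1234;	/* packed swap type + offset */
	pmd_model_t swp_pmd = model_swp_entry_to_pmd(entry);

	swp_pmd = model_pmd_swp_mksoft_dirty(swp_pmd);
	printf("not present: %d, soft dirty: %d, entry: %#llx\n",
	       !(swp_pmd & PMD_MODEL_PRESENT),
	       !!(swp_pmd & PMD_MODEL_SOFT_DIRTY),
	       (unsigned long long)(swp_pmd >> PMD_MODEL_SWP_SHIFT));
	return 0;
}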