From patchwork Fri Jun 22 03:51:39 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10481147
From: "Huang, Ying" <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -mm -v4 09/21] mm, THP, swap: Swapin a THP as a whole
Date: Fri, 22 Jun 2018 11:51:39 +0800
Message-Id: <20180622035151.6676-10-ying.huang@intel.com>
In-Reply-To: <20180622035151.6676-1-ying.huang@intel.com>
References: <20180622035151.6676-1-ying.huang@intel.com>

From: Huang Ying <ying.huang@intel.com>

With this patch, when the page fault handler finds a PMD swap mapping, it
swaps in a THP as a whole.  This avoids the overhead of splitting and
collapsing the THP before and after the swap, and greatly improves swap
performance, among other things through a reduced page fault count.

do_huge_pmd_swap_page() is added by this patch to implement this.  It is
the THP counterpart of do_swap_page() for normal page swapin.  If a THP
cannot be allocated, the huge swap cluster and the PMD swap mapping are
split so that the fault falls back to normal page swapin.  If the huge
swap cluster has already been split, only the PMD swap mapping is split
before falling back to normal page swapin.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
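A note for reviewers: below is a minimal userspace sketch of the swapin
decision flow described above, not kernel code.  Every function, type, and
constant in it is a hypothetical stand-in for illustration only; the
authoritative logic is do_huge_pmd_swap_page() in the diff that follows.

/*
 * thp_swapin_sketch.c - userspace model of the THP swapin fallback
 * policy described in the changelog.  Hypothetical stand-ins only.
 */
#include <stdbool.h>
#include <stdio.h>

enum swapin_result {
	SWAPIN_THP,		/* whole THP mapped with a single PMD */
	SWAPIN_FALLBACK_PTE,	/* fall back to normal (PTE) page swapin */
};

static enum swapin_result huge_swapin(bool cluster_intact, bool thp_allocated)
{
	/*
	 * Huge swap cluster already split: only the PMD swap mapping
	 * needs to be split before falling back.
	 */
	if (!cluster_intact)
		return SWAPIN_FALLBACK_PTE;
	/*
	 * THP allocation failed: split both the huge swap cluster and
	 * the PMD swap mapping, then fall back.
	 */
	if (!thp_allocated)
		return SWAPIN_FALLBACK_PTE;
	/* Otherwise the whole THP is swapped in with one fault. */
	return SWAPIN_THP;
}

int main(void)
{
	static const struct {
		bool cluster_intact;
		bool thp_allocated;
		const char *desc;
	} cases[] = {
		{ true,  true,  "intact cluster, THP allocated" },
		{ true,  false, "intact cluster, THP allocation fails" },
		{ false, true,  "cluster already split" },
	};
	unsigned int i;

	for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
		printf("%-40s -> %s\n", cases[i].desc,
		       huge_swapin(cases[i].cluster_intact,
				   cases[i].thp_allocated) == SWAPIN_THP ?
		       "THP swapin" : "PTE fallback");
	return 0;
}

It can be built with any C compiler (e.g. "cc thp_swapin_sketch.c") to print
the outcome for each case.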
 include/linux/huge_mm.h |   9 +++
 include/linux/swap.h    |   9 +++
 mm/huge_memory.c        | 170 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/memory.c             |  16 +++--
 4 files changed, 198 insertions(+), 6 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index c5b8af173f67..42117b75de2d 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -403,4 +403,13 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+#ifdef CONFIG_THP_SWAP
+extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+#else /* CONFIG_THP_SWAP */
+static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
+{
+	return 0;
+}
+#endif /* CONFIG_THP_SWAP */
+
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index d2e017dd7bbd..5832a750baed 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -560,6 +560,15 @@ static inline struct page *lookup_swap_cache(swp_entry_t swp,
 	return NULL;
 }
 
+static inline struct page *read_swap_cache_async(swp_entry_t swp,
+						 gfp_t gfp_mask,
+						 struct vm_area_struct *vma,
+						 unsigned long addr,
+						 bool do_poll)
+{
+	return NULL;
+}
+
 static inline int add_to_swap(struct page *page)
 {
 	return 0;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 275a4e616ec9..ac79ae2ab257 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -33,6 +33,8 @@
 #include <linux/page_idle.h>
 #include <linux/shmem_fs.h>
 #include <linux/oom.h>
+#include <linux/delayacct.h>
+#include <linux/swap.h>
 
 #include <asm/tlb.h>
 #include <asm/pgalloc.h>
@@ -1609,6 +1611,174 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma,
 	smp_wmb(); /* make pte visible before pmd */
 	pmd_populate(mm, pmd, pgtable);
 }
+
+static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			       unsigned long address, pmd_t orig_pmd)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	spinlock_t *ptl;
+	int ret = 0;
+
+	ptl = pmd_lock(mm, pmd);
+	if (pmd_same(*pmd, orig_pmd))
+		__split_huge_swap_pmd(vma, address & HPAGE_PMD_MASK, pmd);
+	else
+		ret = -ENOENT;
+	spin_unlock(ptl);
+
+	return ret;
+}
+
+int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
+{
+	struct page *page;
+	struct mem_cgroup *memcg;
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
+	swp_entry_t entry;
+	pmd_t pmd;
+	int i, locked, exclusive = 0, ret = 0;
+
+	entry = pmd_to_swp_entry(orig_pmd);
+	VM_BUG_ON(non_swap_entry(entry));
+	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
+retry:
+	page = lookup_swap_cache(entry, NULL, vmf->address);
+	if (!page) {
+		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma,
+					     haddr, false);
+		if (!page) {
+			/*
+			 * Back out if somebody else faulted in this pmd
+			 * while we released the pmd lock.
+			 */
+			if (likely(pmd_same(*vmf->pmd, orig_pmd))) {
+				ret = split_swap_cluster(entry, false);
+				/*
+				 * Retry if somebody else swapped in the
+				 * swap entry.
+				 */
+				if (ret == -EEXIST) {
+					ret = 0;
+					goto retry;
+				/* swapoff occurs under us */
+				} else if (ret == -EINVAL)
+					ret = 0;
+				else
+					goto fallback;
+			}
+			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+			goto out;
+		}
+
+		/* Had to read the page from swap area: Major fault */
+		ret = VM_FAULT_MAJOR;
+		count_vm_event(PGMAJFAULT);
+		count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
+	} else if (!PageTransCompound(page))
+		goto fallback;
+
+	locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
+
+	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+	if (!locked) {
+		ret |= VM_FAULT_RETRY;
+		goto out_release;
+	}
+
+	/*
+	 * Make sure try_to_free_swap or reuse_swap_page or swapoff did not
+	 * release the swapcache from under us.  The page pin, and pmd_same
+	 * test below, are not enough to exclude that.  Even if it is still
+	 * swapcache, we need to check that the page's swap has not changed.
+	 */
+	if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
+		goto out_page;
+
+	if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL,
+				  &memcg, true)) {
+		ret = VM_FAULT_OOM;
+		goto out_page;
+	}
+
+	/*
+	 * Back out if somebody else already faulted in this pmd.
+	 */
+	vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd);
+	spin_lock(vmf->ptl);
+	if (unlikely(!pmd_same(*vmf->pmd, orig_pmd)))
+		goto out_nomap;
+
+	if (unlikely(!PageUptodate(page))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_nomap;
+	}
+
+	/*
+	 * The page isn't present yet, go ahead with the fault.
+	 *
+	 * Be careful about the sequence of operations here.
+	 * To get its accounting right, reuse_swap_page() must be called
+	 * while the page is counted on swap but not yet in mapcount i.e.
+	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
+	 * must be called after the swap_free(), or it will never succeed.
+	 */
+
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+	pmd = mk_huge_pmd(page, vma->vm_page_prot);
+	if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
+		pmd = maybe_pmd_mkwrite(pmd_mkdirty(pmd), vma);
+		vmf->flags &= ~FAULT_FLAG_WRITE;
+		ret |= VM_FAULT_WRITE;
+		exclusive = RMAP_EXCLUSIVE;
+	}
+	for (i = 0; i < HPAGE_PMD_NR; i++)
+		flush_icache_page(vma, page + i);
+	if (pmd_swp_soft_dirty(orig_pmd))
+		pmd = pmd_mksoft_dirty(pmd);
+	do_page_add_anon_rmap(page, vma, haddr,
+			      exclusive | RMAP_COMPOUND);
+	mem_cgroup_commit_charge(page, memcg, true, true);
+	activate_page(page);
+	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
+
+	swap_free(entry, true);
+	if (mem_cgroup_swap_full(page) ||
+	    (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
+		try_to_free_swap(page);
+	unlock_page(page);
+
+	if (vmf->flags & FAULT_FLAG_WRITE) {
+		ret |= do_huge_pmd_wp_page(vmf, pmd);
+		if (ret & VM_FAULT_ERROR)
+			ret &= VM_FAULT_ERROR;
+		goto out;
+	}
+
+	/* No need to invalidate - it was non-present before */
+	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
+	spin_unlock(vmf->ptl);
+out:
+	return ret;
+out_nomap:
+	mem_cgroup_cancel_charge(page, memcg, true);
+	spin_unlock(vmf->ptl);
+out_page:
+	unlock_page(page);
+out_release:
+	put_page(page);
+	return ret;
+fallback:
+	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+	if (!split_huge_swap_pmd(vmf->vma, vmf->pmd, vmf->address, orig_pmd))
+		ret = VM_FAULT_FALLBACK;
+	else
+		ret = 0;
+	if (page)
+		put_page(page);
+	return ret;
+}
 #else
 static inline void __split_huge_swap_pmd(struct vm_area_struct *vma,
 					 unsigned long haddr,
diff --git a/mm/memory.c b/mm/memory.c
index 55e278bb59ee..2125035b6a70 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4072,13 +4072,17 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 	barrier();
 	if (unlikely(is_swap_pmd(orig_pmd))) {
-		VM_BUG_ON(thp_migration_supported() &&
-			  !is_pmd_migration_entry(orig_pmd));
-		if (is_pmd_migration_entry(orig_pmd))
+		if (thp_migration_supported() &&
+		    is_pmd_migration_entry(orig_pmd)) {
 			pmd_migration_entry_wait(mm, vmf.pmd);
-		return 0;
-	}
-	if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
+			return 0;
+		} else if (thp_swap_supported()) {
+			ret = do_huge_pmd_swap_page(&vmf, orig_pmd);
+			if (!(ret & VM_FAULT_FALLBACK))
+				return ret;
+		} else
+			VM_BUG_ON(1);
+	} else if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
 		if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
 			return do_huge_pmd_numa_page(&vmf, orig_pmd);