From patchwork Wed May 23 08:26:13 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10420597
From: "Huang, Ying"
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
	"Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko,
	Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim,
	Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan
Subject: [PATCH -mm -V3 09/21] mm, THP, swap: Swapin a THP as a whole
Date: Wed, 23 May 2018 16:26:13 +0800
Message-Id: <20180523082625.6897-10-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.1
In-Reply-To: <20180523082625.6897-1-ying.huang@intel.com>
References: <20180523082625.6897-1-ying.huang@intel.com>

From: Huang Ying

With this patch, when the page fault handler finds a PMD swap mapping, it
swaps the THP in as a whole.  This avoids the overhead of splitting and
collapsing the THP before and after swapping, and greatly improves swap
performance because of the reduced page fault count, etc.

do_huge_pmd_swap_page() is added in this patch to implement this.  It is
similar to do_swap_page() for normal page swapin.

If a THP cannot be allocated, the huge swap cluster and the PMD swap
mapping are split so that the fault falls back to normal page swapin.  If
the huge swap cluster has already been split, only the PMD swap mapping is
split before falling back to normal page swapin.
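In outline, the fault-handling flow added by this patch is roughly as
follows.  This is only a simplified sketch of the code in the diff below
(locking, memcg charging, delay accounting, and most error paths are
omitted); the authoritative version is do_huge_pmd_swap_page():

	swp_entry_t entry = pmd_to_swp_entry(orig_pmd);
	struct page *page;
	int ret = 0;

	/* Is the huge swap cluster already in the swap cache? */
	page = lookup_swap_cache(entry, NULL, vmf->address);
	if (!page)
		/* Try to allocate a THP and read the whole cluster into it */
		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
					     vma, haddr, false);
	if (!page || !PageTransCompound(page))
		goto fallback;

	/* charge to memcg, add anon rmap, then map with a single huge PMD */
	set_pmd_at(vma->vm_mm, haddr, vmf->pmd,
		   mk_huge_pmd(page, vma->vm_page_prot));
	swap_free(entry, true);		/* free the whole huge swap cluster */
	return ret;

fallback:
	/* split the PMD swap mapping; the caller retries with 4KB swapin */
	split_huge_swap_pmd(vmf->vma, vmf->pmd, vmf->address, orig_pmd);
	return VM_FAULT_FALLBACK;

The fallback path is what keeps the change safe: whenever a THP cannot be
allocated, or the huge swap cluster has already been split, the fault
degrades to the existing normal-page swapin instead of failing.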
Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
---
 include/linux/huge_mm.h |   9 +++
 include/linux/swap.h    |   9 +++
 mm/huge_memory.c        | 170 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/memory.c             |  16 +++--
 4 files changed, 198 insertions(+), 6 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 0dbfbe34b01a..f5348d072351 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -402,4 +402,13 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+#ifdef CONFIG_THP_SWAP
+extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+#else /* CONFIG_THP_SWAP */
+static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
+{
+	return 0;
+}
+#endif /* CONFIG_THP_SWAP */
+
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index d2e017dd7bbd..5832a750baed 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -560,6 +560,15 @@ static inline struct page *lookup_swap_cache(swp_entry_t swp,
 	return NULL;
 }
 
+static inline struct page *read_swap_cache_async(swp_entry_t swp,
+						 gfp_t gft_mask,
+						 struct vm_area_struct *vma,
+						 unsigned long addr,
+						 bool do_poll)
+{
+	return NULL;
+}
+
 static inline int add_to_swap(struct page *page)
 {
 	return 0;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3975d824b4ed..8303fa021c42 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -33,6 +33,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -1609,6 +1611,174 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma,
 	smp_wmb(); /* make pte visible before pmd */
 	pmd_populate(mm, pmd, pgtable);
 }
+
+static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			       unsigned long address, pmd_t orig_pmd)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	spinlock_t *ptl;
+	int ret = 0;
+
+	ptl = pmd_lock(mm, pmd);
+	if (pmd_same(*pmd, orig_pmd))
+		__split_huge_swap_pmd(vma, address & HPAGE_PMD_MASK, pmd);
+	else
+		ret = -ENOENT;
+	spin_unlock(ptl);
+
+	return ret;
+}
+
+int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
+{
+	struct page *page;
+	struct mem_cgroup *memcg;
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
+	swp_entry_t entry;
+	pmd_t pmd;
+	int i, locked, exclusive = 0, ret = 0;
+
+	entry = pmd_to_swp_entry(orig_pmd);
+	VM_BUG_ON(non_swap_entry(entry));
+	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
+retry:
+	page = lookup_swap_cache(entry, NULL, vmf->address);
+	if (!page) {
+		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma,
+					     haddr, false);
+		if (!page) {
+			/*
+			 * Back out if somebody else faulted in this pmd
+			 * while we released the pmd lock.
+			 */
+			if (likely(pmd_same(*vmf->pmd, orig_pmd))) {
+				ret = split_swap_cluster(entry, false);
+				/*
+				 * Retry if somebody else swap in the swap
+				 * entry
+				 */
+				if (ret == -EEXIST) {
+					ret = 0;
+					goto retry;
+				/* swapoff occurs under us */
+				} else if (ret == -EINVAL)
+					ret = 0;
+				else
+					goto fallback;
+			}
+			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+			goto out;
+		}
+
+		/* Had to read the page from swap area: Major fault */
+		ret = VM_FAULT_MAJOR;
+		count_vm_event(PGMAJFAULT);
+		count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
+	} else if (!PageTransCompound(page))
+		goto fallback;
+
+	locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
+
+	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+	if (!locked) {
+		ret |= VM_FAULT_RETRY;
+		goto out_release;
+	}
+
+	/*
+	 * Make sure try_to_free_swap or reuse_swap_page or swapoff did not
+	 * release the swapcache from under us.  The page pin, and pmd_same
+	 * test below, are not enough to exclude that.  Even if it is still
+	 * swapcache, we need to check that the page's swap has not changed.
+	 */
+	if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
+		goto out_page;
+
+	if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL,
+				  &memcg, true)) {
+		ret = VM_FAULT_OOM;
+		goto out_page;
+	}
+
+	/*
+	 * Back out if somebody else already faulted in this pmd.
+	 */
+	vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd);
+	spin_lock(vmf->ptl);
+	if (unlikely(!pmd_same(*vmf->pmd, orig_pmd)))
+		goto out_nomap;
+
+	if (unlikely(!PageUptodate(page))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_nomap;
+	}
+
+	/*
+	 * The page isn't present yet, go ahead with the fault.
+	 *
+	 * Be careful about the sequence of operations here.
+	 * To get its accounting right, reuse_swap_page() must be called
+	 * while the page is counted on swap but not yet in mapcount i.e.
+	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
+	 * must be called after the swap_free(), or it will never succeed.
+	 */
+
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+	pmd = mk_huge_pmd(page, vma->vm_page_prot);
+	if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
+		pmd = maybe_pmd_mkwrite(pmd_mkdirty(pmd), vma);
+		vmf->flags &= ~FAULT_FLAG_WRITE;
+		ret |= VM_FAULT_WRITE;
+		exclusive = RMAP_EXCLUSIVE;
+	}
+	for (i = 0; i < HPAGE_PMD_NR; i++)
+		flush_icache_page(vma, page + i);
+	if (pmd_swp_soft_dirty(orig_pmd))
+		pmd = pmd_mksoft_dirty(pmd);
+	do_page_add_anon_rmap(page, vma, haddr,
+			      exclusive | RMAP_COMPOUND);
+	mem_cgroup_commit_charge(page, memcg, true, true);
+	activate_page(page);
+	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
+
+	swap_free(entry, true);
+	if (mem_cgroup_swap_full(page) ||
+	    (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
+		try_to_free_swap(page);
+	unlock_page(page);
+
+	if (vmf->flags & FAULT_FLAG_WRITE) {
+		ret |= do_huge_pmd_wp_page(vmf, pmd);
+		if (ret & VM_FAULT_ERROR)
+			ret &= VM_FAULT_ERROR;
+		goto out;
+	}
+
+	/* No need to invalidate - it was non-present before */
+	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
+	spin_unlock(vmf->ptl);
+out:
+	return ret;
+out_nomap:
+	mem_cgroup_cancel_charge(page, memcg, true);
+	spin_unlock(vmf->ptl);
+out_page:
+	unlock_page(page);
+out_release:
+	put_page(page);
+	return ret;
+fallback:
+	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+	if (!split_huge_swap_pmd(vmf->vma, vmf->pmd, vmf->address, orig_pmd))
+		ret = VM_FAULT_FALLBACK;
+	else
+		ret = 0;
+	if (page)
+		put_page(page);
+	return ret;
+}
 #else
 static inline void __split_huge_swap_pmd(struct vm_area_struct *vma,
 					 unsigned long haddr,
diff --git a/mm/memory.c b/mm/memory.c
index f8a4336dd015..a8b85fb333c7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4073,13 +4073,17 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 		barrier();
 		if (unlikely(is_swap_pmd(orig_pmd))) {
-			VM_BUG_ON(thp_migration_supported() &&
-				  !is_pmd_migration_entry(orig_pmd));
-			if (is_pmd_migration_entry(orig_pmd))
+			if (thp_migration_supported() &&
+			    is_pmd_migration_entry(orig_pmd)) {
 				pmd_migration_entry_wait(mm, vmf.pmd);
-			return 0;
-		}
-		if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
+				return 0;
+			} else if (thp_swap_supported()) {
+				ret = do_huge_pmd_swap_page(&vmf, orig_pmd);
+				if (!(ret & VM_FAULT_FALLBACK))
+					return ret;
+			} else
+				VM_BUG_ON(1);
+		} else if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
 			if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
 				return do_huge_pmd_numa_page(&vmf, orig_pmd);