From patchwork Tue Nov 20 08:54:36 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10689997
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A.
 Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li,
 Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi,
 Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 08/21] swap: Support to read a huge swap cluster for swapin a THP
Date: Tue, 20 Nov 2018 16:54:36 +0800
Message-Id: <20181120085449.5542-9-ying.huang@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
X-Virus-Scanned: ClamAV using ClamSMTP

To swap in a THP in one piece, we need to read a huge swap cluster from
the swap device.  This patch revises __read_swap_cache_async() and its
callers and callees to support this.  If __read_swap_cache_async()
finds that the swap cluster of the specified swap entry is huge, it
tries to allocate a THP and add it to the swap cache, so that the
contents of the huge swap cluster can later be read into the THP.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 include/linux/huge_mm.h |  8 ++++++
 include/linux/swap.h    |  4 +--
 mm/huge_memory.c        |  3 +-
 mm/swap_state.c         | 61 +++++++++++++++++++++++++++++++++--------
 mm/swapfile.c           |  9 ++++--
 5 files changed, 67 insertions(+), 18 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 1c0fda003d6a..f4dbd0662438 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -250,6 +250,8 @@ static inline bool thp_migration_supported(void)
 	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
 }
 
+gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma,
+				    unsigned long addr);
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
 #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
@@ -363,6 +365,12 @@ static inline bool thp_migration_supported(void)
 {
 	return false;
 }
+
+static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma,
+						  unsigned long addr)
+{
+	return 0;
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 441da4a832a6..4bd532c9315e 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -462,7 +462,7 @@ extern sector_t map_swap_page(struct page *, struct block_device **);
 extern sector_t swapdev_block(int, pgoff_t);
 extern int page_swapcount(struct page *);
 extern int __swap_count(swp_entry_t entry);
-extern int __swp_swapcount(swp_entry_t entry);
+extern int __swp_swapcount(swp_entry_t entry, int *entry_size);
 extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
 extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
@@ -590,7 +590,7 @@ static inline int __swap_count(swp_entry_t entry)
 	return 0;
 }
 
-static inline int __swp_swapcount(swp_entry_t entry)
+static inline int __swp_swapcount(swp_entry_t entry, int *entry_size)
 {
 	return 0;
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a38d549fb4dc..eeea00070da8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -629,7 +629,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
  *	    available
  * never: never stall for any thp allocation
  */
-static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, unsigned long addr)
+gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma,
+				    unsigned long addr)
 {
 	const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
 	gfp_t this_node = 0;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 97831166994a..1eedbc0aede2 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -361,7 +361,9 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 {
 	struct page *found_page = NULL, *new_page = NULL;
 	struct swap_info_struct *si;
-	int err;
+	int err, entry_size = 1;
+	swp_entry_t hentry;
+
 	*new_page_allocated = false;
 
 	do {
@@ -387,14 +389,42 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		 * as SWAP_HAS_CACHE. That's done in later part of code or
 		 * else swap_off will be aborted if we return NULL.
 		 */
-		if (!__swp_swapcount(entry) && swap_slot_cache_enabled)
+		if (!__swp_swapcount(entry, &entry_size) &&
+		    swap_slot_cache_enabled)
 			break;
 
 		/*
 		 * Get a new page to read into from swap.
 		 */
-		if (!new_page) {
-			new_page = alloc_page_vma(gfp_mask, vma, addr);
+		if (!new_page ||
+		    (IS_ENABLED(CONFIG_THP_SWAP) &&
+		     hpage_nr_pages(new_page) != entry_size)) {
+			if (new_page)
+				put_page(new_page);
+			if (IS_ENABLED(CONFIG_THP_SWAP) &&
+			    entry_size == HPAGE_PMD_NR) {
+				gfp_t gfp;
+
+				gfp = alloc_hugepage_direct_gfpmask(vma, addr);
+				/*
+				 * Make sure huge page allocation flags are
+				 * compatible with that of normal page
+				 */
+				VM_WARN_ONCE(gfp_mask & ~(gfp | __GFP_RECLAIM),
+					     "ignoring gfp_mask bits: %x",
+					     gfp_mask & ~(gfp | __GFP_RECLAIM));
+				new_page = alloc_pages_vma(gfp, HPAGE_PMD_ORDER,
+							   vma, addr,
+							   numa_node_id());
+				if (new_page)
+					prep_transhuge_page(new_page);
+				hentry = swp_entry(swp_type(entry),
+						   round_down(swp_offset(entry),
+							      HPAGE_PMD_NR));
+			} else {
+				new_page = alloc_page_vma(gfp_mask, vma, addr);
+				hentry = entry;
+			}
 			if (!new_page)
 				break;		/* Out of memory */
 		}
@@ -402,7 +432,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		/*
 		 * Swap entry may have been freed since our caller observed it.
 		 */
-		err = swapcache_prepare(entry, 1);
+		err = swapcache_prepare(hentry, entry_size);
 		if (err == -EEXIST) {
 			/*
 			 * We might race against get_swap_page() and stumble
@@ -411,18 +441,24 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 			 */
 			cond_resched();
 			continue;
+		} else if (err == -ENOTDIR) {
+			/* huge swap cluster has been split under us */
+			continue;
 		} else if (err)		/* swp entry is obsolete ? */
 			break;
 
 		/* May fail (-ENOMEM) if XArray node allocation failed. */
 		__SetPageLocked(new_page);
 		__SetPageSwapBacked(new_page);
-		err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
+		err = add_to_swap_cache(new_page, hentry, gfp_mask & GFP_KERNEL);
 		if (likely(!err)) {
 			/* Initiate read into locked page */
 			SetPageWorkingset(new_page);
 			lru_cache_add_anon(new_page);
 			*new_page_allocated = true;
+			if (IS_ENABLED(CONFIG_THP_SWAP))
+				new_page += swp_offset(entry) &
+					(entry_size - 1);
 			return new_page;
 		}
 		__ClearPageLocked(new_page);
@@ -430,7 +466,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
 		 * clear SWAP_HAS_CACHE flag.
 		 */
-		put_swap_page(new_page, entry);
+		put_swap_page(new_page, hentry);
 	} while (err != -ENOMEM);
 
 	if (new_page)
@@ -452,7 +488,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 			vma, addr, &page_was_allocated);
 
 	if (page_was_allocated)
-		swap_readpage(retpage, do_poll);
+		swap_readpage(compound_head(retpage), do_poll);
 
 	return retpage;
 }
@@ -571,8 +607,9 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 		if (!page)
 			continue;
 		if (page_allocated) {
-			swap_readpage(page, false);
-			if (offset != entry_offset) {
+			swap_readpage(compound_head(page), false);
+			if (offset != entry_offset &&
+			    !PageTransCompound(page)) {
 				SetPageReadahead(page);
 				count_vm_event(SWAP_RA);
 			}
@@ -733,8 +770,8 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 		if (!page)
 			continue;
 		if (page_allocated) {
-			swap_readpage(page, false);
-			if (i != ra_info.offset) {
+			swap_readpage(compound_head(page), false);
+			if (i != ra_info.offset && !PageTransCompound(page)) {
 				SetPageReadahead(page);
 				count_vm_event(SWAP_RA);
 			}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index a57967292a8d..c22c11b4a879 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1542,7 +1542,8 @@ int __swap_count(swp_entry_t entry)
 	return count;
 }
 
-static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
+static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry,
+			  int *entry_size)
 {
 	int count = 0;
 	pgoff_t offset = swp_offset(entry);
@@ -1550,6 +1551,8 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
 
 	ci = lock_cluster_or_swap_info(si, offset);
 	count = swap_count(si->swap_map[offset]);
+	if (entry_size)
+		*entry_size = ci && cluster_is_huge(ci) ? SWAPFILE_CLUSTER : 1;
 	unlock_cluster_or_swap_info(si, ci);
 	return count;
 }
@@ -1559,14 +1562,14 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
  * This does not give an exact answer when swap count is continued,
  * but does include the high COUNT_CONTINUED flag to allow for that.
  */
-int __swp_swapcount(swp_entry_t entry)
+int __swp_swapcount(swp_entry_t entry, int *entry_size)
 {
 	int count = 0;
 	struct swap_info_struct *si;
 
 	si = get_swap_device(entry);
 	if (si) {
-		count = swap_swapcount(si, entry);
+		count = swap_swapcount(si, entry, entry_size);
 		put_swap_device(si);
 	}
 	return count;
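
Illustration only, not part of the patch: the offset arithmetic that
__read_swap_cache_async() relies on above can be modelled in user space
roughly as follows. This is a minimal sketch under stated assumptions.
The sketch_* names are hypothetical stand-ins for the kernel helpers,
and SKETCH_HPAGE_PMD_NR assumes 512 base pages per THP (for example
x86-64 with 4 KiB pages); the kernel takes the real value from
HPAGE_PMD_NR.

```c
/* Assumption: 512 swap slots per huge cluster (one slot per base page). */
#define SKETCH_HPAGE_PMD_NR 512UL

/*
 * Hypothetical stand-in for the kernel's round_down(); like the kernel
 * macro, it is only valid when align is a power of two.
 */
static unsigned long sketch_round_down(unsigned long x, unsigned long align)
{
	return x & ~(align - 1);
}

/*
 * Offset of the first slot of the cluster that contains 'offset'.
 * This models how the patch derives hentry from entry: for a huge
 * cluster the offset is rounded down to a cluster boundary, for a
 * normal entry (entry_size == 1) it is left unchanged.
 */
static unsigned long sketch_head_offset(unsigned long offset,
					unsigned long entry_size)
{
	return sketch_round_down(offset, entry_size);
}

/*
 * Index of the requested subpage inside the page that was added to the
 * swap cache. This models "swp_offset(entry) & (entry_size - 1)", which
 * the patch adds to new_page before returning it. It is always 0 when
 * entry_size == 1.
 */
static unsigned long sketch_subpage_index(unsigned long offset,
					  unsigned long entry_size)
{
	return offset & (entry_size - 1);
}
```

With these assumptions, swap offset 1030 in a huge cluster gives head
offset 1024 and subpage index 6, so the caller gets back the seventh
subpage of the THP while the whole THP is keyed in the swap cache by the
head entry. With entry_size 1 the head offset stays 1030 and the subpage
index is 0, which matches the pre-patch behaviour.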