From patchwork Tue Nov 20 08:54:29 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 01/21] swap: Enable PMD swap operations for CONFIG_THP_SWAP
Date: Tue, 20 Nov 2018 16:54:29 +0800
Message-Id: <20181120085449.5542-2-ying.huang@intel.com>
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

Currently, the "swap entry" in the page tables is used for a number of
things other than actual swap, such as page migration. We already
support the THP/PMD "swap entry" for page migration, and the functions
behind it are tied to page migration's config option
(CONFIG_ARCH_ENABLE_THP_MIGRATION). But we also need them for the THP
swap optimization. So a new config option (CONFIG_HAVE_PMD_SWAP_ENTRY)
is added. It is enabled when either CONFIG_ARCH_ENABLE_THP_MIGRATION
or CONFIG_THP_SWAP is enabled, and the PMD swap entry functions are
tied to this new config option instead.

Some functions enabled by CONFIG_ARCH_ENABLE_THP_MIGRATION are for
page migration only; they remain enabled only for that.
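As an aside for readers of this archive, the two conversion helpers
that the new option guards compose as follows. This is a minimal
illustrative sketch, not part of the patch; it assumes the PMD really
holds a swap entry:

/*
 * Illustrative sketch only (not from the patch): round-tripping a PMD
 * swap entry through the helpers guarded by CONFIG_HAVE_PMD_SWAP_ENTRY.
 */
static pmd_t pmd_swap_entry_roundtrip(pmd_t pmd)
{
        swp_entry_t entry = pmd_to_swp_entry(pmd); /* decode type/offset */

        /* swp_type(entry) and swp_offset(entry) are usable here */
        return swp_entry_to_pmd(entry);            /* re-encode into a PMD */
}

With the !CONFIG_HAVE_PMD_SWAP_ENTRY stubs, the same code still
compiles but collapses to swp_entry(0, 0) and __pmd(0).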
Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 arch/x86/include/asm/pgtable.h |  2 +-
 include/asm-generic/pgtable.h  |  2 +-
 include/linux/swapops.h        | 44 ++++++++++++++++++----------------
 mm/Kconfig                     |  8 +++++++
 4 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 40616e805292..e830ab345551 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1333,7 +1333,7 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
 	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
 }
 
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
 {
 	return pmd_set_flags(pmd, _PAGE_SWP_SOFT_DIRTY);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 359fb935ded6..20aab7bfd487 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -675,7 +675,7 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm,
 #endif
 
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
-#ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#ifndef CONFIG_HAVE_PMD_SWAP_ENTRY
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
 {
 	return pmd;
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 4d961668e5fc..905ddc65caa3 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -254,17 +254,7 @@ static inline int is_write_migration_entry(swp_entry_t entry)
 
 #endif
 
-struct page_vma_mapped_walk;
-
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
-		struct page *page);
-
-extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
-		struct page *new);
-
-extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
-
+#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY
 static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
 {
 	swp_entry_t arch_entry;
@@ -282,6 +272,28 @@ static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
 	arch_entry = __swp_entry(swp_type(entry), swp_offset(entry));
 	return __swp_entry_to_pmd(arch_entry);
 }
+#else
+static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
+{
+	return swp_entry(0, 0);
+}
+
+static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
+{
+	return __pmd(0);
+}
+#endif
+
+struct page_vma_mapped_walk;
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
+		struct page *page);
+
+extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
+		struct page *new);
+
+extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
 
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
@@ -302,16 +314,6 @@ static inline void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
 static inline void pmd_migration_entry_wait(struct mm_struct *m, pmd_t *p)
 {
 }
 
-static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
-{
-	return swp_entry(0, 0);
-}
-
-static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
-{
-	return __pmd(0);
-}
-
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
 	return 0;
diff --git a/mm/Kconfig b/mm/Kconfig
index 25c71eb8a7db..d7c5299c5b7d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -422,6 +422,14 @@ config THP_SWAP
 
 	  For selection by architectures with reasonable THP sizes.
 
+#
+# "PMD swap entry" in the page table is used both for migration and
+# actual swap.
+#
+config HAVE_PMD_SWAP_ENTRY
+	def_bool y
+	depends on THP_SWAP || ARCH_ENABLE_THP_MIGRATION
+
 config TRANSPARENT_HUGE_PAGECACHE
 	def_bool y
 	depends on TRANSPARENT_HUGEPAGE
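Since HAVE_PMD_SWAP_ENTRY is a def_bool with a depends clause rather
than a user-visible option, it simply tracks its dependencies. In
preprocessor terms the intent is equivalent to the following sketch
(hand-written illustration; Kconfig generates the actual definition):

/*
 * Sketch of the Kconfig rule's effect, illustration only:
 * CONFIG_HAVE_PMD_SWAP_ENTRY is defined iff either dependency is on.
 */
#if defined(CONFIG_THP_SWAP) || defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
#define CONFIG_HAVE_PMD_SWAP_ENTRY 1
#endif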
From patchwork Tue Nov 20 08:54:30 2018

From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 02/21] swap: Add __swap_duplicate_locked()
Date: Tue, 20 Nov 2018 16:54:30 +0800
Message-Id: <20181120085449.5542-3-ying.huang@intel.com>
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

The part of __swap_duplicate() that runs with the lock held is
separated into a new function, __swap_duplicate_locked(), because we
will add more logic about the PMD swap mapping into __swap_duplicate()
while keeping most of the PTE swap mapping related logic in
__swap_duplicate_locked(). This is mechanical code refactoring; there
is no functional change in this patch.
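The split follows the common kernel pattern of a *_locked() helper
plus a wrapper that acquires and releases the resources. A minimal
generic sketch of the pattern (all names here are hypothetical, not
from mm/swapfile.c):

/* Generic sketch of the lock-split pattern; names are illustrative. */
struct foo {
        spinlock_t lock;
        int count;
};

/* Core state change; caller must hold f->lock. */
static int foo_update_locked(struct foo *f, int delta)
{
        f->count += delta;
        return 0;
}

/* The wrapper owns lock acquisition, so lookup/validation logic can
 * grow here later without touching the locked core. */
static int foo_update(struct foo *f, int delta)
{
        int err;

        spin_lock(&f->lock);
        err = foo_update_locked(f, delta);
        spin_unlock(&f->lock);
        return err;
}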
Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/swapfile.c | 63 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 35 insertions(+), 28 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index ec210be02c3b..f3c175d830b1 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3441,32 +3441,12 @@ void si_swapinfo(struct sysinfo *val)
 	spin_unlock(&swap_lock);
 }
 
-/*
- * Verify that a swap entry is valid and increment its swap map count.
- *
- * Returns error code in following case.
- * - success -> 0
- * - swp_entry is invalid -> EINVAL
- * - swp_entry is migration entry -> EINVAL
- * - swap-cache reference is requested but there is already one. -> EEXIST
- * - swap-cache reference is requested but the entry is not used. -> ENOENT
- * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
- */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+static int __swap_duplicate_locked(struct swap_info_struct *p,
+				   unsigned long offset, unsigned char usage)
 {
-	struct swap_info_struct *p;
-	struct swap_cluster_info *ci;
-	unsigned long offset;
 	unsigned char count;
 	unsigned char has_cache;
-	int err = -EINVAL;
-
-	p = get_swap_device(entry);
-	if (!p)
-		goto out;
-
-	offset = swp_offset(entry);
-	ci = lock_cluster_or_swap_info(p, offset);
+	int err = 0;
 
 	count = p->swap_map[offset];
 
@@ -3476,12 +3456,11 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 	 */
 	if (unlikely(swap_count(count) == SWAP_MAP_BAD)) {
 		err = -ENOENT;
-		goto unlock_out;
+		goto out;
 	}
 
 	has_cache = count & SWAP_HAS_CACHE;
 	count &= ~SWAP_HAS_CACHE;
-	err = 0;
 
 	if (usage == SWAP_HAS_CACHE) {
 
@@ -3508,11 +3487,39 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 
 	p->swap_map[offset] = count | has_cache;
 
-unlock_out:
+out:
+	return err;
+}
+
+/*
+ * Verify that a swap entry is valid and increment its swap map count.
+ *
+ * Returns error code in following case.
+ * - success -> 0
+ * - swp_entry is invalid -> EINVAL
+ * - swp_entry is migration entry -> EINVAL
+ * - swap-cache reference is requested but there is already one. -> EEXIST
+ * - swap-cache reference is requested but the entry is not used. -> ENOENT
+ * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
+ */
+static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+{
+	struct swap_info_struct *p;
+	struct swap_cluster_info *ci;
+	unsigned long offset;
+	int err = -EINVAL;
+
+	p = get_swap_device(entry);
+	if (!p)
+		goto out;
+
+	offset = swp_offset(entry);
+	ci = lock_cluster_or_swap_info(p, offset);
+	err = __swap_duplicate_locked(p, offset, usage);
 	unlock_cluster_or_swap_info(p, ci);
+
+	put_swap_device(p);
 out:
-	if (p)
-		put_swap_device(p);
 	return err;
 }
From patchwork Tue Nov 20 08:54:31 2018

From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 03/21] swap: Support PMD swap mapping in swap_duplicate()
Date: Tue, 20 Nov 2018 16:54:31 +0800
Message-Id: <20181120085449.5542-4-ying.huang@intel.com>
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

To support swapping in a THP in one piece, we need to create PMD swap
mappings during swapout, and maintain the PMD swap mapping count.
This patch implements the support to increase the PMD swap mapping
count (for swapout, fork, etc.) and to set the SWAP_HAS_CACHE flag
(for swapin, etc.) for a huge swap cluster in the swap_duplicate()
function family. Although it only implements a part of the design of
the swap reference count with PMD swap mapping, the whole design is
described below to make it easier to understand the patch and the
whole picture.

A huge swap cluster is used to hold the contents of a swapped-out THP.
After swapout, a PMD page mapping to the THP becomes a PMD swap
mapping to the huge swap cluster via a swap entry in the PMD, while a
PTE page mapping to a subpage of the THP becomes a PTE swap mapping to
a swap slot in the huge swap cluster via a swap entry in the PTE. If
there is no PMD swap mapping and the corresponding THP is removed from
the page cache (reclaimed), the huge swap cluster will be split and
become a normal swap cluster.

The count (cluster_count()) of the huge swap cluster is
SWAPFILE_CLUSTER (= HPAGE_PMD_NR) + the PMD swap mapping count.
Because all swap slots in the huge swap cluster are mapped by PTE or
PMD, or have the SWAP_HAS_CACHE bit set, the usage count of the swap
cluster is HPAGE_PMD_NR. The PMD swap mapping count is recorded too,
to make it easy to determine whether there are remaining PMD swap
mappings.

The count in swap_map[offset] is the sum of the PTE and PMD swap
mapping counts. This means that when we increase the PMD swap mapping
count, we need to increase swap_map[offset] for all swap slots inside
the swap cluster. An alternative choice would be to make
swap_map[offset] record the PTE swap map count only, given that we
have recorded the PMD swap mapping count in the count of the huge swap
cluster. But this would require increasing swap_map[offset] when
splitting the PMD swap mapping, which may fail because of memory
allocation for swap count continuation. That is hard to deal with, so
we chose the current solution.

The PMD swap mapping to a huge swap cluster may be split when
unmapping part of the PMD mapping, etc. That is easy, because only the
count of the huge swap cluster needs to be changed. When the last PMD
swap mapping is gone and SWAP_HAS_CACHE is unset, we will split the
huge swap cluster (clear the huge flag). This makes it easy to reason
about the cluster state.

A huge swap cluster will be split when splitting a THP in the swap
cache, or when failing to allocate a THP during swapin, etc. But when
splitting the huge swap cluster, we will not try to split all PMD swap
mappings, because sometimes we don't have enough information available
for that. Later, when the PMD swap mapping is duplicated or swapped
in, etc., the PMD swap mapping will be split and fall back to the PTE
operation.

When a THP is added into the swap cache, the SWAP_HAS_CACHE flag will
be set in the swap_map[offset] of all swap slots inside the huge swap
cluster backing the THP. This huge swap cluster will not be split
unless the THP is split, even if its PMD swap mapping count drops to
0. Later, when the THP is removed from the swap cache, the
SWAP_HAS_CACHE flag will be cleared in the swap_map[offset] of all
swap slots inside the huge swap cluster, and the huge swap cluster
will be split if its PMD swap mapping count is 0.

The first parameter of swap_duplicate() is changed to return the swap
entry to call add_swap_count_continuation() for, because we may need
to call it for a swap entry in the middle of a huge swap cluster.
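As a concrete illustration of the counting rule above, here is a small
user-space model (illustration only; assumes SWAPFILE_CLUSTER =
HPAGE_PMD_NR = 512, the x86-64 value):

/*
 * User-space model of the cluster count invariant described above:
 * cluster_count() = SWAPFILE_CLUSTER + PMD swap mapping count.
 */
#include <assert.h>

#define SWAPFILE_CLUSTER 512

static int cluster_count;               /* models cluster_count(ci) */

static int cluster_swapcount(void)      /* models cluster_swapcount(ci) */
{
        return cluster_count - SWAPFILE_CLUSTER;
}

int main(void)
{
        cluster_count = SWAPFILE_CLUSTER + 1; /* swapout adds 1 PMD mapping */
        assert(cluster_swapcount() == 1);

        cluster_count += 1;             /* fork duplicates the PMD mapping */
        assert(cluster_swapcount() == 2);

        cluster_count -= 2;             /* both PMD mappings are zapped */
        assert(cluster_swapcount() == 0); /* cluster may be split/freed */
        return 0;
}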
Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 include/linux/swap.h |   9 ++--
 mm/memory.c          |   2 +-
 mm/rmap.c            |   2 +-
 mm/swap_state.c      |   2 +-
 mm/swapfile.c        | 109 ++++++++++++++++++++++++++++++++++++-------
 5 files changed, 99 insertions(+), 25 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 928550bd28f3..70a6ede1e7e0 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -451,8 +451,8 @@ extern swp_entry_t get_swap_page_of_type(int);
 extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size);
 extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
-extern int swap_duplicate(swp_entry_t);
-extern int swapcache_prepare(swp_entry_t);
+extern int swap_duplicate(swp_entry_t *entry, int entry_size);
+extern int swapcache_prepare(swp_entry_t entry, int entry_size);
 extern void swap_free(swp_entry_t);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
 extern int free_swap_and_cache(swp_entry_t);
@@ -510,7 +510,8 @@ static inline void show_swap_cache_info(void)
 }
 
 #define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));})
-#define swapcache_prepare(e) ({(is_migration_entry(e) || is_device_private_entry(e));})
+#define swapcache_prepare(e, s) \
+	({(is_migration_entry(e) || is_device_private_entry(e)); })
 
 static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
 {
@@ -521,7 +522,7 @@ static inline void swap_shmem_alloc(swp_entry_t swp)
 {
 }
 
-static inline int swap_duplicate(swp_entry_t swp)
+static inline int swap_duplicate(swp_entry_t *swp, int entry_size)
 {
 	return 0;
 }
diff --git a/mm/memory.c b/mm/memory.c
index 1f7e6eef6ae4..ecc79e923f53 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -709,7 +709,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		swp_entry_t entry = pte_to_swp_entry(pte);
 
 		if (likely(!non_swap_entry(entry))) {
-			if (swap_duplicate(entry) < 0)
+			if (swap_duplicate(&entry, 1) < 0)
 				return entry.val;
 
 			/* make sure dst_mm is on swapoff's mmlist. */
diff --git a/mm/rmap.c b/mm/rmap.c
index 1e79fac3186b..3bb4be720bc0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1598,7 +1598,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 				break;
 			}
 
-			if (swap_duplicate(entry) < 0) {
+			if (swap_duplicate(&entry, 1) < 0) {
 				set_pte_at(mm, address, pvmw.pte, pteval);
 				ret = false;
 				page_vma_mapped_walk_done(&pvmw);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 5a1cc9387151..97831166994a 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -402,7 +402,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		/*
 		 * Swap entry may have been freed since our caller observed it.
 		 */
-		err = swapcache_prepare(entry);
+		err = swapcache_prepare(entry, 1);
 		if (err == -EEXIST) {
 			/*
 			 * We might race against get_swap_page() and stumble
diff --git a/mm/swapfile.c b/mm/swapfile.c
index f3c175d830b1..37e20ce4983c 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -534,6 +534,40 @@ static void dec_cluster_info_page(struct swap_info_struct *p,
 		free_cluster(p, idx);
 }
 
+/*
+ * When swapout a THP in one piece, PMD page mappings to THP are
+ * replaced by PMD swap mappings to the corresponding swap cluster.
+ * cluster_swapcount() returns the PMD swap mapping count.
+ *
+ * cluster_count() = PMD swap mapping count + count of allocated swap
+ * entries in cluster. If a cluster is mapped by PMD, all swap
+ * entries inside is used, so here cluster_count() = PMD swap mapping
+ * count + SWAPFILE_CLUSTER.
+ */
+static inline int cluster_swapcount(struct swap_cluster_info *ci)
+{
+	VM_BUG_ON(!cluster_is_huge(ci) || cluster_count(ci) < SWAPFILE_CLUSTER);
+	return cluster_count(ci) - SWAPFILE_CLUSTER;
+}
+
+/*
+ * Set PMD swap mapping count for the huge cluster
+ */
+static inline void cluster_set_swapcount(struct swap_cluster_info *ci,
+					 unsigned int count)
+{
+	VM_BUG_ON(!cluster_is_huge(ci) || cluster_count(ci) < SWAPFILE_CLUSTER);
+	cluster_set_count(ci, SWAPFILE_CLUSTER + count);
+}
+
+static inline void cluster_add_swapcount(struct swap_cluster_info *ci, int add)
+{
+	int count = cluster_swapcount(ci) + add;
+
+	VM_BUG_ON(count < 0);
+	cluster_set_swapcount(ci, count);
+}
+
 /*
  * It's possible scan_swap_map() uses a free cluster in the middle of free
  * cluster list. Avoiding such abuse to avoid list corruption.
@@ -3492,35 +3526,66 @@ static int __swap_duplicate_locked(struct swap_info_struct *p,
 }
 
 /*
- * Verify that a swap entry is valid and increment its swap map count.
+ * Verify that the swap entries from *entry is valid and increment their
+ * PMD/PTE swap mapping count.
  *
  * Returns error code in following case.
  * - success -> 0
  * - swp_entry is invalid -> EINVAL
- * - swp_entry is migration entry -> EINVAL
  * - swap-cache reference is requested but there is already one. -> EEXIST
  * - swap-cache reference is requested but the entry is not used. -> ENOENT
  * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
+ * - the huge swap cluster has been split. -> ENOTDIR
  */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+static int __swap_duplicate(swp_entry_t *entry, int entry_size,
+			    unsigned char usage)
 {
 	struct swap_info_struct *p;
 	struct swap_cluster_info *ci;
 	unsigned long offset;
 	int err = -EINVAL;
+	int i, size = swap_entry_size(entry_size);
 
-	p = get_swap_device(entry);
+	p = get_swap_device(*entry);
 	if (!p)
 		goto out;
 
-	offset = swp_offset(entry);
+	offset = swp_offset(*entry);
 	ci = lock_cluster_or_swap_info(p, offset);
-	err = __swap_duplicate_locked(p, offset, usage);
+	if (size == SWAPFILE_CLUSTER) {
+		/*
+		 * The huge swap cluster has been split, for example, failed to
+		 * allocate huge page during swapin, the caller should split
+		 * the PMD swap mapping and operate on normal swap entries.
+		 */
+		if (!cluster_is_huge(ci)) {
+			err = -ENOTDIR;
+			goto unlock;
+		}
+		VM_BUG_ON(!IS_ALIGNED(offset, size));
+		/* If cluster is huge, all swap entries inside is in-use */
+		VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER);
+	}
+	/* p->swap_map[] = PMD swap map count + PTE swap map count */
+	for (i = 0; i < size; i++) {
+		err = __swap_duplicate_locked(p, offset + i, usage);
+		if (err && size != 1) {
+			*entry = swp_entry(p->type, offset + i);
+			goto undup;
+		}
+	}
+	if (size == SWAPFILE_CLUSTER && usage == 1)
+		cluster_add_swapcount(ci, usage);
+unlock:
 	unlock_cluster_or_swap_info(p, ci);
 
 	put_swap_device(p);
 out:
 	return err;
+undup:
+	for (i--; i >= 0; i--)
+		__swap_entry_free_locked(p, offset + i, usage);
+	goto unlock;
 }
 
 /*
@@ -3529,36 +3594,44 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
  */
 void swap_shmem_alloc(swp_entry_t entry)
 {
-	__swap_duplicate(entry, SWAP_MAP_SHMEM);
+	__swap_duplicate(&entry, 1, SWAP_MAP_SHMEM);
 }
 
 /*
  * Increase reference count of swap entry by 1.
- * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
- * but could not be atomically allocated. Returns 0, just as if it succeeded,
- * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
- * might occur if a page table entry has got corrupted.
+ *
+ * Return error code in following case.
+ * - success -> 0
+ * - swap_count_continuation is required but could not be atomically allocated.
+ *   *entry is used to return swap entry to call add_swap_count_continuation().
+ *     -> ENOMEM
+ * - otherwise same as __swap_duplicate()
  */
-int swap_duplicate(swp_entry_t entry)
+int swap_duplicate(swp_entry_t *entry, int entry_size)
 {
 	int err = 0;
 
-	while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
-		err = add_swap_count_continuation(entry, GFP_ATOMIC);
+	while (!err &&
+	       (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM)
+		err = add_swap_count_continuation(*entry, GFP_ATOMIC);
+	/* If kernel works correctly, other errno is impossible */
+	VM_BUG_ON(err && err != -ENOMEM && err != -ENOTDIR);
 	return err;
 }
 
 /*
  * @entry: swap entry for which we allocate swap cache.
+ * @entry_size: size of the swap entry, 1 or SWAPFILE_CLUSTER
  *
  * Called when allocating swap cache for existing swap entry,
 * This can return error codes. Returns 0 at success.
- * -EBUSY means there is a swap cache.
- * Note: return code is different from swap_duplicate().
+ * -EINVAL means the swap device has been swapoff.
+ * -EEXIST means there is a swap cache.
+ * Otherwise same as __swap_duplicate()
  */
-int swapcache_prepare(swp_entry_t entry)
+int swapcache_prepare(swp_entry_t entry, int entry_size)
 {
-	return __swap_duplicate(entry, SWAP_HAS_CACHE);
+	return __swap_duplicate(&entry, entry_size, SWAP_HAS_CACHE);
 }
 
 struct swap_info_struct *swp_swap_info(swp_entry_t entry)
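For reference, the new calling convention can be sketched as follows;
this is a hypothetical caller written for illustration (the real
PMD-sized callers arrive in later patches of this series):

/*
 * Hypothetical caller sketch (not from this patch): duplicating a PMD
 * swap mapping with the new swap_duplicate() contract.
 * HPAGE_PMD_NR == SWAPFILE_CLUSTER here.
 */
static int dup_pmd_swap_mapping(swp_entry_t entry)
{
        int err = swap_duplicate(&entry, HPAGE_PMD_NR);

        if (err == -ENOTDIR) {
                /* The huge swap cluster was split already; the caller
                 * should split the PMD swap mapping and fall back to
                 * PTE-sized entries. */
        } else if (err == -ENOMEM) {
                /* A swap count continuation page could not be allocated
                 * atomically; entry now names the slot to pass to
                 * add_swap_count_continuation(). */
        }
        return err;
}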
From patchwork Tue Nov 20 08:54:32 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 04/21] swap: Support PMD swap mapping in put_swap_page()
Date: Tue, 20 Nov 2018 16:54:32 +0800
Message-Id: <20181120085449.5542-5-ying.huang@intel.com>
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

Previously, during swapout, all PMD page mappings were split and
replaced with PTE swap mappings, and when clearing the SWAP_HAS_CACHE
flag for the huge swap cluster in put_swap_page(), the huge swap
cluster was split. Now, during swapout, the PMD page mappings to the
THP are changed to PMD swap mappings to the corresponding swap
cluster. So when clearing the SWAP_HAS_CACHE flag, the huge swap
cluster will only be split if the PMD swap mapping count is 0;
otherwise, we will keep it as a huge swap cluster, so that we can swap
in a THP in one piece later.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/swapfile.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 37e20ce4983c..f30eed59c355 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1314,6 +1314,15 @@ void swap_free(swp_entry_t entry)
 
 /*
  * Called after dropping swapcache to decrease refcnt to swap entries.
+ *
+ * When a THP is added into swap cache, the SWAP_HAS_CACHE flag will
+ * be set in the swap_map[] of all swap entries in the huge swap
+ * cluster backing the THP. This huge swap cluster will not be split
+ * unless the THP is split even if its PMD swap mapping count dropped
+ * to 0. Later, when the THP is removed from swap cache, the
+ * SWAP_HAS_CACHE flag will be cleared in the swap_map[] of all swap
+ * entries in the huge swap cluster. And this huge swap cluster will
+ * be split if its PMD swap mapping count is 0.
+ */
 void put_swap_page(struct page *page, swp_entry_t entry)
 {
@@ -1332,15 +1341,23 @@ void put_swap_page(struct page *page, swp_entry_t entry)
 
 	ci = lock_cluster_or_swap_info(si, offset);
 	if (size == SWAPFILE_CLUSTER) {
-		VM_BUG_ON(!cluster_is_huge(ci));
+		VM_BUG_ON(!IS_ALIGNED(offset, size));
 		map = si->swap_map + offset;
-		for (i = 0; i < SWAPFILE_CLUSTER; i++) {
-			val = map[i];
-			VM_BUG_ON(!(val & SWAP_HAS_CACHE));
-			if (val == SWAP_HAS_CACHE)
-				free_entries++;
+		/*
+		 * No PMD swap mapping, the swap cluster will be freed
+		 * if all swap entries becoming free, otherwise the
+		 * huge swap cluster will be split.
+		 */
+		if (!cluster_swapcount(ci)) {
+			for (i = 0; i < SWAPFILE_CLUSTER; i++) {
+				val = map[i];
+				VM_BUG_ON(!(val & SWAP_HAS_CACHE));
+				if (val == SWAP_HAS_CACHE)
+					free_entries++;
+			}
+			if (free_entries != SWAPFILE_CLUSTER)
+				cluster_clear_huge(ci);
 		}
-		cluster_clear_huge(ci);
 		if (free_entries == SWAPFILE_CLUSTER) {
 			unlock_cluster_or_swap_info(si, ci);
 			spin_lock(&si->lock);
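Condensed, the new behaviour for a huge (SWAPFILE_CLUSTER-sized) entry
is captured by this small model (illustration only, not the patch
code; free_entries counts slots whose swap_map value is exactly
SWAP_HAS_CACHE):

/* Condensed model of the decision above; illustration only. */
static void put_huge_swap_sketch(int pmd_swapcount, int free_entries)
{
        if (pmd_swapcount) {
                /* keep the cluster huge: the THP can later be swapped
                 * in in one piece */
        } else if (free_entries == SWAPFILE_CLUSTER) {
                /* no mappings at all: free the whole swap cluster */
        } else {
                /* some PTE mappings remain: clear the huge flag,
                 * i.e. split the cluster */
        }
}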
From patchwork Tue Nov 20 08:54:33 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 05/21] swap: Support PMD swap mapping in free_swap_and_cache()/swap_free()
Date: Tue, 20 Nov 2018 16:54:33 +0800
Message-Id: <20181120085449.5542-6-ying.huang@intel.com>
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

When a PMD swap mapping is removed from a huge swap cluster, for
example, when unmapping a memory range mapped with a PMD swap mapping,
free_swap_and_cache() will be called to decrease the reference count
of the huge swap cluster. free_swap_and_cache() may also free or split
the huge swap cluster, and free the corresponding THP in the swap
cache if necessary. swap_free() is similar, and shares most of its
implementation with free_swap_and_cache(). This patch revises
free_swap_and_cache() and swap_free() to implement this.

If the swap cluster has been split already, for example, because of
failing to allocate a THP during swapin, we just decrease the
reference count of each swap slot by one. Otherwise, we decrease the
reference count of each swap slot by one and also decrease the PMD
swap mapping count in cluster_count(). When the corresponding THP
isn't in the swap cache, the huge swap cluster will be split if the
PMD swap mapping count becomes 0, and freed if all swap counts become
0. When the corresponding THP is in the swap cache and every
swap_map[offset] == SWAP_HAS_CACHE, we will try to delete the THP from
the swap cache, which will cause the THP and the huge swap cluster to
be freed.
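Caller-side, the new entry_size argument sizes the free operation. A
hypothetical sketch (the PTE-sized call matches this patch; the
PMD-sized call illustrates the intended use by later patches in the
series):

/* Hypothetical caller sketch; not from this patch. */
static void zap_one_swap_pte(swp_entry_t entry)
{
        free_swap_and_cache(entry, 1);          /* one normal swap slot */
}

static void zap_one_swap_pmd(swp_entry_t entry)
{
        /* drops one PMD swap mapping count of the huge cluster and one
         * reference on each of the HPAGE_PMD_NR swap slots inside it */
        free_swap_and_cache(entry, HPAGE_PMD_NR);
}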
Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 arch/s390/mm/pgtable.c |   2 +-
 include/linux/swap.h   |   9 ++-
 kernel/power/swap.c    |   4 +-
 mm/madvise.c           |   2 +-
 mm/memory.c            |   4 +-
 mm/shmem.c             |   6 +-
 mm/swapfile.c          | 171 ++++++++++++++++++++++++++++++++---------
 7 files changed, 149 insertions(+), 49 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index f2cc7da473e4..ffd4b68adbb3 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -675,7 +675,7 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry)
 		dec_mm_counter(mm, mm_counter(page));
 	}
-	free_swap_and_cache(entry);
+	free_swap_and_cache(entry, 1);
 }
 
 void ptep_zap_unused(struct mm_struct *mm, unsigned long addr,
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 70a6ede1e7e0..24c3014894dd 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -453,9 +453,9 @@ extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t *entry, int entry_size);
 extern int swapcache_prepare(swp_entry_t entry, int entry_size);
-extern void swap_free(swp_entry_t);
+extern void swap_free(swp_entry_t entry, int entry_size);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
-extern int free_swap_and_cache(swp_entry_t);
+extern int free_swap_and_cache(swp_entry_t entry, int entry_size);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
 extern unsigned int count_swap_pages(int, int);
 extern sector_t map_swap_page(struct page *, struct block_device **);
@@ -509,7 +509,8 @@ static inline void show_swap_cache_info(void)
 {
 }
 
-#define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));})
+#define free_swap_and_cache(e, s) \
+	({(is_migration_entry(e) || is_device_private_entry(e)); })
 #define swapcache_prepare(e, s) \
 	({(is_migration_entry(e) || is_device_private_entry(e)); })
 
@@ -527,7 +528,7 @@ static inline int swap_duplicate(swp_entry_t *swp, int entry_size)
 	return 0;
 }
 
-static inline void swap_free(swp_entry_t swp)
+static inline void swap_free(swp_entry_t swp, int entry_size)
 {
 }
 
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index d7f6c1a288d3..0275df84ed3d 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -182,7 +182,7 @@ sector_t alloc_swapdev_block(int swap)
 	offset = swp_offset(get_swap_page_of_type(swap));
 	if (offset) {
 		if (swsusp_extents_insert(offset))
-			swap_free(swp_entry(swap, offset));
+			swap_free(swp_entry(swap, offset), 1);
 		else
 			return swapdev_block(swap, offset);
 	}
@@ -206,7 +206,7 @@ void free_all_swap_pages(int swap)
 		ext = rb_entry(node, struct swsusp_extent, node);
 		rb_erase(node, &swsusp_extents);
 		for (offset = ext->start; offset <= ext->end; offset++)
-			swap_free(swp_entry(swap, offset));
+			swap_free(swp_entry(swap, offset), 1);
 
 		kfree(ext);
 	}
diff --git a/mm/madvise.c b/mm/madvise.c
index 6cb1ca93e290..cbb3d7e38e51 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -349,7 +349,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			if (non_swap_entry(entry))
 				continue;
 			nr_swap--;
-			free_swap_and_cache(entry);
+			free_swap_and_cache(entry, 1);
 			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 			continue;
 		}
diff --git a/mm/memory.c b/mm/memory.c
index ecc79e923f53..5f805c0a6894 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1134,7 +1134,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
unsigned long zap_pte_range(struct mmu_gather *tlb, page = migration_entry_to_page(entry); rss[mm_counter(page)]--; } - if (unlikely(!free_swap_and_cache(entry))) + if (unlikely(!free_swap_and_cache(entry, 1))) print_bad_pte(vma, addr, ptent, NULL); pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); } while (pte++, addr += PAGE_SIZE, addr != end); @@ -2823,7 +2823,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) } set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte); - swap_free(entry); + swap_free(entry, 1); if (mem_cgroup_swap_full(page) || (vma->vm_flags & VM_LOCKED) || PageMlocked(page)) try_to_free_swap(page); diff --git a/mm/shmem.c b/mm/shmem.c index 32eb29bd72c6..a85103a3e83f 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -664,7 +664,7 @@ static int shmem_free_swap(struct address_space *mapping, xa_unlock_irq(&mapping->i_pages); if (old != radswap) return -ENOENT; - free_swap_and_cache(radix_to_swp_entry(radswap)); + free_swap_and_cache(radix_to_swp_entry(radswap), 1); return 0; } @@ -1182,7 +1182,7 @@ static int shmem_unuse_inode(struct shmem_inode_info *info, spin_lock_irq(&info->lock); info->swapped--; spin_unlock_irq(&info->lock); - swap_free(swap); + swap_free(swap, 1); } } return error; @@ -1714,7 +1714,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index, delete_from_swap_cache(page); set_page_dirty(page); - swap_free(swap); + swap_free(swap, 1); } else { if (vma && userfaultfd_missing(vma)) { diff --git a/mm/swapfile.c b/mm/swapfile.c index f30eed59c355..3eda4cbd279c 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -49,6 +49,9 @@ static bool swap_count_continued(struct swap_info_struct *, pgoff_t, unsigned char); static void free_swap_count_continuations(struct swap_info_struct *); static sector_t map_swap_entry(swp_entry_t, struct block_device**); +static bool __swap_page_trans_huge_swapped(struct swap_info_struct *si, + struct swap_cluster_info *ci, + unsigned long offset); DEFINE_SPINLOCK(swap_lock); static unsigned int nr_swapfiles; @@ -1267,19 +1270,106 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry) return NULL; } -static unsigned char __swap_entry_free(struct swap_info_struct *p, - swp_entry_t entry, unsigned char usage) +#define SF_FREE_CACHE 0x1 + +static void __swap_free(struct swap_info_struct *p, swp_entry_t entry, + int entry_size, unsigned long flags) { struct swap_cluster_info *ci; unsigned long offset = swp_offset(entry); + int i, free_entries = 0, cache_only = 0; + int size = swap_entry_size(entry_size); + unsigned char *map, count; ci = lock_cluster_or_swap_info(p, offset); - usage = __swap_entry_free_locked(p, offset, usage); + VM_BUG_ON(!IS_ALIGNED(offset, size)); + /* + * Normal swap entry or huge swap cluster has been split, free + * each swap entry + */ + if (size == 1 || !cluster_is_huge(ci)) { + for (i = 0; i < size; i++, entry.val++) { + count = __swap_entry_free_locked(p, offset + i, 1); + if (!count || + (flags & SF_FREE_CACHE && + count == SWAP_HAS_CACHE && + !__swap_page_trans_huge_swapped(p, ci, + offset + i))) { + unlock_cluster_or_swap_info(p, ci); + if (!count) + free_swap_slot(entry); + else + __try_to_reclaim_swap(p, offset + i, + TTRS_UNMAPPED | TTRS_FULL); + if (i == size - 1) + return; + lock_cluster_or_swap_info(p, offset); + } + } + unlock_cluster_or_swap_info(p, ci); + return; + } + /* + * Return for normal swap entry above, the following code is + * for huge swap cluster only. + */ + cluster_add_swapcount(ci, -1); + /* + * Decrease mapping count for each swap entry in cluster. 
+ * Because PMD swap mapping is counted in p->swap_map[] too. + */ + map = p->swap_map + offset; + for (i = 0; i < size; i++) { + /* + * Mark swap entries to become free as SWAP_MAP_BAD + * temporarily. + */ + if (map[i] == 1) { + map[i] = SWAP_MAP_BAD; + free_entries++; + } else if (__swap_entry_free_locked(p, offset + i, 1) == + SWAP_HAS_CACHE) + cache_only++; + } + /* + * If there are PMD swap mapping or the THP is in swap cache, + * it's impossible for some swap entries to become free. + */ + VM_BUG_ON(free_entries && + (cluster_swapcount(ci) || (map[0] & SWAP_HAS_CACHE))); + if (free_entries == SWAPFILE_CLUSTER) + memset(map, SWAP_HAS_CACHE, SWAPFILE_CLUSTER); + /* + * If there are no PMD swap mappings remain and the THP isn't + * in swap cache, split the huge swap cluster. + */ + else if (!cluster_swapcount(ci) && !(map[0] & SWAP_HAS_CACHE)) + cluster_clear_huge(ci); unlock_cluster_or_swap_info(p, ci); - if (!usage) - free_swap_slot(entry); - - return usage; + if (free_entries == SWAPFILE_CLUSTER) { + spin_lock(&p->lock); + mem_cgroup_uncharge_swap(entry, SWAPFILE_CLUSTER); + swap_free_cluster(p, offset / SWAPFILE_CLUSTER); + spin_unlock(&p->lock); + } else if (free_entries) { + ci = lock_cluster(p, offset); + for (i = 0; i < size; i++, entry.val++) { + /* + * To be freed swap entries are marked as SWAP_MAP_BAD + * temporarily as above + */ + if (map[i] == SWAP_MAP_BAD) { + map[i] = SWAP_HAS_CACHE; + unlock_cluster(ci); + free_swap_slot(entry); + if (i == size - 1) + return; + ci = lock_cluster(p, offset); + } + } + unlock_cluster(ci); + } else if (cache_only == SWAPFILE_CLUSTER && flags & SF_FREE_CACHE) + __try_to_reclaim_swap(p, offset, TTRS_UNMAPPED | TTRS_FULL); } static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry) @@ -1303,13 +1393,13 @@ static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry) * Caller has made sure that the swap device corresponding to entry * is still around or has not been recycled. 
*/ -void swap_free(swp_entry_t entry) +void swap_free(swp_entry_t entry, int entry_size) { struct swap_info_struct *p; p = _swap_info_get(entry); if (p) - __swap_entry_free(p, entry, 1); + __swap_free(p, entry, entry_size, 0); } /* @@ -1545,29 +1635,33 @@ int swp_swapcount(swp_entry_t entry) return count; } -static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, - swp_entry_t entry) +/* si->lock or ci->lock must be held before calling this function */ +static bool __swap_page_trans_huge_swapped(struct swap_info_struct *si, + struct swap_cluster_info *ci, + unsigned long offset) { - struct swap_cluster_info *ci; unsigned char *map = si->swap_map; - unsigned long roffset = swp_offset(entry); - unsigned long offset = round_down(roffset, SWAPFILE_CLUSTER); + unsigned long hoffset = round_down(offset, SWAPFILE_CLUSTER); int i; - bool ret = false; - ci = lock_cluster_or_swap_info(si, offset); - if (!ci || !cluster_is_huge(ci)) { - if (swap_count(map[roffset])) - ret = true; - goto unlock_out; - } + if (!ci || !cluster_is_huge(ci)) + return !!swap_count(map[offset]); for (i = 0; i < SWAPFILE_CLUSTER; i++) { - if (swap_count(map[offset + i])) { - ret = true; - break; - } + if (swap_count(map[hoffset + i])) + return true; } -unlock_out: + return false; +} + +static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, + swp_entry_t entry) +{ + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + bool ret; + + ci = lock_cluster_or_swap_info(si, offset); + ret = __swap_page_trans_huge_swapped(si, ci, offset); unlock_cluster_or_swap_info(si, ci); return ret; } @@ -1739,22 +1833,17 @@ int try_to_free_swap(struct page *page) * Free the swap entry like above, but also try to * free the page cache entry if it is the last user. */ -int free_swap_and_cache(swp_entry_t entry) +int free_swap_and_cache(swp_entry_t entry, int entry_size) { struct swap_info_struct *p; - unsigned char count; if (non_swap_entry(entry)) return 1; p = _swap_info_get(entry); - if (p) { - count = __swap_entry_free(p, entry, 1); - if (count == SWAP_HAS_CACHE && - !swap_page_trans_huge_swapped(p, entry)) - __try_to_reclaim_swap(p, swp_offset(entry), - TTRS_UNMAPPED | TTRS_FULL); - } + if (p) + __swap_free(p, entry, entry_size, SF_FREE_CACHE); + return p != NULL; } @@ -1901,7 +1990,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, } set_pte_at(vma->vm_mm, addr, pte, pte_mkold(mk_pte(page, vma->vm_page_prot))); - swap_free(entry); + swap_free(entry, 1); /* * Move the page to the active list so it is not * immediately swapped out again after swapon. @@ -2340,6 +2429,16 @@ int try_to_unuse(unsigned int type, bool frontswap, } mmput(start_mm); + + /* + * Swap entries may be marked as SWAP_MAP_BAD temporarily in + * __swap_free() before being freed really. + * find_next_to_unuse() will skip these swap entries, that is + * OK. But we need to wait until they are freed really. 
+ */ + while (!retval && READ_ONCE(si->inuse_pages)) + schedule_timeout_uninterruptible(1); + return retval; }

From patchwork Tue Nov 20 08:54:34 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10689993
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V7 RESEND 06/21] swap: Support PMD swap mapping when splitting huge PMD
Date: Tue, 20 Nov 2018 16:54:34 +0800
Message-Id: <20181120085449.5542-7-ying.huang@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

A huge PMD needs to be split when zapping part of the PMD mapping,
etc.  If the PMD mapping is a swap mapping, we need to split it too.
This patch implements support for this.  It is similar to splitting a
PMD page mapping, except that we also need to decrease the PMD swap
mapping count of the huge swap cluster.  If the PMD swap mapping count
becomes 0, the huge swap cluster will be split.

Notice: is_huge_zero_pmd() and pmd_page() don't work well with a swap
PMD, so the pmd_present() check is called before them.
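As a rough illustration of that ordering constraint, a hedged sketch
(demo_pmd_to_page() is hypothetical, not part of the patch):

/*
 * Illustrative sketch only: helpers that inspect the page behind a
 * PMD must filter out swap PMDs first, because pmd_page() and
 * is_huge_zero_pmd() assume a present PMD.
 */
static struct page *demo_pmd_to_page(pmd_t pmd)
{
	if (!pmd_present(pmd))		/* PMD swap mapping: no page behind it */
		return NULL;
	if (is_huge_zero_pmd(pmd))	/* safe only after pmd_present() */
		return NULL;
	return pmd_page(pmd);
}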
Thanks Daniel Jordan for testing and reporting a data corruption bug caused by misaligned address processing issue in __split_huge_swap_pmd(). Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 4 ++++ include/linux/swap.h | 6 +++++ mm/huge_memory.c | 49 ++++++++++++++++++++++++++++++++++++----- mm/swapfile.c | 32 +++++++++++++++++++++++++++ 4 files changed, 86 insertions(+), 5 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 4663ee96cf59..1c0fda003d6a 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -226,6 +226,10 @@ static inline bool is_huge_zero_page(struct page *page) return READ_ONCE(huge_zero_page) == page; } +/* + * is_huge_zero_pmd() must be called after checking pmd_present(), + * otherwise, it may report false positive for PMD swap entry. + */ static inline bool is_huge_zero_pmd(pmd_t pmd) { return is_huge_zero_page(pmd_page(pmd)); diff --git a/include/linux/swap.h b/include/linux/swap.h index 24c3014894dd..a24d101b131d 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -619,11 +619,17 @@ static inline swp_entry_t get_swap_page(struct page *page) #ifdef CONFIG_THP_SWAP extern int split_swap_cluster(swp_entry_t entry); +extern int split_swap_cluster_map(swp_entry_t entry); #else static inline int split_swap_cluster(swp_entry_t entry) { return 0; } + +static inline int split_swap_cluster_map(swp_entry_t entry) +{ + return 0; +} #endif #ifdef CONFIG_MEMCG diff --git a/mm/huge_memory.c b/mm/huge_memory.c index c3072e9b21fb..f8480465bd5f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1632,6 +1632,41 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) return 0; } +/* Convert a PMD swap mapping to a set of PTE swap mappings */ +static void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long addr, + pmd_t *pmd) +{ + struct mm_struct *mm = vma->vm_mm; + pgtable_t pgtable; + pmd_t _pmd; + swp_entry_t entry; + int i, soft_dirty; + + addr &= HPAGE_PMD_MASK; + entry = pmd_to_swp_entry(*pmd); + soft_dirty = pmd_soft_dirty(*pmd); + + split_swap_cluster_map(entry); + + pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pmd_populate(mm, &_pmd, pgtable); + + for (i = 0; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, entry.val++) { + pte_t *pte, ptent; + + pte = pte_offset_map(&_pmd, addr); + VM_BUG_ON(!pte_none(*pte)); + ptent = swp_entry_to_pte(entry); + if (soft_dirty) + ptent = pte_swp_mksoft_dirty(ptent); + set_pte_at(mm, addr, pte, ptent); + pte_unmap(pte); + } + smp_wmb(); /* make pte visible before pmd */ + pmd_populate(mm, pmd, pgtable); +} + /* * Return true if we do MADV_FREE successfully on entire pmd page. * Otherwise, return false. 
@@ -2096,7 +2131,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); VM_BUG_ON_VMA(vma->vm_start > haddr, vma); VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma); - VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) + VM_BUG_ON(!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)); count_vm_event(THP_SPLIT_PMD); @@ -2120,7 +2155,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, put_page(page); add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR); return; - } else if (is_huge_zero_pmd(*pmd)) { + } else if (pmd_present(*pmd) && is_huge_zero_pmd(*pmd)) { /* * FIXME: Do we want to invalidate secondary mmu by calling * mmu_notifier_invalidate_range() see comments below inside @@ -2164,6 +2199,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, page = pfn_to_page(swp_offset(entry)); } else #endif + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(old_pmd)) + return __split_huge_swap_pmd(vma, haddr, pmd); + else page = pmd_page(old_pmd); VM_BUG_ON_PAGE(!page_count(page), page); page_ref_add(page, HPAGE_PMD_NR - 1); @@ -2255,14 +2293,15 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, * pmd against. Otherwise we can end up replacing wrong page. */ VM_BUG_ON(freeze && !page); - if (page && page != pmd_page(*pmd)) - goto out; + /* pmd_page() should be called only if pmd_present() */ + if (page && (!pmd_present(*pmd) || page != pmd_page(*pmd))) + goto out; if (pmd_trans_huge(*pmd)) { page = pmd_page(*pmd); if (PageMlocked(page)) clear_page_mlock(page); - } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd))) + } else if (!(pmd_devmap(*pmd) || is_swap_pmd(*pmd))) goto out; __split_huge_pmd_locked(vma, pmd, haddr, freeze); out: diff --git a/mm/swapfile.c b/mm/swapfile.c index 3eda4cbd279c..e83e3c93f3b3 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -4041,6 +4041,38 @@ void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node, } #endif +#ifdef CONFIG_THP_SWAP +/* + * The corresponding page table shouldn't be changed under us, that + * is, the page table lock should be held. + */ +int split_swap_cluster_map(swp_entry_t entry) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + + VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER)); + si = _swap_info_get(entry); + if (!si) + return -EBUSY; + ci = lock_cluster(si, offset); + /* The swap cluster has been split by someone else, we are done */ + if (!cluster_is_huge(ci)) + goto out; + cluster_add_swapcount(ci, -1); + /* + * If the last PMD swap mapping has gone and the THP isn't in + * swap cache, the huge swap cluster will be split. 
+ */ + if (!cluster_swapcount(ci) && !(si->swap_map[offset] & SWAP_HAS_CACHE)) + cluster_clear_huge(ci); +out: + unlock_cluster(ci); + return 0; +} +#endif + static int __init swapfile_init(void) { int nid;

From patchwork Tue Nov 20 08:54:35 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10689995
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V7 RESEND 07/21] swap: Support PMD swap mapping in split_swap_cluster()
Date: Tue, 20 Nov 2018 16:54:35 +0800
Message-Id: <20181120085449.5542-8-ying.huang@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

When splitting a THP in the swap cache, or when failing to allocate a
THP during swapin of a huge swap cluster, the huge swap cluster is
split.  In addition to clearing the huge flag of the swap cluster, the
PMD swap mapping count recorded in cluster_count() is set to 0.  But
we do not touch the PMD swap mappings themselves, because it can be
hard to find them all.  When a PMD swap mapping is operated on later,
it will be found that the huge swap cluster has been split, and the
PMD swap mapping will be split at that time.

Unless splitting a THP in the swap cache (requested via the
SSC_SPLIT_CACHED flag), split_swap_cluster() returns -EEXIST if the
SWAP_HAS_CACHE flag is set in swap_map[offset], because this indicates
that a THP corresponds to this huge swap cluster, and it isn't
desirable to split that THP.

When splitting a THP in the swap cache, the call to
split_swap_cluster() is moved to before unlocking the sub-pages, so
that all sub-pages stay locked from the time the THP is split until
the huge swap cluster is split.  This makes the code much easier to
reason about.
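For illustration, the two calling conventions of the revised interface
in a hedged sketch (demo_split_cluster() is hypothetical, not part of
the patch):

/*
 * Illustrative sketch only: how the two kinds of callers are expected
 * to use split_swap_cluster() after this patch.
 */
static int demo_split_cluster(swp_entry_t entry, bool thp_in_swap_cache)
{
	if (thp_in_swap_cache)
		/* Splitting the THP itself: force the cluster split too */
		return split_swap_cluster(entry, SSC_SPLIT_CACHED);
	/*
	 * Swapin fallback: -EEXIST means a THP still corresponds to
	 * this cluster, so the cluster is left huge and the caller
	 * may retry or fall back as appropriate.
	 */
	return split_swap_cluster(entry, 0);
}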
Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/swap.h | 6 +++-- mm/huge_memory.c | 18 +++++++++----- mm/swapfile.c | 58 +++++++++++++++++++++++++++++++------------- 3 files changed, 57 insertions(+), 25 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index a24d101b131d..441da4a832a6 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -617,11 +617,13 @@ static inline swp_entry_t get_swap_page(struct page *page) #endif /* CONFIG_SWAP */ +#define SSC_SPLIT_CACHED 0x1 + #ifdef CONFIG_THP_SWAP -extern int split_swap_cluster(swp_entry_t entry); +extern int split_swap_cluster(swp_entry_t entry, unsigned long flags); extern int split_swap_cluster_map(swp_entry_t entry); #else -static inline int split_swap_cluster(swp_entry_t entry) +static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags) { return 0; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index f8480465bd5f..a38d549fb4dc 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2524,6 +2524,17 @@ static void __split_huge_page(struct page *page, struct list_head *list, unfreeze_page(head); + /* + * Split swap cluster before unlocking sub-pages. So all + * sub-pages will be kept locked from THP has been split to
+ */ + if (PageSwapCache(head)) { + swp_entry_t entry = { .val = page_private(head) }; + + split_swap_cluster(entry, SSC_SPLIT_CACHED); + } + for (i = 0; i < HPAGE_PMD_NR; i++) { struct page *subpage = head + i; if (subpage == page) @@ -2747,12 +2758,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) __dec_node_page_state(page, NR_SHMEM_THPS); spin_unlock(&pgdata->split_queue_lock); __split_huge_page(page, list, flags); - if (PageSwapCache(head)) { - swp_entry_t entry = { .val = page_private(head) }; - - ret = split_swap_cluster(entry); - } else - ret = 0; + ret = 0; } else { if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) { pr_alert("total_mapcount: %u, page_count(): %u\n", diff --git a/mm/swapfile.c b/mm/swapfile.c index e83e3c93f3b3..a57967292a8d 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1469,23 +1469,6 @@ void put_swap_page(struct page *page, swp_entry_t entry) unlock_cluster_or_swap_info(si, ci); } -#ifdef CONFIG_THP_SWAP -int split_swap_cluster(swp_entry_t entry) -{ - struct swap_info_struct *si; - struct swap_cluster_info *ci; - unsigned long offset = swp_offset(entry); - - si = _swap_info_get(entry); - if (!si) - return -EBUSY; - ci = lock_cluster(si, offset); - cluster_clear_huge(ci); - unlock_cluster(ci); - return 0; -} -#endif - static int swp_entry_cmp(const void *ent1, const void *ent2) { const swp_entry_t *e1 = ent1, *e2 = ent2; @@ -4071,6 +4054,47 @@ int split_swap_cluster_map(swp_entry_t entry) unlock_cluster(ci); return 0; } + +/* + * We will not try to split all PMD swap mappings to the swap cluster, + * because we haven't enough information available for that. Later, + * when the PMD swap mapping is duplicated or swapin, etc, the PMD + * swap mapping will be split and fallback to the PTE operations. + */ +int split_swap_cluster(swp_entry_t entry, unsigned long flags) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + int ret = 0; + + si = get_swap_device(entry); + if (!si) + return -EINVAL; + ci = lock_cluster(si, offset); + /* The swap cluster has been split by someone else, we are done */ + if (!cluster_is_huge(ci)) + goto out; + VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER)); + VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER); + /* + * If not requested, don't split swap cluster that has SWAP_HAS_CACHE + * flag. When the flag is cleared later, the huge swap cluster will + * be split if there is no PMD swap mapping. 
+ */ + if (!(flags & SSC_SPLIT_CACHED) && + si->swap_map[offset] & SWAP_HAS_CACHE) { + ret = -EEXIST; + goto out; + } + cluster_set_swapcount(ci, 0); + cluster_clear_huge(ci); + +out: + unlock_cluster(ci); + put_swap_device(si); + return ret; +} #endif static int __init swapfile_init(void) { int nid;

From patchwork Tue Nov 20 08:54:36 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10689997
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V7 RESEND 08/21] swap: Support to read a huge swap cluster for swapin a THP
Date: Tue, 20 Nov 2018 16:54:36 +0800
Message-Id: <20181120085449.5542-9-ying.huang@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

To swap in a THP in one piece, we need to read a huge swap cluster
from the swap device.  This patch revises __read_swap_cache_async()
and its callers and callees to support this.  If
__read_swap_cache_async() finds that the swap cluster of the specified
swap entry is huge, it will try to allocate a THP and add it into the
swap cache.
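A hedged sketch of the key address arithmetic (the demo_* name is
hypothetical; it mirrors the hentry computation in the patch below):

/*
 * Illustrative sketch only: the target swap entry is rounded down to
 * the head of its huge cluster before the THP is added to the swap
 * cache.
 */
static swp_entry_t demo_huge_cluster_head(swp_entry_t entry)
{
	return swp_entry(swp_type(entry),
			 round_down(swp_offset(entry), HPAGE_PMD_NR));
}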
So later the contents of the huge swap cluster can be read into the THP. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 8 ++++++ include/linux/swap.h | 4 +-- mm/huge_memory.c | 3 +- mm/swap_state.c | 61 +++++++++++++++++++++++++++++++++-------- mm/swapfile.c | 9 ++++-- 5 files changed, 67 insertions(+), 18 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 1c0fda003d6a..f4dbd0662438 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -250,6 +250,8 @@ static inline bool thp_migration_supported(void) return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION); } +gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, + unsigned long addr); #else /* CONFIG_TRANSPARENT_HUGEPAGE */ #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; }) #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; }) @@ -363,6 +365,12 @@ static inline bool thp_migration_supported(void) { return false; } + +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, + unsigned long addr) +{ + return 0; +} #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif /* _LINUX_HUGE_MM_H */ diff --git a/include/linux/swap.h b/include/linux/swap.h index 441da4a832a6..4bd532c9315e 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -462,7 +462,7 @@ extern sector_t map_swap_page(struct page *, struct block_device **); extern sector_t swapdev_block(int, pgoff_t); extern int page_swapcount(struct page *); extern int __swap_count(swp_entry_t entry); -extern int __swp_swapcount(swp_entry_t entry); +extern int __swp_swapcount(swp_entry_t entry, int *entry_size); extern int swp_swapcount(swp_entry_t entry); extern struct swap_info_struct *page_swap_info(struct page *); extern struct swap_info_struct *swp_swap_info(swp_entry_t entry); @@ -590,7 +590,7 @@ static inline int __swap_count(swp_entry_t entry) return 0; } -static inline int __swp_swapcount(swp_entry_t entry) +static inline int __swp_swapcount(swp_entry_t entry, int *entry_size) { return 0; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index a38d549fb4dc..eeea00070da8 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -629,7 +629,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, * available * never: never stall for any thp allocation */ -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, unsigned long addr) +gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, + unsigned long addr) { const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE); gfp_t this_node = 0; diff --git a/mm/swap_state.c b/mm/swap_state.c index 97831166994a..1eedbc0aede2 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -361,7 +361,9 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, { struct page *found_page = NULL, *new_page = NULL; struct swap_info_struct *si; - int err; + int err, entry_size = 1; + swp_entry_t hentry; + *new_page_allocated = false; do { @@ -387,14 +389,42 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * as SWAP_HAS_CACHE. That's done in later part of code or * else swap_off will be aborted if we return NULL. */ - if (!__swp_swapcount(entry) && swap_slot_cache_enabled) + if (!__swp_swapcount(entry, &entry_size) && + swap_slot_cache_enabled) break; /* * Get a new page to read into from swap. 
*/ - if (!new_page) { - new_page = alloc_page_vma(gfp_mask, vma, addr); + if (!new_page || + (IS_ENABLED(CONFIG_THP_SWAP) && + hpage_nr_pages(new_page) != entry_size)) { + if (new_page) + put_page(new_page); + if (IS_ENABLED(CONFIG_THP_SWAP) && + entry_size == HPAGE_PMD_NR) { + gfp_t gfp; + + gfp = alloc_hugepage_direct_gfpmask(vma, addr); + /* + * Make sure huge page allocation flags are + * compatible with that of normal page + */ + VM_WARN_ONCE(gfp_mask & ~(gfp | __GFP_RECLAIM), + "ignoring gfp_mask bits: %x", + gfp_mask & ~(gfp | __GFP_RECLAIM)); + new_page = alloc_pages_vma(gfp, HPAGE_PMD_ORDER, + vma, addr, + numa_node_id()); + if (new_page) + prep_transhuge_page(new_page); + hentry = swp_entry(swp_type(entry), + round_down(swp_offset(entry), + HPAGE_PMD_NR)); + } else { + new_page = alloc_page_vma(gfp_mask, vma, addr); + hentry = entry; + } if (!new_page) break; /* Out of memory */ } @@ -402,7 +432,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* * Swap entry may have been freed since our caller observed it. */ - err = swapcache_prepare(entry, 1); + err = swapcache_prepare(hentry, entry_size); if (err == -EEXIST) { /* * We might race against get_swap_page() and stumble @@ -411,18 +441,24 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, */ cond_resched(); continue; + } else if (err == -ENOTDIR) { + /* huge swap cluster has been split under us */ + continue; } else if (err) /* swp entry is obsolete ? */ break; /* May fail (-ENOMEM) if XArray node allocation failed. */ __SetPageLocked(new_page); __SetPageSwapBacked(new_page); - err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL); + err = add_to_swap_cache(new_page, hentry, gfp_mask & GFP_KERNEL); if (likely(!err)) { /* Initiate read into locked page */ SetPageWorkingset(new_page); lru_cache_add_anon(new_page); *new_page_allocated = true; + if (IS_ENABLED(CONFIG_THP_SWAP)) + new_page += swp_offset(entry) & + (entry_size - 1); return new_page; } __ClearPageLocked(new_page); @@ -430,7 +466,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * add_to_swap_cache() doesn't return -EEXIST, so we can safely * clear SWAP_HAS_CACHE flag. 
*/ - put_swap_page(new_page, entry); + put_swap_page(new_page, hentry); } while (err != -ENOMEM); if (new_page) @@ -452,7 +488,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, vma, addr, &page_was_allocated); if (page_was_allocated) - swap_readpage(retpage, do_poll); + swap_readpage(compound_head(retpage), do_poll); return retpage; } @@ -571,8 +607,9 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); - if (offset != entry_offset) { + swap_readpage(compound_head(page), false); + if (offset != entry_offset && + !PageTransCompound(page)) { SetPageReadahead(page); count_vm_event(SWAP_RA); } @@ -733,8 +770,8 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); - if (i != ra_info.offset) { + swap_readpage(compound_head(page), false); + if (i != ra_info.offset && !PageTransCompound(page)) { SetPageReadahead(page); count_vm_event(SWAP_RA); } diff --git a/mm/swapfile.c b/mm/swapfile.c index a57967292a8d..c22c11b4a879 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1542,7 +1542,8 @@ int __swap_count(swp_entry_t entry) return count; } -static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) +static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry, + int *entry_size) { int count = 0; pgoff_t offset = swp_offset(entry); @@ -1550,6 +1551,8 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) ci = lock_cluster_or_swap_info(si, offset); count = swap_count(si->swap_map[offset]); + if (entry_size) + *entry_size = ci && cluster_is_huge(ci) ? SWAPFILE_CLUSTER : 1; unlock_cluster_or_swap_info(si, ci); return count; } @@ -1559,14 +1562,14 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) * This does not give an exact answer when swap count is continued, * but does include the high COUNT_CONTINUED flag to allow for that. 
-int __swp_swapcount(swp_entry_t entry) +int __swp_swapcount(swp_entry_t entry, int *entry_size) { int count = 0; struct swap_info_struct *si; si = get_swap_device(entry); if (si) { - count = swap_swapcount(si, entry); + count = swap_swapcount(si, entry, entry_size); put_swap_device(si); } return count; }

From patchwork Tue Nov 20 08:54:37 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10689999
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V7 RESEND 09/21] swap: Swapin a THP in one piece
Date: Tue, 20 Nov 2018 16:54:37 +0800
Message-Id: <20181120085449.5542-10-ying.huang@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

With this patch, when the page fault handler finds a PMD swap mapping,
it will swap in a THP in one piece.  This avoids the overhead of
splitting/collapsing before/after swapping in the THP, and greatly
improves swap performance through a reduced page fault count, etc.
do_huge_pmd_swap_page() is added in this patch to implement this.
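A hedged sketch of the resulting fault-path dispatch
(demo_handle_swap_pmd() is hypothetical; the real wiring is in the
mm/memory.c hunk below):

/*
 * Illustrative sketch only: a non-present (swap) PMD that is not a
 * migration entry is handed to do_huge_pmd_swap_page(); a
 * VM_FAULT_FALLBACK return means the mapping was split and the fault
 * should be retried via the normal PTE path.
 */
static int demo_handle_swap_pmd(struct vm_fault *vmf, pmd_t orig_pmd)
{
	if (!IS_ENABLED(CONFIG_THP_SWAP))
		return VM_FAULT_FALLBACK;
	return do_huge_pmd_swap_page(vmf, orig_pmd);
}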
It is similar to do_swap_page() for normal page swapin. If failing to allocate a THP, the huge swap cluster and the PMD swap mapping will be split to fallback to normal page swapin. If the huge swap cluster has been split already, the PMD swap mapping will be split to fallback to normal page swapin. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 9 +++ mm/huge_memory.c | 174 ++++++++++++++++++++++++++++++++++++++++ mm/memory.c | 16 ++-- 3 files changed, 193 insertions(+), 6 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index f4dbd0662438..909321c772b5 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -373,4 +373,13 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#ifdef CONFIG_THP_SWAP +extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); +#else /* CONFIG_THP_SWAP */ +static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) +{ + return 0; +} +#endif /* CONFIG_THP_SWAP */ + #endif /* _LINUX_HUGE_MM_H */ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index eeea00070da8..561f3fb3d888 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -33,6 +33,8 @@ #include #include #include +#include +#include #include #include @@ -1668,6 +1670,178 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma, pmd_populate(mm, pmd, pgtable); } +#ifdef CONFIG_THP_SWAP +static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, pmd_t orig_pmd) +{ + struct mm_struct *mm = vma->vm_mm; + spinlock_t *ptl; + int ret = 0; + + ptl = pmd_lock(mm, pmd); + if (pmd_same(*pmd, orig_pmd)) + __split_huge_swap_pmd(vma, address, pmd); + else + ret = -ENOENT; + spin_unlock(ptl); + + return ret; +} + +int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) +{ + struct page *page; + struct mem_cgroup *memcg; + struct vm_area_struct *vma = vmf->vma; + unsigned long haddr = vmf->address & HPAGE_PMD_MASK; + swp_entry_t entry; + pmd_t pmd; + int i, locked, exclusive = 0, ret = 0; + + entry = pmd_to_swp_entry(orig_pmd); + VM_BUG_ON(non_swap_entry(entry)); + delayacct_set_flag(DELAYACCT_PF_SWAPIN); +retry: + page = lookup_swap_cache(entry, NULL, vmf->address); + if (!page) { + page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma, + haddr, false); + if (!page) { + /* + * Back out if somebody else faulted in this pmd + * while we released the pmd lock. 
+ */ + if (likely(pmd_same(*vmf->pmd, orig_pmd))) { + /* + * Failed to allocate huge page, split huge swap + * cluster, and fallback to swapin normal page + */ + ret = split_swap_cluster(entry, 0); + /* Somebody else swapin the swap entry, retry */ + if (ret == -EEXIST) { + ret = 0; + goto retry; + /* swapoff occurs under us */ + } else if (ret == -EINVAL) + ret = 0; + else + goto fallback; + } + delayacct_clear_flag(DELAYACCT_PF_SWAPIN); + goto out; + } + + /* Had to read the page from swap area: Major fault */ + ret = VM_FAULT_MAJOR; + count_vm_event(PGMAJFAULT); + count_memcg_event_mm(vma->vm_mm, PGMAJFAULT); + } else if (!PageTransCompound(page)) + goto fallback; + + locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags); + + delayacct_clear_flag(DELAYACCT_PF_SWAPIN); + if (!locked) { + ret |= VM_FAULT_RETRY; + goto out_release; + } + + /* + * Make sure try_to_free_swap or reuse_swap_page or swapoff did not + * release the swapcache from under us. The page pin, and pmd_same + * test below, are not enough to exclude that. Even if it is still + * swapcache, we need to check that the page's swap has not changed. + */ + if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val)) + goto out_page; + + if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, + &memcg, true)) { + ret = VM_FAULT_OOM; + goto out_page; + } + + /* + * Back out if somebody else already faulted in this pmd. + */ + vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd); + spin_lock(vmf->ptl); + if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) + goto out_nomap; + + if (unlikely(!PageUptodate(page))) { + ret = VM_FAULT_SIGBUS; + goto out_nomap; + } + + /* + * The page isn't present yet, go ahead with the fault. + * + * Be careful about the sequence of operations here. + * To get its accounting right, reuse_swap_page() must be called + * while the page is counted on swap but not yet in mapcount i.e. + * before page_add_anon_rmap() and swap_free(); try_to_free_swap() + * must be called after the swap_free(), or it will never succeed. 
+ */ + + add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR); + add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR); + pmd = mk_huge_pmd(page, vma->vm_page_prot); + if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) { + pmd = maybe_pmd_mkwrite(pmd_mkdirty(pmd), vma); + vmf->flags &= ~FAULT_FLAG_WRITE; + ret |= VM_FAULT_WRITE; + exclusive = RMAP_EXCLUSIVE; + } + for (i = 0; i < HPAGE_PMD_NR; i++) + flush_icache_page(vma, page + i); + if (pmd_swp_soft_dirty(orig_pmd)) + pmd = pmd_mksoft_dirty(pmd); + do_page_add_anon_rmap(page, vma, haddr, + exclusive | RMAP_COMPOUND); + mem_cgroup_commit_charge(page, memcg, true, true); + activate_page(page); + set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd); + + swap_free(entry, HPAGE_PMD_NR); + if (mem_cgroup_swap_full(page) || + (vma->vm_flags & VM_LOCKED) || PageMlocked(page)) + try_to_free_swap(page); + unlock_page(page); + + if (vmf->flags & FAULT_FLAG_WRITE) { + spin_unlock(vmf->ptl); + ret |= do_huge_pmd_wp_page(vmf, pmd); + if (ret & VM_FAULT_ERROR) + ret &= VM_FAULT_ERROR; + goto out; + } + + /* No need to invalidate - it was non-present before */ + update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); + spin_unlock(vmf->ptl); +out: + return ret; +out_nomap: + mem_cgroup_cancel_charge(page, memcg, true); + spin_unlock(vmf->ptl); +out_page: + unlock_page(page); +out_release: + put_page(page); + return ret; +fallback: + delayacct_clear_flag(DELAYACCT_PF_SWAPIN); + if (!split_huge_swap_pmd(vmf->vma, vmf->pmd, vmf->address, orig_pmd)) + ret = VM_FAULT_FALLBACK; + else + ret = 0; + if (page) + put_page(page); + return ret; +} +#endif + /* * Return true if we do MADV_FREE successfully on entire pmd page. * Otherwise, return false. diff --git a/mm/memory.c b/mm/memory.c index 5f805c0a6894..bbc3a08d10bb 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3864,13 +3864,17 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, barrier(); if (unlikely(is_swap_pmd(orig_pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(orig_pmd)); - if (is_pmd_migration_entry(orig_pmd)) + if (thp_migration_supported() && + is_pmd_migration_entry(orig_pmd)) { pmd_migration_entry_wait(mm, vmf.pmd); - return 0; - } - if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) { + return 0; + } else if (IS_ENABLED(CONFIG_THP_SWAP)) { + ret = do_huge_pmd_swap_page(&vmf, orig_pmd); + if (!(ret & VM_FAULT_FALLBACK)) + return ret; + } else + VM_BUG_ON(1); + } else if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) { if (pmd_protnone(orig_pmd) && vma_is_accessible(vma)) return do_huge_pmd_numa_page(&vmf, orig_pmd); From patchwork Tue Nov 20 08:54:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10690001 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CC4BD13BB for ; Tue, 20 Nov 2018 08:55:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BDF9D2991D for ; Tue, 20 Nov 2018 08:55:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B25AE29A1B; Tue, 20 Nov 2018 08:55:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham 
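As a rough picture of the path this patch adds, here is a minimal user-space sketch, not part of the patch: it assumes a 2 MiB PMD size and a kernel built with CONFIG_THP_SWAP, and whether the region is actually reclaimed to swap as a whole THP depends on memory pressure. The final store is what would fault on a PMD swap mapping and go through do_huge_pmd_swap_page().

#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL << 20)		/* assumed PMD size on x86_64 */

int main(void)
{
	/* One PMD-aligned, PMD-sized anonymous region. */
	char *buf = aligned_alloc(HPAGE_SIZE, HPAGE_SIZE);

	if (!buf)
		return 1;
	/* Ask for a THP; khugepaged may collapse the region. */
	madvise(buf, HPAGE_SIZE, MADV_HUGEPAGE);
	memset(buf, 1, HPAGE_SIZE);

	/*
	 * If the THP is later reclaimed to swap in one piece
	 * (CONFIG_THP_SWAP), this touch faults on a PMD swap mapping
	 * and do_huge_pmd_swap_page() swaps the whole THP back in,
	 * instead of 512 separate normal-page swapins.
	 */
	buf[0] = 2;
	return 0;
}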
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 10/21] swap: Support to count THP swapin and its fallback
Date: Tue, 20 Nov 2018 16:54:38 +0800
Message-Id: <20181120085449.5542-11-ying.huang@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

Two new /proc/vmstat fields, "thp_swpin" and "thp_swpin_fallback", are added to count swapping a THP in from the swap device in one piece and falling back to normal page swapin, respectively.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 Documentation/admin-guide/mm/transhuge.rst |  8 ++++++++
 include/linux/vm_event_item.h              |  2 ++
 mm/huge_memory.c                           |  4 +++-
 mm/page_io.c                               | 15 ++++++++++++---
 mm/vmstat.c                                |  2 ++
 5 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 7ab93a8404b9..85e33f785fd7 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -364,6 +364,14 @@ thp_swpout_fallback
 	Usually because failed to allocate some continuous swap space
 	for the huge page.
 
+thp_swpin
+	is incremented every time a huge page is swapin in one piece
+	without splitting.
+
+thp_swpin_fallback
+	is incremented if a huge page has to be split during swapin.
+	Usually because failed to allocate a huge page.
+
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
 huge page for use.
There are some counters in ``/proc/vmstat`` to help diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 47a3441cf4c4..c20b655cfdcc 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -88,6 +88,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, THP_ZERO_PAGE_ALLOC_FAILED, THP_SWPOUT, THP_SWPOUT_FALLBACK, + THP_SWPIN, + THP_SWPIN_FALLBACK, #endif #ifdef CONFIG_MEMORY_BALLOON BALLOON_INFLATE, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 561f3fb3d888..d3ee25ffeaaf 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1724,8 +1724,10 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) /* swapoff occurs under us */ } else if (ret == -EINVAL) ret = 0; - else + else { + count_vm_event(THP_SWPIN_FALLBACK); goto fallback; + } } delayacct_clear_flag(DELAYACCT_PF_SWAPIN); goto out; diff --git a/mm/page_io.c b/mm/page_io.c index d4d1c89bcddd..8fd1f3ef83c4 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -348,6 +348,15 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, return ret; } +static inline void count_swpin_vm_event(struct page *page) +{ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + if (unlikely(PageTransHuge(page))) + count_vm_event(THP_SWPIN); +#endif + count_vm_events(PSWPIN, hpage_nr_pages(page)); +} + int swap_readpage(struct page *page, bool synchronous) { struct bio *bio; @@ -371,7 +380,7 @@ int swap_readpage(struct page *page, bool synchronous) ret = mapping->a_ops->readpage(swap_file, page); if (!ret) - count_vm_event(PSWPIN); + count_swpin_vm_event(page); return ret; } @@ -382,7 +391,7 @@ int swap_readpage(struct page *page, bool synchronous) unlock_page(page); } - count_vm_event(PSWPIN); + count_swpin_vm_event(page); return 0; } @@ -401,7 +410,7 @@ int swap_readpage(struct page *page, bool synchronous) get_task_struct(current); bio->bi_private = current; bio_set_op_attrs(bio, REQ_OP_READ, 0); - count_vm_event(PSWPIN); + count_swpin_vm_event(page); bio_get(bio); qc = submit_bio(bio); while (synchronous) { diff --git a/mm/vmstat.c b/mm/vmstat.c index 83b30edc2f7f..80a731e9a5e5 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1265,6 +1265,8 @@ const char * const vmstat_text[] = { "thp_zero_page_alloc_failed", "thp_swpout", "thp_swpout_fallback", + "thp_swpin", + "thp_swpin_fallback", #endif #ifdef CONFIG_MEMORY_BALLOON "balloon_inflate", From patchwork Tue Nov 20 08:54:39 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10690003 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ECF7613BB for ; Tue, 20 Nov 2018 08:55:38 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DD9CE2991D for ; Tue, 20 Nov 2018 08:55:38 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D20E329A1B; Tue, 20 Nov 2018 08:55:38 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0EE762991D for ; Tue, 20 Nov 2018 08:55:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 
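The two counters are ordinary /proc/vmstat fields, so they can be sampled before and after a swap-heavy run. A minimal user-space reader, offered only as a sketch, using the field names added above:

#include <stdio.h>
#include <string.h>

int main(void)
{
	char name[64];
	unsigned long long val;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return 1;
	while (fscanf(f, "%63s %llu", name, &val) == 2) {
		/* Field names as added by this patch. */
		if (!strcmp(name, "thp_swpin") ||
		    !strcmp(name, "thp_swpin_fallback"))
			printf("%s = %llu\n", name, val);
	}
	fclose(f);
	return 0;
}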
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 11/21] swap: Add sysfs interface to configure THP swapin
Date: Tue, 20 Nov 2018 16:54:39 +0800
Message-Id: <20181120085449.5542-12-ying.huang@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

Swapping in a THP as a whole isn't desirable in some situations. For example, for a completely random access pattern, swapping in a THP in one piece will greatly inflate the amount of data read. So a sysfs interface, /sys/kernel/mm/transparent_hugepage/swapin_enabled, is added to configure it. Three options are provided:

- always: THP swapin will always be enabled.
- madvise: THP swapin will be enabled only for VMAs with the VM_HUGEPAGE flag set.
- never: THP swapin will always be disabled.

The default configuration is madvise. During a page fault, if a PMD swap mapping is found and THP swapin is disabled, the huge swap cluster and the PMD swap mapping will be split, and the kernel falls back to normal page swapin.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 Documentation/admin-guide/mm/transhuge.rst | 21 +++++
 include/linux/huge_mm.h                    | 31 +++++++
 mm/huge_memory.c                           | 94 +++++++++++++++++-----
 3 files changed, 127 insertions(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 85e33f785fd7..23aefb17101c 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -160,6 +160,27 @@ Some userspace (such as a test program, or an optimized memory allocation
 
 	cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
 
+Transparent hugepage may be swapout and swapin in one piece without
+splitting. This will improve the utility of transparent hugepage but
+may inflate the read/write too.
So whether to enable swapin +transparent hugepage in one piece can be configured as follow. + + echo always >/sys/kernel/mm/transparent_hugepage/swapin_enabled + echo madvise >/sys/kernel/mm/transparent_hugepage/swapin_enabled + echo never >/sys/kernel/mm/transparent_hugepage/swapin_enabled + +always + Attempt to allocate a transparent huge page and read it from + swap space in one piece every time. + +never + Always split the swap space and PMD swap mapping and swapin + the fault normal page during swapin. + +madvise + Only swapin the transparent huge page in one piece for + MADV_HUGEPAGE madvise regions. + khugepaged will be automatically started when transparent_hugepage/enabled is set to "always" or "madvise, and it'll be automatically shutdown if it's set to "never". diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 909321c772b5..ea4999a4b6cd 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -63,6 +63,8 @@ enum transparent_hugepage_flag { #ifdef CONFIG_DEBUG_VM TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG, #endif + TRANSPARENT_HUGEPAGE_SWAPIN_FLAG, + TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG, }; struct kobject; @@ -375,11 +377,40 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, #ifdef CONFIG_THP_SWAP extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); + +static inline bool transparent_hugepage_swapin_enabled( + struct vm_area_struct *vma) +{ + if (vma->vm_flags & VM_NOHUGEPAGE) + return false; + + if (is_vma_temporary_stack(vma)) + return false; + + if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) + return false; + + if (transparent_hugepage_flags & + (1 << TRANSPARENT_HUGEPAGE_SWAPIN_FLAG)) + return true; + + if (transparent_hugepage_flags & + (1 << TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG)) + return !!(vma->vm_flags & VM_HUGEPAGE); + + return false; +} #else /* CONFIG_THP_SWAP */ static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) { return 0; } + +static inline bool transparent_hugepage_swapin_enabled( + struct vm_area_struct *vma) +{ + return false; +} #endif /* CONFIG_THP_SWAP */ #endif /* _LINUX_HUGE_MM_H */ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index d3ee25ffeaaf..abaecf96ceeb 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -57,7 +57,8 @@ unsigned long transparent_hugepage_flags __read_mostly = #endif (1<address); if (!page) { + if (!transparent_hugepage_swapin_enabled(vma)) + goto split; + page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma, haddr, false); if (!page) { @@ -1711,24 +1765,8 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) * Back out if somebody else faulted in this pmd * while we released the pmd lock. 
*/ - if (likely(pmd_same(*vmf->pmd, orig_pmd))) { - /* - * Failed to allocate huge page, split huge swap - * cluster, and fallback to swapin normal page - */ - ret = split_swap_cluster(entry, 0); - /* Somebody else swapin the swap entry, retry */ - if (ret == -EEXIST) { - ret = 0; - goto retry; - /* swapoff occurs under us */ - } else if (ret == -EINVAL) - ret = 0; - else { - count_vm_event(THP_SWPIN_FALLBACK); - goto fallback; - } - } + if (likely(pmd_same(*vmf->pmd, orig_pmd))) + goto split; delayacct_clear_flag(DELAYACCT_PF_SWAPIN); goto out; } @@ -1841,6 +1879,24 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) if (page) put_page(page); return ret; +split: + /* + * Failed to allocate huge page, split huge swap cluster, and + * fallback to swapin normal page + */ + ret = split_swap_cluster(entry, 0); + /* Somebody else swapin the swap entry, retry */ + if (ret == -EEXIST) { + ret = 0; + goto retry; + } + /* swapoff occurs under us */ + if (ret == -EINVAL) { + delayacct_clear_flag(DELAYACCT_PF_SWAPIN); + return 0; + } + count_vm_event(THP_SWPIN_FALLBACK); + goto fallback; } #endif From patchwork Tue Nov 20 08:54:40 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10690005 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 186DD13AD for ; Tue, 20 Nov 2018 08:55:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 075A12991D for ; Tue, 20 Nov 2018 08:55:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EFAA129A1B; Tue, 20 Nov 2018 08:55:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2CD682991D for ; Tue, 20 Nov 2018 08:55:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6DAE36B1F40; Tue, 20 Nov 2018 03:55:31 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 661166B1F41; Tue, 20 Nov 2018 03:55:31 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 553306B1F42; Tue, 20 Nov 2018 03:55:31 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f198.google.com (mail-pf1-f198.google.com [209.85.210.198]) by kanga.kvack.org (Postfix) with ESMTP id 15FF46B1F40 for ; Tue, 20 Nov 2018 03:55:31 -0500 (EST) Received: by mail-pf1-f198.google.com with SMTP id p9so1099964pfj.3 for ; Tue, 20 Nov 2018 00:55:31 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=ivCWdx+wUDaoKPAOJOJ94T5ufRT2/KbB8a3rJfLB72Y=; b=DSNGhlV8yLrxMpdUFTjNUkVFcjn5v+neOzjJLfrLDR8DJiFm1eprurQyb5A8E0VK+6 f2KexWsg2AN3IGK0Hilxg3plmILcj2KAdysYrIeVyxh0SBvsFWW7mlKMijJKaY6aMuVR Ivf6qeu6Jto4Pw+1a/RO7vWH06hEkjUANXZrOkzFIRJ8A79v/Y2O/LTwSfUC4fKzUK65 
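A small sketch of driving the new knob from user space; this assumes a kernel with the patch applied (the file does not exist otherwise), and writing it requires root:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static const char path[] =
	"/sys/kernel/mm/transparent_hugepage/swapin_enabled";

int main(void)
{
	char buf[64] = "";
	int fd = open(path, O_RDWR);	/* needs root */

	if (fd < 0)
		return 1;	/* kernel without this patch or CONFIG_THP_SWAP */
	if (write(fd, "never", 5) != 5)
		perror("write");
	if (pread(fd, buf, sizeof(buf) - 1, 0) > 0)
		printf("%s", buf);	/* e.g. "always madvise [never]" */
	close(fd);
	return 0;
}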
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A.
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V7 RESEND 12/21] swap: Support PMD swap mapping in swapoff Date: Tue, 20 Nov 2018 16:54:40 +0800 Message-Id: <20181120085449.5542-13-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com> References: <20181120085449.5542-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP During swapoff, for a huge swap cluster, we need to allocate a THP, read its contents into the THP and unuse the PMD and PTE swap mappings to it. If failed to allocate a THP, the huge swap cluster will be split. During unuse, if it is found that the swap cluster mapped by a PMD swap mapping is split already, we will split the PMD swap mapping and unuse the PTEs. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/asm-generic/pgtable.h | 14 +----- include/linux/huge_mm.h | 8 ++++ mm/huge_memory.c | 4 +- mm/swapfile.c | 86 ++++++++++++++++++++++++++++++++++- 4 files changed, 97 insertions(+), 15 deletions(-) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 20aab7bfd487..5216124ba13c 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -931,22 +931,12 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) barrier(); #endif /* - * !pmd_present() checks for pmd migration entries - * - * The complete check uses is_pmd_migration_entry() in linux/swapops.h - * But using that requires moving current function and pmd_trans_unstable() - * to linux/swapops.h to resovle dependency, which is too much code move. - * - * !pmd_present() is equivalent to is_pmd_migration_entry() currently, - * because !pmd_present() pages can only be under migration not swapped - * out. - * - * pmd_none() is preseved for future condition checks on pmd migration + * pmd_none() is preseved for future condition checks on pmd swap * entries and not confusing with this function name, although it is * redundant with !pmd_present(). 
*/ if (pmd_none(pmdval) || pmd_trans_huge(pmdval) || - (IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !pmd_present(pmdval))) + (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) && !pmd_present(pmdval))) return 1; if (unlikely(pmd_bad(pmdval))) { pmd_clear_bad(pmd); diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index ea4999a4b6cd..6236f8b1d04b 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -376,6 +376,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #ifdef CONFIG_THP_SWAP +extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, pmd_t orig_pmd); extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); static inline bool transparent_hugepage_swapin_enabled( @@ -401,6 +403,12 @@ static inline bool transparent_hugepage_swapin_enabled( return false; } #else /* CONFIG_THP_SWAP */ +static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, pmd_t orig_pmd) +{ + return 0; +} + static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) { return 0; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index abaecf96ceeb..079592b9f4a5 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1722,8 +1722,8 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma, } #ifdef CONFIG_THP_SWAP -static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, pmd_t orig_pmd) +int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, pmd_t orig_pmd) { struct mm_struct *mm = vma->vm_mm; spinlock_t *ptl; diff --git a/mm/swapfile.c b/mm/swapfile.c index c22c11b4a879..b85ec810d941 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1931,6 +1931,11 @@ static inline int pte_same_as_swp(pte_t pte, pte_t swp_pte) return pte_same(pte_swp_clear_soft_dirty(pte), swp_pte); } +static inline int pmd_same_as_swp(pmd_t pmd, pmd_t swp_pmd) +{ + return pmd_same(pmd_swp_clear_soft_dirty(pmd), swp_pmd); +} + /* * No need to decide whether this PTE shares the swap entry with others, * just let do_wp_page work it out if a write is requested later - to @@ -1992,6 +1997,53 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, return ret; } +#ifdef CONFIG_THP_SWAP +static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long addr, swp_entry_t entry, struct page *page) +{ + struct mem_cgroup *memcg; + spinlock_t *ptl; + int ret = 1; + + if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, + &memcg, true)) { + ret = -ENOMEM; + goto out_nolock; + } + + ptl = pmd_lock(vma->vm_mm, pmd); + if (unlikely(!pmd_same_as_swp(*pmd, swp_entry_to_pmd(entry)))) { + mem_cgroup_cancel_charge(page, memcg, true); + ret = 0; + goto out; + } + + add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR); + add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR); + get_page(page); + set_pmd_at(vma->vm_mm, addr, pmd, + pmd_mkold(mk_huge_pmd(page, vma->vm_page_prot))); + page_add_anon_rmap(page, vma, addr, true); + mem_cgroup_commit_charge(page, memcg, true, true); + swap_free(entry, HPAGE_PMD_NR); + /* + * Move the page to the active list so it is not + * immediately swapped out again after swapon. 
+ */ + activate_page(page); +out: + spin_unlock(ptl); +out_nolock: + return ret; +} +#else +static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long addr, swp_entry_t entry, struct page *page) +{ + return 0; +} +#endif + static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, unsigned long end, swp_entry_t entry, struct page *page) @@ -2032,7 +2084,7 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud, unsigned long addr, unsigned long end, swp_entry_t entry, struct page *page) { - pmd_t *pmd; + pmd_t swp_pmd = swp_entry_to_pmd(entry), *pmd, orig_pmd; unsigned long next; int ret; @@ -2040,6 +2092,27 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud, do { cond_resched(); next = pmd_addr_end(addr, end); + orig_pmd = *pmd; + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(orig_pmd)) { + if (likely(!pmd_same_as_swp(orig_pmd, swp_pmd))) + continue; + /* + * Huge cluster has been split already, split + * PMD swap mapping and fallback to unuse PTE + */ + if (!PageTransCompound(page)) { + ret = split_huge_swap_pmd(vma, pmd, + addr, orig_pmd); + if (ret) + return ret; + ret = unuse_pte_range(vma, pmd, addr, + next, entry, page); + } else + ret = unuse_pmd(vma, pmd, addr, entry, page); + if (ret) + return ret; + continue; + } if (pmd_none_or_trans_huge_or_clear_bad(pmd)) continue; ret = unuse_pte_range(vma, pmd, addr, next, entry, page); @@ -2233,6 +2306,7 @@ int try_to_unuse(unsigned int type, bool frontswap, * there are races when an instance of an entry might be missed. */ while ((i = find_next_to_unuse(si, i, frontswap)) != 0) { +retry: if (signal_pending(current)) { retval = -EINTR; break; @@ -2248,6 +2322,8 @@ int try_to_unuse(unsigned int type, bool frontswap, page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, NULL, 0, false); if (!page) { + struct swap_cluster_info *ci = NULL; + /* * Either swap_duplicate() failed because entry * has been freed independently, and will not be @@ -2264,6 +2340,14 @@ int try_to_unuse(unsigned int type, bool frontswap, */ if (!swcount || swcount == SWAP_MAP_BAD) continue; + if (si->cluster_info) + ci = si->cluster_info + i / SWAPFILE_CLUSTER; + /* Split huge cluster if failed to allocate huge page */ + if (cluster_is_huge(ci)) { + retval = split_swap_cluster(entry, 0); + if (!retval || retval == -EEXIST) + goto retry; + } retval = -ENOMEM; break; } From patchwork Tue Nov 20 08:54:41 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10690007 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 195ED13AD for ; Tue, 20 Nov 2018 08:55:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0A0B42991D for ; Tue, 20 Nov 2018 08:55:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F246B29A1B; Tue, 20 Nov 2018 08:55:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 63C3B2991D for ; Tue, 20 Nov 2018 08:55:45 +0000 (UTC) 
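The decision the new unuse_pmd_range() code makes can be summarized with a toy model; this is plain user-space C, not kernel code, and the names only mirror the functions in the hunks above:

#include <stdbool.h>
#include <stdio.h>

#define HPAGE_PMD_NR 512

struct pmd_slot {
	bool is_swap_pmd;		/* PMD maps a swap cluster */
	bool cluster_still_huge;	/* cluster not yet split */
};

static void unuse_one(const struct pmd_slot *pmd)
{
	if (!pmd->is_swap_pmd)
		return;			/* normal PTE path as before */
	if (pmd->cluster_still_huge)
		/* unuse_pmd(): map the THP back with one huge PMD */
		printf("restore %d pages via a single PMD\n", HPAGE_PMD_NR);
	else
		/* split_huge_swap_pmd(), then unuse_pte_range() */
		printf("split the PMD mapping, unuse %d PTEs\n",
		       HPAGE_PMD_NR);
}

int main(void)
{
	struct pmd_slot huge = { true, true }, split = { true, false };

	unuse_one(&huge);
	unuse_one(&split);
	return 0;
}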
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 13/21] swap: Support PMD swap mapping in madvise_free()
Date: Tue, 20 Nov 2018 16:54:41 +0800
Message-Id: <20181120085449.5542-14-ying.huang@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

When madvise_free() finds a PMD swap mapping, if only part of the huge swap cluster is operated on, the PMD swap mapping will be split and processing falls back to the PTE swap mappings. Otherwise, if the whole huge swap cluster is operated on, free_swap_and_cache() will be called to decrease the PMD swap mapping count and probably free the swap space and the THP in the swap cache too.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/huge_memory.c | 52 ++++++++++++++++++++++++++++++++++--------------
 mm/madvise.c     |  2 +-
 2 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 079592b9f4a5..89aa93d586ec 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1900,6 +1900,15 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 }
 #endif
 
+static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
+{
+	pgtable_t pgtable;
+
+	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+	pte_free(mm, pgtable);
+	mm_dec_nr_ptes(mm);
+}
+
 /*
  * Return true if we do MADV_FREE successfully on entire pmd page.
  * Otherwise, return false.
@@ -1920,15 +1929,37 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, goto out_unlocked; orig_pmd = *pmd; - if (is_huge_zero_pmd(orig_pmd)) - goto out; - if (unlikely(!pmd_present(orig_pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(orig_pmd)); - goto out; + swp_entry_t entry = pmd_to_swp_entry(orig_pmd); + + if (is_migration_entry(entry)) { + VM_BUG_ON(!thp_migration_supported()); + goto out; + } else if (IS_ENABLED(CONFIG_THP_SWAP) && + !non_swap_entry(entry)) { + /* + * If part of THP is discarded, split the PMD + * swap mapping and operate on the PTEs + */ + if (next - addr != HPAGE_PMD_SIZE) { + __split_huge_swap_pmd(vma, addr, pmd); + goto out; + } + free_swap_and_cache(entry, HPAGE_PMD_NR); + pmd_clear(pmd); + zap_deposited_table(mm, pmd); + if (current->mm == mm) + sync_mm_rss(mm); + add_mm_counter(mm, MM_SWAPENTS, -HPAGE_PMD_NR); + ret = true; + goto out; + } else + VM_BUG_ON(1); } + if (is_huge_zero_pmd(orig_pmd)) + goto out; + page = pmd_page(orig_pmd); /* * If other processes are mapping this page, we couldn't discard @@ -1974,15 +2005,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, return ret; } -static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) -{ - pgtable_t pgtable; - - pgtable = pgtable_trans_huge_withdraw(mm, pmd); - pte_free(mm, pgtable); - mm_dec_nr_ptes(mm); -} - int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr) { diff --git a/mm/madvise.c b/mm/madvise.c index cbb3d7e38e51..0c1f96c605f8 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -321,7 +321,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, unsigned long next; next = pmd_addr_end(addr, end); - if (pmd_trans_huge(*pmd)) + if (pmd_trans_huge(*pmd) || is_swap_pmd(*pmd)) if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next)) goto next; From patchwork Tue Nov 20 08:54:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10690009 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5F7F913BB for ; Tue, 20 Nov 2018 08:55:50 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4FE7B2991D for ; Tue, 20 Nov 2018 08:55:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 438D92A690; Tue, 20 Nov 2018 08:55:50 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E7C2F2991D for ; Tue, 20 Nov 2018 08:55:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B201E6B1F46; Tue, 20 Nov 2018 03:55:37 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id A817A6B1F47; Tue, 20 Nov 2018 03:55:37 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 96EC26B1F48; Tue, 20 Nov 2018 03:55:37 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: 
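From user space, the whole-cluster and partial-range cases map directly onto the madvise() range. The sketch below is illustrative only: it assumes a 2 MiB PMD size and MADV_FREE (Linux 4.5+), and whether the region is actually a PMD swap mapping at the time of the call depends on reclaim having swapped the THP out whole:

#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL << 20)		/* assumed PMD size */

int main(void)
{
	char *buf = aligned_alloc(HPAGE_SIZE, HPAGE_SIZE);

	if (!buf)
		return 1;
	madvise(buf, HPAGE_SIZE, MADV_HUGEPAGE);
	memset(buf, 1, HPAGE_SIZE);

	/*
	 * Covering the whole PMD range lets the patched
	 * madvise_free_huge_pmd() drop a PMD swap mapping and its
	 * swap cluster in one free_swap_and_cache() call ...
	 */
	madvise(buf, HPAGE_SIZE, MADV_FREE);

	/*
	 * ... while a partial range would first split the PMD swap
	 * mapping and fall back to the per-PTE path.
	 */
	madvise(buf, HPAGE_SIZE / 2, MADV_FREE);
	return 0;
}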
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 14/21] swap: Support to move swap account for PMD swap mapping
Date: Tue, 20 Nov 2018 16:54:42 +0800
Message-Id: <20181120085449.5542-15-ying.huang@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

Previously, the huge swap cluster was split after the THP was swapped out. Now, to support swapping the THP in one piece, the huge swap cluster is not split after the THP is reclaimed. So in memcg, we need to move the swap account for PMD swap mappings in the process's page table. When the page table is scanned while moving the memcg charge, PMD swap mappings are identified, and mem_cgroup_move_swap_account() and its callees are revised to move the account for the whole huge swap cluster. If the swap cluster mapped by the PMD has already been split, the PMD swap mapping is split and processing falls back to the PTE path.

Signed-off-by: "Huang, Ying" Cc: "Kirill A.
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 7 ++ include/linux/swap.h | 6 ++ include/linux/swap_cgroup.h | 3 +- mm/huge_memory.c | 7 +- mm/memcontrol.c | 131 ++++++++++++++++++++++++++++-------- mm/swap_cgroup.c | 45 ++++++++++--- mm/swapfile.c | 14 ++++ 7 files changed, 173 insertions(+), 40 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 6236f8b1d04b..260357fc9d76 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -376,6 +376,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #ifdef CONFIG_THP_SWAP +extern void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long addr, pmd_t *pmd); extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd); extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); @@ -403,6 +405,11 @@ static inline bool transparent_hugepage_swapin_enabled( return false; } #else /* CONFIG_THP_SWAP */ +static inline void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long addr, pmd_t *pmd) +{ +} + static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd) { diff --git a/include/linux/swap.h b/include/linux/swap.h index 4bd532c9315e..6463784fd5e8 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -622,6 +622,7 @@ static inline swp_entry_t get_swap_page(struct page *page) #ifdef CONFIG_THP_SWAP extern int split_swap_cluster(swp_entry_t entry, unsigned long flags); extern int split_swap_cluster_map(swp_entry_t entry); +extern int get_swap_entry_size(swp_entry_t entry); #else static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags) { @@ -632,6 +633,11 @@ static inline int split_swap_cluster_map(swp_entry_t entry) { return 0; } + +static inline int get_swap_entry_size(swp_entry_t entry) +{ + return 1; +} #endif #ifdef CONFIG_MEMCG diff --git a/include/linux/swap_cgroup.h b/include/linux/swap_cgroup.h index a12dd1c3966c..c40fb52b0563 100644 --- a/include/linux/swap_cgroup.h +++ b/include/linux/swap_cgroup.h @@ -7,7 +7,8 @@ #ifdef CONFIG_MEMCG_SWAP extern unsigned short swap_cgroup_cmpxchg(swp_entry_t ent, - unsigned short old, unsigned short new); + unsigned short old, unsigned short new, + unsigned int nr_ents); extern unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id, unsigned int nr_ents); extern unsigned short lookup_swap_cgroup_id(swp_entry_t ent); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 89aa93d586ec..3aade329fe8b 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1686,10 +1686,10 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) return 0; } +#ifdef CONFIG_THP_SWAP /* Convert a PMD swap mapping to a set of PTE swap mappings */ -static void __split_huge_swap_pmd(struct vm_area_struct *vma, - unsigned long addr, - pmd_t *pmd) +void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long addr, pmd_t *pmd) { struct mm_struct *mm = vma->vm_mm; pgtable_t pgtable; @@ -1721,7 +1721,6 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma, pmd_populate(mm, pmd, pgtable); } -#ifdef CONFIG_THP_SWAP int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd) { diff --git 
a/mm/memcontrol.c b/mm/memcontrol.c index 6e1469b80cb7..37c245d6aabd 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2660,9 +2660,10 @@ void mem_cgroup_split_huge_fixup(struct page *head) #ifdef CONFIG_MEMCG_SWAP /** * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record. - * @entry: swap entry to be moved + * @entry: the first swap entry to be moved * @from: mem_cgroup which the entry is moved from * @to: mem_cgroup which the entry is moved to + * @nr_ents: number of swap entries * * It succeeds only when the swap_cgroup's record for this entry is the same * as the mem_cgroup's id of @from. @@ -2673,23 +2674,27 @@ void mem_cgroup_split_huge_fixup(struct page *head) * both res and memsw, and called css_get(). */ static int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) + struct mem_cgroup *from, + struct mem_cgroup *to, + unsigned int nr_ents) { unsigned short old_id, new_id; old_id = mem_cgroup_id(from); new_id = mem_cgroup_id(to); - if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) { - mod_memcg_state(from, MEMCG_SWAP, -1); - mod_memcg_state(to, MEMCG_SWAP, 1); + if (swap_cgroup_cmpxchg(entry, old_id, new_id, nr_ents) == old_id) { + mod_memcg_state(from, MEMCG_SWAP, -nr_ents); + mod_memcg_state(to, MEMCG_SWAP, nr_ents); return 0; } return -EINVAL; } #else static inline int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) + struct mem_cgroup *from, + struct mem_cgroup *to, + unsigned int nr_ents) { return -EINVAL; } @@ -4642,6 +4647,7 @@ enum mc_target_type { MC_TARGET_PAGE, MC_TARGET_SWAP, MC_TARGET_DEVICE, + MC_TARGET_FALLBACK, }; static struct page *mc_handle_present_pte(struct vm_area_struct *vma, @@ -4708,6 +4714,28 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, } #endif +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static struct page *mc_handle_swap_pmd(struct vm_area_struct *vma, + pmd_t pmd, swp_entry_t *entry) +{ + struct page *page = NULL; + swp_entry_t ent = pmd_to_swp_entry(pmd); + + if (!(mc.flags & MOVE_ANON) || non_swap_entry(ent)) + return NULL; + + /* + * Because lookup_swap_cache() updates some statistics counter, + * we call find_get_page() with swapper_space directly. + */ + page = find_get_page(swap_address_space(ent), swp_offset(ent)); + if (do_memsw_account()) + entry->val = ent.val; + + return page; +} +#endif + static struct page *mc_handle_file_pte(struct vm_area_struct *vma, unsigned long addr, pte_t ptent, swp_entry_t *entry) { @@ -4896,7 +4924,9 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, * There is a swap entry and a page doesn't exist or isn't charged. * But we cannot move a tail-page in a THP. */ - if (ent.val && !ret && (!page || !PageTransCompound(page)) && + if (ent.val && !ret && + ((page && !PageTransCompound(page)) || + (!page && get_swap_entry_size(ent) == 1)) && mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) { ret = MC_TARGET_SWAP; if (target) @@ -4907,37 +4937,64 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, #ifdef CONFIG_TRANSPARENT_HUGEPAGE /* - * We don't consider PMD mapped swapping or file mapped pages because THP does - * not support them for now. - * Caller should make sure that pmd_trans_huge(pmd) is true. + * We don't consider file mapped pages because THP does not support + * them for now. 
*/ static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) + unsigned long addr, pmd_t *pmdp, union mc_target *target) { + pmd_t pmd = *pmdp; struct page *page = NULL; enum mc_target_type ret = MC_TARGET_NONE; + swp_entry_t ent = { .val = 0 }; if (unlikely(is_swap_pmd(pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(pmd)); - return ret; + if (is_pmd_migration_entry(pmd)) { + VM_BUG_ON(!thp_migration_supported()); + return ret; + } + if (!IS_ENABLED(CONFIG_THP_SWAP)) { + VM_BUG_ON(1); + return ret; + } + page = mc_handle_swap_pmd(vma, pmd, &ent); + /* The swap cluster has been split under us */ + if ((page && !PageTransHuge(page)) || + (!page && ent.val && get_swap_entry_size(ent) == 1)) { + __split_huge_swap_pmd(vma, addr, pmdp); + ret = MC_TARGET_FALLBACK; + goto out; + } + } else { + page = pmd_page(pmd); + get_page(page); } - page = pmd_page(pmd); - VM_BUG_ON_PAGE(!page || !PageHead(page), page); + VM_BUG_ON_PAGE(page && !PageHead(page), page); if (!(mc.flags & MOVE_ANON)) - return ret; - if (page->mem_cgroup == mc.from) { + goto out; + if (!page && !ent.val) + goto out; + if (page && page->mem_cgroup == mc.from) { ret = MC_TARGET_PAGE; if (target) { get_page(page); target->page = page; } } + if (ent.val && !ret && !page && + mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) { + ret = MC_TARGET_SWAP; + if (target) + target->ent = ent; + } +out: + if (page) + put_page(page); return ret; } #else static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) + unsigned long addr, pmd_t *pmdp, union mc_target *target) { return MC_TARGET_NONE; } @@ -4950,6 +5007,7 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, struct vm_area_struct *vma = walk->vma; pte_t *pte; spinlock_t *ptl; + int ret; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { @@ -4958,12 +5016,16 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, * support transparent huge page with MEMORY_DEVICE_PUBLIC or * MEMORY_DEVICE_PRIVATE but this might change. 
*/ - if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE) - mc.precharge += HPAGE_PMD_NR; + ret = get_mctgt_type_thp(vma, addr, pmd, NULL); spin_unlock(ptl); + if (ret == MC_TARGET_FALLBACK) + goto fallback; + if (ret) + mc.precharge += HPAGE_PMD_NR; return 0; } +fallback: if (pmd_trans_unstable(pmd)) return 0; pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); @@ -5154,6 +5216,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, enum mc_target_type target_type; union mc_target target; struct page *page; + swp_entry_t ent; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { @@ -5161,8 +5224,9 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, spin_unlock(ptl); return 0; } - target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); - if (target_type == MC_TARGET_PAGE) { + target_type = get_mctgt_type_thp(vma, addr, pmd, &target); + switch (target_type) { + case MC_TARGET_PAGE: page = target.page; if (!isolate_lru_page(page)) { if (!mem_cgroup_move_account(page, true, @@ -5173,7 +5237,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, putback_lru_page(page); } put_page(page); - } else if (target_type == MC_TARGET_DEVICE) { + break; + case MC_TARGET_DEVICE: page = target.page; if (!mem_cgroup_move_account(page, true, mc.from, mc.to)) { @@ -5181,9 +5246,21 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, mc.moved_charge += HPAGE_PMD_NR; } put_page(page); + break; + case MC_TARGET_SWAP: + ent = target.ent; + if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to, + HPAGE_PMD_NR)) { + mc.precharge -= HPAGE_PMD_NR; + mc.moved_swap += HPAGE_PMD_NR; + } + break; + default: + break; } spin_unlock(ptl); - return 0; + if (target_type != MC_TARGET_FALLBACK) + return 0; } if (pmd_trans_unstable(pmd)) @@ -5193,7 +5270,6 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, for (; addr != end; addr += PAGE_SIZE) { pte_t ptent = *(pte++); bool device = false; - swp_entry_t ent; if (!mc.precharge) break; @@ -5227,7 +5303,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, break; case MC_TARGET_SWAP: ent = target.ent; - if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) { + if (!mem_cgroup_move_swap_account(ent, mc.from, + mc.to, 1)) { mc.precharge--; /* we fixup refcnts and charges later. */ mc.moved_swap++; diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c index 45affaef3bc6..ccc08e88962a 100644 --- a/mm/swap_cgroup.c +++ b/mm/swap_cgroup.c @@ -87,29 +87,58 @@ static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent, /** * swap_cgroup_cmpxchg - cmpxchg mem_cgroup's id for this swp_entry. - * @ent: swap entry to be cmpxchged + * @ent: the first swap entry to be cmpxchged * @old: old id * @new: new id + * @nr_ents: number of swap entries * * Returns old id at success, 0 at failure. 
* (There is no mem_cgroup using 0 as its id) */ unsigned short swap_cgroup_cmpxchg(swp_entry_t ent, - unsigned short old, unsigned short new) + unsigned short old, unsigned short new, + unsigned int nr_ents) { struct swap_cgroup_ctrl *ctrl; - struct swap_cgroup *sc; + struct swap_cgroup *sc_start, *sc; unsigned long flags; unsigned short retval; + pgoff_t offset_start = swp_offset(ent), offset; + pgoff_t end = offset_start + nr_ents; - sc = lookup_swap_cgroup(ent, &ctrl); + sc_start = lookup_swap_cgroup(ent, &ctrl); spin_lock_irqsave(&ctrl->lock, flags); - retval = sc->id; - if (retval == old) + sc = sc_start; + offset = offset_start; + for (;;) { + if (sc->id != old) { + retval = 0; + goto out; + } + offset++; + if (offset == end) + break; + if (offset % SC_PER_PAGE) + sc++; + else + sc = __lookup_swap_cgroup(ctrl, offset); + } + + sc = sc_start; + offset = offset_start; + for (;;) { sc->id = new; - else - retval = 0; + offset++; + if (offset == end) + break; + if (offset % SC_PER_PAGE) + sc++; + else + sc = __lookup_swap_cgroup(ctrl, offset); + } + retval = old; +out: spin_unlock_irqrestore(&ctrl->lock, flags); return retval; } diff --git a/mm/swapfile.c b/mm/swapfile.c index b85ec810d941..d7717b694ec1 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1730,6 +1730,20 @@ static int page_trans_huge_map_swapcount(struct page *page, int *total_mapcount, return map_swapcount; } +#ifdef CONFIG_THP_SWAP +int get_swap_entry_size(swp_entry_t entry) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + + si = _swap_info_get(entry); + if (!si || !si->cluster_info) + return 1; + ci = si->cluster_info + swp_offset(entry) / SWAPFILE_CLUSTER; + return cluster_is_huge(ci) ? SWAPFILE_CLUSTER : 1; +} +#endif + /* * We can write to an anon page without COW if there are no other references * to it. 
And as a side-effect, free up its swap: because the old content
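To make the range move above easier to follow, here is a compilable user-space sketch of the check-then-update pattern that the revised swap_cgroup_cmpxchg() applies to all HPAGE_PMD_NR (512 on x86-64) entries of a huge swap cluster. The flat records[] array, the pthread mutex, and the function name are illustrative stand-ins only; the kernel walks per-page swap_cgroup tables under ctrl->lock instead of indexing a flat array.

	#include <pthread.h>

	#define NR_ENTRIES 512			/* HPAGE_PMD_NR on x86-64 */

	static unsigned short records[NR_ENTRIES];	/* one cgroup id per swap slot */
	static pthread_mutex_t records_lock = PTHREAD_MUTEX_INITIALIZER;

	/* Returns old on success, 0 on failure (no cgroup uses id 0). */
	unsigned short range_cmpxchg(unsigned long start, unsigned int nr,
				     unsigned short old, unsigned short new)
	{
		unsigned long i;
		unsigned short retval = old;

		pthread_mutex_lock(&records_lock);
		/* Pass 1: the move is all-or-nothing, so check every record first. */
		for (i = start; i < start + nr; i++) {
			if (records[i] != old) {
				retval = 0;
				goto out;
			}
		}
		/* Pass 2: every record matched old, reassign the whole range. */
		for (i = start; i < start + nr; i++)
			records[i] = new;
	out:
		pthread_mutex_unlock(&records_lock);
		return retval;
	}

Holding one lock across both passes is what makes the whole-cluster move atomic with respect to concurrent movers, mirroring how the patch holds ctrl->lock for the full range.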
From patchwork Tue Nov 20 08:54:43 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10690011
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 15/21] swap: Support to copy PMD swap mapping when fork()
Date: Tue, 20 Nov 2018 16:54:43 +0800
Message-Id: <20181120085449.5542-16-ying.huang@intel.com>
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

During fork(), the page table needs to be copied from parent to child. A PMD swap mapping needs to be copied too, with the swap reference count increased accordingly. When the huge swap cluster has already been split, we need to split the PMD swap mapping and fall back to PTE copying. When swap count continuation fails to allocate a page with GFP_ATOMIC, we need to unlock the spinlock and try again with GFP_KERNEL.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A.
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/huge_memory.c | 72 ++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 57 insertions(+), 15 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 3aade329fe8b..2a49b2068902 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -985,6 +985,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, if (unlikely(!pgtable)) goto out; +retry: dst_ptl = pmd_lock(dst_mm, dst_pmd); src_ptl = pmd_lockptr(src_mm, src_pmd); spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); @@ -992,26 +993,67 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, ret = -EAGAIN; pmd = *src_pmd; -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION if (unlikely(is_swap_pmd(pmd))) { swp_entry_t entry = pmd_to_swp_entry(pmd); - VM_BUG_ON(!is_pmd_migration_entry(pmd)); - if (is_write_migration_entry(entry)) { - make_migration_entry_read(&entry); - pmd = swp_entry_to_pmd(entry); - if (pmd_swp_soft_dirty(*src_pmd)) - pmd = pmd_swp_mksoft_dirty(pmd); - set_pmd_at(src_mm, addr, src_pmd, pmd); +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION + if (is_migration_entry(entry)) { + if (is_write_migration_entry(entry)) { + make_migration_entry_read(&entry); + pmd = swp_entry_to_pmd(entry); + if (pmd_swp_soft_dirty(*src_pmd)) + pmd = pmd_swp_mksoft_dirty(pmd); + set_pmd_at(src_mm, addr, src_pmd, pmd); + } + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + ret = 0; + goto out_unlock; } - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - set_pmd_at(dst_mm, addr, dst_pmd, pmd); - ret = 0; - goto out_unlock; - } #endif + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) { + ret = swap_duplicate(&entry, HPAGE_PMD_NR); + if (!ret) { + add_mm_counter(dst_mm, MM_SWAPENTS, + HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, + pgtable); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + /* make sure dst_mm is on swapoff's mmlist. 
*/ + if (unlikely(list_empty(&dst_mm->mmlist))) { + spin_lock(&mmlist_lock); + if (list_empty(&dst_mm->mmlist)) + list_add(&dst_mm->mmlist, + &src_mm->mmlist); + spin_unlock(&mmlist_lock); + } + } else if (ret == -ENOTDIR) { + /* + * The huge swap cluster has been split, split + * the PMD swap mapping and fallback to PTE + */ + __split_huge_swap_pmd(vma, addr, src_pmd); + pte_free(dst_mm, pgtable); + } else if (ret == -ENOMEM) { + spin_unlock(src_ptl); + spin_unlock(dst_ptl); + ret = add_swap_count_continuation(entry, + GFP_KERNEL); + if (ret < 0) { + ret = -ENOMEM; + pte_free(dst_mm, pgtable); + goto out; + } + goto retry; + } else + VM_BUG_ON(1); + goto out_unlock; + } + VM_BUG_ON(1); + } if (unlikely(!pmd_trans_huge(pmd))) { pte_free(dst_mm, pgtable);
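The unlock-allocate-retry dance in the hunk above is easy to get wrong, so here is a small user-space sketch of the pattern, with a pthread mutex standing in for the page-table spinlocks and malloc() for the GFP_KERNEL allocation. dup_swap_count() and the other names are illustrative, not the kernel's.

	#include <errno.h>
	#include <pthread.h>
	#include <stdlib.h>

	static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER;
	static void *continuation;	/* swap-count continuation stand-in */

	/* Fails with -ENOMEM until the continuation has been allocated. */
	static int dup_swap_count(void)
	{
		return continuation ? 0 : -ENOMEM;
	}

	int copy_mapping_with_retry(void)
	{
		int ret;

	retry:
		pthread_mutex_lock(&ptl);
		ret = dup_swap_count();
		if (ret == -ENOMEM) {
			/* Cannot block for memory while holding the lock. */
			pthread_mutex_unlock(&ptl);
			continuation = malloc(4096);	/* GFP_KERNEL analogue */
			if (!continuation)
				return -ENOMEM;
			goto retry;
		}
		/* ... copy the PMD swap mapping under the lock ... */
		pthread_mutex_unlock(&ptl);
		return ret;
	}

The key point is that the blocking allocation happens only after the locks are dropped, and the copy is redone from the top because the PMD may have changed in the meantime.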
From patchwork Tue Nov 20 08:54:44 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10690013
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 16/21] swap: Free PMD swap mapping when zap_huge_pmd()
Date: Tue, 20 Nov 2018 16:54:44 +0800
Message-Id: <20181120085449.5542-17-ying.huang@intel.com>
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

For a PMD swap mapping, zap_huge_pmd() clears the PMD and calls free_swap_and_cache() to decrease the swap reference count and possibly free or split the huge swap cluster and the THP in the swap cache.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/huge_memory.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2a49b2068902..c2b23dfb0d55 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2072,7 +2072,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, spin_unlock(ptl); if (is_huge_zero_pmd(orig_pmd)) tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); - } else if (is_huge_zero_pmd(orig_pmd)) { + } else if (pmd_present(orig_pmd) && is_huge_zero_pmd(orig_pmd)) { zap_deposited_table(tlb->mm, pmd); spin_unlock(ptl); tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); @@ -2085,17 +2085,27 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, page_remove_rmap(page, true); VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); VM_BUG_ON_PAGE(!PageHead(page), page); - } else if (thp_migration_supported()) { - swp_entry_t entry; - - VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); - entry = pmd_to_swp_entry(orig_pmd); - page = pfn_to_page(swp_offset(entry)); + } else { + swp_entry_t entry = pmd_to_swp_entry(orig_pmd); + + if (thp_migration_supported() && + is_migration_entry(entry)) + page = pfn_to_page(swp_offset(entry)); + else if (IS_ENABLED(CONFIG_THP_SWAP) && + !non_swap_entry(entry)) + free_swap_and_cache(entry, HPAGE_PMD_NR); + else { + WARN_ONCE(1, +"Non present huge pmd without pmd migration or swap enabled!"); + goto unlock; + } flush_needed = 0; - } else - WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); + } - if (PageAnon(page)) { + if (!page) { + zap_deposited_table(tlb->mm, pmd); + add_mm_counter(tlb->mm, MM_SWAPENTS, -HPAGE_PMD_NR); + } else if (PageAnon(page)) { zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); } else { @@ -2103,7 +2113,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, mm_counter_file(page), -HPAGE_PMD_NR); } - +unlock: spin_unlock(ptl); if (flush_needed) tlb_remove_page_size(tlb, page, HPAGE_PMD_SIZE);
From patchwork Tue Nov 20 08:54:45 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10690015
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 17/21] swap: Support PMD swap mapping for MADV_WILLNEED
Date: Tue, 20 Nov 2018 16:54:45 +0800
Message-Id: <20181120085449.5542-18-ying.huang@intel.com>
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

During MADV_WILLNEED, for a PMD swap mapping, if THP swapin is enabled for the VMA, the whole swap cluster is swapped in. Otherwise, the huge swap cluster and the PMD swap mapping are split, and we fall back to PTE swap mappings.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A.
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/madvise.c | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 0c1f96c605f8..52d27e04a204 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -196,14 +196,36 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, pte_t *orig_pte; struct vm_area_struct *vma = walk->private; unsigned long index; + swp_entry_t entry; + struct page *page; + pmd_t pmdval; + + pmdval = *pmd; + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(pmdval) && + !is_pmd_migration_entry(pmdval)) { + entry = pmd_to_swp_entry(pmdval); + if (!transparent_hugepage_swapin_enabled(vma)) { + if (!split_swap_cluster(entry, 0)) + split_huge_swap_pmd(vma, pmd, start, pmdval); + } else { + page = read_swap_cache_async(entry, + GFP_HIGHUSER_MOVABLE, + vma, start, false); + if (page) { + /* The swap cluster has been split under us */ + if (!PageTransHuge(page)) + split_huge_swap_pmd(vma, pmd, start, + pmdval); + put_page(page); + } + } + } if (pmd_none_or_trans_huge_or_clear_bad(pmd)) return 0; for (index = start; index != end; index += PAGE_SIZE) { pte_t pte; - swp_entry_t entry; - struct page *page; spinlock_t *ptl; orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl); From patchwork Tue Nov 20 08:54:46 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10690017 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6BFFD13BB for ; Tue, 20 Nov 2018 08:56:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5B7B02991D for ; Tue, 20 Nov 2018 08:56:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4F8E529A1B; Tue, 20 Nov 2018 08:56:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E64262991D for ; Tue, 20 Nov 2018 08:56:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A61E56B1F54; Tue, 20 Nov 2018 03:55:54 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 9E4C06B1F56; Tue, 20 Nov 2018 03:55:54 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8AD4F6B1F57; Tue, 20 Nov 2018 03:55:54 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f200.google.com (mail-pf1-f200.google.com [209.85.210.200]) by kanga.kvack.org (Postfix) with ESMTP id 43F6B6B1F54 for ; Tue, 20 Nov 2018 03:55:54 -0500 (EST) Received: by mail-pf1-f200.google.com with SMTP id v79so1058257pfd.20 for ; Tue, 20 Nov 2018 00:55:54 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc 
From patchwork Tue Nov 20 08:54:46 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10690017
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 18/21] swap: Support PMD swap mapping in mincore()
Date: Tue, 20 Nov 2018 16:54:46 +0800
Message-Id: <20181120085449.5542-19-ying.huang@intel.com>
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

During mincore(), for a PMD swap mapping, the swap cache is looked up. If the resulting page isn't a compound page, the PMD swap mapping is split and we fall back to PTE swap mapping processing.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/mincore.c | 37 +++++++++++++++++++++++++++++++------
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/mm/mincore.c b/mm/mincore.c index aa0e542569f9..1d861fac82ee 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -48,7 +48,8 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr, * and is up to date; i.e. that no page-in operation would be required * at this time if an application were to map and access this page.
*/ -static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff) +static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff, + bool *compound) { unsigned char present = 0; struct page *page; @@ -86,6 +87,8 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff) #endif if (page) { present = PageUptodate(page); + if (compound) + *compound = PageCompound(page); put_page(page); } @@ -103,7 +106,8 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end, pgoff = linear_page_index(vma, addr); for (i = 0; i < nr; i++, pgoff++) - vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff); + vec[i] = mincore_page(vma->vm_file->f_mapping, + pgoff, NULL); } else { for (i = 0; i < nr; i++) vec[i] = 0; @@ -127,14 +131,36 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, pte_t *ptep; unsigned char *vec = walk->private; int nr = (end - addr) >> PAGE_SHIFT; + swp_entry_t entry; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { - memset(vec, 1, nr); + unsigned char val = 1; + bool compound; + + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(*pmd)) { + entry = pmd_to_swp_entry(*pmd); + if (!non_swap_entry(entry)) { + val = mincore_page(swap_address_space(entry), + swp_offset(entry), + &compound); + /* + * The huge swap cluster has been + * split under us + */ + if (!compound) { + __split_huge_swap_pmd(vma, addr, pmd); + spin_unlock(ptl); + goto fallback; + } + } + } + memset(vec, val, nr); spin_unlock(ptl); goto out; } +fallback: if (pmd_trans_unstable(pmd)) { __mincore_unmapped_range(addr, end, vma, vec); goto out; @@ -150,8 +176,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, else if (pte_present(pte)) *vec = 1; else { /* pte is a swap entry */ - swp_entry_t entry = pte_to_swp_entry(pte); - + entry = pte_to_swp_entry(pte); if (non_swap_entry(entry)) { /* * migration or hwpoison entries are always @@ -161,7 +186,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, } else { #ifdef CONFIG_SWAP *vec = mincore_page(swap_address_space(entry), - swp_offset(entry)); + swp_offset(entry), NULL); #else WARN_ON(1); *vec = 1;
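To summarize the fast path just added to mincore_pte_range(): one swap-cache lookup decides the residency of all 512 pages, and a non-compound result means the cluster was split underneath us, so the caller must take the per-PTE fallback. The sketch below is illustrative only; swap_cache_lookup() stands in for mincore_page() on the swap address space.

	#include <stdbool.h>
	#include <string.h>

	#define HPAGE_PMD_NR 512

	/* Stand-in: is the cluster in the swap cache, and is it still a THP? */
	static bool swap_cache_lookup(unsigned long slot, bool *compound)
	{
		(void)slot;
		*compound = true;	/* pretend the cached page is still huge */
		return true;
	}

	/* Returns 0 when handled as a whole PMD, -1 for per-PTE fallback. */
	static int mincore_pmd_swap(unsigned long slot, unsigned char *vec)
	{
		bool compound = false;
		bool present = swap_cache_lookup(slot, &compound);

		if (present && !compound)
			return -1;	/* cluster split under us */
		memset(vec, present ? 1 : 0, HPAGE_PMD_NR);
		return 0;
	}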
From patchwork Tue Nov 20 08:54:47 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10690019
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 19/21] swap: Support PMD swap mapping in common path
Date: Tue, 20 Nov 2018 16:54:47 +0800
Message-Id: <20181120085449.5542-20-ying.huang@intel.com>
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

The original code here handles only PMD migration entries; it is revised to support PMD swap mappings as well.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A.
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- fs/proc/task_mmu.c | 12 +++++------- mm/gup.c | 36 ++++++++++++++++++++++++------------ mm/huge_memory.c | 7 ++++--- mm/mempolicy.c | 2 +- 4 files changed, 34 insertions(+), 23 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 39e96a21366e..0e65233f2cc2 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -986,7 +986,7 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma, pmd = pmd_clear_soft_dirty(pmd); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); - } else if (is_migration_entry(pmd_to_swp_entry(pmd))) { + } else if (is_swap_pmd(pmd)) { pmd = pmd_swp_clear_soft_dirty(pmd); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); } @@ -1316,9 +1316,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, if (pm->show_pfn) frame = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - } -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION - else if (is_swap_pmd(pmd)) { + } else if (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) && + is_swap_pmd(pmd)) { swp_entry_t entry = pmd_to_swp_entry(pmd); unsigned long offset; @@ -1331,10 +1330,9 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, flags |= PM_SWAP; if (pmd_swp_soft_dirty(pmd)) flags |= PM_SOFT_DIRTY; - VM_BUG_ON(!is_pmd_migration_entry(pmd)); - page = migration_entry_to_page(entry); + if (is_pmd_migration_entry(pmd)) + page = migration_entry_to_page(entry); } -#endif if (page && page_mapcount(page) == 1) flags |= PM_MMAP_EXCLUSIVE; diff --git a/mm/gup.c b/mm/gup.c index aa43620a3270..3ecaee6dd290 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -215,6 +215,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, spinlock_t *ptl; struct page *page; struct mm_struct *mm = vma->vm_mm; + swp_entry_t entry; pmd = pmd_offset(pudp, address); /* @@ -242,18 +243,22 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, if (!pmd_present(pmdval)) { if (likely(!(flags & FOLL_MIGRATION))) return no_page_table(vma, flags); - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(pmdval)); - if (is_pmd_migration_entry(pmdval)) + entry = pmd_to_swp_entry(pmdval); + if (thp_migration_supported() && is_migration_entry(entry)) { pmd_migration_entry_wait(mm, pmd); - pmdval = READ_ONCE(*pmd); - /* - * MADV_DONTNEED may convert the pmd to null because - * mmap_sem is held in read mode - */ - if (pmd_none(pmdval)) + pmdval = READ_ONCE(*pmd); + /* + * MADV_DONTNEED may convert the pmd to null because + * mmap_sem is held in read mode + */ + if (pmd_none(pmdval)) + return no_page_table(vma, flags); + goto retry; + } + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) return no_page_table(vma, flags); - goto retry; + WARN_ON(1); + return no_page_table(vma, flags); } if (pmd_devmap(pmdval)) { ptl = pmd_lock(mm, pmd); @@ -275,11 +280,18 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, return no_page_table(vma, flags); } if (unlikely(!pmd_present(*pmd))) { + entry = pmd_to_swp_entry(*pmd); spin_unlock(ptl); if (likely(!(flags & FOLL_MIGRATION))) return no_page_table(vma, flags); - pmd_migration_entry_wait(mm, pmd); - goto retry_locked; + if (thp_migration_supported() && is_migration_entry(entry)) { + pmd_migration_entry_wait(mm, pmd); + goto retry_locked; + } + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) + return no_page_table(vma, 
From patchwork Tue Nov 20 08:54:48 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10690021
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V7 RESEND 20/21] swap: create PMD swap mapping when unmap the THP
Date: Tue, 20 Nov 2018 16:54:48 +0800
Message-Id: <20181120085449.5542-21-ying.huang@intel.com>
In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com>
References: <20181120085449.5542-1-ying.huang@intel.com>

This is the final step of THP swapin support.  When reclaiming an
anonymous THP, after allocating the huge swap cluster and adding the
THP to the swap cache, the PMD page mapping is changed to a mapping
into the swap space.  Previously, the PMD page mapping was split
before being changed.  This patch enhances the unmap code so that it
does not split the PMD mapping, but instead replaces it with a PMD
swap mapping.  Consequently, when the SWAP_HAS_CACHE flag is cleared
in the last step of swapout, the huge swap cluster is kept instead of
being split, and at swapin time the huge swap cluster is read back in
one piece into a THP.  That is, the THP is not split during
swapout/swapin, which eliminates the splitting/collapsing overhead and
reduces the page fault count.  More importantly, THP utilization
improves greatly: far more THPs survive while swapping is in use, so
we can take full advantage of THP, including its high swapout/swapin
performance.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
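[The core transformation here is replacing a present PMD that maps a THP
with a PMD-sized swap entry.  Below is a condensed sketch of that
conversion, distilled from the patch's set_pmd_swap_entry() with the swap
reference counting, rss accounting and rmap teardown elided;
make_swap_pmd() is a hypothetical name for the fragment, not a kernel
function:

/*
 * Sketch: build the non-present PMD that stands in for an unmapped
 * anonymous THP.  The swap entry was stashed in page_private() when
 * the THP entered the swap cache; the soft-dirty bit of the old
 * mapping is carried over into the swap PMD.
 */
static pmd_t make_swap_pmd(struct page *thp, pmd_t old_pmd)
{
	swp_entry_t entry = { .val = page_private(thp) };
	pmd_t swp_pmd = swp_entry_to_pmd(entry);

	if (pmd_soft_dirty(old_pmd))
		swp_pmd = pmd_swp_mksoft_dirty(swp_pmd);
	return swp_pmd;
}]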
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 11 +++++++++++ mm/huge_memory.c | 30 ++++++++++++++++++++++++++++ mm/rmap.c | 43 ++++++++++++++++++++++++++++++++++++++++- mm/vmscan.c | 6 +----- 4 files changed, 84 insertions(+), 6 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 260357fc9d76..06e4fde57a0f 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -375,12 +375,16 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +struct page_vma_mapped_walk; + #ifdef CONFIG_THP_SWAP extern void __split_huge_swap_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd); extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd); extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); +extern bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, + struct page *page, unsigned long address, pmd_t pmdval); static inline bool transparent_hugepage_swapin_enabled( struct vm_area_struct *vma) @@ -421,6 +425,13 @@ static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) return 0; } +static inline bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, + struct page *page, unsigned long address, + pmd_t pmdval) +{ + return false; +} + static inline bool transparent_hugepage_swapin_enabled( struct vm_area_struct *vma) { diff --git a/mm/huge_memory.c b/mm/huge_memory.c index e7b0840fcb8c..dcc907f6bf4a 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1939,6 +1939,36 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) count_vm_event(THP_SWPIN_FALLBACK); goto fallback; } + +bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, struct page *page, + unsigned long address, pmd_t pmdval) +{ + struct vm_area_struct *vma = pvmw->vma; + struct mm_struct *mm = vma->vm_mm; + pmd_t swp_pmd; + swp_entry_t entry = { .val = page_private(page) }; + + if (swap_duplicate(&entry, HPAGE_PMD_NR) < 0) { + set_pmd_at(mm, address, pvmw->pmd, pmdval); + return false; + } + if (list_empty(&mm->mmlist)) { + spin_lock(&mmlist_lock); + if (list_empty(&mm->mmlist)) + list_add(&mm->mmlist, &init_mm.mmlist); + spin_unlock(&mmlist_lock); + } + add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR); + add_mm_counter(mm, MM_SWAPENTS, HPAGE_PMD_NR); + swp_pmd = swp_entry_to_pmd(entry); + if (pmd_soft_dirty(pmdval)) + swp_pmd = pmd_swp_mksoft_dirty(swp_pmd); + set_pmd_at(mm, address, pvmw->pmd, swp_pmd); + + page_remove_rmap(page, true); + put_page(page); + return true; +} #endif static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) diff --git a/mm/rmap.c b/mm/rmap.c index 3bb4be720bc0..a180cb1fe2db 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1413,11 +1413,52 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, continue; } + address = pvmw.address; + +#ifdef CONFIG_THP_SWAP + /* PMD-mapped THP swap entry */ + if (IS_ENABLED(CONFIG_THP_SWAP) && + !pvmw.pte && PageAnon(page)) { + pmd_t pmdval; + + VM_BUG_ON_PAGE(PageHuge(page) || + !PageTransCompound(page), page); + + flush_cache_range(vma, address, + address + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(mm, address, + address + HPAGE_PMD_SIZE); + if (should_defer_flush(mm, flags)) { + /* check comments for PTE below */ + 
From patchwork Tue Nov 20 08:54:49 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10690023
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan , Dan Williams Subject: [PATCH -V7 RESEND 21/21] swap: Update help of CONFIG_THP_SWAP Date: Tue, 20 Nov 2018 16:54:49 +0800 Message-Id: <20181120085449.5542-22-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181120085449.5542-1-ying.huang@intel.com> References: <20181120085449.5542-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP The help of CONFIG_THP_SWAP is updated to reflect the latest progress of THP (Tranparent Huge Page) swap optimization. Signed-off-by: "Huang, Ying" Reviewed-by: Dan Williams Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/Kconfig | 2 -- 1 file changed, 2 deletions(-) diff --git a/mm/Kconfig b/mm/Kconfig index d7c5299c5b7d..d397baa92a9b 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -417,8 +417,6 @@ config THP_SWAP depends on TRANSPARENT_HUGEPAGE && ARCH_WANTS_THP_SWAP && SWAP help Swap transparent huge pages in one piece, without splitting. - XXX: For now, swap cluster backing transparent huge page - will be split after swapout. For selection by architectures with reasonable THP sizes.