From patchwork Fri Dec 7 05:41:01 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717441
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 01/21] swap: Enable PMD swap operations for CONFIG_THP_SWAP
Date: Fri, 7 Dec 2018 13:41:01 +0800
Message-Id: <20181207054122.27822-2-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

Currently, the "swap entry" in the page tables is used for a number of
things outside of actual swap, such as page migration.  We currently
support the THP/PMD "swap entry" for page migration, and the functions
behind this are tied to page migration's config option
(CONFIG_ARCH_ENABLE_THP_MIGRATION).  But we also need them for the THP
swap optimization, so a new config option (CONFIG_HAVE_PMD_SWAP_ENTRY)
is added.  It is enabled when either CONFIG_ARCH_ENABLE_THP_MIGRATION
or CONFIG_THP_SWAP is enabled, and the PMD swap entry functions are
tied to this new config option instead.

Some functions enabled by CONFIG_ARCH_ENABLE_THP_MIGRATION are for page
migration only; they remain enabled only for that.
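The gating pattern the patch sets up can be modeled in a few lines of
standalone userspace C; the sketch below is illustrative only (the
HAVE_PMD_SWAP_ENTRY macro stands in for the Kconfig symbol; the types
and helper are simplified stand-ins, not kernel code):

/*
 * Sketch: generic code calls the conversion helper unconditionally;
 * the config option decides whether it does real work or is a stub.
 */
#include <stdio.h>

typedef struct { unsigned long val; } pmd_t;
typedef struct { unsigned long val; } swp_entry_t;

#ifdef HAVE_PMD_SWAP_ENTRY
/* Real conversion: the PMD bits encode a swap entry. */
static swp_entry_t pmd_to_swp_entry(pmd_t pmd)
{
        return (swp_entry_t){ .val = pmd.val };
}
#else
/* Stub: option disabled, callers get an empty entry. */
static swp_entry_t pmd_to_swp_entry(pmd_t pmd)
{
        (void)pmd;
        return (swp_entry_t){ .val = 0 };
}
#endif

int main(void)
{
        pmd_t pmd = { .val = 42 };

        /* Generic code needs no #ifdef at the call site. */
        printf("swap entry: %lu\n", pmd_to_swp_entry(pmd).val);
        return 0;
}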
Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 arch/x86/include/asm/pgtable.h |  2 +-
 include/asm-generic/pgtable.h  |  2 +-
 include/linux/swapops.h        | 44 ++++++++++++++++++----------------
 mm/Kconfig                     |  8 +++++++
 4 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 40616e805292..e830ab345551 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1333,7 +1333,7 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
 	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
 }
 
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
 {
 	return pmd_set_flags(pmd, _PAGE_SWP_SOFT_DIRTY);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 359fb935ded6..20aab7bfd487 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -675,7 +675,7 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm,
 #endif
 
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
-#ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#ifndef CONFIG_HAVE_PMD_SWAP_ENTRY
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
 {
 	return pmd;
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 4d961668e5fc..905ddc65caa3 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -254,17 +254,7 @@ static inline int is_write_migration_entry(swp_entry_t entry)
 
 #endif
 
-struct page_vma_mapped_walk;
-
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
-		struct page *page);
-
-extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
-		struct page *new);
-
-extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
-
+#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY
 static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
 {
 	swp_entry_t arch_entry;
@@ -282,6 +272,28 @@ static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
 	arch_entry = __swp_entry(swp_type(entry), swp_offset(entry));
 	return __swp_entry_to_pmd(arch_entry);
 }
+#else
+static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
+{
+	return swp_entry(0, 0);
+}
+
+static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
+{
+	return __pmd(0);
+}
+#endif
+
+struct page_vma_mapped_walk;
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
+		struct page *page);
+
+extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
+		struct page *new);
+
+extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
 
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
@@ -302,16 +314,6 @@ static inline void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
 static inline void pmd_migration_entry_wait(struct mm_struct *m, pmd_t *p)
 {
 }
 
-static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
-{
-	return swp_entry(0, 0);
-}
-
-static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
-{
-	return __pmd(0);
-}
-
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
 	return 0;
diff --git a/mm/Kconfig b/mm/Kconfig
index 25c71eb8a7db..d7c5299c5b7d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -422,6 +422,14 @@ config THP_SWAP
 	  For selection by architectures with reasonable THP sizes.
 
+#
+# "PMD swap entry" in the page table is used both for migration and
+# actual swap.
+#
+config HAVE_PMD_SWAP_ENTRY
+	def_bool y
+	depends on THP_SWAP || ARCH_ENABLE_THP_MIGRATION
+
 config TRANSPARENT_HUGE_PAGECACHE
 	def_bool y
 	depends on TRANSPARENT_HUGEPAGE

From patchwork Fri Dec 7 05:41:02 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717445
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 02/21] swap: Add __swap_duplicate_locked()
Date: Fri, 7 Dec 2018 13:41:02 +0800
Message-Id: <20181207054122.27822-3-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

The part of __swap_duplicate() that runs with the lock held is
separated out into a new function, __swap_duplicate_locked(), because
we will add more logic for PMD swap mappings to __swap_duplicate()
while keeping most of the PTE swap mapping logic in
__swap_duplicate_locked().  This is mechanical code refactoring; there
are no functional changes in this patch.
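The refactoring pattern is a common one; a minimal standalone C sketch
(illustrative names, a pthread mutex standing in for the cluster/swap
info lock, not kernel code) looks like:

/*
 * Sketch: the body that needs the lock moves into a *_locked() helper,
 * while the wrapper keeps locking and unlocking.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int counter;

/* Caller must hold "lock"; contains only the core logic. */
static int counter_inc_locked(void)
{
        if (counter < 0)
                return -1;      /* invalid state, like -EINVAL */
        counter++;
        return 0;
}

/* Wrapper: lock, delegate, unlock. */
static int counter_inc(void)
{
        int err;

        pthread_mutex_lock(&lock);
        err = counter_inc_locked();
        pthread_mutex_unlock(&lock);
        return err;
}

int main(void)
{
        counter_inc();
        printf("counter = %d\n", counter);
        return 0;
}

The payoff comes in the next patch: a caller that already holds the
lock can invoke the *_locked() helper repeatedly, once per swap slot.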
Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/swapfile.c | 63 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 35 insertions(+), 28 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index ec210be02c3b..f3c175d830b1 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3441,32 +3441,12 @@ void si_swapinfo(struct sysinfo *val)
 	spin_unlock(&swap_lock);
 }
 
-/*
- * Verify that a swap entry is valid and increment its swap map count.
- *
- * Returns error code in following case.
- * - success -> 0
- * - swp_entry is invalid -> EINVAL
- * - swp_entry is migration entry -> EINVAL
- * - swap-cache reference is requested but there is already one. -> EEXIST
- * - swap-cache reference is requested but the entry is not used. -> ENOENT
- * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
- */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+static int __swap_duplicate_locked(struct swap_info_struct *p,
+				   unsigned long offset, unsigned char usage)
 {
-	struct swap_info_struct *p;
-	struct swap_cluster_info *ci;
-	unsigned long offset;
 	unsigned char count;
 	unsigned char has_cache;
-	int err = -EINVAL;
-
-	p = get_swap_device(entry);
-	if (!p)
-		goto out;
-
-	offset = swp_offset(entry);
-	ci = lock_cluster_or_swap_info(p, offset);
+	int err = 0;
 
 	count = p->swap_map[offset];
 
@@ -3476,12 +3456,11 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 	 */
 	if (unlikely(swap_count(count) == SWAP_MAP_BAD)) {
 		err = -ENOENT;
-		goto unlock_out;
+		goto out;
 	}
 
 	has_cache = count & SWAP_HAS_CACHE;
 	count &= ~SWAP_HAS_CACHE;
-	err = 0;
 
 	if (usage == SWAP_HAS_CACHE) {
 
@@ -3508,11 +3487,39 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 
 	p->swap_map[offset] = count | has_cache;
 
-unlock_out:
+out:
+	return err;
+}
+
+/*
+ * Verify that a swap entry is valid and increment its swap map count.
+ *
+ * Returns error code in following case.
+ * - success -> 0
+ * - swp_entry is invalid -> EINVAL
+ * - swp_entry is migration entry -> EINVAL
+ * - swap-cache reference is requested but there is already one. -> EEXIST
+ * - swap-cache reference is requested but the entry is not used. -> ENOENT
+ * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
+ */
+static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+{
+	struct swap_info_struct *p;
+	struct swap_cluster_info *ci;
+	unsigned long offset;
+	int err = -EINVAL;
+
+	p = get_swap_device(entry);
+	if (!p)
+		goto out;
+
+	offset = swp_offset(entry);
+	ci = lock_cluster_or_swap_info(p, offset);
+	err = __swap_duplicate_locked(p, offset, usage);
 	unlock_cluster_or_swap_info(p, ci);
+
+	put_swap_device(p);
 out:
-	if (p)
-		put_swap_device(p);
 	return err;
 }

From patchwork Fri Dec 7 05:41:03 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717443
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 03/21] swap: Support PMD swap mapping in swap_duplicate()
Date: Fri, 7 Dec 2018 13:41:03 +0800
Message-Id: <20181207054122.27822-4-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

To support swapping in a THP in one piece, we need to create PMD swap
mappings during swapout and maintain the PMD swap mapping count.
This patch implements support for increasing the PMD swap mapping
count (for swapout, fork, etc.) and setting the SWAP_HAS_CACHE flag
(for swapin, etc.) for a huge swap cluster in the swap_duplicate()
function family.  Although it implements only a part of the design of
the swap reference count with PMD swap mappings, the whole design is
described below to make it easier to understand the patch and the
whole picture.

A huge swap cluster is used to hold the contents of a swapped-out THP.
After swapout, a PMD page mapping to the THP becomes a PMD swap
mapping to the huge swap cluster via a swap entry in the PMD, while a
PTE page mapping to a subpage of the THP becomes a PTE swap mapping to
a swap slot in the huge swap cluster via a swap entry in the PTE.  If
there are no PMD swap mappings and the corresponding THP is removed
from the page cache (reclaimed), the huge swap cluster will be split
and become a normal swap cluster.

The count (cluster_count()) of the huge swap cluster is
SWAPFILE_CLUSTER (= HPAGE_PMD_NR) + the PMD swap mapping count.
Because all swap slots in the huge swap cluster are mapped by PTE or
PMD, or have the SWAP_HAS_CACHE bit set, the usage count of the swap
cluster is HPAGE_PMD_NR.  The PMD swap mapping count is recorded too,
to make it easy to determine whether there are remaining PMD swap
mappings.

The count in swap_map[offset] is the sum of the PTE and PMD swap
mapping counts.  This means that when we increase the PMD swap mapping
count, we need to increase swap_map[offset] for all swap slots inside
the swap cluster.  An alternative choice is to make swap_map[offset]
record the PTE swap map count only, given that we have recorded the
PMD swap mapping count in the count of the huge swap cluster.  But
this would require increasing swap_map[offset] when splitting the PMD
swap mapping, which may fail because of the memory allocation for the
swap count continuation.  That is hard to deal with, so we chose the
current solution.

The PMD swap mapping to a huge swap cluster may be split, for example
when unmapping part of a PMD mapping.  That is easy because only the
count of the huge swap cluster needs to be changed.  When the last PMD
swap mapping is gone and SWAP_HAS_CACHE is unset, we will split the
huge swap cluster (clear the huge flag).  This makes it easy to reason
about the cluster state.

A huge swap cluster will be split when splitting a THP in the swap
cache, failing to allocate a THP during swapin, etc.  But when
splitting the huge swap cluster, we will not try to split all PMD swap
mappings, because sometimes we don't have enough information available
for that.  Later, when the PMD swap mapping is duplicated, swapped in,
etc., the PMD swap mapping will be split and fall back to the PTE
operation.

When a THP is added into the swap cache, the SWAP_HAS_CACHE flag will
be set in the swap_map[offset] of all swap slots inside the huge swap
cluster backing the THP.  This huge swap cluster will not be split
unless the THP is split, even if its PMD swap mapping count has
dropped to 0.  Later, when the THP is removed from the swap cache, the
SWAP_HAS_CACHE flag will be cleared in the swap_map[offset] of all
swap slots inside the huge swap cluster, and this huge swap cluster
will be split if its PMD swap mapping count is 0.

The first parameter of swap_duplicate() is changed to return the swap
entry to call add_swap_count_continuation() for, because we may need
to call it for a swap entry in the middle of a huge swap cluster.
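The counting scheme above can be exercised with a small standalone C
model; the sketch below is illustrative only (a 4-slot "cluster"
stands in for SWAPFILE_CLUSTER, all names are simplified stand-ins,
not kernel code):

/*
 * Invariants modeled: for a huge cluster,
 * cluster_count = NR_SLOTS + PMD swap mapping count, and swap_map[i]
 * holds PTE + PMD mapping counts, so duplicating a PMD swap mapping
 * increments every slot as well as the cluster count.
 */
#include <assert.h>
#include <stdio.h>

#define NR_SLOTS 4      /* stands in for SWAPFILE_CLUSTER */

struct cluster {
        int count;              /* models cluster_count() */
        int swap_map[NR_SLOTS]; /* per-slot mapping counts */
};

static int pmd_swapcount(struct cluster *c)
{
        /* Huge cluster: count = NR_SLOTS + PMD swap mapping count. */
        assert(c->count >= NR_SLOTS);
        return c->count - NR_SLOTS;
}

/* Duplicate a PMD swap mapping: bump every slot, then the cluster. */
static void pmd_swap_duplicate(struct cluster *c)
{
        for (int i = 0; i < NR_SLOTS; i++)
                c->swap_map[i]++;
        c->count++;
}

int main(void)
{
        /* State right after swapping out a THP with one PMD mapping. */
        struct cluster c = { .count = NR_SLOTS + 1,
                             .swap_map = { 1, 1, 1, 1 } };

        pmd_swap_duplicate(&c); /* e.g. fork() copying the PMD */
        printf("PMD mappings: %d, swap_map[0]: %d\n",
               pmd_swapcount(&c), c.swap_map[0]);      /* 2 and 2 */
        return 0;
}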
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/swap.h | 9 ++-- mm/memory.c | 2 +- mm/rmap.c | 2 +- mm/swap_state.c | 2 +- mm/swapfile.c | 109 ++++++++++++++++++++++++++++++++++++------- 5 files changed, 99 insertions(+), 25 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 928550bd28f3..70a6ede1e7e0 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -451,8 +451,8 @@ extern swp_entry_t get_swap_page_of_type(int); extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size); extern int add_swap_count_continuation(swp_entry_t, gfp_t); extern void swap_shmem_alloc(swp_entry_t); -extern int swap_duplicate(swp_entry_t); -extern int swapcache_prepare(swp_entry_t); +extern int swap_duplicate(swp_entry_t *entry, int entry_size); +extern int swapcache_prepare(swp_entry_t entry, int entry_size); extern void swap_free(swp_entry_t); extern void swapcache_free_entries(swp_entry_t *entries, int n); extern int free_swap_and_cache(swp_entry_t); @@ -510,7 +510,8 @@ static inline void show_swap_cache_info(void) } #define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) -#define swapcache_prepare(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) +#define swapcache_prepare(e, s) \ + ({(is_migration_entry(e) || is_device_private_entry(e)); }) static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask) { @@ -521,7 +522,7 @@ static inline void swap_shmem_alloc(swp_entry_t swp) { } -static inline int swap_duplicate(swp_entry_t swp) +static inline int swap_duplicate(swp_entry_t *swp, int entry_size) { return 0; } diff --git a/mm/memory.c b/mm/memory.c index 820e8905d0e8..6ec2d0070d4f 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -710,7 +710,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, swp_entry_t entry = pte_to_swp_entry(pte); if (likely(!non_swap_entry(entry))) { - if (swap_duplicate(entry) < 0) + if (swap_duplicate(&entry, 1) < 0) return entry.val; /* make sure dst_mm is on swapoff's mmlist. */ diff --git a/mm/rmap.c b/mm/rmap.c index 85b7f9423352..a488d325946d 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1598,7 +1598,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, break; } - if (swap_duplicate(entry) < 0) { + if (swap_duplicate(&entry, 1) < 0) { set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw); diff --git a/mm/swap_state.c b/mm/swap_state.c index 5a1cc9387151..97831166994a 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -402,7 +402,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* * Swap entry may have been freed since our caller observed it. */ - err = swapcache_prepare(entry); + err = swapcache_prepare(entry, 1); if (err == -EEXIST) { /* * We might race against get_swap_page() and stumble diff --git a/mm/swapfile.c b/mm/swapfile.c index f3c175d830b1..37e20ce4983c 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -534,6 +534,40 @@ static void dec_cluster_info_page(struct swap_info_struct *p, free_cluster(p, idx); } +/* + * When swapout a THP in one piece, PMD page mappings to THP are + * replaced by PMD swap mappings to the corresponding swap cluster. + * cluster_swapcount() returns the PMD swap mapping count. 
+ *
+ * cluster_count() = PMD swap mapping count + count of allocated swap
+ * entries in cluster.  If a cluster is mapped by PMD, all swap
+ * entries inside is used, so here cluster_count() = PMD swap mapping
+ * count + SWAPFILE_CLUSTER.
+ */
+static inline int cluster_swapcount(struct swap_cluster_info *ci)
+{
+	VM_BUG_ON(!cluster_is_huge(ci) || cluster_count(ci) < SWAPFILE_CLUSTER);
+	return cluster_count(ci) - SWAPFILE_CLUSTER;
+}
+
+/*
+ * Set PMD swap mapping count for the huge cluster
+ */
+static inline void cluster_set_swapcount(struct swap_cluster_info *ci,
+					 unsigned int count)
+{
+	VM_BUG_ON(!cluster_is_huge(ci) || cluster_count(ci) < SWAPFILE_CLUSTER);
+	cluster_set_count(ci, SWAPFILE_CLUSTER + count);
+}
+
+static inline void cluster_add_swapcount(struct swap_cluster_info *ci, int add)
+{
+	int count = cluster_swapcount(ci) + add;
+
+	VM_BUG_ON(count < 0);
+	cluster_set_swapcount(ci, count);
+}
+
 /*
  * It's possible scan_swap_map() uses a free cluster in the middle of free
  * cluster list. Avoiding such abuse to avoid list corruption.
@@ -3492,35 +3526,66 @@ static int __swap_duplicate_locked(struct swap_info_struct *p,
 }
 
 /*
- * Verify that a swap entry is valid and increment its swap map count.
+ * Verify that the swap entries from *entry is valid and increment their
+ * PMD/PTE swap mapping count.
  *
  * Returns error code in following case.
  * - success -> 0
  * - swp_entry is invalid -> EINVAL
- * - swp_entry is migration entry -> EINVAL
  * - swap-cache reference is requested but there is already one. -> EEXIST
 * - swap-cache reference is requested but the entry is not used. -> ENOENT
 * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
+ * - the huge swap cluster has been split. -> ENOTDIR
 */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+static int __swap_duplicate(swp_entry_t *entry, int entry_size,
+			    unsigned char usage)
 {
 	struct swap_info_struct *p;
 	struct swap_cluster_info *ci;
 	unsigned long offset;
 	int err = -EINVAL;
+	int i, size = swap_entry_size(entry_size);
 
-	p = get_swap_device(entry);
+	p = get_swap_device(*entry);
 	if (!p)
 		goto out;
 
-	offset = swp_offset(entry);
+	offset = swp_offset(*entry);
 	ci = lock_cluster_or_swap_info(p, offset);
-	err = __swap_duplicate_locked(p, offset, usage);
+	if (size == SWAPFILE_CLUSTER) {
+		/*
+		 * The huge swap cluster has been split, for example, failed to
+		 * allocate huge page during swapin, the caller should split
+		 * the PMD swap mapping and operate on normal swap entries.
+		 */
+		if (!cluster_is_huge(ci)) {
+			err = -ENOTDIR;
+			goto unlock;
+		}
+		VM_BUG_ON(!IS_ALIGNED(offset, size));
+		/* If cluster is huge, all swap entries inside is in-use */
+		VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER);
+	}
+	/* p->swap_map[] = PMD swap map count + PTE swap map count */
+	for (i = 0; i < size; i++) {
+		err = __swap_duplicate_locked(p, offset + i, usage);
+		if (err && size != 1) {
+			*entry = swp_entry(p->type, offset + i);
+			goto undup;
+		}
+	}
+	if (size == SWAPFILE_CLUSTER && usage == 1)
+		cluster_add_swapcount(ci, usage);
+unlock:
 	unlock_cluster_or_swap_info(p, ci);
 
 	put_swap_device(p);
 out:
 	return err;
+undup:
+	for (i--; i >= 0; i--)
+		__swap_entry_free_locked(p, offset + i, usage);
+	goto unlock;
 }
 
 /*
@@ -3529,36 +3594,44 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
  */
 void swap_shmem_alloc(swp_entry_t entry)
 {
-	__swap_duplicate(entry, SWAP_MAP_SHMEM);
+	__swap_duplicate(&entry, 1, SWAP_MAP_SHMEM);
 }
 
 /*
  * Increase reference count of swap entry by 1.
- * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
- * but could not be atomically allocated. Returns 0, just as if it succeeded,
- * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
- * might occur if a page table entry has got corrupted.
+ *
+ * Return error code in following case.
+ * - success -> 0
+ * - swap_count_continuation is required but could not be atomically allocated.
+ *   *entry is used to return swap entry to call add_swap_count_continuation().
+ *   -> ENOMEM
+ * - otherwise same as __swap_duplicate()
  */
-int swap_duplicate(swp_entry_t entry)
+int swap_duplicate(swp_entry_t *entry, int entry_size)
 {
 	int err = 0;
 
-	while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
-		err = add_swap_count_continuation(entry, GFP_ATOMIC);
+	while (!err &&
+	       (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM)
+		err = add_swap_count_continuation(*entry, GFP_ATOMIC);
+	/* If kernel works correctly, other errno is impossible */
+	VM_BUG_ON(err && err != -ENOMEM && err != -ENOTDIR);
 	return err;
 }
 
 /*
  * @entry: swap entry for which we allocate swap cache.
+ * @entry_size: size of the swap entry, 1 or SWAPFILE_CLUSTER
  *
  * Called when allocating swap cache for existing swap entry,
  * This can return error codes. Returns 0 at success.
- * -EBUSY means there is a swap cache.
- * Note: return code is different from swap_duplicate().
+ * -EINVAL means the swap device has been swapoff.
+ * -EEXIST means there is a swap cache.
+ * Otherwise same as __swap_duplicate()
  */
-int swapcache_prepare(swp_entry_t entry)
+int swapcache_prepare(swp_entry_t entry, int entry_size)
 {
-	return __swap_duplicate(entry, SWAP_HAS_CACHE);
+	return __swap_duplicate(&entry, entry_size, SWAP_HAS_CACHE);
 }
 
 struct swap_info_struct *swp_swap_info(swp_entry_t entry)
From patchwork Fri Dec 7 05:41:04 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717447

From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 04/21] swap: Support PMD swap mapping in put_swap_page()
Date: Fri, 7 Dec 2018 13:41:04 +0800
Message-Id: <20181207054122.27822-5-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

Previously, during swapout, all PMD page mappings were split and
replaced with PTE swap mappings, and the huge swap cluster was split
when clearing the SWAP_HAS_CACHE flag for it in put_swap_page().  Now,
during swapout, the PMD page mappings to the THP are changed to PMD
swap mappings to the corresponding swap cluster.  So when clearing the
SWAP_HAS_CACHE flag, the huge swap cluster will only be split if its
PMD swap mapping count is 0; otherwise, we keep it as a huge swap
cluster, so that we can swap in a THP in one piece later.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/swapfile.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 37e20ce4983c..f30eed59c355 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1314,6 +1314,15 @@ void swap_free(swp_entry_t entry)
 
 /*
  * Called after dropping swapcache to decrease refcnt to swap entries.
+ *
+ * When a THP is added into swap cache, the SWAP_HAS_CACHE flag will
+ * be set in the swap_map[] of all swap entries in the huge swap
+ * cluster backing the THP.  This huge swap cluster will not be split
+ * unless the THP is split even if its PMD swap mapping count dropped
+ * to 0.  Later, when the THP is removed from swap cache, the
+ * SWAP_HAS_CACHE flag will be cleared in the swap_map[] of all swap
+ * entries in the huge swap cluster.  And this huge swap cluster will
+ * be split if its PMD swap mapping count is 0.
  */
 void put_swap_page(struct page *page, swp_entry_t entry)
 {
@@ -1332,15 +1341,23 @@ void put_swap_page(struct page *page, swp_entry_t entry)
 
 	ci = lock_cluster_or_swap_info(si, offset);
 	if (size == SWAPFILE_CLUSTER) {
-		VM_BUG_ON(!cluster_is_huge(ci));
+		VM_BUG_ON(!IS_ALIGNED(offset, size));
 		map = si->swap_map + offset;
-		for (i = 0; i < SWAPFILE_CLUSTER; i++) {
-			val = map[i];
-			VM_BUG_ON(!(val & SWAP_HAS_CACHE));
-			if (val == SWAP_HAS_CACHE)
-				free_entries++;
+		/*
+		 * No PMD swap mapping, the swap cluster will be freed
+		 * if all swap entries becoming free, otherwise the
+		 * huge swap cluster will be split.
+		 */
+		if (!cluster_swapcount(ci)) {
+			for (i = 0; i < SWAPFILE_CLUSTER; i++) {
+				val = map[i];
+				VM_BUG_ON(!(val & SWAP_HAS_CACHE));
+				if (val == SWAP_HAS_CACHE)
+					free_entries++;
+			}
+			if (free_entries != SWAPFILE_CLUSTER)
+				cluster_clear_huge(ci);
 		}
-		cluster_clear_huge(ci);
 		if (free_entries == SWAPFILE_CLUSTER) {
 			unlock_cluster_or_swap_info(si, ci);
 			spin_lock(&si->lock);
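The three-way decision put_swap_page() now makes when the swap cache
reference is dropped can be summarized in a standalone C sketch
(simplified names and state, not kernel code):

/*
 * Sketch: free the cluster only if every slot was held by the swap
 * cache alone, split it only if no PMD swap mappings remain, and
 * otherwise keep it huge so the THP can be swapped in again in one
 * piece.
 */
#include <stdio.h>

enum action { FREE_CLUSTER, SPLIT_CLUSTER, KEEP_HUGE };

static enum action drop_swapcache(int pmd_mappings, int cache_only_slots,
                                  int nr_slots)
{
        if (pmd_mappings)
                return KEEP_HUGE;       /* still mapped huge somewhere */
        if (cache_only_slots == nr_slots)
                return FREE_CLUSTER;    /* nothing else references it */
        return SPLIT_CLUSTER;           /* only PTE mappings remain */
}

int main(void)
{
        printf("%d %d %d\n",
               drop_swapcache(1, 4, 4),         /* KEEP_HUGE */
               drop_swapcache(0, 4, 4),         /* FREE_CLUSTER */
               drop_swapcache(0, 2, 4));        /* SPLIT_CLUSTER */
        return 0;
}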
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V8 05/21] swap: Support PMD swap mapping in free_swap_and_cache()/swap_free() Date: Fri, 7 Dec 2018 13:41:05 +0800 Message-Id: <20181207054122.27822-6-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com> References: <20181207054122.27822-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP When a PMD swap mapping is removed from a huge swap cluster, for example, unmap a memory range mapped with PMD swap mapping, etc, free_swap_and_cache() will be called to decrease the reference count to the huge swap cluster. free_swap_and_cache() may also free or split the huge swap cluster, and free the corresponding THP in swap cache if necessary. swap_free() is similar, and shares most implementation with free_swap_and_cache(). This patch revises free_swap_and_cache() and swap_free() to implement this. If the swap cluster has been split already, for example, because of failing to allocate a THP during swapin, we just decrease one from the reference count of all swap slots. Otherwise, we will decrease one from the reference count of all swap slots and the PMD swap mapping count in cluster_count(). When the corresponding THP isn't in swap cache, if PMD swap mapping count becomes 0, the huge swap cluster will be split, and if all swap count becomes 0, the huge swap cluster will be freed. When the corresponding THP is in swap cache, if every swap_map[offset] == SWAP_HAS_CACHE, we will try to delete the THP from swap cache. Which will cause the THP and the huge swap cluster be freed. Signed-off-by: "Huang, Ying" Cc: "Kirill A. 
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- arch/s390/mm/pgtable.c | 2 +- include/linux/swap.h | 9 ++- kernel/power/swap.c | 4 +- mm/madvise.c | 2 +- mm/memory.c | 4 +- mm/shmem.c | 6 +- mm/swapfile.c | 171 ++++++++++++++++++++++++++++++++--------- 7 files changed, 149 insertions(+), 49 deletions(-) diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index f2cc7da473e4..ffd4b68adbb3 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -675,7 +675,7 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry) dec_mm_counter(mm, mm_counter(page)); } - free_swap_and_cache(entry); + free_swap_and_cache(entry, 1); } void ptep_zap_unused(struct mm_struct *mm, unsigned long addr, diff --git a/include/linux/swap.h b/include/linux/swap.h index 70a6ede1e7e0..24c3014894dd 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -453,9 +453,9 @@ extern int add_swap_count_continuation(swp_entry_t, gfp_t); extern void swap_shmem_alloc(swp_entry_t); extern int swap_duplicate(swp_entry_t *entry, int entry_size); extern int swapcache_prepare(swp_entry_t entry, int entry_size); -extern void swap_free(swp_entry_t); +extern void swap_free(swp_entry_t entry, int entry_size); extern void swapcache_free_entries(swp_entry_t *entries, int n); -extern int free_swap_and_cache(swp_entry_t); +extern int free_swap_and_cache(swp_entry_t entry, int entry_size); extern int swap_type_of(dev_t, sector_t, struct block_device **); extern unsigned int count_swap_pages(int, int); extern sector_t map_swap_page(struct page *, struct block_device **); @@ -509,7 +509,8 @@ static inline void show_swap_cache_info(void) { } -#define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) +#define free_swap_and_cache(e, s) \ + ({(is_migration_entry(e) || is_device_private_entry(e)); }) #define swapcache_prepare(e, s) \ ({(is_migration_entry(e) || is_device_private_entry(e)); }) @@ -527,7 +528,7 @@ static inline int swap_duplicate(swp_entry_t *swp, int entry_size) return 0; } -static inline void swap_free(swp_entry_t swp) +static inline void swap_free(swp_entry_t swp, int entry_size) { } diff --git a/kernel/power/swap.c b/kernel/power/swap.c index d7f6c1a288d3..0275df84ed3d 100644 --- a/kernel/power/swap.c +++ b/kernel/power/swap.c @@ -182,7 +182,7 @@ sector_t alloc_swapdev_block(int swap) offset = swp_offset(get_swap_page_of_type(swap)); if (offset) { if (swsusp_extents_insert(offset)) - swap_free(swp_entry(swap, offset)); + swap_free(swp_entry(swap, offset), 1); else return swapdev_block(swap, offset); } @@ -206,7 +206,7 @@ void free_all_swap_pages(int swap) ext = rb_entry(node, struct swsusp_extent, node); rb_erase(node, &swsusp_extents); for (offset = ext->start; offset <= ext->end; offset++) - swap_free(swp_entry(swap, offset)); + swap_free(swp_entry(swap, offset), 1); kfree(ext); } diff --git a/mm/madvise.c b/mm/madvise.c index 6cb1ca93e290..cbb3d7e38e51 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -349,7 +349,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, if (non_swap_entry(entry)) continue; nr_swap--; - free_swap_and_cache(entry); + free_swap_and_cache(entry, 1); pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); continue; } diff --git a/mm/memory.c b/mm/memory.c index 6ec2d0070d4f..35973cc5425b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1135,7 +1135,7 @@ static 
 			page = migration_entry_to_page(entry);
 			rss[mm_counter(page)]--;
 		}
-		if (unlikely(!free_swap_and_cache(entry)))
+		if (unlikely(!free_swap_and_cache(entry, 1)))
 			print_bad_pte(vma, addr, ptent, NULL);
 		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
@@ -2824,7 +2824,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	}
 
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
-	swap_free(entry);
+	swap_free(entry, 1);
 	if (mem_cgroup_swap_full(page) ||
 	    (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
 		try_to_free_swap(page);
diff --git a/mm/shmem.c b/mm/shmem.c
index b74a3fca9f81..2862be798997 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -668,7 +668,7 @@ static int shmem_free_swap(struct address_space *mapping,
 	xa_unlock_irq(&mapping->i_pages);
 	if (old != radswap)
 		return -ENOENT;
-	free_swap_and_cache(radix_to_swp_entry(radswap));
+	free_swap_and_cache(radix_to_swp_entry(radswap), 1);
 	return 0;
 }
 
@@ -1186,7 +1186,7 @@ static int shmem_unuse_inode(struct shmem_inode_info *info,
 		spin_lock_irq(&info->lock);
 		info->swapped--;
 		spin_unlock_irq(&info->lock);
-		swap_free(swap);
+		swap_free(swap, 1);
 		}
 	}
 	return error;
@@ -1720,7 +1720,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 
 		delete_from_swap_cache(page);
 		set_page_dirty(page);
-		swap_free(swap);
+		swap_free(swap, 1);
 
 	} else {
 		if (vma && userfaultfd_missing(vma)) {
diff --git a/mm/swapfile.c b/mm/swapfile.c
index f30eed59c355..3eda4cbd279c 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -49,6 +49,9 @@ static bool swap_count_continued(struct swap_info_struct *, pgoff_t,
 					 unsigned char);
 static void free_swap_count_continuations(struct swap_info_struct *);
 static sector_t map_swap_entry(swp_entry_t, struct block_device**);
+static bool __swap_page_trans_huge_swapped(struct swap_info_struct *si,
+					   struct swap_cluster_info *ci,
+					   unsigned long offset);
 
 DEFINE_SPINLOCK(swap_lock);
 static unsigned int nr_swapfiles;
@@ -1267,19 +1270,106 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
 	return NULL;
 }
 
-static unsigned char __swap_entry_free(struct swap_info_struct *p,
-				       swp_entry_t entry, unsigned char usage)
+#define SF_FREE_CACHE	0x1
+
+static void __swap_free(struct swap_info_struct *p, swp_entry_t entry,
+			int entry_size, unsigned long flags)
 {
 	struct swap_cluster_info *ci;
 	unsigned long offset = swp_offset(entry);
+	int i, free_entries = 0, cache_only = 0;
+	int size = swap_entry_size(entry_size);
+	unsigned char *map, count;
 
 	ci = lock_cluster_or_swap_info(p, offset);
-	usage = __swap_entry_free_locked(p, offset, usage);
+	VM_BUG_ON(!IS_ALIGNED(offset, size));
+	/*
+	 * Normal swap entry or huge swap cluster has been split, free
+	 * each swap entry
+	 */
+	if (size == 1 || !cluster_is_huge(ci)) {
+		for (i = 0; i < size; i++, entry.val++) {
+			count = __swap_entry_free_locked(p, offset + i, 1);
+			if (!count ||
+			    (flags & SF_FREE_CACHE &&
+			     count == SWAP_HAS_CACHE &&
+			     !__swap_page_trans_huge_swapped(p, ci,
+							     offset + i))) {
+				unlock_cluster_or_swap_info(p, ci);
+				if (!count)
+					free_swap_slot(entry);
+				else
+					__try_to_reclaim_swap(p, offset + i,
+							      TTRS_UNMAPPED | TTRS_FULL);
+				if (i == size - 1)
+					return;
+				lock_cluster_or_swap_info(p, offset);
+			}
+		}
+		unlock_cluster_or_swap_info(p, ci);
+		return;
+	}
+	/*
+	 * Return for normal swap entry above, the following code is
+	 * for huge swap cluster only.
+	 */
+	cluster_add_swapcount(ci, -1);
+	/*
+	 * Decrease mapping count for each swap entry in cluster.
+	 * Because PMD swap mapping is counted in p->swap_map[] too.
+	 */
+	map = p->swap_map + offset;
+	for (i = 0; i < size; i++) {
+		/*
+		 * Mark swap entries to become free as SWAP_MAP_BAD
+		 * temporarily.
+		 */
+		if (map[i] == 1) {
+			map[i] = SWAP_MAP_BAD;
+			free_entries++;
+		} else if (__swap_entry_free_locked(p, offset + i, 1) ==
+			   SWAP_HAS_CACHE)
+			cache_only++;
+	}
+	/*
+	 * If there are PMD swap mapping or the THP is in swap cache,
+	 * it's impossible for some swap entries to become free.
+	 */
+	VM_BUG_ON(free_entries &&
+		  (cluster_swapcount(ci) || (map[0] & SWAP_HAS_CACHE)));
+	if (free_entries == SWAPFILE_CLUSTER)
+		memset(map, SWAP_HAS_CACHE, SWAPFILE_CLUSTER);
+	/*
+	 * If there are no PMD swap mappings remain and the THP isn't
+	 * in swap cache, split the huge swap cluster.
+	 */
+	else if (!cluster_swapcount(ci) && !(map[0] & SWAP_HAS_CACHE))
+		cluster_clear_huge(ci);
 	unlock_cluster_or_swap_info(p, ci);
-	if (!usage)
-		free_swap_slot(entry);
-
-	return usage;
+	if (free_entries == SWAPFILE_CLUSTER) {
+		spin_lock(&p->lock);
+		mem_cgroup_uncharge_swap(entry, SWAPFILE_CLUSTER);
+		swap_free_cluster(p, offset / SWAPFILE_CLUSTER);
+		spin_unlock(&p->lock);
+	} else if (free_entries) {
+		ci = lock_cluster(p, offset);
+		for (i = 0; i < size; i++, entry.val++) {
+			/*
+			 * To be freed swap entries are marked as SWAP_MAP_BAD
+			 * temporarily as above
+			 */
+			if (map[i] == SWAP_MAP_BAD) {
+				map[i] = SWAP_HAS_CACHE;
+				unlock_cluster(ci);
+				free_swap_slot(entry);
+				if (i == size - 1)
+					return;
+				ci = lock_cluster(p, offset);
+			}
+		}
+		unlock_cluster(ci);
+	} else if (cache_only == SWAPFILE_CLUSTER && flags & SF_FREE_CACHE)
+		__try_to_reclaim_swap(p, offset, TTRS_UNMAPPED | TTRS_FULL);
 }
 
 static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry)
@@ -1303,13 +1393,13 @@ static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry)
  * Caller has made sure that the swap device corresponding to entry
  * is still around or has not been recycled.
 */
-void swap_free(swp_entry_t entry)
+void swap_free(swp_entry_t entry, int entry_size)
 {
 	struct swap_info_struct *p;
 
 	p = _swap_info_get(entry);
 	if (p)
-		__swap_entry_free(p, entry, 1);
+		__swap_free(p, entry, entry_size, 0);
 }
 
 /*
@@ -1545,29 +1635,33 @@ int swp_swapcount(swp_entry_t entry)
 	return count;
 }
 
-static bool swap_page_trans_huge_swapped(struct swap_info_struct *si,
-					 swp_entry_t entry)
+/* si->lock or ci->lock must be held before calling this function */
+static bool __swap_page_trans_huge_swapped(struct swap_info_struct *si,
+					   struct swap_cluster_info *ci,
+					   unsigned long offset)
 {
-	struct swap_cluster_info *ci;
 	unsigned char *map = si->swap_map;
-	unsigned long roffset = swp_offset(entry);
-	unsigned long offset = round_down(roffset, SWAPFILE_CLUSTER);
+	unsigned long hoffset = round_down(offset, SWAPFILE_CLUSTER);
 	int i;
-	bool ret = false;
 
-	ci = lock_cluster_or_swap_info(si, offset);
-	if (!ci || !cluster_is_huge(ci)) {
-		if (swap_count(map[roffset]))
-			ret = true;
-		goto unlock_out;
-	}
+	if (!ci || !cluster_is_huge(ci))
+		return !!swap_count(map[offset]);
 	for (i = 0; i < SWAPFILE_CLUSTER; i++) {
-		if (swap_count(map[offset + i])) {
-			ret = true;
-			break;
-		}
+		if (swap_count(map[hoffset + i]))
+			return true;
 	}
-unlock_out:
+	return false;
+}
+
+static bool swap_page_trans_huge_swapped(struct swap_info_struct *si,
+					 swp_entry_t entry)
+{
+	struct swap_cluster_info *ci;
+	unsigned long offset = swp_offset(entry);
+	bool ret;
+
+	ci = lock_cluster_or_swap_info(si, offset);
+	ret = __swap_page_trans_huge_swapped(si, ci, offset);
 	unlock_cluster_or_swap_info(si, ci);
 	return ret;
 }
@@ -1739,22 +1833,17 @@ int try_to_free_swap(struct page *page)
  * Free the swap entry like above, but also try to
  * free the page cache entry if it is the last user.
 */
-int free_swap_and_cache(swp_entry_t entry)
+int free_swap_and_cache(swp_entry_t entry, int entry_size)
 {
 	struct swap_info_struct *p;
-	unsigned char count;
 
 	if (non_swap_entry(entry))
 		return 1;
 
 	p = _swap_info_get(entry);
-	if (p) {
-		count = __swap_entry_free(p, entry, 1);
-		if (count == SWAP_HAS_CACHE &&
-		    !swap_page_trans_huge_swapped(p, entry))
-			__try_to_reclaim_swap(p, swp_offset(entry),
-					      TTRS_UNMAPPED | TTRS_FULL);
-	}
+	if (p)
+		__swap_free(p, entry, entry_size, SF_FREE_CACHE);
+
 	return p != NULL;
 }
 
@@ -1901,7 +1990,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 	}
 	set_pte_at(vma->vm_mm, addr, pte,
 		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
-	swap_free(entry);
+	swap_free(entry, 1);
 	/*
 	 * Move the page to the active list so it is not
 	 * immediately swapped out again after swapon.
@@ -2340,6 +2429,16 @@ int try_to_unuse(unsigned int type, bool frontswap,
 	}
 
 	mmput(start_mm);
+
+	/*
+	 * Swap entries may be marked as SWAP_MAP_BAD temporarily in
+	 * __swap_free() before being freed really.
+	 * find_next_to_unuse() will skip these swap entries, that is
+	 * OK.  But we need to wait until they are freed really.
+	 */
+	while (!retval && READ_ONCE(si->inuse_pages))
+		schedule_timeout_uninterruptible(1);
+
 	return retval;
 }

From patchwork Fri Dec 7 05:41:06 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717451
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 06/21] swap: Support PMD swap mapping when splitting huge PMD
Date: Fri, 7 Dec 2018 13:41:06 +0800
Message-Id: <20181207054122.27822-7-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

A huge PMD needs to be split when zapping part of the PMD mapping,
etc.  If the PMD mapping is a swap mapping, it needs to be split too.
This patch implements support for that.  It is similar to splitting a
PMD page mapping, except that we also need to decrease the PMD swap
mapping count of the huge swap cluster.  If the PMD swap mapping count
becomes 0, the huge swap cluster will be split.

Note: is_huge_zero_pmd() and pmd_page() don't work well with a swap
PMD, so the pmd_present() check is done before calling them.

Thanks to Daniel Jordan for testing and reporting a data corruption
bug caused by a misaligned address processing issue in
__split_huge_swap_pmd().

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
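As a rough illustration of what splitting a PMD swap mapping means
(again a minimal user-space C model, not the patch's code; the
toy_swp_entry structure, toy_split_pmd_swap(), and the 4-entry size
are invented for the sketch): one PMD-level swap entry becomes
HPAGE_PMD_NR PTE-level swap entries with consecutive offsets, with the
soft-dirty bit carried from the PMD to each PTE.

  #include <stdbool.h>
  #include <stdio.h>

  #define TOY_HPAGE_NR 4	/* stands in for HPAGE_PMD_NR (512) */

  struct toy_swp_entry {
  	unsigned long offset;	/* offset into the swap device */
  	bool soft_dirty;	/* bit carried from the PMD to each PTE */
  };

  /*
   * Model of __split_huge_swap_pmd(): one PMD-level swap entry becomes
   * TOY_HPAGE_NR PTE-level swap entries with consecutive offsets.
   * (The real code first calls split_swap_cluster_map(entry).)
   */
  static void toy_split_pmd_swap(struct toy_swp_entry pmd_entry,
  			       struct toy_swp_entry ptes[TOY_HPAGE_NR])
  {
  	int i;

  	for (i = 0; i < TOY_HPAGE_NR; i++) {
  		ptes[i].offset = pmd_entry.offset + i;	/* entry.val++ */
  		ptes[i].soft_dirty = pmd_entry.soft_dirty;
  	}
  }

  int main(void)
  {
  	struct toy_swp_entry pmd = { .offset = 1024, .soft_dirty = true };
  	struct toy_swp_entry ptes[TOY_HPAGE_NR];
  	int i;

  	toy_split_pmd_swap(pmd, ptes);
  	for (i = 0; i < TOY_HPAGE_NR; i++)
  		printf("pte %d -> swap offset %lu\n", i, ptes[i].offset);
  	return 0;
  }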
---
 include/linux/huge_mm.h |  4 ++++
 include/linux/swap.h    |  6 +++++
 mm/huge_memory.c        | 49 ++++++++++++++++++++++++++++++++++++-----
 mm/swapfile.c           | 32 +++++++++++++++++++++++++++
 4 files changed, 86 insertions(+), 5 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4663ee96cf59..1c0fda003d6a 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -226,6 +226,10 @@ static inline bool is_huge_zero_page(struct page *page)
 	return READ_ONCE(huge_zero_page) == page;
 }
 
+/*
+ * is_huge_zero_pmd() must be called after checking pmd_present(),
+ * otherwise, it may report false positive for PMD swap entry.
+ */
 static inline bool is_huge_zero_pmd(pmd_t pmd)
 {
 	return is_huge_zero_page(pmd_page(pmd));
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 24c3014894dd..a24d101b131d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -619,11 +619,17 @@ static inline swp_entry_t get_swap_page(struct page *page)
 
 #ifdef CONFIG_THP_SWAP
 extern int split_swap_cluster(swp_entry_t entry);
+extern int split_swap_cluster_map(swp_entry_t entry);
 #else
 static inline int split_swap_cluster(swp_entry_t entry)
 {
 	return 0;
 }
+
+static inline int split_swap_cluster_map(swp_entry_t entry)
+{
+	return 0;
+}
 #endif
 
 #ifdef CONFIG_MEMCG
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2dba2c1c299a..9ec87c2ed1e8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1631,6 +1631,41 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	return 0;
 }
 
+/* Convert a PMD swap mapping to a set of PTE swap mappings */
+static void __split_huge_swap_pmd(struct vm_area_struct *vma,
+				  unsigned long addr,
+				  pmd_t *pmd)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pgtable_t pgtable;
+	pmd_t _pmd;
+	swp_entry_t entry;
+	int i, soft_dirty;
+
+	addr &= HPAGE_PMD_MASK;
+	entry = pmd_to_swp_entry(*pmd);
+	soft_dirty = pmd_soft_dirty(*pmd);
+
+	split_swap_cluster_map(entry);
+
+	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+	pmd_populate(mm, &_pmd, pgtable);
+
+	for (i = 0; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, entry.val++) {
+		pte_t *pte, ptent;
+
+		pte = pte_offset_map(&_pmd, addr);
+		VM_BUG_ON(!pte_none(*pte));
+		ptent = swp_entry_to_pte(entry);
+		if (soft_dirty)
+			ptent = pte_swp_mksoft_dirty(ptent);
+		set_pte_at(mm, addr, pte, ptent);
+		pte_unmap(pte);
+	}
+	smp_wmb(); /* make pte visible before pmd */
+	pmd_populate(mm, pmd, pgtable);
+}
+
 /*
  * Return true if we do MADV_FREE successfully on entire pmd page.
  * Otherwise, return false.
@@ -2095,7 +2130,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
 	VM_BUG_ON_VMA(vma->vm_start > haddr, vma);
 	VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma);
-	VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd)
+	VM_BUG_ON(!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd)
 				&& !pmd_devmap(*pmd));
 
 	count_vm_event(THP_SPLIT_PMD);
@@ -2119,7 +2154,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		put_page(page);
 		add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
 		return;
-	} else if (is_huge_zero_pmd(*pmd)) {
+	} else if (pmd_present(*pmd) && is_huge_zero_pmd(*pmd)) {
 		/*
 		 * FIXME: Do we want to invalidate secondary mmu by calling
 		 * mmu_notifier_invalidate_range() see comments below inside
@@ -2163,6 +2198,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		page = pfn_to_page(swp_offset(entry));
 	} else
 #endif
+	if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(old_pmd))
+		return __split_huge_swap_pmd(vma, haddr, pmd);
+	else
 		page = pmd_page(old_pmd);
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	page_ref_add(page, HPAGE_PMD_NR - 1);
@@ -2254,14 +2292,15 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	 * pmd against. Otherwise we can end up replacing wrong page.
 	 */
 	VM_BUG_ON(freeze && !page);
-	if (page && page != pmd_page(*pmd))
-		goto out;
+	/* pmd_page() should be called only if pmd_present() */
+	if (page && (!pmd_present(*pmd) || page != pmd_page(*pmd)))
+		goto out;
 
 	if (pmd_trans_huge(*pmd)) {
 		page = pmd_page(*pmd);
 		if (PageMlocked(page))
 			clear_page_mlock(page);
-	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
+	} else if (!(pmd_devmap(*pmd) || is_swap_pmd(*pmd)))
 		goto out;
 	__split_huge_pmd_locked(vma, pmd, haddr, freeze);
 out:
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 3eda4cbd279c..e83e3c93f3b3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -4041,6 +4041,38 @@ void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node,
 }
 #endif
 
+#ifdef CONFIG_THP_SWAP
+/*
+ * The corresponding page table shouldn't be changed under us, that
+ * is, the page table lock should be held.
+ */
+int split_swap_cluster_map(swp_entry_t entry)
+{
+	struct swap_info_struct *si;
+	struct swap_cluster_info *ci;
+	unsigned long offset = swp_offset(entry);
+
+	VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER));
+	si = _swap_info_get(entry);
+	if (!si)
+		return -EBUSY;
+	ci = lock_cluster(si, offset);
+	/* The swap cluster has been split by someone else, we are done */
+	if (!cluster_is_huge(ci))
+		goto out;
+	cluster_add_swapcount(ci, -1);
+	/*
+	 * If the last PMD swap mapping has gone and the THP isn't in
+	 * swap cache, the huge swap cluster will be split.
+	 */
+	if (!cluster_swapcount(ci) && !(si->swap_map[offset] & SWAP_HAS_CACHE))
+		cluster_clear_huge(ci);
+out:
+	unlock_cluster(ci);
+	return 0;
+}
+#endif
+
 static int __init swapfile_init(void)
 {
 	int nid;

From patchwork Fri Dec 7 05:41:07 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717453
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 07/21] swap: Support PMD swap mapping in split_swap_cluster()
Date: Fri, 7 Dec 2018 13:41:07 +0800
Message-Id: <20181207054122.27822-8-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

When splitting a THP in the swap cache, or when failing to allocate a
THP during swapin of a huge swap cluster, the huge swap cluster is
split.  In addition to clearing the huge flag of the swap cluster, the
PMD swap mapping count recorded in cluster_count() is set to 0.  But
the PMD swap mappings themselves are not touched, because it can be
hard to find them all.
When the PMD swap mappings are operated on later, it will be found
that the huge swap cluster has been split, and the PMD swap mappings
will be split at that time.

Unless it is splitting a THP in the swap cache (specified via the
SSC_SPLIT_CACHED flag), split_swap_cluster() returns -EEXIST if the
SWAP_HAS_CACHE flag is set in swap_map[offset], because this indicates
that a THP corresponds to this huge swap cluster and it isn't
desirable to split that THP.

When splitting a THP in the swap cache, the call to
split_swap_cluster() is moved to before unlocking the sub-pages, so
that all sub-pages are kept locked from the time the THP is split to
the time the huge swap cluster is split.  This makes the code much
easier to reason about.  A sketch of the resulting decision logic
follows after the trailers below.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
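As a rough illustration of the split_swap_cluster() decision described
above (a minimal user-space C model, not the patch's code; the
toy_cluster structure and toy_split_swap_cluster() are invented for
the sketch):

  #include <errno.h>
  #include <stdbool.h>
  #include <stdio.h>

  #define TOY_SSC_SPLIT_CACHED 0x1	/* mirrors SSC_SPLIT_CACHED */

  struct toy_cluster {
  	bool huge;		/* models cluster_is_huge() */
  	bool has_cache;		/* models SWAP_HAS_CACHE in swap_map[] */
  	int pmd_map_count;	/* models cluster_swapcount() */
  };

  /* Model of the split decision in this patch's split_swap_cluster(). */
  static int toy_split_swap_cluster(struct toy_cluster *c, unsigned long flags)
  {
  	if (!c->huge)
  		return 0;	/* already split by someone else: done */
  	/*
  	 * A THP still sits in the swap cache for this cluster; refuse
  	 * to split unless the caller is splitting that THP itself.
  	 */
  	if (!(flags & TOY_SSC_SPLIT_CACHED) && c->has_cache)
  		return -EEXIST;
  	c->pmd_map_count = 0;	/* drop the PMD swap mapping count ... */
  	c->huge = false;	/* ... and clear the huge flag */
  	return 0;
  }

  int main(void)
  {
  	struct toy_cluster c = { .huge = true, .has_cache = true };

  	printf("without flag: %d\n", toy_split_swap_cluster(&c, 0));
  	printf("with flag:    %d\n",
  	       toy_split_swap_cluster(&c, TOY_SSC_SPLIT_CACHED));
  	return 0;
  }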
---
 include/linux/swap.h |  6 +++--
 mm/huge_memory.c     | 18 +++++++++-----
 mm/swapfile.c        | 58 +++++++++++++++++++++++++++++++-------------
 3 files changed, 57 insertions(+), 25 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index a24d101b131d..441da4a832a6 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -617,11 +617,13 @@ static inline swp_entry_t get_swap_page(struct page *page)
 
 #endif /* CONFIG_SWAP */
 
+#define SSC_SPLIT_CACHED	0x1
+
 #ifdef CONFIG_THP_SWAP
-extern int split_swap_cluster(swp_entry_t entry);
+extern int split_swap_cluster(swp_entry_t entry, unsigned long flags);
 extern int split_swap_cluster_map(swp_entry_t entry);
 #else
-static inline int split_swap_cluster(swp_entry_t entry)
+static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags)
 {
 	return 0;
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9ec87c2ed1e8..d23e18c0c07e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2519,6 +2519,17 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 	remap_page(head);
 
+	/*
+	 * Split swap cluster before unlocking sub-pages.  So all
+	 * sub-pages will be kept locked from THP has been split to
+	 * swap cluster is split.
+	 */
+	if (PageSwapCache(head)) {
+		swp_entry_t entry = { .val = page_private(head) };
+
+		split_swap_cluster(entry, SSC_SPLIT_CACHED);
+	}
+
 	for (i = 0; i < HPAGE_PMD_NR; i++) {
 		struct page *subpage = head + i;
 		if (subpage == page)
@@ -2753,12 +2764,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 			__dec_node_page_state(page, NR_SHMEM_THPS);
 		spin_unlock(&pgdata->split_queue_lock);
 		__split_huge_page(page, list, end, flags);
-		if (PageSwapCache(head)) {
-			swp_entry_t entry = { .val = page_private(head) };
-
-			ret = split_swap_cluster(entry);
-		} else
-			ret = 0;
+		ret = 0;
 	} else {
 		if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
 			pr_alert("total_mapcount: %u, page_count(): %u\n",
diff --git a/mm/swapfile.c b/mm/swapfile.c
index e83e3c93f3b3..a57967292a8d 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1469,23 +1469,6 @@ void put_swap_page(struct page *page, swp_entry_t entry)
 	unlock_cluster_or_swap_info(si, ci);
 }
 
-#ifdef CONFIG_THP_SWAP
-int split_swap_cluster(swp_entry_t entry)
-{
-	struct swap_info_struct *si;
-	struct swap_cluster_info *ci;
-	unsigned long offset = swp_offset(entry);
-
-	si = _swap_info_get(entry);
-	if (!si)
-		return -EBUSY;
-	ci = lock_cluster(si, offset);
-	cluster_clear_huge(ci);
-	unlock_cluster(ci);
-	return 0;
-}
-#endif
-
 static int swp_entry_cmp(const void *ent1, const void *ent2)
 {
 	const swp_entry_t *e1 = ent1, *e2 = ent2;
@@ -4071,6 +4054,47 @@ int split_swap_cluster_map(swp_entry_t entry)
 	unlock_cluster(ci);
 	return 0;
 }
+
+/*
+ * We will not try to split all PMD swap mappings to the swap cluster,
+ * because we haven't enough information available for that.  Later,
+ * when the PMD swap mapping is duplicated or swapin, etc, the PMD
+ * swap mapping will be split and fallback to the PTE operations.
+ */
+int split_swap_cluster(swp_entry_t entry, unsigned long flags)
+{
+	struct swap_info_struct *si;
+	struct swap_cluster_info *ci;
+	unsigned long offset = swp_offset(entry);
+	int ret = 0;
+
+	si = get_swap_device(entry);
+	if (!si)
+		return -EINVAL;
+	ci = lock_cluster(si, offset);
+	/* The swap cluster has been split by someone else, we are done */
+	if (!cluster_is_huge(ci))
+		goto out;
+	VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER));
+	VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER);
+	/*
+	 * If not requested, don't split swap cluster that has SWAP_HAS_CACHE
+	 * flag.  When the flag is cleared later, the huge swap cluster will
+	 * be split if there is no PMD swap mapping.
+	 */
+	if (!(flags & SSC_SPLIT_CACHED) &&
+	    si->swap_map[offset] & SWAP_HAS_CACHE) {
+		ret = -EEXIST;
+		goto out;
+	}
+	cluster_set_swapcount(ci, 0);
+	cluster_clear_huge(ci);
+
+out:
+	unlock_cluster(ci);
+	put_swap_device(si);
+	return ret;
+}
 #endif
 
 static int __init swapfile_init(void)

From patchwork Fri Dec 7 05:41:08 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717455
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 08/21] swap: Support to read a huge swap cluster for swapin a THP
Date: Fri, 7 Dec 2018 13:41:08 +0800
Message-Id: <20181207054122.27822-9-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

To swap in a THP in one piece, we need to read a huge swap cluster
from the swap device.  This patch revises __read_swap_cache_async()
and its callers and callees to support that.  If
__read_swap_cache_async() finds that the swap cluster of the specified
swap entry is huge, it will try to allocate a THP and add it into the
swap cache, so that the contents of the huge swap cluster can later be
read into the THP.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
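As a rough illustration of the allocation-size decision described
above (a minimal user-space C model, not the patch's code;
toy_swapin_target() is invented for the sketch): when the entry
belongs to a huge cluster, the whole cluster is read at the offset
rounded down to the cluster boundary; otherwise a single page is read.

  #include <stdio.h>

  #define TOY_HPAGE_NR 512	/* stands in for HPAGE_PMD_NR */

  /*
   * Model of the size decision in __read_swap_cache_async(): for an
   * entry in a huge cluster, target a THP-sized read starting at the
   * cluster-aligned offset; otherwise a single page at the entry.
   */
  static unsigned long toy_swapin_target(unsigned long offset,
  				       int entry_size, int *nr_pages)
  {
  	if (entry_size == TOY_HPAGE_NR) {
  		*nr_pages = TOY_HPAGE_NR;
  		return offset & ~(unsigned long)(TOY_HPAGE_NR - 1);
  	}
  	*nr_pages = 1;
  	return offset;
  }

  int main(void)
  {
  	int nr;
  	unsigned long base = toy_swapin_target(1234, TOY_HPAGE_NR, &nr);

  	/* faulting offset 1234 reads offsets 1024..1535 in one piece */
  	printf("read %d pages starting at swap offset %lu\n", nr, base);
  	return 0;
  }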
---
 include/linux/huge_mm.h |  8 ++++++
 include/linux/swap.h    |  4 +--
 mm/huge_memory.c        |  5 ++--
 mm/swap_state.c         | 61 +++++++++++++++++++++++++++++++++--------
 mm/swapfile.c           |  9 ++++--
 5 files changed, 68 insertions(+), 19 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 1c0fda003d6a..f4dbd0662438 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -250,6 +250,8 @@ static inline bool thp_migration_supported(void)
 	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
 }
 
+gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma,
+				    unsigned long addr);
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
 #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
@@ -363,6 +365,12 @@ static inline bool thp_migration_supported(void)
 {
 	return false;
 }
+
+static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma,
+						  unsigned long addr)
+{
+	return 0;
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 441da4a832a6..4bd532c9315e 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -462,7 +462,7 @@ extern sector_t map_swap_page(struct page *, struct block_device **);
 extern sector_t swapdev_block(int, pgoff_t);
 extern int page_swapcount(struct page *);
 extern int __swap_count(swp_entry_t entry);
-extern int __swp_swapcount(swp_entry_t entry);
+extern int __swp_swapcount(swp_entry_t entry, int *entry_size);
 extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
 extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
@@ -590,7 +590,7 @@ static inline int __swap_count(swp_entry_t entry)
 	return 0;
 }
 
-static inline int __swp_swapcount(swp_entry_t entry)
+static inline int __swp_swapcount(swp_entry_t entry, int *entry_size)
 {
 	return 0;
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d23e18c0c07e..2004d8ae4390 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -630,9 +630,10 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
  *	    available
  * never: never stall for any thp allocation
  */
-static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, unsigned long addr)
+gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma,
+				    unsigned long addr)
 {
-	const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
+	const bool vma_madvised = vma ? !!(vma->vm_flags & VM_HUGEPAGE) : false;
 	gfp_t this_node = 0;
 
 #ifdef CONFIG_NUMA
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 97831166994a..1eedbc0aede2 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -361,7 +361,9 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 {
 	struct page *found_page = NULL, *new_page = NULL;
 	struct swap_info_struct *si;
-	int err;
+	int err, entry_size = 1;
+	swp_entry_t hentry;
+
 	*new_page_allocated = false;
 
 	do {
@@ -387,14 +389,42 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		 * as SWAP_HAS_CACHE.  That's done in later part of code or
 		 * else swap_off will be aborted if we return NULL.
 		 */
-		if (!__swp_swapcount(entry) && swap_slot_cache_enabled)
+		if (!__swp_swapcount(entry, &entry_size) &&
+		    swap_slot_cache_enabled)
 			break;
 
 		/*
 		 * Get a new page to read into from swap.
 		 */
-		if (!new_page) {
-			new_page = alloc_page_vma(gfp_mask, vma, addr);
+		if (!new_page ||
+		    (IS_ENABLED(CONFIG_THP_SWAP) &&
+		     hpage_nr_pages(new_page) != entry_size)) {
+			if (new_page)
+				put_page(new_page);
+			if (IS_ENABLED(CONFIG_THP_SWAP) &&
+			    entry_size == HPAGE_PMD_NR) {
+				gfp_t gfp;
+
+				gfp = alloc_hugepage_direct_gfpmask(vma, addr);
+				/*
+				 * Make sure huge page allocation flags are
+				 * compatible with that of normal page
+				 */
+				VM_WARN_ONCE(gfp_mask & ~(gfp | __GFP_RECLAIM),
+					     "ignoring gfp_mask bits: %x",
+					     gfp_mask & ~(gfp | __GFP_RECLAIM));
+				new_page = alloc_pages_vma(gfp, HPAGE_PMD_ORDER,
+							   vma, addr,
+							   numa_node_id());
+				if (new_page)
+					prep_transhuge_page(new_page);
+				hentry = swp_entry(swp_type(entry),
+						   round_down(swp_offset(entry),
+							      HPAGE_PMD_NR));
+			} else {
+				new_page = alloc_page_vma(gfp_mask, vma, addr);
+				hentry = entry;
+			}
 			if (!new_page)
 				break;		/* Out of memory */
 		}
@@ -402,7 +432,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		/*
 		 * Swap entry may have been freed since our caller observed it.
 		 */
-		err = swapcache_prepare(entry, 1);
+		err = swapcache_prepare(hentry, entry_size);
 		if (err == -EEXIST) {
 			/*
 			 * We might race against get_swap_page() and stumble
@@ -411,18 +441,24 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 			 */
 			cond_resched();
 			continue;
+		} else if (err == -ENOTDIR) {
+			/* huge swap cluster has been split under us */
+			continue;
 		} else if (err)		/* swp entry is obsolete ? */
 			break;
 
 		/* May fail (-ENOMEM) if XArray node allocation failed. */
 		__SetPageLocked(new_page);
 		__SetPageSwapBacked(new_page);
-		err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
+		err = add_to_swap_cache(new_page, hentry, gfp_mask & GFP_KERNEL);
 		if (likely(!err)) {
 			/* Initiate read into locked page */
 			SetPageWorkingset(new_page);
 			lru_cache_add_anon(new_page);
 			*new_page_allocated = true;
+			if (IS_ENABLED(CONFIG_THP_SWAP))
+				new_page += swp_offset(entry) &
+					(entry_size - 1);
 			return new_page;
 		}
 		__ClearPageLocked(new_page);
 		/*
 		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
 		 * clear SWAP_HAS_CACHE flag.
 		 */
-		put_swap_page(new_page, entry);
+		put_swap_page(new_page, hentry);
 	} while (err != -ENOMEM);
 
 	if (new_page)
@@ -452,7 +488,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 			vma, addr, &page_was_allocated);
 
 	if (page_was_allocated)
-		swap_readpage(retpage, do_poll);
+		swap_readpage(compound_head(retpage), do_poll);
 
 	return retpage;
 }
@@ -571,8 +607,9 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 		if (!page)
 			continue;
 		if (page_allocated) {
-			swap_readpage(page, false);
-			if (offset != entry_offset) {
+			swap_readpage(compound_head(page), false);
+			if (offset != entry_offset &&
+			    !PageTransCompound(page)) {
 				SetPageReadahead(page);
 				count_vm_event(SWAP_RA);
 			}
@@ -733,8 +770,8 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 		if (!page)
 			continue;
 		if (page_allocated) {
-			swap_readpage(page, false);
-			if (i != ra_info.offset) {
+			swap_readpage(compound_head(page), false);
+			if (i != ra_info.offset && !PageTransCompound(page)) {
 				SetPageReadahead(page);
 				count_vm_event(SWAP_RA);
 			}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index a57967292a8d..c22c11b4a879 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1542,7 +1542,8 @@ int __swap_count(swp_entry_t entry)
 	return count;
 }
 
-static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
+static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry,
+			  int *entry_size)
 {
 	int count = 0;
 	pgoff_t offset = swp_offset(entry);
@@ -1550,6 +1551,8 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
 
 	ci = lock_cluster_or_swap_info(si, offset);
 	count = swap_count(si->swap_map[offset]);
+	if (entry_size)
+		*entry_size = ci && cluster_is_huge(ci) ? SWAPFILE_CLUSTER : 1;
 	unlock_cluster_or_swap_info(si, ci);
 	return count;
 }
@@ -1559,14 +1562,14 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
  * This does not give an exact answer when swap count is continued,
  * but does include the high COUNT_CONTINUED flag to allow for that.
 */
-int __swp_swapcount(swp_entry_t entry)
+int __swp_swapcount(swp_entry_t entry, int *entry_size)
 {
 	int count = 0;
 	struct swap_info_struct *si;
 
 	si = get_swap_device(entry);
 	if (si) {
-		count = swap_swapcount(si, entry);
+		count = swap_swapcount(si, entry, entry_size);
 		put_swap_device(si);
 	}
 	return count;
 }

From patchwork Fri Dec 7 05:41:09 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717457
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 09/21] swap: Swapin a THP in one piece
Date: Fri, 7 Dec 2018 13:41:09 +0800
Message-Id: <20181207054122.27822-10-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

With this patch, when the page fault handler finds a PMD swap mapping,
it will swap in a THP in one piece.  This avoids the overhead of
splitting and collapsing before and after the THP swapping, and
greatly improves swap performance thanks to the reduced page fault
count, etc.  do_huge_pmd_swap_page() is added in this patch to
implement this.  It is similar to do_swap_page() for normal page
swapin.

If a THP cannot be allocated, the huge swap cluster and the PMD swap
mapping will be split, falling back to normal page swapin.  If the
huge swap cluster has been split already, the PMD swap mapping will be
split, falling back to normal page swapin.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
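As a rough illustration of the fallback policy described above (a
minimal user-space C model, not the patch's code; toy_huge_swapin()
and the enum are invented for the sketch):

  #include <stdio.h>

  enum toy_result { TOY_SWAPPED_IN_THP, TOY_FALLBACK_TO_PTE };

  /*
   * Model of the fallback policy in do_huge_pmd_swap_page(): swap in
   * a THP when possible, otherwise split (cluster and/or PMD mapping)
   * and let the normal fault path handle it PTE by PTE.
   */
  static enum toy_result toy_huge_swapin(int thp_allocated,
  				       int cluster_is_huge)
  {
  	if (!cluster_is_huge)
  		return TOY_FALLBACK_TO_PTE;	/* split the PMD mapping */
  	if (!thp_allocated)
  		return TOY_FALLBACK_TO_PTE;	/* split cluster + mapping */
  	return TOY_SWAPPED_IN_THP;		/* map the THP with one PMD */
  }

  int main(void)
  {
  	printf("%d %d %d\n",
  	       toy_huge_swapin(1, 1),	/* THP swapin in one piece */
  	       toy_huge_swapin(0, 1),	/* allocation failed */
  	       toy_huge_swapin(1, 0));	/* cluster already split */
  	return 0;
  }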
---
 include/linux/huge_mm.h |   9 +++
 mm/huge_memory.c        | 174 ++++++++++++++++++++++++++++++++++++++++
 mm/memory.c             |  16 ++--
 3 files changed, 193 insertions(+), 6 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f4dbd0662438..909321c772b5 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -373,4 +373,13 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+#ifdef CONFIG_THP_SWAP
+extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+#else /* CONFIG_THP_SWAP */
+static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
+{
+	return 0;
+}
+#endif /* CONFIG_THP_SWAP */
+
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2004d8ae4390..3bb2df7f5f84 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -34,6 +34,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
@@ -1667,6 +1669,178 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma,
 	pmd_populate(mm, pmd, pgtable);
 }
 
+#ifdef CONFIG_THP_SWAP
+static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			       unsigned long address, pmd_t orig_pmd)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	spinlock_t *ptl;
+	int ret = 0;
+
+	ptl = pmd_lock(mm, pmd);
+	if (pmd_same(*pmd, orig_pmd))
+		__split_huge_swap_pmd(vma, address, pmd);
+	else
+		ret = -ENOENT;
+	spin_unlock(ptl);
+
+	return ret;
+}
+
+int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
+{
+	struct page *page;
+	struct mem_cgroup *memcg;
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
+	swp_entry_t entry;
+	pmd_t pmd;
+	int i, locked, exclusive = 0, ret = 0;
+
+	entry = pmd_to_swp_entry(orig_pmd);
+	VM_BUG_ON(non_swap_entry(entry));
+	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
+retry:
+	page = lookup_swap_cache(entry, NULL, vmf->address);
+	if (!page) {
+		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma,
+					     haddr, false);
+		if (!page) {
+			/*
+			 * Back out if somebody else faulted in this pmd
+			 * while we released the pmd lock.
+                         */
+                        if (likely(pmd_same(*vmf->pmd, orig_pmd))) {
+                                /*
+                                 * Failed to allocate huge page, split huge swap
+                                 * cluster, and fallback to swapin normal page
+                                 */
+                                ret = split_swap_cluster(entry, 0);
+                                /* Somebody else swapin the swap entry, retry */
+                                if (ret == -EEXIST) {
+                                        ret = 0;
+                                        goto retry;
+                                /* swapoff occurs under us */
+                                } else if (ret == -EINVAL)
+                                        ret = 0;
+                                else
+                                        goto fallback;
+                        }
+                        delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+                        goto out;
+                }
+
+                /* Had to read the page from swap area: Major fault */
+                ret = VM_FAULT_MAJOR;
+                count_vm_event(PGMAJFAULT);
+                count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
+        } else if (!PageTransCompound(page))
+                goto fallback;
+
+        locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
+
+        delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+        if (!locked) {
+                ret |= VM_FAULT_RETRY;
+                goto out_release;
+        }
+
+        /*
+         * Make sure try_to_free_swap or reuse_swap_page or swapoff did not
+         * release the swapcache from under us.  The page pin, and pmd_same
+         * test below, are not enough to exclude that.  Even if it is still
+         * swapcache, we need to check that the page's swap has not changed.
+         */
+        if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
+                goto out_page;
+
+        if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL,
+                                        &memcg, true)) {
+                ret = VM_FAULT_OOM;
+                goto out_page;
+        }
+
+        /*
+         * Back out if somebody else already faulted in this pmd.
+         */
+        vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd);
+        spin_lock(vmf->ptl);
+        if (unlikely(!pmd_same(*vmf->pmd, orig_pmd)))
+                goto out_nomap;
+
+        if (unlikely(!PageUptodate(page))) {
+                ret = VM_FAULT_SIGBUS;
+                goto out_nomap;
+        }
+
+        /*
+         * The page isn't present yet, go ahead with the fault.
+         *
+         * Be careful about the sequence of operations here.
+         * To get its accounting right, reuse_swap_page() must be called
+         * while the page is counted on swap but not yet in mapcount i.e.
+         * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
+         * must be called after the swap_free(), or it will never succeed.
+         */
+
+        add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+        add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+        pmd = mk_huge_pmd(page, vma->vm_page_prot);
+        if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
+                pmd = maybe_pmd_mkwrite(pmd_mkdirty(pmd), vma);
+                vmf->flags &= ~FAULT_FLAG_WRITE;
+                ret |= VM_FAULT_WRITE;
+                exclusive = RMAP_EXCLUSIVE;
+        }
+        for (i = 0; i < HPAGE_PMD_NR; i++)
+                flush_icache_page(vma, page + i);
+        if (pmd_swp_soft_dirty(orig_pmd))
+                pmd = pmd_mksoft_dirty(pmd);
+        do_page_add_anon_rmap(page, vma, haddr,
+                              exclusive | RMAP_COMPOUND);
+        mem_cgroup_commit_charge(page, memcg, true, true);
+        activate_page(page);
+        set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
+
+        swap_free(entry, HPAGE_PMD_NR);
+        if (mem_cgroup_swap_full(page) ||
+            (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
+                try_to_free_swap(page);
+        unlock_page(page);
+
+        if (vmf->flags & FAULT_FLAG_WRITE) {
+                spin_unlock(vmf->ptl);
+                ret |= do_huge_pmd_wp_page(vmf, pmd);
+                if (ret & VM_FAULT_ERROR)
+                        ret &= VM_FAULT_ERROR;
+                goto out;
+        }
+
+        /* No need to invalidate - it was non-present before */
+        update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
+        spin_unlock(vmf->ptl);
+out:
+        return ret;
+out_nomap:
+        mem_cgroup_cancel_charge(page, memcg, true);
+        spin_unlock(vmf->ptl);
+out_page:
+        unlock_page(page);
+out_release:
+        put_page(page);
+        return ret;
+fallback:
+        delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+        if (!split_huge_swap_pmd(vmf->vma, vmf->pmd, vmf->address, orig_pmd))
+                ret = VM_FAULT_FALLBACK;
+        else
+                ret = 0;
+        if (page)
+                put_page(page);
+        return ret;
+}
+#endif
+
 /*
  * Return true if we do MADV_FREE successfully on entire pmd page.
  * Otherwise, return false.
diff --git a/mm/memory.c b/mm/memory.c
index 35973cc5425b..39b9e9ab412f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3865,13 +3865,17 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
         barrier();
         if (unlikely(is_swap_pmd(orig_pmd))) {
-                VM_BUG_ON(thp_migration_supported() &&
-                          !is_pmd_migration_entry(orig_pmd));
-                if (is_pmd_migration_entry(orig_pmd))
+                if (thp_migration_supported() &&
+                    is_pmd_migration_entry(orig_pmd)) {
                         pmd_migration_entry_wait(mm, vmf.pmd);
-                return 0;
-        }
-        if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
+                        return 0;
+                } else if (IS_ENABLED(CONFIG_THP_SWAP)) {
+                        ret = do_huge_pmd_swap_page(&vmf, orig_pmd);
+                        if (!(ret & VM_FAULT_FALLBACK))
+                                return ret;
+                } else
+                        VM_BUG_ON(1);
+        } else if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
                 if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
                         return do_huge_pmd_numa_page(&vmf, orig_pmd);
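To see the new path in action from userspace: map and populate a
PMD-sized THP candidate, let memory pressure swap it out, then touch it
again.  A minimal sketch follows (an illustration only, not part of the
series; the 2MB size assumes the x86-64 HPAGE_PMD_SIZE, PMD alignment
of the mapping is not guaranteed here, and the swapout itself must be
induced by memory pressure):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define PMD_SIZE (2UL << 20)    /* assumed THP size */

int main(void)
{
        char *buf = mmap(NULL, PMD_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        /* Mark the region as a THP candidate. */
        if (madvise(buf, PMD_SIZE, MADV_HUGEPAGE))
                perror("madvise(MADV_HUGEPAGE)");
        memset(buf, 0x5a, PMD_SIZE);    /* populate the huge page */

        /* ... region is swapped out as one unit under memory pressure ... */

        buf[0] = 0;     /* one PMD fault swaps the whole THP back in */
        return 0;
}

Before this patch, the same access pattern would split the huge swap
cluster and take many separate normal page faults instead of one.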
From patchwork Fri Dec 7 05:41:10 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717459
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 10/21] swap: Support to count THP swapin and its fallback
Date: Fri, 7 Dec 2018 13:41:10 +0800
Message-Id: <20181207054122.27822-11-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

Two new /proc/vmstat fields, "thp_swpin" and "thp_swpin_fallback", are
added to count swapping in a THP from the swap device in one piece and
falling back to normal page swapin, respectively.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 Documentation/admin-guide/mm/transhuge.rst |  8 ++++++++
 include/linux/vm_event_item.h              |  2 ++
 mm/huge_memory.c                           |  4 +++-
 mm/page_io.c                               | 15 ++++++++++++---
 mm/vmstat.c                                |  2 ++
 5 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 7ab93a8404b9..85e33f785fd7 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -364,6 +364,14 @@ thp_swpout_fallback
         Usually because failed to allocate some continuous swap space
         for the huge page.

+thp_swpin
+        is incremented every time a huge page is swapped in in one
+        piece without splitting.
+
+thp_swpin_fallback
+        is incremented if a huge page has to be split during swapin.
+        Usually because failed to allocate a huge page.
+
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
 huge page for use.
 There are some counters in ``/proc/vmstat`` to help
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 7661abe5236e..24867b771ae0 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -89,6 +89,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
                 THP_ZERO_PAGE_ALLOC_FAILED,
                 THP_SWPOUT,
                 THP_SWPOUT_FALLBACK,
+                THP_SWPIN,
+                THP_SWPIN_FALLBACK,
 #endif
 #ifdef CONFIG_MEMORY_BALLOON
                 BALLOON_INFLATE,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3bb2df7f5f84..c674deb643d6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1723,8 +1723,10 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
                         /* swapoff occurs under us */
                         } else if (ret == -EINVAL)
                                 ret = 0;
-                        else
+                        else {
+                                count_vm_event(THP_SWPIN_FALLBACK);
                                 goto fallback;
+                        }
                 }
                 delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
                 goto out;
diff --git a/mm/page_io.c b/mm/page_io.c
index 5bdfd21c1bd9..4da5e2083893 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -348,6 +348,15 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
         return ret;
 }

+static inline void count_swpin_vm_event(struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+        if (unlikely(PageTransHuge(page)))
+                count_vm_event(THP_SWPIN);
+#endif
+        count_vm_events(PSWPIN, hpage_nr_pages(page));
+}
+
 int swap_readpage(struct page *page, bool synchronous)
 {
         struct bio *bio;
@@ -371,7 +380,7 @@ int swap_readpage(struct page *page, bool synchronous)
                 ret = mapping->a_ops->readpage(swap_file, page);
                 if (!ret)
-                        count_vm_event(PSWPIN);
+                        count_swpin_vm_event(page);
                 return ret;
         }
@@ -382,7 +391,7 @@ int swap_readpage(struct page *page, bool synchronous)
                         unlock_page(page);
         }
-        count_vm_event(PSWPIN);
+        count_swpin_vm_event(page);
         return 0;
 }
@@ -401,7 +410,7 @@ int swap_readpage(struct page *page, bool synchronous)
         get_task_struct(current);
         bio->bi_private = current;
         bio_set_op_attrs(bio, REQ_OP_READ, 0);
-        count_vm_event(PSWPIN);
+        count_swpin_vm_event(page);
         bio_get(bio);
         qc = submit_bio(bio);
         while (synchronous) {
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 18de2ca87533..37638638b9bb 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1266,6 +1266,8 @@ const char * const vmstat_text[] = {
         "thp_zero_page_alloc_failed",
         "thp_swpout",
         "thp_swpout_fallback",
+        "thp_swpin",
+        "thp_swpin_fallback",
 #endif
 #ifdef CONFIG_MEMORY_BALLOON
         "balloon_inflate",
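The new fields can be watched from userspace to judge how often THP
swapin succeeds.  A minimal sketch (an illustration only, not part of
the series) that extracts the two counters from /proc/vmstat:

#include <stdio.h>
#include <string.h>

int main(void)
{
        char name[64];
        unsigned long long val;
        FILE *f = fopen("/proc/vmstat", "r");

        if (!f) {
                perror("fopen /proc/vmstat");
                return 1;
        }
        /* Each line of /proc/vmstat is "<name> <value>". */
        while (fscanf(f, "%63s %llu", name, &val) == 2) {
                if (!strcmp(name, "thp_swpin") ||
                    !strcmp(name, "thp_swpin_fallback"))
                        printf("%s = %llu\n", name, val);
        }
        fclose(f);
        return 0;
}

A high thp_swpin_fallback to thp_swpin ratio suggests that huge page
allocation fails frequently during swapin.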
From patchwork Fri Dec 7 05:41:11 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717461
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 11/21] swap: Add sysfs interface to configure THP swapin
Date: Fri, 7 Dec 2018 13:41:11 +0800
Message-Id: <20181207054122.27822-12-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

Swapping in a THP as a whole isn't desirable in some situations.  For
example, for a completely random access pattern, swapping in a THP in
one piece will inflate the amount of data read greatly.  So a sysfs
interface, /sys/kernel/mm/transparent_hugepage/swapin_enabled, is added
to configure it.  The following three options are provided:

- always: THP swapin is always enabled.
- madvise: THP swapin is enabled only for VMAs with the VM_HUGEPAGE
  flag set.
- never: THP swapin is always disabled.

The default configuration is madvise.  During a page fault, if a PMD
swap mapping is found and THP swapin is disabled, the huge swap cluster
and the PMD swap mapping will be split, falling back to normal page
swapin.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 Documentation/admin-guide/mm/transhuge.rst | 21 +++++
 include/linux/huge_mm.h                    | 31 +++++++
 mm/huge_memory.c                           | 94 +++++++++++++++++-----
 3 files changed, 127 insertions(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 85e33f785fd7..23aefb17101c 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -160,6 +160,27 @@ Some userspace (such as a test program, or an optimized memory allocation
         cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size

+Transparent hugepages may be swapped out and swapped in in one piece
+without splitting.  This improves the utility of transparent hugepages
+but may inflate the amount of data read or written, too.
+Whether to swap in transparent hugepages in one piece can be
+configured as follows:
+
+        echo always >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+        echo madvise >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+        echo never >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+
+always
+        Attempt to allocate a transparent huge page and read it from
+        swap space in one piece every time.
+
+never
+        Always split the swap space and the PMD swap mapping, and swap
+        in the faulting normal page during swapin.
+
+madvise
+        Only swap in the transparent huge page in one piece for
+        MADV_HUGEPAGE madvise regions.
+
 khugepaged will be automatically started when
 transparent_hugepage/enabled is set to "always" or "madvise, and it'll
 be automatically shutdown if it's set to "never".
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 909321c772b5..ea4999a4b6cd 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -63,6 +63,8 @@ enum transparent_hugepage_flag {
 #ifdef CONFIG_DEBUG_VM
         TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG,
 #endif
+        TRANSPARENT_HUGEPAGE_SWAPIN_FLAG,
+        TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG,
 };

 struct kobject;
@@ -375,11 +377,40 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma,

 #ifdef CONFIG_THP_SWAP
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+
+static inline bool transparent_hugepage_swapin_enabled(
+        struct vm_area_struct *vma)
+{
+        if (vma->vm_flags & VM_NOHUGEPAGE)
+                return false;
+
+        if (is_vma_temporary_stack(vma))
+                return false;
+
+        if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+                return false;
+
+        if (transparent_hugepage_flags &
+            (1 << TRANSPARENT_HUGEPAGE_SWAPIN_FLAG))
+                return true;
+
+        if (transparent_hugepage_flags &
+            (1 << TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG))
+                return !!(vma->vm_flags & VM_HUGEPAGE);
+
+        return false;
+}
 #else /* CONFIG_THP_SWAP */
 static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 {
         return 0;
 }
+
+static inline bool transparent_hugepage_swapin_enabled(
+        struct vm_area_struct *vma)
+{
+        return false;
+}
 #endif /* CONFIG_THP_SWAP */

 #endif /* _LINUX_HUGE_MM_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c674deb643d6..0ae7f824dbeb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -58,7 +58,8 @@ unsigned long transparent_hugepage_flags __read_mostly =
 #endif
         (1<<
[...]
         page = lookup_swap_cache(entry, NULL, vmf->address);
         if (!page) {
+                if (!transparent_hugepage_swapin_enabled(vma))
+                        goto split;
+
                 page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma,
                                              haddr, false);
                 if (!page) {
@@ -1710,24 +1764,8 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
                          * Back out if somebody else faulted in this pmd
                          * while we released the pmd lock.
                          */
-                        if (likely(pmd_same(*vmf->pmd, orig_pmd))) {
-                                /*
-                                 * Failed to allocate huge page, split huge swap
-                                 * cluster, and fallback to swapin normal page
-                                 */
-                                ret = split_swap_cluster(entry, 0);
-                                /* Somebody else swapin the swap entry, retry */
-                                if (ret == -EEXIST) {
-                                        ret = 0;
-                                        goto retry;
-                                /* swapoff occurs under us */
-                                } else if (ret == -EINVAL)
-                                        ret = 0;
-                                else {
-                                        count_vm_event(THP_SWPIN_FALLBACK);
-                                        goto fallback;
-                                }
-                        }
+                        if (likely(pmd_same(*vmf->pmd, orig_pmd)))
+                                goto split;
                         delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
                         goto out;
                 }
@@ -1840,6 +1878,24 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
         if (page)
                 put_page(page);
         return ret;
+split:
+        /*
+         * Failed to allocate huge page, split huge swap cluster, and
+         * fallback to swapin normal page
+         */
+        ret = split_swap_cluster(entry, 0);
+        /* Somebody else swapin the swap entry, retry */
+        if (ret == -EEXIST) {
+                ret = 0;
+                goto retry;
+        }
+        /* swapoff occurs under us */
+        if (ret == -EINVAL) {
+                delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+                return 0;
+        }
+        count_vm_event(THP_SWPIN_FALLBACK);
+        goto fallback;
 }
 #endif
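Besides the echo commands shown in the documentation hunk above, the
knob can be driven from a test program.  A small sketch (an
illustration only, not part of the series; writing the file requires
root and CONFIG_THP_SWAP):

#include <stdio.h>

#define SWAPIN_CTL "/sys/kernel/mm/transparent_hugepage/swapin_enabled"

int main(void)
{
        char line[128];
        FILE *f = fopen(SWAPIN_CTL, "w");

        if (!f) {
                perror("fopen " SWAPIN_CTL);
                return 1;
        }
        fputs("always\n", f);   /* or "madvise" (the default) or "never" */
        fclose(f);

        f = fopen(SWAPIN_CTL, "r");     /* read the setting back */
        if (f) {
                if (fgets(line, sizeof(line), f))
                        fputs(line, stdout);
                fclose(f);
        }
        return 0;
}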
From patchwork Fri Dec 7 05:41:12 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717463
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V8 12/21] swap: Support PMD swap mapping in swapoff Date: Fri, 7 Dec 2018 13:41:12 +0800 Message-Id: <20181207054122.27822-13-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com> References: <20181207054122.27822-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP During swapoff, for a huge swap cluster, we need to allocate a THP, read its contents into the THP and unuse the PMD and PTE swap mappings to it. If failed to allocate a THP, the huge swap cluster will be split. During unuse, if it is found that the swap cluster mapped by a PMD swap mapping is split already, we will split the PMD swap mapping and unuse the PTEs. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/asm-generic/pgtable.h | 14 +----- include/linux/huge_mm.h | 8 ++++ mm/huge_memory.c | 4 +- mm/swapfile.c | 86 ++++++++++++++++++++++++++++++++++- 4 files changed, 97 insertions(+), 15 deletions(-) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 20aab7bfd487..5216124ba13c 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -931,22 +931,12 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) barrier(); #endif /* - * !pmd_present() checks for pmd migration entries - * - * The complete check uses is_pmd_migration_entry() in linux/swapops.h - * But using that requires moving current function and pmd_trans_unstable() - * to linux/swapops.h to resovle dependency, which is too much code move. - * - * !pmd_present() is equivalent to is_pmd_migration_entry() currently, - * because !pmd_present() pages can only be under migration not swapped - * out. - * - * pmd_none() is preseved for future condition checks on pmd migration + * pmd_none() is preseved for future condition checks on pmd swap * entries and not confusing with this function name, although it is * redundant with !pmd_present(). 
          */
         if (pmd_none(pmdval) || pmd_trans_huge(pmdval) ||
-            (IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !pmd_present(pmdval)))
+            (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) && !pmd_present(pmdval)))
                 return 1;
         if (unlikely(pmd_bad(pmdval))) {
                 pmd_clear_bad(pmd);
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index ea4999a4b6cd..6236f8b1d04b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -376,6 +376,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

 #ifdef CONFIG_THP_SWAP
+extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+                               unsigned long address, pmd_t orig_pmd);
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);

 static inline bool transparent_hugepage_swapin_enabled(
@@ -401,6 +403,12 @@ static inline bool transparent_hugepage_swapin_enabled(
         return false;
 }
 #else /* CONFIG_THP_SWAP */
+static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+                                      unsigned long address, pmd_t orig_pmd)
+{
+        return 0;
+}
+
 static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 {
         return 0;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0ae7f824dbeb..f3c0a9e8fb9a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1721,8 +1721,8 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma,
 }

 #ifdef CONFIG_THP_SWAP
-static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-                               unsigned long address, pmd_t orig_pmd)
+int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+                        unsigned long address, pmd_t orig_pmd)
 {
         struct mm_struct *mm = vma->vm_mm;
         spinlock_t *ptl;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index c22c11b4a879..b85ec810d941 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1931,6 +1931,11 @@ static inline int pte_same_as_swp(pte_t pte, pte_t swp_pte)
         return pte_same(pte_swp_clear_soft_dirty(pte), swp_pte);
 }

+static inline int pmd_same_as_swp(pmd_t pmd, pmd_t swp_pmd)
+{
+        return pmd_same(pmd_swp_clear_soft_dirty(pmd), swp_pmd);
+}
+
 /*
  * No need to decide whether this PTE shares the swap entry with others,
  * just let do_wp_page work it out if a write is requested later - to
@@ -1992,6 +1997,53 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
         return ret;
 }

+#ifdef CONFIG_THP_SWAP
+static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+                     unsigned long addr, swp_entry_t entry, struct page *page)
+{
+        struct mem_cgroup *memcg;
+        spinlock_t *ptl;
+        int ret = 1;
+
+        if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL,
+                                  &memcg, true)) {
+                ret = -ENOMEM;
+                goto out_nolock;
+        }
+
+        ptl = pmd_lock(vma->vm_mm, pmd);
+        if (unlikely(!pmd_same_as_swp(*pmd, swp_entry_to_pmd(entry)))) {
+                mem_cgroup_cancel_charge(page, memcg, true);
+                ret = 0;
+                goto out;
+        }
+
+        add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+        add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+        get_page(page);
+        set_pmd_at(vma->vm_mm, addr, pmd,
+                   pmd_mkold(mk_huge_pmd(page, vma->vm_page_prot)));
+        page_add_anon_rmap(page, vma, addr, true);
+        mem_cgroup_commit_charge(page, memcg, true, true);
+        swap_free(entry, HPAGE_PMD_NR);
+        /*
+         * Move the page to the active list so it is not
+         * immediately swapped out again after swapon.
+         */
+        activate_page(page);
+out:
+        spin_unlock(ptl);
+out_nolock:
+        return ret;
+}
+#else
+static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+                     unsigned long addr, swp_entry_t entry, struct page *page)
+{
+        return 0;
+}
+#endif
+
 static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
                                 unsigned long addr, unsigned long end,
                                 swp_entry_t entry, struct page *page)
@@ -2032,7 +2084,7 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud,
                                 unsigned long addr, unsigned long end,
                                 swp_entry_t entry, struct page *page)
 {
-        pmd_t *pmd;
+        pmd_t swp_pmd = swp_entry_to_pmd(entry), *pmd, orig_pmd;
         unsigned long next;
         int ret;
@@ -2040,6 +2092,27 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud,
         do {
                 cond_resched();
                 next = pmd_addr_end(addr, end);
+                orig_pmd = *pmd;
+                if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(orig_pmd)) {
+                        if (likely(!pmd_same_as_swp(orig_pmd, swp_pmd)))
+                                continue;
+                        /*
+                         * Huge cluster has been split already, split
+                         * PMD swap mapping and fallback to unuse PTE
+                         */
+                        if (!PageTransCompound(page)) {
+                                ret = split_huge_swap_pmd(vma, pmd,
+                                                          addr, orig_pmd);
+                                if (ret)
+                                        return ret;
+                                ret = unuse_pte_range(vma, pmd, addr,
+                                                      next, entry, page);
+                        } else
+                                ret = unuse_pmd(vma, pmd, addr, entry, page);
+                        if (ret)
+                                return ret;
+                        continue;
+                }
                 if (pmd_none_or_trans_huge_or_clear_bad(pmd))
                         continue;
                 ret = unuse_pte_range(vma, pmd, addr, next, entry, page);
@@ -2233,6 +2306,7 @@ int try_to_unuse(unsigned int type, bool frontswap,
          * there are races when an instance of an entry might be missed.
          */
         while ((i = find_next_to_unuse(si, i, frontswap)) != 0) {
+retry:
                 if (signal_pending(current)) {
                         retval = -EINTR;
                         break;
@@ -2248,6 +2322,8 @@ int try_to_unuse(unsigned int type, bool frontswap,
                 page = read_swap_cache_async(entry,
                                         GFP_HIGHUSER_MOVABLE, NULL, 0, false);
                 if (!page) {
+                        struct swap_cluster_info *ci = NULL;
+
                         /*
                          * Either swap_duplicate() failed because entry
                          * has been freed independently, and will not be
@@ -2264,6 +2340,14 @@ int try_to_unuse(unsigned int type, bool frontswap,
                          */
                         if (!swcount || swcount == SWAP_MAP_BAD)
                                 continue;
+                        if (si->cluster_info)
+                                ci = si->cluster_info + i / SWAPFILE_CLUSTER;
+                        /* Split huge cluster if failed to allocate huge page */
+                        if (cluster_is_huge(ci)) {
+                                retval = split_swap_cluster(entry, 0);
+                                if (!retval || retval == -EEXIST)
+                                        goto retry;
+                        }
                         retval = -ENOMEM;
                         break;
                 }
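The unuse path above is driven by the swapoff(2) system call.  A
trivial sketch of the trigger (an illustration only, not part of the
series; "/dev/sdb1" is a placeholder swap device and CAP_SYS_ADMIN is
required):

#include <stdio.h>
#include <sys/swap.h>

int main(void)
{
        /*
         * With this patch, any PMD swap mappings encountered while
         * unusing the device are either unused in one piece via
         * unuse_pmd(), or split and unused PTE by PTE when THP
         * allocation fails.
         */
        if (swapoff("/dev/sdb1")) {
                perror("swapoff");
                return 1;
        }
        return 0;
}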
From patchwork Fri Dec 7 05:41:13 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717465
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 13/21] swap: Support PMD swap mapping in madvise_free()
Date: Fri, 7 Dec 2018 13:41:13 +0800
Message-Id: <20181207054122.27822-14-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

When madvise_free() finds a PMD swap mapping, if only part of the huge
swap cluster is operated on, the PMD swap mapping will be split,
falling back to PTE swap mapping processing.  Otherwise, if the whole
huge swap cluster is operated on, free_swap_and_cache() will be called
to decrease the PMD swap mapping count, and probably to free the swap
space and the THP in the swap cache too.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/huge_memory.c | 52 ++++++++++++++++++++++++++++++++++--------------
 mm/madvise.c     |  2 +-
 2 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f3c0a9e8fb9a..9cf5d4fa6d98 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1899,6 +1899,15 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 }
 #endif

+static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
+{
+        pgtable_t pgtable;
+
+        pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+        pte_free(mm, pgtable);
+        mm_dec_nr_ptes(mm);
+}
+
 /*
  * Return true if we do MADV_FREE successfully on entire pmd page.
  * Otherwise, return false.
@@ -1919,15 +1928,37 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
                 goto out_unlocked;

         orig_pmd = *pmd;
-        if (is_huge_zero_pmd(orig_pmd))
-                goto out;
-
         if (unlikely(!pmd_present(orig_pmd))) {
-                VM_BUG_ON(thp_migration_supported() &&
-                          !is_pmd_migration_entry(orig_pmd));
-                goto out;
+                swp_entry_t entry = pmd_to_swp_entry(orig_pmd);
+
+                if (is_migration_entry(entry)) {
+                        VM_BUG_ON(!thp_migration_supported());
+                        goto out;
+                } else if (IS_ENABLED(CONFIG_THP_SWAP) &&
+                           !non_swap_entry(entry)) {
+                        /*
+                         * If part of THP is discarded, split the PMD
+                         * swap mapping and operate on the PTEs
+                         */
+                        if (next - addr != HPAGE_PMD_SIZE) {
+                                __split_huge_swap_pmd(vma, addr, pmd);
+                                goto out;
+                        }
+                        free_swap_and_cache(entry, HPAGE_PMD_NR);
+                        pmd_clear(pmd);
+                        zap_deposited_table(mm, pmd);
+                        if (current->mm == mm)
+                                sync_mm_rss(mm);
+                        add_mm_counter(mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+                        ret = true;
+                        goto out;
+                } else
+                        VM_BUG_ON(1);
         }

+        if (is_huge_zero_pmd(orig_pmd))
+                goto out;
+
         page = pmd_page(orig_pmd);
         /*
          * If other processes are mapping this page, we couldn't discard
@@ -1973,15 +2004,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
         return ret;
 }

-static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
-{
-        pgtable_t pgtable;
-
-        pgtable = pgtable_trans_huge_withdraw(mm, pmd);
-        pte_free(mm, pgtable);
-        mm_dec_nr_ptes(mm);
-}
-
 int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
                  pmd_t *pmd, unsigned long addr)
 {
diff --git a/mm/madvise.c b/mm/madvise.c
index cbb3d7e38e51..0c1f96c605f8 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -321,7 +321,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
         unsigned long next;

         next = pmd_addr_end(addr, end);
-        if (pmd_trans_huge(*pmd))
+        if (pmd_trans_huge(*pmd) || is_swap_pmd(*pmd))
                 if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next))
                         goto next;
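From userspace, whether madvise_free_huge_pmd() can take the new
whole-cluster path depends only on how much of the PMD range the
MADV_FREE call covers.  A sketch (an illustration only, not part of the
series; the 2MB size assumes the x86-64 HPAGE_PMD_SIZE):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define PMD_SIZE (2UL << 20)    /* assumed THP size */

int main(void)
{
        char *buf = mmap(NULL, PMD_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        madvise(buf, PMD_SIZE, MADV_HUGEPAGE);
        memset(buf, 0x5a, PMD_SIZE);    /* populate the huge page */

        /* ... region is swapped out as a THP under memory pressure ... */

        /*
         * Covering the whole PMD range lets the kernel drop the huge
         * swap cluster with one free_swap_and_cache() call; advising
         * only part of it (e.g. PMD_SIZE / 2) splits the PMD swap
         * mapping first and falls back to PTE processing.
         */
        if (madvise(buf, PMD_SIZE, MADV_FREE))
                perror("madvise(MADV_FREE)");
        return 0;
}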
From patchwork Fri Dec 7 05:41:14 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717467
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 14/21] swap: Support to move swap account for PMD swap mapping
Date: Fri, 7 Dec 2018 13:41:14 +0800
Message-Id: <20181207054122.27822-15-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

Previously, the huge swap cluster was split after the THP was swapped
out.  Now, to support swapping in the THP in one piece, the huge swap
cluster will not be split after the THP is reclaimed.  So in memcg, we
need to move the swap account for PMD swap mappings in the process's
page table.

When the page table is scanned while moving the memcg charge, PMD swap
mappings will be identified, and mem_cgroup_move_swap_account() and its
callees are revised to move the account for the whole huge swap
cluster.  If the swap cluster mapped by a PMD has been split, the PMD
swap mapping will be split and we fall back to PTE processing.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 7 ++ include/linux/swap.h | 6 ++ include/linux/swap_cgroup.h | 3 +- mm/huge_memory.c | 7 +- mm/memcontrol.c | 131 ++++++++++++++++++++++++++++-------- mm/swap_cgroup.c | 45 ++++++++++--- mm/swapfile.c | 14 ++++ 7 files changed, 173 insertions(+), 40 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 6236f8b1d04b..260357fc9d76 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -376,6 +376,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #ifdef CONFIG_THP_SWAP +extern void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long addr, pmd_t *pmd); extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd); extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); @@ -403,6 +405,11 @@ static inline bool transparent_hugepage_swapin_enabled( return false; } #else /* CONFIG_THP_SWAP */ +static inline void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long addr, pmd_t *pmd) +{ +} + static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd) { diff --git a/include/linux/swap.h b/include/linux/swap.h index 4bd532c9315e..6463784fd5e8 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -622,6 +622,7 @@ static inline swp_entry_t get_swap_page(struct page *page) #ifdef CONFIG_THP_SWAP extern int split_swap_cluster(swp_entry_t entry, unsigned long flags); extern int split_swap_cluster_map(swp_entry_t entry); +extern int get_swap_entry_size(swp_entry_t entry); #else static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags) { @@ -632,6 +633,11 @@ static inline int split_swap_cluster_map(swp_entry_t entry) { return 0; } + +static inline int get_swap_entry_size(swp_entry_t entry) +{ + return 1; +} #endif #ifdef CONFIG_MEMCG diff --git a/include/linux/swap_cgroup.h b/include/linux/swap_cgroup.h index a12dd1c3966c..c40fb52b0563 100644 --- a/include/linux/swap_cgroup.h +++ b/include/linux/swap_cgroup.h @@ -7,7 +7,8 @@ #ifdef CONFIG_MEMCG_SWAP extern unsigned short swap_cgroup_cmpxchg(swp_entry_t ent, - unsigned short old, unsigned short new); + unsigned short old, unsigned short new, + unsigned int nr_ents); extern unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id, unsigned int nr_ents); extern unsigned short lookup_swap_cgroup_id(swp_entry_t ent); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 9cf5d4fa6d98..83140c895b58 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1685,10 +1685,10 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) return 0; } +#ifdef CONFIG_THP_SWAP /* Convert a PMD swap mapping to a set of PTE swap mappings */ -static void __split_huge_swap_pmd(struct vm_area_struct *vma, - unsigned long addr, - pmd_t *pmd) +void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long addr, pmd_t *pmd) { struct mm_struct *mm = vma->vm_mm; pgtable_t pgtable; @@ -1720,7 +1720,6 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma, pmd_populate(mm, pmd, pgtable); } -#ifdef CONFIG_THP_SWAP int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd) { diff --git 
a/mm/memcontrol.c b/mm/memcontrol.c index b860dd4f75f2..ac1abfcfab88 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2667,9 +2667,10 @@ void mem_cgroup_split_huge_fixup(struct page *head) #ifdef CONFIG_MEMCG_SWAP /** * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record. - * @entry: swap entry to be moved + * @entry: the first swap entry to be moved * @from: mem_cgroup which the entry is moved from * @to: mem_cgroup which the entry is moved to + * @nr_ents: number of swap entries * * It succeeds only when the swap_cgroup's record for this entry is the same * as the mem_cgroup's id of @from. @@ -2680,23 +2681,27 @@ void mem_cgroup_split_huge_fixup(struct page *head) * both res and memsw, and called css_get(). */ static int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) + struct mem_cgroup *from, + struct mem_cgroup *to, + unsigned int nr_ents) { unsigned short old_id, new_id; old_id = mem_cgroup_id(from); new_id = mem_cgroup_id(to); - if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) { - mod_memcg_state(from, MEMCG_SWAP, -1); - mod_memcg_state(to, MEMCG_SWAP, 1); + if (swap_cgroup_cmpxchg(entry, old_id, new_id, nr_ents) == old_id) { + mod_memcg_state(from, MEMCG_SWAP, -nr_ents); + mod_memcg_state(to, MEMCG_SWAP, nr_ents); return 0; } return -EINVAL; } #else static inline int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) + struct mem_cgroup *from, + struct mem_cgroup *to, + unsigned int nr_ents) { return -EINVAL; } @@ -4649,6 +4654,7 @@ enum mc_target_type { MC_TARGET_PAGE, MC_TARGET_SWAP, MC_TARGET_DEVICE, + MC_TARGET_FALLBACK, }; static struct page *mc_handle_present_pte(struct vm_area_struct *vma, @@ -4715,6 +4721,28 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, } #endif +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static struct page *mc_handle_swap_pmd(struct vm_area_struct *vma, + pmd_t pmd, swp_entry_t *entry) +{ + struct page *page = NULL; + swp_entry_t ent = pmd_to_swp_entry(pmd); + + if (!(mc.flags & MOVE_ANON) || non_swap_entry(ent)) + return NULL; + + /* + * Because lookup_swap_cache() updates some statistics counter, + * we call find_get_page() with swapper_space directly. + */ + page = find_get_page(swap_address_space(ent), swp_offset(ent)); + if (do_memsw_account()) + entry->val = ent.val; + + return page; +} +#endif + static struct page *mc_handle_file_pte(struct vm_area_struct *vma, unsigned long addr, pte_t ptent, swp_entry_t *entry) { @@ -4903,7 +4931,9 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, * There is a swap entry and a page doesn't exist or isn't charged. * But we cannot move a tail-page in a THP. */ - if (ent.val && !ret && (!page || !PageTransCompound(page)) && + if (ent.val && !ret && + ((page && !PageTransCompound(page)) || + (!page && get_swap_entry_size(ent) == 1)) && mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) { ret = MC_TARGET_SWAP; if (target) @@ -4914,37 +4944,64 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, #ifdef CONFIG_TRANSPARENT_HUGEPAGE /* - * We don't consider PMD mapped swapping or file mapped pages because THP does - * not support them for now. - * Caller should make sure that pmd_trans_huge(pmd) is true. + * We don't consider file mapped pages because THP does not support + * them for now. 
*/ static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) + unsigned long addr, pmd_t *pmdp, union mc_target *target) { + pmd_t pmd = *pmdp; struct page *page = NULL; enum mc_target_type ret = MC_TARGET_NONE; + swp_entry_t ent = { .val = 0 }; if (unlikely(is_swap_pmd(pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(pmd)); - return ret; + if (is_pmd_migration_entry(pmd)) { + VM_BUG_ON(!thp_migration_supported()); + return ret; + } + if (!IS_ENABLED(CONFIG_THP_SWAP)) { + VM_BUG_ON(1); + return ret; + } + page = mc_handle_swap_pmd(vma, pmd, &ent); + /* The swap cluster has been split under us */ + if ((page && !PageTransHuge(page)) || + (!page && ent.val && get_swap_entry_size(ent) == 1)) { + __split_huge_swap_pmd(vma, addr, pmdp); + ret = MC_TARGET_FALLBACK; + goto out; + } + } else { + page = pmd_page(pmd); + get_page(page); } - page = pmd_page(pmd); - VM_BUG_ON_PAGE(!page || !PageHead(page), page); + VM_BUG_ON_PAGE(page && !PageHead(page), page); if (!(mc.flags & MOVE_ANON)) - return ret; - if (page->mem_cgroup == mc.from) { + goto out; + if (!page && !ent.val) + goto out; + if (page && page->mem_cgroup == mc.from) { ret = MC_TARGET_PAGE; if (target) { get_page(page); target->page = page; } } + if (ent.val && !ret && !page && + mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) { + ret = MC_TARGET_SWAP; + if (target) + target->ent = ent; + } +out: + if (page) + put_page(page); return ret; } #else static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) + unsigned long addr, pmd_t *pmdp, union mc_target *target) { return MC_TARGET_NONE; } @@ -4957,6 +5014,7 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, struct vm_area_struct *vma = walk->vma; pte_t *pte; spinlock_t *ptl; + int ret; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { @@ -4965,12 +5023,16 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, * support transparent huge page with MEMORY_DEVICE_PUBLIC or * MEMORY_DEVICE_PRIVATE but this might change. 
*/ - if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE) - mc.precharge += HPAGE_PMD_NR; + ret = get_mctgt_type_thp(vma, addr, pmd, NULL); spin_unlock(ptl); + if (ret == MC_TARGET_FALLBACK) + goto fallback; + if (ret) + mc.precharge += HPAGE_PMD_NR; return 0; } +fallback: if (pmd_trans_unstable(pmd)) return 0; pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); @@ -5161,6 +5223,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, enum mc_target_type target_type; union mc_target target; struct page *page; + swp_entry_t ent; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { @@ -5168,8 +5231,9 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, spin_unlock(ptl); return 0; } - target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); - if (target_type == MC_TARGET_PAGE) { + target_type = get_mctgt_type_thp(vma, addr, pmd, &target); + switch (target_type) { + case MC_TARGET_PAGE: page = target.page; if (!isolate_lru_page(page)) { if (!mem_cgroup_move_account(page, true, @@ -5180,7 +5244,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, putback_lru_page(page); } put_page(page); - } else if (target_type == MC_TARGET_DEVICE) { + break; + case MC_TARGET_DEVICE: page = target.page; if (!mem_cgroup_move_account(page, true, mc.from, mc.to)) { @@ -5188,9 +5253,21 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, mc.moved_charge += HPAGE_PMD_NR; } put_page(page); + break; + case MC_TARGET_SWAP: + ent = target.ent; + if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to, + HPAGE_PMD_NR)) { + mc.precharge -= HPAGE_PMD_NR; + mc.moved_swap += HPAGE_PMD_NR; + } + break; + default: + break; } spin_unlock(ptl); - return 0; + if (target_type != MC_TARGET_FALLBACK) + return 0; } if (pmd_trans_unstable(pmd)) @@ -5200,7 +5277,6 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, for (; addr != end; addr += PAGE_SIZE) { pte_t ptent = *(pte++); bool device = false; - swp_entry_t ent; if (!mc.precharge) break; @@ -5234,7 +5310,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, break; case MC_TARGET_SWAP: ent = target.ent; - if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) { + if (!mem_cgroup_move_swap_account(ent, mc.from, + mc.to, 1)) { mc.precharge--; /* we fixup refcnts and charges later. */ mc.moved_swap++; diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c index 45affaef3bc6..ccc08e88962a 100644 --- a/mm/swap_cgroup.c +++ b/mm/swap_cgroup.c @@ -87,29 +87,58 @@ static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent, /** * swap_cgroup_cmpxchg - cmpxchg mem_cgroup's id for this swp_entry. - * @ent: swap entry to be cmpxchged + * @ent: the first swap entry to be cmpxchged * @old: old id * @new: new id + * @nr_ents: number of swap entries * * Returns old id at success, 0 at failure. 
* (There is no mem_cgroup using 0 as its id) */ unsigned short swap_cgroup_cmpxchg(swp_entry_t ent, - unsigned short old, unsigned short new) + unsigned short old, unsigned short new, + unsigned int nr_ents) { struct swap_cgroup_ctrl *ctrl; - struct swap_cgroup *sc; + struct swap_cgroup *sc_start, *sc; unsigned long flags; unsigned short retval; + pgoff_t offset_start = swp_offset(ent), offset; + pgoff_t end = offset_start + nr_ents; - sc = lookup_swap_cgroup(ent, &ctrl); + sc_start = lookup_swap_cgroup(ent, &ctrl); spin_lock_irqsave(&ctrl->lock, flags); - retval = sc->id; - if (retval == old) + sc = sc_start; + offset = offset_start; + for (;;) { + if (sc->id != old) { + retval = 0; + goto out; + } + offset++; + if (offset == end) + break; + if (offset % SC_PER_PAGE) + sc++; + else + sc = __lookup_swap_cgroup(ctrl, offset); + } + + sc = sc_start; + offset = offset_start; + for (;;) { sc->id = new; - else - retval = 0; + offset++; + if (offset == end) + break; + if (offset % SC_PER_PAGE) + sc++; + else + sc = __lookup_swap_cgroup(ctrl, offset); + } + retval = old; +out: spin_unlock_irqrestore(&ctrl->lock, flags); return retval; } diff --git a/mm/swapfile.c b/mm/swapfile.c index b85ec810d941..d7717b694ec1 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1730,6 +1730,20 @@ static int page_trans_huge_map_swapcount(struct page *page, int *total_mapcount, return map_swapcount; } +#ifdef CONFIG_THP_SWAP +int get_swap_entry_size(swp_entry_t entry) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + + si = _swap_info_get(entry); + if (!si || !si->cluster_info) + return 1; + ci = si->cluster_info + swp_offset(entry) / SWAPFILE_CLUSTER; + return cluster_is_huge(ci) ? SWAPFILE_CLUSTER : 1; +} +#endif + /* * We can write to an anon page without COW if there are no other references * to it. 
And as a side-effect, free up its swap: because the old content

From patchwork Fri Dec 7 05:41:15 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 15/21] swap: Support to copy PMD swap mapping when fork()
Date: Fri, 7 Dec 2018 13:41:15 +0800
Message-Id: <20181207054122.27822-16-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

During fork, the page table needs to be copied from parent to child. A PMD swap mapping needs to be copied too, and the swap reference count needs to be increased. When the huge swap cluster has already been split, we split the PMD swap mapping as well and fall back to PTE copying. When swap count continuation fails to allocate a page with GFP_ATOMIC, we unlock the spinlocks and try again with GFP_KERNEL.
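A minimal sketch of the unlock-and-retry pattern described above, with hypothetical stand-ins: reserve_atomic() models swap_duplicate() failing to extend a swap count under the PMD locks, and reserve_blocking() models add_swap_count_continuation(entry, GFP_KERNEL). None of these names are kernel API.

#include <stdbool.h>
#include <errno.h>
#include <stdio.h>

static bool continuation_ready;		/* models a preallocated page */

static int reserve_atomic(void)		/* may run under spinlocks */
{
	return continuation_ready ? 0 : -ENOMEM;
}

static void reserve_blocking(void)	/* may sleep: locks must be dropped */
{
	continuation_ready = true;
}

static int copy_pmd_swap_mapping(void)
{
retry:
	/* spin_lock(dst_ptl); spin_lock(src_ptl); */
	if (reserve_atomic() == -ENOMEM) {
		/* spin_unlock(src_ptl); spin_unlock(dst_ptl); */
		reserve_blocking();
		goto retry;	/* relock and revalidate from scratch */
	}
	/* duplicate the PMD swap mapping while both locks are held */
	/* spin_unlock(src_ptl); spin_unlock(dst_ptl); */
	return 0;
}

int main(void)
{
	printf("copy: %d\n", copy_pmd_swap_mapping());
	return 0;
}

The real copy_huge_pmd() additionally handles -ENOTDIR from swap_duplicate(), which signals that the huge swap cluster was split meanwhile, by splitting the PMD swap mapping and falling back to PTE copying.

Signed-off-by: "Huang, Ying" Cc: "Kirill A.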
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/huge_memory.c | 72 ++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 57 insertions(+), 15 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 83140c895b58..077d569da13e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -986,6 +986,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, if (unlikely(!pgtable)) goto out; +retry: dst_ptl = pmd_lock(dst_mm, dst_pmd); src_ptl = pmd_lockptr(src_mm, src_pmd); spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); @@ -993,26 +994,67 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, ret = -EAGAIN; pmd = *src_pmd; -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION if (unlikely(is_swap_pmd(pmd))) { swp_entry_t entry = pmd_to_swp_entry(pmd); - VM_BUG_ON(!is_pmd_migration_entry(pmd)); - if (is_write_migration_entry(entry)) { - make_migration_entry_read(&entry); - pmd = swp_entry_to_pmd(entry); - if (pmd_swp_soft_dirty(*src_pmd)) - pmd = pmd_swp_mksoft_dirty(pmd); - set_pmd_at(src_mm, addr, src_pmd, pmd); +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION + if (is_migration_entry(entry)) { + if (is_write_migration_entry(entry)) { + make_migration_entry_read(&entry); + pmd = swp_entry_to_pmd(entry); + if (pmd_swp_soft_dirty(*src_pmd)) + pmd = pmd_swp_mksoft_dirty(pmd); + set_pmd_at(src_mm, addr, src_pmd, pmd); + } + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + ret = 0; + goto out_unlock; } - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - set_pmd_at(dst_mm, addr, dst_pmd, pmd); - ret = 0; - goto out_unlock; - } #endif + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) { + ret = swap_duplicate(&entry, HPAGE_PMD_NR); + if (!ret) { + add_mm_counter(dst_mm, MM_SWAPENTS, + HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, + pgtable); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + /* make sure dst_mm is on swapoff's mmlist. 
*/ + if (unlikely(list_empty(&dst_mm->mmlist))) { + spin_lock(&mmlist_lock); + if (list_empty(&dst_mm->mmlist)) + list_add(&dst_mm->mmlist, + &src_mm->mmlist); + spin_unlock(&mmlist_lock); + } + } else if (ret == -ENOTDIR) { + /* + * The huge swap cluster has been split, split + * the PMD swap mapping and fallback to PTE + */ + __split_huge_swap_pmd(vma, addr, src_pmd); + pte_free(dst_mm, pgtable); + } else if (ret == -ENOMEM) { + spin_unlock(src_ptl); + spin_unlock(dst_ptl); + ret = add_swap_count_continuation(entry, + GFP_KERNEL); + if (ret < 0) { + ret = -ENOMEM; + pte_free(dst_mm, pgtable); + goto out; + } + goto retry; + } else + VM_BUG_ON(1); + goto out_unlock; + } + VM_BUG_ON(1); + } if (unlikely(!pmd_trans_huge(pmd))) { pte_free(dst_mm, pgtable);

From patchwork Fri Dec 7 05:41:16 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A.
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V8 16/21] swap: Free PMD swap mapping when zap_huge_pmd() Date: Fri, 7 Dec 2018 13:41:16 +0800 Message-Id: <20181207054122.27822-17-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com> References: <20181207054122.27822-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP For a PMD swap mapping, zap_huge_pmd() will clear the PMD and call free_swap_and_cache() to decrease the swap reference count and maybe free or split the huge swap cluster and the THP in swap cache. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/huge_memory.c | 32 +++++++++++++++++++++----------- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 077d569da13e..5b2eb7871cd7 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2071,7 +2071,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, spin_unlock(ptl); if (is_huge_zero_pmd(orig_pmd)) tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); - } else if (is_huge_zero_pmd(orig_pmd)) { + } else if (pmd_present(orig_pmd) && is_huge_zero_pmd(orig_pmd)) { zap_deposited_table(tlb->mm, pmd); spin_unlock(ptl); tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); @@ -2084,17 +2084,27 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, page_remove_rmap(page, true); VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); VM_BUG_ON_PAGE(!PageHead(page), page); - } else if (thp_migration_supported()) { - swp_entry_t entry; - - VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); - entry = pmd_to_swp_entry(orig_pmd); - page = pfn_to_page(swp_offset(entry)); + } else { + swp_entry_t entry = pmd_to_swp_entry(orig_pmd); + + if (thp_migration_supported() && + is_migration_entry(entry)) + page = pfn_to_page(swp_offset(entry)); + else if (IS_ENABLED(CONFIG_THP_SWAP) && + !non_swap_entry(entry)) + free_swap_and_cache(entry, HPAGE_PMD_NR); + else { + WARN_ONCE(1, +"Non present huge pmd without pmd migration or swap enabled!"); + goto unlock; + } flush_needed = 0; - } else - WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); + } - if (PageAnon(page)) { + if (!page) { + zap_deposited_table(tlb->mm, pmd); + add_mm_counter(tlb->mm, MM_SWAPENTS, -HPAGE_PMD_NR); + } else if (PageAnon(page)) { zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); } else { @@ -2102,7 +2112,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, mm_counter_file(page), -HPAGE_PMD_NR); } - +unlock: spin_unlock(ptl); if (flush_needed) tlb_remove_page_size(tlb, page, HPAGE_PMD_SIZE); From patchwork Fri Dec 7 05:41:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10717473 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by 
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 17/21] swap: Support PMD swap mapping for MADV_WILLNEED
Date: Fri, 7 Dec 2018 13:41:17 +0800
Message-Id: <20181207054122.27822-18-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

During MADV_WILLNEED, for a PMD swap mapping, if THP swapin is enabled for the VMA, the whole swap cluster is swapped in. Otherwise, the huge swap cluster and the PMD swap mapping are split, and we fall back to PTE swap mappings.
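A compact model of the WILLNEED decision above; each identifier is a stand-in for the kernel helper of similar name, and the split race is reduced to a flag.

#include <stdbool.h>
#include <stdio.h>

static bool thp_swapin_enabled = true;	/* per-VMA policy in the kernel */
static bool cluster_is_huge = true;	/* may flip if someone splits it */

static void willneed_pmd_swap(void)
{
	if (!thp_swapin_enabled) {
		/* split_swap_cluster() then split_huge_swap_pmd() */
		cluster_is_huge = false;
		printf("split cluster and PMD, PTE-by-PTE swapin\n");
		return;
	}
	/* read_swap_cache_async() on the first entry pulls in the THP */
	if (!cluster_is_huge)	/* the cluster was split under us */
		printf("split the PMD mapping, PTE-by-PTE swapin\n");
	else
		printf("swap the THP in as one piece\n");
}

int main(void)
{
	willneed_pmd_swap();
	thp_swapin_enabled = false;
	willneed_pmd_swap();
	return 0;
}

Signed-off-by: "Huang, Ying" Cc: "Kirill A.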
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/madvise.c | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 0c1f96c605f8..52d27e04a204 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -196,14 +196,36 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, pte_t *orig_pte; struct vm_area_struct *vma = walk->private; unsigned long index; + swp_entry_t entry; + struct page *page; + pmd_t pmdval; + + pmdval = *pmd; + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(pmdval) && + !is_pmd_migration_entry(pmdval)) { + entry = pmd_to_swp_entry(pmdval); + if (!transparent_hugepage_swapin_enabled(vma)) { + if (!split_swap_cluster(entry, 0)) + split_huge_swap_pmd(vma, pmd, start, pmdval); + } else { + page = read_swap_cache_async(entry, + GFP_HIGHUSER_MOVABLE, + vma, start, false); + if (page) { + /* The swap cluster has been split under us */ + if (!PageTransHuge(page)) + split_huge_swap_pmd(vma, pmd, start, + pmdval); + put_page(page); + } + } + } if (pmd_none_or_trans_huge_or_clear_bad(pmd)) return 0; for (index = start; index != end; index += PAGE_SIZE) { pte_t pte; - swp_entry_t entry; - struct page *page; spinlock_t *ptl; orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl); From patchwork Fri Dec 7 05:41:18 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10717475 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 291C1109C for ; Fri, 7 Dec 2018 05:42:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 164102EC81 for ; Fri, 7 Dec 2018 05:42:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0A9562EC8B; Fri, 7 Dec 2018 05:42:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 687C02EC81 for ; Fri, 7 Dec 2018 05:42:23 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E23F76B7EB3; Fri, 7 Dec 2018 00:42:20 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D0BD56B7EB5; Fri, 7 Dec 2018 00:42:20 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BD56A6B7EB6; Fri, 7 Dec 2018 00:42:20 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f197.google.com (mail-pg1-f197.google.com [209.85.215.197]) by kanga.kvack.org (Postfix) with ESMTP id 70DDA6B7EB3 for ; Fri, 7 Dec 2018 00:42:20 -0500 (EST) Received: by mail-pg1-f197.google.com with SMTP id m16so1851325pgd.0 for ; Thu, 06 Dec 2018 21:42:20 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc 
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 18/21] swap: Support PMD swap mapping in mincore()
Date: Fri, 7 Dec 2018 13:41:18 +0800
Message-Id: <20181207054122.27822-19-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

During mincore(), for a PMD swap mapping, the swap cache is looked up. If the page found there is not a compound page, the PMD swap mapping is split, and we fall back to PTE swap mapping processing.

Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/mincore.c | 37 +++++++++++++++++++++++++++++++------ 1 file changed, 31 insertions(+), 6 deletions(-) diff --git a/mm/mincore.c b/mm/mincore.c index aa0e542569f9..1d861fac82ee 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -48,7 +48,8 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr, * and is up to date; i.e. that no page-in operation would be required * at this time if an application were to map and access this page.
*/ -static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff) +static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff, + bool *compound) { unsigned char present = 0; struct page *page; @@ -86,6 +87,8 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff) #endif if (page) { present = PageUptodate(page); + if (compound) + *compound = PageCompound(page); put_page(page); } @@ -103,7 +106,8 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end, pgoff = linear_page_index(vma, addr); for (i = 0; i < nr; i++, pgoff++) - vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff); + vec[i] = mincore_page(vma->vm_file->f_mapping, + pgoff, NULL); } else { for (i = 0; i < nr; i++) vec[i] = 0; @@ -127,14 +131,36 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, pte_t *ptep; unsigned char *vec = walk->private; int nr = (end - addr) >> PAGE_SHIFT; + swp_entry_t entry; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { - memset(vec, 1, nr); + unsigned char val = 1; + bool compound; + + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(*pmd)) { + entry = pmd_to_swp_entry(*pmd); + if (!non_swap_entry(entry)) { + val = mincore_page(swap_address_space(entry), + swp_offset(entry), + &compound); + /* + * The huge swap cluster has been + * split under us + */ + if (!compound) { + __split_huge_swap_pmd(vma, addr, pmd); + spin_unlock(ptl); + goto fallback; + } + } + } + memset(vec, val, nr); spin_unlock(ptl); goto out; } +fallback: if (pmd_trans_unstable(pmd)) { __mincore_unmapped_range(addr, end, vma, vec); goto out; @@ -150,8 +176,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, else if (pte_present(pte)) *vec = 1; else { /* pte is a swap entry */ - swp_entry_t entry = pte_to_swp_entry(pte); - + entry = pte_to_swp_entry(pte); if (non_swap_entry(entry)) { /* * migration or hwpoison entries are always @@ -161,7 +186,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, } else { #ifdef CONFIG_SWAP *vec = mincore_page(swap_address_space(entry), - swp_offset(entry)); + swp_offset(entry), NULL); #else WARN_ON(1); *vec = 1;
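A sketch of the shortcut the mincore() change above enables: for an unsplit huge swap cluster, a single swap-cache lookup answers residency for all HPAGE_PMD_NR pages at once. mincore_huge() below is a made-up name condensing mincore_pte_range()'s new PMD branch.

#include <string.h>
#include <stdio.h>

#define HPAGE_PMD_NR 512

/*
 * 'present' is what one swap-cache lookup reported for the cluster;
 * 'compound' says whether the cluster still backs a single THP.
 * Returns 0 if vec[] was filled, -1 if the caller must split the PMD
 * and fall back to per-PTE processing.
 */
static int mincore_huge(unsigned char *vec, unsigned char present,
			int compound)
{
	if (!compound)
		return -1;	/* cluster split under us */
	memset(vec, present, HPAGE_PMD_NR);	/* one answer, 512 pages */
	return 0;
}

int main(void)
{
	unsigned char vec[HPAGE_PMD_NR];

	if (!mincore_huge(vec, 1, 1))
		printf("vec[0] = %d, vec[511] = %d\n", vec[0], vec[511]);
	return 0;
}

From patchwork Fri Dec 7 05:41:19 2018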
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 19/21] swap: Support PMD swap mapping in common path
Date: Fri, 7 Dec 2018 13:41:19 +0800
Message-Id: <20181207054122.27822-20-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

The original code here was only for PMD migration entries; it is revised to support PMD swap mappings too.
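The recurring shape of this patch, reduced to a sketch: a non-present PMD may now be either a migration entry or a genuine swap entry, so each common-path site branches on the entry type instead of assuming migration. The types and helpers below are stand-ins, not kernel definitions.

#include <stdbool.h>
#include <stdio.h>

struct swp_entry { int type; };
#define SWP_MIGRATION 1

static bool is_migration_entry(struct swp_entry e)
{
	return e.type == SWP_MIGRATION;
}

static const char *handle_swap_pmd(struct swp_entry e,
				   bool thp_migration, bool thp_swap)
{
	if (thp_migration && is_migration_entry(e))
		return "wait for migration to finish, then retry";
	if (thp_swap && !is_migration_entry(e))
		return "treat as a swapped-out THP";
	return "unexpected entry: warn and bail out";
}

int main(void)
{
	struct swp_entry mig = { SWP_MIGRATION }, swp = { 0 };

	printf("%s\n", handle_swap_pmd(mig, true, true));
	printf("%s\n", handle_swap_pmd(swp, true, true));
	return 0;
}

Signed-off-by: "Huang, Ying" Cc: "Kirill A.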
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- fs/proc/task_mmu.c | 12 +++++------- mm/gup.c | 36 ++++++++++++++++++++++++------------ mm/huge_memory.c | 7 ++++--- mm/mempolicy.c | 2 +- 4 files changed, 34 insertions(+), 23 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 39e96a21366e..0e65233f2cc2 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -986,7 +986,7 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma, pmd = pmd_clear_soft_dirty(pmd); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); - } else if (is_migration_entry(pmd_to_swp_entry(pmd))) { + } else if (is_swap_pmd(pmd)) { pmd = pmd_swp_clear_soft_dirty(pmd); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); } @@ -1316,9 +1316,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, if (pm->show_pfn) frame = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - } -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION - else if (is_swap_pmd(pmd)) { + } else if (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) && + is_swap_pmd(pmd)) { swp_entry_t entry = pmd_to_swp_entry(pmd); unsigned long offset; @@ -1331,10 +1330,9 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, flags |= PM_SWAP; if (pmd_swp_soft_dirty(pmd)) flags |= PM_SOFT_DIRTY; - VM_BUG_ON(!is_pmd_migration_entry(pmd)); - page = migration_entry_to_page(entry); + if (is_pmd_migration_entry(pmd)) + page = migration_entry_to_page(entry); } -#endif if (page && page_mapcount(page) == 1) flags |= PM_MMAP_EXCLUSIVE; diff --git a/mm/gup.c b/mm/gup.c index 6dd33e16a806..460565825ef0 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -215,6 +215,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, spinlock_t *ptl; struct page *page; struct mm_struct *mm = vma->vm_mm; + swp_entry_t entry; pmd = pmd_offset(pudp, address); /* @@ -242,18 +243,22 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, if (!pmd_present(pmdval)) { if (likely(!(flags & FOLL_MIGRATION))) return no_page_table(vma, flags); - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(pmdval)); - if (is_pmd_migration_entry(pmdval)) + entry = pmd_to_swp_entry(pmdval); + if (thp_migration_supported() && is_migration_entry(entry)) { pmd_migration_entry_wait(mm, pmd); - pmdval = READ_ONCE(*pmd); - /* - * MADV_DONTNEED may convert the pmd to null because - * mmap_sem is held in read mode - */ - if (pmd_none(pmdval)) + pmdval = READ_ONCE(*pmd); + /* + * MADV_DONTNEED may convert the pmd to null because + * mmap_sem is held in read mode + */ + if (pmd_none(pmdval)) + return no_page_table(vma, flags); + goto retry; + } + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) return no_page_table(vma, flags); - goto retry; + WARN_ON(1); + return no_page_table(vma, flags); } if (pmd_devmap(pmdval)) { ptl = pmd_lock(mm, pmd); @@ -275,11 +280,18 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, return no_page_table(vma, flags); } if (unlikely(!pmd_present(*pmd))) { + entry = pmd_to_swp_entry(*pmd); spin_unlock(ptl); if (likely(!(flags & FOLL_MIGRATION))) return no_page_table(vma, flags); - pmd_migration_entry_wait(mm, pmd); - goto retry_locked; + if (thp_migration_supported() && is_migration_entry(entry)) { + pmd_migration_entry_wait(mm, pmd); + goto retry_locked; + } + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) + return no_page_table(vma, 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5b2eb7871cd7..b75af88c505a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2138,7 +2138,7 @@ static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
 static pmd_t move_soft_dirty_pmd(pmd_t pmd)
 {
 #ifdef CONFIG_MEM_SOFT_DIRTY
-	if (unlikely(is_pmd_migration_entry(pmd)))
+	if (unlikely(is_swap_pmd(pmd)))
 		pmd = pmd_swp_mksoft_dirty(pmd);
 	else if (pmd_present(pmd))
 		pmd = pmd_mksoft_dirty(pmd);
@@ -2222,11 +2222,12 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	preserve_write = prot_numa && pmd_write(*pmd);
 	ret = 1;

-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#if defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) || defined(CONFIG_THP_SWAP)
 	if (is_swap_pmd(*pmd)) {
 		swp_entry_t entry = pmd_to_swp_entry(*pmd);

-		VM_BUG_ON(!is_pmd_migration_entry(*pmd));
+		VM_BUG_ON(!IS_ENABLED(CONFIG_THP_SWAP) &&
+			  !is_migration_entry(entry));
 		if (is_write_migration_entry(entry)) {
 			pmd_t newpmd;
 			/*

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index e4f8248822c1..39335bf99169 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -436,7 +436,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
 	struct queue_pages *qp = walk->private;
 	unsigned long flags;

-	if (unlikely(is_pmd_migration_entry(*pmd))) {
+	if (unlikely(is_swap_pmd(*pmd))) {
 		ret = 1;
 		goto unlock;
 	}
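A side note on the pattern used in the hunks above: replacing "#ifdef"
blocks with IS_ENABLED() checks keeps the affected code compiled (and
thus type-checked) in every configuration, while the compiler still
eliminates the dead branch when the option is off. A minimal sketch of
the pattern as this series applies it (identifiers taken from the
patch; the branch body is illustrative only):

	/*
	 * IS_ENABLED(CONFIG_FOO) expands to a compile-time 0 or 1, so
	 * the branch below is dropped entirely when the option is
	 * disabled, but the code inside is still parsed and checked.
	 */
	if (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) && is_swap_pmd(pmd)) {
		swp_entry_t entry = pmd_to_swp_entry(pmd);
		/* ... handle the migration or swap entry ... */
	}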
From patchwork Fri Dec 7 05:41:20 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717479
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 20/21] swap: create PMD swap mapping when unmap the THP
Date: Fri, 7 Dec 2018 13:41:20 +0800
Message-Id: <20181207054122.27822-21-ying.huang@intel.com>
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

This is the final step of the THP swapin support. When reclaiming an
anonymous THP, after allocating the huge swap cluster and adding the
THP to the swap cache, the PMD page mapping is changed to a mapping to
the swap space. Previously, the PMD page mapping was split before
being changed. With this patch, the unmap code no longer splits the
PMD mapping; instead, it replaces it with a PMD swap mapping. So
later, when the SWAP_HAS_CACHE flag is cleared in the last step of
swapout, the huge swap cluster is kept instead of being split, and on
swapin the huge swap cluster is read in one piece into a THP. That
is, the THP is not split during swapout/swapin at all. This
eliminates the splitting/collapsing overhead and reduces the page
fault count. More importantly, it greatly improves THP utilization:
many more THPs are kept intact when swapping is used, so we can take
full advantage of THP, including its high swapout/swapin performance.
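In short, the core of the new unmap path boils down to encoding the
huge swap cluster's swap entry directly into the PMD. A condensed
sketch, taken from the set_pmd_swap_entry() function added below (the
mmlist, rss-counter, and rmap bookkeeping are elided here):

	swp_entry_t entry = { .val = page_private(page) };
	pmd_t swp_pmd;

	/* Take HPAGE_PMD_NR (512 with 4KB base pages) swap count refs. */
	if (swap_duplicate(&entry, HPAGE_PMD_NR) < 0) {
		set_pmd_at(mm, address, pvmw->pmd, pmdval);	/* restore */
		return false;
	}

	swp_pmd = swp_entry_to_pmd(entry);	/* encode entry as a pmd */
	if (pmd_soft_dirty(pmdval))
		swp_pmd = pmd_swp_mksoft_dirty(swp_pmd);
	set_pmd_at(mm, address, pvmw->pmd, swp_pmd);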
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 11 +++++++++++ mm/huge_memory.c | 30 ++++++++++++++++++++++++++++ mm/rmap.c | 43 ++++++++++++++++++++++++++++++++++++++++- mm/vmscan.c | 6 +----- 4 files changed, 84 insertions(+), 6 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 260357fc9d76..06e4fde57a0f 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -375,12 +375,16 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +struct page_vma_mapped_walk; + #ifdef CONFIG_THP_SWAP extern void __split_huge_swap_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd); extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd); extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); +extern bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, + struct page *page, unsigned long address, pmd_t pmdval); static inline bool transparent_hugepage_swapin_enabled( struct vm_area_struct *vma) @@ -421,6 +425,13 @@ static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) return 0; } +static inline bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, + struct page *page, unsigned long address, + pmd_t pmdval) +{ + return false; +} + static inline bool transparent_hugepage_swapin_enabled( struct vm_area_struct *vma) { diff --git a/mm/huge_memory.c b/mm/huge_memory.c index b75af88c505a..27de3e547dc0 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1938,6 +1938,36 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) count_vm_event(THP_SWPIN_FALLBACK); goto fallback; } + +bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, struct page *page, + unsigned long address, pmd_t pmdval) +{ + struct vm_area_struct *vma = pvmw->vma; + struct mm_struct *mm = vma->vm_mm; + pmd_t swp_pmd; + swp_entry_t entry = { .val = page_private(page) }; + + if (swap_duplicate(&entry, HPAGE_PMD_NR) < 0) { + set_pmd_at(mm, address, pvmw->pmd, pmdval); + return false; + } + if (list_empty(&mm->mmlist)) { + spin_lock(&mmlist_lock); + if (list_empty(&mm->mmlist)) + list_add(&mm->mmlist, &init_mm.mmlist); + spin_unlock(&mmlist_lock); + } + add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR); + add_mm_counter(mm, MM_SWAPENTS, HPAGE_PMD_NR); + swp_pmd = swp_entry_to_pmd(entry); + if (pmd_soft_dirty(pmdval)) + swp_pmd = pmd_swp_mksoft_dirty(swp_pmd); + set_pmd_at(mm, address, pvmw->pmd, swp_pmd); + + page_remove_rmap(page, true); + put_page(page); + return true; +} #endif static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) diff --git a/mm/rmap.c b/mm/rmap.c index a488d325946d..b7ea50d563a3 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1413,11 +1413,52 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, continue; } + address = pvmw.address; + +#ifdef CONFIG_THP_SWAP + /* PMD-mapped THP swap entry */ + if (IS_ENABLED(CONFIG_THP_SWAP) && + !pvmw.pte && PageAnon(page)) { + pmd_t pmdval; + + VM_BUG_ON_PAGE(PageHuge(page) || + !PageTransCompound(page), page); + + flush_cache_range(vma, address, + address + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(mm, address, + address + HPAGE_PMD_SIZE); + if (should_defer_flush(mm, flags)) { + /* check comments for PTE below */ + 
+				pmdval = pmdp_huge_get_and_clear(mm, address,
+								 pvmw.pmd);
+				set_tlb_ubc_flush_pending(mm,
+							  pmd_dirty(pmdval));
+			} else
+				pmdval = pmdp_huge_clear_flush(vma, address,
+							       pvmw.pmd);
+
+			/*
+			 * Move the dirty bit to the page. Now the pmd
+			 * is gone.
+			 */
+			if (pmd_dirty(pmdval))
+				set_page_dirty(page);
+
+			/* Update high watermark before we lower rss */
+			update_hiwater_rss(mm);
+
+			ret = set_pmd_swap_entry(&pvmw, page, address, pmdval);
+			mmu_notifier_invalidate_range_end(mm, address,
+					address + HPAGE_PMD_SIZE);
+			continue;
+		}
+#endif
+
 		/* Unexpected PMD-mapped THP? */
 		VM_BUG_ON_PAGE(!pvmw.pte, page);

 		subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
-		address = pvmw.address;

 		if (PageHuge(page)) {
 			if (huge_pmd_unshare(mm, &address, pvmw.pte)) {

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7a67923c09b3..c6af95025fb1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1340,11 +1340,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * processes. Try to unmap it here.
 		 */
 		if (page_mapped(page)) {
-			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
-
-			if (unlikely(PageTransHuge(page)))
-				flags |= TTU_SPLIT_HUGE_PMD;
-			if (!try_to_unmap(page, flags)) {
+			if (!try_to_unmap(page, ttu_flags | TTU_BATCH_FLUSH)) {
 				nr_unmap_fail++;
 				goto activate_locked;
 			}
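The mm/vmscan.c hunk is where the behavioral change becomes visible at
the reclaim level: previously a PMD-mapped THP was always unmapped with
TTU_SPLIT_HUGE_PMD, forcing a split, whereas now reclaim unmaps it
directly and try_to_unmap_one() installs the PMD swap mapping in place.
Side by side (condensed from the hunk above):

	/* Before: force the PMD mapping to be split first. */
	enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
	if (unlikely(PageTransHuge(page)))
		flags |= TTU_SPLIT_HUGE_PMD;

	/*
	 * After: unmap directly; for an anonymous THP the rmap code
	 * now replaces the PMD mapping with a PMD swap mapping.
	 */
	if (!try_to_unmap(page, ttu_flags | TTU_BATCH_FLUSH))
		goto activate_locked;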
From patchwork Fri Dec 7 05:41:21 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717481
From: Huang Ying
To: Andrew Morton
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan , Dan Williams Subject: [PATCH -V8 21/21] swap: Update help of CONFIG_THP_SWAP Date: Fri, 7 Dec 2018 13:41:21 +0800 Message-Id: <20181207054122.27822-22-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com> References: <20181207054122.27822-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP The help of CONFIG_THP_SWAP is updated to reflect the latest progress of THP (Tranparent Huge Page) swap optimization. Signed-off-by: "Huang, Ying" Reviewed-by: Dan Williams Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/Kconfig | 2 -- 1 file changed, 2 deletions(-) diff --git a/mm/Kconfig b/mm/Kconfig index d7c5299c5b7d..d397baa92a9b 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -417,8 +417,6 @@ config THP_SWAP depends on TRANSPARENT_HUGEPAGE && ARCH_WANTS_THP_SWAP && SWAP help Swap transparent huge pages in one piece, without splitting. - XXX: For now, swap cluster backing transparent huge page - will be split after swapout. For selection by architectures with reasonable THP sizes.