From patchwork Wed Oct 10 07:19:04 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10634097
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V6 01/21] swap: Enable PMD swap operations for CONFIG_THP_SWAP
Date: Wed, 10 Oct 2018 15:19:04 +0800
Message-Id: <20181010071924.18767-2-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

Currently, the "swap entry" in the page tables is used for a number of
things beyond actual swap, such as page migration.  We already support
the THP/PMD "swap entry" for page migration, and the functions behind
this are tied to page migration's config option
(CONFIG_ARCH_ENABLE_THP_MIGRATION).  But we also need them for the THP
swap optimization.  So a new config option (CONFIG_HAVE_PMD_SWAP_ENTRY)
is added.  It is enabled when either CONFIG_ARCH_ENABLE_THP_MIGRATION or
CONFIG_THP_SWAP is enabled, and the PMD swap entry functions are tied to
this new config option instead.
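For orientation, a minimal sketch of how a consumer of the combined
option might look (hypothetical code, not part of the patch;
pmd_to_swp_entry() and is_migration_entry() are the real helpers):

	/*
	 * Hypothetical consumer sketch, not from this patch: with
	 * CONFIG_HAVE_PMD_SWAP_ENTRY enabled, code that finds a
	 * non-present PMD can convert it once and then ask what kind
	 * of "swap entry" it is.
	 */
	static void handle_nonpresent_pmd(pmd_t pmd)
	{
		swp_entry_t entry = pmd_to_swp_entry(pmd);

		if (is_migration_entry(entry)) {
			/* THP migration entry (CONFIG_ARCH_ENABLE_THP_MIGRATION) */
		} else {
			/* a PMD swap mapping proper (CONFIG_THP_SWAP, later patches) */
		}
	}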
Some functions enabled by CONFIG_ARCH_ENABLE_THP_MIGRATION are used for
page migration only; they remain enabled only for that option.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 arch/x86/include/asm/pgtable.h |  2 +-
 include/asm-generic/pgtable.h  |  2 +-
 include/linux/swapops.h        | 44 ++++++++++++++++++++++--------------------
 mm/Kconfig                     |  8 ++++++++
 4 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 40616e805292..e830ab345551 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1333,7 +1333,7 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
 	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
 }
 
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
 {
 	return pmd_set_flags(pmd, _PAGE_SWP_SOFT_DIRTY);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 5657a20e0c59..eb1e9d17371b 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -675,7 +675,7 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm,
 #endif
 
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
-#ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#ifndef CONFIG_HAVE_PMD_SWAP_ENTRY
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
 {
 	return pmd;
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 4d961668e5fc..905ddc65caa3 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -254,17 +254,7 @@ static inline int is_write_migration_entry(swp_entry_t entry)
 
 #endif
 
-struct page_vma_mapped_walk;
-
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
-		struct page *page);
-
-extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
-		struct page *new);
-
-extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
-
+#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY
 static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
 {
 	swp_entry_t arch_entry;
@@ -282,6 +272,28 @@ static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
 	arch_entry = __swp_entry(swp_type(entry), swp_offset(entry));
 	return __swp_entry_to_pmd(arch_entry);
 }
+#else
+static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
+{
+	return swp_entry(0, 0);
+}
+
+static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
+{
+	return __pmd(0);
+}
+#endif
+
+struct page_vma_mapped_walk;
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
+		struct page *page);
+
+extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
+		struct page *new);
+
+extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
 
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
@@ -302,16 +314,6 @@ static inline void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
 static inline void pmd_migration_entry_wait(struct mm_struct *m, pmd_t *p)
 {
 }
 
-static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
-{
-	return swp_entry(0, 0);
-}
-
-static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
-{
-	return __pmd(0);
-}
-
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
 	return 0;
diff --git a/mm/Kconfig b/mm/Kconfig
index b1006cdf3aff..44f7d72010fd 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -424,6 +424,14 @@ config THP_SWAP
 
 	  For selection by architectures with reasonable THP sizes.
 
+#
+# "PMD swap entry" in the page table is used both for migration and
+# actual swap.
+#
+config HAVE_PMD_SWAP_ENTRY
+	def_bool y
+	depends on THP_SWAP || ARCH_ENABLE_THP_MIGRATION
+
 config TRANSPARENT_HUGE_PAGECACHE
 	def_bool y
 	depends on TRANSPARENT_HUGEPAGE

From patchwork Wed Oct 10 07:19:05 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10634101
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V6 02/21] swap: Add __swap_duplicate_locked()
Date: Wed, 10 Oct 2018 15:19:05 +0800
Message-Id: <20181010071924.18767-3-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

The part of __swap_duplicate() that runs with the lock held is
separated out into a new function, __swap_duplicate_locked().
This is because we will add more logic for PMD swap mappings to
__swap_duplicate(), while keeping most of the PTE swap mapping logic in
__swap_duplicate_locked().  This is purely mechanical code refactoring;
there is no functional change in this patch.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/swapfile.c | 63 +++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 35 insertions(+), 28 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 97a1bd1a7c9a..6a570ef00fa7 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3436,32 +3436,12 @@ void si_swapinfo(struct sysinfo *val)
 	spin_unlock(&swap_lock);
 }
 
-/*
- * Verify that a swap entry is valid and increment its swap map count.
- *
- * Returns error code in following case.
- * - success -> 0
- * - swp_entry is invalid -> EINVAL
- * - swp_entry is migration entry -> EINVAL
- * - swap-cache reference is requested but there is already one. -> EEXIST
- * - swap-cache reference is requested but the entry is not used. -> ENOENT
- * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
- */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+static int __swap_duplicate_locked(struct swap_info_struct *p,
+				   unsigned long offset, unsigned char usage)
 {
-	struct swap_info_struct *p;
-	struct swap_cluster_info *ci;
-	unsigned long offset;
 	unsigned char count;
 	unsigned char has_cache;
-	int err = -EINVAL;
-
-	p = get_swap_device(entry);
-	if (!p)
-		goto out;
-
-	offset = swp_offset(entry);
-	ci = lock_cluster_or_swap_info(p, offset);
+	int err = 0;
 
 	count = p->swap_map[offset];
 
@@ -3471,12 +3451,11 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 	 */
 	if (unlikely(swap_count(count) == SWAP_MAP_BAD)) {
 		err = -ENOENT;
-		goto unlock_out;
+		goto out;
 	}
 
 	has_cache = count & SWAP_HAS_CACHE;
 	count &= ~SWAP_HAS_CACHE;
-	err = 0;
 
 	if (usage == SWAP_HAS_CACHE) {
 
@@ -3503,11 +3482,39 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 
 	p->swap_map[offset] = count | has_cache;
 
-unlock_out:
+out:
+	return err;
+}
+
+/*
+ * Verify that a swap entry is valid and increment its swap map count.
+ *
+ * Returns error code in following case.
+ * - success -> 0
+ * - swp_entry is invalid -> EINVAL
+ * - swp_entry is migration entry -> EINVAL
+ * - swap-cache reference is requested but there is already one. -> EEXIST
+ * - swap-cache reference is requested but the entry is not used. -> ENOENT
+ * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
+ */
+static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+{
+	struct swap_info_struct *p;
+	struct swap_cluster_info *ci;
+	unsigned long offset;
+	int err = -EINVAL;
+
+	p = get_swap_device(entry);
+	if (!p)
+		goto out;
+
+	offset = swp_offset(entry);
+	ci = lock_cluster_or_swap_info(p, offset);
+	err = __swap_duplicate_locked(p, offset, usage);
 	unlock_cluster_or_swap_info(p, ci);
+
+	put_swap_device(p);
 out:
-	if (p)
-		put_swap_device(p);
 	return err;
 }

From patchwork Wed Oct 10 07:19:06 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10634103
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V6 03/21] swap: Support PMD swap mapping in swap_duplicate()
Date: Wed, 10 Oct 2018 15:19:06 +0800
Message-Id: <20181010071924.18767-4-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

To support swapping in a THP in one piece, we need to create PMD swap
mappings during swapout and maintain the PMD swap mapping count.
This patch implements support for increasing the PMD swap mapping count
(for swapout, fork, etc.) and for setting the SWAP_HAS_CACHE flag (for
swapin, etc.) on a huge swap cluster in the swap_duplicate() function
family.  Although it implements only part of the design of swap
reference counting with PMD swap mappings, the whole design is described
below to make the patch and the overall picture easier to understand.

A huge swap cluster is used to hold the contents of a swapped-out THP.
After swapout, a PMD page mapping of the THP becomes a PMD swap mapping
of the huge swap cluster via a swap entry in the PMD, while a PTE page
mapping of a subpage of the THP becomes a PTE swap mapping of a swap
slot in the huge swap cluster via a swap entry in the PTE.  If there are
no PMD swap mappings and the corresponding THP is removed from the page
cache (reclaimed), the huge swap cluster is split and becomes a normal
swap cluster.

The count (cluster_count()) of the huge swap cluster is SWAPFILE_CLUSTER
(= HPAGE_PMD_NR) + the PMD swap mapping count.  Because every swap slot
in the huge swap cluster is mapped by a PTE or a PMD, or has the
SWAP_HAS_CACHE bit set, the usage count of the swap cluster is
HPAGE_PMD_NR, and the PMD swap mapping count is recorded on top of that
to make it easy to determine whether any PMD swap mappings remain.

The count in swap_map[offset] is the sum of the PTE and PMD swap mapping
counts.  This means that when we increase the PMD swap mapping count, we
need to increase swap_map[offset] for all swap slots inside the swap
cluster.  An alternative would be to make swap_map[offset] record the
PTE swap map count only, given that the PMD swap mapping count is
already recorded in the count of the huge swap cluster.  But that would
require increasing swap_map[offset] when splitting a PMD swap mapping,
which may fail because of the memory allocation for swap count
continuation and is hard to deal with.  So we chose the current
solution.

A PMD swap mapping of a huge swap cluster may be split, for example when
unmapping part of the PMD mapping.  That is easy, because only the count
of the huge swap cluster needs to be changed.  When the last PMD swap
mapping is gone and SWAP_HAS_CACHE is unset, we split the huge swap
cluster (clear the huge flag).  This makes it easy to reason about the
cluster state.

A huge swap cluster is split when splitting the THP in swap cache, when
failing to allocate a THP during swapin, etc.  But when splitting the
huge swap cluster, we do not try to split all of its PMD swap mappings,
because we sometimes do not have enough information for that.  Later,
when such a PMD swap mapping is duplicated, swapped in, etc., it will be
split and fall back to the PTE operations.

When a THP is added into swap cache, the SWAP_HAS_CACHE flag is set in
swap_map[offset] of all swap slots inside the huge swap cluster backing
the THP.  This huge swap cluster will not be split unless the THP is
split, even if its PMD swap mapping count drops to 0.  Later, when the
THP is removed from swap cache, the SWAP_HAS_CACHE flag is cleared in
swap_map[offset] of all swap slots inside the huge swap cluster, and the
huge swap cluster is split if its PMD swap mapping count is 0.

The first parameter of swap_duplicate() is changed to a pointer so that
the function can return the swap entry to call
add_swap_count_continuation() for, because we may need to call it for a
swap entry in the middle of a huge swap cluster.
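To make the counting rules above concrete, here is a hypothetical worked
example (illustrative numbers only, assuming SWAPFILE_CLUSTER ==
HPAGE_PMD_NR == 512 as on x86_64; the scenario is not taken from the
patch itself):

	/*
	 * A swapped-out THP backed by a huge swap cluster starting at
	 * offset "base", mapped by two PMD swap mappings; subpage 3 is
	 * additionally mapped by one PTE swap mapping:
	 *
	 *   cluster_count(ci)     == 512 + 2  (SWAPFILE_CLUSTER + PMD count)
	 *   cluster_swapcount(ci) == 2        (the PMD swap mapping count)
	 *   swap_map[base + i]    == 2        (PMD count only, for i != 3)
	 *   swap_map[base + 3]    == 3        (PMD count + PTE count)
	 *
	 * Dropping both PMD swap mappings brings cluster_count(ci) back
	 * to 512 and swap_map[base + i] back to 0 (1 for subpage 3).
	 */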
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/swap.h | 9 +++-- mm/memory.c | 2 +- mm/rmap.c | 2 +- mm/swap_state.c | 2 +- mm/swapfile.c | 109 ++++++++++++++++++++++++++++++++++++++++++--------- 5 files changed, 99 insertions(+), 25 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 34de0d8bf4fa..984a652b9925 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -446,8 +446,8 @@ extern swp_entry_t get_swap_page_of_type(int); extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size); extern int add_swap_count_continuation(swp_entry_t, gfp_t); extern void swap_shmem_alloc(swp_entry_t); -extern int swap_duplicate(swp_entry_t); -extern int swapcache_prepare(swp_entry_t); +extern int swap_duplicate(swp_entry_t *entry, int entry_size); +extern int swapcache_prepare(swp_entry_t entry, int entry_size); extern void swap_free(swp_entry_t); extern void swapcache_free_entries(swp_entry_t *entries, int n); extern int free_swap_and_cache(swp_entry_t); @@ -505,7 +505,8 @@ static inline void show_swap_cache_info(void) } #define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) -#define swapcache_prepare(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) +#define swapcache_prepare(e, s) \ + ({(is_migration_entry(e) || is_device_private_entry(e)); }) static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask) { @@ -516,7 +517,7 @@ static inline void swap_shmem_alloc(swp_entry_t swp) { } -static inline int swap_duplicate(swp_entry_t swp) +static inline int swap_duplicate(swp_entry_t *swp, int entry_size) { return 0; } diff --git a/mm/memory.c b/mm/memory.c index fa8894c70575..207e90717305 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -709,7 +709,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, swp_entry_t entry = pte_to_swp_entry(pte); if (likely(!non_swap_entry(entry))) { - if (swap_duplicate(entry) < 0) + if (swap_duplicate(&entry, 1) < 0) return entry.val; /* make sure dst_mm is on swapoff's mmlist. */ diff --git a/mm/rmap.c b/mm/rmap.c index 1e79fac3186b..3bb4be720bc0 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1598,7 +1598,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, break; } - if (swap_duplicate(entry) < 0) { + if (swap_duplicate(&entry, 1) < 0) { set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw); diff --git a/mm/swap_state.c b/mm/swap_state.c index 8f53d2a80253..bca34fc7a5e5 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -402,7 +402,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* * Swap entry may have been freed since our caller observed it. */ - err = swapcache_prepare(entry); + err = swapcache_prepare(entry, 1); if (err == -EEXIST) { /* * We might race against get_swap_page() and stumble diff --git a/mm/swapfile.c b/mm/swapfile.c index 6a570ef00fa7..a5a1ab46dab7 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -534,6 +534,40 @@ static void dec_cluster_info_page(struct swap_info_struct *p, free_cluster(p, idx); } +/* + * When swapout a THP in one piece, PMD page mappings to THP are + * replaced by PMD swap mappings to the corresponding swap cluster. + * cluster_swapcount() returns the PMD swap mapping count. 
+ *
+ * cluster_count() = PMD swap mapping count + count of allocated swap
+ * entries in cluster.  If a cluster is mapped by PMD, all swap
+ * entries inside is used, so here cluster_count() = PMD swap mapping
+ * count + SWAPFILE_CLUSTER.
+ */
+static inline int cluster_swapcount(struct swap_cluster_info *ci)
+{
+	VM_BUG_ON(!cluster_is_huge(ci) || cluster_count(ci) < SWAPFILE_CLUSTER);
+	return cluster_count(ci) - SWAPFILE_CLUSTER;
+}
+
+/*
+ * Set PMD swap mapping count for the huge cluster
+ */
+static inline void cluster_set_swapcount(struct swap_cluster_info *ci,
+					 unsigned int count)
+{
+	VM_BUG_ON(!cluster_is_huge(ci) || cluster_count(ci) < SWAPFILE_CLUSTER);
+	cluster_set_count(ci, SWAPFILE_CLUSTER + count);
+}
+
+static inline void cluster_add_swapcount(struct swap_cluster_info *ci, int add)
+{
+	int count = cluster_swapcount(ci) + add;
+
+	VM_BUG_ON(count < 0);
+	cluster_set_swapcount(ci, count);
+}
+
 /*
  * It's possible scan_swap_map() uses a free cluster in the middle of free
  * cluster list. Avoiding such abuse to avoid list corruption.
@@ -3487,35 +3521,66 @@ static int __swap_duplicate_locked(struct swap_info_struct *p,
 }
 
 /*
- * Verify that a swap entry is valid and increment its swap map count.
+ * Verify that the swap entries from *entry is valid and increment their
+ * PMD/PTE swap mapping count.
  *
  * Returns error code in following case.
  * - success -> 0
  * - swp_entry is invalid -> EINVAL
- * - swp_entry is migration entry -> EINVAL
 * - swap-cache reference is requested but there is already one. -> EEXIST
 * - swap-cache reference is requested but the entry is not used. -> ENOENT
 * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
+ * - the huge swap cluster has been split. -> ENOTDIR
 */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+static int __swap_duplicate(swp_entry_t *entry, int entry_size,
+			    unsigned char usage)
 {
 	struct swap_info_struct *p;
 	struct swap_cluster_info *ci;
 	unsigned long offset;
 	int err = -EINVAL;
+	int i, size = swap_entry_size(entry_size);
 
-	p = get_swap_device(entry);
+	p = get_swap_device(*entry);
 	if (!p)
 		goto out;
 
-	offset = swp_offset(entry);
+	offset = swp_offset(*entry);
 	ci = lock_cluster_or_swap_info(p, offset);
-	err = __swap_duplicate_locked(p, offset, usage);
+	if (size == SWAPFILE_CLUSTER) {
+		/*
+		 * The huge swap cluster has been split, for example, failed to
+		 * allocate huge page during swapin, the caller should split
+		 * the PMD swap mapping and operate on normal swap entries.
+		 */
+		if (!cluster_is_huge(ci)) {
+			err = -ENOTDIR;
+			goto unlock;
+		}
+		VM_BUG_ON(!IS_ALIGNED(offset, size));
+		/* If cluster is huge, all swap entries inside is in-use */
+		VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER);
+	}
+	/* p->swap_map[] = PMD swap map count + PTE swap map count */
+	for (i = 0; i < size; i++) {
+		err = __swap_duplicate_locked(p, offset + i, usage);
+		if (err && size != 1) {
+			*entry = swp_entry(p->type, offset + i);
+			goto undup;
+		}
+	}
+	if (size == SWAPFILE_CLUSTER && usage == 1)
+		cluster_add_swapcount(ci, usage);
+unlock:
 	unlock_cluster_or_swap_info(p, ci);
 
 	put_swap_device(p);
 out:
 	return err;
+undup:
+	for (i--; i >= 0; i--)
+		__swap_entry_free_locked(p, offset + i, usage);
+	goto unlock;
 }
 
 /*
@@ -3524,36 +3589,44 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
  */
 void swap_shmem_alloc(swp_entry_t entry)
 {
-	__swap_duplicate(entry, SWAP_MAP_SHMEM);
+	__swap_duplicate(&entry, 1, SWAP_MAP_SHMEM);
 }
 
 /*
  * Increase reference count of swap entry by 1.
- * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
- * but could not be atomically allocated.  Returns 0, just as if it succeeded,
- * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
- * might occur if a page table entry has got corrupted.
+ *
+ * Return error code in following case.
+ * - success -> 0
+ * - swap_count_continuation is required but could not be atomically allocated.
+ *   *entry is used to return swap entry to call add_swap_count_continuation().
+ *								-> ENOMEM
+ * - otherwise same as __swap_duplicate()
  */
-int swap_duplicate(swp_entry_t entry)
+int swap_duplicate(swp_entry_t *entry, int entry_size)
 {
 	int err = 0;
 
-	while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
-		err = add_swap_count_continuation(entry, GFP_ATOMIC);
+	while (!err &&
+	       (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM)
+		err = add_swap_count_continuation(*entry, GFP_ATOMIC);
+	/* If kernel works correctly, other errno is impossible */
+	VM_BUG_ON(err && err != -ENOMEM && err != -ENOTDIR);
 	return err;
 }
 
 /*
  * @entry: swap entry for which we allocate swap cache.
+ * @entry_size: size of the swap entry, 1 or SWAPFILE_CLUSTER
  *
  * Called when allocating swap cache for existing swap entry,
 * This can return error codes. Returns 0 at success.
- * -EBUSY means there is a swap cache.
- * Note: return code is different from swap_duplicate().
+ * -EINVAL means the swap device has been swapoff.
+ * -EEXIST means there is a swap cache.
+ * Otherwise same as __swap_duplicate()
  */
-int swapcache_prepare(swp_entry_t entry)
+int swapcache_prepare(swp_entry_t entry, int entry_size)
 {
-	return __swap_duplicate(entry, SWAP_HAS_CACHE);
+	return __swap_duplicate(&entry, entry_size, SWAP_HAS_CACHE);
 }
 
 struct swap_info_struct *swp_swap_info(swp_entry_t entry)

From patchwork Wed Oct 10 07:19:07 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10634107
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V6 04/21] swap: Support PMD swap mapping in put_swap_page()
Date: Wed, 10 Oct 2018 15:19:07 +0800
Message-Id: <20181010071924.18767-5-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

Previously, during swapout, all PMD page mappings were split and
replaced with PTE swap mappings, and the huge swap cluster was split
when the SWAP_HAS_CACHE flag was cleared for it in put_swap_page().
Now, during swapout, the PMD page mappings of a THP are changed into
PMD swap mappings of the corresponding swap cluster.  So when clearing
the SWAP_HAS_CACHE flag, the huge swap cluster is split only if its PMD
swap mapping count is 0; otherwise we keep it as a huge swap cluster,
so that the THP can later be swapped in again in one piece.
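As a condensed sketch, the new rule in put_swap_page() amounts to the
following (this paraphrases the diff below; locking and the actual
freeing are elided):

	/* Sketch of the decision only, not the full function. */
	if (size == SWAPFILE_CLUSTER && !cluster_swapcount(ci)) {
		/* No PMD swap mappings remain: if some swap entries
		 * are still in use, split the huge cluster ... */
		if (free_entries != SWAPFILE_CLUSTER)
			cluster_clear_huge(ci);
		/* ... otherwise free the whole cluster at once. */
	}
	/* If PMD swap mappings remain, the cluster stays huge so the
	 * THP can be swapped in again in one piece. */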
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/swapfile.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index a5a1ab46dab7..45c12abcb467 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1314,6 +1314,15 @@ void swap_free(swp_entry_t entry)
 
 /*
  * Called after dropping swapcache to decrease refcnt to swap entries.
+ *
+ * When a THP is added into swap cache, the SWAP_HAS_CACHE flag will
+ * be set in the swap_map[] of all swap entries in the huge swap
+ * cluster backing the THP.  This huge swap cluster will not be split
+ * unless the THP is split even if its PMD swap mapping count dropped
+ * to 0.  Later, when the THP is removed from swap cache, the
+ * SWAP_HAS_CACHE flag will be cleared in the swap_map[] of all swap
+ * entries in the huge swap cluster.  And this huge swap cluster will
+ * be split if its PMD swap mapping count is 0.
  */
 void put_swap_page(struct page *page, swp_entry_t entry)
 {
@@ -1332,15 +1341,23 @@ void put_swap_page(struct page *page, swp_entry_t entry)
 
 	ci = lock_cluster_or_swap_info(si, offset);
 	if (size == SWAPFILE_CLUSTER) {
-		VM_BUG_ON(!cluster_is_huge(ci));
+		VM_BUG_ON(!IS_ALIGNED(offset, size));
 		map = si->swap_map + offset;
-		for (i = 0; i < SWAPFILE_CLUSTER; i++) {
-			val = map[i];
-			VM_BUG_ON(!(val & SWAP_HAS_CACHE));
-			if (val == SWAP_HAS_CACHE)
-				free_entries++;
+		/*
+		 * No PMD swap mapping, the swap cluster will be freed
+		 * if all swap entries becoming free, otherwise the
+		 * huge swap cluster will be split.
+		 */
+		if (!cluster_swapcount(ci)) {
+			for (i = 0; i < SWAPFILE_CLUSTER; i++) {
+				val = map[i];
+				VM_BUG_ON(!(val & SWAP_HAS_CACHE));
+				if (val == SWAP_HAS_CACHE)
+					free_entries++;
+			}
+			if (free_entries != SWAPFILE_CLUSTER)
+				cluster_clear_huge(ci);
 		}
-		cluster_clear_huge(ci);
 		if (free_entries == SWAPFILE_CLUSTER) {
 			unlock_cluster_or_swap_info(si, ci);
 			spin_lock(&si->lock);

From patchwork Wed Oct 10 07:19:08 2018
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10634117
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V6 05/21] swap: Support PMD swap mapping in
 free_swap_and_cache()/swap_free()
Date: Wed, 10 Oct 2018 15:19:08 +0800
Message-Id: <20181010071924.18767-6-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V6 05/21] swap: Support PMD swap mapping in free_swap_and_cache()/swap_free() Date: Wed, 10 Oct 2018 15:19:08 +0800 Message-Id: <20181010071924.18767-6-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com> References: <20181010071924.18767-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP When a PMD swap mapping is removed from a huge swap cluster, for example, unmap a memory range mapped with PMD swap mapping, etc, free_swap_and_cache() will be called to decrease the reference count to the huge swap cluster. free_swap_and_cache() may also free or split the huge swap cluster, and free the corresponding THP in swap cache if necessary. swap_free() is similar, and shares most implementation with free_swap_and_cache(). This patch revises free_swap_and_cache() and swap_free() to implement this. If the swap cluster has been split already, for example, because of failing to allocate a THP during swapin, we just decrease one from the reference count of all swap slots. Otherwise, we will decrease one from the reference count of all swap slots and the PMD swap mapping count in cluster_count(). When the corresponding THP isn't in swap cache, if PMD swap mapping count becomes 0, the huge swap cluster will be split, and if all swap count becomes 0, the huge swap cluster will be freed. When the corresponding THP is in swap cache, if every swap_map[offset] == SWAP_HAS_CACHE, we will try to delete the THP from swap cache. Which will cause the THP and the huge swap cluster be freed. Signed-off-by: "Huang, Ying" Cc: "Kirill A. 
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- arch/s390/mm/pgtable.c | 2 +- include/linux/swap.h | 9 +-- kernel/power/swap.c | 4 +- mm/madvise.c | 2 +- mm/memory.c | 4 +- mm/shmem.c | 6 +- mm/swapfile.c | 171 ++++++++++++++++++++++++++++++++++++++----------- 7 files changed, 149 insertions(+), 49 deletions(-) diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index f2cc7da473e4..ffd4b68adbb3 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -675,7 +675,7 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry) dec_mm_counter(mm, mm_counter(page)); } - free_swap_and_cache(entry); + free_swap_and_cache(entry, 1); } void ptep_zap_unused(struct mm_struct *mm, unsigned long addr, diff --git a/include/linux/swap.h b/include/linux/swap.h index 984a652b9925..e79d7aead142 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -448,9 +448,9 @@ extern int add_swap_count_continuation(swp_entry_t, gfp_t); extern void swap_shmem_alloc(swp_entry_t); extern int swap_duplicate(swp_entry_t *entry, int entry_size); extern int swapcache_prepare(swp_entry_t entry, int entry_size); -extern void swap_free(swp_entry_t); +extern void swap_free(swp_entry_t entry, int entry_size); extern void swapcache_free_entries(swp_entry_t *entries, int n); -extern int free_swap_and_cache(swp_entry_t); +extern int free_swap_and_cache(swp_entry_t entry, int entry_size); extern int swap_type_of(dev_t, sector_t, struct block_device **); extern unsigned int count_swap_pages(int, int); extern sector_t map_swap_page(struct page *, struct block_device **); @@ -504,7 +504,8 @@ static inline void show_swap_cache_info(void) { } -#define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) +#define free_swap_and_cache(e, s) \ + ({(is_migration_entry(e) || is_device_private_entry(e)); }) #define swapcache_prepare(e, s) \ ({(is_migration_entry(e) || is_device_private_entry(e)); }) @@ -522,7 +523,7 @@ static inline int swap_duplicate(swp_entry_t *swp, int entry_size) return 0; } -static inline void swap_free(swp_entry_t swp) +static inline void swap_free(swp_entry_t swp, int entry_size) { } diff --git a/kernel/power/swap.c b/kernel/power/swap.c index d7f6c1a288d3..0275df84ed3d 100644 --- a/kernel/power/swap.c +++ b/kernel/power/swap.c @@ -182,7 +182,7 @@ sector_t alloc_swapdev_block(int swap) offset = swp_offset(get_swap_page_of_type(swap)); if (offset) { if (swsusp_extents_insert(offset)) - swap_free(swp_entry(swap, offset)); + swap_free(swp_entry(swap, offset), 1); else return swapdev_block(swap, offset); } @@ -206,7 +206,7 @@ void free_all_swap_pages(int swap) ext = rb_entry(node, struct swsusp_extent, node); rb_erase(node, &swsusp_extents); for (offset = ext->start; offset <= ext->end; offset++) - swap_free(swp_entry(swap, offset)); + swap_free(swp_entry(swap, offset), 1); kfree(ext); } diff --git a/mm/madvise.c b/mm/madvise.c index 9d802566c494..50282ba862e2 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -349,7 +349,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, if (non_swap_entry(entry)) continue; nr_swap--; - free_swap_and_cache(entry); + free_swap_and_cache(entry, 1); pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); continue; } diff --git a/mm/memory.c b/mm/memory.c index 207e90717305..17895a347056 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1134,7 +1134,7 @@ 
static unsigned long zap_pte_range(struct mmu_gather *tlb, page = migration_entry_to_page(entry); rss[mm_counter(page)]--; } - if (unlikely(!free_swap_and_cache(entry))) + if (unlikely(!free_swap_and_cache(entry, 1))) print_bad_pte(vma, addr, ptent, NULL); pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); } while (pte++, addr += PAGE_SIZE, addr != end); @@ -2823,7 +2823,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) } set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte); - swap_free(entry); + swap_free(entry, 1); if (mem_cgroup_swap_full(page) || (vma->vm_flags & VM_LOCKED) || PageMlocked(page)) try_to_free_swap(page); diff --git a/mm/shmem.c b/mm/shmem.c index a6964ba74d50..3cc1d58a534f 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -662,7 +662,7 @@ static int shmem_free_swap(struct address_space *mapping, xa_unlock_irq(&mapping->i_pages); if (old != radswap) return -ENOENT; - free_swap_and_cache(radix_to_swp_entry(radswap)); + free_swap_and_cache(radix_to_swp_entry(radswap), 1); return 0; } @@ -1180,7 +1180,7 @@ static int shmem_unuse_inode(struct shmem_inode_info *info, spin_lock_irq(&info->lock); info->swapped--; spin_unlock_irq(&info->lock); - swap_free(swap); + swap_free(swap, 1); } } return error; @@ -1712,7 +1712,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index, delete_from_swap_cache(page); set_page_dirty(page); - swap_free(swap); + swap_free(swap, 1); } else { if (vma && userfaultfd_missing(vma)) { diff --git a/mm/swapfile.c b/mm/swapfile.c index 45c12abcb467..8d8803103543 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -49,6 +49,9 @@ static bool swap_count_continued(struct swap_info_struct *, pgoff_t, unsigned char); static void free_swap_count_continuations(struct swap_info_struct *); static sector_t map_swap_entry(swp_entry_t, struct block_device**); +static bool __swap_page_trans_huge_swapped(struct swap_info_struct *si, + struct swap_cluster_info *ci, + unsigned long offset); DEFINE_SPINLOCK(swap_lock); static unsigned int nr_swapfiles; @@ -1267,19 +1270,106 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry) return NULL; } -static unsigned char __swap_entry_free(struct swap_info_struct *p, - swp_entry_t entry, unsigned char usage) +#define SF_FREE_CACHE 0x1 + +static void __swap_free(struct swap_info_struct *p, swp_entry_t entry, + int entry_size, unsigned long flags) { struct swap_cluster_info *ci; unsigned long offset = swp_offset(entry); + int i, free_entries = 0, cache_only = 0; + int size = swap_entry_size(entry_size); + unsigned char *map, count; ci = lock_cluster_or_swap_info(p, offset); - usage = __swap_entry_free_locked(p, offset, usage); + VM_BUG_ON(!IS_ALIGNED(offset, size)); + /* + * Normal swap entry or huge swap cluster has been split, free + * each swap entry + */ + if (size == 1 || !cluster_is_huge(ci)) { + for (i = 0; i < size; i++, entry.val++) { + count = __swap_entry_free_locked(p, offset + i, 1); + if (!count || + (flags & SF_FREE_CACHE && + count == SWAP_HAS_CACHE && + !__swap_page_trans_huge_swapped(p, ci, + offset + i))) { + unlock_cluster_or_swap_info(p, ci); + if (!count) + free_swap_slot(entry); + else + __try_to_reclaim_swap(p, offset + i, + TTRS_UNMAPPED | TTRS_FULL); + if (i == size - 1) + return; + lock_cluster_or_swap_info(p, offset); + } + } + unlock_cluster_or_swap_info(p, ci); + return; + } + /* + * Return for normal swap entry above, the following code is + * for huge swap cluster only. + */ + cluster_add_swapcount(ci, -1); + /* + * Decrease mapping count for each swap entry in cluster. 
+ * Because PMD swap mapping is counted in p->swap_map[] too. + */ + map = p->swap_map + offset; + for (i = 0; i < size; i++) { + /* + * Mark swap entries to become free as SWAP_MAP_BAD + * temporarily. + */ + if (map[i] == 1) { + map[i] = SWAP_MAP_BAD; + free_entries++; + } else if (__swap_entry_free_locked(p, offset + i, 1) == + SWAP_HAS_CACHE) + cache_only++; + } + /* + * If there are PMD swap mapping or the THP is in swap cache, + * it's impossible for some swap entries to become free. + */ + VM_BUG_ON(free_entries && + (cluster_swapcount(ci) || (map[0] & SWAP_HAS_CACHE))); + if (free_entries == SWAPFILE_CLUSTER) + memset(map, SWAP_HAS_CACHE, SWAPFILE_CLUSTER); + /* + * If there are no PMD swap mappings remain and the THP isn't + * in swap cache, split the huge swap cluster. + */ + else if (!cluster_swapcount(ci) && !(map[0] & SWAP_HAS_CACHE)) + cluster_clear_huge(ci); unlock_cluster_or_swap_info(p, ci); - if (!usage) - free_swap_slot(entry); - - return usage; + if (free_entries == SWAPFILE_CLUSTER) { + spin_lock(&p->lock); + mem_cgroup_uncharge_swap(entry, SWAPFILE_CLUSTER); + swap_free_cluster(p, offset / SWAPFILE_CLUSTER); + spin_unlock(&p->lock); + } else if (free_entries) { + ci = lock_cluster(p, offset); + for (i = 0; i < size; i++, entry.val++) { + /* + * To be freed swap entries are marked as SWAP_MAP_BAD + * temporarily as above + */ + if (map[i] == SWAP_MAP_BAD) { + map[i] = SWAP_HAS_CACHE; + unlock_cluster(ci); + free_swap_slot(entry); + if (i == size - 1) + return; + ci = lock_cluster(p, offset); + } + } + unlock_cluster(ci); + } else if (cache_only == SWAPFILE_CLUSTER && flags & SF_FREE_CACHE) + __try_to_reclaim_swap(p, offset, TTRS_UNMAPPED | TTRS_FULL); } static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry) @@ -1303,13 +1393,13 @@ static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry) * Caller has made sure that the swap device corresponding to entry * is still around or has not been recycled. 
*/ -void swap_free(swp_entry_t entry) +void swap_free(swp_entry_t entry, int entry_size) { struct swap_info_struct *p; p = _swap_info_get(entry); if (p) - __swap_entry_free(p, entry, 1); + __swap_free(p, entry, entry_size, 0); } /* @@ -1545,29 +1635,33 @@ int swp_swapcount(swp_entry_t entry) return count; } -static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, - swp_entry_t entry) +/* si->lock or ci->lock must be held before calling this function */ +static bool __swap_page_trans_huge_swapped(struct swap_info_struct *si, + struct swap_cluster_info *ci, + unsigned long offset) { - struct swap_cluster_info *ci; unsigned char *map = si->swap_map; - unsigned long roffset = swp_offset(entry); - unsigned long offset = round_down(roffset, SWAPFILE_CLUSTER); + unsigned long hoffset = round_down(offset, SWAPFILE_CLUSTER); int i; - bool ret = false; - ci = lock_cluster_or_swap_info(si, offset); - if (!ci || !cluster_is_huge(ci)) { - if (swap_count(map[roffset])) - ret = true; - goto unlock_out; - } + if (!ci || !cluster_is_huge(ci)) + return !!swap_count(map[offset]); for (i = 0; i < SWAPFILE_CLUSTER; i++) { - if (swap_count(map[offset + i])) { - ret = true; - break; - } + if (swap_count(map[hoffset + i])) + return true; } -unlock_out: + return false; +} + +static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, + swp_entry_t entry) +{ + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + bool ret; + + ci = lock_cluster_or_swap_info(si, offset); + ret = __swap_page_trans_huge_swapped(si, ci, offset); unlock_cluster_or_swap_info(si, ci); return ret; } @@ -1739,22 +1833,17 @@ int try_to_free_swap(struct page *page) * Free the swap entry like above, but also try to * free the page cache entry if it is the last user. */ -int free_swap_and_cache(swp_entry_t entry) +int free_swap_and_cache(swp_entry_t entry, int entry_size) { struct swap_info_struct *p; - unsigned char count; if (non_swap_entry(entry)) return 1; p = _swap_info_get(entry); - if (p) { - count = __swap_entry_free(p, entry, 1); - if (count == SWAP_HAS_CACHE && - !swap_page_trans_huge_swapped(p, entry)) - __try_to_reclaim_swap(p, swp_offset(entry), - TTRS_UNMAPPED | TTRS_FULL); - } + if (p) + __swap_free(p, entry, entry_size, SF_FREE_CACHE); + return p != NULL; } @@ -1901,7 +1990,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, } set_pte_at(vma->vm_mm, addr, pte, pte_mkold(mk_pte(page, vma->vm_page_prot))); - swap_free(entry); + swap_free(entry, 1); /* * Move the page to the active list so it is not * immediately swapped out again after swapon. @@ -2340,6 +2429,16 @@ int try_to_unuse(unsigned int type, bool frontswap, } mmput(start_mm); + + /* + * Swap entries may be marked as SWAP_MAP_BAD temporarily in + * __swap_free() before being freed really. + * find_next_to_unuse() will skip these swap entries, that is + * OK. But we need to wait until they are freed really. 
+ */ + while (!retval && READ_ONCE(si->inuse_pages)) + schedule_timeout_uninterruptible(1); + return retval; }
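For illustration, the calling convention after this patch (a minimal sketch; the PMD-sized call sites are introduced by later patches in this series):

	/* Free one swap slot referenced by a PTE swap mapping. */
	swap_free(entry, 1);

	/* Free a whole PMD swap mapping, i.e. HPAGE_PMD_NR slots at once. */
	swap_free(entry, HPAGE_PMD_NR);

	/* Free the swap entry and reclaim the swap cache page if this was
	 * the last user; a zero return means a bad pte, as in
	 * zap_pte_range() above. */
	if (unlikely(!free_swap_and_cache(entry, 1)))
		print_bad_pte(vma, addr, ptent, NULL);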
From patchwork Wed Oct 10 07:19:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10634105
From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V6 06/21] swap: Support PMD swap mapping when splitting huge PMD
Date: Wed, 10 Oct 2018 15:19:09 +0800
Message-Id: <20181010071924.18767-7-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

A huge PMD needs to be split when zapping a part of the PMD mapping, etc. If the PMD mapping is a swap mapping, we need to split it too. This patch implements support for this. It is similar to splitting a PMD page mapping, except that we also need to decrease the PMD swap mapping count for the huge swap cluster. If the PMD swap mapping count becomes 0, the huge swap cluster will be split.

Notice: is_huge_zero_pmd() and pmd_page() don't work well with a swap PMD, so a pmd_present() check is done before calling them.
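For illustration, what splitting a PMD swap mapping means at the page table level (a simplified sketch of __split_huge_swap_pmd() below; page table withdrawal, locking, and soft-dirty handling omitted):

	swp_entry_t entry = pmd_to_swp_entry(*pmd);
	int i;

	/* A huge swap cluster occupies HPAGE_PMD_NR consecutive swap
	 * slots, so each sub-page receives the next swap offset. */
	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE, entry.val++) {
		pte_t *pte = pte_offset_map(&_pmd, haddr);

		set_pte_at(mm, haddr, pte, swp_entry_to_pte(entry));
		pte_unmap(pte);
	}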
Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 4 ++++ include/linux/swap.h | 6 ++++++ mm/huge_memory.c | 48 +++++++++++++++++++++++++++++++++++++++++++----- mm/swapfile.c | 32 ++++++++++++++++++++++++++++++++ 4 files changed, 85 insertions(+), 5 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 99c19b06d9a4..0f3e1739986f 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -226,6 +226,10 @@ static inline bool is_huge_zero_page(struct page *page) return READ_ONCE(huge_zero_page) == page; } +/* + * is_huge_zero_pmd() must be called after checking pmd_present(), + * otherwise, it may report false positive for PMD swap entry. + */ static inline bool is_huge_zero_pmd(pmd_t pmd) { return is_huge_zero_page(pmd_page(pmd)); diff --git a/include/linux/swap.h b/include/linux/swap.h index e79d7aead142..9bb3f73b5d68 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -614,11 +614,17 @@ static inline swp_entry_t get_swap_page(struct page *page) #ifdef CONFIG_THP_SWAP extern int split_swap_cluster(swp_entry_t entry); +extern int split_swap_cluster_map(swp_entry_t entry); #else static inline int split_swap_cluster(swp_entry_t entry) { return 0; } + +static inline int split_swap_cluster_map(swp_entry_t entry) +{ + return 0; +} #endif #ifdef CONFIG_MEMCG diff --git a/mm/huge_memory.c b/mm/huge_memory.c index bae21d3e88cf..9f1c74487576 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1624,6 +1624,40 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) return 0; } +/* Convert a PMD swap mapping to a set of PTE swap mappings */ +static void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd) +{ + struct mm_struct *mm = vma->vm_mm; + pgtable_t pgtable; + pmd_t _pmd; + swp_entry_t entry; + int i, soft_dirty; + + entry = pmd_to_swp_entry(*pmd); + soft_dirty = pmd_soft_dirty(*pmd); + + split_swap_cluster_map(entry); + + pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pmd_populate(mm, &_pmd, pgtable); + + for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE, entry.val++) { + pte_t *pte, ptent; + + pte = pte_offset_map(&_pmd, haddr); + VM_BUG_ON(!pte_none(*pte)); + ptent = swp_entry_to_pte(entry); + if (soft_dirty) + ptent = pte_swp_mksoft_dirty(ptent); + set_pte_at(mm, haddr, pte, ptent); + pte_unmap(pte); + } + smp_wmb(); /* make pte visible before pmd */ + pmd_populate(mm, pmd, pgtable); +} + /* * Return true if we do MADV_FREE successfully on entire pmd page. * Otherwise, return false. 
@@ -2090,7 +2124,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); VM_BUG_ON_VMA(vma->vm_start > haddr, vma); VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma); - VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) + VM_BUG_ON(!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)); count_vm_event(THP_SPLIT_PMD); @@ -2114,7 +2148,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, put_page(page); add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR); return; - } else if (is_huge_zero_pmd(*pmd)) { + } else if (pmd_present(*pmd) && is_huge_zero_pmd(*pmd)) { /* * FIXME: Do we want to invalidate secondary mmu by calling * mmu_notifier_invalidate_range() see comments below inside @@ -2158,6 +2192,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, page = pfn_to_page(swp_offset(entry)); } else #endif + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(old_pmd)) + return __split_huge_swap_pmd(vma, haddr, pmd); + else page = pmd_page(old_pmd); VM_BUG_ON_PAGE(!page_count(page), page); page_ref_add(page, HPAGE_PMD_NR - 1); @@ -2249,14 +2286,15 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, * pmd against. Otherwise we can end up replacing wrong page. */ VM_BUG_ON(freeze && !page); - if (page && page != pmd_page(*pmd)) - goto out; + /* pmd_page() should be called only if pmd_present() */ + if (page && (!pmd_present(*pmd) || page != pmd_page(*pmd))) + goto out; if (pmd_trans_huge(*pmd)) { page = pmd_page(*pmd); if (PageMlocked(page)) clear_page_mlock(page); - } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd))) + } else if (!(pmd_devmap(*pmd) || is_swap_pmd(*pmd))) goto out; __split_huge_pmd_locked(vma, pmd, haddr, freeze); out: diff --git a/mm/swapfile.c b/mm/swapfile.c index 8d8803103543..fa6b81b4e185 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -4036,6 +4036,38 @@ void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node, } #endif +#ifdef CONFIG_THP_SWAP +/* + * The corresponding page table shouldn't be changed under us, that + * is, the page table lock should be held. + */ +int split_swap_cluster_map(swp_entry_t entry) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + + VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER)); + si = _swap_info_get(entry); + if (!si) + return -EBUSY; + ci = lock_cluster(si, offset); + /* The swap cluster has been split by someone else, we are done */ + if (!cluster_is_huge(ci)) + goto out; + cluster_add_swapcount(ci, -1); + /* + * If the last PMD swap mapping has gone and the THP isn't in + * swap cache, the huge swap cluster will be split. 
+ if (!cluster_swapcount(ci) && !(si->swap_map[offset] & SWAP_HAS_CACHE)) + cluster_clear_huge(ci); +out: + unlock_cluster(ci); + return 0; +} +#endif + static int __init swapfile_init(void) { int nid;
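As a usage note, split_swap_cluster_map() relies on the caller holding the page table lock, as in this condensed sketch of the series' huge_memory.c caller:

	ptl = pmd_lock(mm, pmd);
	if (pmd_same(*pmd, orig_pmd))
		/* calls split_swap_cluster_map() for the PMD swap entry */
		__split_huge_swap_pmd(vma, address & HPAGE_PMD_MASK, pmd);
	spin_unlock(ptl);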
From patchwork Wed Oct 10 07:19:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10634109
From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V6 07/21] swap: Support PMD swap mapping in split_swap_cluster()
Date: Wed, 10 Oct 2018 15:19:10 +0800
Message-Id: <20181010071924.18767-8-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

When splitting a THP in the swap cache, or when failing to allocate a THP while swapping in a huge swap cluster, the huge swap cluster will be split. In addition to clearing the huge flag of the swap cluster, the PMD swap mapping count recorded in cluster_count() will be set to 0. But we will not touch the PMD swap mappings themselves, because it can be hard to find them all.
When the PMD swap mappings are operated on later, it will be found that the huge swap cluster has been split, and the PMD swap mappings will be split at that time. Unless splitting a THP in the swap cache (requested via the SSC_SPLIT_CACHED flag), split_swap_cluster() will return -EEXIST if the SWAP_HAS_CACHE flag is set in swap_map[offset], because this indicates that a THP corresponds to this huge swap cluster, and it isn't desirable to split that THP. When splitting a THP in the swap cache, the call to split_swap_cluster() is moved to before unlocking the sub-pages, so that all sub-pages are kept locked from when the THP is split until the huge swap cluster is split. This makes the code much easier to reason about.

Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/swap.h | 6 ++++-- mm/huge_memory.c | 18 ++++++++++------ mm/swapfile.c | 58 +++++++++++++++++++++++++++++++++++++--------------- 3 files changed, 57 insertions(+), 25 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 9bb3f73b5d68..60fd5189fde9 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -612,11 +612,13 @@ static inline swp_entry_t get_swap_page(struct page *page) #endif /* CONFIG_SWAP */ +#define SSC_SPLIT_CACHED 0x1 + #ifdef CONFIG_THP_SWAP -extern int split_swap_cluster(swp_entry_t entry); +extern int split_swap_cluster(swp_entry_t entry, unsigned long flags); extern int split_swap_cluster_map(swp_entry_t entry); #else -static inline int split_swap_cluster(swp_entry_t entry) +static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags) { return 0; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 9f1c74487576..92e0cdb99c5a 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2517,6 +2517,17 @@ static void __split_huge_page(struct page *page, struct list_head *list, unfreeze_page(head); + /* + * Split the swap cluster before unlocking the sub-pages, so that all + * sub-pages are kept locked from when the THP is split until the + * swap cluster is split.
+ */ + if (PageSwapCache(head)) { + swp_entry_t entry = { .val = page_private(head) }; + + split_swap_cluster(entry, SSC_SPLIT_CACHED); + } + for (i = 0; i < HPAGE_PMD_NR; i++) { struct page *subpage = head + i; if (subpage == page) @@ -2740,12 +2751,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) __dec_node_page_state(page, NR_SHMEM_THPS); spin_unlock(&pgdata->split_queue_lock); __split_huge_page(page, list, flags); - if (PageSwapCache(head)) { - swp_entry_t entry = { .val = page_private(head) }; - - ret = split_swap_cluster(entry); - } else - ret = 0; + ret = 0; } else { if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) { pr_alert("total_mapcount: %u, page_count(): %u\n", diff --git a/mm/swapfile.c b/mm/swapfile.c index fa6b81b4e185..2020bd494419 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1469,23 +1469,6 @@ void put_swap_page(struct page *page, swp_entry_t entry) unlock_cluster_or_swap_info(si, ci); } -#ifdef CONFIG_THP_SWAP -int split_swap_cluster(swp_entry_t entry) -{ - struct swap_info_struct *si; - struct swap_cluster_info *ci; - unsigned long offset = swp_offset(entry); - - si = _swap_info_get(entry); - if (!si) - return -EBUSY; - ci = lock_cluster(si, offset); - cluster_clear_huge(ci); - unlock_cluster(ci); - return 0; -} -#endif - static int swp_entry_cmp(const void *ent1, const void *ent2) { const swp_entry_t *e1 = ent1, *e2 = ent2; @@ -4066,6 +4049,47 @@ int split_swap_cluster_map(swp_entry_t entry) unlock_cluster(ci); return 0; } + +/* + * We will not try to split all PMD swap mappings to the swap cluster, + * because we haven't enough information available for that. Later, + * when the PMD swap mapping is duplicated or swapin, etc, the PMD + * swap mapping will be split and fallback to the PTE operations. + */ +int split_swap_cluster(swp_entry_t entry, unsigned long flags) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + int ret = 0; + + si = get_swap_device(entry); + if (!si) + return -EINVAL; + ci = lock_cluster(si, offset); + /* The swap cluster has been split by someone else, we are done */ + if (!cluster_is_huge(ci)) + goto out; + VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER)); + VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER); + /* + * If not requested, don't split swap cluster that has SWAP_HAS_CACHE + * flag. When the flag is cleared later, the huge swap cluster will + * be split if there is no PMD swap mapping. 
+ */ + if (!(flags & SSC_SPLIT_CACHED) && + si->swap_map[offset] & SWAP_HAS_CACHE) { + ret = -EEXIST; + goto out; + } + cluster_set_swapcount(ci, 0); + cluster_clear_huge(ci); + +out: + unlock_cluster(ci); + put_swap_device(si); + return ret; +} #endif static int __init swapfile_init(void)
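For illustration, the two call patterns split_swap_cluster() now supports (a condensed sketch; the swapin-side handling is added later in this series):

	/* Splitting a THP that is in the swap cache: the sub-pages are
	 * still locked, so request the split past SWAP_HAS_CACHE. */
	split_swap_cluster(entry, SSC_SPLIT_CACHED);

	/* THP allocation failed at swapin time: don't force the split;
	 * -EEXIST means the THP was swapped in under us, so retry. */
	ret = split_swap_cluster(entry, 0);
	if (ret == -EEXIST)
		goto retry;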
From patchwork Wed Oct 10 07:19:11 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10634115
From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V6 08/21] swap: Support to read a huge swap cluster for swapin a THP
Date: Wed, 10 Oct 2018 15:19:11 +0800
Message-Id: <20181010071924.18767-9-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

To swap in a THP in one piece, we need to read a huge swap cluster from the swap device. This patch revises __read_swap_cache_async() and its callers and callees to support this. If __read_swap_cache_async() finds that the swap cluster of the specified swap entry is huge, it will try to allocate a THP and add it into the swap cache.
So later the contents of the huge swap cluster can be read into the THP. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 8 +++++++ include/linux/swap.h | 4 ++-- mm/huge_memory.c | 3 ++- mm/swap_state.c | 59 ++++++++++++++++++++++++++++++++++++++++--------- mm/swapfile.c | 9 +++++--- 5 files changed, 66 insertions(+), 17 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 0f3e1739986f..a0e7f4f9c12b 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -250,6 +250,8 @@ static inline bool thp_migration_supported(void) return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION); } +gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, + unsigned long addr); #else /* CONFIG_TRANSPARENT_HUGEPAGE */ #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; }) #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; }) @@ -363,6 +365,12 @@ static inline bool thp_migration_supported(void) { return false; } + +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, + unsigned long addr) +{ + return 0; +} #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif /* _LINUX_HUGE_MM_H */ diff --git a/include/linux/swap.h b/include/linux/swap.h index 60fd5189fde9..f2daf3fbdd4b 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -457,7 +457,7 @@ extern sector_t map_swap_page(struct page *, struct block_device **); extern sector_t swapdev_block(int, pgoff_t); extern int page_swapcount(struct page *); extern int __swap_count(swp_entry_t entry); -extern int __swp_swapcount(swp_entry_t entry); +extern int __swp_swapcount(swp_entry_t entry, int *entry_size); extern int swp_swapcount(swp_entry_t entry); extern struct swap_info_struct *page_swap_info(struct page *); extern struct swap_info_struct *swp_swap_info(swp_entry_t entry); @@ -585,7 +585,7 @@ static inline int __swap_count(swp_entry_t entry) return 0; } -static inline int __swp_swapcount(swp_entry_t entry) +static inline int __swp_swapcount(swp_entry_t entry, int *entry_size) { return 0; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 92e0cdb99c5a..a025494dd828 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -629,7 +629,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, * available * never: never stall for any thp allocation */ -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, unsigned long addr) +gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, + unsigned long addr) { const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE); gfp_t this_node = 0; diff --git a/mm/swap_state.c b/mm/swap_state.c index bca34fc7a5e5..784ad6388da0 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -361,7 +361,9 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, { struct page *found_page = NULL, *new_page = NULL; struct swap_info_struct *si; - int err; + int err, entry_size = 1; + swp_entry_t hentry; + *new_page_allocated = false; do { @@ -387,14 +389,42 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * as SWAP_HAS_CACHE. That's done in later part of code or * else swap_off will be aborted if we return NULL. 
*/ - if (!__swp_swapcount(entry) && swap_slot_cache_enabled) + if (!__swp_swapcount(entry, &entry_size) && + swap_slot_cache_enabled) break; /* * Get a new page to read into from swap. */ - if (!new_page) { - new_page = alloc_page_vma(gfp_mask, vma, addr); + if (!new_page || + (IS_ENABLED(CONFIG_THP_SWAP) && + hpage_nr_pages(new_page) != entry_size)) { + if (new_page) + put_page(new_page); + if (IS_ENABLED(CONFIG_THP_SWAP) && + entry_size == HPAGE_PMD_NR) { + gfp_t gfp; + + gfp = alloc_hugepage_direct_gfpmask(vma, addr); + /* + * Make sure huge page allocation flags are + * compatible with that of normal page + */ + VM_WARN_ONCE(gfp_mask & ~(gfp | __GFP_RECLAIM), + "ignoring gfp_mask bits: %x", + gfp_mask & ~(gfp | __GFP_RECLAIM)); + new_page = alloc_pages_vma(gfp, HPAGE_PMD_ORDER, + vma, addr, + numa_node_id()); + if (new_page) + prep_transhuge_page(new_page); + hentry = swp_entry(swp_type(entry), + round_down(swp_offset(entry), + HPAGE_PMD_NR)); + } else { + new_page = alloc_page_vma(gfp_mask, vma, addr); + hentry = entry; + } if (!new_page) break; /* Out of memory */ } @@ -402,7 +432,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* * Swap entry may have been freed since our caller observed it. */ - err = swapcache_prepare(entry, 1); + err = swapcache_prepare(hentry, entry_size); if (err == -EEXIST) { /* * We might race against get_swap_page() and stumble @@ -411,6 +441,9 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, */ cond_resched(); continue; + } else if (err == -ENOTDIR) { + /* huge swap cluster has been split under us */ + continue; } else if (err) { /* swp entry is obsolete ? */ break; } @@ -424,6 +457,9 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, SetPageWorkingset(new_page); lru_cache_add_anon(new_page); *new_page_allocated = true; + if (IS_ENABLED(CONFIG_THP_SWAP)) + new_page += swp_offset(entry) & + (entry_size - 1); return new_page; } __ClearPageLocked(new_page); @@ -431,7 +467,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * add_to_swap_cache() doesn't return -EEXIST, so we can safely * clear SWAP_HAS_CACHE flag. 
*/ - put_swap_page(new_page, entry); + put_swap_page(new_page, hentry); } while (err != -ENOMEM); if (new_page) @@ -453,7 +489,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, vma, addr, &page_was_allocated); if (page_was_allocated) - swap_readpage(retpage, do_poll); + swap_readpage(compound_head(retpage), do_poll); return retpage; } @@ -572,8 +608,9 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); - if (offset != entry_offset) { + swap_readpage(compound_head(page), false); + if (offset != entry_offset && + !PageTransCompound(page)) { SetPageReadahead(page); count_vm_event(SWAP_RA); } @@ -734,8 +771,8 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); - if (i != ra_info.offset) { + swap_readpage(compound_head(page), false); + if (i != ra_info.offset && !PageTransCompound(page)) { SetPageReadahead(page); count_vm_event(SWAP_RA); } diff --git a/mm/swapfile.c b/mm/swapfile.c index 2020bd494419..2ca013df35e1 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1542,7 +1542,8 @@ int __swap_count(swp_entry_t entry) return count; } -static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) +static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry, + int *entry_size) { int count = 0; pgoff_t offset = swp_offset(entry); @@ -1550,6 +1551,8 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) ci = lock_cluster_or_swap_info(si, offset); count = swap_count(si->swap_map[offset]); + if (entry_size) + *entry_size = ci && cluster_is_huge(ci) ? SWAPFILE_CLUSTER : 1; unlock_cluster_or_swap_info(si, ci); return count; } @@ -1559,14 +1562,14 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) * This does not give an exact answer when swap count is continued, * but does include the high COUNT_CONTINUED flag to allow for that. 
*/ -int __swp_swapcount(swp_entry_t entry) +int __swp_swapcount(swp_entry_t entry, int *entry_size) { int count = 0; struct swap_info_struct *si; si = get_swap_device(entry); if (si) { - count = swap_swapcount(si, entry); + count = swap_swapcount(si, entry, entry_size); put_swap_device(si); } return count;
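For illustration, how a caller can use the new entry_size output of __swp_swapcount() (a minimal sketch mirroring the __read_swap_cache_async() change above):

	int entry_size = 1;

	/* A non-zero count means the entry is still in use; entry_size
	 * reports whether it sits in a huge swap cluster (HPAGE_PMD_NR)
	 * or a normal one (1). */
	if (__swp_swapcount(entry, &entry_size) &&
	    entry_size == HPAGE_PMD_NR) {
		/* allocate a THP and read the whole huge swap cluster */
	}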
From patchwork Wed Oct 10 07:19:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10634111
From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V6 09/21] swap: Swapin a THP in one piece
Date: Wed, 10 Oct 2018 15:19:12 +0800
Message-Id: <20181010071924.18767-10-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

With this patch, when the page fault handler finds a PMD swap mapping, it will swap in a THP in one piece. This avoids the overhead of splitting/collapsing before/after the THP swapping, and greatly improves swap performance thanks to the reduced page fault count, etc. do_huge_pmd_swap_page() is added in this patch to implement this.
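For illustration, a condensed sketch of how the page fault path now reaches do_huge_pmd_swap_page() (simplified from this patch's mm/memory.c change):

	if (unlikely(is_swap_pmd(orig_pmd))) {
		if (thp_migration_supported() &&
		    is_pmd_migration_entry(orig_pmd)) {
			pmd_migration_entry_wait(mm, vmf.pmd);
			return 0;
		}
		/* THP swapin; on VM_FAULT_FALLBACK the PMD swap mapping
		 * is split and we fall back to PTE-level swapin. */
		ret = do_huge_pmd_swap_page(&vmf, orig_pmd);
		if (!(ret & VM_FAULT_FALLBACK))
			return ret;
	}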
It is similar to do_swap_page() for normal page swapin. If failing to allocate a THP, the huge swap cluster and the PMD swap mapping will be split to fallback to normal page swapin. If the huge swap cluster has been split already, the PMD swap mapping will be split to fallback to normal page swapin. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 9 +++ mm/huge_memory.c | 174 ++++++++++++++++++++++++++++++++++++++++++++++++ mm/memory.c | 16 +++-- 3 files changed, 193 insertions(+), 6 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index a0e7f4f9c12b..d88579cb059a 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -373,4 +373,13 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#ifdef CONFIG_THP_SWAP +extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); +#else /* CONFIG_THP_SWAP */ +static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) +{ + return 0; +} +#endif /* CONFIG_THP_SWAP */ + #endif /* _LINUX_HUGE_MM_H */ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index a025494dd828..fbc9c9e30992 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -33,6 +33,8 @@ #include #include #include +#include +#include #include #include @@ -1659,6 +1661,178 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma, pmd_populate(mm, pmd, pgtable); } +#ifdef CONFIG_THP_SWAP +static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, pmd_t orig_pmd) +{ + struct mm_struct *mm = vma->vm_mm; + spinlock_t *ptl; + int ret = 0; + + ptl = pmd_lock(mm, pmd); + if (pmd_same(*pmd, orig_pmd)) + __split_huge_swap_pmd(vma, address & HPAGE_PMD_MASK, pmd); + else + ret = -ENOENT; + spin_unlock(ptl); + + return ret; +} + +int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) +{ + struct page *page; + struct mem_cgroup *memcg; + struct vm_area_struct *vma = vmf->vma; + unsigned long haddr = vmf->address & HPAGE_PMD_MASK; + swp_entry_t entry; + pmd_t pmd; + int i, locked, exclusive = 0, ret = 0; + + entry = pmd_to_swp_entry(orig_pmd); + VM_BUG_ON(non_swap_entry(entry)); + delayacct_set_flag(DELAYACCT_PF_SWAPIN); +retry: + page = lookup_swap_cache(entry, NULL, vmf->address); + if (!page) { + page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma, + haddr, false); + if (!page) { + /* + * Back out if somebody else faulted in this pmd + * while we released the pmd lock. 
+                         */
+                        if (likely(pmd_same(*vmf->pmd, orig_pmd))) {
+                                /*
+                                 * Failed to allocate huge page, split huge swap
+                                 * cluster, and fallback to swapin normal page
+                                 */
+                                ret = split_swap_cluster(entry, 0);
+                                /* Somebody else swapin the swap entry, retry */
+                                if (ret == -EEXIST) {
+                                        ret = 0;
+                                        goto retry;
+                                /* swapoff occurs under us */
+                                } else if (ret == -EINVAL)
+                                        ret = 0;
+                                else
+                                        goto fallback;
+                        }
+                        delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+                        goto out;
+                }
+
+                /* Had to read the page from swap area: Major fault */
+                ret = VM_FAULT_MAJOR;
+                count_vm_event(PGMAJFAULT);
+                count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
+        } else if (!PageTransCompound(page))
+                goto fallback;
+
+        locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
+
+        delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+        if (!locked) {
+                ret |= VM_FAULT_RETRY;
+                goto out_release;
+        }
+
+        /*
+         * Make sure try_to_free_swap or reuse_swap_page or swapoff did not
+         * release the swapcache from under us.  The page pin, and pmd_same
+         * test below, are not enough to exclude that.  Even if it is still
+         * swapcache, we need to check that the page's swap has not changed.
+         */
+        if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
+                goto out_page;
+
+        if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL,
+                                        &memcg, true)) {
+                ret = VM_FAULT_OOM;
+                goto out_page;
+        }
+
+        /*
+         * Back out if somebody else already faulted in this pmd.
+         */
+        vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd);
+        spin_lock(vmf->ptl);
+        if (unlikely(!pmd_same(*vmf->pmd, orig_pmd)))
+                goto out_nomap;
+
+        if (unlikely(!PageUptodate(page))) {
+                ret = VM_FAULT_SIGBUS;
+                goto out_nomap;
+        }
+
+        /*
+         * The page isn't present yet, go ahead with the fault.
+         *
+         * Be careful about the sequence of operations here.
+         * To get its accounting right, reuse_swap_page() must be called
+         * while the page is counted on swap but not yet in mapcount i.e.
+         * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
+         * must be called after the swap_free(), or it will never succeed.
+         */
+
+        add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+        add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+        pmd = mk_huge_pmd(page, vma->vm_page_prot);
+        if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
+                pmd = maybe_pmd_mkwrite(pmd_mkdirty(pmd), vma);
+                vmf->flags &= ~FAULT_FLAG_WRITE;
+                ret |= VM_FAULT_WRITE;
+                exclusive = RMAP_EXCLUSIVE;
+        }
+        for (i = 0; i < HPAGE_PMD_NR; i++)
+                flush_icache_page(vma, page + i);
+        if (pmd_swp_soft_dirty(orig_pmd))
+                pmd = pmd_mksoft_dirty(pmd);
+        do_page_add_anon_rmap(page, vma, haddr,
+                              exclusive | RMAP_COMPOUND);
+        mem_cgroup_commit_charge(page, memcg, true, true);
+        activate_page(page);
+        set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
+
+        swap_free(entry, HPAGE_PMD_NR);
+        if (mem_cgroup_swap_full(page) ||
+            (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
+                try_to_free_swap(page);
+        unlock_page(page);
+
+        if (vmf->flags & FAULT_FLAG_WRITE) {
+                spin_unlock(vmf->ptl);
+                ret |= do_huge_pmd_wp_page(vmf, pmd);
+                if (ret & VM_FAULT_ERROR)
+                        ret &= VM_FAULT_ERROR;
+                goto out;
+        }
+
+        /* No need to invalidate - it was non-present before */
+        update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
+        spin_unlock(vmf->ptl);
+out:
+        return ret;
+out_nomap:
+        mem_cgroup_cancel_charge(page, memcg, true);
+        spin_unlock(vmf->ptl);
+out_page:
+        unlock_page(page);
+out_release:
+        put_page(page);
+        return ret;
+fallback:
+        delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+        if (!split_huge_swap_pmd(vmf->vma, vmf->pmd, vmf->address, orig_pmd))
+                ret = VM_FAULT_FALLBACK;
+        else
+                ret = 0;
+        if (page)
+                put_page(page);
+        return ret;
+}
+#endif
+
 /*
  * Return true if we do MADV_FREE successfully on entire pmd page.
  * Otherwise, return false.
diff --git a/mm/memory.c b/mm/memory.c
index 17895a347056..6970bb10cf5a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3862,13 +3862,17 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
                 barrier();
                 if (unlikely(is_swap_pmd(orig_pmd))) {
-                        VM_BUG_ON(thp_migration_supported() &&
-                                  !is_pmd_migration_entry(orig_pmd));
-                        if (is_pmd_migration_entry(orig_pmd))
+                        if (thp_migration_supported() &&
+                            is_pmd_migration_entry(orig_pmd)) {
                                 pmd_migration_entry_wait(mm, vmf.pmd);
-                        return 0;
-                }
-                if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
+                                return 0;
+                        } else if (IS_ENABLED(CONFIG_THP_SWAP)) {
+                                ret = do_huge_pmd_swap_page(&vmf, orig_pmd);
+                                if (!(ret & VM_FAULT_FALLBACK))
+                                        return ret;
+                        } else
+                                VM_BUG_ON(1);
+                } else if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
                         if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
                                 return do_huge_pmd_numa_page(&vmf, orig_pmd);
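
[Editorial note: a minimal userspace sketch, not part of the patch, for
readers who want to exercise this path.  It assumes a 2MB PMD size
(x86-64) and only sets up an anonymous VMA that is *eligible* for
PMD-level swap; whether the region actually goes out and comes back as a
THP depends on CONFIG_THP_SWAP, swap pressure, and the swapin_enabled
knob added later in this series.]

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE (2UL << 20)          /* assumed PMD size */

int main(void)
{
        /* Over-allocate so a 2MB-aligned start address can be chosen. */
        char *buf = mmap(NULL, 2 * HPAGE_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        char *hp = (char *)(((uintptr_t)buf + HPAGE_SIZE - 1) &
                            ~(HPAGE_SIZE - 1));
        /* Mark the VMA as a THP candidate (sets VM_HUGEPAGE). */
        if (madvise(hp, HPAGE_SIZE, MADV_HUGEPAGE))
                perror("madvise(MADV_HUGEPAGE)");
        /* Touch every byte so the whole 2MB range is populated. */
        memset(hp, 0x5a, HPAGE_SIZE);
        printf("THP candidate at %p; apply memory pressure now\n", (void *)hp);
        pause();        /* keep the mapping alive while it is swapped out */
        return 0;
}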
From patchwork Wed Oct 10 07:19:13 2018
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH -V6 10/21] swap: Support to count THP swapin and its fallback
Date: Wed, 10 Oct 2018 15:19:13 +0800
Message-Id: <20181010071924.18767-11-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

Two new /proc/vmstat fields, "thp_swpin" and "thp_swpin_fallback", are
added to count the THPs swapped in from the swap device in one piece and
the swapins that had to fall back to normal pages, respectively.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 Documentation/admin-guide/mm/transhuge.rst |  8 ++++++++
 include/linux/vm_event_item.h              |  2 ++
 mm/huge_memory.c                           |  4 +++-
 mm/page_io.c                               | 15 ++++++++++++---
 mm/vmstat.c                                |  2 ++
 5 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 7ab93a8404b9..85e33f785fd7 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -364,6 +364,14 @@ thp_swpout_fallback
 	Usually because failed to allocate some continuous swap space
 	for the huge page.
 
+thp_swpin
+	is incremented every time a huge page is swapped in in one
+	piece without splitting.
+
+thp_swpin_fallback
+	is incremented if a huge page has to be split during swapin.
+	Usually because failed to allocate a huge page.
+
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
 huge page for use.  There are some counters in ``/proc/vmstat`` to help
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 47a3441cf4c4..c20b655cfdcc 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -88,6 +88,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		THP_ZERO_PAGE_ALLOC_FAILED,
 		THP_SWPOUT,
 		THP_SWPOUT_FALLBACK,
+		THP_SWPIN,
+		THP_SWPIN_FALLBACK,
 #endif
 #ifdef CONFIG_MEMORY_BALLOON
 		BALLOON_INFLATE,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fbc9c9e30992..8efcc84fb4b0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1715,8 +1715,10 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
                                 /* swapoff occurs under us */
                                 } else if (ret == -EINVAL)
                                         ret = 0;
-                                else
+                                else {
+                                        count_vm_event(THP_SWPIN_FALLBACK);
                                         goto fallback;
+                                }
                         }
                         delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
                         goto out;
diff --git a/mm/page_io.c b/mm/page_io.c
index 573d3663d846..bcc2da750590 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -348,6 +348,15 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
         return ret;
 }
 
+static inline void count_swpin_vm_event(struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+        if (unlikely(PageTransHuge(page)))
+                count_vm_event(THP_SWPIN);
+#endif
+        count_vm_events(PSWPIN, hpage_nr_pages(page));
+}
+
 int swap_readpage(struct page *page, bool synchronous)
 {
         struct bio *bio;
@@ -371,7 +380,7 @@ int swap_readpage(struct page *page, bool synchronous)
                 ret = mapping->a_ops->readpage(swap_file, page);
                 if (!ret)
-                        count_vm_event(PSWPIN);
+                        count_swpin_vm_event(page);
                 return ret;
         }
@@ -382,7 +391,7 @@ int swap_readpage(struct page *page, bool synchronous)
                         unlock_page(page);
                 }
-                count_vm_event(PSWPIN);
+                count_swpin_vm_event(page);
                 return 0;
         }
@@ -401,7 +410,7 @@ int swap_readpage(struct page *page, bool synchronous)
         get_task_struct(current);
         bio->bi_private = current;
         bio_set_op_attrs(bio, REQ_OP_READ, 0);
-        count_vm_event(PSWPIN);
+        count_swpin_vm_event(page);
         bio_get(bio);
         qc = submit_bio(bio);
         while (synchronous) {
diff --git a/mm/vmstat.c b/mm/vmstat.c
index d08ed044759d..823856fae136 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1264,6 +1264,8 @@ const char * const vmstat_text[] = {
 	"thp_zero_page_alloc_failed",
 	"thp_swpout",
 	"thp_swpout_fallback",
+	"thp_swpin",
+	"thp_swpin_fallback",
 #endif
 #ifdef CONFIG_MEMORY_BALLOON
 	"balloon_inflate",
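
[Editorial note: a trivial reader for the two new fields.  Illustrative
only, not part of the patch; it assumes nothing beyond the /proc/vmstat
names added above.]

#include <stdio.h>
#include <string.h>

int main(void)
{
        char line[128];
        FILE *f = fopen("/proc/vmstat", "r");

        if (!f) {
                perror("fopen");
                return 1;
        }
        /* Matches both "thp_swpin" and "thp_swpin_fallback". */
        while (fgets(line, sizeof(line), f))
                if (!strncmp(line, "thp_swpin", 9))
                        fputs(line, stdout);
        fclose(f);
        return 0;
}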
From patchwork Wed Oct 10 07:19:14 2018
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH -V6 11/21] swap: Add sysfs interface to configure THP swapin
Date: Wed, 10 Oct 2018 15:19:14 +0800
Message-Id: <20181010071924.18767-12-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

Swapping in a THP as a whole isn't desirable in some situations.  For
example, for a completely random access pattern, swapping in a THP in
one piece greatly inflates the amount of data read.  So a sysfs
interface, /sys/kernel/mm/transparent_hugepage/swapin_enabled, is added
to configure it.  Three options are provided:

- always: THP swapin is always enabled.
- madvise: THP swapin is enabled only for VMAs with the VM_HUGEPAGE
  flag set.
- never: THP swapin is always disabled.

The default configuration is "madvise".  During a page fault, if a PMD
swap mapping is found and THP swapin is disabled, the huge swap cluster
and the PMD swap mapping are split and swapin falls back to normal
pages.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 Documentation/admin-guide/mm/transhuge.rst | 21 +++++++
 include/linux/huge_mm.h                    | 31 ++++++++++
 mm/huge_memory.c                           | 94 ++++++++++++++++++++++------
 3 files changed, 127 insertions(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 85e33f785fd7..23aefb17101c 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -160,6 +160,27 @@ Some userspace (such as a test program, or an optimized memory allocation
 
 	cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
 
+Transparent hugepage may be swapped out and swapped in in one piece
+without splitting.  This improves the utility of transparent hugepage
+but may inflate the read/write amount too.
+So whether to swap in
+transparent hugepage in one piece can be configured as follows:
+
+	echo always >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+	echo madvise >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+	echo never >/sys/kernel/mm/transparent_hugepage/swapin_enabled
+
+always
+	Attempt to allocate a transparent huge page and read it from
+	swap space in one piece every time.
+
+never
+	Always split the swap space and PMD swap mapping and swap in
+	the faulting normal page during swapin.
+
+madvise
+	Only swap in the transparent huge page in one piece for
+	MADV_HUGEPAGE madvise regions.
+
 khugepaged will be automatically started when
 transparent_hugepage/enabled is set to "always" or "madvise, and it'll
 be automatically shutdown if it's set to "never".
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index d88579cb059a..a13cd19b6047 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -63,6 +63,8 @@ enum transparent_hugepage_flag {
 #ifdef CONFIG_DEBUG_VM
 	TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG,
 #endif
+	TRANSPARENT_HUGEPAGE_SWAPIN_FLAG,
+	TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG,
 };
 
 struct kobject;
@@ -375,11 +377,40 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma,
 
 #ifdef CONFIG_THP_SWAP
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+
+static inline bool transparent_hugepage_swapin_enabled(
+	struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_NOHUGEPAGE)
+		return false;
+
+	if (is_vma_temporary_stack(vma))
+		return false;
+
+	if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+		return false;
+
+	if (transparent_hugepage_flags &
+	    (1 << TRANSPARENT_HUGEPAGE_SWAPIN_FLAG))
+		return true;
+
+	if (transparent_hugepage_flags &
+	    (1 << TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG))
+		return !!(vma->vm_flags & VM_HUGEPAGE);
+
+	return false;
+}
 #else /* CONFIG_THP_SWAP */
 static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 {
 	return 0;
 }
+
+static inline bool transparent_hugepage_swapin_enabled(
+	struct vm_area_struct *vma)
+{
+	return false;
+}
 #endif /* CONFIG_THP_SWAP */
 
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8efcc84fb4b0..0ccb1b78d661 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -57,7 +57,8 @@ unsigned long transparent_hugepage_flags __read_mostly =
 #endif
	(1<address);
 	if (!page) {
+		if (!transparent_hugepage_swapin_enabled(vma))
+			goto split;
+
 		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma,
 					     haddr, false);
 		if (!page) {
@@ -1702,24 +1756,8 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 			 * Back out if somebody else faulted in this pmd
 			 * while we released the pmd lock.
*/ - if (likely(pmd_same(*vmf->pmd, orig_pmd))) { - /* - * Failed to allocate huge page, split huge swap - * cluster, and fallback to swapin normal page - */ - ret = split_swap_cluster(entry, 0); - /* Somebody else swapin the swap entry, retry */ - if (ret == -EEXIST) { - ret = 0; - goto retry; - /* swapoff occurs under us */ - } else if (ret == -EINVAL) - ret = 0; - else { - count_vm_event(THP_SWPIN_FALLBACK); - goto fallback; - } - } + if (likely(pmd_same(*vmf->pmd, orig_pmd))) + goto split; delayacct_clear_flag(DELAYACCT_PF_SWAPIN); goto out; } @@ -1832,6 +1870,24 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) if (page) put_page(page); return ret; +split: + /* + * Failed to allocate huge page, split huge swap cluster, and + * fallback to swapin normal page + */ + ret = split_swap_cluster(entry, 0); + /* Somebody else swapin the swap entry, retry */ + if (ret == -EEXIST) { + ret = 0; + goto retry; + } + /* swapoff occurs under us */ + if (ret == -EINVAL) { + delayacct_clear_flag(DELAYACCT_PF_SWAPIN); + return 0; + } + count_vm_event(THP_SWPIN_FALLBACK); + goto fallback; } #endif From patchwork Wed Oct 10 07:19:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10634119 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 28B89679F for ; Wed, 10 Oct 2018 07:27:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AB6CF294F6 for ; Wed, 10 Oct 2018 07:27:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9E007296AF; Wed, 10 Oct 2018 07:27:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 97C32294F6 for ; Wed, 10 Oct 2018 07:27:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 228CE6B0273; Wed, 10 Oct 2018 03:27:43 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 119196B0275; Wed, 10 Oct 2018 03:27:43 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DE3526B0277; Wed, 10 Oct 2018 03:27:42 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pl1-f200.google.com (mail-pl1-f200.google.com [209.85.214.200]) by kanga.kvack.org (Postfix) with ESMTP id 8DBAB6B0273 for ; Wed, 10 Oct 2018 03:27:42 -0400 (EDT) Received: by mail-pl1-f200.google.com with SMTP id l7-v6so3191644plg.6 for ; Wed, 10 Oct 2018 00:27:42 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=Ny3A1D7mRTD6Gh22ZkIw7SPD1iOARg4zb369WUYAN8I=; b=bX1ffAkYQaLvkusLNJW2LtMsznyESgN9BwW4cFlOhgkQL1hrK8ASKtbVFsC021XKag v5htomo038d1EbzX7nGZZ+DlTbOX96nWCLp9LaSPevzXc4M1O0I9fr3O5Mf96kJ5K+HY AnHe31kzras4ju4CLMU9ESdKCwpHio3EUa/B4LgCqHb9rGTdmtq32/JCAccZVLcVSxU1 
From patchwork Wed Oct 10 07:19:15 2018
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH -V6 12/21] swap: Support PMD swap mapping in swapoff
Date: Wed, 10 Oct 2018 15:19:15 +0800
Message-Id: <20181010071924.18767-13-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V6 12/21] swap: Support PMD swap mapping in swapoff Date: Wed, 10 Oct 2018 15:19:15 +0800 Message-Id: <20181010071924.18767-13-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com> References: <20181010071924.18767-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP During swapoff, for a huge swap cluster, we need to allocate a THP, read its contents into the THP and unuse the PMD and PTE swap mappings to it. If failed to allocate a THP, the huge swap cluster will be split. During unuse, if it is found that the swap cluster mapped by a PMD swap mapping is split already, we will split the PMD swap mapping and unuse the PTEs. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/asm-generic/pgtable.h | 14 +------ include/linux/huge_mm.h | 8 ++++ mm/huge_memory.c | 4 +- mm/swapfile.c | 86 ++++++++++++++++++++++++++++++++++++++++++- 4 files changed, 97 insertions(+), 15 deletions(-) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index eb1e9d17371b..d64cef2bff04 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -931,22 +931,12 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) barrier(); #endif /* - * !pmd_present() checks for pmd migration entries - * - * The complete check uses is_pmd_migration_entry() in linux/swapops.h - * But using that requires moving current function and pmd_trans_unstable() - * to linux/swapops.h to resovle dependency, which is too much code move. - * - * !pmd_present() is equivalent to is_pmd_migration_entry() currently, - * because !pmd_present() pages can only be under migration not swapped - * out. - * - * pmd_none() is preseved for future condition checks on pmd migration + * pmd_none() is preseved for future condition checks on pmd swap * entries and not confusing with this function name, although it is * redundant with !pmd_present(). 
 	 */
 	if (pmd_none(pmdval) || pmd_trans_huge(pmdval) ||
-	    (IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !pmd_present(pmdval)))
+	    (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) && !pmd_present(pmdval)))
 		return 1;
 	if (unlikely(pmd_bad(pmdval))) {
 		pmd_clear_bad(pmd);
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a13cd19b6047..1927b2edb74a 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -376,6 +376,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #ifdef CONFIG_THP_SWAP
+extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			       unsigned long address, pmd_t orig_pmd);
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
 
 static inline bool transparent_hugepage_swapin_enabled(
@@ -401,6 +403,12 @@ static inline bool transparent_hugepage_swapin_enabled(
 	return false;
 }
 #else /* CONFIG_THP_SWAP */
+static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+				      unsigned long address, pmd_t orig_pmd)
+{
+	return 0;
+}
+
 static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 {
 	return 0;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0ccb1b78d661..0ec71f907fa5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1713,8 +1713,8 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma,
 }
 
 #ifdef CONFIG_THP_SWAP
-static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-			       unsigned long address, pmd_t orig_pmd)
+int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+			unsigned long address, pmd_t orig_pmd)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	spinlock_t *ptl;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2ca013df35e1..93b6a5d4e44a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1931,6 +1931,11 @@ static inline int pte_same_as_swp(pte_t pte, pte_t swp_pte)
 	return pte_same(pte_swp_clear_soft_dirty(pte), swp_pte);
 }
 
+static inline int pmd_same_as_swp(pmd_t pmd, pmd_t swp_pmd)
+{
+	return pmd_same(pmd_swp_clear_soft_dirty(pmd), swp_pmd);
+}
+
 /*
  * No need to decide whether this PTE shares the swap entry with others,
  * just let do_wp_page work it out if a write is requested later - to
@@ -1992,6 +1997,53 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 	return ret;
 }
 
+#ifdef CONFIG_THP_SWAP
+static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+		     unsigned long addr, swp_entry_t entry, struct page *page)
+{
+	struct mem_cgroup *memcg;
+	spinlock_t *ptl;
+	int ret = 1;
+
+	if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL,
+				  &memcg, true)) {
+		ret = -ENOMEM;
+		goto out_nolock;
+	}
+
+	ptl = pmd_lock(vma->vm_mm, pmd);
+	if (unlikely(!pmd_same_as_swp(*pmd, swp_entry_to_pmd(entry)))) {
+		mem_cgroup_cancel_charge(page, memcg, true);
+		ret = 0;
+		goto out;
+	}
+
+	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	get_page(page);
+	set_pmd_at(vma->vm_mm, addr, pmd,
+		   pmd_mkold(mk_huge_pmd(page, vma->vm_page_prot)));
+	page_add_anon_rmap(page, vma, addr, true);
+	mem_cgroup_commit_charge(page, memcg, true, true);
+	swap_free(entry, HPAGE_PMD_NR);
+	/*
+	 * Move the page to the active list so it is not
+	 * immediately swapped out again after swapon.
+	 */
+	activate_page(page);
+out:
+	spin_unlock(ptl);
+out_nolock:
+	return ret;
+}
+#else
+static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+		     unsigned long addr, swp_entry_t entry, struct page *page)
+{
+	return 0;
+}
+#endif
+
 static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
 				swp_entry_t entry, struct page *page)
@@ -2032,7 +2084,7 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud,
 				unsigned long addr, unsigned long end,
 				swp_entry_t entry, struct page *page)
 {
-	pmd_t *pmd;
+	pmd_t swp_pmd = swp_entry_to_pmd(entry), *pmd, orig_pmd;
 	unsigned long next;
 	int ret;
 
@@ -2040,6 +2092,27 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud,
 	do {
 		cond_resched();
 		next = pmd_addr_end(addr, end);
+		orig_pmd = *pmd;
+		if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(orig_pmd)) {
+			if (likely(!pmd_same_as_swp(orig_pmd, swp_pmd)))
+				continue;
+			/*
+			 * Huge cluster has been split already, split
+			 * PMD swap mapping and fallback to unuse PTE
+			 */
+			if (!PageTransCompound(page)) {
+				ret = split_huge_swap_pmd(vma, pmd,
+							  addr, orig_pmd);
+				if (ret)
+					return ret;
+				ret = unuse_pte_range(vma, pmd, addr,
+						      next, entry, page);
+			} else
+				ret = unuse_pmd(vma, pmd, addr, entry, page);
+			if (ret)
+				return ret;
+			continue;
+		}
 		if (pmd_none_or_trans_huge_or_clear_bad(pmd))
 			continue;
 		ret = unuse_pte_range(vma, pmd, addr, next, entry, page);
@@ -2233,6 +2306,7 @@ int try_to_unuse(unsigned int type, bool frontswap,
 	 * there are races when an instance of an entry might be missed.
 	 */
 	while ((i = find_next_to_unuse(si, i, frontswap)) != 0) {
+retry:
 		if (signal_pending(current)) {
 			retval = -EINTR;
 			break;
@@ -2248,6 +2322,8 @@ int try_to_unuse(unsigned int type, bool frontswap,
 		page = read_swap_cache_async(entry,
 					GFP_HIGHUSER_MOVABLE, NULL, 0, false);
 		if (!page) {
+			struct swap_cluster_info *ci = NULL;
+
 			/*
 			 * Either swap_duplicate() failed because entry
 			 * has been freed independently, and will not be
@@ -2264,6 +2340,14 @@ int try_to_unuse(unsigned int type, bool frontswap,
 			 */
 			if (!swcount || swcount == SWAP_MAP_BAD)
 				continue;
+			if (si->cluster_info)
+				ci = si->cluster_info + i / SWAPFILE_CLUSTER;
+			/* Split huge cluster if failed to allocate huge page */
+			if (cluster_is_huge(ci)) {
+				retval = split_swap_cluster(entry, 0);
+				if (!retval || retval == -EEXIST)
+					goto retry;
+			}
 			retval = -ENOMEM;
 			break;
 		}
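
[Editorial note: the new unuse_pmd() path only runs at swapoff time.  A
minimal sketch to trigger it, not part of the patch; it needs
CAP_SYS_ADMIN, and argv[1] is a swap area previously enabled with
swapon(8).]

#include <stdio.h>
#include <sys/swap.h>

int main(int argc, char **argv)
{
        if (argc < 2) {
                fprintf(stderr, "usage: %s <swap device or file>\n", argv[0]);
                return 1;
        }
        /* Forces every swapped-out page, THPs included, back into RAM. */
        if (swapoff(argv[1])) {
                perror("swapoff");
                return 1;
        }
        return 0;
}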
From patchwork Wed Oct 10 07:19:16 2018
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH -V6 13/21] swap: Support PMD swap mapping in madvise_free()
Date: Wed, 10 Oct 2018 15:19:16 +0800
Message-Id: <20181010071924.18767-14-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

When madvise_free() finds a PMD swap mapping and only part of the huge
swap cluster is operated on, the PMD swap mapping is split and
processing falls back to the PTE swap mappings.  Otherwise, when the
whole huge swap cluster is operated on, free_swap_and_cache() is called
to decrease the PMD swap mapping count and probably free the swap space
and the THP in the swap cache too.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/huge_memory.c | 54 +++++++++++++++++++++++++++++++++++++++---------------
 mm/madvise.c     |  2 +-
 2 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0ec71f907fa5..60b4105734b1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1891,6 +1891,15 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 }
 #endif
 
+static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
+{
+	pgtable_t pgtable;
+
+	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+	pte_free(mm, pgtable);
+	mm_dec_nr_ptes(mm);
+}
+
 /*
  * Return true if we do MADV_FREE successfully on entire pmd page.
  * Otherwise, return false.
 */
@@ -1911,15 +1920,39 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		goto out_unlocked;
 
 	orig_pmd = *pmd;
-	if (is_huge_zero_pmd(orig_pmd))
-		goto out;
-
 	if (unlikely(!pmd_present(orig_pmd))) {
-		VM_BUG_ON(thp_migration_supported() &&
-			  !is_pmd_migration_entry(orig_pmd));
-		goto out;
+		swp_entry_t entry = pmd_to_swp_entry(orig_pmd);
+
+		if (is_migration_entry(entry)) {
+			VM_BUG_ON(!thp_migration_supported());
+			goto out;
+		} else if (IS_ENABLED(CONFIG_THP_SWAP) &&
+			   !non_swap_entry(entry)) {
+			/*
+			 * If part of THP is discarded, split the PMD
+			 * swap mapping and operate on the PTEs
+			 */
+			if (next - addr != HPAGE_PMD_SIZE) {
+				unsigned long haddr = addr & HPAGE_PMD_MASK;
+
+				__split_huge_swap_pmd(vma, haddr, pmd);
+				goto out;
+			}
+			free_swap_and_cache(entry, HPAGE_PMD_NR);
+			pmd_clear(pmd);
+			zap_deposited_table(mm, pmd);
+			if (current->mm == mm)
+				sync_mm_rss(mm);
+			add_mm_counter(mm, MM_SWAPENTS, -HPAGE_PMD_NR);
+			ret = true;
+			goto out;
+		} else
+			VM_BUG_ON(1);
 	}
 
+	if (is_huge_zero_pmd(orig_pmd))
+		goto out;
+
 	page = pmd_page(orig_pmd);
 	/*
 	 * If other processes are mapping this page, we couldn't discard
@@ -1965,15 +1998,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	return ret;
 }
 
-static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
-{
-	pgtable_t pgtable;
-
-	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
-	pte_free(mm, pgtable);
-	mm_dec_nr_ptes(mm);
-}
-
 int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 pmd_t *pmd, unsigned long addr)
 {
diff --git a/mm/madvise.c b/mm/madvise.c
index 50282ba862e2..20101ff125d0 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -321,7 +321,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	unsigned long next;
 
 	next = pmd_addr_end(addr, end);
-	if (pmd_trans_huge(*pmd))
+	if (pmd_trans_huge(*pmd) || is_swap_pmd(*pmd))
 		if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next))
 			goto next;
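
[Editorial note: a userspace sketch of the two cases above, illustrative
only and not part of the patch.  It assumes a 2MB PMD size and MADV_FREE
(Linux 4.5+): freeing a whole PMD range can take the fast path, while
freeing part of one forces the PMD (swap) mapping to be split first.]

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL << 20)          /* assumed PMD size */

int main(void)
{
        char *buf = mmap(NULL, 2 * HPAGE_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        char *hp = (char *)(((uintptr_t)buf + HPAGE_SIZE - 1) &
                            ~(HPAGE_SIZE - 1));
        madvise(hp, HPAGE_SIZE, MADV_HUGEPAGE);
        memset(hp, 1, HPAGE_SIZE);

        /* Whole PMD range: madvise_free_huge_pmd() can discard it in one go. */
        if (madvise(hp, HPAGE_SIZE, MADV_FREE))
                perror("MADV_FREE (whole PMD)");

        memset(hp, 2, HPAGE_SIZE);
        /* Partial range: the PMD mapping is split, PTEs are processed. */
        if (madvise(hp, HPAGE_SIZE / 2, MADV_FREE))
                perror("MADV_FREE (partial PMD)");
        return 0;
}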
From patchwork Wed Oct 10 07:19:17 2018
From: Huang Ying <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH -V6 14/21] swap: Support to move swap account for PMD swap mapping
Date: Wed, 10 Oct 2018 15:19:17 +0800
Message-Id: <20181010071924.18767-15-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

Previously, the huge swap cluster was split after the THP was swapped
out.  Now, to support swapping in the THP in one piece, the huge swap
cluster is not split after the THP is reclaimed.  So in memcg we need to
move the swap account for PMD swap mappings in the process's page table
too.

When the page table is scanned while moving the memcg charge, PMD swap
mappings are identified, and mem_cgroup_move_swap_account() and its
callees are revised to move the account for the whole huge swap cluster.
If the swap cluster mapped by the PMD has already been split, the PMD
swap mapping is split and processing falls back to the PTEs.
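
[Editorial note: to see the move path run, charge moving has to be
enabled on the destination memcg before a task is attached.  A hedged
cgroup-v1 sketch, not part of the patch; the mount point, cgroup name
"B", and pid 1234 are placeholders.]

#include <stdio.h>

static int write_str(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f)
                return -1;
        fputs(val, f);
        return fclose(f);
}

int main(void)
{
        const char *base = "/sys/fs/cgroup/memory/B";   /* hypothetical memcg */
        char path[256];

        /* Bit 0 asks for charges of anonymous pages (and, with swap
         * accounting, their swap entries) to be moved on attach. */
        snprintf(path, sizeof(path), "%s/memory.move_charge_at_immigrate", base);
        if (write_str(path, "1"))
                perror(path);

        /* Attaching a task scans its page tables, after this patch
         * including PMD swap mappings, and moves the swap accounting. */
        snprintf(path, sizeof(path), "%s/cgroup.procs", base);
        if (write_str(path, "1234"))            /* hypothetical target pid */
                perror(path);
        return 0;
}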
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 9 ++++ include/linux/swap.h | 6 +++ include/linux/swap_cgroup.h | 3 +- mm/huge_memory.c | 8 +-- mm/memcontrol.c | 129 ++++++++++++++++++++++++++++++++++---------- mm/swap_cgroup.c | 45 +++++++++++++--- mm/swapfile.c | 14 +++++ 7 files changed, 174 insertions(+), 40 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 1927b2edb74a..e573774f9014 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -376,6 +376,9 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #ifdef CONFIG_THP_SWAP +extern void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd); extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd); extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); @@ -403,6 +406,12 @@ static inline bool transparent_hugepage_swapin_enabled( return false; } #else /* CONFIG_THP_SWAP */ +static inline void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd) +{ +} + static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd) { diff --git a/include/linux/swap.h b/include/linux/swap.h index f2daf3fbdd4b..1210f70f72bc 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -617,6 +617,7 @@ static inline swp_entry_t get_swap_page(struct page *page) #ifdef CONFIG_THP_SWAP extern int split_swap_cluster(swp_entry_t entry, unsigned long flags); extern int split_swap_cluster_map(swp_entry_t entry); +extern int get_swap_entry_size(swp_entry_t entry); #else static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags) { @@ -627,6 +628,11 @@ static inline int split_swap_cluster_map(swp_entry_t entry) { return 0; } + +static inline int get_swap_entry_size(swp_entry_t entry) +{ + return 1; +} #endif #ifdef CONFIG_MEMCG diff --git a/include/linux/swap_cgroup.h b/include/linux/swap_cgroup.h index a12dd1c3966c..c40fb52b0563 100644 --- a/include/linux/swap_cgroup.h +++ b/include/linux/swap_cgroup.h @@ -7,7 +7,8 @@ #ifdef CONFIG_MEMCG_SWAP extern unsigned short swap_cgroup_cmpxchg(swp_entry_t ent, - unsigned short old, unsigned short new); + unsigned short old, unsigned short new, + unsigned int nr_ents); extern unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id, unsigned int nr_ents); extern unsigned short lookup_swap_cgroup_id(swp_entry_t ent); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 60b4105734b1..ebd043528309 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1678,10 +1678,11 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) return 0; } +#ifdef CONFIG_THP_SWAP /* Convert a PMD swap mapping to a set of PTE swap mappings */ -static void __split_huge_swap_pmd(struct vm_area_struct *vma, - unsigned long haddr, - pmd_t *pmd) +void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd) { struct mm_struct *mm = vma->vm_mm; pgtable_t pgtable; @@ -1712,7 +1713,6 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma, pmd_populate(mm, pmd, pgtable); } -#ifdef CONFIG_THP_SWAP int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd) 
{ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 7bebe2ddec05..b6504b81ae0c 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2660,9 +2660,10 @@ void mem_cgroup_split_huge_fixup(struct page *head) #ifdef CONFIG_MEMCG_SWAP /** * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record. - * @entry: swap entry to be moved + * @entry: the first swap entry to be moved * @from: mem_cgroup which the entry is moved from * @to: mem_cgroup which the entry is moved to + * @nr_ents: number of swap entries * * It succeeds only when the swap_cgroup's record for this entry is the same * as the mem_cgroup's id of @from. @@ -2673,23 +2674,27 @@ void mem_cgroup_split_huge_fixup(struct page *head) * both res and memsw, and called css_get(). */ static int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) + struct mem_cgroup *from, + struct mem_cgroup *to, + unsigned int nr_ents) { unsigned short old_id, new_id; old_id = mem_cgroup_id(from); new_id = mem_cgroup_id(to); - if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) { - mod_memcg_state(from, MEMCG_SWAP, -1); - mod_memcg_state(to, MEMCG_SWAP, 1); + if (swap_cgroup_cmpxchg(entry, old_id, new_id, nr_ents) == old_id) { + mod_memcg_state(from, MEMCG_SWAP, -nr_ents); + mod_memcg_state(to, MEMCG_SWAP, nr_ents); return 0; } return -EINVAL; } #else static inline int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) + struct mem_cgroup *from, + struct mem_cgroup *to, + unsigned int nr_ents) { return -EINVAL; } @@ -4644,6 +4649,7 @@ enum mc_target_type { MC_TARGET_PAGE, MC_TARGET_SWAP, MC_TARGET_DEVICE, + MC_TARGET_FALLBACK, }; static struct page *mc_handle_present_pte(struct vm_area_struct *vma, @@ -4710,6 +4716,26 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, } #endif +static struct page *mc_handle_swap_pmd(struct vm_area_struct *vma, + pmd_t pmd, swp_entry_t *entry) +{ + struct page *page = NULL; + swp_entry_t ent = pmd_to_swp_entry(pmd); + + if (!(mc.flags & MOVE_ANON) || non_swap_entry(ent)) + return NULL; + + /* + * Because lookup_swap_cache() updates some statistics counter, + * we call find_get_page() with swapper_space directly. + */ + page = find_get_page(swap_address_space(ent), swp_offset(ent)); + if (do_memsw_account()) + entry->val = ent.val; + + return page; +} + static struct page *mc_handle_file_pte(struct vm_area_struct *vma, unsigned long addr, pte_t ptent, swp_entry_t *entry) { @@ -4898,7 +4924,9 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, * There is a swap entry and a page doesn't exist or isn't charged. * But we cannot move a tail-page in a THP. */ - if (ent.val && !ret && (!page || !PageTransCompound(page)) && + if (ent.val && !ret && + ((page && !PageTransCompound(page)) || + (!page && get_swap_entry_size(ent) == 1)) && mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) { ret = MC_TARGET_SWAP; if (target) @@ -4909,37 +4937,64 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, #ifdef CONFIG_TRANSPARENT_HUGEPAGE /* - * We don't consider PMD mapped swapping or file mapped pages because THP does - * not support them for now. - * Caller should make sure that pmd_trans_huge(pmd) is true. + * We don't consider file mapped pages because THP does not support + * them for now. 
*/ static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) + unsigned long addr, pmd_t *pmdp, union mc_target *target) { + pmd_t pmd = *pmdp; struct page *page = NULL; enum mc_target_type ret = MC_TARGET_NONE; + swp_entry_t ent = { .val = 0 }; if (unlikely(is_swap_pmd(pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(pmd)); - return ret; + if (is_pmd_migration_entry(pmd)) { + VM_BUG_ON(!thp_migration_supported()); + return ret; + } + if (!IS_ENABLED(CONFIG_THP_SWAP)) { + VM_BUG_ON(1); + return ret; + } + page = mc_handle_swap_pmd(vma, pmd, &ent); + /* The swap cluster has been split under us */ + if ((page && !PageTransHuge(page)) || + (!page && ent.val && get_swap_entry_size(ent) == 1)) { + __split_huge_swap_pmd(vma, addr, pmdp); + ret = MC_TARGET_FALLBACK; + goto out; + } + } else { + page = pmd_page(pmd); + get_page(page); } - page = pmd_page(pmd); - VM_BUG_ON_PAGE(!page || !PageHead(page), page); + VM_BUG_ON_PAGE(page && !PageHead(page), page); if (!(mc.flags & MOVE_ANON)) - return ret; - if (page->mem_cgroup == mc.from) { + goto out; + if (!page && !ent.val) + goto out; + if (page && page->mem_cgroup == mc.from) { ret = MC_TARGET_PAGE; if (target) { get_page(page); target->page = page; } } + if (ent.val && !ret && !page && + mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) { + ret = MC_TARGET_SWAP; + if (target) + target->ent = ent; + } +out: + if (page) + put_page(page); return ret; } #else static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) + unsigned long addr, pmd_t *pmdp, union mc_target *target) { return MC_TARGET_NONE; } @@ -4952,6 +5007,7 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, struct vm_area_struct *vma = walk->vma; pte_t *pte; spinlock_t *ptl; + int ret; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { @@ -4960,12 +5016,16 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, * support transparent huge page with MEMORY_DEVICE_PUBLIC or * MEMORY_DEVICE_PRIVATE but this might change. 
*/ - if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE) - mc.precharge += HPAGE_PMD_NR; + ret = get_mctgt_type_thp(vma, addr, pmd, NULL); spin_unlock(ptl); + if (ret == MC_TARGET_FALLBACK) + goto fallback; + if (ret) + mc.precharge += HPAGE_PMD_NR; return 0; } +fallback: if (pmd_trans_unstable(pmd)) return 0; pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); @@ -5156,6 +5216,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, enum mc_target_type target_type; union mc_target target; struct page *page; + swp_entry_t ent; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { @@ -5163,8 +5224,9 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, spin_unlock(ptl); return 0; } - target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); - if (target_type == MC_TARGET_PAGE) { + target_type = get_mctgt_type_thp(vma, addr, pmd, &target); + switch (target_type) { + case MC_TARGET_PAGE: page = target.page; if (!isolate_lru_page(page)) { if (!mem_cgroup_move_account(page, true, @@ -5175,7 +5237,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, putback_lru_page(page); } put_page(page); - } else if (target_type == MC_TARGET_DEVICE) { + break; + case MC_TARGET_DEVICE: page = target.page; if (!mem_cgroup_move_account(page, true, mc.from, mc.to)) { @@ -5183,9 +5246,21 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, mc.moved_charge += HPAGE_PMD_NR; } put_page(page); + break; + case MC_TARGET_SWAP: + ent = target.ent; + if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to, + HPAGE_PMD_NR)) { + mc.precharge -= HPAGE_PMD_NR; + mc.moved_swap += HPAGE_PMD_NR; + } + break; + default: + break; } spin_unlock(ptl); - return 0; + if (target_type != MC_TARGET_FALLBACK) + return 0; } if (pmd_trans_unstable(pmd)) @@ -5195,7 +5270,6 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, for (; addr != end; addr += PAGE_SIZE) { pte_t ptent = *(pte++); bool device = false; - swp_entry_t ent; if (!mc.precharge) break; @@ -5229,7 +5303,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, break; case MC_TARGET_SWAP: ent = target.ent; - if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) { + if (!mem_cgroup_move_swap_account(ent, mc.from, + mc.to, 1)) { mc.precharge--; /* we fixup refcnts and charges later. */ mc.moved_swap++; diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c index 45affaef3bc6..ccc08e88962a 100644 --- a/mm/swap_cgroup.c +++ b/mm/swap_cgroup.c @@ -87,29 +87,58 @@ static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent, /** * swap_cgroup_cmpxchg - cmpxchg mem_cgroup's id for this swp_entry. - * @ent: swap entry to be cmpxchged + * @ent: the first swap entry to be cmpxchged * @old: old id * @new: new id + * @nr_ents: number of swap entries * * Returns old id at success, 0 at failure. 
* (There is no mem_cgroup using 0 as its id) */ unsigned short swap_cgroup_cmpxchg(swp_entry_t ent, - unsigned short old, unsigned short new) + unsigned short old, unsigned short new, + unsigned int nr_ents) { struct swap_cgroup_ctrl *ctrl; - struct swap_cgroup *sc; + struct swap_cgroup *sc_start, *sc; unsigned long flags; unsigned short retval; + pgoff_t offset_start = swp_offset(ent), offset; + pgoff_t end = offset_start + nr_ents; - sc = lookup_swap_cgroup(ent, &ctrl); + sc_start = lookup_swap_cgroup(ent, &ctrl); spin_lock_irqsave(&ctrl->lock, flags); - retval = sc->id; - if (retval == old) + sc = sc_start; + offset = offset_start; + for (;;) { + if (sc->id != old) { + retval = 0; + goto out; + } + offset++; + if (offset == end) + break; + if (offset % SC_PER_PAGE) + sc++; + else + sc = __lookup_swap_cgroup(ctrl, offset); + } + + sc = sc_start; + offset = offset_start; + for (;;) { sc->id = new; - else - retval = 0; + offset++; + if (offset == end) + break; + if (offset % SC_PER_PAGE) + sc++; + else + sc = __lookup_swap_cgroup(ctrl, offset); + } + retval = old; +out: spin_unlock_irqrestore(&ctrl->lock, flags); return retval; } diff --git a/mm/swapfile.c b/mm/swapfile.c index 93b6a5d4e44a..938a800aee12 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1730,6 +1730,20 @@ static int page_trans_huge_map_swapcount(struct page *page, int *total_mapcount, return map_swapcount; } +#ifdef CONFIG_THP_SWAP +int get_swap_entry_size(swp_entry_t entry) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + + si = _swap_info_get(entry); + if (!si || !si->cluster_info) + return 1; + ci = si->cluster_info + swp_offset(entry) / SWAPFILE_CLUSTER; + return cluster_is_huge(ci) ? SWAPFILE_CLUSTER : 1; +} +#endif + /* * We can write to an anon page without COW if there are no other references * to it. 
And as a side-effect, free up its swap: because the old content
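The heart of the patch above is the range cmpxchg in swap_cgroup_cmpxchg(): every entry in [offset, offset + nr_ents) is first verified, under ctrl->lock, to still carry the old memcg id, and only then is the new id written, so the account move is all-or-nothing. A minimal stand-alone C sketch of that two-pass pattern, with a flat array standing in for the SC_PER_PAGE-chunked swap_cgroup pages and the lock elided; range_cmpxchg() is a hypothetical name, not kernel API:

/* Two-pass compare-then-exchange over a range of per-entry ids.
 * Sketch only: the kernel walks swap_cgroup pages via
 * __lookup_swap_cgroup() and holds ctrl->lock; here a flat array
 * stands in for both. */
#include <stdio.h>

static unsigned short ids[8];           /* per-swap-entry memcg ids */

static unsigned short range_cmpxchg(unsigned long start, unsigned long n,
                                    unsigned short old, unsigned short new)
{
        unsigned long i;

        for (i = start; i < start + n; i++)     /* pass 1: verify */
                if (ids[i] != old)
                        return 0;               /* fail, change nothing */
        for (i = start; i < start + n; i++)     /* pass 2: commit */
                ids[i] = new;
        return old;                             /* success */
}

int main(void)
{
        for (int i = 0; i < 8; i++)
                ids[i] = 3;
        ids[5] = 4;     /* one entry already moved elsewhere */
        printf("%d\n", range_cmpxchg(0, 8, 3, 7));  /* 0: whole range fails */
        printf("%d\n", range_cmpxchg(0, 4, 3, 7));  /* 3: sub-range succeeds */
        return 0;
}

Because pass 1 completes before any write in pass 2, a concurrent split that re-records one entry makes the whole move fail cleanly, which is exactly the MC_TARGET_FALLBACK case in get_mctgt_type_thp().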
From patchwork Wed Oct 10 07:19:18 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH -V6 15/21] swap: Support to copy PMD swap mapping when fork()
Date: Wed, 10 Oct 2018 15:19:18 +0800
Message-Id: <20181010071924.18767-16-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>

During fork, the page table needs to be copied from parent to child. A PMD swap mapping needs to be copied too, and the swap reference count needs to be increased. When the huge swap cluster has already been split, we split the PMD swap mapping as well and fall back to PTE copying. When swap count continuation fails to allocate a page with GFP_ATOMIC, we unlock the spinlock and try again with GFP_KERNEL.

Signed-off-by: "Huang, Ying" Cc: "Kirill A.
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/huge_memory.c | 72 ++++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 57 insertions(+), 15 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index ebd043528309..74c8621619cb 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -987,6 +987,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, if (unlikely(!pgtable)) goto out; +retry: dst_ptl = pmd_lock(dst_mm, dst_pmd); src_ptl = pmd_lockptr(src_mm, src_pmd); spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); @@ -994,26 +995,67 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, ret = -EAGAIN; pmd = *src_pmd; -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION if (unlikely(is_swap_pmd(pmd))) { swp_entry_t entry = pmd_to_swp_entry(pmd); - VM_BUG_ON(!is_pmd_migration_entry(pmd)); - if (is_write_migration_entry(entry)) { - make_migration_entry_read(&entry); - pmd = swp_entry_to_pmd(entry); - if (pmd_swp_soft_dirty(*src_pmd)) - pmd = pmd_swp_mksoft_dirty(pmd); - set_pmd_at(src_mm, addr, src_pmd, pmd); +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION + if (is_migration_entry(entry)) { + if (is_write_migration_entry(entry)) { + make_migration_entry_read(&entry); + pmd = swp_entry_to_pmd(entry); + if (pmd_swp_soft_dirty(*src_pmd)) + pmd = pmd_swp_mksoft_dirty(pmd); + set_pmd_at(src_mm, addr, src_pmd, pmd); + } + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + ret = 0; + goto out_unlock; } - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - set_pmd_at(dst_mm, addr, dst_pmd, pmd); - ret = 0; - goto out_unlock; - } #endif + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) { + ret = swap_duplicate(&entry, HPAGE_PMD_NR); + if (!ret) { + add_mm_counter(dst_mm, MM_SWAPENTS, + HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, + pgtable); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + /* make sure dst_mm is on swapoff's mmlist. 
*/ + if (unlikely(list_empty(&dst_mm->mmlist))) { + spin_lock(&mmlist_lock); + if (list_empty(&dst_mm->mmlist)) + list_add(&dst_mm->mmlist, + &src_mm->mmlist); + spin_unlock(&mmlist_lock); + } + } else if (ret == -ENOTDIR) { + /* + * The huge swap cluster has been split, split + * the PMD swap mapping and fallback to PTE + */ + __split_huge_swap_pmd(vma, addr, src_pmd); + pte_free(dst_mm, pgtable); + } else if (ret == -ENOMEM) { + spin_unlock(src_ptl); + spin_unlock(dst_ptl); + ret = add_swap_count_continuation(entry, + GFP_KERNEL); + if (ret < 0) { + ret = -ENOMEM; + pte_free(dst_mm, pgtable); + goto out; + } + goto retry; + } else + VM_BUG_ON(1); + goto out_unlock; + } + VM_BUG_ON(1); + } if (unlikely(!pmd_trans_huge(pmd))) { pte_free(dst_mm, pgtable);
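The -ENOMEM leg of copy_huge_pmd() above is the classic drop-the-locks-and-retry shape: swap_duplicate() may need a swap-count continuation page but must not sleep under the two page-table locks, so both locks are dropped, the continuation is allocated with GFP_KERNEL, and the whole copy is redone from the top because the PMD may have changed in between. A self-contained user-space sketch of that control flow; the mutex stands in for the page-table spinlocks, and try_op_locked() and the reserve are illustrative names, not kernel API:

/* Retry pattern: an operation under a non-sleeping lock may need
 * memory; allocate it after dropping the lock, then redo the whole
 * locked section, because the state may have changed meanwhile. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int need_reserve = 1;            /* models "swap count overflowed" */

static int try_op_locked(void *reserve)
{
        if (need_reserve && !reserve)
                return -1;              /* would have to allocate */
        need_reserve = 0;
        return 0;                       /* operation completed */
}

static int do_op(void)
{
        void *reserve = NULL;

retry:
        pthread_mutex_lock(&lock);
        if (try_op_locked(reserve) == 0) {
                pthread_mutex_unlock(&lock);
                free(reserve);
                return 0;
        }
        pthread_mutex_unlock(&lock);    /* cannot allocate while locked */
        reserve = malloc(64);           /* the sleeping GFP_KERNEL step */
        if (!reserve)
                return -1;
        goto retry;                     /* state may have changed: redo */
}

int main(void)
{
        printf("%d\n", do_op());        /* 0, after one retry */
        return 0;
}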
From patchwork Wed Oct 10 07:19:19 2018
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V6 16/21] swap: Free PMD swap mapping when zap_huge_pmd() Date: Wed, 10 Oct 2018 15:19:19 +0800 Message-Id: <20181010071924.18767-17-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com> References: <20181010071924.18767-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP For a PMD swap mapping, zap_huge_pmd() will clear the PMD and call free_swap_and_cache() to decrease the swap reference count and maybe free or split the huge swap cluster and the THP in swap cache. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/huge_memory.c | 32 +++++++++++++++++++++----------- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 74c8621619cb..b9c766683ee1 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2066,7 +2066,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, spin_unlock(ptl); if (is_huge_zero_pmd(orig_pmd)) tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); - } else if (is_huge_zero_pmd(orig_pmd)) { + } else if (pmd_present(orig_pmd) && is_huge_zero_pmd(orig_pmd)) { zap_deposited_table(tlb->mm, pmd); spin_unlock(ptl); tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); @@ -2079,17 +2079,27 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, page_remove_rmap(page, true); VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); VM_BUG_ON_PAGE(!PageHead(page), page); - } else if (thp_migration_supported()) { - swp_entry_t entry; - - VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); - entry = pmd_to_swp_entry(orig_pmd); - page = pfn_to_page(swp_offset(entry)); + } else { + swp_entry_t entry = pmd_to_swp_entry(orig_pmd); + + if (thp_migration_supported() && + is_migration_entry(entry)) + page = pfn_to_page(swp_offset(entry)); + else if (IS_ENABLED(CONFIG_THP_SWAP) && + !non_swap_entry(entry)) + free_swap_and_cache(entry, HPAGE_PMD_NR); + else { + WARN_ONCE(1, +"Non present huge pmd without pmd migration or swap enabled!"); + goto unlock; + } flush_needed = 0; - } else - WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); + } - if (PageAnon(page)) { + if (!page) { + zap_deposited_table(tlb->mm, pmd); + add_mm_counter(tlb->mm, MM_SWAPENTS, -HPAGE_PMD_NR); + } else if (PageAnon(page)) { zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); } else { @@ -2097,7 +2107,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, mm_counter_file(page), -HPAGE_PMD_NR); } - +unlock: spin_unlock(ptl); if (flush_needed) tlb_remove_page_size(tlb, page, HPAGE_PMD_SIZE); From patchwork Wed Oct 10 07:19:20 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10634131 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by 
From patchwork Wed Oct 10 07:19:20 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH -V6 17/21] swap: Support PMD swap mapping for MADV_WILLNEED
Date: Wed, 10 Oct 2018 15:19:20 +0800
Message-Id: <20181010071924.18767-18-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>

During MADV_WILLNEED, for a PMD swap mapping, if THP swapin is enabled for the VMA, the whole swap cluster is swapped in. Otherwise, the huge swap cluster and the PMD swap mapping are split, falling back to PTE swap mappings.

Signed-off-by: "Huang, Ying" Cc: "Kirill A.
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/madvise.c | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 20101ff125d0..0413659ff6ba 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -196,14 +196,36 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, pte_t *orig_pte; struct vm_area_struct *vma = walk->private; unsigned long index; + swp_entry_t entry; + struct page *page; + pmd_t pmdval; + + pmdval = *pmd; + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(pmdval) && + !is_pmd_migration_entry(pmdval)) { + entry = pmd_to_swp_entry(pmdval); + if (!transparent_hugepage_swapin_enabled(vma)) { + if (!split_swap_cluster(entry, 0)) + split_huge_swap_pmd(vma, pmd, start, pmdval); + } else { + page = read_swap_cache_async(entry, + GFP_HIGHUSER_MOVABLE, + vma, start, false); + if (page) { + /* The swap cluster has been split under us */ + if (!PageTransHuge(page)) + split_huge_swap_pmd(vma, pmd, start, + pmdval); + put_page(page); + } + } + } if (pmd_none_or_trans_huge_or_clear_bad(pmd)) return 0; for (index = start; index != end; index += PAGE_SIZE) { pte_t pte; - swp_entry_t entry; - struct page *page; spinlock_t *ptl; orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl); From patchwork Wed Oct 10 07:19:21 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10634127 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AA3D569B1 for ; Wed, 10 Oct 2018 07:28:06 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9C431294F6 for ; Wed, 10 Oct 2018 07:28:06 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 90ADA296AF; Wed, 10 Oct 2018 07:28:06 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 04E81294F6 for ; Wed, 10 Oct 2018 07:28:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 87AE86B027A; Wed, 10 Oct 2018 03:27:44 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 4CAD16B0279; Wed, 10 Oct 2018 03:27:44 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0EEFA6B027E; Wed, 10 Oct 2018 03:27:43 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pl1-f197.google.com (mail-pl1-f197.google.com [209.85.214.197]) by kanga.kvack.org (Postfix) with ESMTP id 96E066B0279 for ; Wed, 10 Oct 2018 03:27:43 -0400 (EDT) Received: by mail-pl1-f197.google.com with SMTP id y7-v6so3176307plp.16 for ; Wed, 10 Oct 2018 00:27:43 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc 
From patchwork Wed Oct 10 07:19:21 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH -V6 18/21] swap: Support PMD swap mapping in mincore()
Date: Wed, 10 Oct 2018 15:19:21 +0800
Message-Id: <20181010071924.18767-19-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>

During mincore(), for a PMD swap mapping, the swap cache is looked up. If the resulting page isn't a compound page, the PMD swap mapping is split and processing falls back to PTE swap mappings.

Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/mincore.c | 37 +++++++++++++++++++++++++++++++------ 1 file changed, 31 insertions(+), 6 deletions(-) diff --git a/mm/mincore.c b/mm/mincore.c index aa0e542569f9..1d861fac82ee 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -48,7 +48,8 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr, * and is up to date; i.e. that no page-in operation would be required * at this time if an application were to map and access this page.
*/ -static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff) +static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff, + bool *compound) { unsigned char present = 0; struct page *page; @@ -86,6 +87,8 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff) #endif if (page) { present = PageUptodate(page); + if (compound) + *compound = PageCompound(page); put_page(page); } @@ -103,7 +106,8 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end, pgoff = linear_page_index(vma, addr); for (i = 0; i < nr; i++, pgoff++) - vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff); + vec[i] = mincore_page(vma->vm_file->f_mapping, + pgoff, NULL); } else { for (i = 0; i < nr; i++) vec[i] = 0; @@ -127,14 +131,36 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, pte_t *ptep; unsigned char *vec = walk->private; int nr = (end - addr) >> PAGE_SHIFT; + swp_entry_t entry; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { - memset(vec, 1, nr); + unsigned char val = 1; + bool compound; + + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(*pmd)) { + entry = pmd_to_swp_entry(*pmd); + if (!non_swap_entry(entry)) { + val = mincore_page(swap_address_space(entry), + swp_offset(entry), + &compound); + /* + * The huge swap cluster has been + * split under us + */ + if (!compound) { + __split_huge_swap_pmd(vma, addr, pmd); + spin_unlock(ptl); + goto fallback; + } + } + } + memset(vec, val, nr); spin_unlock(ptl); goto out; } +fallback: if (pmd_trans_unstable(pmd)) { __mincore_unmapped_range(addr, end, vma, vec); goto out; @@ -150,8 +176,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, else if (pte_present(pte)) *vec = 1; else { /* pte is a swap entry */ - swp_entry_t entry = pte_to_swp_entry(pte); - + entry = pte_to_swp_entry(pte); if (non_swap_entry(entry)) { /* * migration or hwpoison entries are always @@ -161,7 +186,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, } else { #ifdef CONFIG_SWAP *vec = mincore_page(swap_address_space(entry), - swp_offset(entry)); + swp_offset(entry), NULL); #else WARN_ON(1); *vec = 1;
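The user-visible contract of mincore() is unchanged by the patch: one byte per base page, bit 0 set when the page is in core; a swapped-out THP whose pages sit in the swap cache now reports all HPAGE_PMD_NR bytes consistently. A small sketch of querying a PMD-sized region, assuming x86-64 sizes:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE (2UL << 20)  /* x86-64 PMD huge page, assumption */

int main(void)
{
        long page = sysconf(_SC_PAGESIZE);
        size_t npages = HPAGE_SIZE / page;      /* 512 for 4 KB pages */
        unsigned char *vec = malloc(npages);
        char *buf = aligned_alloc(HPAGE_SIZE, HPAGE_SIZE);

        if (!vec || !buf)
                return 1;
        memset(buf, 1, HPAGE_SIZE);

        if (mincore(buf, HPAGE_SIZE, vec) == 0) {
                size_t resident = 0;
                for (size_t i = 0; i < npages; i++)
                        resident += vec[i] & 1; /* bit 0 = in core */
                printf("%zu of %zu pages resident\n", resident, npages);
        }
        return 0;
}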
From patchwork Wed Oct 10 07:19:22 2018
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH -V6 19/21] swap: Support PMD swap mapping in common path
Date: Wed, 10 Oct 2018 15:19:22 +0800
Message-Id: <20181010071924.18767-20-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>

The original code handles only PMD migration entries; it is revised to support PMD swap mappings as well.

Signed-off-by: "Huang, Ying" Cc: "Kirill A.
Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 fs/proc/task_mmu.c | 12 +++++-------
 mm/gup.c           | 36 ++++++++++++++++++++++++------------
 mm/huge_memory.c   |  7 ++++---
 mm/mempolicy.c     |  2 +-
 4 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 0995a84c78dc..befac96b42d9 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -984,7 +984,7 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
 		pmd = pmd_clear_soft_dirty(pmd);
 
 		set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
-	} else if (is_migration_entry(pmd_to_swp_entry(pmd))) {
+	} else if (is_swap_pmd(pmd)) {
 		pmd = pmd_swp_clear_soft_dirty(pmd);
 		set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
 	}
@@ -1314,9 +1314,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 		if (pm->show_pfn)
 			frame = pmd_pfn(pmd) +
 				((addr & ~PMD_MASK) >> PAGE_SHIFT);
-	}
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-	else if (is_swap_pmd(pmd)) {
+	} else if (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) &&
+		   is_swap_pmd(pmd)) {
 		swp_entry_t entry = pmd_to_swp_entry(pmd);
 		unsigned long offset;
 
@@ -1329,10 +1328,9 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 		flags |= PM_SWAP;
 		if (pmd_swp_soft_dirty(pmd))
 			flags |= PM_SOFT_DIRTY;
-		VM_BUG_ON(!is_pmd_migration_entry(pmd));
-		page = migration_entry_to_page(entry);
+		if (is_pmd_migration_entry(pmd))
+			page = migration_entry_to_page(entry);
 	}
-#endif
 
 	if (page && page_mapcount(page) == 1)
 		flags |= PM_MMAP_EXCLUSIVE;
diff --git a/mm/gup.c b/mm/gup.c
index 08eb350e0f35..17fd850aa2cc 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -216,6 +216,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	spinlock_t *ptl;
 	struct page *page;
 	struct mm_struct *mm = vma->vm_mm;
+	swp_entry_t entry;
 
 	pmd = pmd_offset(pudp, address);
 	/*
@@ -243,18 +244,22 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	if (!pmd_present(pmdval)) {
 		if (likely(!(flags & FOLL_MIGRATION)))
 			return no_page_table(vma, flags);
-		VM_BUG_ON(thp_migration_supported() &&
-			  !is_pmd_migration_entry(pmdval));
-		if (is_pmd_migration_entry(pmdval))
+		entry = pmd_to_swp_entry(pmdval);
+		if (thp_migration_supported() && is_migration_entry(entry)) {
 			pmd_migration_entry_wait(mm, pmd);
-		pmdval = READ_ONCE(*pmd);
-		/*
-		 * MADV_DONTNEED may convert the pmd to null because
-		 * mmap_sem is held in read mode
-		 */
-		if (pmd_none(pmdval))
+			pmdval = READ_ONCE(*pmd);
+			/*
+			 * MADV_DONTNEED may convert the pmd to null because
+			 * mmap_sem is held in read mode
+			 */
+			if (pmd_none(pmdval))
+				return no_page_table(vma, flags);
+			goto retry;
+		}
+		if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry))
 			return no_page_table(vma, flags);
-		goto retry;
+		WARN_ON(1);
+		return no_page_table(vma, flags);
 	}
 	if (pmd_devmap(pmdval)) {
 		ptl = pmd_lock(mm, pmd);
@@ -276,11 +281,18 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 		return no_page_table(vma, flags);
 	}
 	if (unlikely(!pmd_present(*pmd))) {
+		entry = pmd_to_swp_entry(*pmd);
 		spin_unlock(ptl);
 		if (likely(!(flags & FOLL_MIGRATION)))
 			return no_page_table(vma, flags);
-		pmd_migration_entry_wait(mm, pmd);
-		goto retry_locked;
+		if (thp_migration_supported() && is_migration_entry(entry)) {
+			pmd_migration_entry_wait(mm, pmd);
+			goto retry_locked;
+		}
+		if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry))
+			return no_page_table(vma, flags);
+		WARN_ON(1);
+		return no_page_table(vma, flags);
 	}
 	if (unlikely(!pmd_trans_huge(*pmd))) {
 		spin_unlock(ptl);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b9c766683ee1..abefc50b08b7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2133,7 +2133,7 @@ static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
 static pmd_t move_soft_dirty_pmd(pmd_t pmd)
 {
 #ifdef CONFIG_MEM_SOFT_DIRTY
-	if (unlikely(is_pmd_migration_entry(pmd)))
+	if (unlikely(is_swap_pmd(pmd)))
 		pmd = pmd_swp_mksoft_dirty(pmd);
 	else if (pmd_present(pmd))
 		pmd = pmd_mksoft_dirty(pmd);
@@ -2219,11 +2219,12 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	preserve_write = prot_numa && pmd_write(*pmd);
 	ret = 1;
 
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#if defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) || defined(CONFIG_THP_SWAP)
 	if (is_swap_pmd(*pmd)) {
 		swp_entry_t entry = pmd_to_swp_entry(*pmd);
 
-		VM_BUG_ON(!is_pmd_migration_entry(*pmd));
+		VM_BUG_ON(!IS_ENABLED(CONFIG_THP_SWAP) &&
+			  !is_migration_entry(entry));
 		if (is_write_migration_entry(entry)) {
 			pmd_t newpmd;
 			/*
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 5837a067124d..7a5c1d2faea2 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -436,7 +436,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
 	struct queue_pages *qp = walk->private;
 	unsigned long flags;
 
-	if (unlikely(is_pmd_migration_entry(*pmd))) {
+	if (unlikely(is_swap_pmd(*pmd))) {
 		ret = 1;
 		goto unlock;
 	}
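In these paths a non-present PMD may now carry either a migration entry
or a genuine swap entry, and the two must be told apart before acting.
A condensed sketch of the classification (not the kernel's exact code;
wait_for_migration() is a hypothetical stand-in):

	swp_entry_t entry = pmd_to_swp_entry(pmdval);

	if (non_swap_entry(entry)) {
		/* special entry, e.g. a THP migration entry */
		if (is_migration_entry(entry))
			wait_for_migration();	/* hypothetical */
	} else {
		/* true swap entry: a PMD swap mapping into swap space */
	}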
From patchwork Wed Oct 10 07:19:23 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10634137
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V6 20/21] swap: create PMD swap mapping when unmap the THP
Date: Wed, 10 Oct 2018 15:19:23 +0800
Message-Id: <20181010071924.18767-21-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

This is the final step of the THP swapout support. When reclaiming an
anonymous THP, after allocating the huge swap cluster and adding the
THP to the swap cache, the PMD page mapping is changed to a mapping
into the swap space. Previously, the PMD page mapping was split before
being changed. In this patch, the unmap code is enhanced so that it
does not split the PMD mapping, but instead replaces it with a PMD swap
mapping. Later, when the SWAP_HAS_CACHE flag is cleared in the last
step of swapout, the huge swap cluster is kept whole instead of being
split, and on swapin the huge swap cluster is read in one piece into a
THP. That is, the THP is not split during swapout/swapin, which
eliminates the splitting/collapsing overhead and reduces the page fault
count. More importantly, it greatly improves THP utilization: far more
THPs survive intact when swapping is in use, so we can take full
advantage of THP, including its high swapout/swapin performance.
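Condensed, the swapout path for an anonymous THP becomes the following;
this is a sketch of the flow described above with hypothetical helper
names, not literal kernel code:

	/* reclaim one anonymous THP without splitting it */
	cluster = alloc_huge_swap_cluster();	/* hypothetical */
	add_to_swap_cache(thp, cluster);
	try_to_unmap(thp);		/* now installs a PMD swap mapping */
	pageout(thp);			/* cluster written in one piece */
	clear_swap_has_cache(cluster);	/* cluster stays huge */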
Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 include/linux/huge_mm.h | 11 +++++++++++
 mm/huge_memory.c        | 30 ++++++++++++++++++++++++++++++
 mm/rmap.c               | 43 ++++++++++++++++++++++++++++++++++++++++++-
 mm/vmscan.c             |  6 +-----
 4 files changed, 84 insertions(+), 6 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e573774f9014..f6370e8c7742 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -375,6 +375,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+struct page_vma_mapped_walk;
+
 #ifdef CONFIG_THP_SWAP
 extern void __split_huge_swap_pmd(struct vm_area_struct *vma,
 				  unsigned long haddr,
@@ -382,6 +384,8 @@ extern void __split_huge_swap_pmd(struct vm_area_struct *vma,
 extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			       unsigned long address, pmd_t orig_pmd);
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+extern bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw,
+		struct page *page, unsigned long address, pmd_t pmdval);
 
 static inline bool transparent_hugepage_swapin_enabled(
 	struct vm_area_struct *vma)
@@ -423,6 +427,13 @@ static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	return 0;
 }
 
+static inline bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw,
+				      struct page *page, unsigned long address,
+				      pmd_t pmdval)
+{
+	return false;
+}
+
 static inline bool transparent_hugepage_swapin_enabled(
 	struct vm_area_struct *vma)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index abefc50b08b7..87795529c547 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1931,6 +1931,36 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	count_vm_event(THP_SWPIN_FALLBACK);
 	goto fallback;
 }
+
+bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, struct page *page,
+			unsigned long address, pmd_t pmdval)
+{
+	struct vm_area_struct *vma = pvmw->vma;
+	struct mm_struct *mm = vma->vm_mm;
+	pmd_t swp_pmd;
+	swp_entry_t entry = { .val = page_private(page) };
+
+	if (swap_duplicate(&entry, HPAGE_PMD_NR) < 0) {
+		set_pmd_at(mm, address, pvmw->pmd, pmdval);
+		return false;
+	}
+	if (list_empty(&mm->mmlist)) {
+		spin_lock(&mmlist_lock);
+		if (list_empty(&mm->mmlist))
+			list_add(&mm->mmlist, &init_mm.mmlist);
+		spin_unlock(&mmlist_lock);
+	}
+	add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+	add_mm_counter(mm, MM_SWAPENTS, HPAGE_PMD_NR);
+	swp_pmd = swp_entry_to_pmd(entry);
+	if (pmd_soft_dirty(pmdval))
+		swp_pmd = pmd_swp_mksoft_dirty(swp_pmd);
+	set_pmd_at(mm, address, pvmw->pmd, swp_pmd);
+
+	page_remove_rmap(page, true);
+	put_page(page);
+	return true;
+}
 #endif
 
 static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
diff --git a/mm/rmap.c b/mm/rmap.c
index 3bb4be720bc0..a180cb1fe2db 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1413,11 +1413,52 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			continue;
 		}
 
+		address = pvmw.address;
+
+#ifdef CONFIG_THP_SWAP
+		/* PMD-mapped THP swap entry */
+		if (IS_ENABLED(CONFIG_THP_SWAP) &&
+		    !pvmw.pte && PageAnon(page)) {
+			pmd_t pmdval;
+
+			VM_BUG_ON_PAGE(PageHuge(page) ||
+				       !PageTransCompound(page), page);
+
+			flush_cache_range(vma, address,
+					  address + HPAGE_PMD_SIZE);
+			mmu_notifier_invalidate_range_start(mm, address,
+					address + HPAGE_PMD_SIZE);
+			if (should_defer_flush(mm, flags)) {
+				/* check comments for PTE below */
+				pmdval = pmdp_huge_get_and_clear(mm, address,
+								 pvmw.pmd);
+				set_tlb_ubc_flush_pending(mm,
+							  pmd_dirty(pmdval));
+			} else
+				pmdval = pmdp_huge_clear_flush(vma, address,
+							       pvmw.pmd);
+
+			/*
+			 * Move the dirty bit to the page. Now the pmd
+			 * is gone.
+			 */
+			if (pmd_dirty(pmdval))
+				set_page_dirty(page);
+
+			/* Update high watermark before we lower rss */
+			update_hiwater_rss(mm);
+
+			ret = set_pmd_swap_entry(&pvmw, page, address, pmdval);
+			mmu_notifier_invalidate_range_end(mm, address,
+					address + HPAGE_PMD_SIZE);
+			continue;
+		}
+#endif
+
 		/* Unexpected PMD-mapped THP? */
 		VM_BUG_ON_PAGE(!pvmw.pte, page);
 
 		subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
-		address = pvmw.address;
 
 		if (PageHuge(page)) {
 			if (huge_pmd_unshare(mm, &address, pvmw.pte)) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a859f64a2166..017e9060082f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1318,11 +1318,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * processes. Try to unmap it here.
 		 */
 		if (page_mapped(page)) {
-			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
-
-			if (unlikely(PageTransHuge(page)))
-				flags |= TTU_SPLIT_HUGE_PMD;
-			if (!try_to_unmap(page, flags)) {
+			if (!try_to_unmap(page, ttu_flags | TTU_BATCH_FLUSH)) {
 				nr_unmap_fail++;
 				goto activate_locked;
 			}
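Note that set_pmd_swap_entry() is written so that failure is safe: if
swap_duplicate() cannot take the HPAGE_PMD_NR extra swap references,
the original pmdval is reinstalled and false is returned, leaving the
THP mapped. A sketch of the resulting caller contract (hypothetical
caller-side code; the helpers are those visible in the diff above):

	/* the pmd has already been cleared; pmdval holds the old value */
	if (set_pmd_swap_entry(&pvmw, page, address, pmdval)) {
		/* PMD swap mapping installed; rmap and counters updated */
	} else {
		/* pmdval was restored: the page remains mapped and
		 * reclaim simply fails this page for now */
	}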
From patchwork Wed Oct 10 07:19:24 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10634139
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan, Dan Williams
Subject: [PATCH -V6 21/21] swap: Update help of CONFIG_THP_SWAP
Date: Wed, 10 Oct 2018 15:19:24 +0800
Message-Id: <20181010071924.18767-22-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

The help text of CONFIG_THP_SWAP is updated to reflect the latest
progress of the THP (Transparent Huge Page) swap optimization: with
this series applied, the swap cluster backing a THP is no longer split
after swapout, so the stale "XXX" caveat is removed.

Signed-off-by: "Huang, Ying"
Reviewed-by: Dan Williams
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/Kconfig | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 44f7d72010fd..63caae29ae2b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -419,8 +419,6 @@ config THP_SWAP
 	depends on TRANSPARENT_HUGEPAGE && ARCH_WANTS_THP_SWAP && SWAP
 	help
 	  Swap transparent huge pages in one piece, without splitting.
-	  XXX: For now, swap cluster backing transparent huge page
-	  will be split after swapout.
 
 	  For selection by architectures with reasonable THP sizes.
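For reference, a kernel configuration that exercises the option would
contain something like the following fragment (a hypothetical .config
excerpt; ARCH_WANTS_THP_SWAP is selected by the architecture, e.g. on
x86_64, rather than set by hand):

	CONFIG_SWAP=y
	CONFIG_TRANSPARENT_HUGEPAGE=y
	CONFIG_ARCH_WANTS_THP_SWAP=y
	CONFIG_THP_SWAP=y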