From patchwork Mon Sep 3 07:21:54 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10585551
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 01/21] swap: Enable PMD swap operations for CONFIG_THP_SWAP
Date: Mon, 3 Sep 2018 15:21:54 +0800
Message-Id: <20180903072214.24602-2-ying.huang@intel.com>
In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com>
References: <20180903072214.24602-1-ying.huang@intel.com>

Currently, the "swap entry" in the page tables is used for a number of
things besides actual swap, such as page migration. The THP/PMD "swap
entry" is currently supported only for page migration, and the functions
behind it are tied to page migration's config option
(CONFIG_ARCH_ENABLE_THP_MIGRATION). But we also need them for the THP
swap optimization. So a new config option (CONFIG_HAVE_PMD_SWAP_ENTRY)
is added. It is enabled when either CONFIG_ARCH_ENABLE_THP_MIGRATION or
CONFIG_THP_SWAP is enabled.
And PMD swap entry functions are tied to this new config option instead. Some functions enabled by CONFIG_ARCH_ENABLE_THP_MIGRATION are for page migration only, they are still enabled only for that. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- arch/x86/include/asm/pgtable.h | 2 +- include/asm-generic/pgtable.h | 2 +- include/linux/swapops.h | 44 ++++++++++++++++++++++-------------------- mm/Kconfig | 8 ++++++++ 4 files changed, 33 insertions(+), 23 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index e4ffa565a69f..194f97dc4583 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1334,7 +1334,7 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte) return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY); } -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION +#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd) { return pmd_set_flags(pmd, _PAGE_SWP_SOFT_DIRTY); diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 88ebc6102c7c..bf207f915967 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -675,7 +675,7 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm, #endif #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY -#ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION +#ifndef CONFIG_HAVE_PMD_SWAP_ENTRY static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd) { return pmd; diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 4d961668e5fc..905ddc65caa3 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -254,17 +254,7 @@ static inline int is_write_migration_entry(swp_entry_t entry) #endif -struct page_vma_mapped_walk; - -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION -extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, - struct page *page); - -extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, - struct page *new); - -extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd); - +#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd) { swp_entry_t arch_entry; @@ -282,6 +272,28 @@ static inline pmd_t swp_entry_to_pmd(swp_entry_t entry) arch_entry = __swp_entry(swp_type(entry), swp_offset(entry)); return __swp_entry_to_pmd(arch_entry); } +#else +static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd) +{ + return swp_entry(0, 0); +} + +static inline pmd_t swp_entry_to_pmd(swp_entry_t entry) +{ + return __pmd(0); +} +#endif + +struct page_vma_mapped_walk; + +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION +extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, + struct page *page); + +extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, + struct page *new); + +extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd); static inline int is_pmd_migration_entry(pmd_t pmd) { @@ -302,16 +314,6 @@ static inline void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, static inline void pmd_migration_entry_wait(struct mm_struct *m, pmd_t *p) { } -static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd) -{ - return swp_entry(0, 0); -} - -static inline pmd_t swp_entry_to_pmd(swp_entry_t entry) -{ - return __pmd(0); -} - static inline int is_pmd_migration_entry(pmd_t pmd) { return 0; diff --git a/mm/Kconfig b/mm/Kconfig index 
ce5782ff3110..0163ff069fd1 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -430,6 +430,14 @@ config THP_SWAP For selection by architectures with reasonable THP sizes. +# +# "PMD swap entry" in the page table is used both for migration and +# actual swap. +# +config HAVE_PMD_SWAP_ENTRY + def_bool y + depends on THP_SWAP || ARCH_ENABLE_THP_MIGRATION + config TRANSPARENT_HUGE_PAGECACHE def_bool y depends on TRANSPARENT_HUGEPAGE
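The effect of the new option on callers can be illustrated with a small,
self-contained sketch (illustrative only, not part of the patch; the
demo_* names and the hard-coded #define stand in for the real
Kconfig-driven symbol and the pmd/swp_entry types): the helpers always
exist, so common code needs no #ifdef of its own.

#include <stdio.h>

#define CONFIG_HAVE_PMD_SWAP_ENTRY 1	/* selected by Kconfig in the kernel */

typedef struct { unsigned long val; } demo_pmd_t;
typedef struct { unsigned long val; } demo_swp_entry_t;

#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY
/* "real" conversion; the kernel goes through the arch swp encoding */
static inline demo_swp_entry_t demo_pmd_to_swp_entry(demo_pmd_t pmd)
{
	return (demo_swp_entry_t){ .val = pmd.val };
}
#else
/* stub so that common code compiles without its own #ifdef */
static inline demo_swp_entry_t demo_pmd_to_swp_entry(demo_pmd_t pmd)
{
	(void)pmd;
	return (demo_swp_entry_t){ .val = 0 };
}
#endif

int main(void)
{
	demo_pmd_t pmd = { .val = 42 };

	/* the caller looks the same whether or not the option is enabled */
	printf("swap entry value: %lu\n", demo_pmd_to_swp_entry(pmd).val);
	return 0;
}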
From patchwork Mon Sep 3 07:21:55 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10585555
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 02/21] swap: Add __swap_duplicate_locked()
Date: Mon, 3 Sep 2018 15:21:55 +0800
Message-Id: <20180903072214.24602-3-ying.huang@intel.com>
In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com>
References: <20180903072214.24602-1-ying.huang@intel.com>

The part of __swap_duplicate() that runs with the lock held is separated
into a new function, __swap_duplicate_locked().
Because we will add more logic about the PMD swap mapping into __swap_duplicate() and keep the most PTE swap mapping related logic in __swap_duplicate_locked(). Just mechanical code refactoring, there is no any functional change in this patch. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/swapfile.c | 63 +++++++++++++++++++++++++++++++++-------------------------- 1 file changed, 35 insertions(+), 28 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index 1996dcd732b6..532de6f8ff39 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -3436,32 +3436,12 @@ void si_swapinfo(struct sysinfo *val) spin_unlock(&swap_lock); } -/* - * Verify that a swap entry is valid and increment its swap map count. - * - * Returns error code in following case. - * - success -> 0 - * - swp_entry is invalid -> EINVAL - * - swp_entry is migration entry -> EINVAL - * - swap-cache reference is requested but there is already one. -> EEXIST - * - swap-cache reference is requested but the entry is not used. -> ENOENT - * - swap-mapped reference requested but needs continued swap count. -> ENOMEM - */ -static int __swap_duplicate(swp_entry_t entry, unsigned char usage) +static int __swap_duplicate_locked(struct swap_info_struct *p, + unsigned long offset, unsigned char usage) { - struct swap_info_struct *p; - struct swap_cluster_info *ci; - unsigned long offset; unsigned char count; unsigned char has_cache; - int err = -EINVAL; - - p = get_swap_device(entry); - if (!p) - goto out; - - offset = swp_offset(entry); - ci = lock_cluster_or_swap_info(p, offset); + int err = 0; count = p->swap_map[offset]; @@ -3471,12 +3451,11 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage) */ if (unlikely(swap_count(count) == SWAP_MAP_BAD)) { err = -ENOENT; - goto unlock_out; + goto out; } has_cache = count & SWAP_HAS_CACHE; count &= ~SWAP_HAS_CACHE; - err = 0; if (usage == SWAP_HAS_CACHE) { @@ -3503,11 +3482,39 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage) p->swap_map[offset] = count | has_cache; -unlock_out: +out: + return err; +} + +/* + * Verify that a swap entry is valid and increment its swap map count. + * + * Returns error code in following case. + * - success -> 0 + * - swp_entry is invalid -> EINVAL + * - swp_entry is migration entry -> EINVAL + * - swap-cache reference is requested but there is already one. -> EEXIST + * - swap-cache reference is requested but the entry is not used. -> ENOENT + * - swap-mapped reference requested but needs continued swap count. 
-> ENOMEM + */ +static int __swap_duplicate(swp_entry_t entry, unsigned char usage) +{ + struct swap_info_struct *p; + struct swap_cluster_info *ci; + unsigned long offset; + int err = -EINVAL; + + p = get_swap_device(entry); + if (!p) + goto out; + + offset = swp_offset(entry); + ci = lock_cluster_or_swap_info(p, offset); + err = __swap_duplicate_locked(p, offset, usage); unlock_cluster_or_swap_info(p, ci); + + put_swap_device(p); out: - if (p) - put_swap_device(p); return err; }
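The shape of the refactoring can be summarized with a minimal userspace
model (illustrative only; the demo_* names are invented, and a pthread
mutex stands in for the cluster/swap_info locking): the outer function
resolves the device and takes the lock, while the *_locked() helper
holds the per-slot logic that later patches can call once per swap slot
of a huge cluster.

#include <pthread.h>
#include <stdio.h>

struct demo_swap_info {
	pthread_mutex_t lock;		/* models the cluster/swap_info lock */
	unsigned char swap_map[8];	/* per-slot reference counts */
};

/* Must be called with p->lock held; contains the per-slot logic only. */
static int demo_swap_duplicate_locked(struct demo_swap_info *p,
				      unsigned long offset)
{
	if (p->swap_map[offset] == 0)
		return -1;		/* slot is not in use */
	p->swap_map[offset]++;		/* take one more reference */
	return 0;
}

/* Outer wrapper: look up the object, take the lock, call the helper. */
static int demo_swap_duplicate(struct demo_swap_info *p, unsigned long offset)
{
	int err;

	pthread_mutex_lock(&p->lock);
	err = demo_swap_duplicate_locked(p, offset);
	pthread_mutex_unlock(&p->lock);
	return err;
}

int main(void)
{
	struct demo_swap_info si = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.swap_map = { [3] = 1 },
	};
	int err = demo_swap_duplicate(&si, 3);

	printf("dup slot 3 -> %d, swap_map[3] is now %d\n", err, si.swap_map[3]);
	return 0;
}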
From patchwork Mon Sep 3 07:21:56 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10585553
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 03/21] swap: Support PMD swap mapping in swap_duplicate() Date: Mon, 3 Sep 2018 15:21:56 +0800 Message-Id: <20180903072214.24602-4-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP To support to swapin the THP in one piece, we need to create PMD swap mapping during swapout, and maintain PMD swap mapping count. This patch implements the support to increase the PMD swap mapping count (for swapout, fork, etc.) and set SWAP_HAS_CACHE flag (for swapin, etc.) for a huge swap cluster in swap_duplicate() function family. Although it only implements a part of the design of the swap reference count with PMD swap mapping, the whole design is described as follow to make it easy to understand the patch and the whole picture. A huge swap cluster is used to hold the contents of a swapouted THP. After swapout, a PMD page mapping to the THP will become a PMD swap mapping to the huge swap cluster via a swap entry in PMD. While a PTE page mapping to a subpage of the THP will become the PTE swap mapping to a swap slot in the huge swap cluster via a swap entry in PTE. If there is no PMD swap mapping and the corresponding THP is removed from the page cache (reclaimed), the huge swap cluster will be split and become a normal swap cluster. The count (cluster_count()) of the huge swap cluster is SWAPFILE_CLUSTER (= HPAGE_PMD_NR) + PMD swap mapping count. Because all swap slots in the huge swap cluster are mapped by PTE or PMD, or has SWAP_HAS_CACHE bit set, the usage count of the swap cluster is HPAGE_PMD_NR. And the PMD swap mapping count is recorded too to make it easy to determine whether there are remaining PMD swap mappings. The count in swap_map[offset] is the sum of PTE and PMD swap mapping count. This means when we increase the PMD swap mapping count, we need to increase swap_map[offset] for all swap slots inside the swap cluster. An alternative choice is to make swap_map[offset] to record PTE swap map count only, given we have recorded PMD swap mapping count in the count of the huge swap cluster. But this need to increase swap_map[offset] when splitting the PMD swap mapping, that may fail because of memory allocation for swap count continuation. That is hard to dealt with. So we choose current solution. The PMD swap mapping to a huge swap cluster may be split when unmap a part of PMD mapping etc. That is easy because only the count of the huge swap cluster need to be changed. When the last PMD swap mapping is gone and SWAP_HAS_CACHE is unset, we will split the huge swap cluster (clear the huge flag). This makes it easy to reason the cluster state. A huge swap cluster will be split when splitting the THP in swap cache, or failing to allocate THP during swapin, etc. But when splitting the huge swap cluster, we will not try to split all PMD swap mappings, because we haven't enough information available for that sometimes. Later, when the PMD swap mapping is duplicated or swapin, etc, the PMD swap mapping will be split and fallback to the PTE operation. 
When a THP is added into swap cache, the SWAP_HAS_CACHE flag will be set in the swap_map[offset] of all swap slots inside the huge swap cluster backing the THP. This huge swap cluster will not be split unless the THP is split even if its PMD swap mapping count dropped to 0. Later, when the THP is removed from swap cache, the SWAP_HAS_CACHE flag will be cleared in the swap_map[offset] of all swap slots inside the huge swap cluster. And this huge swap cluster will be split if its PMD swap mapping count is 0. The first parameter of swap_duplicate() is changed to return the swap entry to call add_swap_count_continuation() for. Because we may need to call it for a swap entry in the middle of a huge swap cluster. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/swap.h | 9 +++-- mm/memory.c | 2 +- mm/rmap.c | 2 +- mm/swap_state.c | 2 +- mm/swapfile.c | 107 ++++++++++++++++++++++++++++++++++++++++++--------- 5 files changed, 97 insertions(+), 25 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 378792e41043..bfcd30769564 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -446,8 +446,8 @@ extern swp_entry_t get_swap_page_of_type(int); extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size); extern int add_swap_count_continuation(swp_entry_t, gfp_t); extern void swap_shmem_alloc(swp_entry_t); -extern int swap_duplicate(swp_entry_t); -extern int swapcache_prepare(swp_entry_t); +extern int swap_duplicate(swp_entry_t *entry, int entry_size); +extern int swapcache_prepare(swp_entry_t entry, int entry_size); extern void swap_free(swp_entry_t); extern void swapcache_free_entries(swp_entry_t *entries, int n); extern int free_swap_and_cache(swp_entry_t); @@ -505,7 +505,8 @@ static inline void show_swap_cache_info(void) } #define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) -#define swapcache_prepare(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) +#define swapcache_prepare(e, s) \ + ({(is_migration_entry(e) || is_device_private_entry(e)); }) static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask) { @@ -516,7 +517,7 @@ static inline void swap_shmem_alloc(swp_entry_t swp) { } -static inline int swap_duplicate(swp_entry_t swp) +static inline int swap_duplicate(swp_entry_t *swp, int entry_size) { return 0; } diff --git a/mm/memory.c b/mm/memory.c index de01d14870e6..736f928f8f0c 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -963,7 +963,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, swp_entry_t entry = pte_to_swp_entry(pte); if (likely(!non_swap_entry(entry))) { - if (swap_duplicate(entry) < 0) + if (swap_duplicate(&entry, 1) < 0) return entry.val; /* make sure dst_mm is on swapoff's mmlist. 
*/ diff --git a/mm/rmap.c b/mm/rmap.c index 1e79fac3186b..3bb4be720bc0 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1598,7 +1598,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, break; } - if (swap_duplicate(entry) < 0) { + if (swap_duplicate(&entry, 1) < 0) { set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw); diff --git a/mm/swap_state.c b/mm/swap_state.c index 605d211012a1..5a307d220e33 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -402,7 +402,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* * Swap entry may have been freed since our caller observed it. */ - err = swapcache_prepare(entry); + err = swapcache_prepare(entry, 1); if (err == -EEXIST) { /* * We might race against get_swap_page() and stumble diff --git a/mm/swapfile.c b/mm/swapfile.c index 532de6f8ff39..746e7c4eb2e6 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -534,6 +534,40 @@ static void dec_cluster_info_page(struct swap_info_struct *p, free_cluster(p, idx); } +/* + * When swapout a THP in one piece, PMD page mappings to THP are + * replaced by PMD swap mappings to the corresponding swap cluster. + * cluster_swapcount() returns the PMD swap mapping count. + * + * cluster_count() = PMD swap mapping count + count of allocated swap + * entries in cluster. If a cluster is mapped by PMD, all swap + * entries inside is used, so here cluster_count() = PMD swap mapping + * count + SWAPFILE_CLUSTER. + */ +static inline int cluster_swapcount(struct swap_cluster_info *ci) +{ + VM_BUG_ON(!cluster_is_huge(ci) || cluster_count(ci) < SWAPFILE_CLUSTER); + return cluster_count(ci) - SWAPFILE_CLUSTER; +} + +/* + * Set PMD swap mapping count for the huge cluster + */ +static inline void cluster_set_swapcount(struct swap_cluster_info *ci, + unsigned int count) +{ + VM_BUG_ON(!cluster_is_huge(ci) || cluster_count(ci) < SWAPFILE_CLUSTER); + cluster_set_count(ci, SWAPFILE_CLUSTER + count); +} + +static inline void cluster_add_swapcount(struct swap_cluster_info *ci, int add) +{ + int count = cluster_swapcount(ci) + add; + + VM_BUG_ON(count < 0); + cluster_set_swapcount(ci, count); +} + /* * It's possible scan_swap_map() uses a free cluster in the middle of free * cluster list. Avoiding such abuse to avoid list corruption. @@ -3487,35 +3521,66 @@ static int __swap_duplicate_locked(struct swap_info_struct *p, } /* - * Verify that a swap entry is valid and increment its swap map count. + * Verify that the swap entries from *entry is valid and increment their + * PMD/PTE swap mapping count. * * Returns error code in following case. * - success -> 0 * - swp_entry is invalid -> EINVAL - * - swp_entry is migration entry -> EINVAL * - swap-cache reference is requested but there is already one. -> EEXIST * - swap-cache reference is requested but the entry is not used. -> ENOENT * - swap-mapped reference requested but needs continued swap count. -> ENOMEM + * - the huge swap cluster has been split. 
-> ENOTDIR */ -static int __swap_duplicate(swp_entry_t entry, unsigned char usage) +static int __swap_duplicate(swp_entry_t *entry, int entry_size, + unsigned char usage) { struct swap_info_struct *p; struct swap_cluster_info *ci; unsigned long offset; int err = -EINVAL; + int i, size = swap_entry_size(entry_size); - p = get_swap_device(entry); + p = get_swap_device(*entry); if (!p) goto out; - offset = swp_offset(entry); + offset = swp_offset(*entry); ci = lock_cluster_or_swap_info(p, offset); - err = __swap_duplicate_locked(p, offset, usage); + if (size == SWAPFILE_CLUSTER) { + /* + * The huge swap cluster has been split, for example, failed to + * allocate huge page during swapin, the caller should split + * the PMD swap mapping and operate on normal swap entries. + */ + if (!cluster_is_huge(ci)) { + err = -ENOTDIR; + goto unlock; + } + VM_BUG_ON(!IS_ALIGNED(offset, size)); + /* If cluster is huge, all swap entries inside is in-use */ + VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER); + } + /* p->swap_map[] = PMD swap map count + PTE swap map count */ + for (i = 0; i < size; i++) { + err = __swap_duplicate_locked(p, offset + i, usage); + if (err && size != 1) { + *entry = swp_entry(p->type, offset + i); + goto undup; + } + } + if (size == SWAPFILE_CLUSTER && usage == 1) + cluster_add_swapcount(ci, usage); +unlock: unlock_cluster_or_swap_info(p, ci); put_swap_device(p); out: return err; +undup: + for (i--; i >= 0; i--) + __swap_entry_free_locked(p, offset + i, usage); + goto unlock; } /* @@ -3524,36 +3589,42 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage) */ void swap_shmem_alloc(swp_entry_t entry) { - __swap_duplicate(entry, SWAP_MAP_SHMEM); + __swap_duplicate(&entry, 1, SWAP_MAP_SHMEM); } /* * Increase reference count of swap entry by 1. - * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required - * but could not be atomically allocated. Returns 0, just as if it succeeded, - * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which - * might occur if a page table entry has got corrupted. + * + * Return error code in following case. + * - success -> 0 + * - swap_count_continuation is required but could not be atomically allocated. + * *entry is used to return swap entry to call add_swap_count_continuation(). + * -> ENOMEM + * - otherwise same as __swap_duplicate() */ -int swap_duplicate(swp_entry_t entry) +int swap_duplicate(swp_entry_t *entry, int entry_size) { int err = 0; - while (!err && __swap_duplicate(entry, 1) == -ENOMEM) - err = add_swap_count_continuation(entry, GFP_ATOMIC); + while (!err && + (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM) + err = add_swap_count_continuation(*entry, GFP_ATOMIC); return err; } /* * @entry: swap entry for which we allocate swap cache. + * @entry_size: size of the swap entry, 1 or SWAPFILE_CLUSTER * * Called when allocating swap cache for existing swap entry, * This can return error codes. Returns 0 at success. - * -EBUSY means there is a swap cache. - * Note: return code is different from swap_duplicate(). + * -EINVAL means the swap device has been swapoff. + * -EEXIST means there is a swap cache. 
+ Otherwise same as __swap_duplicate() */ -int swapcache_prepare(swp_entry_t entry) +int swapcache_prepare(swp_entry_t entry, int entry_size) { - return __swap_duplicate(entry, SWAP_HAS_CACHE); + return __swap_duplicate(&entry, entry_size, SWAP_HAS_CACHE); } struct swap_info_struct *swp_swap_info(swp_entry_t entry)
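The counting rules described in the patch description can be modeled
with a short userspace sketch (illustrative only; DEMO_CLUSTER_SIZE and
the demo_* names are stand-ins for SWAPFILE_CLUSTER, cluster_count() and
swap_map[]): cluster_count() carries SWAPFILE_CLUSTER plus the PMD swap
mapping count, and duplicating a PMD swap mapping bumps both the cluster
count and every slot's swap_map count.

#include <stdio.h>

#define DEMO_CLUSTER_SIZE 512	/* stands in for SWAPFILE_CLUSTER == HPAGE_PMD_NR */

struct demo_cluster {
	unsigned int count;			   /* models cluster_count() */
	unsigned char swap_map[DEMO_CLUSTER_SIZE]; /* per-slot PTE + PMD map count */
};

/* PMD swap mapping count = cluster_count() - SWAPFILE_CLUSTER */
static unsigned int demo_cluster_swapcount(const struct demo_cluster *ci)
{
	return ci->count - DEMO_CLUSTER_SIZE;
}

/* Duplicating the PMD swap mapping bumps every slot and the cluster count. */
static void demo_dup_pmd_mapping(struct demo_cluster *ci)
{
	for (int i = 0; i < DEMO_CLUSTER_SIZE; i++)
		ci->swap_map[i]++;	/* swap_map holds PTE + PMD map counts */
	ci->count++;			/* one more PMD swap mapping */
}

int main(void)
{
	/* a freshly swapped-out THP: every slot used once, one PMD mapping */
	struct demo_cluster ci = { .count = DEMO_CLUSTER_SIZE + 1 };

	for (int i = 0; i < DEMO_CLUSTER_SIZE; i++)
		ci.swap_map[i] = 1;

	demo_dup_pmd_mapping(&ci);	/* e.g. fork() copying the PMD entry */
	printf("PMD mappings: %u, swap_map[0]: %u\n",
	       demo_cluster_swapcount(&ci), ci.swap_map[0]);
	return 0;
}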
From patchwork Mon Sep 3 07:21:57 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10585557
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V5 04/21] swap: Support PMD swap mapping in put_swap_page()
Date: Mon, 3 Sep 2018 15:21:57 +0800
Message-Id: <20180903072214.24602-5-ying.huang@intel.com>
In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com>
References: <20180903072214.24602-1-ying.huang@intel.com>

Previously, during swapout, all PMD page mappings were split and replaced
with PTE swap mappings, and when clearing the SWAP_HAS_CACHE flag for the
huge swap cluster in put_swap_page(), the huge swap cluster was split
unconditionally.
Now, during swapout, the PMD page mappings to the THP will be changed to PMD swap mappings to the corresponding swap cluster. So when clearing the SWAP_HAS_CACHE flag, the huge swap cluster will only be split if the PMD swap mapping count is 0. Otherwise, we will keep it as the huge swap cluster. So that we can swapin a THP in one piece later. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/swapfile.c | 31 ++++++++++++++++++++++++------- 1 file changed, 24 insertions(+), 7 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index 746e7c4eb2e6..32f4e661a7e1 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1314,6 +1314,15 @@ void swap_free(swp_entry_t entry) /* * Called after dropping swapcache to decrease refcnt to swap entries. + * + * When a THP is added into swap cache, the SWAP_HAS_CACHE flag will + * be set in the swap_map[] of all swap entries in the huge swap + * cluster backing the THP. This huge swap cluster will not be split + * unless the THP is split even if its PMD swap mapping count dropped + * to 0. Later, when the THP is removed from swap cache, the + * SWAP_HAS_CACHE flag will be cleared in the swap_map[] of all swap + * entries in the huge swap cluster. And this huge swap cluster will + * be split if its PMD swap mapping count is 0. */ void put_swap_page(struct page *page, swp_entry_t entry) { @@ -1332,15 +1341,23 @@ void put_swap_page(struct page *page, swp_entry_t entry) ci = lock_cluster_or_swap_info(si, offset); if (size == SWAPFILE_CLUSTER) { - VM_BUG_ON(!cluster_is_huge(ci)); + VM_BUG_ON(!IS_ALIGNED(offset, size)); map = si->swap_map + offset; - for (i = 0; i < SWAPFILE_CLUSTER; i++) { - val = map[i]; - VM_BUG_ON(!(val & SWAP_HAS_CACHE)); - if (val == SWAP_HAS_CACHE) - free_entries++; + /* + * No PMD swap mapping, the swap cluster will be freed + * if all swap entries becoming free, otherwise the + * huge swap cluster will be split. 
+ */ + if (!cluster_swapcount(ci)) { + for (i = 0; i < SWAPFILE_CLUSTER; i++) { + val = map[i]; + VM_BUG_ON(!(val & SWAP_HAS_CACHE)); + if (val == SWAP_HAS_CACHE) + free_entries++; + } + if (free_entries != SWAPFILE_CLUSTER) + cluster_clear_huge(ci); } - cluster_clear_huge(ci); if (free_entries == SWAPFILE_CLUSTER) { unlock_cluster_or_swap_info(si, ci); spin_lock(&si->lock);
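The decision put_swap_page() now makes for a huge cluster can be
sketched as follows (illustrative userspace model only; the demo_* names
and constants are stand-ins for the kernel's SWAP_HAS_CACHE handling):
keep the cluster huge while PMD swap mappings remain, free it outright
when every slot was held only by the swap cache, and otherwise split it
and fall back to per-slot handling.

#include <stdio.h>

#define DEMO_CLUSTER_SIZE 512		/* stands in for SWAPFILE_CLUSTER */
#define DEMO_SWAP_HAS_CACHE 0x40	/* stands in for SWAP_HAS_CACHE */

enum demo_action { DEMO_FREE_CLUSTER, DEMO_KEEP_HUGE, DEMO_FALL_BACK };

/*
 * What to do when the THP is removed from the swap cache: keep the huge
 * cluster if PMD swap mappings remain, free it if every slot was held
 * only by the swap cache, otherwise split and handle slots individually.
 */
static enum demo_action demo_drop_thp_swapcache(unsigned int pmd_swapcount,
						const unsigned char *map)
{
	int cache_only = 0;

	if (pmd_swapcount)
		return DEMO_KEEP_HUGE;	/* still PMD-mapped: keep for THP swapin */

	for (int i = 0; i < DEMO_CLUSTER_SIZE; i++)
		if (map[i] == DEMO_SWAP_HAS_CACHE)
			cache_only++;

	if (cache_only == DEMO_CLUSTER_SIZE)
		return DEMO_FREE_CLUSTER; /* nothing else references the entries */

	return DEMO_FALL_BACK;		/* split the huge cluster, free per slot */
}

int main(void)
{
	unsigned char map[DEMO_CLUSTER_SIZE];

	for (int i = 0; i < DEMO_CLUSTER_SIZE; i++)
		map[i] = DEMO_SWAP_HAS_CACHE;

	printf("no PMD mappings, cache-only slots -> %d (free whole cluster)\n",
	       demo_drop_thp_swapcache(0, map));
	printf("one PMD mapping remains           -> %d (keep huge cluster)\n",
	       demo_drop_thp_swapcache(1, map));
	return 0;
}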
From patchwork Mon Sep 3 07:21:58 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10585559
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 05/21] swap: Support PMD swap mapping in free_swap_and_cache()/swap_free() Date: Mon, 3 Sep 2018 15:21:58 +0800 Message-Id: <20180903072214.24602-6-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP When a PMD swap mapping is removed from a huge swap cluster, for example, unmap a memory range mapped with PMD swap mapping, etc, free_swap_and_cache() will be called to decrease the reference count to the huge swap cluster. free_swap_and_cache() may also free or split the huge swap cluster, and free the corresponding THP in swap cache if necessary. swap_free() is similar, and shares most implementation with free_swap_and_cache(). This patch revises free_swap_and_cache() and swap_free() to implement this. If the swap cluster has been split already, for example, because of failing to allocate a THP during swapin, we just decrease one from the reference count of all swap slots. Otherwise, we will decrease one from the reference count of all swap slots and the PMD swap mapping count in cluster_count(). When the corresponding THP isn't in swap cache, if PMD swap mapping count becomes 0, the huge swap cluster will be split, and if all swap count becomes 0, the huge swap cluster will be freed. When the corresponding THP is in swap cache, if every swap_map[offset] == SWAP_HAS_CACHE, we will try to delete the THP from swap cache. Which will cause the THP and the huge swap cluster be freed. Signed-off-by: "Huang, Ying" Cc: "Kirill A. 
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- arch/s390/mm/pgtable.c | 2 +- include/linux/swap.h | 9 +-- kernel/power/swap.c | 4 +- mm/madvise.c | 2 +- mm/memory.c | 4 +- mm/shmem.c | 6 +- mm/swapfile.c | 171 ++++++++++++++++++++++++++++++++++++++----------- 7 files changed, 149 insertions(+), 49 deletions(-) diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index f2cc7da473e4..ffd4b68adbb3 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -675,7 +675,7 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry) dec_mm_counter(mm, mm_counter(page)); } - free_swap_and_cache(entry); + free_swap_and_cache(entry, 1); } void ptep_zap_unused(struct mm_struct *mm, unsigned long addr, diff --git a/include/linux/swap.h b/include/linux/swap.h index bfcd30769564..63235b879a59 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -448,9 +448,9 @@ extern int add_swap_count_continuation(swp_entry_t, gfp_t); extern void swap_shmem_alloc(swp_entry_t); extern int swap_duplicate(swp_entry_t *entry, int entry_size); extern int swapcache_prepare(swp_entry_t entry, int entry_size); -extern void swap_free(swp_entry_t); +extern void swap_free(swp_entry_t entry, int entry_size); extern void swapcache_free_entries(swp_entry_t *entries, int n); -extern int free_swap_and_cache(swp_entry_t); +extern int free_swap_and_cache(swp_entry_t entry, int entry_size); extern int swap_type_of(dev_t, sector_t, struct block_device **); extern unsigned int count_swap_pages(int, int); extern sector_t map_swap_page(struct page *, struct block_device **); @@ -504,7 +504,8 @@ static inline void show_swap_cache_info(void) { } -#define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) +#define free_swap_and_cache(e, s) \ + ({(is_migration_entry(e) || is_device_private_entry(e)); }) #define swapcache_prepare(e, s) \ ({(is_migration_entry(e) || is_device_private_entry(e)); }) @@ -522,7 +523,7 @@ static inline int swap_duplicate(swp_entry_t *swp, int entry_size) return 0; } -static inline void swap_free(swp_entry_t swp) +static inline void swap_free(swp_entry_t swp, int entry_size) { } diff --git a/kernel/power/swap.c b/kernel/power/swap.c index d7f6c1a288d3..0275df84ed3d 100644 --- a/kernel/power/swap.c +++ b/kernel/power/swap.c @@ -182,7 +182,7 @@ sector_t alloc_swapdev_block(int swap) offset = swp_offset(get_swap_page_of_type(swap)); if (offset) { if (swsusp_extents_insert(offset)) - swap_free(swp_entry(swap, offset)); + swap_free(swp_entry(swap, offset), 1); else return swapdev_block(swap, offset); } @@ -206,7 +206,7 @@ void free_all_swap_pages(int swap) ext = rb_entry(node, struct swsusp_extent, node); rb_erase(node, &swsusp_extents); for (offset = ext->start; offset <= ext->end; offset++) - swap_free(swp_entry(swap, offset)); + swap_free(swp_entry(swap, offset), 1); kfree(ext); } diff --git a/mm/madvise.c b/mm/madvise.c index 9d802566c494..50282ba862e2 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -349,7 +349,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, if (non_swap_entry(entry)) continue; nr_swap--; - free_swap_and_cache(entry); + free_swap_and_cache(entry, 1); pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); continue; } diff --git a/mm/memory.c b/mm/memory.c index 736f928f8f0c..b604f16d031b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1388,7 +1388,7 @@ 
static unsigned long zap_pte_range(struct mmu_gather *tlb, page = migration_entry_to_page(entry); rss[mm_counter(page)]--; } - if (unlikely(!free_swap_and_cache(entry))) + if (unlikely(!free_swap_and_cache(entry, 1))) print_bad_pte(vma, addr, ptent, NULL); pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); } while (pte++, addr += PAGE_SIZE, addr != end); @@ -3074,7 +3074,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) } set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte); - swap_free(entry); + swap_free(entry, 1); if (mem_cgroup_swap_full(page) || (vma->vm_flags & VM_LOCKED) || PageMlocked(page)) try_to_free_swap(page); diff --git a/mm/shmem.c b/mm/shmem.c index 4829798869b6..be0e20abcecd 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -662,7 +662,7 @@ static int shmem_free_swap(struct address_space *mapping, xa_unlock_irq(&mapping->i_pages); if (old != radswap) return -ENOENT; - free_swap_and_cache(radix_to_swp_entry(radswap)); + free_swap_and_cache(radix_to_swp_entry(radswap), 1); return 0; } @@ -1180,7 +1180,7 @@ static int shmem_unuse_inode(struct shmem_inode_info *info, spin_lock_irq(&info->lock); info->swapped--; spin_unlock_irq(&info->lock); - swap_free(swap); + swap_free(swap, 1); } } return error; @@ -1712,7 +1712,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index, delete_from_swap_cache(page); set_page_dirty(page); - swap_free(swap); + swap_free(swap, 1); } else { if (vma && userfaultfd_missing(vma)) { diff --git a/mm/swapfile.c b/mm/swapfile.c index 32f4e661a7e1..8da03551ecbe 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -49,6 +49,9 @@ static bool swap_count_continued(struct swap_info_struct *, pgoff_t, unsigned char); static void free_swap_count_continuations(struct swap_info_struct *); static sector_t map_swap_entry(swp_entry_t, struct block_device**); +static bool __swap_page_trans_huge_swapped(struct swap_info_struct *si, + struct swap_cluster_info *ci, + unsigned long offset); DEFINE_SPINLOCK(swap_lock); static unsigned int nr_swapfiles; @@ -1267,19 +1270,106 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry) return NULL; } -static unsigned char __swap_entry_free(struct swap_info_struct *p, - swp_entry_t entry, unsigned char usage) +#define SF_FREE_CACHE 0x1 + +static void __swap_free(struct swap_info_struct *p, swp_entry_t entry, + int entry_size, unsigned long flags) { struct swap_cluster_info *ci; unsigned long offset = swp_offset(entry); + int i, free_entries = 0, cache_only = 0; + int size = swap_entry_size(entry_size); + unsigned char *map, count; ci = lock_cluster_or_swap_info(p, offset); - usage = __swap_entry_free_locked(p, offset, usage); + VM_BUG_ON(!IS_ALIGNED(offset, size)); + /* + * Normal swap entry or huge swap cluster has been split, free + * each swap entry + */ + if (size == 1 || !cluster_is_huge(ci)) { + for (i = 0; i < size; i++, entry.val++) { + count = __swap_entry_free_locked(p, offset + i, 1); + if (!count || + (flags & SF_FREE_CACHE && + count == SWAP_HAS_CACHE && + !__swap_page_trans_huge_swapped(p, ci, + offset + i))) { + unlock_cluster_or_swap_info(p, ci); + if (!count) + free_swap_slot(entry); + else + __try_to_reclaim_swap(p, offset + i, + TTRS_UNMAPPED | TTRS_FULL); + if (i == size - 1) + return; + lock_cluster_or_swap_info(p, offset); + } + } + unlock_cluster_or_swap_info(p, ci); + return; + } + /* + * Return for normal swap entry above, the following code is + * for huge swap cluster only. + */ + cluster_add_swapcount(ci, -1); + /* + * Decrease mapping count for each swap entry in cluster. 
+ * Because PMD swap mapping is counted in p->swap_map[] too. + */ + map = p->swap_map + offset; + for (i = 0; i < size; i++) { + /* + * Mark swap entries to become free as SWAP_MAP_BAD + * temporarily. + */ + if (map[i] == 1) { + map[i] = SWAP_MAP_BAD; + free_entries++; + } else if (__swap_entry_free_locked(p, offset + i, 1) == + SWAP_HAS_CACHE) + cache_only++; + } + /* + * If there are PMD swap mapping or the THP is in swap cache, + * it's impossible for some swap entries to become free. + */ + VM_BUG_ON(free_entries && + (cluster_swapcount(ci) || (map[0] & SWAP_HAS_CACHE))); + if (free_entries == SWAPFILE_CLUSTER) + memset(map, SWAP_HAS_CACHE, SWAPFILE_CLUSTER); + /* + * If there are no PMD swap mappings remain and the THP isn't + * in swap cache, split the huge swap cluster. + */ + else if (!cluster_swapcount(ci) && !(map[0] & SWAP_HAS_CACHE)) + cluster_clear_huge(ci); unlock_cluster_or_swap_info(p, ci); - if (!usage) - free_swap_slot(entry); - - return usage; + if (free_entries == SWAPFILE_CLUSTER) { + spin_lock(&p->lock); + mem_cgroup_uncharge_swap(entry, SWAPFILE_CLUSTER); + swap_free_cluster(p, offset / SWAPFILE_CLUSTER); + spin_unlock(&p->lock); + } else if (free_entries) { + ci = lock_cluster(p, offset); + for (i = 0; i < size; i++, entry.val++) { + /* + * To be freed swap entries are marked as SWAP_MAP_BAD + * temporarily as above + */ + if (map[i] == SWAP_MAP_BAD) { + map[i] = SWAP_HAS_CACHE; + unlock_cluster(ci); + free_swap_slot(entry); + if (i == size - 1) + return; + ci = lock_cluster(p, offset); + } + } + unlock_cluster(ci); + } else if (cache_only == SWAPFILE_CLUSTER && flags & SF_FREE_CACHE) + __try_to_reclaim_swap(p, offset, TTRS_UNMAPPED | TTRS_FULL); } static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry) @@ -1303,13 +1393,13 @@ static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry) * Caller has made sure that the swap device corresponding to entry * is still around or has not been recycled. 
*/ -void swap_free(swp_entry_t entry) +void swap_free(swp_entry_t entry, int entry_size) { struct swap_info_struct *p; p = _swap_info_get(entry); if (p) - __swap_entry_free(p, entry, 1); + __swap_free(p, entry, entry_size, 0); } /* @@ -1545,29 +1635,33 @@ int swp_swapcount(swp_entry_t entry) return count; } -static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, - swp_entry_t entry) +/* si->lock or ci->lock must be held before calling this function */ +static bool __swap_page_trans_huge_swapped(struct swap_info_struct *si, + struct swap_cluster_info *ci, + unsigned long offset) { - struct swap_cluster_info *ci; unsigned char *map = si->swap_map; - unsigned long roffset = swp_offset(entry); - unsigned long offset = round_down(roffset, SWAPFILE_CLUSTER); + unsigned long hoffset = round_down(offset, SWAPFILE_CLUSTER); int i; - bool ret = false; - ci = lock_cluster_or_swap_info(si, offset); - if (!ci || !cluster_is_huge(ci)) { - if (swap_count(map[roffset])) - ret = true; - goto unlock_out; - } + if (!ci || !cluster_is_huge(ci)) + return !!swap_count(map[offset]); for (i = 0; i < SWAPFILE_CLUSTER; i++) { - if (swap_count(map[offset + i])) { - ret = true; - break; - } + if (swap_count(map[hoffset + i])) + return true; } -unlock_out: + return false; +} + +static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, + swp_entry_t entry) +{ + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + bool ret; + + ci = lock_cluster_or_swap_info(si, offset); + ret = __swap_page_trans_huge_swapped(si, ci, offset); unlock_cluster_or_swap_info(si, ci); return ret; } @@ -1739,22 +1833,17 @@ int try_to_free_swap(struct page *page) * Free the swap entry like above, but also try to * free the page cache entry if it is the last user. */ -int free_swap_and_cache(swp_entry_t entry) +int free_swap_and_cache(swp_entry_t entry, int entry_size) { struct swap_info_struct *p; - unsigned char count; if (non_swap_entry(entry)) return 1; p = _swap_info_get(entry); - if (p) { - count = __swap_entry_free(p, entry, 1); - if (count == SWAP_HAS_CACHE && - !swap_page_trans_huge_swapped(p, entry)) - __try_to_reclaim_swap(p, swp_offset(entry), - TTRS_UNMAPPED | TTRS_FULL); - } + if (p) + __swap_free(p, entry, entry_size, SF_FREE_CACHE); + return p != NULL; } @@ -1901,7 +1990,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, } set_pte_at(vma->vm_mm, addr, pte, pte_mkold(mk_pte(page, vma->vm_page_prot))); - swap_free(entry); + swap_free(entry, 1); /* * Move the page to the active list so it is not * immediately swapped out again after swapon. @@ -2340,6 +2429,16 @@ int try_to_unuse(unsigned int type, bool frontswap, } mmput(start_mm); + + /* + * Swap entries may be marked as SWAP_MAP_BAD temporarily in + * __swap_free() before being freed really. + * find_next_to_unuse() will skip these swap entries, that is + * OK. But we need to wait until they are freed really. 
+ */ + while (!retval && READ_ONCE(si->inuse_pages)) + schedule_timeout_uninterruptible(1); + return retval; }
From patchwork Mon Sep 3 07:21:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585561 From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 06/21] swap: Support PMD swap mapping when splitting huge PMD Date: Mon, 3 Sep 2018 15:21:59 +0800 Message-Id: <20180903072214.24602-7-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com>
A huge PMD needs to be split when zapping part of the range it maps, etc. If the PMD mapping is a swap mapping, it needs to be split as well. This patch implements support for that. The operation is similar to splitting a PMD page mapping, except that the PMD swap mapping count of the huge swap cluster must be decreased too. If the PMD swap mapping count becomes 0, the huge swap cluster will be split.
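As a rough userspace-only sketch of the offset arithmetic involved (not the kernel implementation; HPAGE_PMD_NR = 512 and the bare slot numbering below are simplifying assumptions), splitting one PMD swap mapping yields HPAGE_PMD_NR consecutive PTE swap entries whose swap offsets advance by one per base page, mirroring the entry.val++ loop in __split_huge_swap_pmd() below; the pgtable deposit/withdraw and soft-dirty handling of the real code are omitted:

    #include <stdio.h>

    #define HPAGE_PMD_NR 512          /* assumed: 2MB huge page / 4KB base page */
    #define BASE_PAGE_SIZE 4096UL

    int main(void)
    {
        /* hypothetical, cluster-aligned swap slot backing the PMD mapping */
        unsigned long pmd_slot = 5 * HPAGE_PMD_NR;
        /* hypothetical huge-page-aligned virtual address */
        unsigned long haddr = 0x7f0000000000UL;

        for (int i = 0; i < HPAGE_PMD_NR; i++) {
            unsigned long addr = haddr + i * BASE_PAGE_SIZE;
            unsigned long pte_slot = pmd_slot + i;  /* one PTE swap entry per subpage */

            if (i < 2 || i == HPAGE_PMD_NR - 1)
                printf("va %#lx -> swap slot %lu\n", addr, pte_slot);
        }
        return 0;
    }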
Notice: is_huge_zero_pmd() and pmd_page() doesn't work well with swap PMD, so pmd_present() check is called before them. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 4 ++++ include/linux/swap.h | 6 ++++++ mm/huge_memory.c | 48 +++++++++++++++++++++++++++++++++++++++++++----- mm/swapfile.c | 32 ++++++++++++++++++++++++++++++++ 4 files changed, 85 insertions(+), 5 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 99c19b06d9a4..0f3e1739986f 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -226,6 +226,10 @@ static inline bool is_huge_zero_page(struct page *page) return READ_ONCE(huge_zero_page) == page; } +/* + * is_huge_zero_pmd() must be called after checking pmd_present(), + * otherwise, it may report false positive for PMD swap entry. + */ static inline bool is_huge_zero_pmd(pmd_t pmd) { return is_huge_zero_page(pmd_page(pmd)); diff --git a/include/linux/swap.h b/include/linux/swap.h index 63235b879a59..7ee8bfdd0861 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -614,11 +614,17 @@ static inline swp_entry_t get_swap_page(struct page *page) #ifdef CONFIG_THP_SWAP extern int split_swap_cluster(swp_entry_t entry); +extern int split_swap_cluster_map(swp_entry_t entry); #else static inline int split_swap_cluster(swp_entry_t entry) { return 0; } + +static inline int split_swap_cluster_map(swp_entry_t entry) +{ + return 0; +} #endif #ifdef CONFIG_MEMCG diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 9f3e8b5cdf7d..956c49bfd208 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1604,6 +1604,40 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) return 0; } +/* Convert a PMD swap mapping to a set of PTE swap mappings */ +static void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd) +{ + struct mm_struct *mm = vma->vm_mm; + pgtable_t pgtable; + pmd_t _pmd; + swp_entry_t entry; + int i, soft_dirty; + + entry = pmd_to_swp_entry(*pmd); + soft_dirty = pmd_soft_dirty(*pmd); + + split_swap_cluster_map(entry); + + pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pmd_populate(mm, &_pmd, pgtable); + + for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE, entry.val++) { + pte_t *pte, ptent; + + pte = pte_offset_map(&_pmd, haddr); + VM_BUG_ON(!pte_none(*pte)); + ptent = swp_entry_to_pte(entry); + if (soft_dirty) + ptent = pte_swp_mksoft_dirty(ptent); + set_pte_at(mm, haddr, pte, ptent); + pte_unmap(pte); + } + smp_wmb(); /* make pte visible before pmd */ + pmd_populate(mm, pmd, pgtable); +} + /* * Return true if we do MADV_FREE successfully on entire pmd page. * Otherwise, return false. 
@@ -2070,7 +2104,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); VM_BUG_ON_VMA(vma->vm_start > haddr, vma); VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma); - VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) + VM_BUG_ON(!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)); count_vm_event(THP_SPLIT_PMD); @@ -2094,7 +2128,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, put_page(page); add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR); return; - } else if (is_huge_zero_pmd(*pmd)) { + } else if (pmd_present(*pmd) && is_huge_zero_pmd(*pmd)) { /* * FIXME: Do we want to invalidate secondary mmu by calling * mmu_notifier_invalidate_range() see comments below inside @@ -2138,6 +2172,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, page = pfn_to_page(swp_offset(entry)); } else #endif + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(old_pmd)) + return __split_huge_swap_pmd(vma, haddr, pmd); + else page = pmd_page(old_pmd); VM_BUG_ON_PAGE(!page_count(page), page); page_ref_add(page, HPAGE_PMD_NR - 1); @@ -2229,14 +2266,15 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, * pmd against. Otherwise we can end up replacing wrong page. */ VM_BUG_ON(freeze && !page); - if (page && page != pmd_page(*pmd)) - goto out; + /* pmd_page() should be called only if pmd_present() */ + if (page && (!pmd_present(*pmd) || page != pmd_page(*pmd))) + goto out; if (pmd_trans_huge(*pmd)) { page = pmd_page(*pmd); if (PageMlocked(page)) clear_page_mlock(page); - } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd))) + } else if (!(pmd_devmap(*pmd) || is_swap_pmd(*pmd))) goto out; __split_huge_pmd_locked(vma, pmd, haddr, freeze); out: diff --git a/mm/swapfile.c b/mm/swapfile.c index 8da03551ecbe..242f70c9e1f2 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -4034,6 +4034,38 @@ void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node, } #endif +#ifdef CONFIG_THP_SWAP +/* + * The corresponding page table shouldn't be changed under us, that + * is, the page table lock should be held. + */ +int split_swap_cluster_map(swp_entry_t entry) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + + VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER)); + si = _swap_info_get(entry); + if (!si) + return -EBUSY; + ci = lock_cluster(si, offset); + /* The swap cluster has been split by someone else, we are done */ + if (!cluster_is_huge(ci)) + goto out; + cluster_add_swapcount(ci, -1); + /* + * If the last PMD swap mapping has gone and the THP isn't in + * swap cache, the huge swap cluster will be split. 
+ if (!cluster_swapcount(ci) && !(si->swap_map[offset] & SWAP_HAS_CACHE)) + cluster_clear_huge(ci); +out: + unlock_cluster(ci); + return 0; +} +#endif + static int __init swapfile_init(void) { int nid;
From patchwork Mon Sep 3 07:22:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585563 From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 07/21] swap: Support PMD swap mapping in split_swap_cluster() Date: Mon, 3 Sep 2018 15:22:00 +0800 Message-Id: <20180903072214.24602-8-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com>
When splitting a THP that is in the swap cache, or when failing to allocate a THP while swapping in a huge swap cluster, the huge swap cluster will be split. In addition to clearing the huge flag of the swap cluster, the PMD swap mapping count recorded in cluster_count() will be set to 0. But the PMD swap mappings themselves are not touched, because it can be hard to find them all at that point.
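A minimal sketch of this lazy-split protocol (a userspace illustration with hypothetical names, assuming the cluster state is reduced to a huge flag plus a PMD mapping count): the cluster is downgraded immediately, while each PMD swap mapping is only split when it is next touched, as the following paragraph describes for the real code.

    #include <stdbool.h>
    #include <stdio.h>

    struct huge_cluster {
        bool huge;          /* stands in for cluster_is_huge() */
        int pmd_swapcount;  /* PMD swap mapping count kept in cluster_count() */
    };

    /* what the split does: forget the PMD mappings, do not walk page tables */
    static void split_cluster(struct huge_cluster *c)
    {
        c->huge = false;
        c->pmd_swapcount = 0;
    }

    /* a later operation on a PMD swap mapping notices the split and falls back */
    static void touch_pmd_swap_mapping(const struct huge_cluster *c)
    {
        if (!c->huge)
            printf("cluster already split: split this PMD mapping, continue with PTEs\n");
        else
            printf("cluster still huge: operate on the whole PMD mapping\n");
    }

    int main(void)
    {
        struct huge_cluster c = { .huge = true, .pmd_swapcount = 1 };

        split_cluster(&c);           /* e.g. THP allocation failed during swapin */
        touch_pmd_swap_mapping(&c);  /* later zap/fault detects it and splits lazily */
        return 0;
    }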
When the PMD swap mappings are operated later, it will be found that the huge swap cluster has been split and the PMD swap mappings will be split at that time. Unless splitting a THP in swap cache (specified via "force" parameter), split_swap_cluster() will return -EEXIST if there is SWAP_HAS_CACHE flag in swap_map[offset]. Because this indicates there is a THP corresponds to this huge swap cluster, and it isn't desired to split the THP. When splitting a THP in swap cache, the position to call split_swap_cluster() is changed to before unlocking sub-pages. So that all sub-pages will be kept locked from the THP has been split to the huge swap cluster is split. This makes the code much easier to be reasoned. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/swap.h | 6 ++++-- mm/huge_memory.c | 18 ++++++++++------ mm/swapfile.c | 58 +++++++++++++++++++++++++++++++++++++--------------- 3 files changed, 57 insertions(+), 25 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 7ee8bfdd0861..1ab197a0e065 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -612,11 +612,13 @@ static inline swp_entry_t get_swap_page(struct page *page) #endif /* CONFIG_SWAP */ +#define SSC_SPLIT_CACHED 0x1 + #ifdef CONFIG_THP_SWAP -extern int split_swap_cluster(swp_entry_t entry); +extern int split_swap_cluster(swp_entry_t entry, unsigned long flags); extern int split_swap_cluster_map(swp_entry_t entry); #else -static inline int split_swap_cluster(swp_entry_t entry) +static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags) { return 0; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 956c49bfd208..9341c90aa286 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2496,6 +2496,17 @@ static void __split_huge_page(struct page *page, struct list_head *list, unfreeze_page(head); + /* + * Split swap cluster before unlocking sub-pages. So all + * sub-pages will be kept locked from THP has been split to + * swap cluster is split. 
+ */ + if (PageSwapCache(head)) { + swp_entry_t entry = { .val = page_private(head) }; + + split_swap_cluster(entry, SSC_SPLIT_CACHED); + } + for (i = 0; i < HPAGE_PMD_NR; i++) { struct page *subpage = head + i; if (subpage == page) @@ -2719,12 +2730,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) __dec_node_page_state(page, NR_SHMEM_THPS); spin_unlock(&pgdata->split_queue_lock); __split_huge_page(page, list, flags); - if (PageSwapCache(head)) { - swp_entry_t entry = { .val = page_private(head) }; - - ret = split_swap_cluster(entry); - } else - ret = 0; + ret = 0; } else { if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) { pr_alert("total_mapcount: %u, page_count(): %u\n", diff --git a/mm/swapfile.c b/mm/swapfile.c index 242f70c9e1f2..a16cd903d3ef 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1469,23 +1469,6 @@ void put_swap_page(struct page *page, swp_entry_t entry) unlock_cluster_or_swap_info(si, ci); } -#ifdef CONFIG_THP_SWAP -int split_swap_cluster(swp_entry_t entry) -{ - struct swap_info_struct *si; - struct swap_cluster_info *ci; - unsigned long offset = swp_offset(entry); - - si = _swap_info_get(entry); - if (!si) - return -EBUSY; - ci = lock_cluster(si, offset); - cluster_clear_huge(ci); - unlock_cluster(ci); - return 0; -} -#endif - static int swp_entry_cmp(const void *ent1, const void *ent2) { const swp_entry_t *e1 = ent1, *e2 = ent2; @@ -4064,6 +4047,47 @@ int split_swap_cluster_map(swp_entry_t entry) unlock_cluster(ci); return 0; } + +/* + * We will not try to split all PMD swap mappings to the swap cluster, + * because we haven't enough information available for that. Later, + * when the PMD swap mapping is duplicated or swapin, etc, the PMD + * swap mapping will be split and fallback to the PTE operations. + */ +int split_swap_cluster(swp_entry_t entry, unsigned long flags) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + int ret = 0; + + si = get_swap_device(entry); + if (!si) + return -EINVAL; + ci = lock_cluster(si, offset); + /* The swap cluster has been split by someone else, we are done */ + if (!cluster_is_huge(ci)) + goto out; + VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER)); + VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER); + /* + * If not requested, don't split swap cluster that has SWAP_HAS_CACHE + * flag. When the flag is cleared later, the huge swap cluster will + * be split if there is no PMD swap mapping. 
+ if (!(flags & SSC_SPLIT_CACHED) && + si->swap_map[offset] & SWAP_HAS_CACHE) { + ret = -EEXIST; + goto out; + } + cluster_set_swapcount(ci, 0); + cluster_clear_huge(ci); + +out: + unlock_cluster(ci); + put_swap_device(si); + return ret; +} #endif static int __init swapfile_init(void)
From patchwork Mon Sep 3 07:22:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585565 From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 08/21] swap: Support to read a huge swap cluster for swapin a THP Date: Mon, 3 Sep 2018 15:22:01 +0800 Message-Id: <20180903072214.24602-9-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com>
To swap in a THP in one piece, we need to read a whole huge swap cluster from the swap device. This patch revises __read_swap_cache_async() and its callers and callees to support this. If __read_swap_cache_async() finds that the swap cluster of the specified swap entry is huge, it tries to allocate a THP and add it into the swap cache.
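A small sketch of the alignment arithmetic this relies on (userspace illustration only; HPAGE_PMD_NR = 512 is an assumption, and the real code works on swp_entry_t values rather than bare slot numbers): the faulting entry is rounded down to the start of its huge cluster for the swap-cache lookup and the read, and the offset within the cluster selects which subpage of the THP is returned, matching the round_down() and the "new_page += swp_offset(entry) & (entry_size - 1)" adjustment in the diff below.

    #include <stdio.h>

    #define HPAGE_PMD_NR 512UL  /* assumed: 2MB THP with 4KB base pages */

    int main(void)
    {
        unsigned long offset = 262735;  /* hypothetical faulting swap slot */
        unsigned long cluster_start = offset & ~(HPAGE_PMD_NR - 1);  /* round_down() */
        unsigned long subpage = offset & (HPAGE_PMD_NR - 1);

        printf("read slots [%lu, %lu) into one THP, return subpage %lu\n",
               cluster_start, cluster_start + HPAGE_PMD_NR, subpage);
        return 0;
    }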
So later the contents of the huge swap cluster can be read into the THP. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 38 +++++++++++++++++++++++++++++++ include/linux/swap.h | 4 ++-- mm/huge_memory.c | 26 --------------------- mm/swap_state.c | 60 +++++++++++++++++++++++++++++++++++++++---------- mm/swapfile.c | 9 +++++--- 5 files changed, 94 insertions(+), 43 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 0f3e1739986f..3fdb29bc250c 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -250,6 +250,39 @@ static inline bool thp_migration_supported(void) return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION); } +/* + * always: directly stall for all thp allocations + * defer: wake kswapd and fail if not immediately available + * defer+madvise: wake kswapd and directly stall for MADV_HUGEPAGE, otherwise + * fail if not immediately available + * madvise: directly stall for MADV_HUGEPAGE, otherwise fail if not immediately + * available + * never: never stall for any thp allocation + */ +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) +{ + bool vma_madvised; + + if (!vma) + return GFP_TRANSHUGE_LIGHT; + vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE); + if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, + &transparent_hugepage_flags)) + return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY); + if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, + &transparent_hugepage_flags)) + return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM; + if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, + &transparent_hugepage_flags)) + return GFP_TRANSHUGE_LIGHT | + (vma_madvised ? __GFP_DIRECT_RECLAIM : + __GFP_KSWAPD_RECLAIM); + if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, + &transparent_hugepage_flags)) + return GFP_TRANSHUGE_LIGHT | + (vma_madvised ? 
__GFP_DIRECT_RECLAIM : 0); + return GFP_TRANSHUGE_LIGHT; +} #else /* CONFIG_TRANSPARENT_HUGEPAGE */ #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; }) #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; }) @@ -363,6 +396,11 @@ static inline bool thp_migration_supported(void) { return false; } + +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) +{ + return 0; +} #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif /* _LINUX_HUGE_MM_H */ diff --git a/include/linux/swap.h b/include/linux/swap.h index 1ab197a0e065..0f653b9027d7 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -457,7 +457,7 @@ extern sector_t map_swap_page(struct page *, struct block_device **); extern sector_t swapdev_block(int, pgoff_t); extern int page_swapcount(struct page *); extern int __swap_count(swp_entry_t entry); -extern int __swp_swapcount(swp_entry_t entry); +extern int __swp_swapcount(swp_entry_t entry, int *entry_size); extern int swp_swapcount(swp_entry_t entry); extern struct swap_info_struct *page_swap_info(struct page *); extern struct swap_info_struct *swp_swap_info(swp_entry_t entry); @@ -585,7 +585,7 @@ static inline int __swap_count(swp_entry_t entry) return 0; } -static inline int __swp_swapcount(swp_entry_t entry) +static inline int __swp_swapcount(swp_entry_t entry, int *entry_size) { return 0; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 9341c90aa286..b14d4bfb06f4 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -620,32 +620,6 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, } -/* - * always: directly stall for all thp allocations - * defer: wake kswapd and fail if not immediately available - * defer+madvise: wake kswapd and directly stall for MADV_HUGEPAGE, otherwise - * fail if not immediately available - * madvise: directly stall for MADV_HUGEPAGE, otherwise fail if not immediately - * available - * never: never stall for any thp allocation - */ -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) -{ - const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE); - - if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags)) - return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY); - if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags)) - return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM; - if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags)) - return GFP_TRANSHUGE_LIGHT | (vma_madvised ? __GFP_DIRECT_RECLAIM : - __GFP_KSWAPD_RECLAIM); - if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags)) - return GFP_TRANSHUGE_LIGHT | (vma_madvised ? __GFP_DIRECT_RECLAIM : - 0); - return GFP_TRANSHUGE_LIGHT; -} - /* Caller must hold page table lock. */ static bool set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr, pmd_t *pmd, diff --git a/mm/swap_state.c b/mm/swap_state.c index 5a307d220e33..66311e29cccb 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -361,7 +361,9 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, { struct page *found_page = NULL, *new_page = NULL; struct swap_info_struct *si; - int err; + int err, entry_size = 1; + swp_entry_t hentry; + *new_page_allocated = false; do { @@ -387,14 +389,40 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * as SWAP_HAS_CACHE. That's done in later part of code or * else swap_off will be aborted if we return NULL. 
*/ - if (!__swp_swapcount(entry) && swap_slot_cache_enabled) + if (!__swp_swapcount(entry, &entry_size) && + swap_slot_cache_enabled) break; /* * Get a new page to read into from swap. */ - if (!new_page) { - new_page = alloc_page_vma(gfp_mask, vma, addr); + if (!new_page || + (IS_ENABLED(CONFIG_THP_SWAP) && + hpage_nr_pages(new_page) != entry_size)) { + if (new_page) + put_page(new_page); + if (IS_ENABLED(CONFIG_THP_SWAP) && + entry_size == HPAGE_PMD_NR) { + gfp_t gfp = alloc_hugepage_direct_gfpmask(vma); + + /* + * Make sure huge page allocation flags are + * compatible with that of normal page + */ + VM_WARN_ONCE(gfp_mask & ~(gfp | __GFP_RECLAIM), + "ignoring gfp_mask bits: %x", + gfp_mask & ~(gfp | __GFP_RECLAIM)); + new_page = alloc_hugepage_vma(gfp, vma, + addr, HPAGE_PMD_ORDER); + if (new_page) + prep_transhuge_page(new_page); + hentry = swp_entry(swp_type(entry), + round_down(swp_offset(entry), + HPAGE_PMD_NR)); + } else { + new_page = alloc_page_vma(gfp_mask, vma, addr); + hentry = entry; + } if (!new_page) break; /* Out of memory */ } @@ -402,7 +430,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* * Swap entry may have been freed since our caller observed it. */ - err = swapcache_prepare(entry, 1); + err = swapcache_prepare(hentry, entry_size); if (err == -EEXIST) { /* * We might race against get_swap_page() and stumble @@ -411,17 +439,24 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, */ cond_resched(); continue; + } else if (err == -ENOTDIR) { + /* huge swap cluster has been split under us */ + continue; } else if (err) /* swp entry is obsolete ? */ break; /* May fail (-ENOMEM) if XArray node allocation failed. */ __SetPageLocked(new_page); __SetPageSwapBacked(new_page); - err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL); + err = add_to_swap_cache(new_page, hentry, + gfp_mask & GFP_KERNEL); if (likely(!err)) { /* Initiate read into locked page */ lru_cache_add_anon(new_page); *new_page_allocated = true; + if (IS_ENABLED(CONFIG_THP_SWAP)) + new_page += swp_offset(entry) & + (entry_size - 1); return new_page; } __ClearPageLocked(new_page); @@ -429,7 +464,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * add_to_swap_cache() doesn't return -EEXIST, so we can safely * clear SWAP_HAS_CACHE flag. 
*/ - put_swap_page(new_page, entry); + put_swap_page(new_page, hentry); } while (err != -ENOMEM); if (new_page) @@ -451,7 +486,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, vma, addr, &page_was_allocated); if (page_was_allocated) - swap_readpage(retpage, do_poll); + swap_readpage(compound_head(retpage), do_poll); return retpage; } @@ -570,8 +605,9 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); - if (offset != entry_offset) { + swap_readpage(compound_head(page), false); + if (offset != entry_offset && + !PageTransCompound(page)) { SetPageReadahead(page); count_vm_event(SWAP_RA); } @@ -732,8 +768,8 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); - if (i != ra_info.offset) { + swap_readpage(compound_head(page), false); + if (i != ra_info.offset && !PageTransCompound(page)) { SetPageReadahead(page); count_vm_event(SWAP_RA); } diff --git a/mm/swapfile.c b/mm/swapfile.c index a16cd903d3ef..d98c1f74f87e 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1542,7 +1542,8 @@ int __swap_count(swp_entry_t entry) return count; } -static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) +static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry, + int *entry_size) { int count = 0; pgoff_t offset = swp_offset(entry); @@ -1550,6 +1551,8 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) ci = lock_cluster_or_swap_info(si, offset); count = swap_count(si->swap_map[offset]); + if (entry_size) + *entry_size = ci && cluster_is_huge(ci) ? SWAPFILE_CLUSTER : 1; unlock_cluster_or_swap_info(si, ci); return count; } @@ -1559,14 +1562,14 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) * This does not give an exact answer when swap count is continued, * but does include the high COUNT_CONTINUED flag to allow for that. 
*/ -int __swp_swapcount(swp_entry_t entry) +int __swp_swapcount(swp_entry_t entry, int *entry_size) { int count = 0; struct swap_info_struct *si; si = get_swap_device(entry); if (si) { - count = swap_swapcount(si, entry); + count = swap_swapcount(si, entry, entry_size); put_swap_device(si); } return count; }
From patchwork Mon Sep 3 07:22:02 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585567 From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 09/21] swap: Swapin a THP in one piece Date: Mon, 3 Sep 2018 15:22:02 +0800 Message-Id: <20180903072214.24602-10-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com>
With this patch, when the page fault handler finds a PMD swap mapping, it swaps in a THP in one piece. This avoids the overhead of splitting/collapsing the THP before/after swapping it, and improves swap performance considerably because of the reduced page fault count, etc. do_huge_pmd_swap_page() is added in this patch to implement this.
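The decision ladder that do_huge_pmd_swap_page() follows can be sketched roughly as below (a userspace illustration with hypothetical helpers; the real function also handles page locking, memcg charging and pmd_same() races, as the diff further down shows):

    #include <stdbool.h>
    #include <stdio.h>

    enum swapin_path { SWAPIN_THP, SWAPIN_FALLBACK_PTE };

    /* cluster_is_huge and thp_allocated stand in for the real cluster and allocation state */
    static enum swapin_path choose_swapin_path(bool cluster_is_huge, bool thp_allocated)
    {
        if (!cluster_is_huge)
            return SWAPIN_FALLBACK_PTE;  /* cluster split earlier: split the PMD mapping, PTE swapin */
        if (!thp_allocated)
            return SWAPIN_FALLBACK_PTE;  /* split cluster and PMD mapping, then PTE swapin */
        return SWAPIN_THP;               /* read and map the whole huge page in one piece */
    }

    int main(void)
    {
        printf("huge+THP: %d, huge+no THP: %d, already split: %d\n",
               choose_swapin_path(true, true),
               choose_swapin_path(true, false),
               choose_swapin_path(false, true));
        return 0;
    }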
It is similar to do_swap_page() for normal page swapin. If failing to allocate a THP, the huge swap cluster and the PMD swap mapping will be split to fallback to normal page swapin. If the huge swap cluster has been split already, the PMD swap mapping will be split to fallback to normal page swapin. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 9 +++ mm/huge_memory.c | 174 ++++++++++++++++++++++++++++++++++++++++++++++++ mm/memory.c | 16 +++-- 3 files changed, 193 insertions(+), 6 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 3fdb29bc250c..c2b8ced6fc2b 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -403,4 +403,13 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#ifdef CONFIG_THP_SWAP +extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); +#else /* CONFIG_THP_SWAP */ +static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) +{ + return 0; +} +#endif /* CONFIG_THP_SWAP */ + #endif /* _LINUX_HUGE_MM_H */ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index b14d4bfb06f4..cfbc9f15e020 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -33,6 +33,8 @@ #include #include #include +#include +#include #include #include @@ -1612,6 +1614,178 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma, pmd_populate(mm, pmd, pgtable); } +#ifdef CONFIG_THP_SWAP +static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, pmd_t orig_pmd) +{ + struct mm_struct *mm = vma->vm_mm; + spinlock_t *ptl; + int ret = 0; + + ptl = pmd_lock(mm, pmd); + if (pmd_same(*pmd, orig_pmd)) + __split_huge_swap_pmd(vma, address & HPAGE_PMD_MASK, pmd); + else + ret = -ENOENT; + spin_unlock(ptl); + + return ret; +} + +int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) +{ + struct page *page; + struct mem_cgroup *memcg; + struct vm_area_struct *vma = vmf->vma; + unsigned long haddr = vmf->address & HPAGE_PMD_MASK; + swp_entry_t entry; + pmd_t pmd; + int i, locked, exclusive = 0, ret = 0; + + entry = pmd_to_swp_entry(orig_pmd); + VM_BUG_ON(non_swap_entry(entry)); + delayacct_set_flag(DELAYACCT_PF_SWAPIN); +retry: + page = lookup_swap_cache(entry, NULL, vmf->address); + if (!page) { + page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma, + haddr, false); + if (!page) { + /* + * Back out if somebody else faulted in this pmd + * while we released the pmd lock. 
+ */ + if (likely(pmd_same(*vmf->pmd, orig_pmd))) { + /* + * Failed to allocate huge page, split huge swap + * cluster, and fallback to swapin normal page + */ + ret = split_swap_cluster(entry, 0); + /* Somebody else swapin the swap entry, retry */ + if (ret == -EEXIST) { + ret = 0; + goto retry; + /* swapoff occurs under us */ + } else if (ret == -EINVAL) + ret = 0; + else + goto fallback; + } + delayacct_clear_flag(DELAYACCT_PF_SWAPIN); + goto out; + } + + /* Had to read the page from swap area: Major fault */ + ret = VM_FAULT_MAJOR; + count_vm_event(PGMAJFAULT); + count_memcg_event_mm(vma->vm_mm, PGMAJFAULT); + } else if (!PageTransCompound(page)) + goto fallback; + + locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags); + + delayacct_clear_flag(DELAYACCT_PF_SWAPIN); + if (!locked) { + ret |= VM_FAULT_RETRY; + goto out_release; + } + + /* + * Make sure try_to_free_swap or reuse_swap_page or swapoff did not + * release the swapcache from under us. The page pin, and pmd_same + * test below, are not enough to exclude that. Even if it is still + * swapcache, we need to check that the page's swap has not changed. + */ + if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val)) + goto out_page; + + if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, + &memcg, true)) { + ret = VM_FAULT_OOM; + goto out_page; + } + + /* + * Back out if somebody else already faulted in this pmd. + */ + vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd); + spin_lock(vmf->ptl); + if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) + goto out_nomap; + + if (unlikely(!PageUptodate(page))) { + ret = VM_FAULT_SIGBUS; + goto out_nomap; + } + + /* + * The page isn't present yet, go ahead with the fault. + * + * Be careful about the sequence of operations here. + * To get its accounting right, reuse_swap_page() must be called + * while the page is counted on swap but not yet in mapcount i.e. + * before page_add_anon_rmap() and swap_free(); try_to_free_swap() + * must be called after the swap_free(), or it will never succeed. 
+ */ + + add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR); + add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR); + pmd = mk_huge_pmd(page, vma->vm_page_prot); + if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) { + pmd = maybe_pmd_mkwrite(pmd_mkdirty(pmd), vma); + vmf->flags &= ~FAULT_FLAG_WRITE; + ret |= VM_FAULT_WRITE; + exclusive = RMAP_EXCLUSIVE; + } + for (i = 0; i < HPAGE_PMD_NR; i++) + flush_icache_page(vma, page + i); + if (pmd_swp_soft_dirty(orig_pmd)) + pmd = pmd_mksoft_dirty(pmd); + do_page_add_anon_rmap(page, vma, haddr, + exclusive | RMAP_COMPOUND); + mem_cgroup_commit_charge(page, memcg, true, true); + activate_page(page); + set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd); + + swap_free(entry, HPAGE_PMD_NR); + if (mem_cgroup_swap_full(page) || + (vma->vm_flags & VM_LOCKED) || PageMlocked(page)) + try_to_free_swap(page); + unlock_page(page); + + if (vmf->flags & FAULT_FLAG_WRITE) { + spin_unlock(vmf->ptl); + ret |= do_huge_pmd_wp_page(vmf, pmd); + if (ret & VM_FAULT_ERROR) + ret &= VM_FAULT_ERROR; + goto out; + } + + /* No need to invalidate - it was non-present before */ + update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); + spin_unlock(vmf->ptl); +out: + return ret; +out_nomap: + mem_cgroup_cancel_charge(page, memcg, true); + spin_unlock(vmf->ptl); +out_page: + unlock_page(page); +out_release: + put_page(page); + return ret; +fallback: + delayacct_clear_flag(DELAYACCT_PF_SWAPIN); + if (!split_huge_swap_pmd(vmf->vma, vmf->pmd, vmf->address, orig_pmd)) + ret = VM_FAULT_FALLBACK; + else + ret = 0; + if (page) + put_page(page); + return ret; +} +#endif + /* * Return true if we do MADV_FREE successfully on entire pmd page. * Otherwise, return false. diff --git a/mm/memory.c b/mm/memory.c index b604f16d031b..9b6fdc18b10e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4090,13 +4090,17 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, barrier(); if (unlikely(is_swap_pmd(orig_pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(orig_pmd)); - if (is_pmd_migration_entry(orig_pmd)) + if (thp_migration_supported() && + is_pmd_migration_entry(orig_pmd)) { pmd_migration_entry_wait(mm, vmf.pmd); - return 0; - } - if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) { + return 0; + } else if (IS_ENABLED(CONFIG_THP_SWAP)) { + ret = do_huge_pmd_swap_page(&vmf, orig_pmd); + if (!(ret & VM_FAULT_FALLBACK)) + return ret; + } else + VM_BUG_ON(1); + } else if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) { if (pmd_protnone(orig_pmd) && vma_is_accessible(vma)) return do_huge_pmd_numa_page(&vmf, orig_pmd); From patchwork Mon Sep 3 07:22:03 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585569 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 19C3A14BD for ; Mon, 3 Sep 2018 07:23:06 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 02F2128785 for ; Mon, 3 Sep 2018 07:23:06 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EABC4287C6; Mon, 3 Sep 2018 07:23:05 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham 
permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga07.intel.com (mga07.intel.com. [134.134.136.100]) by mx.google.com with ESMTPS id az1-v6si9327648plb.513.2018.09.03.00.22.43 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 03 Sep 2018 00:22:43 -0700 (PDT) Received-SPF: pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) client-ip=134.134.136.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Sep 2018 00:22:43 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,324,1531810800"; d="scan'208";a="70146905" Received: from yhuang-mobile.sh.intel.com ([10.239.196.86]) by orsmga008.jf.intel.com with ESMTP; 03 Sep 2018 00:22:40 -0700 From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 10/21] swap: Support to count THP swapin and its fallback Date: Mon, 3 Sep 2018 15:22:03 +0800 Message-Id: <20180903072214.24602-11-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP 2 new /proc/vmstat fields are added, "thp_swapin" and "thp_swapin_fallback" to count swapin a THP from swap device in one piece and fallback to normal page swapin. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- Documentation/admin-guide/mm/transhuge.rst | 8 ++++++++ include/linux/vm_event_item.h | 2 ++ mm/huge_memory.c | 4 +++- mm/page_io.c | 15 ++++++++++++--- mm/vmstat.c | 2 ++ 5 files changed, 27 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index 7ab93a8404b9..85e33f785fd7 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -364,6 +364,14 @@ thp_swpout_fallback Usually because failed to allocate some continuous swap space for the huge page. +thp_swpin + is incremented every time a huge page is swapin in one piece + without splitting. + +thp_swpin_fallback + is incremented if a huge page has to be split during swapin. + Usually because failed to allocate a huge page. + As the system ages, allocating huge pages may be expensive as the system uses memory compaction to copy data around memory to free a huge page for use. 
There are some counters in ``/proc/vmstat`` to help diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 5c7f010676a7..7b438548a78e 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -88,6 +88,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, THP_ZERO_PAGE_ALLOC_FAILED, THP_SWPOUT, THP_SWPOUT_FALLBACK, + THP_SWPIN, + THP_SWPIN_FALLBACK, #endif #ifdef CONFIG_MEMORY_BALLOON BALLOON_INFLATE, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index cfbc9f15e020..0c02d54edc4e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1668,8 +1668,10 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) /* swapoff occurs under us */ } else if (ret == -EINVAL) ret = 0; - else + else { + count_vm_event(THP_SWPIN_FALLBACK); goto fallback; + } } delayacct_clear_flag(DELAYACCT_PF_SWAPIN); goto out; diff --git a/mm/page_io.c b/mm/page_io.c index aafd19ec1db4..362254b99955 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -348,6 +348,15 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, return ret; } +static inline void count_swpin_vm_event(struct page *page) +{ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + if (unlikely(PageTransHuge(page))) + count_vm_event(THP_SWPIN); +#endif + count_vm_events(PSWPIN, hpage_nr_pages(page)); +} + int swap_readpage(struct page *page, bool synchronous) { struct bio *bio; @@ -371,7 +380,7 @@ int swap_readpage(struct page *page, bool synchronous) ret = mapping->a_ops->readpage(swap_file, page); if (!ret) - count_vm_event(PSWPIN); + count_swpin_vm_event(page); return ret; } @@ -382,7 +391,7 @@ int swap_readpage(struct page *page, bool synchronous) unlock_page(page); } - count_vm_event(PSWPIN); + count_swpin_vm_event(page); return 0; } @@ -401,7 +410,7 @@ int swap_readpage(struct page *page, bool synchronous) get_task_struct(current); bio->bi_private = current; bio_set_op_attrs(bio, REQ_OP_READ, 0); - count_vm_event(PSWPIN); + count_swpin_vm_event(page); bio_get(bio); qc = submit_bio(bio); while (synchronous) { diff --git a/mm/vmstat.c b/mm/vmstat.c index 8ba0870ecddd..ac04801bb0cb 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1263,6 +1263,8 @@ const char * const vmstat_text[] = { "thp_zero_page_alloc_failed", "thp_swpout", "thp_swpout_fallback", + "thp_swpin", + "thp_swpin_fallback", #endif #ifdef CONFIG_MEMORY_BALLOON "balloon_inflate", From patchwork Mon Sep 3 07:22:04 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585571 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AD01813AC for ; Mon, 3 Sep 2018 07:23:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 930AD28785 for ; Mon, 3 Sep 2018 07:23:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 84D12287C6; Mon, 3 Sep 2018 07:23:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B5E9C28785 for ; Mon, 3 Sep 2018 07:23:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 
[134.134.136.100]) by mx.google.com with ESMTPS id az1-v6si9327648plb.513.2018.09.03.00.22.46 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 03 Sep 2018 00:22:46 -0700 (PDT) Received-SPF: pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) client-ip=134.134.136.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Sep 2018 00:22:46 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,324,1531810800"; d="scan'208";a="70146910" Received: from yhuang-mobile.sh.intel.com ([10.239.196.86]) by orsmga008.jf.intel.com with ESMTP; 03 Sep 2018 00:22:43 -0700 From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 11/21] swap: Add sysfs interface to configure THP swapin Date: Mon, 3 Sep 2018 15:22:04 +0800 Message-Id: <20180903072214.24602-12-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Swapin a THP as a whole isn't desirable in some situations. For example, for completely random access pattern, swapin a THP in one piece will inflate the reading greatly. So a sysfs interface: /sys/kernel/mm/transparent_hugepage/swapin_enabled is added to configure it. Three options as follow are provided, - always: THP swapin will be enabled always - madvise: THP swapin will be enabled only for VMA with VM_HUGEPAGE flag set. - never: THP swapin will be disabled always The default configuration is: madvise. During page fault, if a PMD swap mapping is found and THP swapin is disabled, the huge swap cluster and the PMD swap mapping will be split and fallback to normal page swapin. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- Documentation/admin-guide/mm/transhuge.rst | 21 +++++++ include/linux/huge_mm.h | 31 ++++++++++ mm/huge_memory.c | 94 ++++++++++++++++++++++++------ 3 files changed, 127 insertions(+), 19 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index 85e33f785fd7..23aefb17101c 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -160,6 +160,27 @@ Some userspace (such as a test program, or an optimized memory allocation cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size +Transparent hugepage may be swapout and swapin in one piece without +splitting. This will improve the utility of transparent hugepage but +may inflate the read/write too. 
So whether to enable swapin +transparent hugepage in one piece can be configured as follow. + + echo always >/sys/kernel/mm/transparent_hugepage/swapin_enabled + echo madvise >/sys/kernel/mm/transparent_hugepage/swapin_enabled + echo never >/sys/kernel/mm/transparent_hugepage/swapin_enabled + +always + Attempt to allocate a transparent huge page and read it from + swap space in one piece every time. + +never + Always split the swap space and PMD swap mapping and swapin + the fault normal page during swapin. + +madvise + Only swapin the transparent huge page in one piece for + MADV_HUGEPAGE madvise regions. + khugepaged will be automatically started when transparent_hugepage/enabled is set to "always" or "madvise, and it'll be automatically shutdown if it's set to "never". diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index c2b8ced6fc2b..9dedff974def 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -63,6 +63,8 @@ enum transparent_hugepage_flag { #ifdef CONFIG_DEBUG_VM TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG, #endif + TRANSPARENT_HUGEPAGE_SWAPIN_FLAG, + TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG, }; struct kobject; @@ -405,11 +407,40 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) #ifdef CONFIG_THP_SWAP extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); + +static inline bool transparent_hugepage_swapin_enabled( + struct vm_area_struct *vma) +{ + if (vma->vm_flags & VM_NOHUGEPAGE) + return false; + + if (is_vma_temporary_stack(vma)) + return false; + + if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) + return false; + + if (transparent_hugepage_flags & + (1 << TRANSPARENT_HUGEPAGE_SWAPIN_FLAG)) + return true; + + if (transparent_hugepage_flags & + (1 << TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG)) + return !!(vma->vm_flags & VM_HUGEPAGE); + + return false; +} #else /* CONFIG_THP_SWAP */ static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) { return 0; } + +static inline bool transparent_hugepage_swapin_enabled( + struct vm_area_struct *vma) +{ + return false; +} #endif /* CONFIG_THP_SWAP */ #endif /* _LINUX_HUGE_MM_H */ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 0c02d54edc4e..7d22e33fdd43 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -57,7 +57,8 @@ unsigned long transparent_hugepage_flags __read_mostly = #endif (1<address); if (!page) { + if (!transparent_hugepage_swapin_enabled(vma)) + goto split; + page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma, haddr, false); if (!page) { @@ -1655,24 +1709,8 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) * Back out if somebody else faulted in this pmd * while we released the pmd lock. 
*/ - if (likely(pmd_same(*vmf->pmd, orig_pmd))) { - /* - * Failed to allocate huge page, split huge swap - * cluster, and fallback to swapin normal page - */ - ret = split_swap_cluster(entry, 0); - /* Somebody else swapin the swap entry, retry */ - if (ret == -EEXIST) { - ret = 0; - goto retry; - /* swapoff occurs under us */ - } else if (ret == -EINVAL) - ret = 0; - else { - count_vm_event(THP_SWPIN_FALLBACK); - goto fallback; - } - } + if (likely(pmd_same(*vmf->pmd, orig_pmd))) + goto split; delayacct_clear_flag(DELAYACCT_PF_SWAPIN); goto out; } @@ -1785,6 +1823,24 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) if (page) put_page(page); return ret; +split: + /* + * Failed to allocate huge page, split huge swap cluster, and + * fallback to swapin normal page + */ + ret = split_swap_cluster(entry, 0); + /* Somebody else swapin the swap entry, retry */ + if (ret == -EEXIST) { + ret = 0; + goto retry; + } + /* swapoff occurs under us */ + if (ret == -EINVAL) { + delayacct_clear_flag(DELAYACCT_PF_SWAPIN); + return 0; + } + count_vm_event(THP_SWPIN_FALLBACK); + goto fallback; } #endif From patchwork Mon Sep 3 07:22:05 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585573 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3625D14BD for ; Mon, 3 Sep 2018 07:23:13 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1D0E828785 for ; Mon, 3 Sep 2018 07:23:13 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 10B11287C6; Mon, 3 Sep 2018 07:23:13 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4059E28785 for ; Mon, 3 Sep 2018 07:23:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A68586B66B7; Mon, 3 Sep 2018 03:22:50 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id A40F66B66B8; Mon, 3 Sep 2018 03:22:50 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8E2DA6B66B9; Mon, 3 Sep 2018 03:22:50 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pl1-f197.google.com (mail-pl1-f197.google.com [209.85.214.197]) by kanga.kvack.org (Postfix) with ESMTP id 491936B66B7 for ; Mon, 3 Sep 2018 03:22:50 -0400 (EDT) Received: by mail-pl1-f197.google.com with SMTP id 3-v6so2632662plq.6 for ; Mon, 03 Sep 2018 00:22:50 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=BMN6WiDq2cjft/TWo3EWdHkTOa3ARlGQeFw2zCVwga8=; b=hNsUjaHH5zN2PHtG17Mwk/Om6vc95xMsTaQ0ZFb7jk6o71km+69pe9GuC8vCUWdJ9c Y9aQmhT2V5FKaiDUub616lhFhl3BXQ9HIKGaEU4ykzBKCBmxjM5yV8vJWuZT9ydEQIh9 LAhZNdh9Eu4qJeRzNZVCn9NrPkiAh7xu3xpMHLAZUV3PWnjT+aXnsfsZM+be09wFL3gK 
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 12/21] swap: Support PMD swap mapping in swapoff Date: Mon, 3 Sep 2018 15:22:05 +0800 Message-Id: <20180903072214.24602-13-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP During swapoff, for a huge swap cluster, we need to allocate a THP, read its contents into the THP and unuse the PMD and PTE swap mappings to it. If failed to allocate a THP, the huge swap cluster will be split. During unuse, if it is found that the swap cluster mapped by a PMD swap mapping is split already, we will split the PMD swap mapping and unuse the PTEs. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/asm-generic/pgtable.h | 14 +------ include/linux/huge_mm.h | 8 ++++ mm/huge_memory.c | 4 +- mm/swapfile.c | 86 ++++++++++++++++++++++++++++++++++++++++++- 4 files changed, 97 insertions(+), 15 deletions(-) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index bf207f915967..d18a4415db46 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -931,22 +931,12 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) barrier(); #endif /* - * !pmd_present() checks for pmd migration entries - * - * The complete check uses is_pmd_migration_entry() in linux/swapops.h - * But using that requires moving current function and pmd_trans_unstable() - * to linux/swapops.h to resovle dependency, which is too much code move. - * - * !pmd_present() is equivalent to is_pmd_migration_entry() currently, - * because !pmd_present() pages can only be under migration not swapped - * out. - * - * pmd_none() is preseved for future condition checks on pmd migration + * pmd_none() is preseved for future condition checks on pmd swap * entries and not confusing with this function name, although it is * redundant with !pmd_present(). 
*/ if (pmd_none(pmdval) || pmd_trans_huge(pmdval) || - (IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !pmd_present(pmdval))) + (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) && !pmd_present(pmdval))) return 1; if (unlikely(pmd_bad(pmdval))) { pmd_clear_bad(pmd); diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 9dedff974def..25ba9b5f1e60 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -406,6 +406,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #ifdef CONFIG_THP_SWAP +extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, pmd_t orig_pmd); extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); static inline bool transparent_hugepage_swapin_enabled( @@ -431,6 +433,12 @@ static inline bool transparent_hugepage_swapin_enabled( return false; } #else /* CONFIG_THP_SWAP */ +static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, pmd_t orig_pmd) +{ + return 0; +} + static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) { return 0; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 7d22e33fdd43..4b7ee510a9fe 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1666,8 +1666,8 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma, } #ifdef CONFIG_THP_SWAP -static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, pmd_t orig_pmd) +int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, pmd_t orig_pmd) { struct mm_struct *mm = vma->vm_mm; spinlock_t *ptl; diff --git a/mm/swapfile.c b/mm/swapfile.c index d98c1f74f87e..f801d9852b3e 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1931,6 +1931,11 @@ static inline int pte_same_as_swp(pte_t pte, pte_t swp_pte) return pte_same(pte_swp_clear_soft_dirty(pte), swp_pte); } +static inline int pmd_same_as_swp(pmd_t pmd, pmd_t swp_pmd) +{ + return pmd_same(pmd_swp_clear_soft_dirty(pmd), swp_pmd); +} + /* * No need to decide whether this PTE shares the swap entry with others, * just let do_wp_page work it out if a write is requested later - to @@ -1992,6 +1997,53 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, return ret; } +#ifdef CONFIG_THP_SWAP +static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long addr, swp_entry_t entry, struct page *page) +{ + struct mem_cgroup *memcg; + spinlock_t *ptl; + int ret = 1; + + if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, + &memcg, true)) { + ret = -ENOMEM; + goto out_nolock; + } + + ptl = pmd_lock(vma->vm_mm, pmd); + if (unlikely(!pmd_same_as_swp(*pmd, swp_entry_to_pmd(entry)))) { + mem_cgroup_cancel_charge(page, memcg, true); + ret = 0; + goto out; + } + + add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR); + add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR); + get_page(page); + set_pmd_at(vma->vm_mm, addr, pmd, + pmd_mkold(mk_huge_pmd(page, vma->vm_page_prot))); + page_add_anon_rmap(page, vma, addr, true); + mem_cgroup_commit_charge(page, memcg, true, true); + swap_free(entry, HPAGE_PMD_NR); + /* + * Move the page to the active list so it is not + * immediately swapped out again after swapon. 
+ */ + activate_page(page); +out: + spin_unlock(ptl); +out_nolock: + return ret; +} +#else +static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long addr, swp_entry_t entry, struct page *page) +{ + return 0; +} +#endif + static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, unsigned long end, swp_entry_t entry, struct page *page) @@ -2032,7 +2084,7 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud, unsigned long addr, unsigned long end, swp_entry_t entry, struct page *page) { - pmd_t *pmd; + pmd_t swp_pmd = swp_entry_to_pmd(entry), *pmd, orig_pmd; unsigned long next; int ret; @@ -2040,6 +2092,27 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud, do { cond_resched(); next = pmd_addr_end(addr, end); + orig_pmd = *pmd; + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(orig_pmd)) { + if (likely(!pmd_same_as_swp(orig_pmd, swp_pmd))) + continue; + /* + * Huge cluster has been split already, split + * PMD swap mapping and fallback to unuse PTE + */ + if (!PageTransCompound(page)) { + ret = split_huge_swap_pmd(vma, pmd, + addr, orig_pmd); + if (ret) + return ret; + ret = unuse_pte_range(vma, pmd, addr, + next, entry, page); + } else + ret = unuse_pmd(vma, pmd, addr, entry, page); + if (ret) + return ret; + continue; + } if (pmd_none_or_trans_huge_or_clear_bad(pmd)) continue; ret = unuse_pte_range(vma, pmd, addr, next, entry, page); @@ -2233,6 +2306,7 @@ int try_to_unuse(unsigned int type, bool frontswap, * there are races when an instance of an entry might be missed. */ while ((i = find_next_to_unuse(si, i, frontswap)) != 0) { +retry: if (signal_pending(current)) { retval = -EINTR; break; @@ -2248,6 +2322,8 @@ int try_to_unuse(unsigned int type, bool frontswap, page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, NULL, 0, false); if (!page) { + struct swap_cluster_info *ci = NULL; + /* * Either swap_duplicate() failed because entry * has been freed independently, and will not be @@ -2264,6 +2340,14 @@ int try_to_unuse(unsigned int type, bool frontswap, */ if (!swcount || swcount == SWAP_MAP_BAD) continue; + if (si->cluster_info) + ci = si->cluster_info + i / SWAPFILE_CLUSTER; + /* Split huge cluster if failed to allocate huge page */ + if (cluster_is_huge(ci)) { + retval = split_swap_cluster(entry, 0); + if (!retval || retval == -EEXIST) + goto retry; + } retval = -ENOMEM; break; } From patchwork Mon Sep 3 07:22:06 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585575 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3CBC414BD for ; Mon, 3 Sep 2018 07:23:16 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2747E2876E for ; Mon, 3 Sep 2018 07:23:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1B3652879D; Mon, 3 Sep 2018 07:23:16 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7C3342876E for ; Mon, 3 Sep 2018 07:23:15 +0000 (UTC) Received: by 
[134.134.136.100]) by mx.google.com with ESMTPS id az1-v6si9327648plb.513.2018.09.03.00.22.52 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 03 Sep 2018 00:22:52 -0700 (PDT) Received-SPF: pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) client-ip=134.134.136.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Sep 2018 00:22:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,324,1531810800"; d="scan'208";a="70146924" Received: from yhuang-mobile.sh.intel.com ([10.239.196.86]) by orsmga008.jf.intel.com with ESMTP; 03 Sep 2018 00:22:49 -0700 From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 13/21] swap: Support PMD swap mapping in madvise_free() Date: Mon, 3 Sep 2018 15:22:06 +0800 Message-Id: <20180903072214.24602-14-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP When madvise_free() found a PMD swap mapping, if only part of the huge swap cluster is operated on, the PMD swap mapping will be split and fallback to PTE swap mapping processing. Otherwise, if all huge swap cluster is operated on, free_swap_and_cache() will be called to decrease the PMD swap mapping count and probably free the swap space and the THP in swap cache too. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/huge_memory.c | 54 +++++++++++++++++++++++++++++++++++++++--------------- mm/madvise.c | 2 +- 2 files changed, 40 insertions(+), 16 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 4b7ee510a9fe..656e760d19e2 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1844,6 +1844,15 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) } #endif +static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) +{ + pgtable_t pgtable; + + pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pte_free(mm, pgtable); + mm_dec_nr_ptes(mm); +} + /* * Return true if we do MADV_FREE successfully on entire pmd page. * Otherwise, return false. 
@@ -1864,15 +1873,39 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, goto out_unlocked; orig_pmd = *pmd; - if (is_huge_zero_pmd(orig_pmd)) - goto out; - if (unlikely(!pmd_present(orig_pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(orig_pmd)); - goto out; + swp_entry_t entry = pmd_to_swp_entry(orig_pmd); + + if (is_migration_entry(entry)) { + VM_BUG_ON(!thp_migration_supported()); + goto out; + } else if (IS_ENABLED(CONFIG_THP_SWAP) && + !non_swap_entry(entry)) { + /* + * If part of THP is discarded, split the PMD + * swap mapping and operate on the PTEs + */ + if (next - addr != HPAGE_PMD_SIZE) { + unsigned long haddr = addr & HPAGE_PMD_MASK; + + __split_huge_swap_pmd(vma, haddr, pmd); + goto out; + } + free_swap_and_cache(entry, HPAGE_PMD_NR); + pmd_clear(pmd); + zap_deposited_table(mm, pmd); + if (current->mm == mm) + sync_mm_rss(mm); + add_mm_counter(mm, MM_SWAPENTS, -HPAGE_PMD_NR); + ret = true; + goto out; + } else + VM_BUG_ON(1); } + if (is_huge_zero_pmd(orig_pmd)) + goto out; + page = pmd_page(orig_pmd); /* * If other processes are mapping this page, we couldn't discard @@ -1918,15 +1951,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, return ret; } -static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) -{ - pgtable_t pgtable; - - pgtable = pgtable_trans_huge_withdraw(mm, pmd); - pte_free(mm, pgtable); - mm_dec_nr_ptes(mm); -} - int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr) { diff --git a/mm/madvise.c b/mm/madvise.c index 50282ba862e2..20101ff125d0 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -321,7 +321,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, unsigned long next; next = pmd_addr_end(addr, end); - if (pmd_trans_huge(*pmd)) + if (pmd_trans_huge(*pmd) || is_swap_pmd(*pmd)) if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next)) goto next; From patchwork Mon Sep 3 07:22:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585577 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 02E7913AC for ; Mon, 3 Sep 2018 07:23:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DFFC82876E for ; Mon, 3 Sep 2018 07:23:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D449B2879D; Mon, 3 Sep 2018 07:23:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AA6A42876E for ; Mon, 3 Sep 2018 07:23:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D75F36B66BA; Mon, 3 Sep 2018 03:22:57 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D4D946B66BD; Mon, 3 Sep 2018 03:22:57 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BC5706B66BE; Mon, 3 Sep 2018 03:22:57 -0400 (EDT) X-Original-To: 
[134.134.136.100]) by mx.google.com with ESMTPS id az1-v6si9327648plb.513.2018.09.03.00.22.55 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 03 Sep 2018 00:22:55 -0700 (PDT) Received-SPF: pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) client-ip=134.134.136.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Sep 2018 00:22:55 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,324,1531810800"; d="scan'208";a="70146939" Received: from yhuang-mobile.sh.intel.com ([10.239.196.86]) by orsmga008.jf.intel.com with ESMTP; 03 Sep 2018 00:22:52 -0700 From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 14/21] swap: Support to move swap account for PMD swap mapping Date: Mon, 3 Sep 2018 15:22:07 +0800 Message-Id: <20180903072214.24602-15-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Previously the huge swap cluster will be split after the THP is swapout. Now, to support to swapin the THP in one piece, the huge swap cluster will not be split after the THP is reclaimed. So in memcg, we need to move the swap account for PMD swap mappings in the process's page table. When the page table is scanned during moving memcg charge, the PMD swap mapping will be identified. And mem_cgroup_move_swap_account() and its callee is revised to move account for the whole huge swap cluster. If the swap cluster mapped by PMD has been split, the PMD swap mapping will be split and fallback to PTE processing. Signed-off-by: "Huang, Ying" Cc: "Kirill A. 
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 9 ++++ include/linux/swap.h | 6 +++ include/linux/swap_cgroup.h | 3 +- mm/huge_memory.c | 8 +-- mm/memcontrol.c | 129 ++++++++++++++++++++++++++++++++++---------- mm/swap_cgroup.c | 45 +++++++++++++--- mm/swapfile.c | 14 +++++ 7 files changed, 174 insertions(+), 40 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 25ba9b5f1e60..6586c1bfac21 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -406,6 +406,9 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #ifdef CONFIG_THP_SWAP +extern void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd); extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd); extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); @@ -433,6 +436,12 @@ static inline bool transparent_hugepage_swapin_enabled( return false; } #else /* CONFIG_THP_SWAP */ +static inline void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd) +{ +} + static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd) { diff --git a/include/linux/swap.h b/include/linux/swap.h index 0f653b9027d7..9a12888c3c38 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -617,6 +617,7 @@ static inline swp_entry_t get_swap_page(struct page *page) #ifdef CONFIG_THP_SWAP extern int split_swap_cluster(swp_entry_t entry, unsigned long flags); extern int split_swap_cluster_map(swp_entry_t entry); +extern int get_swap_entry_size(swp_entry_t entry); #else static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags) { @@ -627,6 +628,11 @@ static inline int split_swap_cluster_map(swp_entry_t entry) { return 0; } + +static inline int get_swap_entry_size(swp_entry_t entry) +{ + return 1; +} #endif #ifdef CONFIG_MEMCG diff --git a/include/linux/swap_cgroup.h b/include/linux/swap_cgroup.h index a12dd1c3966c..c40fb52b0563 100644 --- a/include/linux/swap_cgroup.h +++ b/include/linux/swap_cgroup.h @@ -7,7 +7,8 @@ #ifdef CONFIG_MEMCG_SWAP extern unsigned short swap_cgroup_cmpxchg(swp_entry_t ent, - unsigned short old, unsigned short new); + unsigned short old, unsigned short new, + unsigned int nr_ents); extern unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id, unsigned int nr_ents); extern unsigned short lookup_swap_cgroup_id(swp_entry_t ent); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 656e760d19e2..292c16b21442 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1631,10 +1631,11 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) return 0; } +#ifdef CONFIG_THP_SWAP /* Convert a PMD swap mapping to a set of PTE swap mappings */ -static void __split_huge_swap_pmd(struct vm_area_struct *vma, - unsigned long haddr, - pmd_t *pmd) +void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long haddr, + pmd_t *pmd) { struct mm_struct *mm = vma->vm_mm; pgtable_t pgtable; @@ -1665,7 +1666,6 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma, pmd_populate(mm, pmd, pgtable); } -#ifdef CONFIG_THP_SWAP int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd) 
{ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 29d9d1a69b36..85f75c1c427c 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2682,9 +2682,10 @@ void mem_cgroup_split_huge_fixup(struct page *head) #ifdef CONFIG_MEMCG_SWAP /** * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record. - * @entry: swap entry to be moved + * @entry: the first swap entry to be moved * @from: mem_cgroup which the entry is moved from * @to: mem_cgroup which the entry is moved to + * @nr_ents: number of swap entries * * It succeeds only when the swap_cgroup's record for this entry is the same * as the mem_cgroup's id of @from. @@ -2695,23 +2696,27 @@ void mem_cgroup_split_huge_fixup(struct page *head) * both res and memsw, and called css_get(). */ static int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) + struct mem_cgroup *from, + struct mem_cgroup *to, + unsigned int nr_ents) { unsigned short old_id, new_id; old_id = mem_cgroup_id(from); new_id = mem_cgroup_id(to); - if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) { - mod_memcg_state(from, MEMCG_SWAP, -1); - mod_memcg_state(to, MEMCG_SWAP, 1); + if (swap_cgroup_cmpxchg(entry, old_id, new_id, nr_ents) == old_id) { + mod_memcg_state(from, MEMCG_SWAP, -nr_ents); + mod_memcg_state(to, MEMCG_SWAP, nr_ents); return 0; } return -EINVAL; } #else static inline int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) + struct mem_cgroup *from, + struct mem_cgroup *to, + unsigned int nr_ents) { return -EINVAL; } @@ -4664,6 +4669,7 @@ enum mc_target_type { MC_TARGET_PAGE, MC_TARGET_SWAP, MC_TARGET_DEVICE, + MC_TARGET_FALLBACK, }; static struct page *mc_handle_present_pte(struct vm_area_struct *vma, @@ -4730,6 +4736,26 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, } #endif +static struct page *mc_handle_swap_pmd(struct vm_area_struct *vma, + pmd_t pmd, swp_entry_t *entry) +{ + struct page *page = NULL; + swp_entry_t ent = pmd_to_swp_entry(pmd); + + if (!(mc.flags & MOVE_ANON) || non_swap_entry(ent)) + return NULL; + + /* + * Because lookup_swap_cache() updates some statistics counter, + * we call find_get_page() with swapper_space directly. + */ + page = find_get_page(swap_address_space(ent), swp_offset(ent)); + if (do_memsw_account()) + entry->val = ent.val; + + return page; +} + static struct page *mc_handle_file_pte(struct vm_area_struct *vma, unsigned long addr, pte_t ptent, swp_entry_t *entry) { @@ -4918,7 +4944,9 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, * There is a swap entry and a page doesn't exist or isn't charged. * But we cannot move a tail-page in a THP. */ - if (ent.val && !ret && (!page || !PageTransCompound(page)) && + if (ent.val && !ret && + ((page && !PageTransCompound(page)) || + (!page && get_swap_entry_size(ent) == 1)) && mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) { ret = MC_TARGET_SWAP; if (target) @@ -4929,37 +4957,64 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, #ifdef CONFIG_TRANSPARENT_HUGEPAGE /* - * We don't consider PMD mapped swapping or file mapped pages because THP does - * not support them for now. - * Caller should make sure that pmd_trans_huge(pmd) is true. + * We don't consider file mapped pages because THP does not support + * them for now. 
*/ static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) + unsigned long addr, pmd_t *pmdp, union mc_target *target) { + pmd_t pmd = *pmdp; struct page *page = NULL; enum mc_target_type ret = MC_TARGET_NONE; + swp_entry_t ent = { .val = 0 }; if (unlikely(is_swap_pmd(pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(pmd)); - return ret; + if (is_pmd_migration_entry(pmd)) { + VM_BUG_ON(!thp_migration_supported()); + return ret; + } + if (!IS_ENABLED(CONFIG_THP_SWAP)) { + VM_BUG_ON(1); + return ret; + } + page = mc_handle_swap_pmd(vma, pmd, &ent); + /* The swap cluster has been split under us */ + if ((page && !PageTransHuge(page)) || + (!page && ent.val && get_swap_entry_size(ent) == 1)) { + __split_huge_swap_pmd(vma, addr, pmdp); + ret = MC_TARGET_FALLBACK; + goto out; + } + } else { + page = pmd_page(pmd); + get_page(page); } - page = pmd_page(pmd); - VM_BUG_ON_PAGE(!page || !PageHead(page), page); + VM_BUG_ON_PAGE(page && !PageHead(page), page); if (!(mc.flags & MOVE_ANON)) - return ret; - if (page->mem_cgroup == mc.from) { + goto out; + if (!page && !ent.val) + goto out; + if (page && page->mem_cgroup == mc.from) { ret = MC_TARGET_PAGE; if (target) { get_page(page); target->page = page; } } + if (ent.val && !ret && !page && + mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) { + ret = MC_TARGET_SWAP; + if (target) + target->ent = ent; + } +out: + if (page) + put_page(page); return ret; } #else static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) + unsigned long addr, pmd_t *pmdp, union mc_target *target) { return MC_TARGET_NONE; } @@ -4972,6 +5027,7 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, struct vm_area_struct *vma = walk->vma; pte_t *pte; spinlock_t *ptl; + int ret; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { @@ -4980,12 +5036,16 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, * support transparent huge page with MEMORY_DEVICE_PUBLIC or * MEMORY_DEVICE_PRIVATE but this might change. 
*/ - if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE) - mc.precharge += HPAGE_PMD_NR; + ret = get_mctgt_type_thp(vma, addr, pmd, NULL); spin_unlock(ptl); + if (ret == MC_TARGET_FALLBACK) + goto fallback; + if (ret) + mc.precharge += HPAGE_PMD_NR; return 0; } +fallback: if (pmd_trans_unstable(pmd)) return 0; pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); @@ -5176,6 +5236,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, enum mc_target_type target_type; union mc_target target; struct page *page; + swp_entry_t ent; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { @@ -5183,8 +5244,9 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, spin_unlock(ptl); return 0; } - target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); - if (target_type == MC_TARGET_PAGE) { + target_type = get_mctgt_type_thp(vma, addr, pmd, &target); + switch (target_type) { + case MC_TARGET_PAGE: page = target.page; if (!isolate_lru_page(page)) { if (!mem_cgroup_move_account(page, true, @@ -5195,7 +5257,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, putback_lru_page(page); } put_page(page); - } else if (target_type == MC_TARGET_DEVICE) { + break; + case MC_TARGET_DEVICE: page = target.page; if (!mem_cgroup_move_account(page, true, mc.from, mc.to)) { @@ -5203,9 +5266,21 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, mc.moved_charge += HPAGE_PMD_NR; } put_page(page); + break; + case MC_TARGET_SWAP: + ent = target.ent; + if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to, + HPAGE_PMD_NR)) { + mc.precharge -= HPAGE_PMD_NR; + mc.moved_swap += HPAGE_PMD_NR; + } + break; + default: + break; } spin_unlock(ptl); - return 0; + if (target_type != MC_TARGET_FALLBACK) + return 0; } if (pmd_trans_unstable(pmd)) @@ -5215,7 +5290,6 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, for (; addr != end; addr += PAGE_SIZE) { pte_t ptent = *(pte++); bool device = false; - swp_entry_t ent; if (!mc.precharge) break; @@ -5249,7 +5323,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, break; case MC_TARGET_SWAP: ent = target.ent; - if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) { + if (!mem_cgroup_move_swap_account(ent, mc.from, + mc.to, 1)) { mc.precharge--; /* we fixup refcnts and charges later. */ mc.moved_swap++; diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c index 45affaef3bc6..ccc08e88962a 100644 --- a/mm/swap_cgroup.c +++ b/mm/swap_cgroup.c @@ -87,29 +87,58 @@ static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent, /** * swap_cgroup_cmpxchg - cmpxchg mem_cgroup's id for this swp_entry. - * @ent: swap entry to be cmpxchged + * @ent: the first swap entry to be cmpxchged * @old: old id * @new: new id + * @nr_ents: number of swap entries * * Returns old id at success, 0 at failure. 
* (There is no mem_cgroup using 0 as its id) */ unsigned short swap_cgroup_cmpxchg(swp_entry_t ent, - unsigned short old, unsigned short new) + unsigned short old, unsigned short new, + unsigned int nr_ents) { struct swap_cgroup_ctrl *ctrl; - struct swap_cgroup *sc; + struct swap_cgroup *sc_start, *sc; unsigned long flags; unsigned short retval; + pgoff_t offset_start = swp_offset(ent), offset; + pgoff_t end = offset_start + nr_ents; - sc = lookup_swap_cgroup(ent, &ctrl); + sc_start = lookup_swap_cgroup(ent, &ctrl); spin_lock_irqsave(&ctrl->lock, flags); - retval = sc->id; - if (retval == old) + sc = sc_start; + offset = offset_start; + for (;;) { + if (sc->id != old) { + retval = 0; + goto out; + } + offset++; + if (offset == end) + break; + if (offset % SC_PER_PAGE) + sc++; + else + sc = __lookup_swap_cgroup(ctrl, offset); + } + + sc = sc_start; + offset = offset_start; + for (;;) { sc->id = new; - else - retval = 0; + offset++; + if (offset == end) + break; + if (offset % SC_PER_PAGE) + sc++; + else + sc = __lookup_swap_cgroup(ctrl, offset); + } + retval = old; +out: spin_unlock_irqrestore(&ctrl->lock, flags); return retval; } diff --git a/mm/swapfile.c b/mm/swapfile.c index f801d9852b3e..7221b4c90108 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1730,6 +1730,20 @@ static int page_trans_huge_map_swapcount(struct page *page, int *total_mapcount, return map_swapcount; } +#ifdef CONFIG_THP_SWAP +int get_swap_entry_size(swp_entry_t entry) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + + si = _swap_info_get(entry); + if (!si || !si->cluster_info) + return 1; + ci = si->cluster_info + swp_offset(entry) / SWAPFILE_CLUSTER; + return cluster_is_huge(ci) ? SWAPFILE_CLUSTER : 1; +} +#endif + /* * We can write to an anon page without COW if there are no other references * to it. 
And as a side-effect, free up its swap: because the old content
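The swap_cgroup_cmpxchg() rework in the patch above is, at its core, a two-pass compare-then-assign over a run of per-swap-entry owner ids, done under a single spinlock so the charge for a huge swap cluster moves either completely or not at all. The standalone model below shows just that shape; a flat array stands in for the chain of swap_cgroup pages (and the SC_PER_PAGE re-lookup), and all names are invented for illustration.

#include <stddef.h>

/* ids[] models the per-swap-entry memcg id records. */
static unsigned short ids[1024];

/* Returns old when all nr records were still owned by old and have been
 * retagged to new; returns 0 (no valid memcg id) and changes nothing
 * otherwise -- the same contract as the kernel function. */
static unsigned short range_cmpxchg_id(size_t first, size_t nr,
                                       unsigned short old, unsigned short new)
{
        size_t i;

        for (i = first; i < first + nr; i++)    /* pass 1: verify */
                if (ids[i] != old)
                        return 0;
        for (i = first; i < first + nr; i++)    /* pass 2: retag */
                ids[i] = new;
        return old;
}

int main(void)
{
        /* move a "huge cluster" worth of 512 entries from memcg id 3 to id 7 */
        for (size_t i = 0; i < 512; i++)
                ids[i] = 3;
        return range_cmpxchg_id(0, 512, 3, 7) == 3 ? 0 : 1;
}

In the kernel, mem_cgroup_move_swap_account() passes nr_ents == HPAGE_PMD_NR only while get_swap_entry_size() still reports a huge cluster; once the cluster has been split, entries are moved one at a time on the PTE path.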
From patchwork Mon Sep 3 07:22:08 2018 X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585579
From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan Subject: [PATCH -V5 15/21] swap: Support to copy PMD swap mapping when fork() Date: Mon, 3 Sep 2018 15:22:08 +0800 Message-Id: <20180903072214.24602-16-ying.huang@intel.com> In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com>
During fork, the page table needs to be copied from parent to child. A PMD swap mapping needs to be copied too, and the swap reference count needs to be increased. When the huge swap cluster has already been split, we need to split the PMD swap mapping and fall back to PTE copying. When swap count continuation fails to allocate a page with GFP_ATOMIC, we need to unlock the spinlock and try again with GFP_KERNEL. Signed-off-by: "Huang, Ying" Cc: "Kirill A.
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/huge_memory.c | 72 ++++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 57 insertions(+), 15 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 292c16b21442..56b12f533a64 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -941,6 +941,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, if (unlikely(!pgtable)) goto out; +retry: dst_ptl = pmd_lock(dst_mm, dst_pmd); src_ptl = pmd_lockptr(src_mm, src_pmd); spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); @@ -948,26 +949,67 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, ret = -EAGAIN; pmd = *src_pmd; -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION if (unlikely(is_swap_pmd(pmd))) { swp_entry_t entry = pmd_to_swp_entry(pmd); - VM_BUG_ON(!is_pmd_migration_entry(pmd)); - if (is_write_migration_entry(entry)) { - make_migration_entry_read(&entry); - pmd = swp_entry_to_pmd(entry); - if (pmd_swp_soft_dirty(*src_pmd)) - pmd = pmd_swp_mksoft_dirty(pmd); - set_pmd_at(src_mm, addr, src_pmd, pmd); +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION + if (is_migration_entry(entry)) { + if (is_write_migration_entry(entry)) { + make_migration_entry_read(&entry); + pmd = swp_entry_to_pmd(entry); + if (pmd_swp_soft_dirty(*src_pmd)) + pmd = pmd_swp_mksoft_dirty(pmd); + set_pmd_at(src_mm, addr, src_pmd, pmd); + } + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + ret = 0; + goto out_unlock; } - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - set_pmd_at(dst_mm, addr, dst_pmd, pmd); - ret = 0; - goto out_unlock; - } #endif + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) { + ret = swap_duplicate(&entry, HPAGE_PMD_NR); + if (!ret) { + add_mm_counter(dst_mm, MM_SWAPENTS, + HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, + pgtable); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + /* make sure dst_mm is on swapoff's mmlist. 
*/ + if (unlikely(list_empty(&dst_mm->mmlist))) { + spin_lock(&mmlist_lock); + if (list_empty(&dst_mm->mmlist)) + list_add(&dst_mm->mmlist, + &src_mm->mmlist); + spin_unlock(&mmlist_lock); + } + } else if (ret == -ENOTDIR) { + /* + * The huge swap cluster has been split, split + * the PMD swap mapping and fallback to PTE + */ + __split_huge_swap_pmd(vma, addr, src_pmd); + pte_free(dst_mm, pgtable); + } else if (ret == -ENOMEM) { + spin_unlock(src_ptl); + spin_unlock(dst_ptl); + ret = add_swap_count_continuation(entry, + GFP_KERNEL); + if (ret < 0) { + ret = -ENOMEM; + pte_free(dst_mm, pgtable); + goto out; + } + goto retry; + } else + VM_BUG_ON(1); + goto out_unlock; + } + VM_BUG_ON(1); + } if (unlikely(!pmd_trans_huge(pmd))) { pte_free(dst_mm, pgtable); From patchwork Mon Sep 3 07:22:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585581 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7E7CF14BD for ; Mon, 3 Sep 2018 07:23:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 68A86291E4 for ; Mon, 3 Sep 2018 07:23:26 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5AF2729203; Mon, 3 Sep 2018 07:23:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2D8EA291E4 for ; Mon, 3 Sep 2018 07:23:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8E07D6B66C0; Mon, 3 Sep 2018 03:23:03 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 8940C6B66C2; Mon, 3 Sep 2018 03:23:03 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 75A796B66C1; Mon, 3 Sep 2018 03:23:03 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f198.google.com (mail-pg1-f198.google.com [209.85.215.198]) by kanga.kvack.org (Postfix) with ESMTP id 3710D6B66C4 for ; Mon, 3 Sep 2018 03:23:03 -0400 (EDT) Received: by mail-pg1-f198.google.com with SMTP id r130-v6so7566405pgr.13 for ; Mon, 03 Sep 2018 00:23:03 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=MtJ/UGiWQhFCbhyVyAhYZDQ22CY4uVb6kRyj/kHQIxw=; b=iWLx82VF+82wFNopGwsedtNK3x7iz8tTB5EMChbbJm/QUxQyVU6cgr9uVtRv3P8FeM eWgZ/MYTTmZcCiylBKfPzCck1bbSN2l2N+3uzAjwBuAfpdYKpKErg+5NzHGZfW32UeGQ JhZWDTxE55K0lwZZS+gKd0xHhMBtIUH95IuyLJOyKkjh5lSZ7Ga8Fc5cXyz9e4U+2JTf zOtkEfUSwCNK1RnagBl1UMA4IYCT+gFyNBUnSQvTEvtJ+AjhK2Ugshm77oH+rWMo1/1I wQjh7cIughCyFI9eW//vhNvZTllSNsAd87Tc9wb6i9y7zumVbbJUjqqj076eE1IL2aXW A6Og== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com 
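Condensing the copy_huge_pmd() logic added in the fork() patch above: swap_duplicate(&entry, HPAGE_PMD_NR) is attempted once for the whole cluster and each failure mode has its own fallback. The toy program below mirrors only that ladder; the helper name and the use of -ENOTDIR / -ENOMEM as signals follow the patch, but everything else is a stand-in rather than kernel API.

#include <errno.h>
#include <stdio.h>

/* Pretend duplication of a whole-cluster swap reference: fails with
 * -ENOMEM on the first call (count continuation needed), then succeeds. */
static int toy_swap_duplicate(int *calls)
{
        return (*calls)++ ? 0 : -ENOMEM;
}

int main(void)
{
        int calls = 0;
        int ret;

retry:
        ret = toy_swap_duplicate(&calls);
        switch (ret) {
        case 0:
                puts("whole cluster referenced: copy the PMD swap mapping as-is");
                break;
        case -ENOTDIR:
                puts("cluster already split: split the PMD, fall back to PTE copy");
                break;
        case -ENOMEM:
                puts("count overflow: drop locks, add continuation (GFP_KERNEL), retry");
                goto retry;
        default:
                return 1;
        }
        return 0;
}

The real code additionally drops both page table locks before calling add_swap_count_continuation(entry, GFP_KERNEL) and then retakes them, which is why the retry restarts from the locking step.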
From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A.
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 16/21] swap: Free PMD swap mapping when zap_huge_pmd() Date: Mon, 3 Sep 2018 15:22:09 +0800 Message-Id: <20180903072214.24602-17-ying.huang@intel.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP For a PMD swap mapping, zap_huge_pmd() will clear the PMD and call free_swap_and_cache() to decrease the swap reference count and maybe free or split the huge swap cluster and the THP in swap cache. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/huge_memory.c | 32 +++++++++++++++++++++----------- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 56b12f533a64..c9ab96711b59 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2019,7 +2019,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, spin_unlock(ptl); if (is_huge_zero_pmd(orig_pmd)) tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); - } else if (is_huge_zero_pmd(orig_pmd)) { + } else if (pmd_present(orig_pmd) && is_huge_zero_pmd(orig_pmd)) { zap_deposited_table(tlb->mm, pmd); spin_unlock(ptl); tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); @@ -2032,17 +2032,27 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, page_remove_rmap(page, true); VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); VM_BUG_ON_PAGE(!PageHead(page), page); - } else if (thp_migration_supported()) { - swp_entry_t entry; - - VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); - entry = pmd_to_swp_entry(orig_pmd); - page = pfn_to_page(swp_offset(entry)); + } else { + swp_entry_t entry = pmd_to_swp_entry(orig_pmd); + + if (thp_migration_supported() && + is_migration_entry(entry)) + page = pfn_to_page(swp_offset(entry)); + else if (IS_ENABLED(CONFIG_THP_SWAP) && + !non_swap_entry(entry)) + free_swap_and_cache(entry, HPAGE_PMD_NR); + else { + WARN_ONCE(1, +"Non present huge pmd without pmd migration or swap enabled!"); + goto unlock; + } flush_needed = 0; - } else - WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); + } - if (PageAnon(page)) { + if (!page) { + zap_deposited_table(tlb->mm, pmd); + add_mm_counter(tlb->mm, MM_SWAPENTS, -HPAGE_PMD_NR); + } else if (PageAnon(page)) { zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); } else { @@ -2050,7 +2060,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, mm_counter_file(page), -HPAGE_PMD_NR); } - +unlock: spin_unlock(ptl); if (flush_needed) tlb_remove_page_size(tlb, page, HPAGE_PMD_SIZE); From patchwork Mon Sep 3 07:22:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585583 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by 
From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan Subject: [PATCH -V5 17/21] swap: Support PMD swap mapping for MADV_WILLNEED Date: Mon, 3 Sep 2018 15:22:10 +0800 Message-Id: <20180903072214.24602-18-ying.huang@intel.com> In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com>
During MADV_WILLNEED, for a PMD swap mapping, if THP swapin is enabled for the VMA, the whole swap cluster will be swapped in. Otherwise, the huge swap cluster and the PMD swap mapping will be split, and we fall back to PTE swap mappings. Signed-off-by: "Huang, Ying" Cc: "Kirill A.
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/madvise.c | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 20101ff125d0..0413659ff6ba 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -196,14 +196,36 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, pte_t *orig_pte; struct vm_area_struct *vma = walk->private; unsigned long index; + swp_entry_t entry; + struct page *page; + pmd_t pmdval; + + pmdval = *pmd; + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(pmdval) && + !is_pmd_migration_entry(pmdval)) { + entry = pmd_to_swp_entry(pmdval); + if (!transparent_hugepage_swapin_enabled(vma)) { + if (!split_swap_cluster(entry, 0)) + split_huge_swap_pmd(vma, pmd, start, pmdval); + } else { + page = read_swap_cache_async(entry, + GFP_HIGHUSER_MOVABLE, + vma, start, false); + if (page) { + /* The swap cluster has been split under us */ + if (!PageTransHuge(page)) + split_huge_swap_pmd(vma, pmd, start, + pmdval); + put_page(page); + } + } + } if (pmd_none_or_trans_huge_or_clear_bad(pmd)) return 0; for (index = start; index != end; index += PAGE_SIZE) { pte_t pte; - swp_entry_t entry; - struct page *page; spinlock_t *ptl; orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl); From patchwork Mon Sep 3 07:22:11 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585585 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4A49413AC for ; Mon, 3 Sep 2018 07:23:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 32CCE291E4 for ; Mon, 3 Sep 2018 07:23:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2764E29203; Mon, 3 Sep 2018 07:23:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F0107291E4 for ; Mon, 3 Sep 2018 07:23:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CB1616B66C4; Mon, 3 Sep 2018 03:23:08 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C3B0D6B66C5; Mon, 3 Sep 2018 03:23:08 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B06366B66C6; Mon, 3 Sep 2018 03:23:08 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pl1-f199.google.com (mail-pl1-f199.google.com [209.85.214.199]) by kanga.kvack.org (Postfix) with ESMTP id 6ACBE6B66C4 for ; Mon, 3 Sep 2018 03:23:08 -0400 (EDT) Received: by mail-pl1-f199.google.com with SMTP id a10-v6so10255112pls.23 for ; Mon, 03 Sep 2018 00:23:08 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc 
From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan Subject: [PATCH -V5 18/21] swap: Support PMD swap mapping in mincore() Date: Mon, 3 Sep 2018 15:22:11 +0800 Message-Id: <20180903072214.24602-19-ying.huang@intel.com> In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com>
During mincore(), for a PMD swap mapping, the swap cache is looked up. If the page found there isn't a compound page, the PMD swap mapping will be split, and we fall back to PTE swap mapping processing. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/mincore.c | 37 +++++++++++++++++++++++++++++++------ 1 file changed, 31 insertions(+), 6 deletions(-) diff --git a/mm/mincore.c b/mm/mincore.c index aa0e542569f9..1d861fac82ee 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -48,7 +48,8 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr, * and is up to date; i.e. that no page-in operation would be required * at this time if an application were to map and access this page.
*/ -static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff) +static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff, + bool *compound) { unsigned char present = 0; struct page *page; @@ -86,6 +87,8 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff) #endif if (page) { present = PageUptodate(page); + if (compound) + *compound = PageCompound(page); put_page(page); } @@ -103,7 +106,8 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end, pgoff = linear_page_index(vma, addr); for (i = 0; i < nr; i++, pgoff++) - vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff); + vec[i] = mincore_page(vma->vm_file->f_mapping, + pgoff, NULL); } else { for (i = 0; i < nr; i++) vec[i] = 0; @@ -127,14 +131,36 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, pte_t *ptep; unsigned char *vec = walk->private; int nr = (end - addr) >> PAGE_SHIFT; + swp_entry_t entry; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { - memset(vec, 1, nr); + unsigned char val = 1; + bool compound; + + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(*pmd)) { + entry = pmd_to_swp_entry(*pmd); + if (!non_swap_entry(entry)) { + val = mincore_page(swap_address_space(entry), + swp_offset(entry), + &compound); + /* + * The huge swap cluster has been + * split under us + */ + if (!compound) { + __split_huge_swap_pmd(vma, addr, pmd); + spin_unlock(ptl); + goto fallback; + } + } + } + memset(vec, val, nr); spin_unlock(ptl); goto out; } +fallback: if (pmd_trans_unstable(pmd)) { __mincore_unmapped_range(addr, end, vma, vec); goto out; @@ -150,8 +176,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, else if (pte_present(pte)) *vec = 1; else { /* pte is a swap entry */ - swp_entry_t entry = pte_to_swp_entry(pte); - + entry = pte_to_swp_entry(pte); if (non_swap_entry(entry)) { /* * migration or hwpoison entries are always @@ -161,7 +186,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, } else { #ifdef CONFIG_SWAP *vec = mincore_page(swap_address_space(entry), - swp_offset(entry)); + swp_offset(entry), NULL); #else WARN_ON(1); *vec = 1; From patchwork Mon Sep 3 07:22:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585587 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9525314BD for ; Mon, 3 Sep 2018 07:23:35 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7F397291E4 for ; Mon, 3 Sep 2018 07:23:35 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 732E729203; Mon, 3 Sep 2018 07:23:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 44094291E4 for ; Mon, 3 Sep 2018 07:23:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A40216B66CC; Mon, 3 Sep 2018 03:23:12 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 9F4C86B66CD; Mon, 3 Sep 2018 
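As a usage-level reference for the mincore() support above: userspace still receives one status byte per base page, whether the range is backed by a present THP, by a PMD swap mapping whose huge cluster sits in the swap cache, or by individual PTEs. A small illustrative sketch, assuming 4K base pages and a 2MB region:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define SZ_2M   (2UL * 1024 * 1024)
#define PAGE_SZ 4096UL

int main(void)
{
        unsigned char vec[SZ_2M / PAGE_SZ];     /* one byte per base page */
        char *buf = mmap(NULL, SZ_2M, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
                return 1;

        memset(buf, 1, SZ_2M);
        if (mincore(buf, SZ_2M, vec))
                return 1;

        /* With the patch, a swapped-out THP mapped by a PMD swap entry is
         * answered from the swap cache without splitting the mapping. */
        printf("first page %sresident\n", (vec[0] & 1) ? "" : "not ");
        return 0;
}

If the cluster was split under the walker, the patch splits the PMD swap mapping and falls back to the per-PTE loop, so the reported vector is the same either way.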
03:23:12 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 81D066B66CE; Mon, 3 Sep 2018 03:23:12 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f197.google.com (mail-pg1-f197.google.com [209.85.215.197]) by kanga.kvack.org (Postfix) with ESMTP id 43ABB6B66CC for ; Mon, 3 Sep 2018 03:23:12 -0400 (EDT) Received: by mail-pg1-f197.google.com with SMTP id d132-v6so10178569pgc.22 for ; Mon, 03 Sep 2018 00:23:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=8E4XJrijIUgl0lxU7e9FGvf7dJGa5yaaIJ2f1kd/lNA=; b=M4dw7vqd06d//O6R/IYR0g/r0OSR6NebcAI1yj6HxdlAE0Pfem2ZD0tdrVw5bvcGxm lHM8U+b64oK/A8Gda70+1yYIdDKHb7MVqEharboLn17yo2QQ6gounvNQJ2IhlSCRprqd dSy5RIGky4PnNBR48BuI0XeeXBqh/9jsP1Y4UvAnJqMXTf2HAVYUsNric9xNP8W/M0RR w8Wf7YeJ/6bw7W5nn5hWaY/GkS79hKljFmitLxbfmSzuz+EpFG0YStSn4p49EUlt5p0s iK/1IHnP3Opb3z+mKi9WrfrELnadt6emPTwaipOWO2efnqAuKAuy2LxVlJqntTZwUHAF UWYQ== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: APzg51AKSBdipKQSOVAkx8GaJtVAnKsI2PfLWKosFNC759yzTiYVBG90 n0v0ZT4DRnVH0ijFGWk5laPyYNZ3vxUM6eJRrnp5RBVfOk36L3fdF1jLZBD6TnZ8tsjJezNZCCL fhehr6eah5foddA5LuPlRDJaULzRXU8RdEiqt72i87wZoLTc71Dr5V4/lrWqNr2qakA== X-Received: by 2002:a62:2646:: with SMTP id m67-v6mr28167947pfm.254.1535959391955; Mon, 03 Sep 2018 00:23:11 -0700 (PDT) X-Google-Smtp-Source: ANB0VdYYL0RUjtuvYq/UjvksmO148n52onx/2BAoZjky0gcfWTpYWpscg0K3li9/JY+sgQPjhLEb X-Received: by 2002:a62:2646:: with SMTP id m67-v6mr28167890pfm.254.1535959390719; Mon, 03 Sep 2018 00:23:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1535959390; cv=none; d=google.com; s=arc-20160816; b=1HTuP8IKgQh6u51nInbHw4wupUCj/1bwZn1RzVGs/t+qi4k/vSqR19wvz5FLBviCg7 tP1t29BtOwAaiR4jmFa+GQYbCZqYZgiJd1uVBn0TJNwvqziGrTdTqZuKZBM6/qTry6xt 3s+iJrb3upxw3gJn5wwIl+mqZ2uKYFFqxVmFIQUwho2XudQl5dGJffHvLZKd9b2PIZHA QdCpMc182p+EqMyfJqGo3fhoUSWqLZvZNYCVe7nRSJF2JmsgIcx3xoYntqXpmBI7jCH0 oat8HAkKyam34XfV4hp00iCSXV7E+5R6nkgXwAPEkXKJvYZvlXOkiUuoo4OD9SNK0Ksz a/bA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=references:in-reply-to:message-id:date:subject:cc:to:from :arc-authentication-results; bh=8E4XJrijIUgl0lxU7e9FGvf7dJGa5yaaIJ2f1kd/lNA=; b=XEWij4kNcB+STufqI2F5hY8poTX7nsbdJW3mE8usYuFl6S4k/rKPZJY9Tuonn5nTZc 1Q4AmAVyAAr+gsBCkJl42ch6nuyDZ9ENSn8NTqxRoAbWNHIhxrFodNzHwfm/E2a+uQh+ 5tLf2FDkxiGsFxxQ4JnB3jKrjnxmkov6furgLj6VX+x6+FoLl8szFjuzzF9023msj7oF K2/By577F7og4YnhxeBVuYXKQZHEtWgNznbf/sGivSYJGY39Y45edtucO+Y33/x/2BZO 91vwVNhNjUkZvfIz+l/Ssktq7CIV23uiH2zYN/2C3j1L2s+CpH2p9TGXdRpntin1cuGd mSqQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga07.intel.com (mga07.intel.com. 
From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 19/21] swap: Support PMD swap mapping in common path Date: Mon, 3 Sep 2018 15:22:12 +0800 Message-Id: <20180903072214.24602-20-ying.huang@intel.com> In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com> The original code handled only PMD migration entries; it is revised here to support PMD swap mappings as well. Signed-off-by: "Huang, Ying" Cc: "Kirill A.
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- fs/proc/task_mmu.c | 12 +++++------- mm/gup.c | 36 ++++++++++++++++++++++++------------ mm/huge_memory.c | 7 ++++--- mm/mempolicy.c | 2 +- 4 files changed, 34 insertions(+), 23 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 669abb617321..8759cb746261 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -972,7 +972,7 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma, pmd = pmd_clear_soft_dirty(pmd); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); - } else if (is_migration_entry(pmd_to_swp_entry(pmd))) { + } else if (is_swap_pmd(pmd)) { pmd = pmd_swp_clear_soft_dirty(pmd); set_pmd_at(vma->vm_mm, addr, pmdp, pmd); } @@ -1302,9 +1302,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, if (pm->show_pfn) frame = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - } -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION - else if (is_swap_pmd(pmd)) { + } else if (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) && + is_swap_pmd(pmd)) { swp_entry_t entry = pmd_to_swp_entry(pmd); unsigned long offset; @@ -1317,10 +1316,9 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, flags |= PM_SWAP; if (pmd_swp_soft_dirty(pmd)) flags |= PM_SOFT_DIRTY; - VM_BUG_ON(!is_pmd_migration_entry(pmd)); - page = migration_entry_to_page(entry); + if (is_pmd_migration_entry(pmd)) + page = migration_entry_to_page(entry); } -#endif if (page && page_mapcount(page) == 1) flags |= PM_MMAP_EXCLUSIVE; diff --git a/mm/gup.c b/mm/gup.c index 1abc8b4afff6..b35b7729b1b7 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -216,6 +216,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, spinlock_t *ptl; struct page *page; struct mm_struct *mm = vma->vm_mm; + swp_entry_t entry; pmd = pmd_offset(pudp, address); /* @@ -243,18 +244,22 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, if (!pmd_present(pmdval)) { if (likely(!(flags & FOLL_MIGRATION))) return no_page_table(vma, flags); - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(pmdval)); - if (is_pmd_migration_entry(pmdval)) + entry = pmd_to_swp_entry(pmdval); + if (thp_migration_supported() && is_migration_entry(entry)) { pmd_migration_entry_wait(mm, pmd); - pmdval = READ_ONCE(*pmd); - /* - * MADV_DONTNEED may convert the pmd to null because - * mmap_sem is held in read mode - */ - if (pmd_none(pmdval)) + pmdval = READ_ONCE(*pmd); + /* + * MADV_DONTNEED may convert the pmd to null because + * mmap_sem is held in read mode + */ + if (pmd_none(pmdval)) + return no_page_table(vma, flags); + goto retry; + } + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) return no_page_table(vma, flags); - goto retry; + WARN_ON(1); + return no_page_table(vma, flags); } if (pmd_devmap(pmdval)) { ptl = pmd_lock(mm, pmd); @@ -276,11 +281,18 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, return no_page_table(vma, flags); } if (unlikely(!pmd_present(*pmd))) { + entry = pmd_to_swp_entry(*pmd); spin_unlock(ptl); if (likely(!(flags & FOLL_MIGRATION))) return no_page_table(vma, flags); - pmd_migration_entry_wait(mm, pmd); - goto retry_locked; + if (thp_migration_supported() && is_migration_entry(entry)) { + pmd_migration_entry_wait(mm, pmd); + goto retry_locked; + } + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) + return no_page_table(vma, 
flags); + WARN_ON(1); + return no_page_table(vma, flags); } if (unlikely(!pmd_trans_huge(*pmd))) { spin_unlock(ptl); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index c9ab96711b59..c825f470d58a 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2086,7 +2086,7 @@ static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl, static pmd_t move_soft_dirty_pmd(pmd_t pmd) { #ifdef CONFIG_MEM_SOFT_DIRTY - if (unlikely(is_pmd_migration_entry(pmd))) + if (unlikely(is_swap_pmd(pmd))) pmd = pmd_swp_mksoft_dirty(pmd); else if (pmd_present(pmd)) pmd = pmd_mksoft_dirty(pmd); @@ -2172,11 +2172,12 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, preserve_write = prot_numa && pmd_write(*pmd); ret = 1; -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION +#if defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) || defined(CONFIG_THP_SWAP) if (is_swap_pmd(*pmd)) { swp_entry_t entry = pmd_to_swp_entry(*pmd); - VM_BUG_ON(!is_pmd_migration_entry(*pmd)); + VM_BUG_ON(!IS_ENABLED(CONFIG_THP_SWAP) && + !is_migration_entry(entry)); if (is_write_migration_entry(entry)) { pmd_t newpmd; /* diff --git a/mm/mempolicy.c b/mm/mempolicy.c index da858f794eb6..dbd8bb77d78c 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -436,7 +436,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr, struct queue_pages *qp = walk->private; unsigned long flags; - if (unlikely(is_pmd_migration_entry(*pmd))) { + if (unlikely(is_swap_pmd(*pmd))) { ret = 1; goto unlock; } From patchwork Mon Sep 3 07:22:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585589 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 76D6914BD for ; Mon, 3 Sep 2018 07:23:38 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5FBB3291E4 for ; Mon, 3 Sep 2018 07:23:38 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 53DD229203; Mon, 3 Sep 2018 07:23:38 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 91CEC291E4 for ; Mon, 3 Sep 2018 07:23:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 206BD6B66CD; Mon, 3 Sep 2018 03:23:15 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 18C616B66CF; Mon, 3 Sep 2018 03:23:15 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0A4546B66D0; Mon, 3 Sep 2018 03:23:15 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f200.google.com (mail-pg1-f200.google.com [209.85.215.200]) by kanga.kvack.org (Postfix) with ESMTP id B2AC16B66CD for ; Mon, 3 Sep 2018 03:23:14 -0400 (EDT) Received: by mail-pg1-f200.google.com with SMTP id g9-v6so10236851pgc.16 for ; Mon, 03 Sep 2018 00:23:14 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; 
From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V5 20/21] swap: create PMD swap mapping when unmap the THP Date: Mon, 3 Sep 2018 15:22:13 +0800 Message-Id: <20180903072214.24602-21-ying.huang@intel.com> In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com> This is the final step of the THP swapin support. When reclaiming an anonymous THP, after allocating a huge swap cluster and adding the THP to the swap cache, its PMD page table mapping is changed to a mapping to the swap space. Previously, the PMD mapping was split before being changed. With this patch, the unmap code no longer splits the PMD mapping; it creates a PMD swap mapping to replace it instead. So when the SWAP_HAS_CACHE flag is cleared in the last step of swapout, the huge swap cluster is kept instead of being split, and on swapin the huge swap cluster is read in one piece into a THP. That is, the THP is not split during swapout/swapin. This eliminates the splitting/collapsing overhead and reduces the page fault count, etc. More importantly, THP utilization improves greatly: many more THPs are kept while swapping is in use, so we can take full advantage of THP, including its high swapout/swapin performance. Signed-off-by: "Huang, Ying" Cc: "Kirill A.
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 11 +++++++++++ mm/huge_memory.c | 30 ++++++++++++++++++++++++++++++ mm/rmap.c | 43 ++++++++++++++++++++++++++++++++++++++++++- mm/vmscan.c | 6 +----- 4 files changed, 84 insertions(+), 6 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 6586c1bfac21..8cbce31bc090 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -405,6 +405,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +struct page_vma_mapped_walk; + #ifdef CONFIG_THP_SWAP extern void __split_huge_swap_pmd(struct vm_area_struct *vma, unsigned long haddr, @@ -412,6 +414,8 @@ extern void __split_huge_swap_pmd(struct vm_area_struct *vma, extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd); extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); +extern bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, + struct page *page, unsigned long address, pmd_t pmdval); static inline bool transparent_hugepage_swapin_enabled( struct vm_area_struct *vma) @@ -453,6 +457,13 @@ static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) return 0; } +static inline bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, + struct page *page, unsigned long address, + pmd_t pmdval) +{ + return false; +} + static inline bool transparent_hugepage_swapin_enabled( struct vm_area_struct *vma) { diff --git a/mm/huge_memory.c b/mm/huge_memory.c index c825f470d58a..a86cdcab2627 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1884,6 +1884,36 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) count_vm_event(THP_SWPIN_FALLBACK); goto fallback; } + +bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, struct page *page, + unsigned long address, pmd_t pmdval) +{ + struct vm_area_struct *vma = pvmw->vma; + struct mm_struct *mm = vma->vm_mm; + pmd_t swp_pmd; + swp_entry_t entry = { .val = page_private(page) }; + + if (swap_duplicate(&entry, HPAGE_PMD_NR) < 0) { + set_pmd_at(mm, address, pvmw->pmd, pmdval); + return false; + } + if (list_empty(&mm->mmlist)) { + spin_lock(&mmlist_lock); + if (list_empty(&mm->mmlist)) + list_add(&mm->mmlist, &init_mm.mmlist); + spin_unlock(&mmlist_lock); + } + add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR); + add_mm_counter(mm, MM_SWAPENTS, HPAGE_PMD_NR); + swp_pmd = swp_entry_to_pmd(entry); + if (pmd_soft_dirty(pmdval)) + swp_pmd = pmd_swp_mksoft_dirty(swp_pmd); + set_pmd_at(mm, address, pvmw->pmd, swp_pmd); + + page_remove_rmap(page, true); + put_page(page); + return true; +} #endif static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) diff --git a/mm/rmap.c b/mm/rmap.c index 3bb4be720bc0..a180cb1fe2db 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1413,11 +1413,52 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, continue; } + address = pvmw.address; + +#ifdef CONFIG_THP_SWAP + /* PMD-mapped THP swap entry */ + if (IS_ENABLED(CONFIG_THP_SWAP) && + !pvmw.pte && PageAnon(page)) { + pmd_t pmdval; + + VM_BUG_ON_PAGE(PageHuge(page) || + !PageTransCompound(page), page); + + flush_cache_range(vma, address, + address + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(mm, address, + address + HPAGE_PMD_SIZE); + if 
(should_defer_flush(mm, flags)) { + /* check comments for PTE below */ + pmdval = pmdp_huge_get_and_clear(mm, address, + pvmw.pmd); + set_tlb_ubc_flush_pending(mm, + pmd_dirty(pmdval)); + } else + pmdval = pmdp_huge_clear_flush(vma, address, + pvmw.pmd); + + /* + * Move the dirty bit to the page. Now the pmd + * is gone. + */ + if (pmd_dirty(pmdval)) + set_page_dirty(page); + + /* Update high watermark before we lower rss */ + update_hiwater_rss(mm); + + ret = set_pmd_swap_entry(&pvmw, page, address, pmdval); + mmu_notifier_invalidate_range_end(mm, address, + address + HPAGE_PMD_SIZE); + continue; + } +#endif + /* Unexpected PMD-mapped THP? */ VM_BUG_ON_PAGE(!pvmw.pte, page); subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte); - address = pvmw.address; if (PageHuge(page)) { if (huge_pmd_unshare(mm, &address, pvmw.pte)) { diff --git a/mm/vmscan.c b/mm/vmscan.c index d649b242b989..cb28d17bb184 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1307,11 +1307,7 @@ static unsigned long shrink_page_list(struct list_head *page_list, * processes. Try to unmap it here. */ if (page_mapped(page)) { - enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH; - - if (unlikely(PageTransHuge(page))) - flags |= TTU_SPLIT_HUGE_PMD; - if (!try_to_unmap(page, flags)) { + if (!try_to_unmap(page, ttu_flags | TTU_BATCH_FLUSH)) { nr_unmap_fail++; goto activate_locked; } From patchwork Mon Sep 3 07:22:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10585591 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4618C13AC for ; Mon, 3 Sep 2018 07:23:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2FE0E291E4 for ; Mon, 3 Sep 2018 07:23:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 246C829203; Mon, 3 Sep 2018 07:23:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A3193291E4 for ; Mon, 3 Sep 2018 07:23:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E89806B66CF; Mon, 3 Sep 2018 03:23:17 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id E103F6B66D1; Mon, 3 Sep 2018 03:23:17 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CFE976B66D2; Mon, 3 Sep 2018 03:23:17 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f200.google.com (mail-pg1-f200.google.com [209.85.215.200]) by kanga.kvack.org (Postfix) with ESMTP id 8D8666B66CF for ; Mon, 3 Sep 2018 03:23:17 -0400 (EDT) Received: by mail-pg1-f200.google.com with SMTP id o16-v6so10228543pgv.21 for ; Mon, 03 Sep 2018 00:23:17 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=dxCDEiqIpw/gKZ26fjqd9tzDmmIAZHCGcxz1+vVhgM0=; 
From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan , Dan Williams Subject: [PATCH -V5 21/21] swap: Update help of CONFIG_THP_SWAP Date: Mon, 3 Sep 2018 15:22:14 +0800 Message-Id: <20180903072214.24602-22-ying.huang@intel.com> In-Reply-To: <20180903072214.24602-1-ying.huang@intel.com> References: <20180903072214.24602-1-ying.huang@intel.com> The help text of CONFIG_THP_SWAP is updated to reflect the latest progress of THP (Transparent Huge Page) swap optimization. Signed-off-by: "Huang, Ying" Reviewed-by: Dan Williams Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/Kconfig | 2 -- 1 file changed, 2 deletions(-) diff --git a/mm/Kconfig b/mm/Kconfig index 0163ff069fd1..8dac2fb19203 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -425,8 +425,6 @@ config THP_SWAP depends on TRANSPARENT_HUGEPAGE && ARCH_WANTS_THP_SWAP && SWAP help Swap transparent huge pages in one piece, without splitting. - XXX: For now, swap cluster backing transparent huge page - will be split after swapout. For selection by architectures with reasonable THP sizes.
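
As an end-of-series illustration, not part of any patch above: a minimal, hypothetical userspace sketch of the user-visible side of this work. It assumes a kernel with these patches applied and CONFIG_THP_SWAP=y, assumes a 2MB PMD size, and uses only the long-standing madvise(2) and mincore(2) interfaces; mincore() is the syscall whose PMD swap handling is adjusted earlier in the series. The huge swap cluster behaviour itself (keeping the THP in one piece across swapout and swapin) is internal to the kernel and is not directly observable from this program; the sketch only shows how a THP-eligible region is set up and how its residency is queried one base page at a time.

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define HUGE_SZ (2UL << 20)	/* assumed PMD/THP size (x86-64); adjust for other architectures */

int main(void)
{
	long page_sz = sysconf(_SC_PAGESIZE);
	size_t nr_pages = HUGE_SZ / page_sz;
	unsigned char *vec = malloc(nr_pages);	/* one status byte per base page */
	void *buf;
	size_t resident = 0;

	/* PMD-aligned anonymous buffer, eligible for THP after MADV_HUGEPAGE */
	if (!vec || posix_memalign(&buf, HUGE_SZ, HUGE_SZ))
		return 1;
	madvise(buf, HUGE_SZ, MADV_HUGEPAGE);

	memset(buf, 1, HUGE_SZ);		/* fault the whole range in */

	if (mincore(buf, HUGE_SZ, vec)) {
		perror("mincore");
		return 1;
	}
	for (size_t i = 0; i < nr_pages; i++)
		resident += vec[i] & 1;
	printf("%zu of %zu pages resident\n", resident, nr_pages);

	free(buf);
	free(vec);
	return 0;
}

Whether the region is actually backed by a THP can be checked via the AnonHugePages field of the mapping in /proc/<pid>/smaps or via /proc/meminfo; the build command (for example, gcc -O2 thp-mincore.c) and file name are of course arbitrary.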