From patchwork Fri Dec 14 06:27:34 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10730609
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    Vineeth Remanan Pillai, Kelley Nielsen, Rik van Riel,
    Matthew Wilcox, Hugh Dickins
Subject: [PATCH -V9 01/21] swap: Deal with PTE mapped THP when unuse PTE
Date: Fri, 14 Dec 2018 14:27:34 +0800
Message-Id: <20181214062754.13723-2-ying.huang@intel.com>
In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com>
References: <20181214062754.13723-1-ying.huang@intel.com>

A PTE swap entry may map to a normal swap slot inside a huge swap
cluster.  To free the huge swap cluster and the corresponding THP
(transparent huge page), all PTE swap entry mappings need to be
unmapped.  The original implementation only checked the current PTE
swap entry mapping; this is fixed by calling try_to_free_swap()
instead, which checks all PTE swap mappings inside the huge swap
cluster.

This fix could be folded into the patch "mm, swap: rid swapoff of
quadratic complexity" in the -mm patchset.
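For illustration only (not part of the patch): a minimal, compilable
userspace model of why checking a single swap entry's count is not
enough for a huge swap cluster.  The types, the constant value and the
helper name below are made up for the sketch; per the description
above, in the kernel this "check every slot" behaviour comes from
try_to_free_swap().

/* cluster_check.c: build with `cc -o cluster_check cluster_check.c` */
#include <stdbool.h>
#include <stdio.h>

#define HPAGE_PMD_NR 512	/* swap slots backing one THP; 512 is just for the example */

/* The huge swap cluster (and the THP behind it) may only be freed when
 * no swap slot inside the cluster is still referenced by a PTE swap
 * mapping. */
static bool can_free_huge_cluster(const unsigned char *swap_map)
{
	for (int i = 0; i < HPAGE_PMD_NR; i++)
		if (swap_map[i] != 0)
			return false;
	return true;
}

int main(void)
{
	unsigned char swap_map[HPAGE_PMD_NR] = { 0 };

	swap_map[100] = 1;	/* one slot is still mapped by some PTE */
	/* Checking only slot 0 (the "current" entry) would wrongly say the
	 * cluster is free; checking every slot does not. */
	printf("slot 0 count: %d, cluster freeable: %s\n",
	       swap_map[0], can_free_huge_cluster(swap_map) ? "yes" : "no");
	return 0;
}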
Signed-off-by: "Huang, Ying" Cc: Vineeth Remanan Pillai Cc: Kelley Nielsen Cc: Rik van Riel Cc: Matthew Wilcox Cc: Hugh Dickins --- mm/swapfile.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index 7464d0a92869..9e6da494781f 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1921,10 +1921,8 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd, goto out; } - if (PageSwapCache(page) && (swap_count(*swap_map) == 0)) - delete_from_swap_cache(compound_head(page)); + try_to_free_swap(page); - SetPageDirty(page); unlock_page(page); put_page(page); From patchwork Fri Dec 14 06:27:35 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10730611 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DFAEE91E for ; Fri, 14 Dec 2018 06:27:50 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CF1322CC8C for ; Fri, 14 Dec 2018 06:27:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C336B2CC99; Fri, 14 Dec 2018 06:27:50 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B76FE2CC8C for ; Fri, 14 Dec 2018 06:27:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9761D8E01AC; Fri, 14 Dec 2018 01:27:47 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 926EB8E0014; Fri, 14 Dec 2018 01:27:47 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 819C38E01AC; Fri, 14 Dec 2018 01:27:47 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f198.google.com (mail-pg1-f198.google.com [209.85.215.198]) by kanga.kvack.org (Postfix) with ESMTP id 3B2B18E0014 for ; Fri, 14 Dec 2018 01:27:47 -0500 (EST) Received: by mail-pg1-f198.google.com with SMTP id q62so3149012pgq.9 for ; Thu, 13 Dec 2018 22:27:47 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=hnXDFrScy6S2HqHX+4l9J6aKrwjFW0SaveFEp+iDji8=; b=X3ZeO/RX5AJkUTkO590MWKbO8/ImL0gm4H/Glv7tWTLcWbxQHuGQVCJMkX8UWTSBa5 JQt1tpnQadgIBf4krf7aP0SBxlp+3QG86opaC3d+kAkx+5+GmgOJyJI5VXs8U0jFBp8W lobbjJayiC/+jqJ57fLO/V4j4g/g8Uz1GFugLo+vVjirCm4jwJk3pGwQ7Xfz0j8/c6Cv Hz3o59cg2kJb6I59kcPY7zL0YADkAT20dBlh7BgLSOCy0kKkAL4vWIxGs+uFJ59QS0Zs EkZ+WlGtcbBWczTN2gX6q93moNRFX9Y1tdNOLBhIQh8qVacJbH9iCEiV2dbT9c93t9g5 nGzg== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: AA+aEWZOlaKj41+7YeyEtLuAgsRE/if+z14irqM8eWauuFE7bjYa+8n3 yYO/VN9zEr/tusXb6C6TPxCmf07UCrWtUeJYQGbmpNqXy9N1snT+QwLpgpaEhK8QKwUSQ5vL+fL 
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A. Shutemov",
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V9 02/21] swap: Enable PMD swap operations for CONFIG_THP_SWAP Date: Fri, 14 Dec 2018 14:27:35 +0800 Message-Id: <20181214062754.13723-3-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com> References: <20181214062754.13723-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Currently, "the swap entry" in the page tables is used for a number of things outside of actual swap, like page migration, etc. We support the THP/PMD "swap entry" for page migration currently and the functions behind this are tied to page migration's config option (CONFIG_ARCH_ENABLE_THP_MIGRATION). But, we also need them for THP swap optimization. So a new config option (CONFIG_HAVE_PMD_SWAP_ENTRY) is added. It is enabled when either CONFIG_ARCH_ENABLE_THP_MIGRATION or CONFIG_THP_SWAP is enabled. And PMD swap entry functions are tied to this new config option instead. Some functions enabled by CONFIG_ARCH_ENABLE_THP_MIGRATION are for page migration only, they are still enabled only for that. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- arch/x86/include/asm/pgtable.h | 2 +- include/asm-generic/pgtable.h | 2 +- include/linux/swapops.h | 44 ++++++++++++++++++---------------- mm/Kconfig | 8 +++++++ 4 files changed, 33 insertions(+), 23 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 40616e805292..e830ab345551 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1333,7 +1333,7 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte) return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY); } -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION +#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd) { return pmd_set_flags(pmd, _PAGE_SWP_SOFT_DIRTY); diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index e0381a4ce7d4..2a619f378297 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -675,7 +675,7 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm, #endif #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY -#ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION +#ifndef CONFIG_HAVE_PMD_SWAP_ENTRY static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd) { return pmd; diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 4d961668e5fc..905ddc65caa3 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -254,17 +254,7 @@ static inline int is_write_migration_entry(swp_entry_t entry) #endif -struct page_vma_mapped_walk; - -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION -extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, - struct page *page); - -extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, - struct page *new); - -extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd); - +#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd) { swp_entry_t arch_entry; @@ -282,6 +272,28 @@ static 
@@ -282,6 +272,28 @@ static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
 	arch_entry = __swp_entry(swp_type(entry), swp_offset(entry));
 	return __swp_entry_to_pmd(arch_entry);
 }
+#else
+static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
+{
+	return swp_entry(0, 0);
+}
+
+static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
+{
+	return __pmd(0);
+}
+#endif
+
+struct page_vma_mapped_walk;
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
+		struct page *page);
+
+extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
+		struct page *new);
+
+extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
 
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
@@ -302,16 +314,6 @@ static inline void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
 static inline void pmd_migration_entry_wait(struct mm_struct *m, pmd_t *p)
 {
 }
-static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
-{
-	return swp_entry(0, 0);
-}
-
-static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
-{
-	return __pmd(0);
-}
-
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
 	return 0;

diff --git a/mm/Kconfig b/mm/Kconfig
index 25c71eb8a7db..d7c5299c5b7d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -422,6 +422,14 @@ config THP_SWAP
 
 	  For selection by architectures with reasonable THP sizes.
 
+#
+# "PMD swap entry" in the page table is used both for migration and
+# actual swap.
+#
+config HAVE_PMD_SWAP_ENTRY
+	def_bool y
+	depends on THP_SWAP || ARCH_ENABLE_THP_MIGRATION
+
 config TRANSPARENT_HUGE_PAGECACHE
 	def_bool y
 	depends on TRANSPARENT_HUGEPAGE
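For illustration only (not part of the patch): a minimal, compilable
userspace sketch of the gating pattern the new config option provides.
The types and the stored value are made up; in the kernel the real
conversions go through arch helpers such as __swp_entry_to_pmd()
(visible in the diff above), and the option is selected via Kconfig
rather than a -D flag.

/* pmd_swap_entry_sketch.c: build with and without -DCONFIG_HAVE_PMD_SWAP_ENTRY */
#include <stdio.h>

typedef struct { unsigned long val; } swp_entry_t;	/* simplified stand-ins */
typedef struct { unsigned long pmd; } pmd_t;

#ifdef CONFIG_HAVE_PMD_SWAP_ENTRY
/* real code converts via the arch-specific swap entry helpers */
static swp_entry_t pmd_to_swp_entry(pmd_t pmd)
{
	return (swp_entry_t){ .val = pmd.pmd };
}
#else
/* stub: without the option a PMD can never hold a swap entry */
static swp_entry_t pmd_to_swp_entry(pmd_t pmd)
{
	(void)pmd;
	return (swp_entry_t){ .val = 0 };
}
#endif

int main(void)
{
	pmd_t pmd = { .pmd = 42 };

	printf("pmd_to_swp_entry(42).val = %lu\n", pmd_to_swp_entry(pmd).val);
	return 0;
}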
From patchwork Fri Dec 14 06:27:36 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10730615
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V9 03/21] swap: Add __swap_duplicate_locked()
Date: Fri, 14 Dec 2018 14:27:36 +0800
Message-Id: <20181214062754.13723-4-ying.huang@intel.com>
In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com>
References: <20181214062754.13723-1-ying.huang@intel.com>

The part of __swap_duplicate() that runs with the lock held is
separated into a new function, __swap_duplicate_locked(), because we
will add more logic about PMD swap mappings to __swap_duplicate() and
keep most of the PTE swap mapping related logic in
__swap_duplicate_locked().  This is just mechanical code refactoring;
there is no functional change in this patch.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/swapfile.c | 63 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 35 insertions(+), 28 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 9e6da494781f..5adc0787343f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3343,32 +3343,12 @@ void si_swapinfo(struct sysinfo *val)
 	spin_unlock(&swap_lock);
 }
 
-/*
- * Verify that a swap entry is valid and increment its swap map count.
- *
- * Returns error code in following case.
- * - success -> 0
- * - swp_entry is invalid -> EINVAL
- * - swp_entry is migration entry -> EINVAL
- * - swap-cache reference is requested but there is already one. -> EEXIST
- * - swap-cache reference is requested but the entry is not used. -> ENOENT
- * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
- */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+static int __swap_duplicate_locked(struct swap_info_struct *p,
+				   unsigned long offset, unsigned char usage)
 {
-	struct swap_info_struct *p;
-	struct swap_cluster_info *ci;
-	unsigned long offset;
 	unsigned char count;
 	unsigned char has_cache;
-	int err = -EINVAL;
-
-	p = get_swap_device(entry);
-	if (!p)
-		goto out;
-
-	offset = swp_offset(entry);
-	ci = lock_cluster_or_swap_info(p, offset);
+	int err = 0;
 
 	count = p->swap_map[offset];
 
@@ -3378,12 +3358,11 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 	 */
 	if (unlikely(swap_count(count) == SWAP_MAP_BAD)) {
 		err = -ENOENT;
-		goto unlock_out;
+		goto out;
 	}
 
 	has_cache = count & SWAP_HAS_CACHE;
 	count &= ~SWAP_HAS_CACHE;
-	err = 0;
 
 	if (usage == SWAP_HAS_CACHE) {
 
@@ -3410,11 +3389,39 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 
 	p->swap_map[offset] = count | has_cache;
 
-unlock_out:
+out:
+	return err;
+}
+
+/*
+ * Verify that a swap entry is valid and increment its swap map count.
+ *
+ * Returns error code in following case.
+ * - success -> 0
+ * - swp_entry is invalid -> EINVAL
+ * - swp_entry is migration entry -> EINVAL
+ * - swap-cache reference is requested but there is already one. -> EEXIST
+ * - swap-cache reference is requested but the entry is not used. -> ENOENT
+ * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
+ */
+static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+{
+	struct swap_info_struct *p;
+	struct swap_cluster_info *ci;
+	unsigned long offset;
+	int err = -EINVAL;
+
+	p = get_swap_device(entry);
+	if (!p)
+		goto out;
+
+	offset = swp_offset(entry);
+	ci = lock_cluster_or_swap_info(p, offset);
+	err = __swap_duplicate_locked(p, offset, usage);
 	unlock_cluster_or_swap_info(p, ci);
+
+	put_swap_device(p);
 out:
-	if (p)
-		put_swap_device(p);
 	return err;
 }
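For illustration only (not part of the patch): the same "split out a
*_locked() helper" pattern in a small, compilable userspace form,
assuming a pthread mutex instead of the swap cluster lock and made-up
names.  The point of the refactoring is that the outer function keeps
the lock/unlock (and device get/put), so a later change can call the
inner helper repeatedly while the lock is already held.

/* locked_helper_sketch.c: build with `cc -pthread locked_helper_sketch.c` */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned char swap_map[8];

/* inner helper: caller must already hold the lock */
static int swap_dup_locked(unsigned long offset)
{
	if (swap_map[offset] == 0xff)
		return -1;	/* would need a count continuation */
	swap_map[offset]++;
	return 0;
}

/* outer wrapper: takes the lock, calls the helper, drops the lock */
static int swap_dup(unsigned long offset)
{
	int err;

	pthread_mutex_lock(&lock);
	err = swap_dup_locked(offset);
	pthread_mutex_unlock(&lock);
	return err;
}

int main(void)
{
	printf("swap_dup(3) -> %d, count now %d\n", swap_dup(3), swap_map[3]);
	return 0;
}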
From patchwork Fri Dec 14 06:27:37 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10730619
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V9 04/21] swap: Support PMD swap mapping in swap_duplicate()
Date: Fri, 14 Dec 2018 14:27:37 +0800
Message-Id: <20181214062754.13723-5-ying.huang@intel.com>
In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com>
References: <20181214062754.13723-1-ying.huang@intel.com>

To support swapping in a THP in one piece, we need to create PMD swap
mappings during swapout and maintain the PMD swap mapping count.  This
patch implements the support to increase the PMD swap mapping count
(for swapout, fork, etc.) and to set the SWAP_HAS_CACHE flag (for
swapin, etc.) for a huge swap cluster in the swap_duplicate() function
family.  Although it only implements part of the design of the swap
reference count with PMD swap mapping, the whole design is described
below to make it easier to understand the patch and the whole picture.

A huge swap cluster is used to hold the contents of a swapped-out THP.
After swapout, a PMD page mapping to the THP becomes a PMD swap mapping
to the huge swap cluster via a swap entry in the PMD, while a PTE page
mapping to a subpage of the THP becomes a PTE swap mapping to a swap
slot in the huge swap cluster via a swap entry in the PTE.  If there is
no PMD swap mapping and the corresponding THP is removed from the page
cache (reclaimed), the huge swap cluster will be split and become a
normal swap cluster.

The count (cluster_count()) of the huge swap cluster is
SWAPFILE_CLUSTER (= HPAGE_PMD_NR) + the PMD swap mapping count.
Because all swap slots in the huge swap cluster are mapped by PTE or
PMD, or have the SWAP_HAS_CACHE bit set, the usage count of the swap
cluster is HPAGE_PMD_NR.  The PMD swap mapping count is recorded too,
to make it easy to determine whether there are remaining PMD swap
mappings.

The count in swap_map[offset] is the sum of the PTE and PMD swap
mapping counts.  This means that when we increase the PMD swap mapping
count, we need to increase swap_map[offset] for all swap slots inside
the swap cluster.
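For illustration only (not part of the patch): a small, compilable
userspace model of the bookkeeping described above.  The array sizes,
the plain-int cluster count and the helper name are simplified
stand-ins; the point is that duplicating one PMD swap mapping bumps
every swap_map[] slot in the cluster and adds one to the cluster count
on top of the SWAPFILE_CLUSTER base.

/* pmd_swap_count_sketch.c */
#include <stdio.h>

#define SWAPFILE_CLUSTER 512	/* swap slots per huge cluster, for the example */

static unsigned char swap_map[SWAPFILE_CLUSTER];	/* PTE + PMD mapping count per slot */
static unsigned int cluster_count = SWAPFILE_CLUSTER;	/* base usage + PMD mapping count */

/* model of duplicating a PMD swap mapping of the whole huge cluster */
static void dup_pmd_swap_mapping(void)
{
	for (int i = 0; i < SWAPFILE_CLUSTER; i++)
		swap_map[i]++;	/* each slot gains one (PMD-level) reference */
	cluster_count++;	/* PMD swap mapping count = cluster_count - SWAPFILE_CLUSTER */
}

int main(void)
{
	dup_pmd_swap_mapping();	/* e.g. swapout created the first PMD swap mapping */
	dup_pmd_swap_mapping();	/* e.g. fork duplicated it */
	printf("swap_map[0] = %d, PMD swap mapping count = %u\n",
	       swap_map[0], cluster_count - SWAPFILE_CLUSTER);
	return 0;
}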
An alternative choice would be to make swap_map[offset] record the PTE
swap map count only, given that we have recorded the PMD swap mapping
count in the count of the huge swap cluster.  But this would require
increasing swap_map[offset] when splitting the PMD swap mapping, which
may fail because of the memory allocation for the swap count
continuation.  That is hard to deal with, so we chose the current
solution.

The PMD swap mapping to a huge swap cluster may be split when unmapping
part of the PMD mapping, etc.  That is easy because only the count of
the huge swap cluster needs to be changed.  When the last PMD swap
mapping is gone and SWAP_HAS_CACHE is unset, we will split the huge
swap cluster (clear the huge flag).  This makes it easy to reason about
the cluster state.

A huge swap cluster will be split when splitting the THP in swap cache,
when failing to allocate a THP during swapin, etc.  But when splitting
the huge swap cluster, we will not try to split all PMD swap mappings,
because sometimes we don't have enough information available for that.
Later, when the PMD swap mapping is duplicated or swapped in, etc., the
PMD swap mapping will be split and we fall back to the PTE operation.

When a THP is added into swap cache, the SWAP_HAS_CACHE flag will be
set in the swap_map[offset] of all swap slots inside the huge swap
cluster backing the THP.  This huge swap cluster will not be split
unless the THP is split, even if its PMD swap mapping count drops to 0.
Later, when the THP is removed from swap cache, the SWAP_HAS_CACHE flag
will be cleared in the swap_map[offset] of all swap slots inside the
huge swap cluster, and this huge swap cluster will be split if its PMD
swap mapping count is 0.

The first parameter of swap_duplicate() is changed to return the swap
entry to call add_swap_count_continuation() for, because we may need to
call it for a swap entry in the middle of a huge swap cluster.
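For illustration only (not part of the patch): a compilable userspace
sketch of the fallback described above, with made-up helper names.
When a caller tries to duplicate a whole PMD swap mapping but the huge
swap cluster has already been split, the operation reports that (the
patch uses -ENOTDIR for this) and the caller redoes the work one swap
slot at a time.

/* pmd_dup_fallback_sketch.c */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define SWAPFILE_CLUSTER 512

static unsigned char swap_map[SWAPFILE_CLUSTER];
static bool cluster_is_huge = false;	/* already split in this example */

static int dup_one(unsigned long offset)
{
	swap_map[offset]++;	/* one PTE-level reference */
	return 0;
}

static int dup_pmd(void)
{
	if (!cluster_is_huge)
		return -ENOTDIR;	/* cluster was split, PMD duplicate impossible */
	for (int i = 0; i < SWAPFILE_CLUSTER; i++)
		dup_one(i);
	return 0;
}

int main(void)
{
	if (dup_pmd() == -ENOTDIR) {
		/* fall back: split the PMD swap mapping and duplicate per slot */
		for (int i = 0; i < SWAPFILE_CLUSTER; i++)
			dup_one(i);
		puts("huge cluster already split: fell back to PTE-level duplicate");
	}
	return 0;
}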
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/swap.h | 9 ++-- mm/memory.c | 2 +- mm/rmap.c | 2 +- mm/swap_state.c | 2 +- mm/swapfile.c | 109 ++++++++++++++++++++++++++++++++++++------- 5 files changed, 99 insertions(+), 25 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 928550bd28f3..70a6ede1e7e0 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -451,8 +451,8 @@ extern swp_entry_t get_swap_page_of_type(int); extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size); extern int add_swap_count_continuation(swp_entry_t, gfp_t); extern void swap_shmem_alloc(swp_entry_t); -extern int swap_duplicate(swp_entry_t); -extern int swapcache_prepare(swp_entry_t); +extern int swap_duplicate(swp_entry_t *entry, int entry_size); +extern int swapcache_prepare(swp_entry_t entry, int entry_size); extern void swap_free(swp_entry_t); extern void swapcache_free_entries(swp_entry_t *entries, int n); extern int free_swap_and_cache(swp_entry_t); @@ -510,7 +510,8 @@ static inline void show_swap_cache_info(void) } #define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) -#define swapcache_prepare(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) +#define swapcache_prepare(e, s) \ + ({(is_migration_entry(e) || is_device_private_entry(e)); }) static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask) { @@ -521,7 +522,7 @@ static inline void swap_shmem_alloc(swp_entry_t swp) { } -static inline int swap_duplicate(swp_entry_t swp) +static inline int swap_duplicate(swp_entry_t *swp, int entry_size) { return 0; } diff --git a/mm/memory.c b/mm/memory.c index 532061217e03..5efb9259d47b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -709,7 +709,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, swp_entry_t entry = pte_to_swp_entry(pte); if (likely(!non_swap_entry(entry))) { - if (swap_duplicate(entry) < 0) + if (swap_duplicate(&entry, 1) < 0) return entry.val; /* make sure dst_mm is on swapoff's mmlist. */ diff --git a/mm/rmap.c b/mm/rmap.c index 896c61dbf16c..e9b07016f587 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1609,7 +1609,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, break; } - if (swap_duplicate(entry) < 0) { + if (swap_duplicate(&entry, 1) < 0) { set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw); diff --git a/mm/swap_state.c b/mm/swap_state.c index 5a1cc9387151..97831166994a 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -402,7 +402,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* * Swap entry may have been freed since our caller observed it. */ - err = swapcache_prepare(entry); + err = swapcache_prepare(entry, 1); if (err == -EEXIST) { /* * We might race against get_swap_page() and stumble diff --git a/mm/swapfile.c b/mm/swapfile.c index 5adc0787343f..bd8756ac3bcc 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -534,6 +534,40 @@ static void dec_cluster_info_page(struct swap_info_struct *p, free_cluster(p, idx); } +/* + * When swapout a THP in one piece, PMD page mappings to THP are + * replaced by PMD swap mappings to the corresponding swap cluster. + * cluster_swapcount() returns the PMD swap mapping count. 
+ *
+ * cluster_count() = PMD swap mapping count + count of allocated swap
+ * entries in cluster.  If a cluster is mapped by PMD, all swap
+ * entries inside is used, so here cluster_count() = PMD swap mapping
+ * count + SWAPFILE_CLUSTER.
+ */
+static inline int cluster_swapcount(struct swap_cluster_info *ci)
+{
+	VM_BUG_ON(!cluster_is_huge(ci) || cluster_count(ci) < SWAPFILE_CLUSTER);
+	return cluster_count(ci) - SWAPFILE_CLUSTER;
+}
+
+/*
+ * Set PMD swap mapping count for the huge cluster
+ */
+static inline void cluster_set_swapcount(struct swap_cluster_info *ci,
+					 unsigned int count)
+{
+	VM_BUG_ON(!cluster_is_huge(ci) || cluster_count(ci) < SWAPFILE_CLUSTER);
+	cluster_set_count(ci, SWAPFILE_CLUSTER + count);
+}
+
+static inline void cluster_add_swapcount(struct swap_cluster_info *ci, int add)
+{
+	int count = cluster_swapcount(ci) + add;
+
+	VM_BUG_ON(count < 0);
+	cluster_set_swapcount(ci, count);
+}
+
 /*
  * It's possible scan_swap_map() uses a free cluster in the middle of free
  * cluster list. Avoiding such abuse to avoid list corruption.
@@ -3394,35 +3428,66 @@ static int __swap_duplicate_locked(struct swap_info_struct *p,
 }
 
 /*
- * Verify that a swap entry is valid and increment its swap map count.
+ * Verify that the swap entries from *entry is valid and increment their
+ * PMD/PTE swap mapping count.
  *
  * Returns error code in following case.
  * - success -> 0
  * - swp_entry is invalid -> EINVAL
- * - swp_entry is migration entry -> EINVAL
  * - swap-cache reference is requested but there is already one. -> EEXIST
  * - swap-cache reference is requested but the entry is not used. -> ENOENT
  * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
+ * - the huge swap cluster has been split. -> ENOTDIR
  */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+static int __swap_duplicate(swp_entry_t *entry, int entry_size,
+			    unsigned char usage)
 {
 	struct swap_info_struct *p;
 	struct swap_cluster_info *ci;
 	unsigned long offset;
 	int err = -EINVAL;
+	int i, size = swap_entry_size(entry_size);
 
-	p = get_swap_device(entry);
+	p = get_swap_device(*entry);
 	if (!p)
 		goto out;
 
-	offset = swp_offset(entry);
+	offset = swp_offset(*entry);
 	ci = lock_cluster_or_swap_info(p, offset);
-	err = __swap_duplicate_locked(p, offset, usage);
+	if (size == SWAPFILE_CLUSTER) {
+		/*
+		 * The huge swap cluster has been split, for example, failed to
+		 * allocate huge page during swapin, the caller should split
+		 * the PMD swap mapping and operate on normal swap entries.
+		 */
+		if (!cluster_is_huge(ci)) {
+			err = -ENOTDIR;
+			goto unlock;
+		}
+		VM_BUG_ON(!IS_ALIGNED(offset, size));
+		/* If cluster is huge, all swap entries inside is in-use */
+		VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER);
+	}
+	/* p->swap_map[] = PMD swap map count + PTE swap map count */
+	for (i = 0; i < size; i++) {
+		err = __swap_duplicate_locked(p, offset + i, usage);
+		if (err && size != 1) {
+			*entry = swp_entry(p->type, offset + i);
+			goto undup;
+		}
+	}
+	if (size == SWAPFILE_CLUSTER && usage == 1)
+		cluster_add_swapcount(ci, usage);
+unlock:
 	unlock_cluster_or_swap_info(p, ci);
 
 	put_swap_device(p);
 out:
 	return err;
+undup:
+	for (i--; i >= 0; i--)
+		__swap_entry_free_locked(p, offset + i, usage);
+	goto unlock;
 }
 
 /*
@@ -3431,36 +3496,44 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
  */
 void swap_shmem_alloc(swp_entry_t entry)
 {
-	__swap_duplicate(entry, SWAP_MAP_SHMEM);
+	__swap_duplicate(&entry, 1, SWAP_MAP_SHMEM);
 }
 
 /*
  * Increase reference count of swap entry by 1.
- * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
- * but could not be atomically allocated. Returns 0, just as if it succeeded,
- * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
- * might occur if a page table entry has got corrupted.
+ *
+ * Return error code in following case.
+ * - success -> 0
+ * - swap_count_continuation is required but could not be atomically allocated.
+ *   *entry is used to return swap entry to call add_swap_count_continuation().
+ *     -> ENOMEM
+ * - otherwise same as __swap_duplicate()
  */
-int swap_duplicate(swp_entry_t entry)
+int swap_duplicate(swp_entry_t *entry, int entry_size)
 {
 	int err = 0;
 
-	while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
-		err = add_swap_count_continuation(entry, GFP_ATOMIC);
+	while (!err &&
+	       (err = __swap_duplicate(entry, entry_size, 1)) == -ENOMEM)
+		err = add_swap_count_continuation(*entry, GFP_ATOMIC);
+	/* If kernel works correctly, other errno is impossible */
+	VM_BUG_ON(err && err != -ENOMEM && err != -ENOTDIR);
 	return err;
 }
 
 /*
  * @entry: swap entry for which we allocate swap cache.
+ * @entry_size: size of the swap entry, 1 or SWAPFILE_CLUSTER
  *
  * Called when allocating swap cache for existing swap entry,
  * This can return error codes. Returns 0 at success.
- * -EBUSY means there is a swap cache.
- * Note: return code is different from swap_duplicate().
+ * -EINVAL means the swap device has been swapoff.
+ * -EEXIST means there is a swap cache.
+ * Otherwise same as __swap_duplicate()
  */
-int swapcache_prepare(swp_entry_t entry)
+int swapcache_prepare(swp_entry_t entry, int entry_size)
 {
-	return __swap_duplicate(entry, SWAP_HAS_CACHE);
+	return __swap_duplicate(&entry, entry_size, SWAP_HAS_CACHE);
 }
 
 struct swap_info_struct *swp_swap_info(swp_entry_t entry)
From patchwork Fri Dec 14 06:27:38 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10730613
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V9 05/21] swap: Support PMD swap mapping in put_swap_page()
Date: Fri, 14 Dec 2018 14:27:38 +0800
Message-Id: <20181214062754.13723-6-ying.huang@intel.com>
In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com>
References: <20181214062754.13723-1-ying.huang@intel.com>

Previously, during swapout, all PMD page mappings were split and
replaced with PTE swap mappings, and when clearing the SWAP_HAS_CACHE
flag for the huge swap cluster in put_swap_page(), the huge swap
cluster was split.  Now, during swapout, the PMD page mappings to the
THP are changed to PMD swap mappings to the corresponding swap cluster.
So when clearing the SWAP_HAS_CACHE flag, the huge swap cluster will
only be split if the PMD swap mapping count is 0; otherwise we keep it
as a huge swap cluster, so that we can swap in a THP in one piece
later.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 mm/swapfile.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index bd8756ac3bcc..04cf6b95cae0 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1314,6 +1314,15 @@ void swap_free(swp_entry_t entry)
 
 /*
  * Called after dropping swapcache to decrease refcnt to swap entries.
+ *
+ * When a THP is added into swap cache, the SWAP_HAS_CACHE flag will
+ * be set in the swap_map[] of all swap entries in the huge swap
+ * cluster backing the THP.  This huge swap cluster will not be split
+ * unless the THP is split even if its PMD swap mapping count dropped
+ * to 0.  Later, when the THP is removed from swap cache, the
+ * SWAP_HAS_CACHE flag will be cleared in the swap_map[] of all swap
+ * entries in the huge swap cluster.  And this huge swap cluster will
+ * be split if its PMD swap mapping count is 0.
  */
 void put_swap_page(struct page *page, swp_entry_t entry)
 {
@@ -1332,15 +1341,23 @@ void put_swap_page(struct page *page, swp_entry_t entry)
 
 	ci = lock_cluster_or_swap_info(si, offset);
 	if (size == SWAPFILE_CLUSTER) {
-		VM_BUG_ON(!cluster_is_huge(ci));
+		VM_BUG_ON(!IS_ALIGNED(offset, size));
 		map = si->swap_map + offset;
-		for (i = 0; i < SWAPFILE_CLUSTER; i++) {
-			val = map[i];
-			VM_BUG_ON(!(val & SWAP_HAS_CACHE));
-			if (val == SWAP_HAS_CACHE)
-				free_entries++;
+		/*
+		 * No PMD swap mapping, the swap cluster will be freed
+		 * if all swap entries becoming free, otherwise the
+		 * huge swap cluster will be split.
+		 */
+		if (!cluster_swapcount(ci)) {
+			for (i = 0; i < SWAPFILE_CLUSTER; i++) {
+				val = map[i];
+				VM_BUG_ON(!(val & SWAP_HAS_CACHE));
+				if (val == SWAP_HAS_CACHE)
+					free_entries++;
+			}
+			if (free_entries != SWAPFILE_CLUSTER)
+				cluster_clear_huge(ci);
 		}
-		cluster_clear_huge(ci);
 		if (free_entries == SWAPFILE_CLUSTER) {
 			unlock_cluster_or_swap_info(si, ci);
 			spin_lock(&si->lock);
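For illustration only (not part of the patch): a small, compilable
userspace model of the decision put_swap_page() makes when the last
swap-cache reference to a huge cluster is dropped, with simplified
types and made-up helper names.  With no remaining PMD swap mappings
the cluster is either freed (every slot was only SWAP_HAS_CACHE) or
split; with PMD swap mappings remaining it stays huge.

/* put_swap_page_sketch.c */
#include <stdio.h>

#define SWAPFILE_CLUSTER 512
#define SWAP_HAS_CACHE   0x40

enum outcome { FREE_CLUSTER, SPLIT_CLUSTER, KEEP_HUGE };

static enum outcome drop_swapcache(const unsigned char *map, int pmd_swapcount)
{
	int free_entries = 0;

	if (pmd_swapcount)	/* PMD swap mappings remain: keep it huge */
		return KEEP_HUGE;
	for (int i = 0; i < SWAPFILE_CLUSTER; i++)
		if (map[i] == SWAP_HAS_CACHE)
			free_entries++;
	return free_entries == SWAPFILE_CLUSTER ? FREE_CLUSTER : SPLIT_CLUSTER;
}

int main(void)
{
	unsigned char map[SWAPFILE_CLUSTER];

	for (int i = 0; i < SWAPFILE_CLUSTER; i++)
		map[i] = SWAP_HAS_CACHE;	/* only the swap cache references the slots */
	printf("no PMD maps, all cache-only -> %d (0=free)\n", drop_swapcache(map, 0));
	map[7] |= 1;				/* one slot still has a PTE swap mapping */
	printf("no PMD maps, one slot mapped -> %d (1=split)\n", drop_swapcache(map, 0));
	printf("one PMD map remaining       -> %d (2=keep huge)\n", drop_swapcache(map, 1));
	return 0;
}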
From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A.
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V9 06/21] swap: Support PMD swap mapping in free_swap_and_cache()/swap_free() Date: Fri, 14 Dec 2018 14:27:39 +0800 Message-Id: <20181214062754.13723-7-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com> References: <20181214062754.13723-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP When a PMD swap mapping is removed from a huge swap cluster, for example, unmap a memory range mapped with PMD swap mapping, etc, free_swap_and_cache() will be called to decrease the reference count to the huge swap cluster. free_swap_and_cache() may also free or split the huge swap cluster, and free the corresponding THP in swap cache if necessary. swap_free() is similar, and shares most implementation with free_swap_and_cache(). This patch revises free_swap_and_cache() and swap_free() to implement this. If the swap cluster has been split already, for example, because of failing to allocate a THP during swapin, we just decrease one from the reference count of all swap slots. Otherwise, we will decrease one from the reference count of all swap slots and the PMD swap mapping count in cluster_count(). When the corresponding THP isn't in swap cache, if PMD swap mapping count becomes 0, the huge swap cluster will be split, and if all swap count becomes 0, the huge swap cluster will be freed. When the corresponding THP is in swap cache, if every swap_map[offset] == SWAP_HAS_CACHE, we will try to delete the THP from swap cache. Which will cause the THP and the huge swap cluster be freed. Signed-off-by: "Huang, Ying" Cc: "Kirill A. 
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- arch/s390/mm/pgtable.c | 2 +- include/linux/swap.h | 9 ++- kernel/power/swap.c | 4 +- mm/madvise.c | 2 +- mm/memory.c | 4 +- mm/shmem.c | 4 +- mm/swapfile.c | 170 ++++++++++++++++++++++++++++++++--------- 7 files changed, 147 insertions(+), 48 deletions(-) diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c index f2cc7da473e4..ffd4b68adbb3 100644 --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -675,7 +675,7 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry) dec_mm_counter(mm, mm_counter(page)); } - free_swap_and_cache(entry); + free_swap_and_cache(entry, 1); } void ptep_zap_unused(struct mm_struct *mm, unsigned long addr, diff --git a/include/linux/swap.h b/include/linux/swap.h index 70a6ede1e7e0..24c3014894dd 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -453,9 +453,9 @@ extern int add_swap_count_continuation(swp_entry_t, gfp_t); extern void swap_shmem_alloc(swp_entry_t); extern int swap_duplicate(swp_entry_t *entry, int entry_size); extern int swapcache_prepare(swp_entry_t entry, int entry_size); -extern void swap_free(swp_entry_t); +extern void swap_free(swp_entry_t entry, int entry_size); extern void swapcache_free_entries(swp_entry_t *entries, int n); -extern int free_swap_and_cache(swp_entry_t); +extern int free_swap_and_cache(swp_entry_t entry, int entry_size); extern int swap_type_of(dev_t, sector_t, struct block_device **); extern unsigned int count_swap_pages(int, int); extern sector_t map_swap_page(struct page *, struct block_device **); @@ -509,7 +509,8 @@ static inline void show_swap_cache_info(void) { } -#define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) +#define free_swap_and_cache(e, s) \ + ({(is_migration_entry(e) || is_device_private_entry(e)); }) #define swapcache_prepare(e, s) \ ({(is_migration_entry(e) || is_device_private_entry(e)); }) @@ -527,7 +528,7 @@ static inline int swap_duplicate(swp_entry_t *swp, int entry_size) return 0; } -static inline void swap_free(swp_entry_t swp) +static inline void swap_free(swp_entry_t swp, int entry_size) { } diff --git a/kernel/power/swap.c b/kernel/power/swap.c index d7f6c1a288d3..0275df84ed3d 100644 --- a/kernel/power/swap.c +++ b/kernel/power/swap.c @@ -182,7 +182,7 @@ sector_t alloc_swapdev_block(int swap) offset = swp_offset(get_swap_page_of_type(swap)); if (offset) { if (swsusp_extents_insert(offset)) - swap_free(swp_entry(swap, offset)); + swap_free(swp_entry(swap, offset), 1); else return swapdev_block(swap, offset); } @@ -206,7 +206,7 @@ void free_all_swap_pages(int swap) ext = rb_entry(node, struct swsusp_extent, node); rb_erase(node, &swsusp_extents); for (offset = ext->start; offset <= ext->end; offset++) - swap_free(swp_entry(swap, offset)); + swap_free(swp_entry(swap, offset), 1); kfree(ext); } diff --git a/mm/madvise.c b/mm/madvise.c index d220ad7087ed..fac48161b015 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -349,7 +349,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, if (non_swap_entry(entry)) continue; nr_swap--; - free_swap_and_cache(entry); + free_swap_and_cache(entry, 1); pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); continue; } diff --git a/mm/memory.c b/mm/memory.c index 5efb9259d47b..78f341b5672d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1134,7 +1134,7 @@ static 
unsigned long zap_pte_range(struct mmu_gather *tlb, page = migration_entry_to_page(entry); rss[mm_counter(page)]--; } - if (unlikely(!free_swap_and_cache(entry))) + if (unlikely(!free_swap_and_cache(entry, 1))) print_bad_pte(vma, addr, ptent, NULL); pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); } while (pte++, addr += PAGE_SIZE, addr != end); @@ -2829,7 +2829,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) } set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte); - swap_free(entry); + swap_free(entry, 1); if (mem_cgroup_swap_full(page) || (vma->vm_flags & VM_LOCKED) || PageMlocked(page)) try_to_free_swap(page); diff --git a/mm/shmem.c b/mm/shmem.c index a9b7e65f4b2c..d00fe9a23670 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -670,7 +670,7 @@ static int shmem_free_swap(struct address_space *mapping, old = xa_cmpxchg_irq(&mapping->i_pages, index, radswap, NULL, 0); if (old != radswap) return -ENOENT; - free_swap_and_cache(radix_to_swp_entry(radswap)); + free_swap_and_cache(radix_to_swp_entry(radswap), 1); return 0; } @@ -1657,7 +1657,7 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index, delete_from_swap_cache(page); set_page_dirty(page); - swap_free(swap); + swap_free(swap, 1); *pagep = page; return 0; diff --git a/mm/swapfile.c b/mm/swapfile.c index 04cf6b95cae0..243131253238 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -49,6 +49,9 @@ static bool swap_count_continued(struct swap_info_struct *, pgoff_t, unsigned char); static void free_swap_count_continuations(struct swap_info_struct *); static sector_t map_swap_entry(swp_entry_t, struct block_device**); +static bool __swap_page_trans_huge_swapped(struct swap_info_struct *si, + struct swap_cluster_info *ci, + unsigned long offset); DEFINE_SPINLOCK(swap_lock); static unsigned int nr_swapfiles; @@ -1267,19 +1270,106 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry) return NULL; } -static unsigned char __swap_entry_free(struct swap_info_struct *p, - swp_entry_t entry, unsigned char usage) +#define SF_FREE_CACHE 0x1 + +static void __swap_free(struct swap_info_struct *p, swp_entry_t entry, + int entry_size, unsigned long flags) { struct swap_cluster_info *ci; unsigned long offset = swp_offset(entry); + int i, free_entries = 0, cache_only = 0; + int size = swap_entry_size(entry_size); + unsigned char *map, count; ci = lock_cluster_or_swap_info(p, offset); - usage = __swap_entry_free_locked(p, offset, usage); + VM_BUG_ON(!IS_ALIGNED(offset, size)); + /* + * Normal swap entry or huge swap cluster has been split, free + * each swap entry + */ + if (size == 1 || !cluster_is_huge(ci)) { + for (i = 0; i < size; i++, entry.val++) { + count = __swap_entry_free_locked(p, offset + i, 1); + if (!count || + (flags & SF_FREE_CACHE && + count == SWAP_HAS_CACHE && + !__swap_page_trans_huge_swapped(p, ci, + offset + i))) { + unlock_cluster_or_swap_info(p, ci); + if (!count) + free_swap_slot(entry); + else + __try_to_reclaim_swap(p, offset + i, + TTRS_UNMAPPED | TTRS_FULL); + if (i == size - 1) + return; + lock_cluster_or_swap_info(p, offset); + } + } + unlock_cluster_or_swap_info(p, ci); + return; + } + /* + * Return for normal swap entry above, the following code is + * for huge swap cluster only. + */ + cluster_add_swapcount(ci, -1); + /* + * Decrease mapping count for each swap entry in cluster. + * Because PMD swap mapping is counted in p->swap_map[] too. + */ + map = p->swap_map + offset; + for (i = 0; i < size; i++) { + /* + * Mark swap entries to become free as SWAP_MAP_BAD + * temporarily. 
+ */ + if (map[i] == 1) { + map[i] = SWAP_MAP_BAD; + free_entries++; + } else if (__swap_entry_free_locked(p, offset + i, 1) == + SWAP_HAS_CACHE) + cache_only++; + } + /* + * If there are PMD swap mapping or the THP is in swap cache, + * it's impossible for some swap entries to become free. + */ + VM_BUG_ON(free_entries && + (cluster_swapcount(ci) || (map[0] & SWAP_HAS_CACHE))); + if (free_entries == SWAPFILE_CLUSTER) + memset(map, SWAP_HAS_CACHE, SWAPFILE_CLUSTER); + /* + * If there are no PMD swap mappings remain and the THP isn't + * in swap cache, split the huge swap cluster. + */ + else if (!cluster_swapcount(ci) && !(map[0] & SWAP_HAS_CACHE)) + cluster_clear_huge(ci); unlock_cluster_or_swap_info(p, ci); - if (!usage) - free_swap_slot(entry); - - return usage; + if (free_entries == SWAPFILE_CLUSTER) { + spin_lock(&p->lock); + mem_cgroup_uncharge_swap(entry, SWAPFILE_CLUSTER); + swap_free_cluster(p, offset / SWAPFILE_CLUSTER); + spin_unlock(&p->lock); + } else if (free_entries) { + ci = lock_cluster(p, offset); + for (i = 0; i < size; i++, entry.val++) { + /* + * To be freed swap entries are marked as SWAP_MAP_BAD + * temporarily as above + */ + if (map[i] == SWAP_MAP_BAD) { + map[i] = SWAP_HAS_CACHE; + unlock_cluster(ci); + free_swap_slot(entry); + if (i == size - 1) + return; + ci = lock_cluster(p, offset); + } + } + unlock_cluster(ci); + } else if (cache_only == SWAPFILE_CLUSTER && flags & SF_FREE_CACHE) + __try_to_reclaim_swap(p, offset, TTRS_UNMAPPED | TTRS_FULL); } static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry) @@ -1303,13 +1393,13 @@ static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry) * Caller has made sure that the swap device corresponding to entry * is still around or has not been recycled. 
*/ -void swap_free(swp_entry_t entry) +void swap_free(swp_entry_t entry, int entry_size) { struct swap_info_struct *p; p = _swap_info_get(entry); if (p) - __swap_entry_free(p, entry, 1); + __swap_free(p, entry, entry_size, 0); } /* @@ -1545,29 +1635,33 @@ int swp_swapcount(swp_entry_t entry) return count; } -static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, - swp_entry_t entry) +/* si->lock or ci->lock must be held before calling this function */ +static bool __swap_page_trans_huge_swapped(struct swap_info_struct *si, + struct swap_cluster_info *ci, + unsigned long offset) { - struct swap_cluster_info *ci; unsigned char *map = si->swap_map; - unsigned long roffset = swp_offset(entry); - unsigned long offset = round_down(roffset, SWAPFILE_CLUSTER); + unsigned long hoffset = round_down(offset, SWAPFILE_CLUSTER); int i; - bool ret = false; - ci = lock_cluster_or_swap_info(si, offset); - if (!ci || !cluster_is_huge(ci)) { - if (swap_count(map[roffset])) - ret = true; - goto unlock_out; - } + if (!ci || !cluster_is_huge(ci)) + return !!swap_count(map[offset]); for (i = 0; i < SWAPFILE_CLUSTER; i++) { - if (swap_count(map[offset + i])) { - ret = true; - break; - } + if (swap_count(map[hoffset + i])) + return true; } -unlock_out: + return false; +} + +static bool swap_page_trans_huge_swapped(struct swap_info_struct *si, + swp_entry_t entry) +{ + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + bool ret; + + ci = lock_cluster_or_swap_info(si, offset); + ret = __swap_page_trans_huge_swapped(si, ci, offset); unlock_cluster_or_swap_info(si, ci); return ret; } @@ -1739,22 +1833,17 @@ int try_to_free_swap(struct page *page) * Free the swap entry like above, but also try to * free the page cache entry if it is the last user. */ -int free_swap_and_cache(swp_entry_t entry) +int free_swap_and_cache(swp_entry_t entry, int entry_size) { struct swap_info_struct *p; - unsigned char count; if (non_swap_entry(entry)) return 1; p = _swap_info_get(entry); - if (p) { - count = __swap_entry_free(p, entry, 1); - if (count == SWAP_HAS_CACHE && - !swap_page_trans_huge_swapped(p, entry)) - __try_to_reclaim_swap(p, swp_offset(entry), - TTRS_UNMAPPED | TTRS_FULL); - } + if (p) + __swap_free(p, entry, entry_size, SF_FREE_CACHE); + return p != NULL; } @@ -1901,7 +1990,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, } set_pte_at(vma->vm_mm, addr, pte, pte_mkold(mk_pte(page, vma->vm_page_prot))); - swap_free(entry); + swap_free(entry, 1); /* * Move the page to the active list so it is not * immediately swapped out again after swapon. @@ -2630,6 +2719,15 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) reinsert_swap_info(p); reenable_swap_slots_cache_unlock(); goto out_dput; + } else { + /* + * Swap entries may be marked as SWAP_MAP_BAD temporarily in + * __swap_free() before being freed really. try_to_unuse() + * will skip these swap entries, that is OK. But we need to + * wait until they are freed really. 
+ */ + while (READ_ONCE(p->inuse_pages)) + schedule_timeout_uninterruptible(1); } reenable_swap_slots_cache_unlock();
From patchwork Fri Dec 14 06:27:40 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10730621
From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V9 07/21] swap: Support PMD swap mapping when splitting huge PMD Date: Fri, 14 Dec 2018 14:27:40 +0800 Message-Id: <20181214062754.13723-8-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com> References: <20181214062754.13723-1-ying.huang@intel.com>
A huge PMD needs to be split when zapping a part of the PMD mapping, etc. If the PMD mapping is a swap mapping, we need to split it too. This patch implements the support for this. It is similar to splitting a PMD page mapping, except that we also need to decrease the PMD swap mapping count of the huge swap cluster. If the PMD swap mapping count becomes 0, the huge swap cluster will be split. Notice: is_huge_zero_pmd() and pmd_page() don't work well with a swap PMD, so a pmd_present() check is called before them.
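For illustration, a minimal sketch of the check ordering this implies; this is not code from the patch, is_swap_pmd() and is_pmd_migration_entry() are existing helpers, and the wrapper name is invented:

static bool pmd_is_swap_mapping(pmd_t pmd)
{
	/*
	 * Illustrative sketch only: pmd_page() and is_huge_zero_pmd() are
	 * only meaningful for a present PMD, so a swap PMD has to be
	 * filtered out with pmd_present() before calling them.
	 */
	if (pmd_present(pmd))
		return false;	/* pmd_page()/is_huge_zero_pmd() are safe here */
	/* Not present: either a migration entry or a PMD swap mapping */
	return is_swap_pmd(pmd) && !is_pmd_migration_entry(pmd);
}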
Thanks Daniel Jordan for testing and reporting a data corruption bug caused by misaligned address processing issue in __split_huge_swap_pmd(). Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 4 ++++ include/linux/swap.h | 6 +++++ mm/huge_memory.c | 49 ++++++++++++++++++++++++++++++++++++----- mm/swapfile.c | 32 +++++++++++++++++++++++++++ 4 files changed, 86 insertions(+), 5 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 4663ee96cf59..1c0fda003d6a 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -226,6 +226,10 @@ static inline bool is_huge_zero_page(struct page *page) return READ_ONCE(huge_zero_page) == page; } +/* + * is_huge_zero_pmd() must be called after checking pmd_present(), + * otherwise, it may report false positive for PMD swap entry. + */ static inline bool is_huge_zero_pmd(pmd_t pmd) { return is_huge_zero_page(pmd_page(pmd)); diff --git a/include/linux/swap.h b/include/linux/swap.h index 24c3014894dd..a24d101b131d 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -619,11 +619,17 @@ static inline swp_entry_t get_swap_page(struct page *page) #ifdef CONFIG_THP_SWAP extern int split_swap_cluster(swp_entry_t entry); +extern int split_swap_cluster_map(swp_entry_t entry); #else static inline int split_swap_cluster(swp_entry_t entry) { return 0; } + +static inline int split_swap_cluster_map(swp_entry_t entry) +{ + return 0; +} #endif #ifdef CONFIG_MEMCG diff --git a/mm/huge_memory.c b/mm/huge_memory.c index bd2543e10938..49df3e7c96c7 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1617,6 +1617,41 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) return 0; } +/* Convert a PMD swap mapping to a set of PTE swap mappings */ +static void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long addr, + pmd_t *pmd) +{ + struct mm_struct *mm = vma->vm_mm; + pgtable_t pgtable; + pmd_t _pmd; + swp_entry_t entry; + int i, soft_dirty; + + addr &= HPAGE_PMD_MASK; + entry = pmd_to_swp_entry(*pmd); + soft_dirty = pmd_soft_dirty(*pmd); + + split_swap_cluster_map(entry); + + pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pmd_populate(mm, &_pmd, pgtable); + + for (i = 0; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, entry.val++) { + pte_t *pte, ptent; + + pte = pte_offset_map(&_pmd, addr); + VM_BUG_ON(!pte_none(*pte)); + ptent = swp_entry_to_pte(entry); + if (soft_dirty) + ptent = pte_swp_mksoft_dirty(ptent); + set_pte_at(mm, addr, pte, ptent); + pte_unmap(pte); + } + smp_wmb(); /* make pte visible before pmd */ + pmd_populate(mm, pmd, pgtable); +} + /* * Return true if we do MADV_FREE successfully on entire pmd page. * Otherwise, return false. 
@@ -2082,7 +2117,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); VM_BUG_ON_VMA(vma->vm_start > haddr, vma); VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma); - VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) + VM_BUG_ON(!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)); count_vm_event(THP_SPLIT_PMD); @@ -2106,7 +2141,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, put_page(page); add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR); return; - } else if (is_huge_zero_pmd(*pmd)) { + } else if (pmd_present(*pmd) && is_huge_zero_pmd(*pmd)) { /* * FIXME: Do we want to invalidate secondary mmu by calling * mmu_notifier_invalidate_range() see comments below inside @@ -2150,6 +2185,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, page = pfn_to_page(swp_offset(entry)); } else #endif + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(old_pmd)) + return __split_huge_swap_pmd(vma, haddr, pmd); + else page = pmd_page(old_pmd); VM_BUG_ON_PAGE(!page_count(page), page); page_ref_add(page, HPAGE_PMD_NR - 1); @@ -2243,14 +2281,15 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, * pmd against. Otherwise we can end up replacing wrong page. */ VM_BUG_ON(freeze && !page); - if (page && page != pmd_page(*pmd)) - goto out; + /* pmd_page() should be called only if pmd_present() */ + if (page && (!pmd_present(*pmd) || page != pmd_page(*pmd))) + goto out; if (pmd_trans_huge(*pmd)) { page = pmd_page(*pmd); if (PageMlocked(page)) clear_page_mlock(page); - } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd))) + } else if (!(pmd_devmap(*pmd) || is_swap_pmd(*pmd))) goto out; __split_huge_pmd_locked(vma, pmd, range.start, freeze); out: diff --git a/mm/swapfile.c b/mm/swapfile.c index 243131253238..d38760b6d495 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -3942,6 +3942,38 @@ void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node, } #endif +#ifdef CONFIG_THP_SWAP +/* + * The corresponding page table shouldn't be changed under us, that + * is, the page table lock should be held. + */ +int split_swap_cluster_map(swp_entry_t entry) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + + VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER)); + si = _swap_info_get(entry); + if (!si) + return -EBUSY; + ci = lock_cluster(si, offset); + /* The swap cluster has been split by someone else, we are done */ + if (!cluster_is_huge(ci)) + goto out; + cluster_add_swapcount(ci, -1); + /* + * If the last PMD swap mapping has gone and the THP isn't in + * swap cache, the huge swap cluster will be split. 
+ */ + if (!cluster_swapcount(ci) && !(si->swap_map[offset] & SWAP_HAS_CACHE)) + cluster_clear_huge(ci); +out: + unlock_cluster(ci); + return 0; +} +#endif + static int __init swapfile_init(void) { int nid;
From patchwork Fri Dec 14 06:27:41 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10730623
From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V9 08/21] swap: Support PMD swap mapping in split_swap_cluster() Date: Fri, 14 Dec 2018 14:27:41 +0800 Message-Id: <20181214062754.13723-9-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com> References: <20181214062754.13723-1-ying.huang@intel.com>
When splitting a THP in swap cache, or when failing to allocate a THP while swapping in a huge swap cluster, the huge swap cluster will be split. In addition to clearing the huge flag of the swap cluster, the PMD swap mapping count recorded in cluster_count() will be set to 0. But we will not touch the PMD swap mappings themselves, because it is sometimes hard to find them all.
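Because the PMD swap mappings are left in place, code that operates on one of them later has to be able to detect that its huge swap cluster was already split. A minimal sketch of that check, assuming the lock_cluster()/cluster_is_huge() helpers used elsewhere in this series (illustrative only, not part of the patch):

static bool pmd_swap_cluster_was_split(struct swap_info_struct *si,
				       swp_entry_t entry)
{
	struct swap_cluster_info *ci;
	bool was_split;

	ci = lock_cluster(si, swp_offset(entry));
	/* The huge flag is cleared when the cluster is split */
	was_split = !cluster_is_huge(ci);
	unlock_cluster(ci);
	/* If true, the caller splits the PMD swap mapping and uses PTEs */
	return was_split;
}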
When the PMD swap mappings are operated later, it will be found that the huge swap cluster has been split and the PMD swap mappings will be split at that time. Unless splitting a THP in swap cache (specified via "force" parameter), split_swap_cluster() will return -EEXIST if there is SWAP_HAS_CACHE flag in swap_map[offset]. Because this indicates there is a THP corresponds to this huge swap cluster, and it isn't desired to split the THP. When splitting a THP in swap cache, the position to call split_swap_cluster() is changed to before unlocking sub-pages. So that all sub-pages will be kept locked from the THP has been split to the huge swap cluster is split. This makes the code much easier to be reasoned. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/swap.h | 6 +++-- mm/huge_memory.c | 18 +++++++++----- mm/swapfile.c | 58 +++++++++++++++++++++++++++++++------------- 3 files changed, 57 insertions(+), 25 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index a24d101b131d..441da4a832a6 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -617,11 +617,13 @@ static inline swp_entry_t get_swap_page(struct page *page) #endif /* CONFIG_SWAP */ +#define SSC_SPLIT_CACHED 0x1 + #ifdef CONFIG_THP_SWAP -extern int split_swap_cluster(swp_entry_t entry); +extern int split_swap_cluster(swp_entry_t entry, unsigned long flags); extern int split_swap_cluster_map(swp_entry_t entry); #else -static inline int split_swap_cluster(swp_entry_t entry) +static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags) { return 0; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 49df3e7c96c7..fc31fc1ae0b3 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2507,6 +2507,17 @@ static void __split_huge_page(struct page *page, struct list_head *list, remap_page(head); + /* + * Split swap cluster before unlocking sub-pages. So all + * sub-pages will be kept locked from THP has been split to + * swap cluster is split. 
+ */ + if (PageSwapCache(head)) { + swp_entry_t entry = { .val = page_private(head) }; + + split_swap_cluster(entry, SSC_SPLIT_CACHED); + } + for (i = 0; i < HPAGE_PMD_NR; i++) { struct page *subpage = head + i; if (subpage == page) @@ -2741,12 +2752,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) __dec_node_page_state(page, NR_SHMEM_THPS); spin_unlock(&pgdata->split_queue_lock); __split_huge_page(page, list, end, flags); - if (PageSwapCache(head)) { - swp_entry_t entry = { .val = page_private(head) }; - - ret = split_swap_cluster(entry); - } else - ret = 0; + ret = 0; } else { if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) { pr_alert("total_mapcount: %u, page_count(): %u\n", diff --git a/mm/swapfile.c b/mm/swapfile.c index d38760b6d495..c59cc2ca7c2c 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1469,23 +1469,6 @@ void put_swap_page(struct page *page, swp_entry_t entry) unlock_cluster_or_swap_info(si, ci); } -#ifdef CONFIG_THP_SWAP -int split_swap_cluster(swp_entry_t entry) -{ - struct swap_info_struct *si; - struct swap_cluster_info *ci; - unsigned long offset = swp_offset(entry); - - si = _swap_info_get(entry); - if (!si) - return -EBUSY; - ci = lock_cluster(si, offset); - cluster_clear_huge(ci); - unlock_cluster(ci); - return 0; -} -#endif - static int swp_entry_cmp(const void *ent1, const void *ent2) { const swp_entry_t *e1 = ent1, *e2 = ent2; @@ -3972,6 +3955,47 @@ int split_swap_cluster_map(swp_entry_t entry) unlock_cluster(ci); return 0; } + +/* + * We will not try to split all PMD swap mappings to the swap cluster, + * because we haven't enough information available for that. Later, + * when the PMD swap mapping is duplicated or swapin, etc, the PMD + * swap mapping will be split and fallback to the PTE operations. + */ +int split_swap_cluster(swp_entry_t entry, unsigned long flags) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + unsigned long offset = swp_offset(entry); + int ret = 0; + + si = get_swap_device(entry); + if (!si) + return -EINVAL; + ci = lock_cluster(si, offset); + /* The swap cluster has been split by someone else, we are done */ + if (!cluster_is_huge(ci)) + goto out; + VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER)); + VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER); + /* + * If not requested, don't split swap cluster that has SWAP_HAS_CACHE + * flag. When the flag is cleared later, the huge swap cluster will + * be split if there is no PMD swap mapping. 
+ */ + if (!(flags & SSC_SPLIT_CACHED) && + si->swap_map[offset] & SWAP_HAS_CACHE) { + ret = -EEXIST; + goto out; + } + cluster_set_swapcount(ci, 0); + cluster_clear_huge(ci); + +out: + unlock_cluster(ci); + put_swap_device(si); + return ret; +} #endif static int __init swapfile_init(void) { int nid;
From patchwork Fri Dec 14 06:27:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10730625
From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V9 09/21] swap: Support to read a huge swap cluster for swapin a THP Date: Fri, 14 Dec 2018 14:27:42 +0800 Message-Id: <20181214062754.13723-10-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com> References: <20181214062754.13723-1-ying.huang@intel.com>
To swap in a THP in one piece, we need to read a huge swap cluster from the swap device. This patch revises __read_swap_cache_async() and its callers and callees to support this. If __read_swap_cache_async() finds that the swap cluster of the specified swap entry is huge, it will try to allocate a THP and add it into the swap cache.
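To make the indexing concrete, a small illustrative helper (an assumption for illustration, not code from the patch): the THP in swap cache is indexed by the cluster-aligned head entry, and the page handed back to the caller is the sub-page matching the originally requested swap slot.

static struct page *huge_swapin_subpage(struct page *thp_head, swp_entry_t entry)
{
	/*
	 * Mirrors the "new_page += swp_offset(entry) & (entry_size - 1)"
	 * step in __read_swap_cache_async() below (illustrative only).
	 */
	return thp_head + (swp_offset(entry) & (HPAGE_PMD_NR - 1));
}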
So later the contents of the huge swap cluster can be read into the THP. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 6 +++++ include/linux/swap.h | 4 +-- mm/huge_memory.c | 4 +-- mm/swap_state.c | 60 ++++++++++++++++++++++++++++++++--------- mm/swapfile.c | 9 ++++--- 5 files changed, 64 insertions(+), 19 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 1c0fda003d6a..72f2617d336b 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -250,6 +250,7 @@ static inline bool thp_migration_supported(void) return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION); } +gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma); #else /* CONFIG_TRANSPARENT_HUGEPAGE */ #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; }) #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; }) @@ -363,6 +364,11 @@ static inline bool thp_migration_supported(void) { return false; } + +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) +{ + return 0; +} #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif /* _LINUX_HUGE_MM_H */ diff --git a/include/linux/swap.h b/include/linux/swap.h index 441da4a832a6..4bd532c9315e 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -462,7 +462,7 @@ extern sector_t map_swap_page(struct page *, struct block_device **); extern sector_t swapdev_block(int, pgoff_t); extern int page_swapcount(struct page *); extern int __swap_count(swp_entry_t entry); -extern int __swp_swapcount(swp_entry_t entry); +extern int __swp_swapcount(swp_entry_t entry, int *entry_size); extern int swp_swapcount(swp_entry_t entry); extern struct swap_info_struct *page_swap_info(struct page *); extern struct swap_info_struct *swp_swap_info(swp_entry_t entry); @@ -590,7 +590,7 @@ static inline int __swap_count(swp_entry_t entry) return 0; } -static inline int __swp_swapcount(swp_entry_t entry) +static inline int __swp_swapcount(swp_entry_t entry, int *entry_size) { return 0; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index fc31fc1ae0b3..1cec1eec340e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -629,9 +629,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, * available * never: never stall for any thp allocation */ -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) +gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) { - const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE); + const bool vma_madvised = vma ? !!(vma->vm_flags & VM_HUGEPAGE) : false; /* Always do synchronous compaction */ if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags)) diff --git a/mm/swap_state.c b/mm/swap_state.c index 97831166994a..5e761bb6e354 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -361,7 +361,9 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, { struct page *found_page = NULL, *new_page = NULL; struct swap_info_struct *si; - int err; + int err, entry_size = 1; + swp_entry_t hentry; + *new_page_allocated = false; do { @@ -387,14 +389,41 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * as SWAP_HAS_CACHE. That's done in later part of code or * else swap_off will be aborted if we return NULL. 
*/ - if (!__swp_swapcount(entry) && swap_slot_cache_enabled) + if (!__swp_swapcount(entry, &entry_size) && + swap_slot_cache_enabled) break; /* * Get a new page to read into from swap. */ - if (!new_page) { - new_page = alloc_page_vma(gfp_mask, vma, addr); + if (!new_page || + (IS_ENABLED(CONFIG_THP_SWAP) && + hpage_nr_pages(new_page) != entry_size)) { + if (new_page) + put_page(new_page); + if (IS_ENABLED(CONFIG_THP_SWAP) && + entry_size == HPAGE_PMD_NR) { + gfp_t gfp; + + gfp = alloc_hugepage_direct_gfpmask(vma); + /* + * Make sure huge page allocation flags are + * compatible with that of normal page + */ + VM_WARN_ONCE(gfp_mask & ~(gfp | __GFP_RECLAIM), + "ignoring gfp_mask bits: %x", + gfp_mask & ~(gfp | __GFP_RECLAIM)); + new_page = alloc_hugepage_vma(gfp, vma, addr, + HPAGE_PMD_ORDER); + if (new_page) + prep_transhuge_page(new_page); + hentry = swp_entry(swp_type(entry), + round_down(swp_offset(entry), + HPAGE_PMD_NR)); + } else { + new_page = alloc_page_vma(gfp_mask, vma, addr); + hentry = entry; + } if (!new_page) break; /* Out of memory */ } @@ -402,7 +431,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* * Swap entry may have been freed since our caller observed it. */ - err = swapcache_prepare(entry, 1); + err = swapcache_prepare(hentry, entry_size); if (err == -EEXIST) { /* * We might race against get_swap_page() and stumble @@ -411,18 +440,24 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, */ cond_resched(); continue; + } else if (err == -ENOTDIR) { + /* huge swap cluster has been split under us */ + continue; } else if (err) /* swp entry is obsolete ? */ break; /* May fail (-ENOMEM) if XArray node allocation failed. */ __SetPageLocked(new_page); __SetPageSwapBacked(new_page); - err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL); + err = add_to_swap_cache(new_page, hentry, gfp_mask & GFP_KERNEL); if (likely(!err)) { /* Initiate read into locked page */ SetPageWorkingset(new_page); lru_cache_add_anon(new_page); *new_page_allocated = true; + if (IS_ENABLED(CONFIG_THP_SWAP)) + new_page += swp_offset(entry) & + (entry_size - 1); return new_page; } __ClearPageLocked(new_page); @@ -430,7 +465,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * add_to_swap_cache() doesn't return -EEXIST, so we can safely * clear SWAP_HAS_CACHE flag. 
*/ - put_swap_page(new_page, entry); + put_swap_page(new_page, hentry); } while (err != -ENOMEM); if (new_page) @@ -452,7 +487,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, vma, addr, &page_was_allocated); if (page_was_allocated) - swap_readpage(retpage, do_poll); + swap_readpage(compound_head(retpage), do_poll); return retpage; } @@ -571,8 +606,9 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); - if (offset != entry_offset) { + swap_readpage(compound_head(page), false); + if (offset != entry_offset && + !PageTransCompound(page)) { SetPageReadahead(page); count_vm_event(SWAP_RA); } @@ -733,8 +769,8 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); - if (i != ra_info.offset) { + swap_readpage(compound_head(page), false); + if (i != ra_info.offset && !PageTransCompound(page)) { SetPageReadahead(page); count_vm_event(SWAP_RA); } diff --git a/mm/swapfile.c b/mm/swapfile.c index c59cc2ca7c2c..e27fe24a1f41 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1542,7 +1542,8 @@ int __swap_count(swp_entry_t entry) return count; } -static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) +static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry, + int *entry_size) { int count = 0; pgoff_t offset = swp_offset(entry); @@ -1550,6 +1551,8 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) ci = lock_cluster_or_swap_info(si, offset); count = swap_count(si->swap_map[offset]); + if (entry_size) + *entry_size = ci && cluster_is_huge(ci) ? SWAPFILE_CLUSTER : 1; unlock_cluster_or_swap_info(si, ci); return count; } @@ -1559,14 +1562,14 @@ static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry) * This does not give an exact answer when swap count is continued, * but does include the high COUNT_CONTINUED flag to allow for that. 
*/ -int __swp_swapcount(swp_entry_t entry) +int __swp_swapcount(swp_entry_t entry, int *entry_size) { int count = 0; struct swap_info_struct *si; si = get_swap_device(entry); if (si) { - count = swap_swapcount(si, entry); + count = swap_swapcount(si, entry, entry_size); put_swap_device(si); } return count;
From patchwork Fri Dec 14 06:27:43 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10730627
2002:a17:902:29ab:: with SMTP id h40mr1757174plb.238.1544768889757; Thu, 13 Dec 2018 22:28:09 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1544768889; cv=none; d=google.com; s=arc-20160816; b=FnyFpV3zZdzMzlZXkDjRWVwsqAx0rWdXovRWKISD6g30gUO57sVeop+ceM7Q6a7xhA 8gOWj5oAYnWzl80dZR4cp/73HAdSzuWWIeAelq+vJatwqQcRpN/OphGXDCm3YnuAGtDm EscubuKiw2nENZd7Ezg3FIp9mqct/G0zynREvNusFFGISHKeeD01CoHuoX14yDLAqLS8 unDNOMYXucoNUxZuvwHt4WDuizt7ifNHJMPCyBS/hV0pWY6TggoOlag5cNW/AwDQszWZ uWqIQGutK/1UdzmaW6OxsrRyiOR4zD9Vo/ihNP9xdLsleu08n+AAZNlWc6/e//5gRbeX qTwQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=references:in-reply-to:message-id:date:subject:cc:to:from; bh=knYDE2Y4kYMyTftPpiIn/YNLPn8n1fSOGQupP+BrBRs=; b=GX1f1+rHQVf5k62Bu7A7iRHTzbhhN4KbiJD0sDhvDEO1K7/M3KtF7thd/KuZZerr9I ygfOpsoh5cyPqwo1dMtavWBgMlGk1bArz+OrRBk5ph9GQakx2tDNIaPbWy9KfwblzXIo 4AOAjlF+dVWjEnRph1xWRHuLbi6MLn7p7FRmETR4dwwrclI5Pow6Hnitznj9BMHtI2M4 k8UM3CJYNqh2hfUw6Tfh1W2UlAhr8WGpDvMlDL6t1yJSaWJn6wrFdssM4xnU/QD3PY8e rl1reqJ9jeIzEFutRUsTuKa7WStlU+koyQJRcTCz4ZX1Q68quTFXQLeemd3imHwX6m4d qDMg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga07.intel.com (mga07.intel.com. [134.134.136.100]) by mx.google.com with ESMTPS id v19si3555849pfa.80.2018.12.13.22.28.09 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 13 Dec 2018 22:28:09 -0800 (PST) Received-SPF: pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) client-ip=134.134.136.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Dec 2018 22:28:09 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,351,1539673200"; d="scan'208";a="125840940" Received: from yhuang-mobile.sh.intel.com ([10.239.197.226]) by fmsmga002.fm.intel.com with ESMTP; 13 Dec 2018 22:28:06 -0800 From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V9 10/21] swap: Swapin a THP in one piece Date: Fri, 14 Dec 2018 14:27:43 +0800 Message-Id: <20181214062754.13723-11-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com> References: <20181214062754.13723-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP With this patch, when page fault handler find a PMD swap mapping, it will swap in a THP in one piece. This avoids the overhead of splitting/collapsing before/after the THP swapping. And improves the swap performance greatly for reduced page fault count etc. do_huge_pmd_swap_page() is added in the patch to implement this. 
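As a quick orientation before the diff, the fallback behaviour when a THP cannot be allocated boils down to the following (a simplified restatement of the error handling visible in the do_huge_pmd_swap_page() hunk below; it is not additional code in the patch):

	/* read_swap_cache_async() could not allocate a THP for the huge cluster */
	err = split_swap_cluster(entry, 0);
	if (err == -EEXIST) {
		/* somebody else already swapped the entry in: look it up again */
		goto retry;
	} else if (err == -EINVAL) {
		/* swapoff is running under us: nothing left to do here */
		ret = 0;
	} else {
		/* the huge cluster was split: also split the PMD swap mapping
		 * and fall back to swapping in normal pages */
		if (!split_huge_swap_pmd(vmf->vma, vmf->pmd, vmf->address, orig_pmd))
			ret = VM_FAULT_FALLBACK;
	}

The full locking and accounting details are in do_huge_pmd_swap_page() below.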
It is similar to do_swap_page() for normal page swapin. If failing to allocate a THP, the huge swap cluster and the PMD swap mapping will be split to fallback to normal page swapin. If the huge swap cluster has been split already, the PMD swap mapping will be split to fallback to normal page swapin. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 9 +++ mm/huge_memory.c | 174 ++++++++++++++++++++++++++++++++++++++++ mm/memory.c | 16 ++-- 3 files changed, 193 insertions(+), 6 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 72f2617d336b..debe3760e894 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -371,4 +371,13 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#ifdef CONFIG_THP_SWAP +extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); +#else /* CONFIG_THP_SWAP */ +static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) +{ + return 0; +} +#endif /* CONFIG_THP_SWAP */ + #endif /* _LINUX_HUGE_MM_H */ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 1cec1eec340e..644cb5d6b056 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -33,6 +33,8 @@ #include #include #include +#include +#include #include #include @@ -1652,6 +1654,178 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma, pmd_populate(mm, pmd, pgtable); } +#ifdef CONFIG_THP_SWAP +static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, pmd_t orig_pmd) +{ + struct mm_struct *mm = vma->vm_mm; + spinlock_t *ptl; + int ret = 0; + + ptl = pmd_lock(mm, pmd); + if (pmd_same(*pmd, orig_pmd)) + __split_huge_swap_pmd(vma, address, pmd); + else + ret = -ENOENT; + spin_unlock(ptl); + + return ret; +} + +int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) +{ + struct page *page; + struct mem_cgroup *memcg; + struct vm_area_struct *vma = vmf->vma; + unsigned long haddr = vmf->address & HPAGE_PMD_MASK; + swp_entry_t entry; + pmd_t pmd; + int i, locked, exclusive = 0, ret = 0; + + entry = pmd_to_swp_entry(orig_pmd); + VM_BUG_ON(non_swap_entry(entry)); + delayacct_set_flag(DELAYACCT_PF_SWAPIN); +retry: + page = lookup_swap_cache(entry, NULL, vmf->address); + if (!page) { + page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma, + haddr, false); + if (!page) { + /* + * Back out if somebody else faulted in this pmd + * while we released the pmd lock. 
+ */ + if (likely(pmd_same(*vmf->pmd, orig_pmd))) { + /* + * Failed to allocate huge page, split huge swap + * cluster, and fallback to swapin normal page + */ + ret = split_swap_cluster(entry, 0); + /* Somebody else swapin the swap entry, retry */ + if (ret == -EEXIST) { + ret = 0; + goto retry; + /* swapoff occurs under us */ + } else if (ret == -EINVAL) + ret = 0; + else + goto fallback; + } + delayacct_clear_flag(DELAYACCT_PF_SWAPIN); + goto out; + } + + /* Had to read the page from swap area: Major fault */ + ret = VM_FAULT_MAJOR; + count_vm_event(PGMAJFAULT); + count_memcg_event_mm(vma->vm_mm, PGMAJFAULT); + } else if (!PageTransCompound(page)) + goto fallback; + + locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags); + + delayacct_clear_flag(DELAYACCT_PF_SWAPIN); + if (!locked) { + ret |= VM_FAULT_RETRY; + goto out_release; + } + + /* + * Make sure try_to_free_swap or reuse_swap_page or swapoff did not + * release the swapcache from under us. The page pin, and pmd_same + * test below, are not enough to exclude that. Even if it is still + * swapcache, we need to check that the page's swap has not changed. + */ + if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val)) + goto out_page; + + if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, + &memcg, true)) { + ret = VM_FAULT_OOM; + goto out_page; + } + + /* + * Back out if somebody else already faulted in this pmd. + */ + vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd); + spin_lock(vmf->ptl); + if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) + goto out_nomap; + + if (unlikely(!PageUptodate(page))) { + ret = VM_FAULT_SIGBUS; + goto out_nomap; + } + + /* + * The page isn't present yet, go ahead with the fault. + * + * Be careful about the sequence of operations here. + * To get its accounting right, reuse_swap_page() must be called + * while the page is counted on swap but not yet in mapcount i.e. + * before page_add_anon_rmap() and swap_free(); try_to_free_swap() + * must be called after the swap_free(), or it will never succeed. 
+ */ + + add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR); + add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR); + pmd = mk_huge_pmd(page, vma->vm_page_prot); + if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) { + pmd = maybe_pmd_mkwrite(pmd_mkdirty(pmd), vma); + vmf->flags &= ~FAULT_FLAG_WRITE; + ret |= VM_FAULT_WRITE; + exclusive = RMAP_EXCLUSIVE; + } + for (i = 0; i < HPAGE_PMD_NR; i++) + flush_icache_page(vma, page + i); + if (pmd_swp_soft_dirty(orig_pmd)) + pmd = pmd_mksoft_dirty(pmd); + do_page_add_anon_rmap(page, vma, haddr, + exclusive | RMAP_COMPOUND); + mem_cgroup_commit_charge(page, memcg, true, true); + activate_page(page); + set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd); + + swap_free(entry, HPAGE_PMD_NR); + if (mem_cgroup_swap_full(page) || + (vma->vm_flags & VM_LOCKED) || PageMlocked(page)) + try_to_free_swap(page); + unlock_page(page); + + if (vmf->flags & FAULT_FLAG_WRITE) { + spin_unlock(vmf->ptl); + ret |= do_huge_pmd_wp_page(vmf, pmd); + if (ret & VM_FAULT_ERROR) + ret &= VM_FAULT_ERROR; + goto out; + } + + /* No need to invalidate - it was non-present before */ + update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); + spin_unlock(vmf->ptl); +out: + return ret; +out_nomap: + mem_cgroup_cancel_charge(page, memcg, true); + spin_unlock(vmf->ptl); +out_page: + unlock_page(page); +out_release: + put_page(page); + return ret; +fallback: + delayacct_clear_flag(DELAYACCT_PF_SWAPIN); + if (!split_huge_swap_pmd(vmf->vma, vmf->pmd, vmf->address, orig_pmd)) + ret = VM_FAULT_FALLBACK; + else + ret = 0; + if (page) + put_page(page); + return ret; +} +#endif + /* * Return true if we do MADV_FREE successfully on entire pmd page. * Otherwise, return false. diff --git a/mm/memory.c b/mm/memory.c index 78f341b5672d..a480c562d7d9 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3870,13 +3870,17 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, barrier(); if (unlikely(is_swap_pmd(orig_pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(orig_pmd)); - if (is_pmd_migration_entry(orig_pmd)) + if (thp_migration_supported() && + is_pmd_migration_entry(orig_pmd)) { pmd_migration_entry_wait(mm, vmf.pmd); - return 0; - } - if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) { + return 0; + } else if (IS_ENABLED(CONFIG_THP_SWAP)) { + ret = do_huge_pmd_swap_page(&vmf, orig_pmd); + if (!(ret & VM_FAULT_FALLBACK)) + return ret; + } else + VM_BUG_ON(1); + } else if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) { if (pmd_protnone(orig_pmd) && vma_is_accessible(vma)) return do_huge_pmd_numa_page(&vmf, orig_pmd); From patchwork Fri Dec 14 06:27:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10730629 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 586C614E2 for ; Fri, 14 Dec 2018 06:28:23 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4806E2CC8C for ; Fri, 14 Dec 2018 06:28:23 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3BAF72CC99; Fri, 14 Dec 2018 06:28:23 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham 
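For reference, the dispatch that the mm/memory.c hunk above adds to __handle_mm_fault() can be read as follows (a commented restatement of that hunk, not new code):

	if (unlikely(is_swap_pmd(orig_pmd))) {
		if (thp_migration_supported() &&
		    is_pmd_migration_entry(orig_pmd)) {
			/* PMD migration entry: wait for the migration to finish */
			pmd_migration_entry_wait(mm, vmf.pmd);
			return 0;
		} else if (IS_ENABLED(CONFIG_THP_SWAP)) {
			/* PMD swap mapping: try to swap in the THP in one piece */
			ret = do_huge_pmd_swap_page(&vmf, orig_pmd);
			if (!(ret & VM_FAULT_FALLBACK))
				return ret;
			/* otherwise the PMD has been split; fall through to the
			 * normal (PTE) fault handling */
		} else {
			/* a swap PMD with neither THP_SWAP nor THP migration is a bug */
			VM_BUG_ON(1);
		}
	} else if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
		...
	}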
smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga07.intel.com (mga07.intel.com. [134.134.136.100]) by mx.google.com with ESMTPS id v19si3555849pfa.80.2018.12.13.22.28.12 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 13 Dec 2018 22:28:12 -0800 (PST) Received-SPF: pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) client-ip=134.134.136.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Dec 2018 22:28:12 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,351,1539673200"; d="scan'208";a="125840960" Received: from yhuang-mobile.sh.intel.com ([10.239.197.226]) by fmsmga002.fm.intel.com with ESMTP; 13 Dec 2018 22:28:09 -0800 From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V9 11/21] swap: Support to count THP swapin and its fallback Date: Fri, 14 Dec 2018 14:27:44 +0800 Message-Id: <20181214062754.13723-12-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com> References: <20181214062754.13723-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP 2 new /proc/vmstat fields are added, "thp_swapin" and "thp_swapin_fallback" to count swapin a THP from swap device in one piece and fallback to normal page swapin. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- Documentation/admin-guide/mm/transhuge.rst | 8 ++++++++ include/linux/vm_event_item.h | 2 ++ mm/huge_memory.c | 4 +++- mm/page_io.c | 15 ++++++++++++--- mm/vmstat.c | 2 ++ 5 files changed, 27 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index 7ab93a8404b9..85e33f785fd7 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -364,6 +364,14 @@ thp_swpout_fallback Usually because failed to allocate some continuous swap space for the huge page. +thp_swpin + is incremented every time a huge page is swapin in one piece + without splitting. + +thp_swpin_fallback + is incremented if a huge page has to be split during swapin. + Usually because failed to allocate a huge page. + As the system ages, allocating huge pages may be expensive as the system uses memory compaction to copy data around memory to free a huge page for use. 
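(As an aside, once this patch is applied the new events can be checked together with the other THP counters, for example with

	grep thp_swpin /proc/vmstat

which reports both thp_swpin and thp_swpin_fallback; the values shown depend entirely on the workload.)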
There are some counters in ``/proc/vmstat`` to help diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 47a3441cf4c4..c20b655cfdcc 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -88,6 +88,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, THP_ZERO_PAGE_ALLOC_FAILED, THP_SWPOUT, THP_SWPOUT_FALLBACK, + THP_SWPIN, + THP_SWPIN_FALLBACK, #endif #ifdef CONFIG_MEMORY_BALLOON BALLOON_INFLATE, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 644cb5d6b056..e1e95e6c86e3 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1708,8 +1708,10 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) /* swapoff occurs under us */ } else if (ret == -EINVAL) ret = 0; - else + else { + count_vm_event(THP_SWPIN_FALLBACK); goto fallback; + } } delayacct_clear_flag(DELAYACCT_PF_SWAPIN); goto out; diff --git a/mm/page_io.c b/mm/page_io.c index 67a7f64d6c1a..00774b453dca 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -348,6 +348,15 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, return ret; } +static inline void count_swpin_vm_event(struct page *page) +{ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + if (unlikely(PageTransHuge(page))) + count_vm_event(THP_SWPIN); +#endif + count_vm_events(PSWPIN, hpage_nr_pages(page)); +} + int swap_readpage(struct page *page, bool synchronous) { struct bio *bio; @@ -371,7 +380,7 @@ int swap_readpage(struct page *page, bool synchronous) ret = mapping->a_ops->readpage(swap_file, page); if (!ret) - count_vm_event(PSWPIN); + count_swpin_vm_event(page); return ret; } @@ -382,7 +391,7 @@ int swap_readpage(struct page *page, bool synchronous) unlock_page(page); } - count_vm_event(PSWPIN); + count_swpin_vm_event(page); return 0; } @@ -403,7 +412,7 @@ int swap_readpage(struct page *page, bool synchronous) bio_set_op_attrs(bio, REQ_OP_READ, 0); if (synchronous) bio->bi_opf |= REQ_HIPRI; - count_vm_event(PSWPIN); + count_swpin_vm_event(page); bio_get(bio); qc = submit_bio(bio); while (synchronous) { diff --git a/mm/vmstat.c b/mm/vmstat.c index 83b30edc2f7f..80a731e9a5e5 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1265,6 +1265,8 @@ const char * const vmstat_text[] = { "thp_zero_page_alloc_failed", "thp_swpout", "thp_swpout_fallback", + "thp_swpin", + "thp_swpin_fallback", #endif #ifdef CONFIG_MEMORY_BALLOON "balloon_inflate", From patchwork Fri Dec 14 06:27:45 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10730631 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8AE9591E for ; Fri, 14 Dec 2018 06:28:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7824B2CC8C for ; Fri, 14 Dec 2018 06:28:26 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6BF812CC99; Fri, 14 Dec 2018 06:28:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A14872CC8C for ; Fri, 14 Dec 2018 06:28:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3F77C8E01B6; 
[134.134.136.100]) by mx.google.com with ESMTPS id v19si3555849pfa.80.2018.12.13.22.28.15 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 13 Dec 2018 22:28:15 -0800 (PST) Received-SPF: pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) client-ip=134.134.136.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Dec 2018 22:28:15 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,351,1539673200"; d="scan'208";a="125840988" Received: from yhuang-mobile.sh.intel.com ([10.239.197.226]) by fmsmga002.fm.intel.com with ESMTP; 13 Dec 2018 22:28:12 -0800 From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V9 12/21] swap: Add sysfs interface to configure THP swapin Date: Fri, 14 Dec 2018 14:27:45 +0800 Message-Id: <20181214062754.13723-13-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com> References: <20181214062754.13723-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Swapin a THP as a whole isn't desirable in some situations. For example, for completely random access pattern, swapin a THP in one piece will inflate the reading greatly. So a sysfs interface: /sys/kernel/mm/transparent_hugepage/swapin_enabled is added to configure it. Three options as follow are provided, - always: THP swapin will be enabled always - madvise: THP swapin will be enabled only for VMA with VM_HUGEPAGE flag set. - never: THP swapin will be disabled always The default configuration is: madvise. During page fault, if a PMD swap mapping is found and THP swapin is disabled, the huge swap cluster and the PMD swap mapping will be split and fallback to normal page swapin. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- Documentation/admin-guide/mm/transhuge.rst | 21 +++++ include/linux/huge_mm.h | 31 ++++++++ mm/huge_memory.c | 93 +++++++++++++++++----- 3 files changed, 126 insertions(+), 19 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index 85e33f785fd7..23aefb17101c 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -160,6 +160,27 @@ Some userspace (such as a test program, or an optimized memory allocation cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size +Transparent hugepage may be swapout and swapin in one piece without +splitting. This will improve the utility of transparent hugepage but +may inflate the read/write too. 
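(For scale: with the usual 2MB PMD-sized THP on x86-64, HPAGE_PMD_NR is 512, so a fault that really only needs one 4KB page would read 512 pages, i.e. 2MB, from the swap device when the whole THP is read in one piece, roughly a 512x read amplification in the worst case. That is why the behaviour is configurable, and why the default is "madvise" as described in the cover text above.)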
So whether to enable swapin +transparent hugepage in one piece can be configured as follow. + + echo always >/sys/kernel/mm/transparent_hugepage/swapin_enabled + echo madvise >/sys/kernel/mm/transparent_hugepage/swapin_enabled + echo never >/sys/kernel/mm/transparent_hugepage/swapin_enabled + +always + Attempt to allocate a transparent huge page and read it from + swap space in one piece every time. + +never + Always split the swap space and PMD swap mapping and swapin + the fault normal page during swapin. + +madvise + Only swapin the transparent huge page in one piece for + MADV_HUGEPAGE madvise regions. + khugepaged will be automatically started when transparent_hugepage/enabled is set to "always" or "madvise, and it'll be automatically shutdown if it's set to "never". diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index debe3760e894..06dbbcf6a6dd 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -63,6 +63,8 @@ enum transparent_hugepage_flag { #ifdef CONFIG_DEBUG_VM TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG, #endif + TRANSPARENT_HUGEPAGE_SWAPIN_FLAG, + TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG, }; struct kobject; @@ -373,11 +375,40 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) #ifdef CONFIG_THP_SWAP extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); + +static inline bool transparent_hugepage_swapin_enabled( + struct vm_area_struct *vma) +{ + if (vma->vm_flags & VM_NOHUGEPAGE) + return false; + + if (is_vma_temporary_stack(vma)) + return false; + + if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) + return false; + + if (transparent_hugepage_flags & + (1 << TRANSPARENT_HUGEPAGE_SWAPIN_FLAG)) + return true; + + if (transparent_hugepage_flags & + (1 << TRANSPARENT_HUGEPAGE_SWAPIN_REQ_MADV_FLAG)) + return !!(vma->vm_flags & VM_HUGEPAGE); + + return false; +} #else /* CONFIG_THP_SWAP */ static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) { return 0; } + +static inline bool transparent_hugepage_swapin_enabled( + struct vm_area_struct *vma) +{ + return false; +} #endif /* CONFIG_THP_SWAP */ #endif /* _LINUX_HUGE_MM_H */ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index e1e95e6c86e3..8e8952938c25 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -57,7 +57,8 @@ unsigned long transparent_hugepage_flags __read_mostly = #endif (1<address); if (!page) { + if (!transparent_hugepage_swapin_enabled(vma)) + goto split; + page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma, haddr, false); if (!page) { @@ -1695,24 +1749,8 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) * Back out if somebody else faulted in this pmd * while we released the pmd lock. 
*/ - if (likely(pmd_same(*vmf->pmd, orig_pmd))) { - /* - * Failed to allocate huge page, split huge swap - * cluster, and fallback to swapin normal page - */ - ret = split_swap_cluster(entry, 0); - /* Somebody else swapin the swap entry, retry */ - if (ret == -EEXIST) { - ret = 0; - goto retry; - /* swapoff occurs under us */ - } else if (ret == -EINVAL) - ret = 0; - else { - count_vm_event(THP_SWPIN_FALLBACK); - goto fallback; - } - } + if (likely(pmd_same(*vmf->pmd, orig_pmd))) + goto split; delayacct_clear_flag(DELAYACCT_PF_SWAPIN); goto out; } @@ -1816,6 +1854,23 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) out_release: put_page(page); return ret; +split: + /* + * Failed to allocate huge page, split huge swap cluster, and + * fallback to swapin normal page + */ + ret = split_swap_cluster(entry, 0); + /* Somebody else swapin the swap entry, retry */ + if (ret == -EEXIST) { + ret = 0; + goto retry; + } + /* swapoff occurs under us */ + if (ret == -EINVAL) { + delayacct_clear_flag(DELAYACCT_PF_SWAPIN); + return 0; + } + count_vm_event(THP_SWPIN_FALLBACK); fallback: delayacct_clear_flag(DELAYACCT_PF_SWAPIN); if (!split_huge_swap_pmd(vmf->vma, vmf->pmd, vmf->address, orig_pmd)) From patchwork Fri Dec 14 06:27:46 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10730633 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9602091E for ; Fri, 14 Dec 2018 06:28:29 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 844182CC8C for ; Fri, 14 Dec 2018 06:28:29 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7719C2CC99; Fri, 14 Dec 2018 06:28:29 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B09782CC8C for ; Fri, 14 Dec 2018 06:28:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 499F58E01B7; Fri, 14 Dec 2018 01:28:20 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 41D688E0014; Fri, 14 Dec 2018 01:28:20 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 30F028E01B7; Fri, 14 Dec 2018 01:28:20 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pl1-f198.google.com (mail-pl1-f198.google.com [209.85.214.198]) by kanga.kvack.org (Postfix) with ESMTP id DDD538E0014 for ; Fri, 14 Dec 2018 01:28:19 -0500 (EST) Received: by mail-pl1-f198.google.com with SMTP id h10so2933692plk.12 for ; Thu, 13 Dec 2018 22:28:19 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=TNStQ7cvlFR7lsNAzUrKex6MPQ3IJBH0rx4+ssRGTBo=; b=pRAjxU6jefrAOVFU6Xxiv9hzzr+bRqzJG0S12KGRchuFedOopYymHiYb1k/8U1kyLZ cyrolYVPB/LvVCUZIhdgsyrjzNfKH7JeWzr5qa68rT/19pUn7ZXQM4QBCMusVMUgOl75 
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V9 13/21] swap: Support PMD swap mapping in swapoff Date: Fri, 14 Dec 2018 14:27:46 +0800 Message-Id: <20181214062754.13723-14-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com> References: <20181214062754.13723-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP During swapoff, for each PMD swap mapping, we will allocate a THP, read the contents of the huge swap cluster into the THP and change the PMD swap mapping to the PMD page mapping to the THP, then try to free the huge swap cluster. If failed to allocate a THP, the huge swap cluster will be split. If the swap cluster mapped by a PMD swap mapping has been split already, we will split the PMD swap mapping and unuse the PTEs. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/asm-generic/pgtable.h | 14 +---- include/linux/huge_mm.h | 8 +++ mm/huge_memory.c | 4 +- mm/swapfile.c | 108 +++++++++++++++++++++++++++++++++- 4 files changed, 119 insertions(+), 15 deletions(-) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 2a619f378297..d2d4d520e2e7 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -931,22 +931,12 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) barrier(); #endif /* - * !pmd_present() checks for pmd migration entries - * - * The complete check uses is_pmd_migration_entry() in linux/swapops.h - * But using that requires moving current function and pmd_trans_unstable() - * to linux/swapops.h to resovle dependency, which is too much code move. - * - * !pmd_present() is equivalent to is_pmd_migration_entry() currently, - * because !pmd_present() pages can only be under migration not swapped - * out. - * - * pmd_none() is preseved for future condition checks on pmd migration + * pmd_none() is preseved for future condition checks on pmd swap * entries and not confusing with this function name, although it is * redundant with !pmd_present(). 
*/ if (pmd_none(pmdval) || pmd_trans_huge(pmdval) || - (IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !pmd_present(pmdval))) + (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) && !pmd_present(pmdval))) return 1; if (unlikely(pmd_bad(pmdval))) { pmd_clear_bad(pmd); diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 06dbbcf6a6dd..7c72e63757af 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -374,6 +374,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #ifdef CONFIG_THP_SWAP +extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, pmd_t orig_pmd); extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); static inline bool transparent_hugepage_swapin_enabled( @@ -399,6 +401,12 @@ static inline bool transparent_hugepage_swapin_enabled( return false; } #else /* CONFIG_THP_SWAP */ +static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, pmd_t orig_pmd) +{ + return 0; +} + static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) { return 0; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 8e8952938c25..fdffa07bff98 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1706,8 +1706,8 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma, } #ifdef CONFIG_THP_SWAP -static int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, pmd_t orig_pmd) +int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, pmd_t orig_pmd) { struct mm_struct *mm = vma->vm_mm; spinlock_t *ptl; diff --git a/mm/swapfile.c b/mm/swapfile.c index e27fe24a1f41..454e993bc32f 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1931,6 +1931,11 @@ static inline int pte_same_as_swp(pte_t pte, pte_t swp_pte) return pte_same(pte_swp_clear_soft_dirty(pte), swp_pte); } +static inline int pmd_same_as_swp(pmd_t pmd, pmd_t swp_pmd) +{ + return pmd_same(pmd_swp_clear_soft_dirty(pmd), swp_pmd); +} + /* * No need to decide whether this PTE shares the swap entry with others, * just let do_wp_page work it out if a write is requested later - to @@ -1992,6 +1997,53 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, return ret; } +#ifdef CONFIG_THP_SWAP +static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long addr, swp_entry_t entry, struct page *page) +{ + struct mem_cgroup *memcg; + spinlock_t *ptl; + int ret = 1; + + if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, + &memcg, true)) { + ret = -ENOMEM; + goto out_nolock; + } + + ptl = pmd_lock(vma->vm_mm, pmd); + if (unlikely(!pmd_same_as_swp(*pmd, swp_entry_to_pmd(entry)))) { + mem_cgroup_cancel_charge(page, memcg, true); + ret = 0; + goto out; + } + + add_mm_counter(vma->vm_mm, MM_SWAPENTS, -HPAGE_PMD_NR); + add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR); + get_page(page); + set_pmd_at(vma->vm_mm, addr, pmd, + pmd_mkold(mk_huge_pmd(page, vma->vm_page_prot))); + page_add_anon_rmap(page, vma, addr, true); + mem_cgroup_commit_charge(page, memcg, true, true); + swap_free(entry, HPAGE_PMD_NR); + /* + * Move the page to the active list so it is not + * immediately swapped out again after swapon. 
+ */ + activate_page(page); +out: + spin_unlock(ptl); +out_nolock: + return ret; +} +#else +static int unuse_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long addr, swp_entry_t entry, struct page *page) +{ + return 0; +} +#endif + /* * unuse_pte can return 1. Use a unique return value in this * context to denote requested frontswap pages are unused. @@ -2072,14 +2124,68 @@ static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud, unsigned int type, unsigned long *fs_pages_to_unuse) { - pmd_t *pmd; + pmd_t *pmd, orig_pmd; + struct page *page; + swp_entry_t entry; + struct swap_info_struct *si; unsigned long next; int ret; + si = swap_info[type]; pmd = pmd_offset(pud, addr); do { cond_resched(); next = pmd_addr_end(addr, end); +restart: + orig_pmd = *pmd; + if (IS_ENABLED(CONFIG_THP_SWAP) && !*fs_pages_to_unuse && + is_swap_pmd(orig_pmd)) { + entry = pmd_to_swp_entry(orig_pmd); + if (swp_type(entry) != type) + continue; + + if (!transparent_hugepage_swapin_enabled(vma)) + goto split; + +swapin: + page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, + vma, addr, false); + if (!page) { + if (!pmd_same(*pmd, orig_pmd)) + goto restart; + goto split; + } + + /* + * Huge cluster has been split already, split + * PMD swap mapping and fallback to unuse PTE + */ + if (!PageTransCompound(page)) + goto fallback; + + lock_page(page); + wait_on_page_writeback(page); + ret = unuse_pmd(vma, pmd, addr, entry, page); + if (ret < 0) { + unlock_page(page); + put_page(page); + return ret; + } + + try_to_free_swap(page); + unlock_page(page); + put_page(page); + + continue; +split: + ret = split_swap_cluster(entry, 0); + if (ret == -EEXIST) + goto swapin; +fallback: + if (split_huge_swap_pmd(vma, pmd, + addr, orig_pmd)) + goto restart; + } if (pmd_none_or_trans_huge_or_clear_bad(pmd)) continue; ret = unuse_pte_range(vma, pmd, addr, next, type, From patchwork Fri Dec 14 06:27:47 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10730635 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0519214E2 for ; Fri, 14 Dec 2018 06:28:33 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E7EE52CC8C for ; Fri, 14 Dec 2018 06:28:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DB0752CC99; Fri, 14 Dec 2018 06:28:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B24702CC8C for ; Fri, 14 Dec 2018 06:28:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 77A678E01B8; Fri, 14 Dec 2018 01:28:23 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 72BF18E0014; Fri, 14 Dec 2018 01:28:23 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5F4128E01B8; Fri, 14 Dec 2018 01:28:23 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from 
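To make the goto-heavy unuse_pmd_range() hunk above easier to follow, the handling of a single PMD swap mapping during swapoff amounts to the following (a simplified restatement of the code above, not additional code):

	/* one PMD swap mapping belonging to the device being swapped off */
	if (!transparent_hugepage_swapin_enabled(vma))
		goto split;				/* policy forbids THP swapin */
swapin:
	page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, vma, addr, false);
	if (!page)
		goto split;				/* failed to allocate a THP */
	if (!PageTransCompound(page))
		goto fallback;				/* huge cluster already split */
	unuse_pmd(vma, pmd, addr, entry, page);		/* map the THP with a huge PMD */
	try_to_free_swap(page);				/* then try to free the huge cluster */
	continue;					/* next PMD */
split:
	if (split_swap_cluster(entry, 0) == -EEXIST)
		goto swapin;				/* raced with another swapin, retry */
fallback:
	split_huge_swap_pmd(vma, pmd, addr, orig_pmd);	/* fall back to unuse_pte_range() */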
[134.134.136.100]) by mx.google.com with ESMTPS id v19si3555849pfa.80.2018.12.13.22.28.21 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 13 Dec 2018 22:28:21 -0800 (PST) Received-SPF: pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) client-ip=134.134.136.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Dec 2018 22:28:21 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,351,1539673200"; d="scan'208";a="125841038" Received: from yhuang-mobile.sh.intel.com ([10.239.197.226]) by fmsmga002.fm.intel.com with ESMTP; 13 Dec 2018 22:28:18 -0800 From: Huang Ying To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V9 14/21] swap: Support PMD swap mapping in madvise_free() Date: Fri, 14 Dec 2018 14:27:47 +0800 Message-Id: <20181214062754.13723-15-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com> References: <20181214062754.13723-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP When madvise_free() found a PMD swap mapping, if only part of the huge swap cluster is operated on, the PMD swap mapping will be split and fallback to PTE swap mapping processing. Otherwise, if all huge swap cluster is operated on, free_swap_and_cache() will be called to decrease the PMD swap mapping count and probably free the swap space and the THP in swap cache too. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/huge_memory.c | 52 ++++++++++++++++++++++++++++++++++-------------- mm/madvise.c | 2 +- 2 files changed, 38 insertions(+), 16 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index fdffa07bff98..c895c2a2db6e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1883,6 +1883,15 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd) } #endif +static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) +{ + pgtable_t pgtable; + + pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pte_free(mm, pgtable); + mm_dec_nr_ptes(mm); +} + /* * Return true if we do MADV_FREE successfully on entire pmd page. * Otherwise, return false. 
@@ -1903,15 +1912,37 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, goto out_unlocked; orig_pmd = *pmd; - if (is_huge_zero_pmd(orig_pmd)) - goto out; - if (unlikely(!pmd_present(orig_pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(orig_pmd)); - goto out; + swp_entry_t entry = pmd_to_swp_entry(orig_pmd); + + if (is_migration_entry(entry)) { + VM_BUG_ON(!thp_migration_supported()); + goto out; + } else if (IS_ENABLED(CONFIG_THP_SWAP) && + !non_swap_entry(entry)) { + /* + * If part of THP is discarded, split the PMD + * swap mapping and operate on the PTEs + */ + if (next - addr != HPAGE_PMD_SIZE) { + __split_huge_swap_pmd(vma, addr, pmd); + goto out; + } + free_swap_and_cache(entry, HPAGE_PMD_NR); + pmd_clear(pmd); + zap_deposited_table(mm, pmd); + if (current->mm == mm) + sync_mm_rss(mm); + add_mm_counter(mm, MM_SWAPENTS, -HPAGE_PMD_NR); + ret = true; + goto out; + } else + VM_BUG_ON(1); } + if (is_huge_zero_pmd(orig_pmd)) + goto out; + page = pmd_page(orig_pmd); /* * If other processes are mapping this page, we couldn't discard @@ -1957,15 +1988,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, return ret; } -static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) -{ - pgtable_t pgtable; - - pgtable = pgtable_trans_huge_withdraw(mm, pmd); - pte_free(mm, pgtable); - mm_dec_nr_ptes(mm); -} - int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr) { diff --git a/mm/madvise.c b/mm/madvise.c index fac48161b015..c1845dab2dd4 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -321,7 +321,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, unsigned long next; next = pmd_addr_end(addr, end); - if (pmd_trans_huge(*pmd)) + if (pmd_trans_huge(*pmd) || is_swap_pmd(*pmd)) if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next)) goto next; From patchwork Fri Dec 14 06:27:48 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10730651 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6251314DE for ; Fri, 14 Dec 2018 06:32:30 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4FDC22C8FE for ; Fri, 14 Dec 2018 06:32:30 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 410DA2C90D; Fri, 14 Dec 2018 06:32:30 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0BF522C8FE for ; Fri, 14 Dec 2018 06:32:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 110268E01C0; Fri, 14 Dec 2018 01:32:28 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 098388E0014; Fri, 14 Dec 2018 01:32:28 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E7B508E01C0; Fri, 14 Dec 2018 01:32:27 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: 
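In other words, the madvise_free_huge_pmd() change above handles a PMD swap mapping in one of two ways (a commented restatement of that hunk):

	swp_entry_t entry = pmd_to_swp_entry(orig_pmd);

	if (next - addr != HPAGE_PMD_SIZE) {
		/* MADV_FREE covers only part of the huge cluster: split the
		 * PMD swap mapping and let the PTE path do the work */
		__split_huge_swap_pmd(vma, addr, pmd);
	} else {
		/* the whole cluster is covered: drop the swap reference, which
		 * may free the swap space and the THP in the swap cache */
		free_swap_and_cache(entry, HPAGE_PMD_NR);
		pmd_clear(pmd);
		zap_deposited_table(mm, pmd);
		add_mm_counter(mm, MM_SWAPENTS, -HPAGE_PMD_NR);
	}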
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V9 15/21] swap: Support to move swap account for PMD swap mapping
Date: Fri, 14 Dec 2018 14:27:48 +0800
Message-Id: <20181214062754.13723-16-ying.huang@intel.com>
In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com>
References: <20181214062754.13723-1-ying.huang@intel.com>

Previously, the huge swap cluster was split after the THP had been swapped out. Now, to support swapping the THP back in as one piece, the huge swap cluster is no longer split after the THP is reclaimed. So in memcg we need to move the swap account for PMD swap mappings in the process's page table: when the page table is scanned while moving the memcg charge, PMD swap mappings are identified, and mem_cgroup_move_swap_account() and its callees are revised to move the account for the whole huge swap cluster. If the swap cluster mapped by the PMD has already been split, the PMD swap mapping is split too and processing falls back to PTEs.

Signed-off-by: "Huang, Ying" Cc: "Kirill A.
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- include/linux/huge_mm.h | 7 ++ include/linux/swap.h | 6 ++ include/linux/swap_cgroup.h | 3 +- mm/huge_memory.c | 7 +- mm/memcontrol.c | 131 ++++++++++++++++++++++++++++-------- mm/swap_cgroup.c | 45 ++++++++++--- mm/swapfile.c | 14 ++++ 7 files changed, 173 insertions(+), 40 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 7c72e63757af..3c05294689c1 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -374,6 +374,8 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #ifdef CONFIG_THP_SWAP +extern void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long addr, pmd_t *pmd); extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd); extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd); @@ -401,6 +403,11 @@ static inline bool transparent_hugepage_swapin_enabled( return false; } #else /* CONFIG_THP_SWAP */ +static inline void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long addr, pmd_t *pmd) +{ +} + static inline int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd) { diff --git a/include/linux/swap.h b/include/linux/swap.h index 4bd532c9315e..6463784fd5e8 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -622,6 +622,7 @@ static inline swp_entry_t get_swap_page(struct page *page) #ifdef CONFIG_THP_SWAP extern int split_swap_cluster(swp_entry_t entry, unsigned long flags); extern int split_swap_cluster_map(swp_entry_t entry); +extern int get_swap_entry_size(swp_entry_t entry); #else static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags) { @@ -632,6 +633,11 @@ static inline int split_swap_cluster_map(swp_entry_t entry) { return 0; } + +static inline int get_swap_entry_size(swp_entry_t entry) +{ + return 1; +} #endif #ifdef CONFIG_MEMCG diff --git a/include/linux/swap_cgroup.h b/include/linux/swap_cgroup.h index a12dd1c3966c..c40fb52b0563 100644 --- a/include/linux/swap_cgroup.h +++ b/include/linux/swap_cgroup.h @@ -7,7 +7,8 @@ #ifdef CONFIG_MEMCG_SWAP extern unsigned short swap_cgroup_cmpxchg(swp_entry_t ent, - unsigned short old, unsigned short new); + unsigned short old, unsigned short new, + unsigned int nr_ents); extern unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id, unsigned int nr_ents); extern unsigned short lookup_swap_cgroup_id(swp_entry_t ent); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index c895c2a2db6e..e460241ea761 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1670,10 +1670,10 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd) return 0; } +#ifdef CONFIG_THP_SWAP /* Convert a PMD swap mapping to a set of PTE swap mappings */ -static void __split_huge_swap_pmd(struct vm_area_struct *vma, - unsigned long addr, - pmd_t *pmd) +void __split_huge_swap_pmd(struct vm_area_struct *vma, + unsigned long addr, pmd_t *pmd) { struct mm_struct *mm = vma->vm_mm; pgtable_t pgtable; @@ -1705,7 +1705,6 @@ static void __split_huge_swap_pmd(struct vm_area_struct *vma, pmd_populate(mm, pmd, pgtable); } -#ifdef CONFIG_THP_SWAP int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, pmd_t orig_pmd) { diff --git 
a/mm/memcontrol.c b/mm/memcontrol.c index b860dd4f75f2..ac1abfcfab88 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2667,9 +2667,10 @@ void mem_cgroup_split_huge_fixup(struct page *head) #ifdef CONFIG_MEMCG_SWAP /** * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record. - * @entry: swap entry to be moved + * @entry: the first swap entry to be moved * @from: mem_cgroup which the entry is moved from * @to: mem_cgroup which the entry is moved to + * @nr_ents: number of swap entries * * It succeeds only when the swap_cgroup's record for this entry is the same * as the mem_cgroup's id of @from. @@ -2680,23 +2681,27 @@ void mem_cgroup_split_huge_fixup(struct page *head) * both res and memsw, and called css_get(). */ static int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) + struct mem_cgroup *from, + struct mem_cgroup *to, + unsigned int nr_ents) { unsigned short old_id, new_id; old_id = mem_cgroup_id(from); new_id = mem_cgroup_id(to); - if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) { - mod_memcg_state(from, MEMCG_SWAP, -1); - mod_memcg_state(to, MEMCG_SWAP, 1); + if (swap_cgroup_cmpxchg(entry, old_id, new_id, nr_ents) == old_id) { + mod_memcg_state(from, MEMCG_SWAP, -nr_ents); + mod_memcg_state(to, MEMCG_SWAP, nr_ents); return 0; } return -EINVAL; } #else static inline int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) + struct mem_cgroup *from, + struct mem_cgroup *to, + unsigned int nr_ents) { return -EINVAL; } @@ -4649,6 +4654,7 @@ enum mc_target_type { MC_TARGET_PAGE, MC_TARGET_SWAP, MC_TARGET_DEVICE, + MC_TARGET_FALLBACK, }; static struct page *mc_handle_present_pte(struct vm_area_struct *vma, @@ -4715,6 +4721,28 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, } #endif +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static struct page *mc_handle_swap_pmd(struct vm_area_struct *vma, + pmd_t pmd, swp_entry_t *entry) +{ + struct page *page = NULL; + swp_entry_t ent = pmd_to_swp_entry(pmd); + + if (!(mc.flags & MOVE_ANON) || non_swap_entry(ent)) + return NULL; + + /* + * Because lookup_swap_cache() updates some statistics counter, + * we call find_get_page() with swapper_space directly. + */ + page = find_get_page(swap_address_space(ent), swp_offset(ent)); + if (do_memsw_account()) + entry->val = ent.val; + + return page; +} +#endif + static struct page *mc_handle_file_pte(struct vm_area_struct *vma, unsigned long addr, pte_t ptent, swp_entry_t *entry) { @@ -4903,7 +4931,9 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, * There is a swap entry and a page doesn't exist or isn't charged. * But we cannot move a tail-page in a THP. */ - if (ent.val && !ret && (!page || !PageTransCompound(page)) && + if (ent.val && !ret && + ((page && !PageTransCompound(page)) || + (!page && get_swap_entry_size(ent) == 1)) && mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) { ret = MC_TARGET_SWAP; if (target) @@ -4914,37 +4944,64 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, #ifdef CONFIG_TRANSPARENT_HUGEPAGE /* - * We don't consider PMD mapped swapping or file mapped pages because THP does - * not support them for now. - * Caller should make sure that pmd_trans_huge(pmd) is true. + * We don't consider file mapped pages because THP does not support + * them for now. 
*/ static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) + unsigned long addr, pmd_t *pmdp, union mc_target *target) { + pmd_t pmd = *pmdp; struct page *page = NULL; enum mc_target_type ret = MC_TARGET_NONE; + swp_entry_t ent = { .val = 0 }; if (unlikely(is_swap_pmd(pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(pmd)); - return ret; + if (is_pmd_migration_entry(pmd)) { + VM_BUG_ON(!thp_migration_supported()); + return ret; + } + if (!IS_ENABLED(CONFIG_THP_SWAP)) { + VM_BUG_ON(1); + return ret; + } + page = mc_handle_swap_pmd(vma, pmd, &ent); + /* The swap cluster has been split under us */ + if ((page && !PageTransHuge(page)) || + (!page && ent.val && get_swap_entry_size(ent) == 1)) { + __split_huge_swap_pmd(vma, addr, pmdp); + ret = MC_TARGET_FALLBACK; + goto out; + } + } else { + page = pmd_page(pmd); + get_page(page); } - page = pmd_page(pmd); - VM_BUG_ON_PAGE(!page || !PageHead(page), page); + VM_BUG_ON_PAGE(page && !PageHead(page), page); if (!(mc.flags & MOVE_ANON)) - return ret; - if (page->mem_cgroup == mc.from) { + goto out; + if (!page && !ent.val) + goto out; + if (page && page->mem_cgroup == mc.from) { ret = MC_TARGET_PAGE; if (target) { get_page(page); target->page = page; } } + if (ent.val && !ret && !page && + mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) { + ret = MC_TARGET_SWAP; + if (target) + target->ent = ent; + } +out: + if (page) + put_page(page); return ret; } #else static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) + unsigned long addr, pmd_t *pmdp, union mc_target *target) { return MC_TARGET_NONE; } @@ -4957,6 +5014,7 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, struct vm_area_struct *vma = walk->vma; pte_t *pte; spinlock_t *ptl; + int ret; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { @@ -4965,12 +5023,16 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, * support transparent huge page with MEMORY_DEVICE_PUBLIC or * MEMORY_DEVICE_PRIVATE but this might change. 
*/ - if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE) - mc.precharge += HPAGE_PMD_NR; + ret = get_mctgt_type_thp(vma, addr, pmd, NULL); spin_unlock(ptl); + if (ret == MC_TARGET_FALLBACK) + goto fallback; + if (ret) + mc.precharge += HPAGE_PMD_NR; return 0; } +fallback: if (pmd_trans_unstable(pmd)) return 0; pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); @@ -5161,6 +5223,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, enum mc_target_type target_type; union mc_target target; struct page *page; + swp_entry_t ent; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { @@ -5168,8 +5231,9 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, spin_unlock(ptl); return 0; } - target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); - if (target_type == MC_TARGET_PAGE) { + target_type = get_mctgt_type_thp(vma, addr, pmd, &target); + switch (target_type) { + case MC_TARGET_PAGE: page = target.page; if (!isolate_lru_page(page)) { if (!mem_cgroup_move_account(page, true, @@ -5180,7 +5244,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, putback_lru_page(page); } put_page(page); - } else if (target_type == MC_TARGET_DEVICE) { + break; + case MC_TARGET_DEVICE: page = target.page; if (!mem_cgroup_move_account(page, true, mc.from, mc.to)) { @@ -5188,9 +5253,21 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, mc.moved_charge += HPAGE_PMD_NR; } put_page(page); + break; + case MC_TARGET_SWAP: + ent = target.ent; + if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to, + HPAGE_PMD_NR)) { + mc.precharge -= HPAGE_PMD_NR; + mc.moved_swap += HPAGE_PMD_NR; + } + break; + default: + break; } spin_unlock(ptl); - return 0; + if (target_type != MC_TARGET_FALLBACK) + return 0; } if (pmd_trans_unstable(pmd)) @@ -5200,7 +5277,6 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, for (; addr != end; addr += PAGE_SIZE) { pte_t ptent = *(pte++); bool device = false; - swp_entry_t ent; if (!mc.precharge) break; @@ -5234,7 +5310,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, break; case MC_TARGET_SWAP: ent = target.ent; - if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) { + if (!mem_cgroup_move_swap_account(ent, mc.from, + mc.to, 1)) { mc.precharge--; /* we fixup refcnts and charges later. */ mc.moved_swap++; diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c index 45affaef3bc6..ccc08e88962a 100644 --- a/mm/swap_cgroup.c +++ b/mm/swap_cgroup.c @@ -87,29 +87,58 @@ static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent, /** * swap_cgroup_cmpxchg - cmpxchg mem_cgroup's id for this swp_entry. - * @ent: swap entry to be cmpxchged + * @ent: the first swap entry to be cmpxchged * @old: old id * @new: new id + * @nr_ents: number of swap entries * * Returns old id at success, 0 at failure. 
* (There is no mem_cgroup using 0 as its id) */ unsigned short swap_cgroup_cmpxchg(swp_entry_t ent, - unsigned short old, unsigned short new) + unsigned short old, unsigned short new, + unsigned int nr_ents) { struct swap_cgroup_ctrl *ctrl; - struct swap_cgroup *sc; + struct swap_cgroup *sc_start, *sc; unsigned long flags; unsigned short retval; + pgoff_t offset_start = swp_offset(ent), offset; + pgoff_t end = offset_start + nr_ents; - sc = lookup_swap_cgroup(ent, &ctrl); + sc_start = lookup_swap_cgroup(ent, &ctrl); spin_lock_irqsave(&ctrl->lock, flags); - retval = sc->id; - if (retval == old) + sc = sc_start; + offset = offset_start; + for (;;) { + if (sc->id != old) { + retval = 0; + goto out; + } + offset++; + if (offset == end) + break; + if (offset % SC_PER_PAGE) + sc++; + else + sc = __lookup_swap_cgroup(ctrl, offset); + } + + sc = sc_start; + offset = offset_start; + for (;;) { sc->id = new; - else - retval = 0; + offset++; + if (offset == end) + break; + if (offset % SC_PER_PAGE) + sc++; + else + sc = __lookup_swap_cgroup(ctrl, offset); + } + retval = old; +out: spin_unlock_irqrestore(&ctrl->lock, flags); return retval; } diff --git a/mm/swapfile.c b/mm/swapfile.c index 454e993bc32f..f4c458768bd4 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1730,6 +1730,20 @@ static int page_trans_huge_map_swapcount(struct page *page, int *total_mapcount, return map_swapcount; } +#ifdef CONFIG_THP_SWAP +int get_swap_entry_size(swp_entry_t entry) +{ + struct swap_info_struct *si; + struct swap_cluster_info *ci; + + si = _swap_info_get(entry); + if (!si || !si->cluster_info) + return 1; + ci = si->cluster_info + swp_offset(entry) / SWAPFILE_CLUSTER; + return cluster_is_huge(ci) ? SWAPFILE_CLUSTER : 1; +} +#endif + /* * We can write to an anon page without COW if there are no other references * to it. 
And as a side-effect, free up its swap: because the old content
From patchwork Fri Dec 14 06:27:49 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10730637
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V9 16/21] swap: Support to copy PMD swap mapping when fork()
Date: Fri, 14 Dec 2018 14:27:49 +0800
Message-Id: <20181214062754.13723-17-ying.huang@intel.com>
In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com>
References: <20181214062754.13723-1-ying.huang@intel.com>

During fork, the page table needs to be copied from parent to child. A PMD swap mapping needs to be copied too, and the swap reference count needs to be increased. When the huge swap cluster has already been split, we need to split the PMD swap mapping and fall back to PTE copying. When swap count continuation fails to allocate a page with GFP_ATOMIC, we need to unlock the spinlock and retry with GFP_KERNEL.

Signed-off-by: "Huang, Ying" Cc: "Kirill A.
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/huge_memory.c | 72 ++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 57 insertions(+), 15 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index e460241ea761..b083c66a9d09 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -974,6 +974,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, if (unlikely(!pgtable)) goto out; +retry: dst_ptl = pmd_lock(dst_mm, dst_pmd); src_ptl = pmd_lockptr(src_mm, src_pmd); spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); @@ -981,26 +982,67 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, ret = -EAGAIN; pmd = *src_pmd; -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION if (unlikely(is_swap_pmd(pmd))) { swp_entry_t entry = pmd_to_swp_entry(pmd); - VM_BUG_ON(!is_pmd_migration_entry(pmd)); - if (is_write_migration_entry(entry)) { - make_migration_entry_read(&entry); - pmd = swp_entry_to_pmd(entry); - if (pmd_swp_soft_dirty(*src_pmd)) - pmd = pmd_swp_mksoft_dirty(pmd); - set_pmd_at(src_mm, addr, src_pmd, pmd); +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION + if (is_migration_entry(entry)) { + if (is_write_migration_entry(entry)) { + make_migration_entry_read(&entry); + pmd = swp_entry_to_pmd(entry); + if (pmd_swp_soft_dirty(*src_pmd)) + pmd = pmd_swp_mksoft_dirty(pmd); + set_pmd_at(src_mm, addr, src_pmd, pmd); + } + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + ret = 0; + goto out_unlock; } - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - set_pmd_at(dst_mm, addr, dst_pmd, pmd); - ret = 0; - goto out_unlock; - } #endif + if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry)) { + ret = swap_duplicate(&entry, HPAGE_PMD_NR); + if (!ret) { + add_mm_counter(dst_mm, MM_SWAPENTS, + HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, + pgtable); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + /* make sure dst_mm is on swapoff's mmlist. 
*/ + if (unlikely(list_empty(&dst_mm->mmlist))) { + spin_lock(&mmlist_lock); + if (list_empty(&dst_mm->mmlist)) + list_add(&dst_mm->mmlist, + &src_mm->mmlist); + spin_unlock(&mmlist_lock); + } + } else if (ret == -ENOTDIR) { + /* + * The huge swap cluster has been split, split + * the PMD swap mapping and fallback to PTE + */ + __split_huge_swap_pmd(vma, addr, src_pmd); + pte_free(dst_mm, pgtable); + } else if (ret == -ENOMEM) { + spin_unlock(src_ptl); + spin_unlock(dst_ptl); + ret = add_swap_count_continuation(entry, + GFP_KERNEL); + if (ret < 0) { + ret = -ENOMEM; + pte_free(dst_mm, pgtable); + goto out; + } + goto retry; + } else + VM_BUG_ON(1); + goto out_unlock; + } + VM_BUG_ON(1); + } if (unlikely(!pmd_trans_huge(pmd))) { pte_free(dst_mm, pgtable); From patchwork Fri Dec 14 06:27:50 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10730641 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2AF8C14E2 for ; Fri, 14 Dec 2018 06:28:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 176FA27F3E for ; Fri, 14 Dec 2018 06:28:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0B1CB2C11D; Fri, 14 Dec 2018 06:28:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 380FD27F3E for ; Fri, 14 Dec 2018 06:28:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2735D8E01BA; Fri, 14 Dec 2018 01:28:32 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 1D5188E0014; Fri, 14 Dec 2018 01:28:32 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 001E18E01BA; Fri, 14 Dec 2018 01:28:31 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f200.google.com (mail-pf1-f200.google.com [209.85.210.200]) by kanga.kvack.org (Postfix) with ESMTP id AC6978E0014 for ; Fri, 14 Dec 2018 01:28:31 -0500 (EST) Received: by mail-pf1-f200.google.com with SMTP id s71so3538137pfi.22 for ; Thu, 13 Dec 2018 22:28:31 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=+rlS1q2H7SC2NClWo6jHuSbLY+tZwEhTesLeIKdDMqs=; b=GsjVYWeu7XW7yXBaSvem9aIxibKYpyY31rbjdC55ee0FJ2mCg0T2/p1zkA5bAN7EuA ukEKPEVUvNK+DZf56qUtdi0f1ytoFovxLlzPx7+98anrnr4pNkTmPU+/rfpVK9z0fIwj jS8VnySxRiemHcmJdxcTluW21qrF8QOglC2Eod3Zvita0OsKvHb3NfjEnXvAmyoc2q+f iBtQUaY58StEea5gr+jolKZGA49O5oXLdUbhYIt+VQrqTf86cSlvg2ZV8Axi7a9xm2oS hzJdF7I4xK8LdfxH9GlGYhSajdAueo/MP8mgVfEqGB6aaVTPN1fmi74pwHb0KKs06Yn2 pGtA== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of ying.huang@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com 
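As a companion to the copy_huge_pmd() change in the fork() patch above, here is an illustrative userspace sketch (not part of the patch) of the scenario it handles: a parent whose THP-backed region has been pushed to swap forks, and the child inherits the PMD swap mapping, so the swap reference count must be duplicated for the whole cluster. It assumes a 2MB THP size; MADV_PAGEOUT exists only on newer kernels and is used here merely as a convenient way to request reclaim.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define THP_SIZE (2UL << 20)	/* assumed PMD/THP size */

int main(void)
{
	pid_t pid;
	int status;
	char *buf = mmap(NULL, THP_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	madvise(buf, THP_SIZE, MADV_HUGEPAGE);
	memset(buf, 0x5a, THP_SIZE);

	/*
	 * Ask for reclaim of this range (MADV_PAGEOUT, newer kernels); under
	 * memory pressure the THP may then be represented in our page table
	 * as a single PMD swap mapping.
	 */
	madvise(buf, THP_SIZE, MADV_PAGEOUT);

	/*
	 * fork() copies the page table.  With the patch, a PMD swap mapping
	 * is copied in one piece and swap_duplicate() raises the reference
	 * count for the whole huge swap cluster.
	 */
	pid = fork();
	if (pid < 0) {
		perror("fork");
		return 1;
	}
	if (pid == 0)
		return buf[0] == 0x5a ? 0 : 1;	/* child: fault data back in */
	waitpid(pid, &status, 0);
	printf("child saw expected data: %s\n",
	       WEXITSTATUS(status) == 0 ? "yes" : "no");
	munmap(buf, THP_SIZE);
	return 0;
}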
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A.
Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan Subject: [PATCH -V9 17/21] swap: Free PMD swap mapping when zap_huge_pmd() Date: Fri, 14 Dec 2018 14:27:50 +0800 Message-Id: <20181214062754.13723-18-ying.huang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com> References: <20181214062754.13723-1-ying.huang@intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP For a PMD swap mapping, zap_huge_pmd() will clear the PMD and call free_swap_and_cache() to decrease the swap reference count and maybe free or split the huge swap cluster and the THP in swap cache. Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/huge_memory.c | 32 +++++++++++++++++++++----------- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index b083c66a9d09..6d144d687e69 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2055,7 +2055,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, spin_unlock(ptl); if (is_huge_zero_pmd(orig_pmd)) tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); - } else if (is_huge_zero_pmd(orig_pmd)) { + } else if (pmd_present(orig_pmd) && is_huge_zero_pmd(orig_pmd)) { zap_deposited_table(tlb->mm, pmd); spin_unlock(ptl); tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); @@ -2068,17 +2068,27 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, page_remove_rmap(page, true); VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); VM_BUG_ON_PAGE(!PageHead(page), page); - } else if (thp_migration_supported()) { - swp_entry_t entry; - - VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); - entry = pmd_to_swp_entry(orig_pmd); - page = pfn_to_page(swp_offset(entry)); + } else { + swp_entry_t entry = pmd_to_swp_entry(orig_pmd); + + if (thp_migration_supported() && + is_migration_entry(entry)) + page = pfn_to_page(swp_offset(entry)); + else if (IS_ENABLED(CONFIG_THP_SWAP) && + !non_swap_entry(entry)) + free_swap_and_cache(entry, HPAGE_PMD_NR); + else { + WARN_ONCE(1, +"Non present huge pmd without pmd migration or swap enabled!"); + goto unlock; + } flush_needed = 0; - } else - WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); + } - if (PageAnon(page)) { + if (!page) { + zap_deposited_table(tlb->mm, pmd); + add_mm_counter(tlb->mm, MM_SWAPENTS, -HPAGE_PMD_NR); + } else if (PageAnon(page)) { zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); } else { @@ -2086,7 +2096,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, zap_deposited_table(tlb->mm, pmd); add_mm_counter(tlb->mm, mm_counter_file(page), -HPAGE_PMD_NR); } - +unlock: spin_unlock(ptl); if (flush_needed) tlb_remove_page_size(tlb, page, HPAGE_PMD_SIZE); From patchwork Fri Dec 14 06:27:51 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10730639 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by 
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V9 18/21] swap: Support PMD swap mapping for MADV_WILLNEED
Date: Fri, 14 Dec 2018 14:27:51 +0800
Message-Id: <20181214062754.13723-19-ying.huang@intel.com>
In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com>
References: <20181214062754.13723-1-ying.huang@intel.com>

During MADV_WILLNEED, for a PMD swap mapping, if THP swapin is enabled for the VMA, the whole swap cluster will be swapped in. Otherwise, the huge swap cluster and the PMD swap mapping will be split, and processing falls back to PTE swap mappings.

Signed-off-by: "Huang, Ying" Cc: "Kirill A.
Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan --- mm/madvise.c | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index c1845dab2dd4..84d055c19dd4 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -196,14 +196,36 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, pte_t *orig_pte; struct vm_area_struct *vma = walk->private; unsigned long index; + swp_entry_t entry; + struct page *page; + pmd_t pmdval; + + pmdval = *pmd; + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(pmdval) && + !is_pmd_migration_entry(pmdval)) { + entry = pmd_to_swp_entry(pmdval); + if (!transparent_hugepage_swapin_enabled(vma)) { + if (!split_swap_cluster(entry, 0)) + split_huge_swap_pmd(vma, pmd, start, pmdval); + } else { + page = read_swap_cache_async(entry, + GFP_HIGHUSER_MOVABLE, + vma, start, false); + if (page) { + /* The swap cluster has been split under us */ + if (!PageTransHuge(page)) + split_huge_swap_pmd(vma, pmd, start, + pmdval); + put_page(page); + } + } + } if (pmd_none_or_trans_huge_or_clear_bad(pmd)) return 0; for (index = start; index != end; index += PAGE_SIZE) { pte_t pte; - swp_entry_t entry; - struct page *page; spinlock_t *ptl; orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl); From patchwork Fri Dec 14 06:27:52 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10730643 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DC28F91E for ; Fri, 14 Dec 2018 06:28:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C9ABC2BE76 for ; Fri, 14 Dec 2018 06:28:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BD1DF2C1CC; Fri, 14 Dec 2018 06:28:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1DB3D2BE76 for ; Fri, 14 Dec 2018 06:28:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B71828E01BC; Fri, 14 Dec 2018 01:28:37 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id B4DE78E0014; Fri, 14 Dec 2018 01:28:37 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A12158E01BC; Fri, 14 Dec 2018 01:28:37 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f198.google.com (mail-pf1-f198.google.com [209.85.210.198]) by kanga.kvack.org (Postfix) with ESMTP id 5D8F08E0014 for ; Fri, 14 Dec 2018 01:28:37 -0500 (EST) Received: by mail-pf1-f198.google.com with SMTP id 82so3551028pfs.20 for ; Thu, 13 Dec 2018 22:28:37 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc 
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying , "Kirill A. Shutemov" , Andrea Arcangeli , Michal Hocko , Johannes Weiner , Shaohua Li , Hugh Dickins , Minchan Kim , Rik van Riel , Dave Hansen , Naoya Horiguchi , Zi Yan , Daniel Jordan
Subject: [PATCH -V9 19/21] swap: Support PMD swap mapping in mincore()
Date: Fri, 14 Dec 2018 14:27:52 +0800
Message-Id: <20181214062754.13723-20-ying.huang@intel.com>
In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com>
References: <20181214062754.13723-1-ying.huang@intel.com>

During mincore(), for a PMD swap mapping, the swap cache is looked up. If the resulting page isn't a compound page, the PMD swap mapping is split and processing falls back to PTE swap mappings.

Signed-off-by: "Huang, Ying" Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Michal Hocko Cc: Johannes Weiner Cc: Shaohua Li Cc: Hugh Dickins Cc: Minchan Kim Cc: Rik van Riel Cc: Dave Hansen Cc: Naoya Horiguchi Cc: Zi Yan Cc: Daniel Jordan
--- mm/mincore.c | 37 +++++++++++++++++++++++++++++------ 1 file changed, 31 insertions(+), 6 deletions(-)
diff --git a/mm/mincore.c b/mm/mincore.c index aa0e542569f9..1d861fac82ee 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -48,7 +48,8 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr, * and is up to date; i.e. that no page-in operation would be required * at this time if an application were to map and access this page.
*/ -static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff) +static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff, + bool *compound) { unsigned char present = 0; struct page *page; @@ -86,6 +87,8 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff) #endif if (page) { present = PageUptodate(page); + if (compound) + *compound = PageCompound(page); put_page(page); } @@ -103,7 +106,8 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end, pgoff = linear_page_index(vma, addr); for (i = 0; i < nr; i++, pgoff++) - vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff); + vec[i] = mincore_page(vma->vm_file->f_mapping, + pgoff, NULL); } else { for (i = 0; i < nr; i++) vec[i] = 0; @@ -127,14 +131,36 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, pte_t *ptep; unsigned char *vec = walk->private; int nr = (end - addr) >> PAGE_SHIFT; + swp_entry_t entry; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { - memset(vec, 1, nr); + unsigned char val = 1; + bool compound; + + if (IS_ENABLED(CONFIG_THP_SWAP) && is_swap_pmd(*pmd)) { + entry = pmd_to_swp_entry(*pmd); + if (!non_swap_entry(entry)) { + val = mincore_page(swap_address_space(entry), + swp_offset(entry), + &compound); + /* + * The huge swap cluster has been + * split under us + */ + if (!compound) { + __split_huge_swap_pmd(vma, addr, pmd); + spin_unlock(ptl); + goto fallback; + } + } + } + memset(vec, val, nr); spin_unlock(ptl); goto out; } +fallback: if (pmd_trans_unstable(pmd)) { __mincore_unmapped_range(addr, end, vma, vec); goto out; @@ -150,8 +176,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, else if (pte_present(pte)) *vec = 1; else { /* pte is a swap entry */ - swp_entry_t entry = pte_to_swp_entry(pte); - + entry = pte_to_swp_entry(pte); if (non_swap_entry(entry)) { /* * migration or hwpoison entries are always @@ -161,7 +186,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, } else { #ifdef CONFIG_SWAP *vec = mincore_page(swap_address_space(entry), - swp_offset(entry)); + swp_offset(entry), NULL); #else WARN_ON(1); *vec = 1; From patchwork Fri Dec 14 06:27:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 10730645 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 42E4491E for ; Fri, 14 Dec 2018 06:28:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 30DAA2BE76 for ; Fri, 14 Dec 2018 06:28:45 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 24EEE2C4A9; Fri, 14 Dec 2018 06:28:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7379E2BE76 for ; Fri, 14 Dec 2018 06:28:44 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9022C8E01BD; Fri, 14 Dec 2018 01:28:40 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 8B3DE8E0014; Fri, 14 Dec 
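The mincore() change above can be observed from userspace with a residency check over the 2MB range; a sketch follows (illustrative only; it assumes 4KB base pages, a 2MB THP size, and MADV_PAGEOUT on newer kernels).

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define THP_SIZE (2UL << 20)	/* assumed PMD/THP size */

int main(void)
{
	unsigned char vec[512];		/* one byte per 4KB page of the 2MB range */
	size_t npages = THP_SIZE / sysconf(_SC_PAGESIZE);
	size_t i, resident = 0;
	char *buf = mmap(NULL, THP_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	madvise(buf, THP_SIZE, MADV_HUGEPAGE);
	memset(buf, 0x22, THP_SIZE);
	madvise(buf, THP_SIZE, MADV_PAGEOUT);	/* push towards swap (5.4+) */

	/*
	 * mincore() reports residency per base page.  For a PMD swap mapping
	 * the patched kernel checks the swap cache once for the whole huge
	 * swap cluster; if the cluster was already split, the PMD swap
	 * mapping is split too and the existing per-PTE logic runs.
	 */
	if (mincore(buf, THP_SIZE, vec)) {
		perror("mincore");
		return 1;
	}
	for (i = 0; i < npages && i < sizeof(vec); i++)
		resident += vec[i] & 1;
	printf("%zu of %zu pages resident\n", resident, npages);
	munmap(buf, THP_SIZE);
	return 0;
}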
From patchwork Fri Dec 14 06:27:53 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10730645
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V9 20/21] swap: Support PMD swap mapping in common path
Date: Fri, 14 Dec 2018 14:27:53 +0800
Message-Id: <20181214062754.13723-21-ying.huang@intel.com>
In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com>
References: <20181214062754.13723-1-ying.huang@intel.com>

The original code in these paths handled only PMD migration entries; it is
revised here to support PMD swap mappings as well.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 fs/proc/task_mmu.c | 12 +++++-------
 mm/gup.c           | 36 ++++++++++++++++++++++++------------
 mm/huge_memory.c   |  7 ++++---
 mm/mempolicy.c     |  2 +-
 4 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index dc36909a73c6..fa41822574e1 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -986,7 +986,7 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
 		pmd = pmd_clear_soft_dirty(pmd);
 		set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
-	} else if (is_migration_entry(pmd_to_swp_entry(pmd))) {
+	} else if (is_swap_pmd(pmd)) {
 		pmd = pmd_swp_clear_soft_dirty(pmd);
 		set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
 	}
@@ -1320,9 +1320,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 		if (pm->show_pfn)
 			frame = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
-	}
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-	else if (is_swap_pmd(pmd)) {
+	} else if (IS_ENABLED(CONFIG_HAVE_PMD_SWAP_ENTRY) &&
+		   is_swap_pmd(pmd)) {
 		swp_entry_t entry = pmd_to_swp_entry(pmd);
 		unsigned long offset;
@@ -1335,10 +1334,9 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 		flags |= PM_SWAP;
 		if (pmd_swp_soft_dirty(pmd))
 			flags |= PM_SOFT_DIRTY;
-		VM_BUG_ON(!is_pmd_migration_entry(pmd));
-		page = migration_entry_to_page(entry);
+		if (is_pmd_migration_entry(pmd))
+			page = migration_entry_to_page(entry);
 	}
-#endif
 
 	if (page && page_mapcount(page) == 1)
 		flags |= PM_MMAP_EXCLUSIVE;

diff --git a/mm/gup.c b/mm/gup.c
index 6dd33e16a806..460565825ef0 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -215,6 +215,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	spinlock_t *ptl;
 	struct page *page;
 	struct mm_struct *mm = vma->vm_mm;
+	swp_entry_t entry;
 
 	pmd = pmd_offset(pudp, address);
 	/*
@@ -242,18 +243,22 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	if (!pmd_present(pmdval)) {
 		if (likely(!(flags & FOLL_MIGRATION)))
 			return no_page_table(vma, flags);
-		VM_BUG_ON(thp_migration_supported() &&
-			  !is_pmd_migration_entry(pmdval));
-		if (is_pmd_migration_entry(pmdval))
+		entry = pmd_to_swp_entry(pmdval);
+		if (thp_migration_supported() && is_migration_entry(entry)) {
 			pmd_migration_entry_wait(mm, pmd);
-		pmdval = READ_ONCE(*pmd);
-		/*
-		 * MADV_DONTNEED may convert the pmd to null because
-		 * mmap_sem is held in read mode
-		 */
-		if (pmd_none(pmdval))
+			pmdval = READ_ONCE(*pmd);
+			/*
+			 * MADV_DONTNEED may convert the pmd to null because
+			 * mmap_sem is held in read mode
+			 */
+			if (pmd_none(pmdval))
+				return no_page_table(vma, flags);
+			goto retry;
+		}
+		if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry))
 			return no_page_table(vma, flags);
-		goto retry;
+		WARN_ON(1);
+		return no_page_table(vma, flags);
 	}
 	if (pmd_devmap(pmdval)) {
 		ptl = pmd_lock(mm, pmd);
@@ -275,11 +280,18 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 		return no_page_table(vma, flags);
 	}
 	if (unlikely(!pmd_present(*pmd))) {
+		entry = pmd_to_swp_entry(*pmd);
 		spin_unlock(ptl);
 		if (likely(!(flags & FOLL_MIGRATION)))
 			return no_page_table(vma, flags);
-		pmd_migration_entry_wait(mm, pmd);
-		goto retry_locked;
+		if (thp_migration_supported() && is_migration_entry(entry)) {
+			pmd_migration_entry_wait(mm, pmd);
+			goto retry_locked;
+		}
+		if (IS_ENABLED(CONFIG_THP_SWAP) && !non_swap_entry(entry))
+			return no_page_table(vma, flags);
+		WARN_ON(1);
+		return no_page_table(vma, flags);
 	}
 	if (unlikely(!pmd_trans_huge(*pmd))) {
 		spin_unlock(ptl);

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6d144d687e69..38904d673339 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2122,7 +2122,7 @@ static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
 static pmd_t move_soft_dirty_pmd(pmd_t pmd)
 {
 #ifdef CONFIG_MEM_SOFT_DIRTY
-	if (unlikely(is_pmd_migration_entry(pmd)))
+	if (unlikely(is_swap_pmd(pmd)))
 		pmd = pmd_swp_mksoft_dirty(pmd);
 	else if (pmd_present(pmd))
 		pmd = pmd_mksoft_dirty(pmd);
@@ -2206,11 +2206,12 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	preserve_write = prot_numa && pmd_write(*pmd);
 	ret = 1;
 
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#if defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) || defined(CONFIG_THP_SWAP)
 	if (is_swap_pmd(*pmd)) {
 		swp_entry_t entry = pmd_to_swp_entry(*pmd);
 
-		VM_BUG_ON(!is_pmd_migration_entry(*pmd));
+		VM_BUG_ON(!IS_ENABLED(CONFIG_THP_SWAP) &&
+			  !is_migration_entry(entry));
 		if (is_write_migration_entry(entry)) {
 			pmd_t newpmd;
 			/*

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d4496d9d34f5..253d9aa25667 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -436,7 +436,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
 	struct queue_pages *qp = walk->private;
 	unsigned long flags;
 
-	if (unlikely(is_pmd_migration_entry(*pmd))) {
+	if (unlikely(is_swap_pmd(*pmd))) {
 		ret = 1;
 		goto unlock;
 	}
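A note on the pagemap_pmd_range() change above: the layout user space sees
through /proc/pid/pagemap is unchanged, only which non-present PMDs get
reported.  As a reminder of that interface, here is a self-contained sketch
that is independent of this series: each 64-bit entry has bit 63 = present,
bit 62 = swapped, bit 55 = soft-dirty; for swapped pages the low 5 bits carry
the swap type and bits 5-54 the swap offset.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Read the pagemap entry describing the page that contains addr. */
static uint64_t pagemap_entry(const void *addr)
{
	long page = sysconf(_SC_PAGESIZE);
	uint64_t ent = 0;
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0)
		return 0;
	if (pread(fd, &ent, sizeof(ent),
		  (uintptr_t)addr / page * sizeof(uint64_t)) != sizeof(ent))
		ent = 0;
	close(fd);
	return ent;
}

int main(void)
{
	int x = 42;
	uint64_t e = pagemap_entry(&x);

	printf("present=%d swapped=%d soft-dirty=%d\n",
	       (int)(e >> 63 & 1), (int)(e >> 62 & 1), (int)(e >> 55 & 1));
	if (e >> 62 & 1)	/* swap type in bits 0-4, offset in bits 5-54 */
		printf("swap type %llu offset %llu\n",
		       (unsigned long long)(e & 0x1f),
		       (unsigned long long)((e >> 5) & ((1ULL << 50) - 1)));
	return 0;
}

Run against an address inside a swapped-out THP region, every page of the
huge page should report swapped=1 once the PMD swap mapping path above is
taken.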
From patchwork Fri Dec 14 06:27:54 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10730647
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
 Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
 Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V9 21/21] swap: create PMD swap mapping when unmap the THP
Date: Fri, 14 Dec 2018 14:27:54 +0800
Message-Id: <20181214062754.13723-22-ying.huang@intel.com>
In-Reply-To: <20181214062754.13723-1-ying.huang@intel.com>
References: <20181214062754.13723-1-ying.huang@intel.com>

This is the final step of the THP swapin support.  When an anonymous THP is
reclaimed, after the huge swap cluster has been allocated and the THP has
been added to the swap cache, the PMD page mapping is replaced with a
mapping to the swap space.  Previously the PMD mapping was split before
being replaced.  With this patch, the unmap code no longer splits the PMD
mapping; it creates a PMD swap mapping to replace it instead.  So later,
when the SWAP_HAS_CACHE flag is cleared in the last step of swapout, the
huge swap cluster is kept rather than split, and on swapin the huge swap
cluster is read in one piece into a THP.  That is, the THP is not split
across swapout/swapin.  This eliminates the splitting/collapsing overhead
and reduces the page fault count, etc.  More importantly, THP utilization
improves greatly: many more THPs are preserved while swap is in use, so we
can take full advantage of THP, including its high swapout/swapin
performance.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
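As a usage note (not part of the patch): whether a region is actually
PMD-mapped THP, the precondition for the huge swapout path described above,
can be checked from user space with madvise(MADV_HUGEPAGE) and the per-VMA
AnonHugePages field of /proc/self/smaps.  A minimal sketch, assuming the
2MB PMD size of x86-64:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL << 20)	/* assumed PMD size (x86-64) */

int main(void)
{
	/* Over-allocate so an aligned 2MB chunk can be handed to THP. */
	char *raw = mmap(NULL, 2 * HPAGE_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return 1;
	char *huge = (char *)(((uintptr_t)raw + HPAGE_SIZE - 1) &
			      ~(HPAGE_SIZE - 1));

	madvise(huge, HPAGE_SIZE, MADV_HUGEPAGE);
	memset(huge, 1, HPAGE_SIZE);	/* fault it in, hopefully as a THP */

	/* A non-zero AnonHugePages value means the VMA has PMD-mapped THP. */
	char line[256];
	unsigned long kb;
	FILE *f = fopen("/proc/self/smaps", "r");
	while (f && fgets(line, sizeof(line), f))
		if (sscanf(line, "AnonHugePages: %lu kB", &kb) == 1 && kb)
			printf("VMA with %lu kB of THP\n", kb);
	if (f)
		fclose(f);
	return 0;
}

Once such a region is pushed to swap (for example under memory pressure or a
memcg limit), this patch keeps it as a single huge swap cluster instead of
512 separate swap slots.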
---
 include/linux/huge_mm.h | 11 +++++++++++
 mm/huge_memory.c        | 30 ++++++++++++++++++++++++++++++
 mm/rmap.c               | 41 ++++++++++++++++++++++++++++++++++++++++-
 mm/vmscan.c             |  6 +-----
 4 files changed, 82 insertions(+), 6 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 3c05294689c1..fef5d27c2083 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -373,12 +373,16 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+struct page_vma_mapped_walk;
+
 #ifdef CONFIG_THP_SWAP
 extern void __split_huge_swap_pmd(struct vm_area_struct *vma,
 				  unsigned long addr, pmd_t *pmd);
 extern int split_huge_swap_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			       unsigned long address, pmd_t orig_pmd);
 extern int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd);
+extern bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw,
 	struct page *page, unsigned long address, pmd_t pmdval);
 
 static inline bool transparent_hugepage_swapin_enabled(
 			struct vm_area_struct *vma)
@@ -419,6 +423,13 @@ static inline int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	return 0;
 }
 
+static inline bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw,
+				      struct page *page, unsigned long address,
+				      pmd_t pmdval)
+{
+	return false;
+}
+
 static inline bool transparent_hugepage_swapin_enabled(
 			struct vm_area_struct *vma)
 {

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 38904d673339..e0205fceb84c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1922,6 +1922,36 @@ int do_huge_pmd_swap_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	put_page(page);
 	return ret;
 }
+
+bool set_pmd_swap_entry(struct page_vma_mapped_walk *pvmw, struct page *page,
+			unsigned long address, pmd_t pmdval)
+{
+	struct vm_area_struct *vma = pvmw->vma;
+	struct mm_struct *mm = vma->vm_mm;
+	pmd_t swp_pmd;
+	swp_entry_t entry = { .val = page_private(page) };
+
+	if (swap_duplicate(&entry, HPAGE_PMD_NR) < 0) {
+		set_pmd_at(mm, address, pvmw->pmd, pmdval);
+		return false;
+	}
+	if (list_empty(&mm->mmlist)) {
+		spin_lock(&mmlist_lock);
+		if (list_empty(&mm->mmlist))
+			list_add(&mm->mmlist, &init_mm.mmlist);
+		spin_unlock(&mmlist_lock);
+	}
+	add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+	add_mm_counter(mm, MM_SWAPENTS, HPAGE_PMD_NR);
+	swp_pmd = swp_entry_to_pmd(entry);
+	if (pmd_soft_dirty(pmdval))
+		swp_pmd = pmd_swp_mksoft_dirty(swp_pmd);
+	set_pmd_at(mm, address, pvmw->pmd, swp_pmd);
+
+	page_remove_rmap(page, true);
+	put_page(page);
+	return true;
+}
 #endif
 
 static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)

diff --git a/mm/rmap.c b/mm/rmap.c
index e9b07016f587..a957af84ec12 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1423,11 +1423,50 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			continue;
 		}
 
+		address = pvmw.address;
+
+#ifdef CONFIG_THP_SWAP
+		/* PMD-mapped THP swap entry */
+		if (IS_ENABLED(CONFIG_THP_SWAP) &&
+		    !pvmw.pte && PageAnon(page)) {
+			pmd_t pmdval;
+
+			VM_BUG_ON_PAGE(PageHuge(page) ||
+				       !PageTransCompound(page), page);
+
+			flush_cache_range(vma, address,
+					  address + HPAGE_PMD_SIZE);
+			if (should_defer_flush(mm, flags)) {
+				/* check comments for PTE below */
+				pmdval = pmdp_huge_get_and_clear(mm, address,
+								 pvmw.pmd);
+				set_tlb_ubc_flush_pending(mm,
+							  pmd_dirty(pmdval));
+			} else
+				pmdval = pmdp_huge_clear_flush(vma, address,
+							       pvmw.pmd);
+
+			/*
+			 * Move the dirty bit to the page. Now the pmd
+			 * is gone.
+			 */
+			if (pmd_dirty(pmdval))
+				set_page_dirty(page);
+
+			/* Update high watermark before we lower rss */
+			update_hiwater_rss(mm);
+
+			ret = set_pmd_swap_entry(&pvmw, page, address, pmdval);
+			mmu_notifier_invalidate_range(mm, address,
+						      address + HPAGE_PMD_SIZE);
+			continue;
+		}
+#endif
+
 		/* Unexpected PMD-mapped THP? */
 		VM_BUG_ON_PAGE(!pvmw.pte, page);
 
 		subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
-		address = pvmw.address;
 
 		if (PageHuge(page)) {
 			if (huge_pmd_unshare(mm, &address, pvmw.pte)) {

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 75b72ec9cc68..d3148f44a6a6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1340,11 +1340,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * processes. Try to unmap it here.
 		 */
 		if (page_mapped(page)) {
-			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
-
-			if (unlikely(PageTransHuge(page)))
-				flags |= TTU_SPLIT_HUGE_PMD;
-			if (!try_to_unmap(page, flags)) {
+			if (!try_to_unmap(page, ttu_flags | TTU_BATCH_FLUSH)) {
 				nr_unmap_fail++;
 				goto activate_locked;
 			}
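To observe at runtime whether huge swapouts actually take the non-splitting
path, the THP_SWPOUT and THP_SWPOUT_FALLBACK events that kernels built with
CONFIG_THP_SWAP export in /proc/vmstat are the simplest signal.  A small
reader, assuming those counter names are present on the running kernel:

#include <stdio.h>
#include <string.h>

int main(void)
{
	char name[64];
	unsigned long long val;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return 1;
	/* Print thp_swpout and thp_swpout_fallback if the kernel has them. */
	while (fscanf(f, "%63s %llu", name, &val) == 2)
		if (!strncmp(name, "thp_swpout", 10))
			printf("%s %llu\n", name, val);
	fclose(f);
	return 0;
}

A rising thp_swpout with a flat thp_swpout_fallback indicates THPs are being
written out as whole huge swap clusters rather than being split first.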