From patchwork Wed Oct 10 07:19:10 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10634109
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko,
    Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim,
    Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan,
    Daniel Jordan
Subject: [PATCH -V6 07/21] swap: Support PMD swap mapping in split_swap_cluster()
Date: Wed, 10 Oct 2018 15:19:10 +0800
Message-Id: <20181010071924.18767-8-ying.huang@intel.com>
In-Reply-To: <20181010071924.18767-1-ying.huang@intel.com>
References: <20181010071924.18767-1-ying.huang@intel.com>

When a THP in swap cache is split, or when allocating a THP fails
while swapping in a huge swap cluster, the huge swap cluster will be
split.  In addition to clearing the huge flag of the swap cluster, the
PMD swap mapping count recorded in cluster_count() will be set to 0.
But the PMD swap mappings themselves will not be touched, because it
is sometimes hard to find them all.  When a PMD swap mapping is
operated on later, it will be found that the huge swap cluster has
already been split, and the PMD swap mapping will be split at that
time.
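To illustrate the lazy-split scheme, a later PMD-level operation might
detect the split roughly as follows.  This is only a sketch, not part
of this patch; pmd_swap_fallback_to_ptes() is a hypothetical helper
standing in for the real split-and-fallback path.

	/*
	 * Sketch: before operating on a PMD swap mapping, check
	 * whether the swap cluster behind it is still huge.
	 */
	ci = lock_cluster(si, swp_offset(entry));
	if (!cluster_is_huge(ci)) {
		/*
		 * The huge swap cluster was split earlier; split this
		 * PMD swap mapping too and fall back to operating on
		 * per-PTE swap entries (hypothetical helper below).
		 */
		unlock_cluster(ci);
		return pmd_swap_fallback_to_ptes(vma, pmd, entry);
	}
	unlock_cluster(ci);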
Unless splitting a THP in swap cache (requested via the
SSC_SPLIT_CACHED flag), split_swap_cluster() will return -EEXIST if
the SWAP_HAS_CACHE flag is set in swap_map[offset], because this
indicates that a THP corresponds to this huge swap cluster, and it
isn't desirable to split the THP.

When splitting a THP in swap cache, the call to split_swap_cluster()
is moved to before unlocking the sub-pages, so that all sub-pages are
kept locked from the time the THP is split until the huge swap cluster
is split.  This makes the code much easier to reason about.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 include/linux/swap.h |  6 ++++--
 mm/huge_memory.c     | 18 ++++++++++------
 mm/swapfile.c        | 58 +++++++++++++++++++++++++++++++++++++---------------
 3 files changed, 57 insertions(+), 25 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 9bb3f73b5d68..60fd5189fde9 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -612,11 +612,13 @@ static inline swp_entry_t get_swap_page(struct page *page)
 
 #endif /* CONFIG_SWAP */
 
+#define SSC_SPLIT_CACHED	0x1
+
 #ifdef CONFIG_THP_SWAP
-extern int split_swap_cluster(swp_entry_t entry);
+extern int split_swap_cluster(swp_entry_t entry, unsigned long flags);
 extern int split_swap_cluster_map(swp_entry_t entry);
 #else
-static inline int split_swap_cluster(swp_entry_t entry)
+static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags)
 {
 	return 0;
 }

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9f1c74487576..92e0cdb99c5a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2517,6 +2517,17 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 	unfreeze_page(head);
 
+	/*
+	 * Split the swap cluster before unlocking sub-pages, so all
+	 * sub-pages are kept locked from the time the THP is split
+	 * until the swap cluster is split.
+	 */
+	if (PageSwapCache(head)) {
+		swp_entry_t entry = { .val = page_private(head) };
+
+		split_swap_cluster(entry, SSC_SPLIT_CACHED);
+	}
+
 	for (i = 0; i < HPAGE_PMD_NR; i++) {
 		struct page *subpage = head + i;
 		if (subpage == page)
@@ -2740,12 +2751,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 			__dec_node_page_state(page, NR_SHMEM_THPS);
 		spin_unlock(&pgdata->split_queue_lock);
 		__split_huge_page(page, list, flags);
-		if (PageSwapCache(head)) {
-			swp_entry_t entry = { .val = page_private(head) };
-
-			ret = split_swap_cluster(entry);
-		} else
-			ret = 0;
+		ret = 0;
 	} else {
 		if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
 			pr_alert("total_mapcount: %u, page_count(): %u\n",

diff --git a/mm/swapfile.c b/mm/swapfile.c
index fa6b81b4e185..2020bd494419 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1469,23 +1469,6 @@ void put_swap_page(struct page *page, swp_entry_t entry)
 	unlock_cluster_or_swap_info(si, ci);
 }
 
-#ifdef CONFIG_THP_SWAP
-int split_swap_cluster(swp_entry_t entry)
-{
-	struct swap_info_struct *si;
-	struct swap_cluster_info *ci;
-	unsigned long offset = swp_offset(entry);
-
-	si = _swap_info_get(entry);
-	if (!si)
-		return -EBUSY;
-	ci = lock_cluster(si, offset);
-	cluster_clear_huge(ci);
-	unlock_cluster(ci);
-	return 0;
-}
-#endif
-
 static int swp_entry_cmp(const void *ent1, const void *ent2)
 {
 	const swp_entry_t *e1 = ent1, *e2 = ent2;
@@ -4066,6 +4049,47 @@ int split_swap_cluster_map(swp_entry_t entry)
 	unlock_cluster(ci);
 	return 0;
 }
+
+/*
+ * We do not try to split all the PMD swap mappings of a swap cluster,
+ * because we do not have enough information available for that.
+ * Later, when a PMD swap mapping is duplicated, swapped in, etc., the
+ * PMD swap mapping will be split and fall back to PTE operations.
+ */
+int split_swap_cluster(swp_entry_t entry, unsigned long flags)
+{
+	struct swap_info_struct *si;
+	struct swap_cluster_info *ci;
+	unsigned long offset = swp_offset(entry);
+	int ret = 0;
+
+	si = get_swap_device(entry);
+	if (!si)
+		return -EINVAL;
+	ci = lock_cluster(si, offset);
+	/* The swap cluster has been split by someone else, we are done */
+	if (!cluster_is_huge(ci))
+		goto out;
+	VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER));
+	VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER);
+	/*
+	 * Unless requested, don't split a swap cluster that has the
+	 * SWAP_HAS_CACHE flag set.  When the flag is cleared later,
+	 * the huge swap cluster will be split if there is no PMD swap
+	 * mapping.
+	 */
+	if (!(flags & SSC_SPLIT_CACHED) &&
+	    si->swap_map[offset] & SWAP_HAS_CACHE) {
+		ret = -EEXIST;
+		goto out;
+	}
+	cluster_set_swapcount(ci, 0);
+	cluster_clear_huge(ci);
+
+out:
+	unlock_cluster(ci);
+	put_swap_device(si);
+	return ret;
+}
 #endif
 
 static int __init swapfile_init(void)
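For reference, the two intended calling patterns of the new
split_swap_cluster() are sketched below.  The first call is the one
added to __split_huge_page() above; the swapin-side caller is
hypothetical, standing in for the "failed to allocate a THP on swapin"
path described in the changelog.

	/*
	 * A THP in swap cache is itself being split: force the
	 * cluster split even though SWAP_HAS_CACHE is set in
	 * swap_map[] (from __split_huge_page() in this patch).
	 */
	split_swap_cluster(entry, SSC_SPLIT_CACHED);

	/*
	 * Hypothetical swapin-side caller that could not allocate a
	 * huge page: split only if no THP backs the cluster.
	 * -EEXIST means a THP in swap cache still corresponds to the
	 * cluster, so the cluster is left huge.
	 */
	ret = split_swap_cluster(entry, 0);

On -EEXIST such a caller would leave the huge cluster intact; per the
in-code comment, the cluster will be split once SWAP_HAS_CACHE is
cleared, if no PMD swap mapping remains.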