From patchwork Fri Dec 7 05:41:07 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10717453
From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko,
    Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim,
    Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -V8 07/21] swap: Support PMD swap mapping in split_swap_cluster()
Date: Fri, 7 Dec 2018 13:41:07 +0800
Message-Id: <20181207054122.27822-8-ying.huang@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20181207054122.27822-1-ying.huang@intel.com>
References: <20181207054122.27822-1-ying.huang@intel.com>

When a THP in the swap cache is split, or when allocating a THP fails
during swapin of a huge swap cluster, the huge swap cluster will be
split.  In addition to clearing the huge flag of the swap cluster, the
PMD swap mapping count recorded in cluster_count() will be set to 0.
But we will not touch the PMD swap mappings themselves, because it can
be hard to find them all.  When a PMD swap mapping is operated on
later, it will be found that the huge swap cluster has already been
split, and the PMD swap mapping will be split at that time.
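To illustrate this deferred-split protocol, here is a minimal sketch
(not part of this patch) of how a later operation on a PMD swap
mapping could detect the split.  pmd_swap_operation() and
split_and_fallback_to_pte() are hypothetical names; _swap_info_get(),
lock_cluster(), unlock_cluster() and cluster_is_huge() are existing
helpers in mm/swapfile.c:

	/* Hypothetical sketch only, not part of this patch. */
	static int pmd_swap_operation(swp_entry_t entry)
	{
		struct swap_info_struct *si;
		struct swap_cluster_info *ci;
		unsigned long offset = swp_offset(entry);

		si = _swap_info_get(entry);
		if (!si)
			return -EBUSY;
		ci = lock_cluster(si, offset);
		if (!cluster_is_huge(ci)) {
			/*
			 * The huge swap cluster was split earlier:
			 * split the PMD swap mapping now and fall
			 * back to PTE operations.
			 */
			unlock_cluster(ci);
			return split_and_fallback_to_pte(entry);
		}
		/* ... PMD-level handling ... */
		unlock_cluster(ci);
		return 0;
	}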
Unless it is called to split a THP in the swap cache (requested via
the SSC_SPLIT_CACHED flag), split_swap_cluster() will return -EEXIST
if the SWAP_HAS_CACHE flag is set in swap_map[offset], because this
indicates that a THP corresponds to the huge swap cluster, and it is
not desirable to split that THP.

When splitting a THP in the swap cache, the call to
split_swap_cluster() is moved to before unlocking the sub-pages, so
that all sub-pages are kept locked from the time the THP is split
until the huge swap cluster is split.  This makes the code much easier
to reason about.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 include/linux/swap.h |  6 +++--
 mm/huge_memory.c     | 18 +++++++++-----
 mm/swapfile.c        | 58 +++++++++++++++++++++++++++++++-------------
 3 files changed, 57 insertions(+), 25 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index a24d101b131d..441da4a832a6 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -617,11 +617,13 @@ static inline swp_entry_t get_swap_page(struct page *page)
 
 #endif /* CONFIG_SWAP */
 
+#define SSC_SPLIT_CACHED	0x1
+
 #ifdef CONFIG_THP_SWAP
-extern int split_swap_cluster(swp_entry_t entry);
+extern int split_swap_cluster(swp_entry_t entry, unsigned long flags);
 extern int split_swap_cluster_map(swp_entry_t entry);
 #else
-static inline int split_swap_cluster(swp_entry_t entry)
+static inline int split_swap_cluster(swp_entry_t entry, unsigned long flags)
 {
 	return 0;
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9ec87c2ed1e8..d23e18c0c07e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2519,6 +2519,17 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 	remap_page(head);
 
+	/*
+	 * Split the swap cluster before unlocking the sub-pages, so
+	 * that all sub-pages are kept locked from the time the THP
+	 * is split until the swap cluster is split.
+	 */
+	if (PageSwapCache(head)) {
+		swp_entry_t entry = { .val = page_private(head) };
+
+		split_swap_cluster(entry, SSC_SPLIT_CACHED);
+	}
+
 	for (i = 0; i < HPAGE_PMD_NR; i++) {
 		struct page *subpage = head + i;
 		if (subpage == page)
@@ -2753,12 +2764,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 			__dec_node_page_state(page, NR_SHMEM_THPS);
 		spin_unlock(&pgdata->split_queue_lock);
 		__split_huge_page(page, list, end, flags);
-		if (PageSwapCache(head)) {
-			swp_entry_t entry = { .val = page_private(head) };
-
-			ret = split_swap_cluster(entry);
-		} else
-			ret = 0;
+		ret = 0;
 	} else {
 		if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
 			pr_alert("total_mapcount: %u, page_count(): %u\n",
diff --git a/mm/swapfile.c b/mm/swapfile.c
index e83e3c93f3b3..a57967292a8d 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1469,23 +1469,6 @@ void put_swap_page(struct page *page, swp_entry_t entry)
 	unlock_cluster_or_swap_info(si, ci);
 }
 
-#ifdef CONFIG_THP_SWAP
-int split_swap_cluster(swp_entry_t entry)
-{
-	struct swap_info_struct *si;
-	struct swap_cluster_info *ci;
-	unsigned long offset = swp_offset(entry);
-
-	si = _swap_info_get(entry);
-	if (!si)
-		return -EBUSY;
-	ci = lock_cluster(si, offset);
-	cluster_clear_huge(ci);
-	unlock_cluster(ci);
-	return 0;
-}
-#endif
-
 static int swp_entry_cmp(const void *ent1, const void *ent2)
 {
 	const swp_entry_t *e1 = ent1, *e2 = ent2;
@@ -4071,6 +4054,47 @@ int split_swap_cluster_map(swp_entry_t entry)
 	unlock_cluster(ci);
 	return 0;
 }
+
+/*
+ * We do not try to split all the PMD swap mappings to the swap
+ * cluster, because not enough information is available for that.
+ * Later, when the PMD swap mapping is duplicated, swapped in, etc.,
+ * the PMD swap mapping will be split and fall back to PTE operations.
+ */
+int split_swap_cluster(swp_entry_t entry, unsigned long flags)
+{
+	struct swap_info_struct *si;
+	struct swap_cluster_info *ci;
+	unsigned long offset = swp_offset(entry);
+	int ret = 0;
+
+	si = get_swap_device(entry);
+	if (!si)
+		return -EINVAL;
+	ci = lock_cluster(si, offset);
+	/* The swap cluster has been split by someone else, we are done */
+	if (!cluster_is_huge(ci))
+		goto out;
+	VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER));
+	VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER);
+	/*
+	 * If not requested, don't split a swap cluster that has the
+	 * SWAP_HAS_CACHE flag.  When the flag is cleared later, the
+	 * huge swap cluster will be split if there is no PMD swap mapping.
+	 */
+	if (!(flags & SSC_SPLIT_CACHED) &&
+	    si->swap_map[offset] & SWAP_HAS_CACHE) {
+		ret = -EEXIST;
+		goto out;
+	}
+	cluster_set_swapcount(ci, 0);
+	cluster_clear_huge(ci);
+
+out:
+	unlock_cluster(ci);
+	put_swap_device(si);
+	return ret;
+}
 #endif
 
 static int __init swapfile_init(void)
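
For reference, a minimal usage sketch of the new interface, based on
the semantics described above (illustrative only, not part of the
patch; "entry" is assumed to be a swp_entry_t for the first entry of a
huge swap cluster):

	/* From __split_huge_page(), with all sub-pages still locked:
	 * SSC_SPLIT_CACHED forces the split even though
	 * swap_map[offset] has SWAP_HAS_CACHE set for the THP that is
	 * being split. */
	split_swap_cluster(entry, SSC_SPLIT_CACHED);

	/* From a path that does not hold the THP: without the flag,
	 * -EEXIST means a THP still backs this huge swap cluster and
	 * the cluster is left huge. */
	if (split_swap_cluster(entry, 0) == -EEXIST) {
		/* Keep treating the cluster as huge for now. */
	}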