From patchwork Fri Jun 22 03:51:36 2018
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 10481137
From: "Huang, Ying" <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner,
    Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen,
    Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -mm -v4 06/21] mm, THP, swap: Support PMD swap mapping when splitting huge PMD
Date: Fri, 22 Jun 2018 11:51:36 +0800
Message-Id: <20180622035151.6676-7-ying.huang@intel.com>
In-Reply-To: <20180622035151.6676-1-ying.huang@intel.com>
References: <20180622035151.6676-1-ying.huang@intel.com>

From: Huang Ying

A huge PMD needs to be split when zapping a part of the PMD mapping,
etc.  If the PMD mapping is a swap mapping, we need to split it too.
This patch implements support for this.  This is similar to splitting
a PMD page mapping, except that we also need to decrease the PMD swap
mapping count of the huge swap cluster.  If the PMD swap mapping count
becomes 0, the huge swap cluster will be split.

Notice: is_huge_zero_pmd() and pmd_page() don't work well with a swap
PMD, so the pmd_present() check is done before calling them.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 include/linux/swap.h |  6 ++++++
 mm/huge_memory.c     | 58 +++++++++++++++++++++++++++++++++++++++++++++++-----
 mm/swapfile.c        | 28 +++++++++++++++++++++++++
 3 files changed, 87 insertions(+), 5 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 7ed2c727c9b6..bb9de2cb952a 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -618,11 +618,17 @@ static inline swp_entry_t get_swap_page(struct page *page)
 
 #ifdef CONFIG_THP_SWAP
 extern int split_swap_cluster(swp_entry_t entry);
+extern int split_swap_cluster_map(swp_entry_t entry);
 #else
 static inline int split_swap_cluster(swp_entry_t entry)
 {
 	return 0;
 }
+
+static inline int split_swap_cluster_map(swp_entry_t entry)
+{
+	return 0;
+}
 #endif
 
 #ifdef CONFIG_MEMCG
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index feba371169ca..2d615328d77f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1602,6 +1602,47 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	return 0;
 }
 
+#ifdef CONFIG_THP_SWAP
+static void __split_huge_swap_pmd(struct vm_area_struct *vma,
+				  unsigned long haddr,
+				  pmd_t *pmd)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pgtable_t pgtable;
+	pmd_t _pmd;
+	swp_entry_t entry;
+	int i, soft_dirty;
+
+	entry = pmd_to_swp_entry(*pmd);
+	soft_dirty = pmd_soft_dirty(*pmd);
+
+	split_swap_cluster_map(entry);
+
+	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+	pmd_populate(mm, &_pmd, pgtable);
+
+	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE, entry.val++) {
+		pte_t *pte, ptent;
+
+		pte = pte_offset_map(&_pmd, haddr);
+		VM_BUG_ON(!pte_none(*pte));
+		ptent = swp_entry_to_pte(entry);
+		if (soft_dirty)
+			ptent = pte_swp_mksoft_dirty(ptent);
+		set_pte_at(mm, haddr, pte, ptent);
+		pte_unmap(pte);
+	}
+	smp_wmb(); /* make pte visible before pmd */
+	pmd_populate(mm, pmd, pgtable);
+}
+#else
+static inline void __split_huge_swap_pmd(struct vm_area_struct *vma,
+					 unsigned long haddr,
+					 pmd_t *pmd)
+{
+}
+#endif
+
 /*
  * Return true if we do MADV_FREE successfully on entire pmd page.
  * Otherwise, return false.
@@ -2068,7 +2109,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
 	VM_BUG_ON_VMA(vma->vm_start > haddr, vma);
 	VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma);
-	VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd)
+	VM_BUG_ON(!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd)
 				&& !pmd_devmap(*pmd));
 
 	count_vm_event(THP_SPLIT_PMD);
@@ -2090,8 +2131,11 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			put_page(page);
 		add_mm_counter(mm, MM_FILEPAGES, -HPAGE_PMD_NR);
 		return;
-	} else if (is_huge_zero_pmd(*pmd)) {
+	} else if (pmd_present(*pmd) && is_huge_zero_pmd(*pmd)) {
 		/*
+		 * is_huge_zero_pmd() may return true for PMD swap
+		 * entry, so checking pmd_present() firstly.
+		 *
 		 * FIXME: Do we want to invalidate secondary mmu by calling
 		 * mmu_notifier_invalidate_range() see comments below inside
 		 * __split_huge_pmd() ?
@@ -2134,6 +2178,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		page = pfn_to_page(swp_offset(entry));
 	} else
 #endif
+	if (thp_swap_supported() && is_swap_pmd(old_pmd))
+		return __split_huge_swap_pmd(vma, haddr, pmd);
+	else
 		page = pmd_page(old_pmd);
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	page_ref_add(page, HPAGE_PMD_NR - 1);
@@ -2225,14 +2272,15 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	 * pmd against. Otherwise we can end up replacing wrong page.
 	 */
 	VM_BUG_ON(freeze && !page);
-	if (page && page != pmd_page(*pmd))
-		goto out;
+	/* pmd_page() should be called only if pmd_present() */
+	if (page && (!pmd_present(*pmd) || page != pmd_page(*pmd)))
+		goto out;
 
 	if (pmd_trans_huge(*pmd)) {
 		page = pmd_page(*pmd);
 		if (PageMlocked(page))
 			clear_page_mlock(page);
-	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
+	} else if (!(pmd_devmap(*pmd) || is_swap_pmd(*pmd)))
 		goto out;
 	__split_huge_pmd_locked(vma, pmd, haddr, freeze);
 out:
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 7d11d8104ba7..a0141307f3ac 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -4043,6 +4043,34 @@ static void free_swap_count_continuations(struct swap_info_struct *si)
 	}
 }
 
+#ifdef CONFIG_THP_SWAP
+/* The corresponding page table shouldn't be changed under us */
+int split_swap_cluster_map(swp_entry_t entry)
+{
+	struct swap_info_struct *si;
+	struct swap_cluster_info *ci;
+	unsigned long offset = swp_offset(entry);
+
+	VM_BUG_ON(!is_cluster_offset(offset));
+	si = _swap_info_get(entry);
+	if (!si)
+		return -EBUSY;
+	ci = lock_cluster(si, offset);
+	/* The swap cluster has been split by someone else */
+	if (!cluster_is_huge(ci))
+		goto out;
+	cluster_set_count(ci, cluster_count(ci) - 1);
+	VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER);
+	if (cluster_count(ci) == SWAPFILE_CLUSTER &&
+	    !(si->swap_map[offset] & SWAP_HAS_CACHE))
+		cluster_clear_huge(ci);
+
+out:
+	unlock_cluster(ci);
+	return 0;
+}
+#endif
+
 static int __init swapfile_init(void)
 {
 	int nid;
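The cluster-count bookkeeping done by split_swap_cluster_map() above can be modeled in isolation: the cluster count carries SWAPFILE_CLUSTER base references plus one reference per PMD swap mapping, and the huge flag is cleared once the last PMD mapping is gone (and no swap cache pins the cluster). The following is a simplified userspace sketch under that reading; `struct cluster_model` and `split_map()` are illustrative names, not kernel code, and locking is omitted:

```c
#include <assert.h>
#include <stdbool.h>

#define SWAPFILE_CLUSTER 512	/* swap entries per cluster */

/* Hypothetical, simplified view of a swap cluster's state */
struct cluster_model {
	unsigned int count;	/* SWAPFILE_CLUSTER + number of PMD swap mappings */
	bool has_cache;		/* a huge page in swap cache still pins the cluster */
	bool huge;		/* cluster currently backs a PMD swap mapping */
};

/*
 * Mirror of the split_swap_cluster_map() logic: drop one PMD-mapping
 * reference; when only the SWAPFILE_CLUSTER base references remain and
 * there is no swap cache, the cluster stops being huge.
 */
static void split_map(struct cluster_model *c)
{
	if (!c->huge)
		return;		/* already split by someone else */
	c->count--;
	assert(c->count >= SWAPFILE_CLUSTER);	/* mirrors the VM_BUG_ON */
	if (c->count == SWAPFILE_CLUSTER && !c->has_cache)
		c->huge = false;
}
```

With two PMD swap mappings (count = SWAPFILE_CLUSTER + 2), the first split_map() call leaves the cluster huge and the second clears the flag, matching the commit message's rule that the huge swap cluster is split only when the PMD swap mapping count drops to 0.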