From patchwork Wed Mar 3 10:20:55 2021
X-Patchwork-Submitter: Feng Tang
X-Patchwork-Id: 12113245
From: Feng Tang
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton
Cc: Michal Hocko, Andrea Arcangeli, David Rientjes, Mel Gorman,
 Mike Kravetz, Randy Dunlap, Vlastimil Babka, Dave Hansen,
 Ben Widawsky, Andi Kleen, Dan Williams, Feng Tang
Subject: [PATCH v3 11/14] mm/mempolicy: huge-page allocation for many preferred
Date: Wed, 3 Mar 2021 18:20:55 +0800
Message-Id: <1614766858-90344-12-git-send-email-feng.tang@intel.com>
In-Reply-To: <1614766858-90344-1-git-send-email-feng.tang@intel.com>
References: <1614766858-90344-1-git-send-email-feng.tang@intel.com>
From: Ben Widawsky

Implement the missing huge page allocation functionality while obeying
the preferred node semantics. A fallback mechanism is used: allocation
is first attempted from the preferred nodes only, and then, if that
fails, from all other nodes. The helper function introduced earlier in
this series is not reused here, because huge page allocation already
has its own helpers and consolidating them would take more lines of
code and effort than it saves.

One oddity: MPOL_PREFERRED_MANY cannot be referenced in hugetlb.c yet
because it is part of the UAPI that has not been exposed. Rather than
making that define global now, the check is simply updated in the UAPI
patch.

v3: add __GFP_NOWARN for the first try of the prefer_many allocation (Feng)

Link: https://lore.kernel.org/r/20200630212517.308045-12-ben.widawsky@intel.com
Signed-off-by: Ben Widawsky
Signed-off-by: Feng Tang
---
 mm/hugetlb.c   | 22 +++++++++++++++++++---
 mm/mempolicy.c |  3 ++-
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bdb58a..c7c9ef3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1110,7 +1110,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 				unsigned long address, int avoid_reserve,
 				long chg)
 {
-	struct page *page;
+	struct page *page = NULL;
 	struct mempolicy *mpol;
 	gfp_t gfp_mask;
 	nodemask_t *nodemask;
@@ -1131,7 +1131,15 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 
 	gfp_mask = htlb_alloc_mask(h);
 	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
-	page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
+	if (mpol->mode != MPOL_BIND && nodemask) { /* AKA MPOL_PREFERRED_MANY */
+		page = dequeue_huge_page_nodemask(h,
+				gfp_mask | __GFP_RETRY_MAYFAIL | __GFP_NOWARN,
+				nid, nodemask);
+		if (!page)
+			page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL);
+	} else {
+		page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
+	}
 	if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
 		SetPagePrivate(page);
 		h->resv_huge_pages--;
@@ -1935,7 +1943,15 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
 	nodemask_t *nodemask;
 
 	nid = huge_node(vma, addr, gfp_mask, &mpol, &nodemask);
-	page = alloc_surplus_huge_page(h, gfp_mask, nid, nodemask);
+	if (mpol->mode != MPOL_BIND && nodemask) { /* AKA MPOL_PREFERRED_MANY */
+		page = alloc_surplus_huge_page(h,
+				gfp_mask | __GFP_RETRY_MAYFAIL | __GFP_NOWARN,
+				nid, nodemask);
+		if (!page)
+			page = alloc_surplus_huge_page(h, gfp_mask, nid, NULL);
+	} else {
+		page = alloc_surplus_huge_page(h, gfp_mask, nid, nodemask);
+	}
 	mpol_cond_put(mpol);
 
 	return page;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 0cb92ab..f9b2167 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2075,7 +2075,8 @@ int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags,
 					huge_page_shift(hstate_vma(vma)));
 	} else {
 		nid = policy_node(gfp_flags, *mpol, numa_node_id());
-		if ((*mpol)->mode == MPOL_BIND)
+		if ((*mpol)->mode == MPOL_BIND ||
+		    (*mpol)->mode == MPOL_PREFERRED_MANY)
 			*nodemask = &(*mpol)->nodes;
 	}
 	return nid;
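
For illustration only, not part of the patch: a minimal userspace sketch of
how the fallback path above would be exercised once the mode is exposed
through the UAPI. The MPOL_PREFERRED_MANY name and its value (5) are
assumptions taken from the UAPI patch later in this series; the rest uses
only the existing set_mempolicy(2) and mmap(2) interfaces (build with
"gcc prefer_many.c -lnuma").

/*
 * Illustrative sketch, not part of this patch.  MPOL_PREFERRED_MANY and
 * its numeric value are assumptions pending the UAPI patch.
 */
#define _GNU_SOURCE
#include <numaif.h>		/* set_mempolicy(), from libnuma */
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY	5	/* assumed value, see the UAPI patch */
#endif

int main(void)
{
	/* Prefer nodes 0 and 1; other nodes remain allowed as fallback. */
	unsigned long nodemask = (1UL << 0) | (1UL << 1);
	size_t len = 2UL << 20;		/* one 2MB huge page */
	void *p;

	if (set_mempolicy(MPOL_PREFERRED_MANY, &nodemask,
			  8 * sizeof(nodemask)) != 0) {
		perror("set_mempolicy(MPOL_PREFERRED_MANY)");
		return EXIT_FAILURE;
	}

	/*
	 * The huge page fault on this mapping typically goes through
	 * dequeue_huge_page_vma(): with the hunks above, it first tries
	 * only the preferred nodes (with __GFP_RETRY_MAYFAIL |
	 * __GFP_NOWARN, so a miss is cheap and silent) and then retries
	 * with a NULL nodemask, i.e. any node.
	 */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap(MAP_HUGETLB)");
		return EXIT_FAILURE;
	}

	*(volatile char *)p = 1;	/* touch to fault in the huge page */
	munmap(p, len);
	return EXIT_SUCCESS;
}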