From patchwork Thu May  3 23:29:34 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 10379457
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-api@vger.kernel.org
Cc: Reinette Chatre, Michal Hocko, Christopher Lameter, Guy Shattah,
 Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka, David Nellans,
 Laura Abbott, Pavel Machek, Dave Hansen, Andrew Morton, Mike Kravetz
Subject: [PATCH v2 3/4] mm: add find_alloc_contig_pages() interface
Date: Thu, 3 May 2018 16:29:34 -0700
Message-Id: <20180503232935.22539-4-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.13.6
In-Reply-To: <20180503232935.22539-1-mike.kravetz@oracle.com>
References: <20180503232935.22539-1-mike.kravetz@oracle.com>

find_alloc_contig_pages()
is a new interface that attempts to locate and allocate a
contiguous range of pages.  It is provided as a more convenient
interface than alloc_contig_range(), which is currently used by CMA
and gigantic huge pages.

When attempting to allocate a range of pages, migration is employed
if possible.  There is no guarantee that the routine will succeed,
so the caller must be prepared for failure and have a fallback plan.

Signed-off-by: Mike Kravetz
---
 include/linux/gfp.h |  12 +++++
 mm/page_alloc.c     | 136 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 146 insertions(+), 2 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 86a0d06463ab..b0d11777d487 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -573,6 +573,18 @@ static inline bool pm_suspended_storage(void)
 extern int alloc_contig_range(unsigned long start, unsigned long end,
 			      unsigned migratetype, gfp_t gfp_mask);
 extern void free_contig_range(unsigned long pfn, unsigned long nr_pages);
+extern struct page *find_alloc_contig_pages(unsigned long nr_pages, gfp_t gfp,
+						int nid, nodemask_t *nodemask);
+extern void free_contig_pages(struct page *page, unsigned long nr_pages);
+#else
+static inline struct page *find_alloc_contig_pages(unsigned long nr_pages,
+				gfp_t gfp, int nid, nodemask_t *nodemask)
+{
+	return NULL;
+}
+static inline void free_contig_pages(struct page *page, unsigned long nr_pages)
+{
+}
 #endif
 
 #ifdef CONFIG_CMA
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cb1a5e0be6ee..d0a2d0da9eae 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -67,6 +67,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -7913,8 +7914,12 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	/* Make sure the range is really isolated. */
 	if (test_pages_isolated(outer_start, end, false)) {
-		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
-			__func__, outer_start, end);
+#ifdef MIGRATE_CMA
+		/* Only print messages for CMA allocations */
+		if (migratetype == MIGRATE_CMA)
+			pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
+				__func__, outer_start, end);
+#endif
 		ret = -EBUSY;
 		goto done;
 	}
@@ -7950,6 +7955,133 @@ void free_contig_range(unsigned long pfn, unsigned long nr_pages)
 	}
 	WARN(count != 0, "%ld pages are still in use!\n", count);
 }
+
+/*
+ * Only check for obvious pfn/pages which can not be used/migrated.  The
+ * migration code will do the final check.  Under stress, this minimal set
+ * has been observed to provide the best results.  The checks can be expanded
+ * if needed.
+ */
+static bool contig_pfn_range_valid(struct zone *z, unsigned long start_pfn,
+					unsigned long nr_pages)
+{
+	unsigned long i, end_pfn = start_pfn + nr_pages;
+	struct page *page;
+
+	for (i = start_pfn; i < end_pfn; i++) {
+		if (!pfn_valid(i))
+			return false;
+
+		page = pfn_to_online_page(i);
+
+		if (!page || page_zone(page) != z)
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * Search for and attempt to allocate a contiguous range of pages of order
+ * MAX_ORDER or greater.
+ */
+static struct page *__alloc_contig_pages_nodemask(gfp_t gfp,
+						unsigned long order,
+						int nid, nodemask_t *nodemask)
+{
+	unsigned long nr_pages, pfn, flags;
+	struct page *ret_page = NULL;
+	struct zonelist *zonelist;
+	struct zoneref *z;
+	struct zone *zone;
+	int rc;
+
+	nr_pages = 1UL << order;
+	zonelist = node_zonelist(nid, gfp);
+	for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp),
+					nodemask) {
+		pgdat_resize_lock(zone->zone_pgdat, &flags);
+		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
+		while (zone_spans_pfn(zone, pfn + nr_pages - 1)) {
+			if (contig_pfn_range_valid(zone, pfn, nr_pages)) {
+				struct page *page = pfn_to_online_page(pfn);
+				unsigned int migratetype;
+
+				/*
+				 * All pageblocks in range must be of same
+				 * migrate type.
+				 */
+				migratetype = get_pageblock_migratetype(page);
+				pgdat_resize_unlock(zone->zone_pgdat, &flags);
+
+				rc = alloc_contig_range(pfn, pfn + nr_pages,
+						migratetype, gfp);
+				if (!rc) {
+					ret_page = pfn_to_page(pfn);
+					return ret_page;
+				}
+				pgdat_resize_lock(zone->zone_pgdat, &flags);
+			}
+			pfn += nr_pages;
+		}
+		pgdat_resize_unlock(zone->zone_pgdat, &flags);
+	}
+
+	return ret_page;
+}
+
+/**
+ * find_alloc_contig_pages() -- attempt to find and allocate a contiguous
+ *				range of pages
+ * @nr_pages:	number of pages to find/allocate
+ * @gfp:	gfp mask used to limit search as well as during compaction
+ * @nid:	target node
+ * @nodemask:	mask of other possible nodes
+ *
+ * Pages can be freed with a call to free_contig_pages(), or by manually
+ * calling __free_page() for each page allocated.
+ *
+ * Return: pointer to the first page on success, or NULL on failure.
+ */
+struct page *find_alloc_contig_pages(unsigned long nr_pages, gfp_t gfp,
+					int nid, nodemask_t *nodemask)
+{
+	unsigned long i, alloc_order, order_pages;
+	struct page *pages;
+
+	/*
+	 * Underlying allocators perform page order sized allocations.
+	 */
+	alloc_order = get_count_order(nr_pages);
+	if (alloc_order < MAX_ORDER) {
+		pages = __alloc_pages_nodemask(gfp, (unsigned int)alloc_order,
+						nid, nodemask);
+		if (pages)
+			split_page(pages, alloc_order);
+	} else {
+		pages = __alloc_contig_pages_nodemask(gfp, alloc_order, nid,
+							nodemask);
+	}
+
+	if (pages) {
+		/*
+		 * More pages than desired could have been allocated due to
+		 * rounding up to next page order.  Free any excess pages.
+		 */
+		order_pages = 1UL << alloc_order;
+		for (i = nr_pages; i < order_pages; i++)
+			__free_page(pages + i);
+	}
+
+	return pages;
+}
+EXPORT_SYMBOL_GPL(find_alloc_contig_pages);
+
+void free_contig_pages(struct page *page, unsigned long nr_pages)
+{
+	free_contig_range(page_to_pfn(page), nr_pages);
+}
+EXPORT_SYMBOL_GPL(free_contig_pages);
 #endif
 
 #if defined CONFIG_MEMORY_HOTPLUG || defined CONFIG_CMA
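
As a reading aid for the rounding step in find_alloc_contig_pages(), here is a minimal userspace sketch, not kernel code. It assumes get_count_order() acts as a ceiling log2 for nonzero counts. The names count_order_model() and excess_pages_model() are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/*
 * Userspace stand-in (assumption) for the kernel's get_count_order():
 * the smallest order such that (1UL << order) >= count.
 * Only meant for small, nonzero page counts.
 */
static unsigned int count_order_model(unsigned long count)
{
	unsigned int order = 0;

	assert(count > 0 && count <= (1UL << 30));
	while ((1UL << order) < count)
		order++;
	return order;
}

/*
 * Number of trailing pages that the patch would release with
 * __free_page() after rounding the request up to a full order.
 */
static unsigned long excess_pages_model(unsigned long nr_pages)
{
	unsigned long order_pages = 1UL << count_order_model(nr_pages);

	return order_pages - nr_pages;
}
```

For example, a request for 5 pages rounds up to order 3 (8 pages), so the last 3 pages are released again. A request that is already a power of two has no excess.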