From patchwork Fri Aug 14 17:31:29 2020
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11715063
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, Joonsoo Kim, Vlastimil Babka, John Dias,
    Suren Baghdasaryan, pullip.cho@samsung.com, Minchan Kim
Subject: [RFC 5/7] mm: introduce alloc_pages_bulk API
Date: Fri, 14 Aug 2020 10:31:29 -0700
Message-Id: <20200814173131.2803002-6-minchan@kernel.org>
X-Mailer: git-send-email 2.28.0.220.ged08abb693-goog
In-Reply-To: <20200814173131.2803002-1-minchan@kernel.org>
References: <20200814173131.2803002-1-minchan@kernel.org>

Some special HW needs bulk allocation of high-order pages, for
example, 4800 order-4 pages at once.

One option to meet that requirement is a CMA area, because the page
allocator with compaction easily fails under memory pressure and is
far too slow when repeated 4800 times. However, CMA has its own
drawback:

 * 4800 order-4 cma_alloc calls are also too slow

To avoid the slowness, we could allocate 300M of contiguous memory
once and then split it into order-4 chunks. The problem with this
approach is that the whole CMA allocation fails if even one page in
the range cannot be migrated out, which happens easily with fs writes
under memory pressure.

To solve these issues, this patch introduces alloc_pages_bulk:

  int alloc_pages_bulk(unsigned long start, unsigned long end,
                       unsigned int migratetype, gfp_t gfp_mask,
                       unsigned int order, unsigned int nr_elem,
                       struct page **pages);

It scans [start, end) and migrates movable pages out of the range on
a best-effort basis (implemented by upcoming patches) to create free
pages of the requested order.
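For illustration only (not part of this patch), a caller requesting
the 4800 order-4 chunks mentioned above from a driver-owned CMA range
might look like the sketch below. The function name, the pfn
arguments, the NR_CHUNKS/CHUNK_ORDER macros and the use of
kvmalloc_array() for the page array are assumptions made for the
example:

  #include <linux/gfp.h>
  #include <linux/mm.h>

  #define NR_CHUNKS	4800	/* 4800 order-4 pages ~= 300MB with 4K pages */
  #define CHUNK_ORDER	4

  static int example_bulk_alloc(unsigned long start_pfn,
  				unsigned long end_pfn)
  {
  	struct page **pages;
  	int nr, i;

  	pages = kvmalloc_array(NR_CHUNKS, sizeof(*pages), GFP_KERNEL);
  	if (!pages)
  		return -ENOMEM;

  	/*
  	 * Returns how many order-4 chunks were actually allocated;
  	 * this may be fewer than NR_CHUNKS, or a negative errno.
  	 * The range is assumed to be a CMA region here.
  	 */
  	nr = alloc_pages_bulk(start_pfn, end_pfn, MIGRATE_CMA,
  			      GFP_KERNEL, CHUNK_ORDER, NR_CHUNKS, pages);
  	if (nr < 0) {
  		kvfree(pages);
  		return nr;
  	}

  	/* ... hand the chunks to the device here ... */

  	/* Each chunk is freed individually with __free_pages(). */
  	for (i = 0; i < nr; i++)
  		__free_pages(pages[i], CHUNK_ORDER);

  	kvfree(pages);
  	return 0;
  }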
The allocated pages are returned via the @pages parameter. The return
value is the number of requested-order pages actually allocated, which
may be less than the nr_elem the caller asked for.

/**
 * alloc_pages_bulk() -- tries to allocate high order pages
 * by batch from given range [start, end)
 * @start:	start PFN to allocate
 * @end:	one-past-the-last PFN to allocate
 * @migratetype:	migratetype of the underlying pageblocks (either
 *			#MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks
 *			in range must have the same migratetype and it must
 *			be either of the two.
 * @gfp_mask:	GFP mask to use during compaction
 * @order:	page order requested
 * @nr_elem:	the number of high-order pages to allocate
 * @pages:	page array pointer to store allocated pages (must
 *		have space for at least nr_elem elements)
 *
 * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
 * aligned. The PFN range must belong to a single zone.
 *
 * Return: the number of pages allocated on success or negative error code.
 *	   The allocated pages should be freed using __free_pages
 */

The test allocates 4800 order-4 pages (i.e., 300MB in total) under a
kernel build workload. System RAM is 1.5GB and the CMA area is 500MB.

Using CMA to allocate the 300MB, all 10 of 10 trials failed, each with
high latency (up to several seconds). With the alloc_pages_bulk API,
7 of 10 trials allocated the full 4800 pages; the remaining 3 trials
allocated 4799, 4789 and 4799 pages. All trials completed within
300ms.

Signed-off-by: Minchan Kim
---
 include/linux/gfp.h |  5 +++
 mm/compaction.c     | 11 +++--
 mm/internal.h       |  3 +-
 mm/page_alloc.c     | 97 +++++++++++++++++++++++++++++++++++++++++----
 4 files changed, 102 insertions(+), 14 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 67a0774e080b..79ff38f25def 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -625,6 +625,11 @@ static inline bool pm_suspended_storage(void)
 /* The below functions must be run on a range from a single zone. */
 extern int alloc_contig_range(unsigned long start, unsigned long end,
			      unsigned migratetype, gfp_t gfp_mask);
+extern int alloc_pages_bulk(unsigned long start, unsigned long end,
+			unsigned int migratetype, gfp_t gfp_mask,
+			unsigned int order, unsigned int nr_elem,
+			struct page **pages);
+
 extern struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
				       int nid, nodemask_t *nodemask);
 #endif
diff --git a/mm/compaction.c b/mm/compaction.c
index 76f380cb801d..1e4392f6fec3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -713,10 +713,10 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
  */
 unsigned long
 isolate_freepages_range(struct compact_control *cc,
-			unsigned long start_pfn, unsigned long end_pfn)
+			unsigned long start_pfn, unsigned long end_pfn,
+			struct list_head *freepage_list)
 {
	unsigned long isolated, pfn, block_start_pfn, block_end_pfn;
-	LIST_HEAD(freelist);

	pfn = start_pfn;
	block_start_pfn = pageblock_start_pfn(pfn);
@@ -748,7 +748,7 @@ isolate_freepages_range(struct compact_control *cc,
			break;

		isolated = isolate_freepages_block(cc, &isolate_start_pfn,
-					block_end_pfn, &freelist, 0, true);
+					block_end_pfn, freepage_list, 0, true);

		/*
		 * In strict mode, isolate_freepages_block() returns 0 if
@@ -766,15 +766,14 @@ isolate_freepages_range(struct compact_control *cc,
	}

	/* __isolate_free_page() does not map the pages */
-	split_map_pages(&freelist, cc->isolate_order);
+	split_map_pages(freepage_list, cc->isolate_order);

	if (pfn < end_pfn) {
		/* Loop terminated early, cleanup. */
-		release_freepages(&freelist, cc->isolate_order);
+		release_freepages(freepage_list, cc->isolate_order);
		return 0;
	}

-	/* We don't use freelists for anything. */
	return pfn;
 }
diff --git a/mm/internal.h b/mm/internal.h
index 5f1e9d76a623..f9b86257fae2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -258,7 +258,8 @@ struct capture_control {

 unsigned long
 isolate_freepages_range(struct compact_control *cc,
-			unsigned long start_pfn, unsigned long end_pfn);
+			unsigned long start_pfn, unsigned long end_pfn,
+			struct list_head *freepage_list);
 unsigned long
 isolate_migratepages_range(struct compact_control *cc,
			   unsigned long low_pfn, unsigned long end_pfn);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index caf393d8b413..cdf956feae80 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8402,10 +8402,14 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 }

 static int __alloc_contig_range(unsigned long start, unsigned long end,
-			unsigned migratetype, gfp_t gfp_mask)
+			unsigned int migratetype, gfp_t gfp_mask,
+			unsigned int alloc_order,
+			struct list_head *freepage_list)
 {
	unsigned long outer_start, outer_end;
	unsigned int order;
+	struct page *page, *page2;
+	unsigned long pfn;
	int ret = 0;

	struct compact_control cc = {
@@ -8417,6 +8421,7 @@ static int __alloc_contig_range(unsigned long start, unsigned long end,
		.no_set_skip_hint = true,
		.gfp_mask = current_gfp_context(gfp_mask),
		.alloc_contig = true,
+		.isolate_order = alloc_order,
	};
	INIT_LIST_HEAD(&cc.migratepages);
@@ -8515,17 +8520,42 @@ static int __alloc_contig_range(unsigned long start, unsigned long end,
	}

	/* Grab isolated pages from freelists. */
-	outer_end = isolate_freepages_range(&cc, outer_start, end);
+	outer_end = isolate_freepages_range(&cc, outer_start, end,
+						freepage_list);
	if (!outer_end) {
		ret = -EBUSY;
		goto done;
	}

	/* Free head and tail (if any) */
-	if (start != outer_start)
-		free_contig_range(outer_start, start - outer_start);
-	if (end != outer_end)
-		free_contig_range(end, outer_end - end);
+	if (start != outer_start) {
+		if (alloc_order == 0)
+			free_contig_range(outer_start, start - outer_start);
+		else {
+			list_for_each_entry_safe(page, page2,
+						freepage_list, lru) {
+				pfn = page_to_pfn(page);
+				if (pfn >= start)
+					break;
+				list_del(&page->lru);
+				__free_pages(page, alloc_order);
+			}
+		}
+	}
+	if (end != outer_end) {
+		if (alloc_order == 0)
+			free_contig_range(end, outer_end - end);
+		else {
+			list_for_each_entry_safe_reverse(page, page2,
+						freepage_list, lru) {
+				pfn = page_to_pfn(page);
+				if ((pfn + (1 << alloc_order)) <= end)
+					break;
+				list_del(&page->lru);
+				__free_pages(page, alloc_order);
+			}
+		}
+	}

 done:
	undo_isolate_page_range(pfn_max_align_down(start),
@@ -8558,8 +8588,61 @@ EXPORT_SYMBOL(alloc_contig_range);
 int alloc_contig_range(unsigned long start, unsigned long end,
			unsigned migratetype, gfp_t gfp_mask)
 {
-	return __alloc_contig_range(start, end, migratetype, gfp_mask);
+	LIST_HEAD(freepage_list);
+
+	return __alloc_contig_range(start, end, migratetype,
+					gfp_mask, 0, &freepage_list);
+}
+
+/**
+ * alloc_pages_bulk() -- tries to allocate high order pages
+ * by batch from given range [start, end)
+ * @start:	start PFN to allocate
+ * @end:	one-past-the-last PFN to allocate
+ * @migratetype:	migratetype of the underlying pageblocks (either
+ *			#MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks
+ *			in range must have the same migratetype and it must
+ *			be either of the two.
+ * @gfp_mask:	GFP mask to use during compaction
+ * @order:	page order requested
+ * @nr_elem:	the number of high-order pages to allocate
+ * @pages:	page array pointer to store allocated pages (must
+ *		have space for at least nr_elem elements)
+ *
+ * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
+ * aligned. The PFN range must belong to a single zone.
+ *
+ * Return: the number of pages allocated on success or negative error code.
+ *	   The allocated pages should be freed using __free_pages
+ */
+int alloc_pages_bulk(unsigned long start, unsigned long end,
+			unsigned int migratetype, gfp_t gfp_mask,
+			unsigned int order, unsigned int nr_elem,
+			struct page **pages)
+{
+	int ret;
+	struct page *page, *page2;
+	LIST_HEAD(freepage_list);
+
+	if (order >= MAX_ORDER)
+		return -EINVAL;
+
+	ret = __alloc_contig_range(start, end, migratetype,
+				gfp_mask, order, &freepage_list);
+	if (ret)
+		return ret;
+
+	/* keep pfn ordering */
+	list_for_each_entry_safe(page, page2, &freepage_list, lru) {
+		if (ret < nr_elem)
+			pages[ret++] = page;
+		else
+			__free_pages(page, order);
+	}
+
+	return ret;
 }
+EXPORT_SYMBOL(alloc_pages_bulk);

 static int __alloc_contig_pages(unsigned long start_pfn,
				unsigned long nr_pages, gfp_t gfp_mask)