From patchwork Thu Aug 20 11:43:20 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vlastimil Babka X-Patchwork-Id: 7043931 Return-Path: X-Original-To: patchwork-kvm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 04A7B9F358 for ; Thu, 20 Aug 2015 11:44:58 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 23B0120576 for ; Thu, 20 Aug 2015 11:44:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A1BC620570 for ; Thu, 20 Aug 2015 11:44:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751792AbbHTLoG (ORCPT ); Thu, 20 Aug 2015 07:44:06 -0400 Received: from mx2.suse.de ([195.135.220.15]:59961 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751400AbbHTLoD (ORCPT ); Thu, 20 Aug 2015 07:44:03 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id E1490AC4E; Thu, 20 Aug 2015 11:43:58 +0000 (UTC) From: Vlastimil Babka To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Mel Gorman , David Rientjes , Greg Thelen , "Aneesh Kumar K.V" , Christoph Lameter , Pekka Enberg , Joonsoo Kim , Naoya Horiguchi , Johannes Weiner , linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, cbe-oss-dev@lists.ozlabs.org, kvm@vger.kernel.org, Vlastimil Babka , Tony Luck , Fenghua Yu , Arnd Bergmann , Benjamin Herrenschmidt , Paul Mackerras , Gleb Natapov , Paolo Bonzini , Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , Cliff Whickman , Michael Ellerman , Robin Holt Subject: [PATCH 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node Date: Thu, 20 Aug 2015 13:43:20 +0200 Message-Id: <1440071002-19085-1-git-send-email-vbabka@suse.cz> X-Mailer: git-send-email 2.5.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Spam-Status: No, score=-2.5 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The function alloc_pages_exact_node() was introduced in 6484eb3e2a81 ("page allocator: do not check NUMA node ID when the caller knows the node is valid") as an optimized variant of alloc_pages_node(), that doesn't fallback to current node for nid == NUMA_NO_NODE. Unfortunately the name of the function can easily suggest that the allocation is restricted to the given node and fails otherwise. In truth, the node is only preferred, unless __GFP_THISNODE is passed among the gfp flags. The misleading name has lead to mistakes in the past, see 5265047ac301 ("mm, thp: really limit transparent hugepage allocation to local node") and b360edb43f8e ("mm, mempolicy: migrate_to_node should only migrate to node"). Another issue with the name is that there's a family of alloc_pages_exact*() functions where 'exact' means exact size (instead of page order), which leads to more confusion. To prevent further mistakes, this patch effectively renames alloc_pages_exact_node() to __alloc_pages_node() to better convey that it's an optimized variant of alloc_pages_node() not intended for general usage. Both functions get described in comments. It has been also considered to really provide a convenience function for allocations restricted to a node, but the major opinion seems to be that __GFP_THISNODE already provides that functionality and we shouldn't duplicate the API needlessly. The number of users would be small anyway. Existing callers of alloc_pages_exact_node() are simply converted to call __alloc_pages_node(), with the exception of sba_alloc_coherent() which open-codes the check for NUMA_NO_NODE, so it is converted to use alloc_pages_node() instead. This means it no longer performs some VM_BUG_ON checks, and since the current check for nid in alloc_pages_node() uses a 'nid < 0' comparison (which includes NUMA_NO_NODE), it may hide wrong values which would be previously exposed. Both differences will be rectified by the next patch. To sum up, this patch makes no functional changes, except temporarily hiding potentially buggy callers. Restricting the checks in alloc_pages_node() is left for the next patch which can in turn expose more existing buggy callers. Signed-off-by: Vlastimil Babka Acked-by: Johannes Weiner Cc: Mel Gorman Cc: David Rientjes Cc: Greg Thelen Cc: Aneesh Kumar K.V Acked-by: Christoph Lameter Cc: Pekka Enberg Cc: Joonsoo Kim Cc: Naoya Horiguchi Cc: Tony Luck Cc: Fenghua Yu Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Acked-by: Michael Ellerman Cc: Gleb Natapov Cc: Paolo Bonzini Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Cliff Whickman Acked-by: Robin Holt Acked-by: Michal Hocko --- v3: fixed the slob part (Christoph Lameter), removed forgotten hunk from huge_memory.c (DavidR) arch/ia64/hp/common/sba_iommu.c | 6 +----- arch/ia64/kernel/uncached.c | 2 +- arch/ia64/sn/pci/pci_dma.c | 2 +- arch/powerpc/platforms/cell/ras.c | 2 +- arch/x86/kvm/vmx.c | 2 +- drivers/misc/sgi-xp/xpc_uv.c | 2 +- include/linux/gfp.h | 23 +++++++++++++++-------- kernel/profile.c | 8 ++++---- mm/filemap.c | 2 +- mm/huge_memory.c | 2 +- mm/hugetlb.c | 4 ++-- mm/memory-failure.c | 2 +- mm/mempolicy.c | 4 ++-- mm/migrate.c | 4 ++-- mm/page_alloc.c | 2 -- mm/slab.c | 2 +- mm/slob.c | 4 ++-- mm/slub.c | 2 +- 18 files changed, 38 insertions(+), 37 deletions(-) diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c index 344387a..a6d6190 100644 --- a/arch/ia64/hp/common/sba_iommu.c +++ b/arch/ia64/hp/common/sba_iommu.c @@ -1140,13 +1140,9 @@ sba_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, #ifdef CONFIG_NUMA { - int node = ioc->node; struct page *page; - if (node == NUMA_NO_NODE) - node = numa_node_id(); - - page = alloc_pages_exact_node(node, flags, get_order(size)); + page = alloc_pages_node(ioc->node, flags, get_order(size)); if (unlikely(!page)) return NULL; diff --git a/arch/ia64/kernel/uncached.c b/arch/ia64/kernel/uncached.c index 20e8a9b..f3976da 100644 --- a/arch/ia64/kernel/uncached.c +++ b/arch/ia64/kernel/uncached.c @@ -97,7 +97,7 @@ static int uncached_add_chunk(struct uncached_pool *uc_pool, int nid) /* attempt to allocate a granule's worth of cached memory pages */ - page = alloc_pages_exact_node(nid, + page = __alloc_pages_node(nid, GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE, IA64_GRANULE_SHIFT-PAGE_SHIFT); if (!page) { diff --git a/arch/ia64/sn/pci/pci_dma.c b/arch/ia64/sn/pci/pci_dma.c index d0853e8..8f59907 100644 --- a/arch/ia64/sn/pci/pci_dma.c +++ b/arch/ia64/sn/pci/pci_dma.c @@ -92,7 +92,7 @@ static void *sn_dma_alloc_coherent(struct device *dev, size_t size, */ node = pcibus_to_node(pdev->bus); if (likely(node >=0)) { - struct page *p = alloc_pages_exact_node(node, + struct page *p = __alloc_pages_node(node, flags, get_order(size)); if (likely(p)) diff --git a/arch/powerpc/platforms/cell/ras.c b/arch/powerpc/platforms/cell/ras.c index e865d74..2d4f60c 100644 --- a/arch/powerpc/platforms/cell/ras.c +++ b/arch/powerpc/platforms/cell/ras.c @@ -123,7 +123,7 @@ static int __init cbe_ptcal_enable_on_node(int nid, int order) area->nid = nid; area->order = order; - area->pages = alloc_pages_exact_node(area->nid, + area->pages = __alloc_pages_node(area->nid, GFP_KERNEL|__GFP_THISNODE, area->order); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 5477ab8..d019868 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3150,7 +3150,7 @@ static struct vmcs *alloc_vmcs_cpu(int cpu) struct page *pages; struct vmcs *vmcs; - pages = alloc_pages_exact_node(node, GFP_KERNEL, vmcs_config.order); + pages = __alloc_pages_node(node, GFP_KERNEL, vmcs_config.order); if (!pages) return NULL; vmcs = page_address(pages); diff --git a/drivers/misc/sgi-xp/xpc_uv.c b/drivers/misc/sgi-xp/xpc_uv.c index 95c8944..340b44d 100644 --- a/drivers/misc/sgi-xp/xpc_uv.c +++ b/drivers/misc/sgi-xp/xpc_uv.c @@ -239,7 +239,7 @@ xpc_create_gru_mq_uv(unsigned int mq_size, int cpu, char *irq_name, mq->mmr_blade = uv_cpu_to_blade_id(cpu); nid = cpu_to_node(cpu); - page = alloc_pages_exact_node(nid, + page = __alloc_pages_node(nid, GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE, pg_order); if (page == NULL) { diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 3bd64b1..d2c142b 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -303,20 +303,28 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order, return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL); } -static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask, - unsigned int order) +/* + * Allocate pages, preferring the node given as nid. The node must be valid and + * online. For more general interface, see alloc_pages_node(). + */ +static inline struct page * +__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order) { - /* Unknown node is current node */ - if (nid < 0) - nid = numa_node_id(); + VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid)); return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask)); } -static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask, +/* + * Allocate pages, preferring the node given as nid. When nid == NUMA_NO_NODE, + * prefer the current CPU's node. + */ +static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order) { - VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid)); + /* Unknown node is current node */ + if (nid < 0) + nid = numa_node_id(); return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask)); } @@ -357,7 +365,6 @@ extern unsigned long get_zeroed_page(gfp_t gfp_mask); void *alloc_pages_exact(size_t size, gfp_t gfp_mask); void free_pages_exact(void *virt, size_t size); -/* This is different from alloc_pages_exact_node !!! */ void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask); #define __get_free_page(gfp_mask) \ diff --git a/kernel/profile.c b/kernel/profile.c index a7bcd28..99513e1 100644 --- a/kernel/profile.c +++ b/kernel/profile.c @@ -339,7 +339,7 @@ static int profile_cpu_callback(struct notifier_block *info, node = cpu_to_mem(cpu); per_cpu(cpu_profile_flip, cpu) = 0; if (!per_cpu(cpu_profile_hits, cpu)[1]) { - page = alloc_pages_exact_node(node, + page = __alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0); if (!page) @@ -347,7 +347,7 @@ static int profile_cpu_callback(struct notifier_block *info, per_cpu(cpu_profile_hits, cpu)[1] = page_address(page); } if (!per_cpu(cpu_profile_hits, cpu)[0]) { - page = alloc_pages_exact_node(node, + page = __alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0); if (!page) @@ -543,14 +543,14 @@ static int create_hash_tables(void) int node = cpu_to_mem(cpu); struct page *page; - page = alloc_pages_exact_node(node, + page = __alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE, 0); if (!page) goto out_cleanup; per_cpu(cpu_profile_hits, cpu)[1] = (struct profile_hit *)page_address(page); - page = alloc_pages_exact_node(node, + page = __alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE, 0); if (!page) diff --git a/mm/filemap.c b/mm/filemap.c index 204fd1c..b510a0d 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -674,7 +674,7 @@ struct page *__page_cache_alloc(gfp_t gfp) do { cpuset_mems_cookie = read_mems_allowed_begin(); n = cpuset_mem_spread_node(); - page = alloc_pages_exact_node(n, gfp, 0); + page = __alloc_pages_node(n, gfp, 0); } while (!page && read_mems_allowed_retry(cpuset_mems_cookie)); return page; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 7109330..15abd31 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2456,7 +2456,7 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm, */ up_read(&mm->mmap_sem); - *hpage = alloc_pages_exact_node(node, gfp, HPAGE_PMD_ORDER); + *hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER); if (unlikely(!*hpage)) { count_vm_event(THP_COLLAPSE_ALLOC_FAILED); *hpage = ERR_PTR(-ENOMEM); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 586aa69..b4c23bb 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1331,7 +1331,7 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid) { struct page *page; - page = alloc_pages_exact_node(nid, + page = __alloc_pages_node(nid, htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE| __GFP_REPEAT|__GFP_NOWARN, huge_page_order(h)); @@ -1483,7 +1483,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid) __GFP_REPEAT|__GFP_NOWARN, huge_page_order(h)); else - page = alloc_pages_exact_node(nid, + page = __alloc_pages_node(nid, htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE| __GFP_REPEAT|__GFP_NOWARN, huge_page_order(h)); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 613389e..023f92b 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1509,7 +1509,7 @@ static struct page *new_page(struct page *p, unsigned long private, int **x) return alloc_huge_page_node(page_hstate(compound_head(p)), nid); else - return alloc_pages_exact_node(nid, GFP_HIGHUSER_MOVABLE, 0); + return __alloc_pages_node(nid, GFP_HIGHUSER_MOVABLE, 0); } /* diff --git a/mm/mempolicy.c b/mm/mempolicy.c index d6f2cae..87a1779 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -942,7 +942,7 @@ static struct page *new_node_page(struct page *page, unsigned long node, int **x return alloc_huge_page_node(page_hstate(compound_head(page)), node); else - return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE | + return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0); } @@ -1998,7 +1998,7 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma, nmask = policy_nodemask(gfp, pol); if (!nmask || node_isset(hpage_node, *nmask)) { mpol_cond_put(pol); - page = alloc_pages_exact_node(hpage_node, + page = __alloc_pages_node(hpage_node, gfp | __GFP_THISNODE, order); goto out; } diff --git a/mm/migrate.c b/mm/migrate.c index fbf1798..b7da48f 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1201,7 +1201,7 @@ static struct page *new_page_node(struct page *p, unsigned long private, return alloc_huge_page_node(page_hstate(compound_head(p)), pm->node); else - return alloc_pages_exact_node(pm->node, + return __alloc_pages_node(pm->node, GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0); } @@ -1561,7 +1561,7 @@ static struct page *alloc_misplaced_dst_page(struct page *page, int nid = (int) data; struct page *newpage; - newpage = alloc_pages_exact_node(nid, + newpage = __alloc_pages_node(nid, (GFP_HIGHUSER_MOVABLE | __GFP_THISNODE | __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c1024db..35f1d20 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3518,8 +3518,6 @@ EXPORT_SYMBOL(alloc_pages_exact); * * Like alloc_pages_exact(), but try to allocate on node nid first before falling * back. - * Note this is not alloc_pages_exact_node() which allocates on a specific node, - * but is not exact. */ void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask) { diff --git a/mm/slab.c b/mm/slab.c index bf7169c..d890750 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1595,7 +1595,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, if (memcg_charge_slab(cachep, flags, cachep->gfporder)) return NULL; - page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder); + page = __alloc_pages_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder); if (!page) { memcg_uncharge_slab(cachep, cachep->gfporder); slab_out_of_memory(cachep, flags, nodeid); diff --git a/mm/slob.c b/mm/slob.c index 165bbd3..0d7e5df 100644 --- a/mm/slob.c +++ b/mm/slob.c @@ -45,7 +45,7 @@ * NUMA support in SLOB is fairly simplistic, pushing most of the real * logic down to the page allocator, and simply doing the node accounting * on the upper levels. In the event that a node id is explicitly - * provided, alloc_pages_exact_node() with the specified node id is used + * provided, __alloc_pages_node() with the specified node id is used * instead. The common case (or when the node id isn't explicitly provided) * will default to the current node, as per numa_node_id(). * @@ -193,7 +193,7 @@ static void *slob_new_pages(gfp_t gfp, int order, int node) #ifdef CONFIG_NUMA if (node != NUMA_NO_NODE) - page = alloc_pages_exact_node(node, gfp, order); + page = __alloc_pages_node(node, gfp, order); else #endif page = alloc_pages(gfp, order); diff --git a/mm/slub.c b/mm/slub.c index 8987bd5..e180f8d 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1336,7 +1336,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s, if (node == NUMA_NO_NODE) page = alloc_pages(flags, order); else - page = alloc_pages_exact_node(node, flags, order); + page = __alloc_pages_node(node, flags, order); if (!page) memcg_uncharge_slab(s, order);