From patchwork Thu Dec 20 09:50:37 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pingfan Liu
X-Patchwork-Id: 10738731
From: Pingfan Liu
To: linux-mm@kvack.org
Cc: Pingfan Liu, linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
 linux-kernel@vger.kernel.org, Andrew Morton, Michal Hocko,
 Vlastimil Babka, Mike Rapoport, Bjorn Helgaas, Jonathan Cameron,
 David Rientjes, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 "H. Peter Anvin", Benjamin Herrenschmidt, Paul Mackerras,
 Michael Ellerman
Subject: [PATCHv2 1/3] mm/numa: change the topo of build_zonelist_xx()
Date: Thu, 20 Dec 2018 17:50:37 +0800
Message-Id: <1545299439-31370-2-git-send-email-kernelfans@gmail.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1545299439-31370-1-git-send-email-kernelfans@gmail.com>
References: <1545299439-31370-1-git-send-email-kernelfans@gmail.com>

The current build_zonelist_xx() functions rely on a pgdat instance to
build a zonelist. If a NUMA node is offline, there is no pgdat instance
for it. But in some cases a zonelist is still required for an offline
node, especially with the nr_cpus option. This patch changes the
prototypes of these functions so that they take a node id or a zonelist
array instead of a pgdat, which eases building zonelists for offline
nodes.

Signed-off-by: Pingfan Liu
Cc: linuxppc-dev@lists.ozlabs.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Mike Rapoport
Cc: Bjorn Helgaas
Cc: Jonathan Cameron
Cc: David Rientjes
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
---
 mm/page_alloc.c | 44 +++++++++++++++++++++-----------------------
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ec9cc4..17dbf6e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5049,7 +5049,7 @@ static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
  *
  * Add all populated zones of a node to the zonelist.
  */
-static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
+static int build_zonerefs_node(int nid, struct zoneref *zonerefs)
 {
 	struct zone *zone;
 	enum zone_type zone_type = MAX_NR_ZONES;
@@ -5057,7 +5057,7 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
 
 	do {
 		zone_type--;
-		zone = pgdat->node_zones + zone_type;
+		zone = NODE_DATA(nid)->node_zones + zone_type;
 		if (managed_zone(zone)) {
 			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
 			check_highest_zone(zone_type);
@@ -5186,20 +5186,20 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
  * This results in maximum locality--normal zone overflows into local
  * DMA zone, if any--but risks exhausting DMA zone.
  */
-static void build_zonelists_in_node_order(pg_data_t *pgdat, int *node_order,
-		unsigned nr_nodes)
+static void build_zonelists_in_node_order(struct zonelist *node_zonelists,
+		int *node_order, unsigned int nr_nodes)
 {
 	struct zoneref *zonerefs;
 	int i;
 
-	zonerefs = pgdat->node_zonelists[ZONELIST_FALLBACK]._zonerefs;
+	zonerefs = node_zonelists[ZONELIST_FALLBACK]._zonerefs;
 
 	for (i = 0; i < nr_nodes; i++) {
 		int nr_zones;
 
 		pg_data_t *node = NODE_DATA(node_order[i]);
 
-		nr_zones = build_zonerefs_node(node, zonerefs);
+		nr_zones = build_zonerefs_node(node->node_id, zonerefs);
 		zonerefs += nr_zones;
 	}
 	zonerefs->zone = NULL;
@@ -5209,13 +5209,14 @@ static void build_zonelists_in_node_order(pg_data_t *pgdat, int *node_order,
 /*
  * Build gfp_thisnode zonelists
  */
-static void build_thisnode_zonelists(pg_data_t *pgdat)
+static void build_thisnode_zonelists(struct zonelist *node_zonelists,
+		int nid)
 {
 	struct zoneref *zonerefs;
 	int nr_zones;
 
-	zonerefs = pgdat->node_zonelists[ZONELIST_NOFALLBACK]._zonerefs;
-	nr_zones = build_zonerefs_node(pgdat, zonerefs);
+	zonerefs = node_zonelists[ZONELIST_NOFALLBACK]._zonerefs;
+	nr_zones = build_zonerefs_node(nid, zonerefs);
 	zonerefs += nr_zones;
 	zonerefs->zone = NULL;
 	zonerefs->zone_idx = 0;
@@ -5228,15 +5229,14 @@ static void build_thisnode_zonelists(pg_data_t *pgdat)
  * may still exist in local DMA zone.
  */
 
-static void build_zonelists(pg_data_t *pgdat)
+static void build_zonelists(struct zonelist *node_zonelists, int local_node)
 {
 	static int node_order[MAX_NUMNODES];
 	int node, load, nr_nodes = 0;
 	nodemask_t used_mask;
-	int local_node, prev_node;
+	int prev_node;
 
 	/* NUMA-aware ordering of nodes */
-	local_node = pgdat->node_id;
 	load = nr_online_nodes;
 	prev_node = local_node;
 	nodes_clear(used_mask);
@@ -5257,8 +5257,8 @@ static void build_zonelists(pg_data_t *pgdat)
 		load--;
 	}
 
-	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
-	build_thisnode_zonelists(pgdat);
+	build_zonelists_in_node_order(node_zonelists, node_order, nr_nodes);
+	build_thisnode_zonelists(node_zonelists, local_node);
 }
 
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES
@@ -5283,16 +5283,14 @@ static void setup_min_unmapped_ratio(void);
 static void setup_min_slab_ratio(void);
 #else	/* CONFIG_NUMA */
 
-static void build_zonelists(pg_data_t *pgdat)
+static void build_zonelists(struct zonelist *node_zonelists, int local_node)
 {
 	int node, local_node;
 	struct zoneref *zonerefs;
 	int nr_zones;
 
-	local_node = pgdat->node_id;
-
-	zonerefs = pgdat->node_zonelists[ZONELIST_FALLBACK]._zonerefs;
-	nr_zones = build_zonerefs_node(pgdat, zonerefs);
+	zonerefs = node_zonelists[ZONELIST_FALLBACK]._zonerefs;
+	nr_zones = build_zonerefs_node(local_node, zonerefs);
 	zonerefs += nr_zones;
 
 	/*
@@ -5306,13 +5304,13 @@ static void build_zonelists(pg_data_t *pgdat)
 	for (node = local_node + 1; node < MAX_NUMNODES; node++) {
 		if (!node_online(node))
 			continue;
-		nr_zones = build_zonerefs_node(NODE_DATA(node), zonerefs);
+		nr_zones = build_zonerefs_node(node, zonerefs);
 		zonerefs += nr_zones;
 	}
 	for (node = 0; node < local_node; node++) {
 		if (!node_online(node))
 			continue;
-		nr_zones = build_zonerefs_node(NODE_DATA(node), zonerefs);
+		nr_zones = build_zonerefs_node(node, zonerefs);
 		zonerefs += nr_zones;
 	}
 
@@ -5359,12 +5357,12 @@ static void __build_all_zonelists(void *data)
 	 * building zonelists is fine - no need to touch other nodes.
 	 */
 	if (self && !node_online(self->node_id)) {
-		build_zonelists(self);
+		build_zonelists(self->node_zonelists, self->node_id);
 	} else {
 		for_each_online_node(nid) {
 			pg_data_t *pgdat = NODE_DATA(nid);
 
-			build_zonelists(pgdat);
+			build_zonelists(pgdat->node_zonelists, pgdat->node_id);
 		}
 
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES

From patchwork Thu Dec 20 09:50:38 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pingfan Liu
X-Patchwork-Id: 10738733
From: Pingfan Liu
To: linux-mm@kvack.org
Cc: Pingfan Liu, linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
 linux-kernel@vger.kernel.org, Andrew Morton, Michal Hocko,
 Vlastimil Babka, Mike Rapoport, Bjorn Helgaas, Jonathan Cameron,
 David Rientjes, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 "H. Peter Anvin", Benjamin Herrenschmidt, Paul Mackerras,
 Michael Ellerman
Subject: [PATCHv2 2/3] mm/numa: build zonelist when alloc for device on offline node
Date: Thu, 20 Dec 2018 17:50:38 +0800
Message-Id: <1545299439-31370-3-git-send-email-kernelfans@gmail.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1545299439-31370-1-git-send-email-kernelfans@gmail.com>
References: <1545299439-31370-1-git-send-email-kernelfans@gmail.com>

I hit a bug on an AMD machine when loading a kernel with kexec -l and
the nr_cpus=4 option. With nr_cpus specified, some pgdat instances are
never created; on x86, for example, they are not initialized by
init_cpu_to_node()->init_memory_less_node(). But device->numa_node is
still used as the preferred_nid parameter of __alloc_pages_nodemask(),
which causes a NULL pointer dereference in
  ac->zonelist = node_zonelist(preferred_nid, gfp_mask);

Although the bug was detected on x86, it should affect every arch. It
needs a machine with a NUMA node that has no memory, nr_cpus preventing
that node from being instantiated, and a device on the node that tries
to allocate memory using its device->numa_node.

There are two alternative ways to fix the bug.
-1. Instantiate all possible NUMA nodes. This has to be done for every
    arch.
-2. Use a zonelist instead of the pgdat when an un-instantiated node is
    encountered, and do this only when needed.

This patch adopts the 2nd method: it uses possible_zonelists[] to mirror
node_zonelists[], and builds the zonelist for an offline node when
needed.

Notes about the crashing info:
-1. kexec -l with nr_cpus=4
-2. system info
  NUMA node0 CPU(s):     0,8,16,24
  NUMA node1 CPU(s):     2,10,18,26
  NUMA node2 CPU(s):     4,12,20,28
  NUMA node3 CPU(s):     6,14,22,30
  NUMA node4 CPU(s):     1,9,17,25
  NUMA node5 CPU(s):     3,11,19,27
  NUMA node6 CPU(s):     5,13,21,29
  NUMA node7 CPU(s):     7,15,23,31
-3. panic stack
[...]
[    5.721547] atomic64_test: passed for x86-64 platform with CX8 and with SSE
[    5.729187] pcieport 0000:00:01.1: Signaling PME with IRQ 34
[    5.735187] pcieport 0000:00:01.2: Signaling PME with IRQ 35
[    5.741168] pcieport 0000:00:01.3: Signaling PME with IRQ 36
[    5.747189] pcieport 0000:00:07.1: Signaling PME with IRQ 37
[    5.754061] pcieport 0000:00:08.1: Signaling PME with IRQ 39
[    5.760727] pcieport 0000:20:07.1: Signaling PME with IRQ 40
[    5.766955] pcieport 0000:20:08.1: Signaling PME with IRQ 42
[    5.772742] BUG: unable to handle kernel paging request at 0000000000002088
[    5.773618] PGD 0 P4D 0
[    5.773618] Oops: 0000 [#1] SMP NOPTI
[    5.773618] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc1+ #3
[    5.773618] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.4.3 06/29/2018
[    5.773618] RIP: 0010:__alloc_pages_nodemask+0xe2/0x2a0
[    5.773618] Code: 00 00 44 89 ea 80 ca 80 41 83 f8 01 44 0f 44 ea 89 da c1 ea 08 83 e2 01 88 54 24 20 48 8b 54 24 08 48 85 d2 0f 85 46 01 00 00 <3b> 77 08 0f 82 3d 01 00 00 48 89 f8 44 89 ea 48 89 e1 44 89 e6 89
[    5.773618] RSP: 0018:ffffaa600005fb20 EFLAGS: 00010246
[    5.773618] RAX: 0000000000000000 RBX: 00000000006012c0 RCX: 0000000000000000
[    5.773618] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002080
[    5.773618] RBP: 00000000006012c0 R08: 0000000000000000 R09: 0000000000000002
[    5.773618] R10: 00000000006080c0 R11: 0000000000000002 R12: 0000000000000000
[    5.773618] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000002
[    5.773618] FS:  0000000000000000(0000) GS:ffff8c69afe00000(0000) knlGS:0000000000000000
[    5.773618] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.773618] CR2: 0000000000002088 CR3: 000000087e00a000 CR4: 00000000003406e0
[    5.773618] Call Trace:
[    5.773618]  new_slab+0xa9/0x570
[    5.773618]  ___slab_alloc+0x375/0x540
[    5.773618]  ? pinctrl_bind_pins+0x2b/0x2a0
[    5.773618]  __slab_alloc+0x1c/0x38
[    5.773618]  __kmalloc_node_track_caller+0xc8/0x270
[    5.773618]  ? pinctrl_bind_pins+0x2b/0x2a0
[    5.773618]  devm_kmalloc+0x28/0x60
[    5.773618]  pinctrl_bind_pins+0x2b/0x2a0
[    5.773618]  really_probe+0x73/0x420
[    5.773618]  driver_probe_device+0x115/0x130
[    5.773618]  __driver_attach+0x103/0x110
[    5.773618]  ? driver_probe_device+0x130/0x130
[    5.773618]  bus_for_each_dev+0x67/0xc0
[    5.773618]  ? klist_add_tail+0x3b/0x70
[    5.773618]  bus_add_driver+0x41/0x260
[    5.773618]  ? pcie_port_setup+0x4d/0x4d
[    5.773618]  driver_register+0x5b/0xe0
[    5.773618]  ? pcie_port_setup+0x4d/0x4d
[    5.773618]  do_one_initcall+0x4e/0x1d4
[    5.773618]  ? init_setup+0x25/0x28
[    5.773618]  kernel_init_freeable+0x1c1/0x26e
[    5.773618]  ? loglevel+0x5b/0x5b
[    5.773618]  ? rest_init+0xb0/0xb0
[    5.773618]  kernel_init+0xa/0x110
[    5.773618]  ret_from_fork+0x22/0x40
[    5.773618] Modules linked in:
[    5.773618] CR2: 0000000000002088
[    5.773618] ---[ end trace 1030c9120a03d081 ]---
[...]

Other notes about the reproduction of this bug:
After applying the following patch:
'commit 0d76bcc960e6 ("Revert "ACPI/PCI: Pay attention to device-specific _PXM node values"")'
the bug is masked and no longer triggers on my AMD test machine.
But it should still exist, since dev->numa_node can be set by other
means on other archs when the nr_cpus parameter is used.

Signed-off-by: Pingfan Liu
Cc: linuxppc-dev@lists.ozlabs.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Mike Rapoport
Cc: Bjorn Helgaas
Cc: Jonathan Cameron
Cc: David Rientjes
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
---
 include/linux/gfp.h | 10 +++++++++-
 mm/page_alloc.c     | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0705164..0ddf809 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -442,6 +442,9 @@ static inline int gfp_zonelist(gfp_t flags)
 	return ZONELIST_FALLBACK;
 }
 
+extern struct zonelist *possible_zonelists[];
+extern int build_fallback_zonelists(int node);
+
 /*
  * We get the zone list from the current node and the gfp_mask.
  * This zone list contains a maximum of MAXNODES*MAX_NR_ZONES zones.
@@ -453,7 +456,12 @@ static inline int gfp_zonelist(gfp_t flags)
  */
 static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
 {
-	return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
+	if (unlikely(!possible_zonelists[nid])) {
+		WARN_ONCE(1, "alloc from offline node: %d\n", nid);
+		if (unlikely(build_fallback_zonelists(nid)))
+			nid = first_online_node;
+	}
+	return possible_zonelists[nid] + gfp_zonelist(flags);
 }
 
 #ifndef HAVE_ARCH_FREE_PAGE
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 17dbf6e..608b51d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -121,6 +121,8 @@ nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
 };
 EXPORT_SYMBOL(node_states);
 
+struct zonelist *possible_zonelists[MAX_NUMNODES] __read_mostly;
+
 /* Protect totalram_pages and zone->managed_pages */
 static DEFINE_SPINLOCK(managed_page_count_lock);
 
@@ -5180,7 +5182,6 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
 	return best_node;
 }
 
-
 /*
  * Build zonelists ordered by node and zones within node.
  * This results in maximum locality--normal zone overflows into local
@@ -5222,6 +5223,7 @@ static void build_thisnode_zonelists(struct zonelist *node_zonelists,
 	zonerefs->zone_idx = 0;
 }
 
+
 /*
  * Build zonelists ordered by zone and nodes within zones.
  * This results in conserving DMA zone[s] until all Normal memory is
@@ -5229,7 +5231,8 @@ static void build_thisnode_zonelists(struct zonelist *node_zonelists,
  * may still exist in local DMA zone.
  */
 
-static void build_zonelists(struct zonelist *node_zonelists, int local_node)
+static void build_zonelists(struct zonelist *node_zonelists,
+		int local_node, bool exclude_self)
 {
 	static int node_order[MAX_NUMNODES];
 	int node, load, nr_nodes = 0;
@@ -5240,6 +5243,8 @@ static void build_zonelists(struct zonelist *node_zonelists, int local_node)
 	load = nr_online_nodes;
 	prev_node = local_node;
 	nodes_clear(used_mask);
+	if (exclude_self)
+		node_set(local_node, used_mask);
 
 	memset(node_order, 0, sizeof(node_order));
 	while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
@@ -5258,7 +5263,40 @@ static void build_zonelists(struct zonelist *node_zonelists, int local_node)
 	}
 
 	build_zonelists_in_node_order(node_zonelists, node_order, nr_nodes);
-	build_thisnode_zonelists(node_zonelists, local_node);
+	if (!exclude_self)
+		build_thisnode_zonelists(node_zonelists, local_node);
+	possible_zonelists[local_node] = node_zonelists;
+}
+
+/* This is the rare case where zonelists must be built for an offline
+ * node because a device on it is still in use.
+ */
+int build_fallback_zonelists(int node)
+{
+	static DEFINE_SPINLOCK(lock);
+	nodemask_t *used_mask;
+	struct zonelist *zl;
+	int ret = 0;
+
+	spin_lock(&lock);
+	if (unlikely(possible_zonelists[node] != NULL))
+		goto unlock;
+
+	used_mask = kmalloc(sizeof(nodemask_t), GFP_ATOMIC);
+	zl = kmalloc(sizeof(struct zonelist)*MAX_ZONELISTS, GFP_ATOMIC);
+	if (unlikely(!used_mask || !zl)) {
+		ret = -ENOMEM;
+		kfree(used_mask);
+		kfree(zl);
+		goto unlock;
+	}
+
+	__nodes_complement(used_mask, &node_online_map, MAX_NUMNODES);
+	build_zonelists(zl, node, true);
+	kfree(used_mask);
+unlock:
+	spin_unlock(&lock);
+	return ret;
 }
 
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES
@@ -5283,7 +5321,8 @@ static void setup_min_unmapped_ratio(void);
 static void setup_min_slab_ratio(void);
 #else	/* CONFIG_NUMA */
 
-static void build_zonelists(struct zonelist *node_zonelists, int local_node)
+static void build_zonelists(struct zonelist *node_zonelists,
+		int local_node, bool _unused)
 {
 	int node, local_node;
 	struct zoneref *zonerefs;
@@ -5357,12 +5396,13 @@ static void __build_all_zonelists(void *data)
 	 * building zonelists is fine - no need to touch other nodes.
 	 */
 	if (self && !node_online(self->node_id)) {
-		build_zonelists(self->node_zonelists, self->node_id);
+		build_zonelists(self->node_zonelists, self->node_id, false);
 	} else {
 		for_each_online_node(nid) {
 			pg_data_t *pgdat = NODE_DATA(nid);
 
-			build_zonelists(pgdat->node_zonelists, pgdat->node_id);
+			build_zonelists(pgdat->node_zonelists, pgdat->node_id,
+				false);
 		}
 
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES

From patchwork Thu Dec 20 09:50:39 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pingfan Liu
X-Patchwork-Id: 10738735
From: Pingfan Liu
To: linux-mm@kvack.org
Cc: Pingfan Liu, linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
 linux-kernel@vger.kernel.org, Andrew Morton, Michal Hocko,
 Vlastimil Babka, Mike Rapoport, Bjorn Helgaas, Jonathan Cameron,
 David Rientjes, Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H.
Peter Anvin" , Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman
Subject: [PATCHv2 3/3] powerpc/numa: make all possible node be instanced against NULL reference in node_zonelist()
Date: Thu, 20 Dec 2018 17:50:39 +0800
Message-Id: <1545299439-31370-4-git-send-email-kernelfans@gmail.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1545299439-31370-1-git-send-email-kernelfans@gmail.com>
References: <1545299439-31370-1-git-send-email-kernelfans@gmail.com>
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
X-Virus-Scanned: ClamAV using ClamSMTP

This patch tries to resolve a bug rooted in mm that shows up when booting
with nr_cpus. It was reported at [1].

The root cause: device->numa_node is used as the preferred_nid parameter
of __alloc_pages_nodemask(). This causes a NULL dereference at
ac->zonelist = node_zonelist(preferred_nid, gfp_mask), because the node
preferred_nid is neither online nor instantiated.

Hence the bug can affect any arch on a machine that has a memoryless NUMA
node, when a device on that node is used and passes its numa_node to
__alloc_pages_nodemask().

This patch brings all possible nodes online on ppc.

[1]: https://lore.kernel.org/patchwork/patch/1020838/

Signed-off-by: Pingfan Liu
Cc: linuxppc-dev@lists.ozlabs.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Mike Rapoport
Cc: Bjorn Helgaas
Cc: Jonathan Cameron
Cc: David Rientjes
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
---
Note: [1-2/3] implement one way to fix the bug, while this patch tries
another. Use this patch if [1-2/3] are not acceptable.
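For illustration only, and not part of the patch: a minimal user-space
sketch of the failure mode described in the commit message. Every model_*
name and MODEL_MAX_NODES is hypothetical and only stands in for kernel
concepts (per-node data, the online map, node_zonelist()). The model
returns NULL where the kernel would dereference a NULL pointer, so the
effect of the old loop and the patched loop can be compared safely.

```c
/* User-space model for illustration; this is not kernel code. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_NODES 4	/* hypothetical number of possible nodes */

struct model_zonelist { int nr_zones; };
struct model_pgdat { struct model_zonelist node_zonelists[1]; };

/* Stand-ins for the per-node data pointers and the online map. */
struct model_pgdat *model_node_data[MODEL_MAX_NODES];
int model_node_online[MODEL_MAX_NODES];

/* Allocate zeroed node data once, like a node with an empty pfn range. */
void model_setup_node_data(int nid)
{
	if (!model_node_data[nid]) {
		model_node_data[nid] = calloc(1, sizeof(*model_node_data[nid]));
		assert(model_node_data[nid] != NULL);
	}
}

/* Old loop: only nodes that are already online get node data. */
void model_init_online_nodes(void)
{
	for (int nid = 0; nid < MODEL_MAX_NODES; nid++)
		if (model_node_online[nid])
			model_setup_node_data(nid);
}

/* Patched loop: every possible node is onlined and gets node data. */
void model_init_all_nodes(void)
{
	for (int nid = 0; nid < MODEL_MAX_NODES; nid++) {
		model_node_online[nid] = 1;
		model_setup_node_data(nid);
	}
}

/*
 * The kernel's node_zonelist() dereferences the node data unconditionally.
 * The model returns NULL instead, so the failure is observable without a
 * crash.
 */
struct model_zonelist *model_node_zonelist(int nid)
{
	struct model_pgdat *pgdat = model_node_data[nid];

	return pgdat ? &pgdat->node_zonelists[0] : NULL;
}
```

With only node 0 marked online, model_init_online_nodes() leaves
model_node_zonelist(2) returning NULL, which corresponds to the reported
oops for a device sitting on a memoryless node. After
model_init_all_nodes() the same lookup returns a valid pointer.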
 arch/powerpc/mm/numa.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index ce28ae5..31d81a4 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -864,10 +864,19 @@ void __init initmem_init(void)
 
 	memblock_dump_all();
 
-	for_each_online_node(nid) {
+	/* Instance all possible nodes to overcome potential NULL reference
+	 * issue on node_zonelist() when using nr_cpus
+	 */
+	for_each_node(nid) {
 		unsigned long start_pfn, end_pfn;
 
-		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
+		if (node_online(nid))
+			get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
+		else {
+			start_pfn = end_pfn = 0;
+			/* online it, so later zonelists[] will be built */
+			node_set_online(nid);
+		}
 		setup_node_data(nid, start_pfn, end_pfn);
 		sparse_memory_present_with_active_regions(nid);
 	}