From patchwork Tue Dec  4 03:05:57 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pingfan Liu
X-Patchwork-Id: 10711067
From: Pingfan Liu
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Pingfan Liu, Andrew Morton,
 Michal Hocko, Vlastimil Babka, Mike Rapoport, Bjorn Helgaas,
 Jonathan Cameron
Subject: [PATCH] mm/alloc: fallback to first node if the wanted node offline
Date: Tue, 4 Dec 2018 11:05:57 +0800
Message-Id: <1543892757-4323-1-git-send-email-kernelfans@gmail.com>
X-Mailer: git-send-email 2.7.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk

While testing on an AMD machine, booting through kexec -l with the
nr_cpus=x option, the kernel failed to boot because the data structures
of some nodes are not allocated (on x86, for example, they are
initialized by init_cpu_to_node()->init_memory_less_node()). However,
device->numa_node is still used as the preferred_nid parameter of
__alloc_pages_nodemask(), which causes a NULL dereference at:

	ac->zonelist = node_zonelist(preferred_nid, gfp_mask);

This patch fixes the issue by falling back to the first online node
when this corner case is encountered.

Notes about the crash:
-1. kexec -l with nr_cpus=4
-2. system info
  NUMA node0 CPU(s):     0,8,16,24
  NUMA node1 CPU(s):     2,10,18,26
  NUMA node2 CPU(s):     4,12,20,28
  NUMA node3 CPU(s):     6,14,22,30
  NUMA node4 CPU(s):     1,9,17,25
  NUMA node5 CPU(s):     3,11,19,27
  NUMA node6 CPU(s):     5,13,21,29
  NUMA node7 CPU(s):     7,15,23,31
-3. panic stack
[...]
[    5.721547] atomic64_test: passed for x86-64 platform with CX8 and with SSE
[    5.729187] pcieport 0000:00:01.1: Signaling PME with IRQ 34
[    5.735187] pcieport 0000:00:01.2: Signaling PME with IRQ 35
[    5.741168] pcieport 0000:00:01.3: Signaling PME with IRQ 36
[    5.747189] pcieport 0000:00:07.1: Signaling PME with IRQ 37
[    5.754061] pcieport 0000:00:08.1: Signaling PME with IRQ 39
[    5.760727] pcieport 0000:20:07.1: Signaling PME with IRQ 40
[    5.766955] pcieport 0000:20:08.1: Signaling PME with IRQ 42
[    5.772742] BUG: unable to handle kernel paging request at 0000000000002088
[    5.773618] PGD 0 P4D 0
[    5.773618] Oops: 0000 [#1] SMP NOPTI
[    5.773618] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc1+ #3
[    5.773618] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.4.3 06/29/2018
[    5.773618] RIP: 0010:__alloc_pages_nodemask+0xe2/0x2a0
[    5.773618] Code: 00 00 44 89 ea 80 ca 80 41 83 f8 01 44 0f 44 ea 89 da c1 ea 08 83 e2 01 88 54 24 20 48 8b 54 24 08 48 85 d2 0f 85 46 01 00 00 <3b> 77 08 0f 82 3d 01 00 00 48 89 f8 44 89 ea 48 89 e1 44 89 e6 89
[    5.773618] RSP: 0018:ffffaa600005fb20 EFLAGS: 00010246
[    5.773618] RAX: 0000000000000000 RBX: 00000000006012c0 RCX: 0000000000000000
[    5.773618] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002080
[    5.773618] RBP: 00000000006012c0 R08: 0000000000000000 R09: 0000000000000002
[    5.773618] R10: 00000000006080c0 R11: 0000000000000002 R12: 0000000000000000
[    5.773618] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000002
[    5.773618] FS:  0000000000000000(0000) GS:ffff8c69afe00000(0000) knlGS:0000000000000000
[    5.773618] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.773618] CR2: 0000000000002088 CR3: 000000087e00a000 CR4: 00000000003406e0
[    5.773618] Call Trace:
[    5.773618]  new_slab+0xa9/0x570
[    5.773618]  ___slab_alloc+0x375/0x540
[    5.773618]  ? pinctrl_bind_pins+0x2b/0x2a0
[    5.773618]  __slab_alloc+0x1c/0x38
[    5.773618]  __kmalloc_node_track_caller+0xc8/0x270
[    5.773618]  ? pinctrl_bind_pins+0x2b/0x2a0
[    5.773618]  devm_kmalloc+0x28/0x60
[    5.773618]  pinctrl_bind_pins+0x2b/0x2a0
[    5.773618]  really_probe+0x73/0x420
[    5.773618]  driver_probe_device+0x115/0x130
[    5.773618]  __driver_attach+0x103/0x110
[    5.773618]  ? driver_probe_device+0x130/0x130
[    5.773618]  bus_for_each_dev+0x67/0xc0
[    5.773618]  ? klist_add_tail+0x3b/0x70
[    5.773618]  bus_add_driver+0x41/0x260
[    5.773618]  ? pcie_port_setup+0x4d/0x4d
[    5.773618]  driver_register+0x5b/0xe0
[    5.773618]  ? pcie_port_setup+0x4d/0x4d
[    5.773618]  do_one_initcall+0x4e/0x1d4
[    5.773618]  ? init_setup+0x25/0x28
[    5.773618]  kernel_init_freeable+0x1c1/0x26e
[    5.773618]  ? loglevel+0x5b/0x5b
[    5.773618]  ? rest_init+0xb0/0xb0
[    5.773618]  kernel_init+0xa/0x110
[    5.773618]  ret_from_fork+0x22/0x40
[    5.773618] Modules linked in:
[    5.773618] CR2: 0000000000002088
[    5.773618] ---[ end trace 1030c9120a03d081 ]---
[...]

Other notes about reproducing this bug:
After applying the following patch:

  commit 0d76bcc960e6057750fcf556b65da13f8bbdfd2b
  Author: Bjorn Helgaas
  Date:   Tue Nov 13 08:38:17 2018 -0600

      Revert "ACPI/PCI: Pay attention to device-specific _PXM node values"

the bug is masked and no longer triggers on my AMD test machine. It
should still exist, though, because dev->numa_node can be set by other
methods on other architectures when the nr_cpus parameter is used.

Signed-off-by: Pingfan Liu
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Mike Rapoport
Cc: Bjorn Helgaas
Cc: Jonathan Cameron
Reported-by: Pingfan Liu
Signed-off-by: Michal Hocko
---
 include/linux/gfp.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 76f8db0..8324953 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -453,6 +453,8 @@ static inline int gfp_zonelist(gfp_t flags)
  */
 static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
 {
+	if (unlikely(!node_online(nid)))
+		nid = first_online_node;
 	return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
 }
 