From patchwork Wed Nov 4 06:10:09 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Tang
X-Patchwork-Id: 11879665
From: Feng Tang
To: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox,
 Mel Gorman, dave.hansen@intel.com, ying.huang@intel.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Feng Tang
Subject: [RFC PATCH 1/2] mm, oom: dump meminfo for all memory nodes
Date: Wed, 4 Nov 2020 14:10:09 +0800
Message-Id: <1604470210-124827-2-git-send-email-feng.tang@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1604470210-124827-1-git-send-email-feng.tang@intel.com>
References:
 <1604470210-124827-1-git-send-email-feng.tang@intel.com>

In some OOM cases, if there is a memory node binding
(current->mems_allowed is not NULL), the system may print the meminfo
only for the bound nodes, while the other nodes' info could still be
important for debugging.

For example, on a platform with one normal node (which has
DMA/DMA32/NORMAL... zones) and one node which has only a movable zone
(either for the memory hotplug case or as a persistent memory node),
some users will run docker while binding memory to the movable node.
Many memory allocations originating from the docker instance will fall
back to the other node, and when an OOM happens, the meminfo for both
nodes is needed.

So extend show_mem() to cover all memory nodes.

Signed-off-by: Feng Tang
---
 mm/oom_kill.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8b84661..601476cc 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -462,7 +462,7 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
 	if (is_memcg_oom(oc))
 		mem_cgroup_print_oom_meminfo(oc->memcg);
 	else {
-		show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
+		show_mem(SHOW_MEM_FILTER_NODES, &node_states[N_MEMORY]);
 		if (is_dump_unreclaim_slabs())
 			dump_unreclaimable_slab();
 	}

From patchwork Wed Nov 4 06:10:10 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Tang
X-Patchwork-Id: 11879667
From: Feng Tang
To: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox,
 Mel Gorman, dave.hansen@intel.com, ying.huang@intel.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Feng Tang
Subject: [RFC PATCH 2/2] mm, page_alloc: loose the node binding check to avoid helpless oom killing
Date: Wed, 4 Nov 2020 14:10:10 +0800
Message-Id: <1604470210-124827-3-git-send-email-feng.tang@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1604470210-124827-1-git-send-email-feng.tang@intel.com>
References: <1604470210-124827-1-git-send-email-feng.tang@intel.com>

With the arrival of the memory hotplug feature and persistent memory,
some platforms have memory nodes which contain only a movable zone.

Users may bind some of their workloads (like docker containers) to
these nodes, and there are many reports of OOM and page allocation
failures. One call stack is:

[ 1387.877565] runc:[2:INIT] invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=0
[ 1387.877568] CPU: 8 PID: 8291 Comm: runc:[2:INIT] Tainted: G W I E 5.8.2-0.g71b519a-default #1 openSUSE Tumbleweed (unreleased)
[ 1387.877569] Hardware name: Dell Inc. PowerEdge R640/0PHYDR, BIOS 2.6.4 04/09/2020
[ 1387.877570] Call Trace:
[ 1387.877579]  dump_stack+0x6b/0x88
[ 1387.877584]  dump_header+0x4a/0x1e2
[ 1387.877586]  oom_kill_process.cold+0xb/0x10
[ 1387.877588]  out_of_memory.part.0+0xaf/0x230
[ 1387.877591]  out_of_memory+0x3d/0x80
[ 1387.877595]  __alloc_pages_slowpath.constprop.0+0x954/0xa20
[ 1387.877599]  __alloc_pages_nodemask+0x2d3/0x300
[ 1387.877602]  pipe_write+0x322/0x590
[ 1387.877607]  new_sync_write+0x196/0x1b0
[ 1387.877609]  vfs_write+0x1c3/0x1f0
[ 1387.877611]  ksys_write+0xa7/0xe0
[ 1387.877617]  do_syscall_64+0x52/0xd0
[ 1387.877621]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

In a full container run, like installing and running the stress tool
"stress-ng", there are many different kinds of page requests
(gfp_masks), many of which only allow non-movable zones. Some of them
can fall back to other nodes with NORMAL/DMA32/DMA zones, but others
are blocked by the __GFP_HARDWALL or ALLOC_CPUSET check and cause OOM
killing. OOM killing does not help here, because the problem is not a
lack of free memory: the allocation is simply blocked by the node
binding policy check.

So loosen the policy check for this case.

Signed-off-by: Feng Tang
---
 mm/page_alloc.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d772206..efd49a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4669,6 +4669,28 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (!ac->preferred_zoneref->zone)
 		goto nopage;
 
+	/*
+	 * If the task's target memory nodes only have movable zones, while the
+	 * gfp_mask allowed zone is lower than ZONE_MOVABLE, loosen the check
+	 * for __GFP_HARDWALL and ALLOC_CPUSET, otherwise it could trigger OOM
+	 * killing, which still cannot get around this policy check.
+	 */
+	if (ac->highest_zoneidx <= ZONE_NORMAL) {
+		int nid;
+		unsigned long unmovable = 0;
+
+		/* FIXME: this could be a separate function */
+		for_each_node_mask(nid, cpuset_current_mems_allowed) {
+			unmovable += NODE_DATA(nid)->node_present_pages -
+				NODE_DATA(nid)->node_zones[ZONE_MOVABLE].present_pages;
+		}
+
+		if (!unmovable) {
+			gfp_mask &= ~(__GFP_HARDWALL);
+			alloc_flags &= ~ALLOC_CPUSET;
+		}
+	}
+
 	if (alloc_flags & ALLOC_KSWAPD)
 		wake_all_kswapds(order, gfp_mask, ac);
 
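
The FIXME in the hunk above notes that the per-node summation could be a separate function. What follows is a minimal userspace sketch of that idea, not kernel code and not part of the posted patch. The names `struct node_model`, `unmovable_pages()` and `should_relax_binding()` are hypothetical, and the allowed-node set is modelled as a plain bitmask rather than a `nodemask_t`. It only illustrates the decision the patch makes: relax the binding when the request needs a zone below ZONE_MOVABLE and the allowed nodes hold no pages outside ZONE_MOVABLE.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the per-node data the patch reads through
 * NODE_DATA(nid); this is NOT the kernel's pg_data_t.
 */
struct node_model {
	unsigned long present_pages;	/* models node_present_pages */
	unsigned long movable_pages;	/* models node_zones[ZONE_MOVABLE].present_pages */
};

/*
 * Sum the pages that live outside ZONE_MOVABLE over every node whose bit
 * is set in allowed_mask (bit nid set means node nid is allowed).
 */
static unsigned long unmovable_pages(const struct node_model *nodes,
				     size_t nr_nodes,
				     unsigned long allowed_mask)
{
	unsigned long unmovable = 0;

	/* The bitmask model can only describe this many nodes. */
	assert(nr_nodes <= sizeof(allowed_mask) * CHAR_BIT);

	for (size_t nid = 0; nid < nr_nodes; nid++) {
		if (!(allowed_mask & (1UL << nid)))
			continue;
		assert(nodes[nid].movable_pages <= nodes[nid].present_pages);
		unmovable += nodes[nid].present_pages - nodes[nid].movable_pages;
	}
	return unmovable;
}

/*
 * Mirror of the patch's condition: wants_lower_zone models
 * "ac->highest_zoneidx <= ZONE_NORMAL". The binding is relaxed only when
 * the allowed nodes cannot satisfy such a request at all.
 */
static bool should_relax_binding(bool wants_lower_zone,
				 const struct node_model *nodes,
				 size_t nr_nodes,
				 unsigned long allowed_mask)
{
	return wants_lower_zone &&
	       unmovable_pages(nodes, nr_nodes, allowed_mask) == 0;
}
```

With node 0 modelled as a normal node and node 1 as a movable-only node, a task bound to node 1 alone would have its binding relaxed for a request that needs a lower zone, while a task allowed on both nodes would not.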