From patchwork Wed Jun 20 11:20:38 2018
X-Patchwork-Submitter: Tetsuo Handa
X-Patchwork-Id: 10477059
From: Tetsuo Handa
To: linux-mm@kvack.org
Cc: mhocko@kernel.org, rientjes@google.com, akpm@linux-foundation.org,
    linux-kernel@vger.kernel.org, Tetsuo Handa
Subject: [PATCH] mm, oom: Bring OOM notifier callbacks to outside of OOM killer.
Date: Wed, 20 Jun 2018 20:20:38 +0900
Message-Id: <1529493638-6389-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>

Sleeping with oom_lock held can cause an AB-BA lockup, because
__alloc_pages_may_oom() does not wait for oom_lock: it only trylocks it, and
the failing allocation keeps retrying while the oom_lock holder sleeps. Since
blocking_notifier_call_chain() in out_of_memory() might sleep, sleeping with
oom_lock held is currently unavoidable.
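For reference, "OOM notifier callbacks" are the callbacks hung off
oom_notify_list and registered with register_oom_notifier(); the list is a
blocking notifier chain, so walking it may sleep, and each callback reports
how many pages it freed through the pointer it is passed. A minimal sketch of
such a callback follows; example_oom_notify() and example_shrink_cache() are
illustrative names made up here, not code from this patch or from any in-tree
user:

#include <linux/notifier.h>
#include <linux/oom.h>

/* Hypothetical per-subsystem cache shrinker; stands in for real reclaim work. */
static unsigned long example_shrink_cache(void)
{
	return 0;	/* nothing to give back in this sketch */
}

static int example_oom_notify(struct notifier_block *nb,
			      unsigned long unused, void *parm)
{
	unsigned long *freed = parm;

	/* Give back whatever this subsystem can spare and report how much. */
	*freed += example_shrink_cache();
	return NOTIFY_OK;
}

static struct notifier_block example_oom_nb = {
	.notifier_call = example_oom_notify,
};

Such a block is registered once with register_oom_notifier(&example_oom_nb),
and out_of_memory() currently invokes the whole chain via
blocking_notifier_call_chain() while holding oom_lock.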
As a preparation for no longer sleeping with oom_lock held, this patch moves
the OOM notifier callbacks to outside of the OOM killer, with two small
behavior changes explained below.

One is that this patch makes it impossible for SysRq-f and PF-OOM to reclaim
memory via the OOM notifier. Such a change should be tolerable because we are
unlikely to use SysRq-f for reclaiming memory via OOM notifier callbacks, and
because pagefault_out_of_memory() is called only after the OOM killer has
selected the current thread as an OOM victim, i.e. after the OOM notifier
callbacks have already failed to reclaim memory.

The other is that this patch makes it possible to reclaim memory via the OOM
notifier after the OOM killer has been disabled (that is, while
suspend/hibernate is in progress). Such a change should be safe because of the
pm_suspended_storage() check.

Signed-off-by: Tetsuo Handa
---
 include/linux/oom.h |  1 +
 mm/oom_kill.c       | 35 ++++++++++++++++++------
 mm/page_alloc.c     | 76 +++++++++++++++++++++++++++++++----------------------
 3 files changed, 73 insertions(+), 39 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 6adac11..085b033 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -101,6 +101,7 @@ extern unsigned long oom_badness(struct task_struct *p,
 		struct mem_cgroup *memcg, const nodemask_t *nodemask,
 		unsigned long totalpages);
 
+extern unsigned long try_oom_notifier(void);
 extern bool out_of_memory(struct oom_control *oc);
 
 extern void exit_oom_victim(void);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 84081e7..2ff5db2 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1010,6 +1010,33 @@ int unregister_oom_notifier(struct notifier_block *nb)
 EXPORT_SYMBOL_GPL(unregister_oom_notifier);
 
 /**
+ * try_oom_notifier - Try to reclaim memory from OOM notifier list.
+ *
+ * Returns non-zero if notifier callbacks released something, zero otherwise.
+ */
+unsigned long try_oom_notifier(void)
+{
+	static DEFINE_MUTEX(oom_notifier_lock);
+	unsigned long freed = 0;
+
+	/*
+	 * Since OOM notifier callbacks must not depend on __GFP_DIRECT_RECLAIM
+	 * && !__GFP_NORETRY memory allocation, waiting for mutex here is safe.
+	 * If lockdep reports possible deadlock dependency, it will be a bug in
+	 * OOM notifier callbacks.
+	 *
+	 * If SIGKILL is pending, it is likely that current thread was selected
+	 * as an OOM victim. In that case, current thread should return as soon
+	 * as possible using memory reserves.
+	 */
+	if (mutex_lock_killable(&oom_notifier_lock))
+		return 0;
+	blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
+	mutex_unlock(&oom_notifier_lock);
+	return freed;
+}
+
+/**
  * out_of_memory - kill the "best" process when we run out of memory
  * @oc: pointer to struct oom_control
  *
@@ -1020,19 +1047,11 @@ int unregister_oom_notifier(struct notifier_block *nb)
  */
 bool out_of_memory(struct oom_control *oc)
 {
-	unsigned long freed = 0;
 	enum oom_constraint constraint = CONSTRAINT_NONE;
 
 	if (oom_killer_disabled)
 		return false;
 
-	if (!is_memcg_oom(oc)) {
-		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
-		if (freed > 0)
-			/* Got some memory back in the last second. */
-			return true;
-	}
-
 	/*
 	 * If current has a pending SIGKILL or is exiting, then automatically
 	 * select it. The goal is to allow it to allocate so that it may
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100..c72ef1e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3447,10 +3447,50 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 	return page;
 }
 
+static inline bool can_oomkill(gfp_t gfp_mask, unsigned int order,
+			       const struct alloc_context *ac)
+{
+	/* Coredumps can quickly deplete all memory reserves */
+	if (current->flags & PF_DUMPCORE)
+		return false;
+	/* The OOM killer will not help higher order allocs */
+	if (order > PAGE_ALLOC_COSTLY_ORDER)
+		return false;
+	/*
+	 * We have already exhausted all our reclaim opportunities without any
+	 * success so it is time to admit defeat. We will skip the OOM killer
+	 * because it is very likely that the caller has a more reasonable
+	 * fallback than shooting a random task.
+	 */
+	if (gfp_mask & __GFP_RETRY_MAYFAIL)
+		return false;
+	/* The OOM killer does not needlessly kill tasks for lowmem */
+	if (ac->high_zoneidx < ZONE_NORMAL)
+		return false;
+	if (pm_suspended_storage())
+		return false;
+	/*
+	 * XXX: GFP_NOFS allocations should rather fail than rely on
+	 * other request to make a forward progress.
+	 * We are in an unfortunate situation where out_of_memory cannot
+	 * do much for this context but let's try it to at least get
+	 * access to memory reserved if the current task is killed (see
+	 * out_of_memory). Once filesystems are ready to handle allocation
+	 * failures more gracefully we should just bail out here.
+	 */
+
+	/* The OOM killer may not free memory on a specific node */
+	if (gfp_mask & __GFP_THISNODE)
+		return false;
+
+	return true;
+}
+
 static inline struct page *
 __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	const struct alloc_context *ac, unsigned long *did_some_progress)
 {
+	const bool oomkill = can_oomkill(gfp_mask, order, ac);
 	struct oom_control oc = {
 		.zonelist = ac->zonelist,
 		.nodemask = ac->nodemask,
@@ -3462,6 +3502,10 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 
 	*did_some_progress = 0;
 
+	/* Try to reclaim via OOM notifier callback. */
+	if (oomkill)
+		*did_some_progress = try_oom_notifier();
+
 	/*
 	 * Acquire the oom lock. If that fails, somebody else is
 	 * making progress for us.
@@ -3485,37 +3529,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 	if (page)
 		goto out;
 
-	/* Coredumps can quickly deplete all memory reserves */
-	if (current->flags & PF_DUMPCORE)
-		goto out;
-	/* The OOM killer will not help higher order allocs */
-	if (order > PAGE_ALLOC_COSTLY_ORDER)
-		goto out;
-	/*
-	 * We have already exhausted all our reclaim opportunities without any
-	 * success so it is time to admit defeat. We will skip the OOM killer
-	 * because it is very likely that the caller has a more reasonable
-	 * fallback than shooting a random task.
-	 */
-	if (gfp_mask & __GFP_RETRY_MAYFAIL)
-		goto out;
-	/* The OOM killer does not needlessly kill tasks for lowmem */
-	if (ac->high_zoneidx < ZONE_NORMAL)
-		goto out;
-	if (pm_suspended_storage())
-		goto out;
-	/*
-	 * XXX: GFP_NOFS allocations should rather fail than rely on
-	 * other request to make a forward progress.
-	 * We are in an unfortunate situation where out_of_memory cannot
-	 * do much for this context but let's try it to at least get
-	 * access to memory reserved if the current task is killed (see
-	 * out_of_memory). Once filesystems are ready to handle allocation
-	 * failures more gracefully we should just bail out here.
-	 */
-
-	/* The OOM killer may not free memory on a specific node */
-	if (gfp_mask & __GFP_THISNODE)
+	if (!oomkill)
 		goto out;
 
 	/* Exhausted what can be done so it's blame time */
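
Not part of the patch: to exercise the new try_oom_notifier() path by hand, a
throwaway module along the following lines can be loaded before applying
memory pressure (for example with a large anonymous mapping); comparing the
timestamp of its pr_info() line with the OOM killer report in dmesg shows
where in the sequence the callbacks ran. All "demo" names below are made up
for illustration:

#include <linux/module.h>
#include <linux/notifier.h>
#include <linux/oom.h>

static int demo_oom_notify(struct notifier_block *nb,
			   unsigned long unused, void *parm)
{
	unsigned long *freed = parm;

	pr_info("demo_oom_notifier: callback invoked\n");
	*freed += 0;	/* report nothing freed so the OOM killer still runs */
	return NOTIFY_OK;
}

static struct notifier_block demo_oom_nb = {
	.notifier_call = demo_oom_notify,
};

static int __init demo_init(void)
{
	return register_oom_notifier(&demo_oom_nb);
}

static void __exit demo_exit(void)
{
	unregister_oom_notifier(&demo_oom_nb);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");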