From patchwork Thu Nov 11 23:42:01 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mina Almasry
X-Patchwork-Id: 12615881
Date: Thu, 11 Nov 2021 15:42:01 -0800
In-Reply-To: <20211111234203.1824138-1-almasrymina@google.com>
Message-Id: <20211111234203.1824138-3-almasrymina@google.com>
References: <20211111234203.1824138-1-almasrymina@google.com>
X-Mailer: git-send-email 2.34.0.rc1.387.gb447b232ab-goog
Subject: [PATCH v3 2/4] mm/oom: handle remote ooms
From: Mina Almasry
Cc: Mina Almasry, Michal Hocko, "Theodore Ts'o", Greg Thelen,
 Shakeel Butt, Andrew Morton, Hugh Dickins, Roman Gushchin,
 Johannes Weiner, Tejun Heo, Vladimir Davydov, Muchun Song,
 riel@surriel.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
 cgroups@vger.kernel.org

On remote ooms (OOMs due to remote charging), the oom-killer will attempt
to find a task to kill in the memcg under oom. If the oom-killer is unable
to find one, it should simply return ENOMEM to the allocating process.

If we're in the pagefault path and we're unable to return ENOMEM to the
allocating process, we instead kill the allocating process.

Signed-off-by: Mina Almasry
Cc: Michal Hocko
Cc: Theodore Ts'o
Cc: Greg Thelen
Cc: Shakeel Butt
Cc: Andrew Morton
Cc: Hugh Dickins
Cc: Roman Gushchin
Cc: Johannes Weiner
Cc: Tejun Heo
Cc: Vladimir Davydov
Cc: Muchun Song
Cc: riel@surriel.com
Cc: linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org
Cc: cgroups@vger.kernel.org

---

Changes in v3:
- Fixed build failures/warnings
  Reported-by: kernel test robot

Changes in v2:
- Moved the remote oom handling as Roman requested.
- Used mem_cgroup_from_task(current) instead of grabbing the memcg from
  current->mm

---
 include/linux/memcontrol.h | 16 ++++++++++++++++
 mm/memcontrol.c            | 29 +++++++++++++++++++++++++++++
 mm/oom_kill.c              | 22 ++++++++++++++++++++++
 3 files changed, 67 insertions(+)

-- 
2.34.0.rc1.387.gb447b232ab-goog

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 8583d37c05d9b..b7a045ace7b2c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -944,6 +944,7 @@ struct mem_cgroup *mem_cgroup_get_from_path(const char *path);
  * it.
  */
 int mem_cgroup_get_name_from_sb(struct super_block *sb, char *buf, size_t len);
+bool is_remote_oom(struct mem_cgroup *memcg_under_oom);
 
 void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
 		int zid, int nr_pages);
@@ -981,6 +982,11 @@ static inline void mem_cgroup_exit_user_fault(void)
 	current->in_user_fault = 0;
 }
 
+static inline bool is_in_user_fault(void)
+{
+	return current->in_user_fault;
+}
+
 static inline bool task_in_memcg_oom(struct task_struct *p)
 {
 	return p->memcg_in_oom;
@@ -1281,6 +1287,11 @@ static inline int mem_cgroup_get_name_from_sb(struct super_block *sb, char *buf,
 	return 0;
 }
 
+static inline bool is_remote_oom(struct mem_cgroup *memcg_under_oom)
+{
+	return false;
+}
+
 static inline int mem_cgroup_swapin_charge_page(struct page *page,
 			struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
 {
@@ -1472,6 +1483,11 @@ static inline void mem_cgroup_exit_user_fault(void)
 {
 }
 
+static inline bool is_in_user_fault(void)
+{
+	return false;
+}
+
 static inline bool task_in_memcg_oom(struct task_struct *p)
 {
 	return false;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b3d8f52a63d17..8019c396bfdd9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2664,6 +2664,35 @@ int mem_cgroup_get_name_from_sb(struct super_block *sb, char *buf, size_t len)
 	return ret < 0 ? ret : 0;
 }
 
+/*
+ * Returns true if current's memcg is neither memcg_under_oom nor one of its
+ * descendants, false otherwise. This is used by the oom-killer to detect
+ * ooms due to remote charging.
+ */
+bool is_remote_oom(struct mem_cgroup *memcg_under_oom)
+{
+	struct mem_cgroup *current_memcg;
+	bool is_remote_oom;
+
+	if (!memcg_under_oom)
+		return false;
+
+	rcu_read_lock();
+	current_memcg = mem_cgroup_from_task(current);
+	if (current_memcg && !css_tryget_online(&current_memcg->css))
+		current_memcg = NULL;
+	rcu_read_unlock();
+
+	if (!current_memcg)
+		return false;
+
+	is_remote_oom =
+		!mem_cgroup_is_descendant(current_memcg, memcg_under_oom);
+	css_put(&current_memcg->css);
+
+	return is_remote_oom;
+}
+
 /*
  * Set or clear (if @memcg is NULL) charge association from file system to
  * memcg. If @memcg != NULL, then a css reference must be held by the caller to
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0a7e16b16b8c3..499924efab370 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1108,6 +1108,28 @@ bool out_of_memory(struct oom_control *oc)
 	select_bad_process(oc);
 	/* Found nothing?!?! */
 	if (!oc->chosen) {
+		if (is_remote_oom(oc->memcg)) {
+			/*
+			 * For remote ooms in userfaults, we have no choice but
+			 * to kill the allocating process.
+			 */
+			if (is_in_user_fault() &&
+			    !oom_unkillable_task(current)) {
+				get_task_struct(current);
+				oc->chosen = current;
+				oom_kill_process(
+					oc,
+					"Out of memory (Killing remote allocating task)");
+				return true;
+			}
+
+			/*
+			 * For remote ooms in non-userfaults, simply return
+			 * ENOMEM to the caller.
+			 */
+			return false;
+		}
+
 		dump_header(oc, NULL);
 		pr_warn("Out of memory and no killable processes...\n");
 		/*
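
For readers following the control flow, the decision this patch adds to
out_of_memory() when no victim is found can be modeled in user space roughly
as below. This is only a sketch under stated assumptions: struct memcg_model,
enum oom_outcome, model_is_descendant(), model_is_remote_oom() and
model_no_victim() are hypothetical stand-ins invented for illustration, not
kernel APIs, and the css reference counting and RCU locking done by the real
is_remote_oom() are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a memcg: only the parent link matters here. */
struct memcg_model {
	const struct memcg_model *parent;	/* NULL for the root */
};

/* Hypothetical outcomes of the "found nothing to kill" branch. */
enum oom_outcome {
	OOM_NOT_REMOTE,		/* fall through to the pre-existing path */
	OOM_KILL_ALLOCATOR,	/* remote oom in a user fault */
	OOM_RETURN_ENOMEM,	/* remote oom outside a user fault */
};

/* Stand-in for mem_cgroup_is_descendant(): true if memcg is root or below it. */
static bool model_is_descendant(const struct memcg_model *memcg,
				const struct memcg_model *root)
{
	for (; memcg; memcg = memcg->parent)
		if (memcg == root)
			return true;
	return false;
}

/* Remote oom: the allocating task's memcg lies outside the memcg under oom. */
static bool model_is_remote_oom(const struct memcg_model *current_memcg,
				const struct memcg_model *memcg_under_oom)
{
	if (!memcg_under_oom || !current_memcg)
		return false;
	return !model_is_descendant(current_memcg, memcg_under_oom);
}

/* Mirrors the order of checks in the new block of out_of_memory(). */
static enum oom_outcome model_no_victim(const struct memcg_model *current_memcg,
					const struct memcg_model *memcg_under_oom,
					bool in_user_fault, bool unkillable)
{
	if (!model_is_remote_oom(current_memcg, memcg_under_oom))
		return OOM_NOT_REMOTE;
	if (in_user_fault && !unkillable)
		return OOM_KILL_ALLOCATOR;
	return OOM_RETURN_ENOMEM;
}
```

Note that in this model, as in the patch, an unkillable allocating task in a
user fault ends up on the ENOMEM path rather than being killed.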