From patchwork Tue May 29 07:17:36 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michal Hocko
X-Patchwork-Id: 10434491
Date: Tue, 29 May 2018 09:17:36 +0200
From: Michal Hocko
To: guro@fb.com, Tetsuo Handa
Cc: rientjes@google.com, hannes@cmpxchg.org, vdavydov.dev@gmail.com,
 tj@kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org,
 torvalds@linux-foundation.org
Subject: Re: [PATCH] mm,oom: Don't call schedule_timeout_killable() with oom_lock held.
Message-ID: <20180529071736.GI27180@dhcp22.suse.cz>
References: <20180515091655.GD12670@dhcp22.suse.cz>
 <201805181914.IFF18202.FOJOVSOtLFMFHQ@I-love.SAKURA.ne.jp>
 <20180518122045.GG21711@dhcp22.suse.cz>
 <201805210056.IEC51073.VSFFHFOOQtJMOL@I-love.SAKURA.ne.jp>
 <20180522061850.GB20020@dhcp22.suse.cz>
 <201805231924.EED86916.FSQJMtHOLVOFOF@I-love.SAKURA.ne.jp>
In-Reply-To: <201805231924.EED86916.FSQJMtHOLVOFOF@I-love.SAKURA.ne.jp>
User-Agent: Mutt/1.9.5 (2018-04-13)

On Wed 23-05-18 19:24:48, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > I don't understand why you are talking about PF_WQ_WORKER case.
> >
> > Because that seems to be the reason to have it there as per your
> > comment.
>
> OK. Then, I will fold below change into my patch.
>
>  	if (did_some_progress) {
>  		no_progress_loops = 0;
>  +		/*
> -+		 * This schedule_timeout_*() serves as a guaranteed sleep for
> -+		 * PF_WQ_WORKER threads when __zone_watermark_ok() == false.
> ++		 * Try to give the OOM killer/reaper/victims some time for
> ++		 * releasing memory.
>  +		 */
>  +		if (!tsk_is_oom_victim(current))
>  +			schedule_timeout_uninterruptible(1);
>
> But Roman, my patch conflicts with your "mm, oom: cgroup-aware OOM killer" patch
> in linux-next. And it seems to me that your patch contains a bug which leads to
> premature memory allocation failure explained below.
>
> @@ -1029,6 +1050,7 @@ bool out_of_memory(struct oom_control *oc)
>  {
>  	unsigned long freed = 0;
>  	enum oom_constraint constraint = CONSTRAINT_NONE;
> +	bool delay = false; /* if set, delay next allocation attempt */
>
>  	if (oom_killer_disabled)
>  		return false;
> @@ -1073,27 +1095,39 @@ bool out_of_memory(struct oom_control *oc)
>  	    current->mm && !oom_unkillable_task(current, NULL, oc->nodemask) &&
>  	    current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
>  		get_task_struct(current);
> -		oc->chosen = current;
> +		oc->chosen_task = current;
>  		oom_kill_process(oc, "Out of memory (oom_kill_allocating_task)");
>  		return true;
>  	}
>
> +	if (mem_cgroup_select_oom_victim(oc)) {
>
> /* mem_cgroup_select_oom_victim() returns true if select_victim_memcg() made
>    oc->chosen_memcg != NULL.
>    select_victim_memcg() makes oc->chosen_memcg = INFLIGHT_VICTIM if there is
>    inflight memcg. But oc->chosen_task remains NULL because it did not call
>    oom_evaluate_task(), didn't it? (And if it called oom_evaluate_task(),
>    put_task_struct() is missing here.) */
>
> +		if (oom_kill_memcg_victim(oc)) {
>
> /* oom_kill_memcg_victim() returns true if oc->chosen_memcg == INFLIGHT_VICTIM. */
>
> +			delay = true;
> +			goto out;
> +		}
> +	}
> +
>  	select_bad_process(oc);
>  	/* Found nothing?!?! Either we hang forever, or we panic. */
> -	if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
> +	if (!oc->chosen_task && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
>  		dump_header(oc, NULL);
>  		panic("Out of memory and no killable processes...\n");
>  	}
> -	if (oc->chosen && oc->chosen != (void *)-1UL) {
> +	if (oc->chosen_task && oc->chosen_task != (void *)-1UL) {
>  		oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
>  				 "Memory cgroup out of memory");
> -		/*
> -		 * Give the killed process a good chance to exit before trying
> -		 * to allocate memory again.
> -		 */
> -		schedule_timeout_killable(1);
> +		delay = true;
>  	}
> -	return !!oc->chosen;
> +
> +out:
> +	/*
> +	 * Give the killed process a good chance to exit before trying
> +	 * to allocate memory again.
> +	 */
> +	if (delay)
> +		schedule_timeout_killable(1);
> +
>
> /* out_of_memory() returns false because oc->chosen_task remains NULL. */
>
> +	return !!oc->chosen_task;
>  }
>

What about this fix, Roman?

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 565e7da55318..fc06af041447 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1174,7 +1174,7 @@ bool out_of_memory(struct oom_control *oc)
 	if (delay)
 		schedule_timeout_killable(1);
 
-	return !!oc->chosen_task;
+	return oc->chosen_task || oc->chosen_memcg;
 }
 
 /*
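
To make the failure mode concrete, here is a rough user-space model of the return value only. It is a sketch, not kernel code: struct oom_control_model, struct task_model, struct memcg_model and both helper functions are made-up names for illustration, and INFLIGHT_VICTIM is assumed to be the same (void *)-1UL sentinel that the quoted hunk compares chosen_task against. It shows that when only chosen_memcg is set (the inflight memcg case described above), the old expression reports failure while a logical OR of the two pointers reports success. Note that the OR has to be the logical one, since bitwise | is not defined for pointer operands in C.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the pointers matter here. */
struct task_model { int pid; };
struct memcg_model { int id; };

/* Simplified model of the two oom_control fields discussed in the thread. */
struct oom_control_model {
	struct task_model *chosen_task;
	struct memcg_model *chosen_memcg;
};

/* Assumption: sentinel meaning "a victim is already in flight". */
#define INFLIGHT_VICTIM ((void *)-1UL)

/* Return value as in the quoted patch: only chosen_task is considered. */
static bool result_before_fix(const struct oom_control_model *oc)
{
	return !!oc->chosen_task;
}

/* Return value with the proposed fix: either selection counts as progress. */
static bool result_after_fix(const struct oom_control_model *oc)
{
	/* Logical OR; bitwise | would not compile on pointer operands. */
	return oc->chosen_task || oc->chosen_memcg;
}
```

With chosen_task == NULL and chosen_memcg == INFLIGHT_VICTIM, result_before_fix() yields false, which is what the allocator would see as "OOM killer made no progress", while result_after_fix() yields true.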