From patchwork Wed Oct 25 14:12:21 2017
X-Patchwork-Submitter: Michal Hocko
X-Patchwork-Id: 10026777
Date: Wed, 25 Oct 2017 16:12:21 +0200
From: Michal Hocko
To: Johannes Weiner
Cc: Greg Thelen, Shakeel Butt, Alexander Viro, Vladimir Davydov,
 Andrew Morton, Linux MM, linux-fsdevel@vger.kernel.org, LKML
Subject: Re: [PATCH] fs, mm: account filp and names caches to kmemcg
Message-ID: <20171025141221.xm4cqp2z6nunr6vy@dhcp22.suse.cz>
References:
 <20171013152421.yf76n7jui3z5bbn4@dhcp22.suse.cz>
 <20171024160637.GB32340@cmpxchg.org>
 <20171024162213.n6jrpz3t5pldkgxy@dhcp22.suse.cz>
 <20171024172330.GA3973@cmpxchg.org>
 <20171024175558.uxqtxwhjgu6ceadk@dhcp22.suse.cz>
 <20171024185854.GA6154@cmpxchg.org>
 <20171024201522.3z2fjnfywgx2egqx@dhcp22.suse.cz>
 <20171025071522.xyw4lsvdv4xsbhbo@dhcp22.suse.cz>
 <20171025131151.GA8210@cmpxchg.org>
In-Reply-To: <20171025131151.GA8210@cmpxchg.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Wed 25-10-17 09:11:51, Johannes Weiner wrote:
> On Wed, Oct 25, 2017 at 09:15:22AM +0200, Michal Hocko wrote:
[...]
> > ... we shouldn't make it more loose though.
>
> Then we can end this discussion right now. I pointed out right from
> the start that the only way to replace -ENOMEM with OOM killing in the
> syscall is to force charges. If we don't, we either deadlock or still
> return -ENOMEM occasionally. Nobody has refuted that this is the case.

Yes, this is true. I guess we are back to the non-failing allocations
discussion... Currently we are too ENOMEM-happy for memcg !PF
(non-page-fault) paths, which can lead to the weird issues Greg pointed
out earlier. Going in the opposite direction, i.e. basically never
returning ENOMEM and pretending success instead (which allows runaways
in extreme setups with no OOM-eligible tasks), sounds like going from
one extreme to the other. This basically means that those charges would
effectively be GFP_NOFAIL. That is too much to guarantee IMHO.

> > > The current thread can loop in syscall exit until
> > > usage is reconciled (either via reclaim or kill). This seems consistent
> > > with pagefault oom handling and compatible with overcommit use case.
> >
> > But we do not really want to make the syscall exit path any more complex
> > or more expensive than it is.
> > The point is that we shouldn't be afraid of
> > triggering the OOM killer from the charge path because we do have the
> > async OOM killer. This is the same as the standard allocator path. So
> > why should memcg be any different?
>
> I have nothing against triggering the OOM killer from the allocation
> path. I am dead-set against making the -ENOMEM return from syscalls
> rare and unpredictable.

Isn't that already the case when we take memcg out of the picture? More
on that below.

> They're a challenge as it is. The only sane options are to stick with
> the status quo,

One thing that really worries me about the current status quo is that
the behavior depends on whether you run under memcg or not. The global
policy is "almost never fail unless something horrible is going on".
But we _do not_ guarantee that ENOMEM stays inside the kernel.

So if we need to do something about that, I think we need a universal
solution rather than something memcg-specific. Sure, global ENOMEMs are
so rare that probably nobody will trigger them, but that is just
wishful thinking...

So how about we start with a BIG FAT WARNING for the failure case?
Something resembling warn_alloc() for the page allocator's failure case.

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5d9323028870..3ba62c73eee5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1547,9 +1547,14 @@ static bool mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
 	 * victim and then we have rely on mem_cgroup_oom_synchronize otherwise
 	 * we would fall back to the global oom killer in pagefault_out_of_memory
 	 */
-	if (!memcg->oom_kill_disable &&
-	    mem_cgroup_out_of_memory(memcg, mask, order))
-		return true;
+	if (!memcg->oom_kill_disable) {
+		if (mem_cgroup_out_of_memory(memcg, mask, order))
+			return true;
+
+		WARN(!current->memcg_may_oom,
+				"Memory cgroup charge failed because of no reclaimable memory! "
+				"This looks like a misconfiguration or a kernel bug.");
+	}
 
 	if (!current->memcg_may_oom)
 		return false;
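For reviewing the mem_cgroup_oom() hunk, a small userspace model of its patched branch structure may help. This is a sketch, not kernel code: struct model_memcg, struct model_task, enum oom_outcome and model_mem_cgroup_oom() are hypothetical stand-ins for memcg->oom_kill_disable, current->memcg_may_oom and the result of mem_cgroup_out_of_memory(). Everything after the visible context (per the hunk's comment, page faults rely on mem_cgroup_oom_synchronize()) is collapsed into a single OOM_DEFERRED result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the memcg state the hunk reads. */
struct model_memcg {
	bool oom_kill_disable; /* models memcg->oom_kill_disable */
	bool has_oom_victim;   /* would mem_cgroup_out_of_memory() succeed? */
};

/* Hypothetical stand-in for the charging task. */
struct model_task {
	bool memcg_may_oom;    /* models current->memcg_may_oom */
};

enum oom_outcome {
	OOM_KILLED,      /* returns true: a victim was selected */
	OOM_FAIL_WARNED, /* returns false and the new WARN() fires */
	OOM_FAIL_SILENT, /* returns false without a warning */
	OOM_DEFERRED,    /* continues past the hunk; page faults rely on
			  * mem_cgroup_oom_synchronize() */
};

/* Mirrors the patched branch structure of mem_cgroup_oom(). */
static enum oom_outcome model_mem_cgroup_oom(const struct model_memcg *memcg,
					     const struct model_task *task)
{
	bool warned = false;

	assert(memcg != NULL && task != NULL);

	if (!memcg->oom_kill_disable) {
		if (memcg->has_oom_victim)
			return OOM_KILLED;

		/* The added WARN(): no victim and no deferred OOM handling. */
		if (!task->memcg_may_oom) {
			fprintf(stderr, "Memory cgroup charge failed because of "
				"no reclaimable memory! This looks like a "
				"misconfiguration or a kernel bug.\n");
			warned = true;
		}
	}

	if (!task->memcg_may_oom)
		return warned ? OOM_FAIL_WARNED : OOM_FAIL_SILENT;

	return OOM_DEFERRED;
}
```

In this model, only a !PF charge (memcg_may_oom false) in a group whose OOM killer finds no victim reaches the new warning; a group with oom_kill_disable set still fails that charge silently, which matches the patched code.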
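Johannes's point that a syscall-context charge which cannot find an OOM victim must either be forced, fail with -ENOMEM, or wait forever can be illustrated with a toy model. This is not the kernel's charge path. struct toy_memcg, enum toy_policy and toy_try_charge() are hypothetical, reclaim and OOM killing are each collapsed into one step, and a bounded retry loop returning -EDEADLK stands in for an unbounded one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy cgroup; all counts are in pages. */
struct toy_memcg {
	long usage;        /* pages currently charged */
	long limit;        /* hard limit */
	long reclaimable;  /* charged pages reclaim could still free */
	long victim_pages; /* charged pages an OOM kill would free; 0 = no eligible task */
};

enum toy_policy {
	TOY_FAIL,  /* status quo for !PF paths: return -ENOMEM */
	TOY_FORCE, /* force the charge past the limit */
	TOY_RETRY, /* keep retrying until the charge fits */
};

/*
 * Returns 0 on success or -ENOMEM on failure. For TOY_RETRY, -EDEADLK
 * marks a loop that made no progress in max_rounds, standing in for a
 * task that would spin or wait forever.
 */
static int toy_try_charge(struct toy_memcg *cg, long nr, enum toy_policy policy,
			  int max_rounds)
{
	/* Three rounds suffice for reclaim, OOM kill, then the policy decision. */
	assert(cg != NULL && nr > 0 && max_rounds >= 3);

	for (int round = 0; round < max_rounds; round++) {
		if (cg->usage + nr <= cg->limit) {
			cg->usage += nr;
			return 0;
		}
		if (cg->reclaimable > 0) {          /* step 1: reclaim */
			cg->usage -= cg->reclaimable;
			cg->reclaimable = 0;
			continue;
		}
		if (cg->victim_pages > 0) {         /* step 2: OOM kill */
			cg->usage -= cg->victim_pages;
			cg->victim_pages = 0;
			continue;
		}
		/* Nothing reclaimable and no OOM-eligible task left. */
		if (policy == TOY_FAIL)
			return -ENOMEM;
		if (policy == TOY_FORCE) {
			cg->usage += nr;            /* usage may now exceed limit */
			return 0;
		}
		/* TOY_RETRY: state is unchanged, so every further round fails too. */
	}
	return -EDEADLK;
}
```

Once nothing is reclaimable and no task is OOM-eligible, only TOY_FORCE both succeeds and terminates, and it does so by letting usage exceed the limit, which is the runaway case Michal objects to.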