From patchwork Wed Jan 9 19:14:43 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 10754843
From: Yang Shi
To: mhocko@suse.com, hannes@cmpxchg.org, shakeelb@google.com,
 akpm@linux-foundation.org
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: [v3 PATCH 3/5] mm: memcontrol: introduce wipe_on_offline interface
Date: Thu, 10 Jan 2019 03:14:43 +0800
Message-Id: <1547061285-100329-4-git-send-email-yang.shi@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1547061285-100329-1-git-send-email-yang.shi@linux.alibaba.com>
References: <1547061285-100329-1-git-send-email-yang.shi@linux.alibaba.com>

We have some use cases that create and remove memcgs very frequently,
and the tasks in such a memcg may only access files that are unlikely
to be accessed by anyone else. So we prefer to force_empty the memcg
before rmdir'ing it, to reclaim its page cache so that it does not
accumulate and cause unnecessary memory pressure. That memory pressure
may trigger direct reclaim, which hurts latency-sensitive applications.

Force empty helps such use cases, but it reclaims memory synchronously
on the write to memory.force_empty. The write may take some time to
return, and subsequent operations are blocked behind it. Although this
can be done in the background, some use cases need to create a new
memcg with the same name right after the old one is deleted, so the
creation may get blocked by the preceding reclaim/remove operation.

Deferring the memory reclaim to cgroup offline sounds reasonable for
such use cases. Introduce a new interface, wipe_on_offline, for both
the default and legacy hierarchies, which does the memory reclaim in
the css offline kworker.

Writing 1 enables it, writing 0 disables it. Any other value returns
-EINVAL, and it cannot be enabled on the root memcg.
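
For illustration only (not part of this patch): a minimal userspace
sketch of how a job manager might set the flag before removing a memcg.
The helper name set_wipe_on_offline() and the caller-supplied cgroup
directory are hypothetical, and the file name memory.wipe_on_offline
assumes the usual "memory." prefix that the cgroup core adds to the
cftype name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define WIPE_PATH_MAX 4096

/*
 * Hypothetical helper: write "1" or "0" to
 * <cgroup_dir>/memory.wipe_on_offline.
 *
 * cgroup_dir is a memcg directory chosen by the caller; where the memory
 * controller is mounted is an assumption left to the caller.
 * Returns 0 on success, -1 on failure with errno set.
 */
int set_wipe_on_offline(const char *cgroup_dir, int enable)
{
	char path[WIPE_PATH_MAX];
	int fd, len, saved_errno;

	assert(cgroup_dir != NULL);

	len = snprintf(path, sizeof(path), "%s/memory.wipe_on_offline",
		       cgroup_dir);
	if (len < 0 || (size_t)len >= sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	/* The control file is created by the kernel; never create it here. */
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	/* The patch accepts only 0 or 1 and returns -EINVAL otherwise. */
	if (write(fd, enable ? "1" : "0", 1) != 1) {
		saved_errno = errno;
		close(fd);
		errno = saved_errno;
		return -1;
	}

	return close(fd);
}
```

A caller would invoke set_wipe_on_offline(dir, 1) on the memcg directory
and then rmdir() it as usual. With the patch applied, the reclaim should
then run in the css offline kworker rather than in the caller's context.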

Suggested-by: Michal Hocko
Cc: Johannes Weiner
Cc: Shakeel Butt
Signed-off-by: Yang Shi
---
 include/linux/memcontrol.h |  3 +++
 mm/memcontrol.c            | 53 ++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 83ae11c..2f1258a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -311,6 +311,9 @@ struct mem_cgroup {
 	struct list_head event_list;
 	spinlock_t event_list_lock;
 
+	/* Reclaim as much as possible memory in offline kworker */
+	bool wipe_on_offline;
+
 	struct mem_cgroup_per_node *nodeinfo[0];
 	/* WARNING: nodeinfo must be the last member here */
 };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index eaa3970..ff50810 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2918,6 +2918,35 @@ static ssize_t mem_cgroup_force_empty_write(struct kernfs_open_file *of,
 	return mem_cgroup_force_empty(memcg, true) ?: nbytes;
 }
 
+static int wipe_on_offline_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
+
+	seq_printf(m, "%lu\n", (unsigned long)memcg->wipe_on_offline);
+
+	return 0;
+}
+
+static int wipe_on_offline_write(struct cgroup_subsys_state *css,
+				 struct cftype *cft, u64 val)
+{
+	int ret = 0;
+
+	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+
+	if (mem_cgroup_is_root(memcg))
+		return -EINVAL;
+
+	if (val == 0)
+		memcg->wipe_on_offline = false;
+	else if (val == 1)
+		memcg->wipe_on_offline = true;
+	else
+		ret = -EINVAL;
+
+	return ret;
+}
+
 static u64 mem_cgroup_hierarchy_read(struct cgroup_subsys_state *css,
 				     struct cftype *cft)
 {
@@ -4283,6 +4312,11 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 		.write = mem_cgroup_reset,
 		.read_u64 = mem_cgroup_read_u64,
 	},
+	{
+		.name = "wipe_on_offline",
+		.seq_show = wipe_on_offline_show,
+		.write_u64 = wipe_on_offline_write,
+	},
 	{ },	/* terminate */
 };
 
@@ -4569,11 +4603,20 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	page_counter_set_min(&memcg->memory, 0);
 	page_counter_set_low(&memcg->memory, 0);
 
+	/*
+	 * Reclaim as much as possible memory when offlining.
+	 *
+	 * Do it after min/low is reset otherwise some memory might
+	 * be protected by min/low.
+	 */
+	if (memcg->wipe_on_offline)
+		mem_cgroup_force_empty(memcg, false);
+	else
+		drain_all_stock(memcg);
+
 	memcg_offline_kmem(memcg);
 	wb_memcg_offline(memcg);
 
-	drain_all_stock(memcg);
-
 	mem_cgroup_id_put(memcg);
 }
 
@@ -5694,6 +5737,12 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
 		.seq_show = memory_oom_group_show,
 		.write = memory_oom_group_write,
 	},
+	{
+		.name = "wipe_on_offline",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = wipe_on_offline_show,
+		.write_u64 = wipe_on_offline_write,
+	},
 	{ }	/* terminate */
 };
 