From patchwork Thu May 27 09:33:25 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12283851
From: Muchun Song
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org,
 akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
 bsingharora@gmail.com, shy828301@gmail.com, alexs@kernel.org,
 smuchun@gmail.com, zhengqi.arch@bytedance.com, Muchun Song
Subject: [RFC PATCH v4 01/12] mm: memcontrol: prepare objcg API for non-kmem usage
Date: Thu, 27 May 2021 17:33:25 +0800
Message-Id: <20210527093336.14895-2-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.21.0 (Apple Git-122)
In-Reply-To: <20210527093336.14895-1-songmuchun@bytedance.com>
References: <20210527093336.14895-1-songmuchun@bytedance.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Pagecache pages are charged at allocation time and hold a reference to
the original memory cgroup until they are reclaimed. Depending on the
memory pressure, the specific patterns of page sharing between different
cgroups, and the cgroup creation and destruction rates, a large number of
dying memory cgroups can be pinned by pagecache pages. This makes page
reclaim less efficient and wastes memory.

We can fix this problem by converting LRU pages and most other raw memcg
pins to the objcg direction, after which page->memcg will always point
to an object cgroup.

The objcg infrastructure therefore no longer serves only
CONFIG_MEMCG_KMEM. This patch moves it out of the scope of
CONFIG_MEMCG_KMEM so that LRU pages can reuse it for charging.

LRU pages are not accounted at the root level, but their
page->memcg_data points to root_mem_cgroup, so it always holds a valid
pointer. However, root_mem_cgroup does not have an object cgroup. If we
use the obj_cgroup APIs to charge LRU pages, page->memcg_data must be
set to a root object cgroup, so we also allocate an object cgroup for
root_mem_cgroup.

Signed-off-by: Muchun Song
---
 include/linux/memcontrol.h |  4 ++-
 mm/memcontrol.c            | 66 +++++++++++++++++++++++++++++-----------------
 2 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 3cc18c2176e7..0159e1191a86 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -223,7 +223,9 @@ struct memcg_cgwb_frn {
 struct obj_cgroup {
 	struct percpu_ref refcnt;
 	struct mem_cgroup *memcg;
+#ifdef CONFIG_MEMCG_KMEM
 	atomic_t nr_charged_bytes;
+#endif
 	union {
 		struct list_head list;
 		struct rcu_head rcu;
@@ -321,9 +323,9 @@ struct mem_cgroup {
 #ifdef CONFIG_MEMCG_KMEM
 	int kmemcg_id;
 	enum memcg_kmem_state kmem_state;
+#endif
 	struct obj_cgroup __rcu *objcg;
 	struct list_head objcg_list; /* list of inherited objcgs */
-#endif
 
 	MEMCG_PADDING(_pad2_);
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 70a7faa733b3..66f6ad1cc8e4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -252,18 +252,16 @@ struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr)
 	return &container_of(vmpr, struct mem_cgroup, vmpressure)->css;
 }
 
-#ifdef CONFIG_MEMCG_KMEM
 extern spinlock_t css_set_lock;
 
+#ifdef CONFIG_MEMCG_KMEM
 static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
 				      unsigned int nr_pages);
 
-static void obj_cgroup_release(struct percpu_ref *ref)
+static void obj_cgroup_release_kmem(struct obj_cgroup *objcg)
 {
-	struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt);
 	unsigned int nr_bytes;
 	unsigned int nr_pages;
-	unsigned long flags;
 
 	/*
 	 * At this point all allocated objects are freed, and
@@ -277,9 +275,9 @@ static void obj_cgroup_release(struct percpu_ref *ref)
 	 * 3) CPU1: a process from another memcg is allocating something,
 	 *          the stock if flushed,
 	 *          objcg->nr_charged_bytes = PAGE_SIZE - 92
-	 * 5) CPU0: we do release this object,
+	 * 4) CPU0: we do release this object,
 	 *          92 bytes are added to stock->nr_bytes
-	 * 6) CPU0: stock is flushed,
+	 * 5) CPU0: stock is flushed,
 	 *          92 bytes are added to objcg->nr_charged_bytes
 	 *
 	 * In the result, nr_charged_bytes == PAGE_SIZE.
@@ -291,6 +289,19 @@ static void obj_cgroup_release(struct percpu_ref *ref)
 
 	if (nr_pages)
 		obj_cgroup_uncharge_pages(objcg, nr_pages);
+}
+#else
+static inline void obj_cgroup_release_kmem(struct obj_cgroup *objcg)
+{
+}
+#endif
+
+static void obj_cgroup_release(struct percpu_ref *ref)
+{
+	struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt);
+	unsigned long flags;
+
+	obj_cgroup_release_kmem(objcg);
 
 	spin_lock_irqsave(&css_set_lock, flags);
 	list_del(&objcg->list);
@@ -319,10 +330,14 @@ static struct obj_cgroup *obj_cgroup_alloc(void)
 	return objcg;
 }
 
-static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
-				  struct mem_cgroup *parent)
+static void memcg_reparent_objcgs(struct mem_cgroup *memcg)
 {
 	struct obj_cgroup *objcg, *iter;
+	struct mem_cgroup *parent;
+
+	parent = parent_mem_cgroup(memcg);
+	if (!parent)
+		parent = root_mem_cgroup;
 
 	objcg = rcu_replace_pointer(memcg->objcg, NULL, true);
 
@@ -341,6 +356,7 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
 	percpu_ref_kill(&objcg->refcnt);
 }
 
+#ifdef CONFIG_MEMCG_KMEM
 /*
  * This will be used as a shrinker list's index.
  * The main reason for not using cgroup id for this:
@@ -3623,7 +3639,6 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
 #ifdef CONFIG_MEMCG_KMEM
 static int memcg_online_kmem(struct mem_cgroup *memcg)
 {
-	struct obj_cgroup *objcg;
 	int memcg_id;
 
 	if (cgroup_memory_nokmem)
@@ -3636,14 +3651,6 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
 	if (memcg_id < 0)
 		return memcg_id;
 
-	objcg = obj_cgroup_alloc();
-	if (!objcg) {
-		memcg_free_cache_id(memcg_id);
-		return -ENOMEM;
-	}
-	objcg->memcg = memcg;
-	rcu_assign_pointer(memcg->objcg, objcg);
-
 	static_branch_enable(&memcg_kmem_enabled_key);
 
 	memcg->kmemcg_id = memcg_id;
@@ -3667,8 +3674,6 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
 	if (!parent)
 		parent = root_mem_cgroup;
 
-	memcg_reparent_objcgs(memcg, parent);
-
 	kmemcg_id = memcg->kmemcg_id;
 	BUG_ON(kmemcg_id < 0);
 
@@ -5212,8 +5217,8 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	memcg->socket_pressure = jiffies;
 #ifdef CONFIG_MEMCG_KMEM
 	memcg->kmemcg_id = -1;
-	INIT_LIST_HEAD(&memcg->objcg_list);
 #endif
+	INIT_LIST_HEAD(&memcg->objcg_list);
 #ifdef CONFIG_CGROUP_WRITEBACK
 	INIT_LIST_HEAD(&memcg->cgwb_list);
 	for (i = 0; i < MEMCG_CGWB_FRN_CNT; i++)
@@ -5285,21 +5290,33 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+	struct obj_cgroup *objcg;
 
 	/*
 	 * A memcg must be visible for expand_shrinker_info()
 	 * by the time the maps are allocated. So, we allocate maps
 	 * here, when for_each_mem_cgroup() can't skip it.
 	 */
-	if (alloc_shrinker_info(memcg)) {
-		mem_cgroup_id_remove(memcg);
-		return -ENOMEM;
-	}
+	if (alloc_shrinker_info(memcg))
+		goto remove_id;
+
+	objcg = obj_cgroup_alloc();
+	if (!objcg)
+		goto free_shrinker;
+
+	objcg->memcg = memcg;
+	rcu_assign_pointer(memcg->objcg, objcg);
 
 	/* Online state pins memcg ID, memcg ID pins CSS */
 	refcount_set(&memcg->id.ref, 1);
 	css_get(css);
 	return 0;
+
+free_shrinker:
+	free_shrinker_info(memcg);
+remove_id:
+	mem_cgroup_id_remove(memcg);
+	return -ENOMEM;
 }
 
 static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
@@ -5323,6 +5340,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	page_counter_set_low(&memcg->memory, 0);
 
 	memcg_offline_kmem(memcg);
+	memcg_reparent_objcgs(memcg);
 	reparent_shrinker_deferred(memcg);
 	wb_memcg_offline(memcg);
 
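
Illustration only, not part of the patch: the changelog argues that
charging pages through an objcg lets a dying memcg be released, because
the objcg can be handed to the parent on offline. A minimal user-space
sketch of that indirection follows. Every toy_* name is hypothetical,
and the "pins" counter is a simplified stand-in for the kernel's
percpu_ref and css reference counting; locking and RCU are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the kernel structures. */
struct toy_objcg;

struct toy_memcg {
	struct toy_memcg *parent;	/* NULL for the root */
	struct toy_objcg *objcg;	/* objcg currently owned by this memcg */
	int pins;			/* references keeping this memcg alive */
};

struct toy_objcg {
	struct toy_memcg *memcg;	/* redirected when the owner goes offline */
};

struct toy_page {
	struct toy_objcg *objcg;	/* the page pins the objcg, not the memcg */
};

/* Attach an objcg to a memcg; the objcg holds one pin on the memcg. */
static void toy_online(struct toy_memcg *memcg, struct toy_objcg *objcg)
{
	objcg->memcg = memcg;
	memcg->objcg = objcg;
	memcg->pins++;
}

/* Charge a page: it records the objcg and never touches memcg->pins. */
static void toy_charge(struct toy_page *page, struct toy_memcg *memcg)
{
	page->objcg = memcg->objcg;
}

/*
 * Offline a memcg: hand its objcg to the parent, falling back to the
 * root when there is no parent, mirroring the parent lookup that the
 * patch moves into memcg_reparent_objcgs().
 */
static void toy_offline(struct toy_memcg *memcg, struct toy_memcg *root)
{
	struct toy_memcg *parent = memcg->parent ? memcg->parent : root;
	struct toy_objcg *objcg = memcg->objcg;

	memcg->objcg = NULL;
	objcg->memcg = parent;	/* existing pages now resolve to the parent */
	memcg->pins--;
	parent->pins++;
}
```

In this model a page charged to a child cgroup resolves to the parent
once the child goes offline, and the child ends up with no pins left,
which is the property the series aims for with LRU pages.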