From patchwork Thu Mar 18 11:06:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12147913 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 85787C433E0 for ; Thu, 18 Mar 2021 11:08:31 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D6A8664D9A for ; Thu, 18 Mar 2021 11:08:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D6A8664D9A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bytedance.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 56C696B0070; Thu, 18 Mar 2021 07:08:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 542336B0071; Thu, 18 Mar 2021 07:08:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3BC966B0072; Thu, 18 Mar 2021 07:08:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0061.hostedemail.com [216.40.44.61]) by kanga.kvack.org (Postfix) with ESMTP id 1FCFE6B0070 for ; Thu, 18 Mar 2021 07:08:30 -0400 (EDT) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id DC2DD1DF9 for ; Thu, 18 Mar 2021 11:08:29 +0000 (UTC) X-FDA: 77932721538.19.469E353 Received: from mail-pj1-f42.google.com (mail-pj1-f42.google.com 
[209.85.216.42]) by imf03.hostedemail.com (Postfix) with ESMTP id 96301C0007D0 for ; Thu, 18 Mar 2021 11:08:27 +0000 (UTC) Received: by mail-pj1-f42.google.com with SMTP id a22-20020a17090aa516b02900c1215e9b33so4814384pjq.5 for ; Thu, 18 Mar 2021 04:08:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=yZ33cKi5KuIfXO9xxvNPTIxQT4GFvVDZu4co6ScWjWQ=; b=QVjE2DHLlcMBALEI9tK7xOXy2Gj1UBMGDPIqtzdlB/+O4LfIXhTNhGAmBRCOx5rGSJ eWdenLdyvKVK8EVh28tMimvHkrN3GpbIBzDC+8Z0cz23AB8J+lml+/0wd6378CHZ8IF2 iZFGILrOKVaBMuBgeFXGALyE/nHrYD1haX4zi4wqy1wPN/vRBIq5irG5eLiwekqlYreC sezJVSR2oVzsYL0/JRY4MF//DPkpTyQ03+hGiTzKj2esF9wPLk1z1lHGyWQS1AV8Oco/ S8slZ+DchdNqXfw17VrlxcwqlpKGilQZqFQsiEXoTAG/Z2LMUbiu6EToCO3hWlQ0zQZk srwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=yZ33cKi5KuIfXO9xxvNPTIxQT4GFvVDZu4co6ScWjWQ=; b=rCiP5Yaxt5X2XEjeIIqY7hpvaskZD8Uu1tnVAeWESJlYkYk2YLj78i27o1V6vgZvaH 0frxp0XaattZYfpkY8dodQOm3DW52sS8nDbQAAFBciQreaAZWrw3kSMRjpy6na4nFQWR yZT8cTBynD0Fn44FNpZeaUv0KDCiocKizBONqIxwCEq+vy98lrx2LyI0ouhFehS/qaee Zq9WpiY06UywmjG3HQTRdhm2v5kSAhAGQcIzpufR6Am3hYDI74k4fjgLPOsEOZBPoR9A 1bgI/qwzrW2QpIxNltZOlCQA59gCHdY8af9n8epn4NVupvJt0E3cNMyKSU9SMEnRUReR IZUA== X-Gm-Message-State: AOAM533JSTO/AB2rz+/TRwYBxDqhALXg0C9C0/FhSNbnER+HJQBFzSwX FWMNFlCTyK+37bkE/WJUTeqWJQ== X-Google-Smtp-Source: ABdhPJwgPH7ggihGVwRnBCXzHXsa34A0wexo+pHFMBlyVyGR1piyNlL1Rtq3bH8qNyDAwK/NZvw1Gg== X-Received: by 2002:a17:90a:55ca:: with SMTP id o10mr3643841pjm.173.1616065706522; Thu, 18 Mar 2021 04:08:26 -0700 (PDT) Received: from localhost.localdomain ([139.177.225.231]) by smtp.gmail.com with ESMTPSA id e21sm1779509pgv.74.2021.03.18.04.08.22 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 18 Mar 2021 04:08:26 -0700 
(PDT)
From: Muchun Song
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org,
 akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 duanxiongchun@bytedance.com, Muchun Song
Subject: [PATCH v4 0/5] Use obj_cgroup APIs to charge kmem pages
Date: Thu, 18 Mar 2021 19:06:53 +0800
Message-Id: <20210318110658.60892-1-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.21.0 (Apple Git-122)
MIME-Version: 1.0
X-Stat-Signature: 1jiaiq19j7d91jtyhnn7n7b7kefz46oa
X-Rspamd-Server: rspam02
X-Rspamd-Queue-Id: 96301C0007D0
Received-SPF: none (bytedance.com>: No applicable sender policy available)
 receiver=imf03; identity=mailfrom; envelope-from="";
 helo=mail-pj1-f42.google.com; client-ip=209.85.216.42
X-HE-DKIM-Result: pass/pass
X-HE-Tag: 1616065707-221277
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

Since Roman's series "The new cgroup slab memory controller" was
applied, all slab objects are charged with the new obj_cgroup APIs.
The new APIs introduce a struct obj_cgroup to charge slab objects,
which prevents long-living objects from pinning the original memory
cgroup in memory. But there are still some corner-case objects (e.g.
allocations larger than an order-1 page on SLUB) which are not charged
with the new APIs. Those objects (including pages allocated directly
from the buddy allocator) are charged as kmem pages, which still hold
a reference to the memory cgroup.

For example, the kernel stack is charged as kmem pages because its
size can be greater than 2 pages (e.g. 16KB on x86_64 or arm64).
Suppose we create a thread whose stack is charged to memory cgroup A,
and then move the thread from memory cgroup A to memory cgroup B. The
kernel stack of the thread still holds a reference to memory cgroup A.
The thread can therefore pin memory cgroup A in memory even after
cgroup A is removed. The following script reproduces this scenario.
After running it, the system has added 500 dying cgroups (this is not
a real-world issue, just a script to show that large kmallocs are
charged as kmem pages, which can pin the memory cgroup in memory).

	#!/bin/bash

	# Memory cgroup count before the test.
	grep memory /proc/cgroups

	cd /sys/fs/cgroup/memory
	echo 1 > memory.move_charge_at_immigrate

	for i in {1..500}
	do
		mkdir kmem_test
		# Move this shell into kmem_test so that the child forked
		# below is created there.
		echo $$ > kmem_test/cgroup.procs
		sleep 3600 &
		# Move the shell, then the sleep process, back to the root.
		echo $$ > cgroup.procs
		echo $(cat kmem_test/cgroup.procs) > cgroup.procs
		rmdir kmem_test
	done

	# Memory cgroup count after the test.
	grep memory /proc/cgroups

This patchset makes those kmem pages drop their reference to the
memory cgroup by using the obj_cgroup APIs. With it applied, the
number of dying cgroups no longer increases when we run the above
test script.

Changelog in v4:
  1. Do not change behavior of page_memcg() and page_memcg_rcu().
  2. Rework uncharge_page() and uncharge_batch().
  3. Add two patches (patch #2 and patch #3).

  Thanks to Johannes, Shakeel and Roman for their review and
  suggestions.

Changelog in v3:
  1. Drop "remote objcg charging APIs" patch.
  2. Rename obj_cgroup_{un}charge_page to obj_cgroup_{un}charge_pages.
  3. Make page_memcg/page_memcg_rcu safe for adding new memcg_data
     flags.
  4. Reuse the ug infrastructure to uncharge the kmem pages.
  5. Add a new patch to move PageMemcgKmem to the scope of
     CONFIG_MEMCG_KMEM.

  Thanks to Roman for his review and suggestions.

Changelog in v2:
  1. Fix some typos in the commit log (Thanks Roman).
  2. Do not introduce page_memcg_kmem helper (Thanks to Johannes and
     Shakeel).
  3. Reduce the CC list to mm/memcg folks (Thanks to Johannes).
  4. Introduce remote objcg charging APIs instead of converting
     "remote memcg charging APIs" to "remote objcg charging APIs".
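
As a reading aid for the test script above, the two "memory" lines it
prints from /proc/cgroups can be compared mechanically. The sketch
below is not part of the patchset. The helper names memcg_count and
memcg_delta are hypothetical, and it assumes the usual /proc/cgroups
layout (subsys_name, hierarchy, num_cgroups, enabled), with the third
column being the count that grows by 500 without this series.

```shell
# Hypothetical helpers, not part of the patchset.

# memcg_count: read /proc/cgroups-formatted text on stdin and print the
# num_cgroups column (third field) of the "memory" row.
memcg_count() {
    awk '$1 == "memory" { print $3 }'
}

# memcg_delta BEFORE AFTER: print how many memory cgroups were added
# between two readings.
memcg_delta() {
    echo $(( $2 - $1 ))
}
```

A possible use is to take one reading before and one after the test
script, for example before=$(memcg_count < /proc/cgroups), run the
script, take after=$(memcg_count < /proc/cgroups), and then call
memcg_delta "$before" "$after". The exact numbers depend on what else
is creating or removing cgroups on the system at the time.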

Muchun Song (5):
  mm: memcontrol: introduce obj_cgroup_{un}charge_pages
  mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c
  mm: memcontrol: change ug->dummy_page only if memcg changed
  mm: memcontrol: use obj_cgroup APIs to charge kmem pages
  mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM

 include/linux/memcontrol.h | 123 ++++++++++++++++++++++++++++++++---------
 mm/memcontrol.c            | 133 ++++++++++++++++++++++++++++++---------------
 mm/page_alloc.c            |   4 +-
 3 files changed, 188 insertions(+), 72 deletions(-)