From patchwork Wed Mar 24 19:06:22 2021
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 12162055
From: Roman Gushchin
To: Dennis Zhou
CC: Tejun Heo, Christoph Lameter, Andrew Morton, Roman Gushchin
Subject: [PATCH rfc 0/4] percpu: partial chunk depopulation
Date: Wed, 24 Mar 2021 12:06:22 -0700
Message-ID: <20210324190626.564297-1-guro@fb.com>

In our production experience the percpu memory allocator sometimes
struggles to return memory to the system. A typical example is the
creation of several thousand memory cgroups (each has several chunks
of percpu data used for vmstats, vmevents, ref counters etc).
Deleting and completely releasing these cgroups doesn't always shrink
the percpu memory.

The underlying problem is fragmentation: to release an underlying
chunk, all percpu allocations in it must be released first. The percpu
allocator tends to top up chunks to improve utilization. This means
new small-ish allocations (e.g. percpu ref counters) are placed onto
almost-full old-ish chunks, effectively pinning them in memory.

This patchset aims to solve this problem by implementing partial
depopulation of percpu chunks: chunks with many empty pages are
asynchronously depopulated and the pages are returned to the system.
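The pinning effect described above can be shown with a toy model. This is
only an illustrative sketch, not the kernel's logic: the function name
count_reclaimable, the one-character-per-page encoding and the sample
chunk layout are all made up for the example, and it assumes every empty
page of an in-use chunk could be returned, which is more optimistic than
depopulating only chunks with many empty pages.

```shell
#!/bin/bash
# Toy model (hypothetical, not kernel code): each argument is one chunk,
# each character one page. '#' = page with a live allocation, '.' = empty.
count_reclaimable() {
    local c used empty whole=0 partial=0
    for c in "$@"; do
        used=${c//./}                  # drop empty pages, keep used ones
        empty=$(( ${#c} - ${#used} ))
        # Whole-chunk release only works when nothing in the chunk is live.
        if [ "${#used}" -eq 0 ]; then
            whole=$(( whole + empty ))
        fi
        # Partial depopulation can also return empty pages of pinned chunks.
        partial=$(( partial + empty ))
    done
    echo "whole-chunk release: ${whole} pages"
    echo "partial depopulation: ${partial} pages"
}

# One fully empty chunk, one pinned by a single page, one mostly used,
# one completely full.
count_reclaimable "........" "#......." "####...#" "########"
```

In this layout a single live page in the second chunk keeps seven empty
pages resident unless the chunk can be depopulated partially.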
To illustrate the problem the following script can be used:

---
#!/bin/bash

cd /sys/fs/cgroup

mkdir percpu_test
echo "+memory" > percpu_test/cgroup.subtree_control

# Baseline percpu usage
cat /proc/meminfo | grep Percpu

# Create 1000 cgroups with 10 sibling cgroups each (11000 in total)
for i in `seq 1 1000`; do
    mkdir percpu_test/cg_"${i}"
    for j in `seq 1 10`; do
	mkdir percpu_test/cg_"${i}"_"${j}"
    done
done

cat /proc/meminfo | grep Percpu

# Remove 10 out of every 11 cgroups
for i in `seq 1 1000`; do
    for j in `seq 1 10`; do
	rmdir percpu_test/cg_"${i}"_"${j}"
    done
done

sleep 10

cat /proc/meminfo | grep Percpu

# Clean up the remaining cgroups
for i in `seq 1 1000`; do
    rmdir percpu_test/cg_"${i}"
done

rmdir percpu_test
--

It creates 11000 memory cgroups and removes 10 out of every 11.
It prints the initial size of the percpu memory, the size after
creating all cgroups and the size after deleting most of them.

Results:
  vanilla:
    $ ./percpu_test.sh
    Percpu:             7296 kB
    Percpu:           481024 kB
    Percpu:           481024 kB

  with this patchset applied:
    $ ./percpu_test.sh
    Percpu:             7488 kB
    Percpu:           481152 kB
    Percpu:           153920 kB

So the total size of the percpu memory was reduced by a factor of ~3.

Roman Gushchin (4):
  percpu: implement partial chunk depopulation
  percpu: split __pcpu_balance_workfn()
  percpu: on demand chunk depopulation
  percpu: fix a comment about the chunks ordering

 mm/percpu-internal.h |   1 +
 mm/percpu.c          | 242 ++++++++++++++++++++++++++++++++++++-------
 2 files changed, 203 insertions(+), 40 deletions(-)
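
As a cross-check of the "~3 times" figure in the results above, the
reduction can be computed from the two Percpu values printed with the
patchset applied. This is a small sketch only; the helper name
percpu_reduction is hypothetical and it uses integer arithmetic, so the
percentage and ratio are truncated rather than rounded.

```shell
#!/bin/bash
# Hypothetical helper: compare Percpu sizes (in kB) before and after
# the cgroups were removed.
percpu_reduction() {
    local before=$1 after=$2
    local freed=$(( before - after ))
    local pct=$(( freed * 100 / before ))        # truncated percentage
    local ratio_x100=$(( before * 100 / after )) # ratio scaled by 100
    printf 'freed %d kB (%d%%), ratio %d.%02dx\n' \
        "$freed" "$pct" $(( ratio_x100 / 100 )) $(( ratio_x100 % 100 ))
}

# Values taken from the run with the patchset applied.
percpu_reduction 481152 153920
```

With those inputs the helper reports 327232 kB freed, about 68% of the
peak, which corresponds to a ratio of roughly 3.12.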