From: Roman Gushchin
To: Dennis Zhou
CC: Tejun Heo, Christoph Lameter, Andrew Morton, Roman Gushchin
Subject: [PATCH v1 0/5] percpu: partial chunk depopulation
Date: Thu, 1 Apr 2021 14:42:56 -0700
Message-ID: <20210401214301.1689099-1-guro@fb.com>
X-Patchwork-Id: 12179641

In our production experience, the percpu memory allocator sometimes
struggles to return memory to the system. A typical example is the
creation of several thousand memory cgroups (each has several chunks of
percpu data used for vmstats, vmevents, ref counters, etc.). Deleting
and completely releasing these cgroups doesn't always shrink the percpu
memory.

The underlying problem is fragmentation: to release a chunk, all percpu
allocations in it must be released first. The percpu allocator tends to
top up chunks to improve utilization. This means that new small-ish
allocations (e.g. percpu ref counters) are placed onto almost-full
old-ish chunks, effectively pinning them in memory.

This patchset aims to solve this problem by implementing partial
depopulation of percpu chunks: chunks with many empty pages are
asynchronously depopulated and the pages are returned to the system.
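The pinning effect described above can be sketched with a toy model. This
is not kernel code and does not reproduce the depopulation heuristics of
this series; the function names and the "one character per page"
encoding are made up purely for illustration. Each argument stands for
one chunk: '1' marks a page that still holds a live allocation, '0' marks
an empty populated page.

```shell
#!/bin/bash
# Toy model only (hypothetical helpers, not part of the patchset).

# Pages that can be returned when a chunk can only be freed as a whole.
pages_freed_whole_chunk() {
    freed=0
    for chunk in "$@"; do
        case "$chunk" in
            *1*) ;;                                  # one live page pins the whole chunk
            *)   freed=$((freed + ${#chunk})) ;;     # fully empty chunk: free every page
        esac
    done
    echo "$freed"
}

# Pages that can be returned when empty pages are depopulated individually.
pages_freed_partial() {
    freed=0
    for chunk in "$@"; do
        empty=$(printf '%s' "$chunk" | tr -d '1')    # keep only the empty pages
        freed=$((freed + ${#empty}))
    done
    echo "$freed"
}

# Four chunks of four pages each; two are pinned by a single live page.
echo "whole-chunk only: $(pages_freed_whole_chunk 0000 0100 0000 1000) pages"
echo "partial:          $(pages_freed_partial 0000 0100 0000 1000) pages"
```

In this example only the two fully empty chunks (8 pages) can be freed
as whole chunks, while page-granular depopulation can return 14 of the
16 pages.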
To illustrate the problem, the following script can be used:

---
#!/bin/bash

cd /sys/fs/cgroup

mkdir percpu_test
echo "+memory" > percpu_test/cgroup.subtree_control

# Initial size of the percpu memory
grep Percpu /proc/meminfo

# Create 1000 cgroups plus 10 more for each of them (11000 in total)
for i in $(seq 1 1000); do
    mkdir percpu_test/cg_"${i}"
    for j in $(seq 1 10); do
        mkdir percpu_test/cg_"${i}"_"${j}"
    done
done

# Size after creating all cgroups
grep Percpu /proc/meminfo

# Remove 10 out of every 11 cgroups
for i in $(seq 1 1000); do
    for j in $(seq 1 10); do
        rmdir percpu_test/cg_"${i}"_"${j}"
    done
done

# Give the asynchronous release some time, then print the size again
sleep 10

grep Percpu /proc/meminfo

# Clean up the remaining cgroups
for i in $(seq 1 1000); do
    rmdir percpu_test/cg_"${i}"
done

rmdir percpu_test
--

It creates 11000 memory cgroups and removes 10 out of every 11.
It prints the initial size of the percpu memory, the size after
creating all cgroups and the size after deleting most of them.

Results:
  vanilla:
    ./percpu_test.sh
    Percpu:             7488 kB
    Percpu:           481152 kB
    Percpu:           481152 kB

  with this patchset applied:
    ./percpu_test.sh
    Percpu:             7488 kB
    Percpu:           481408 kB
    Percpu:           159488 kB

So the total size of the percpu memory was reduced by a factor of about 3.

v1:
  - depopulation heuristics changed and optimized
  - chunks are put into a separate list, depopulation scans this list
  - chunk->isolated is introduced, chunk->depopulate is dropped
  - rearranged patches a bit
  - fixed a panic discovered by krobot
  - made pcpu_nr_empty_pop_pages per chunk type
  - minor fixes

rfc:
  https://lwn.net/Articles/850508/

Roman Gushchin (5):
  percpu: split __pcpu_balance_workfn()
  percpu: make pcpu_nr_empty_pop_pages per chunk type
  percpu: generalize pcpu_balance_populated()
  percpu: fix a comment about the chunks ordering
  percpu: implement partial chunk depopulation

 mm/percpu-internal.h |   3 +-
 mm/percpu-stats.c    |   9 +-
 mm/percpu.c          | 219 ++++++++++++++++++++++++++++++++++---------
 3 files changed, 182 insertions(+), 49 deletions(-)
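
The reduction quoted in the results above can be checked with a few
lines of shell arithmetic. This is only a convenience sketch: the helper
names are hypothetical, and the two values are the Percpu sizes (in kB)
reported for the patched kernel.

```shell
#!/bin/bash
# Hypothetical helpers for checking the numbers from the results.

# Ratio before/after, scaled by 100 because shell arithmetic is integer-only.
percpu_ratio_x100() {
    echo $(( $1 * 100 / $2 ))
}

# Amount of percpu memory returned to the system, in kB.
percpu_freed_kb() {
    echo $(( $1 - $2 ))
}

before=481408   # kB, after creating all cgroups
after=159488    # kB, after removing 10 out of every 11 cgroups

r=$(percpu_ratio_x100 "$before" "$after")
printf 'freed %d kB, ratio %d.%02d\n' \
    "$(percpu_freed_kb "$before" "$after")" $((r / 100)) $((r % 100))
```

With the values above this prints "freed 321920 kB, ratio 3.01".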