From patchwork Thu Aug 29 10:20:39 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhongkun He
X-Patchwork-Id: 13782956
From: Zhongkun He
To: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@kernel.org
Cc: roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev,
 lizefan.x@bytedance.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org, Zhongkun He
Subject: [RFC PATCH 2/2] mm: memcg: add disbale_unmap_file arg to memory.reclaim
Date: Thu, 29 Aug 2024 18:20:39 +0800
Message-Id: <20240829102039.3455842-2-hezhongkun.hzk@bytedance.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20240829102039.3455842-1-hezhongkun.hzk@bytedance.com>
References: <20240829102039.3455842-1-hezhongkun.hzk@bytedance.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Allow proactive memory reclaimers to pass an additional
disable_unmap_file argument to memory.reclaim. Mapped file pages are
then skipped for that reclaim attempt.

For example:

    echo "2M disable_unmap_file" > /sys/fs/cgroup/test/memory.reclaim

will perform reclaim on the test cgroup without reclaiming mapped file
pages.

memory.reclaim is a useful interface: it lets user space carry out
proactive memory reclaim, which can increase memory utilization.

In actual usage scenarios, we found that even when anonymous pages are
plentiful and mapped file pages make up only a small proportion of the
cgroup's memory, the mapped file pages are still reclaimed. This is
likely to increase refaults and task latency, because mapped file pages
usually include important executable code, data, and shared libraries.
According to our verification, skipping this part of memory reduces
business latency.

Here is an example in which anonymous pages are sufficient but mapped
file pages are still reclaimed:

    cat memory.stat | grep -wE 'anon|file|file_mapped'
    anon 3406462976
    file 332967936
    file_mapped 300302336

    echo "1g swappiness=200" > memory.reclaim
    cat memory.stat | grep -wE 'anon|file|file_mapped'
    anon 2613276672
    file 52523008
    file_mapped 30982144

    echo "1g swappiness=200" > memory.reclaim
    cat memory.stat | grep -wE 'anon|file|file_mapped'
    anon 1552130048
    file 39759872
    file_mapped 20299776

With this patch, the file_mapped pages are skipped:

    echo "1g swappiness=200 disable_unmap_file" > memory.reclaim
    cat memory.stat | grep -wE 'anon|file|file_mapped'
    anon 480059392
    file 37978112
    file_mapped 20299776

IMO, it is difficult to balance the priorities of the various page
types inside the kernel, because there are too many scenarios to
consider. For proactive memory reclaim driven from user space, however,
we can make a simple judgment like this one.

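For readers who drive memory.reclaim from a program rather than a shell,
the request format shown above can be sketched in C. This is only an
illustration under stated assumptions: build_reclaim_request() and
request_reclaim() are hypothetical helper names, the cgroup path is
supplied by the caller, and the disable_unmap_file token exists only on
a kernel carrying this RFC.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "<amount> [swappiness=N] [disable_unmap_file]".
 * A negative swappiness omits the swappiness= token.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int build_reclaim_request(char *buf, size_t len, const char *amount,
				 int swappiness, int skip_mapped_file)
{
	const char *flag = skip_mapped_file ? " disable_unmap_file" : "";
	int n;

	if (swappiness >= 0)
		n = snprintf(buf, len, "%s swappiness=%d%s",
			     amount, swappiness, flag);
	else
		n = snprintf(buf, len, "%s%s", amount, flag);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write the request to <cgroup_dir>/memory.reclaim.
 * With stdio buffering, a write error reported by the kernel may only
 * surface when the stream is flushed, so fclose() is checked as well.
 */
static int request_reclaim(const char *cgroup_dir, const char *request)
{
	char path[4096];
	FILE *f;
	int n, ret = 0;

	n = snprintf(path, sizeof(path), "%s/memory.reclaim", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(request, f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```

A caller would build the string with, for instance,
build_reclaim_request(buf, sizeof(buf), "1G", 200, 1) and pass it to
request_reclaim("/sys/fs/cgroup/test", buf). On a kernel without this
patch the unknown token is rejected with -EINVAL by the default case of
the token switch in memory_reclaim(), so the write fails.
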
Signed-off-by: Zhongkun He
---
 include/linux/swap.h | 1 +
 mm/memcontrol.c      | 9 +++++++--
 mm/vmscan.c          | 4 ++++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index ca533b478c21..49df8f3748e8 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -409,6 +409,7 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 
 #define MEMCG_RECLAIM_MAY_SWAP (1 << 1)
 #define MEMCG_RECLAIM_PROACTIVE (1 << 2)
+#define MEMCG_RECLAIM_DIS_UNMAP_FILE (1 << 3)
 #define MIN_SWAPPINESS 0
 #define MAX_SWAPPINESS 200
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 35431035e782..7b0181553b0c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4282,11 +4282,13 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
 
 enum {
 	MEMORY_RECLAIM_SWAPPINESS = 0,
+	MEMORY_RECLAIM_DISABLE_UNMAP_FILE,
 	MEMORY_RECLAIM_NULL,
 };
 
 static const match_table_t tokens = {
 	{ MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"},
+	{ MEMORY_RECLAIM_DISABLE_UNMAP_FILE, "disable_unmap_file"},
 	{ MEMORY_RECLAIM_NULL, NULL },
 };
 
@@ -4297,7 +4299,7 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 	unsigned int nr_retries = MAX_RECLAIM_RETRIES;
 	unsigned long nr_to_reclaim, nr_reclaimed = 0;
 	int swappiness = -1;
-	unsigned int reclaim_options;
+	unsigned int reclaim_options = 0;
 	char *old_buf, *start;
 	substring_t args[MAX_OPT_ARGS];
 
@@ -4320,12 +4322,15 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 			if (swappiness < MIN_SWAPPINESS || swappiness > MAX_SWAPPINESS)
 				return -EINVAL;
 			break;
+		case MEMORY_RECLAIM_DISABLE_UNMAP_FILE:
+			reclaim_options = MEMCG_RECLAIM_DIS_UNMAP_FILE;
+			break;
 		default:
 			return -EINVAL;
 		}
 	}
 
-	reclaim_options = MEMCG_RECLAIM_MAY_SWAP | MEMCG_RECLAIM_PROACTIVE;
+	reclaim_options |= MEMCG_RECLAIM_MAY_SWAP | MEMCG_RECLAIM_PROACTIVE;
 	while (nr_reclaimed < nr_to_reclaim) {
 		/* Will converge on zero, but reclaim enforces a minimum */
 		unsigned long batch_size = (nr_to_reclaim - nr_reclaimed) / 4;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 50ac714cba2f..1b58126a8246 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6609,6 +6609,10 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 		.may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP),
 		.proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE),
 	};
+
+	if (reclaim_options & MEMCG_RECLAIM_DIS_UNMAP_FILE)
+		sc.may_unmap &= ~UNMAP_FILE;
+
 	/*
 	 * Traverse the ZONELIST_FALLBACK zonelist of the current node to put
 	 * equal pressure on all the nodes. This is based on the assumption that