From patchwork Mon May 11 22:55:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Kicinski X-Patchwork-Id: 11541851 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C7717913 for ; Mon, 11 May 2020 22:55:34 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 89A1820736 for ; Mon, 11 May 2020 22:55:34 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="W2dLXmEh" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 89A1820736 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7C4BF90008B; Mon, 11 May 2020 18:55:32 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 74E49900036; Mon, 11 May 2020 18:55:32 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 63C5990008B; Mon, 11 May 2020 18:55:32 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0224.hostedemail.com [216.40.44.224]) by kanga.kvack.org (Postfix) with ESMTP id 49637900036 for ; Mon, 11 May 2020 18:55:32 -0400 (EDT) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 0E4DF8248047 for ; Mon, 11 May 2020 22:55:32 +0000 (UTC) X-FDA: 76805946504.23.earth54_6a79e570b8b3a X-Spam-Summary: 2,0,0,e559373a98f2f406,d41d8cd98f00b204,kuba@kernel.org,,RULES_HIT:41:69:355:379:541:800:960:973:988:989:1260:1311:1314:1345:1359:1437:1515:1534:1543:1711:1730:1747:1777:1792:1969:2195:2199:2393:2559:2562:2898:3138:3139:3140:3141:3142:3354:3865:3866:3867:3868:3870:3871:3872:3874:4321:4605:5007:6261:6653:7875:7901:7903:9592:10004:11026:11473:11658:11914:12043:12291:12296:12297:12438:12517:12519:12555:12683:12895:12986:13255:13869:13894:14096:14110:14181:14394:14721:21080:21451:21627:21740:21990:30005:30034:30054:30070,0,RBL:198.145.29.99:@kernel.org:.lbl8.mailshell.net-62.2.0.100 64.100.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: earth54_6a79e570b8b3a X-Filterd-Recvd-Size: 4678 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Mon, 11 May 2020 22:55:31 +0000 (UTC) Received: from kicinski-fedora-PC1C0HJN.thefacebook.com (unknown [163.114.132.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 6F11B2075E; Mon, 11 May 2020 22:55:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1589237730; bh=hPqe/05rnU2qdGjFk4enrQ1+qhpSIPRruaw+FWrv/LY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=W2dLXmEh0CKBx9VFApsDss2e5KCl8DyX4XBB2a6yvgq6nUKiDvzxgzmhPxH2P0HVv y/u/H8BUsrJfjDRZlulnyIFUOa1t2633inD5FBCT8CxVxzw98RN10xChEdedn+TyIO hD/9mMEX97ho/kShv/lNAlH7Mo04AX65nsEH3Otk= From: Jakub Kicinski To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, kernel-team@fb.com, tj@kernel.org, hannes@cmpxchg.org, chris@chrisdown.name, cgroups@vger.kernel.org, shakeelb@google.com, mhocko@kernel.org, Jakub Kicinski Subject: [PATCH mm v2 1/3] mm: prepare for swap over-high accounting and penalty calculation Date: Mon, 11 May 2020 15:55:14 -0700 Message-Id: <20200511225516.2431921-2-kuba@kernel.org> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20200511225516.2431921-1-kuba@kernel.org> References: <20200511225516.2431921-1-kuba@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Slice the memory overage calculation logic a little bit so we can reuse it to apply a similar penalty to the swap. The logic which accesses the memory-specific fields (use and high values) has to be taken out of calculate_high_delay(). Signed-off-by: Jakub Kicinski Acked-by: Michal Hocko --- mm/memcontrol.c | 62 ++++++++++++++++++++++++++++--------------------- 1 file changed, 35 insertions(+), 27 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 05dcb72314b5..8a9b671c3249 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2321,41 +2321,48 @@ static void high_work_func(struct work_struct *work) #define MEMCG_DELAY_PRECISION_SHIFT 20 #define MEMCG_DELAY_SCALING_SHIFT 14 -/* - * Get the number of jiffies that we should penalise a mischievous cgroup which - * is exceeding its memory.high by checking both it and its ancestors. - */ -static unsigned long calculate_high_delay(struct mem_cgroup *memcg, - unsigned int nr_pages) +static u64 calculate_overage(unsigned long usage, unsigned long high) { - unsigned long penalty_jiffies; - u64 max_overage = 0; - - do { - unsigned long usage, high; - u64 overage; + u64 overage; - usage = page_counter_read(&memcg->memory); - high = READ_ONCE(memcg->high); + if (usage <= high) + return 0; - if (usage <= high) - continue; + /* + * Prevent division by 0 in overage calculation by acting as if + * it was a threshold of 1 page + */ + high = max(high, 1UL); - /* - * Prevent division by 0 in overage calculation by acting as if - * it was a threshold of 1 page - */ - high = max(high, 1UL); + overage = usage - high; + overage <<= MEMCG_DELAY_PRECISION_SHIFT; + return div64_u64(overage, high); +} - overage = usage - high; - overage <<= MEMCG_DELAY_PRECISION_SHIFT; - overage = div64_u64(overage, high); +static u64 mem_find_max_overage(struct mem_cgroup *memcg) +{ + u64 overage, max_overage = 0; - if (overage > max_overage) - max_overage = overage; + do { + overage = calculate_overage(page_counter_read(&memcg->memory), + READ_ONCE(memcg->high)); + max_overage = max(overage, max_overage); } while ((memcg = parent_mem_cgroup(memcg)) && !mem_cgroup_is_root(memcg)); + return max_overage; +} + +/* + * Get the number of jiffies that we should penalise a mischievous cgroup which + * is exceeding its memory.high by checking both it and its ancestors. + */ +static unsigned long calculate_high_delay(struct mem_cgroup *memcg, + unsigned int nr_pages, + u64 max_overage) +{ + unsigned long penalty_jiffies; + if (!max_overage) return 0; @@ -2411,7 +2418,8 @@ void mem_cgroup_handle_over_high(void) * memory.high is breached and reclaim is unable to keep up. Throttle * allocators proactively to slow down excessive growth. */ - penalty_jiffies = calculate_high_delay(memcg, nr_pages); + penalty_jiffies = calculate_high_delay(memcg, nr_pages, + mem_find_max_overage(memcg)); /* * Don't sleep if the amount of jiffies this memcg owes us is so low From patchwork Mon May 11 22:55:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Kicinski X-Patchwork-Id: 11541853 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1820C913 for ; Mon, 11 May 2020 22:55:37 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D93182070B for ; Mon, 11 May 2020 22:55:36 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="GMnDpNBp" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D93182070B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id DDF6F90008C; Mon, 11 May 2020 18:55:32 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D6923900036; Mon, 11 May 2020 18:55:32 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BBB9690008C; Mon, 11 May 2020 18:55:32 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0192.hostedemail.com [216.40.44.192]) by kanga.kvack.org (Postfix) with ESMTP id A3C6A900036 for ; Mon, 11 May 2020 18:55:32 -0400 (EDT) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 68999180AD807 for ; Mon, 11 May 2020 22:55:32 +0000 (UTC) X-FDA: 76805946504.24.jelly11_6a8a79d0cca37 X-Spam-Summary: 2,0,0,1d9aa766e46708a6,d41d8cd98f00b204,kuba@kernel.org,,RULES_HIT:41:355:379:541:800:960:973:988:989:1260:1311:1314:1345:1359:1437:1515:1534:1541:1711:1730:1747:1777:1792:2393:2559:2562:2693:3138:3139:3140:3141:3142:3353:3865:3866:3867:3868:3870:3871:3872:3874:4605:5007:6261:6653:7875:7901:8660:9592:10004:11026:11473:11658:11914:12043:12114:12220:12296:12297:12438:12517:12519:12555:12895:13069:13148:13230:13311:13357:13869:13894:14096:14181:14384:14394:14721:21080:21627:21740:21990:30005:30054,0,RBL:198.145.29.99:@kernel.org:.lbl8.mailshell.net-62.2.0.100 64.100.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: jelly11_6a8a79d0cca37 X-Filterd-Recvd-Size: 3228 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf18.hostedemail.com (Postfix) with ESMTP for ; Mon, 11 May 2020 22:55:31 +0000 (UTC) Received: from kicinski-fedora-PC1C0HJN.thefacebook.com (unknown [163.114.132.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id EE61F207FF; Mon, 11 May 2020 22:55:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1589237731; bh=kOtuHvIXyuCsDDJSILlMpuMpQDEuJ9nb7HZSheh1l9M=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=GMnDpNBpgqGMTV/qgT3udP01QYtKHuFoj83oSoqGfJZU6V7q4n5JUGbFvMg0Q04E0 sAaL7QoYzDsp0LyVwY4u33Ppj+oId6ugQBfvoUyGtPE1wAee98SF6gKmdL4hvpUitz 4bleT1nbHFFCc748VMbgj6I3XZzhMCk5NGQXubNM= From: Jakub Kicinski To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, kernel-team@fb.com, tj@kernel.org, hannes@cmpxchg.org, chris@chrisdown.name, cgroups@vger.kernel.org, shakeelb@google.com, mhocko@kernel.org, Jakub Kicinski Subject: [PATCH mm v2 2/3] mm: move penalty delay clamping out of calculate_high_delay() Date: Mon, 11 May 2020 15:55:15 -0700 Message-Id: <20200511225516.2431921-3-kuba@kernel.org> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20200511225516.2431921-1-kuba@kernel.org> References: <20200511225516.2431921-1-kuba@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: We will want to call calculate_high_delay() twice - once for memory and once for swap, and we should apply the clamp value to sum of the penalties. Clamping has to be applied outside of calculate_high_delay(). Signed-off-by: Jakub Kicinski --- mm/memcontrol.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 8a9b671c3249..66dd87bb9e0f 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2386,14 +2386,7 @@ static unsigned long calculate_high_delay(struct mem_cgroup *memcg, * MEMCG_CHARGE_BATCH pages is nominal, so work out how much smaller or * larger the current charge patch is than that. */ - penalty_jiffies = penalty_jiffies * nr_pages / MEMCG_CHARGE_BATCH; - - /* - * Clamp the max delay per usermode return so as to still keep the - * application moving forwards and also permit diagnostics, albeit - * extremely slowly. - */ - return min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES); + return penalty_jiffies * nr_pages / MEMCG_CHARGE_BATCH; } /* @@ -2421,6 +2414,13 @@ void mem_cgroup_handle_over_high(void) penalty_jiffies = calculate_high_delay(memcg, nr_pages, mem_find_max_overage(memcg)); + /* + * Clamp the max delay per usermode return so as to still keep the + * application moving forwards and also permit diagnostics, albeit + * extremely slowly. + */ + penalty_jiffies = min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES); + /* * Don't sleep if the amount of jiffies this memcg owes us is so low * that it's not even worth doing, in an attempt to be nice to those who From patchwork Mon May 11 22:55:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Kicinski X-Patchwork-Id: 11541855 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 37D5E913 for ; Mon, 11 May 2020 22:55:39 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id EC0AE2070B for ; Mon, 11 May 2020 22:55:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="kkiZM580" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EC0AE2070B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 88C6990008D; Mon, 11 May 2020 18:55:33 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 7C282900036; Mon, 11 May 2020 18:55:33 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 63D5590008D; Mon, 11 May 2020 18:55:33 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0211.hostedemail.com [216.40.44.211]) by kanga.kvack.org (Postfix) with ESMTP id 483A0900036 for ; Mon, 11 May 2020 18:55:33 -0400 (EDT) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 0573B2C8F for ; Mon, 11 May 2020 22:55:33 +0000 (UTC) X-FDA: 76805946546.14.fear58_6a9c3317fc756 X-Spam-Summary: 2,0,0,1e021a25d18b7e96,d41d8cd98f00b204,kuba@kernel.org,,RULES_HIT:1:2:41:355:379:541:800:960:973:981:988:989:1260:1311:1314:1345:1359:1437:1500:1515:1605:1730:1747:1777:1792:1801:1969:2195:2197:2199:2200:2393:2559:2562:2693:2740:2897:2898:2904:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:3874:4052:4250:4321:4605:5007:6119:6261:6653:7875:7901:7903:9036:10004:11026:11232:11473:11658:11914:12043:12291:12296:12297:12438:12517:12519:12555:12895:12986:13161:13229:13869:13894:14394:21080:21212:21324:21433:21627:21740:21795:21990:30005:30029:30034:30051:30054:30056:30070,0,RBL:198.145.29.99:@kernel.org:.lbl8.mailshell.net-62.2.0.100 64.100.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:15,LUA_SUMMARY:none X-HE-Tag: fear58_6a9c3317fc756 X-Filterd-Recvd-Size: 10585 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf26.hostedemail.com (Postfix) with ESMTP for ; Mon, 11 May 2020 22:55:32 +0000 (UTC) Received: from kicinski-fedora-PC1C0HJN.thefacebook.com (unknown [163.114.132.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 769B92078A; Mon, 11 May 2020 22:55:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1589237731; bh=dMewI9Xy5TjZrTJbcxsrqxTspTRCeyUBNw8PCcygb/Y=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=kkiZM580gQgyKudAgwQwtDtOrXs1ogCLuVn5R1zBtT8t8AV+O1iL1Bthi7Y0u8NqJ FJ588GxYzc9VNWfiiTIEzwry1BuHRlHakm1IRF2G6/wWihO5Dyx/kcu/D1+Qo5dST5 lmCX4o6ESrdgD6W7uIzSGMxvOEtIM3nuyz32gDxQ= From: Jakub Kicinski To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, kernel-team@fb.com, tj@kernel.org, hannes@cmpxchg.org, chris@chrisdown.name, cgroups@vger.kernel.org, shakeelb@google.com, mhocko@kernel.org, Jakub Kicinski Subject: [PATCH mm v2 3/3] mm: automatically penalize tasks with high swap use Date: Mon, 11 May 2020 15:55:16 -0700 Message-Id: <20200511225516.2431921-4-kuba@kernel.org> X-Mailer: git-send-email 2.25.4 In-Reply-To: <20200511225516.2431921-1-kuba@kernel.org> References: <20200511225516.2431921-1-kuba@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add a memory.swap.high knob, which can be used to protect the system from SWAP exhaustion. The mechanism used for penelizing is similar to memory.high penalty (sleep on return to user space), but with a less steep slope. That is not to say that the knob itself is equivalent to memory.high. The objective is more to protect the system from potentially buggy tasks consuming a lot of swap and impacting other tasks, or even bringing the whole system to stand still with complete SWAP exhaustion. Hopefully without the need to find per-task hard limits. Slowing misbehaving tasks down gradually allows user space oom killers or other protection mechanisms to react. oomd and earlyoom already do killing based on swap exhaustion, and memory.swap.high protection will help implement such userspace oom policies more reliably. Use one counter for number of pages allocated under pressure to save struct task space and avoid two separate hierarchy walks on the hot path. Use swap.high when deciding if swap is full. Perform reclaim and count memory over high events. Signed-off-by: Jakub Kicinski --- v2: - add docs, - improve commit message. --- Documentation/admin-guide/cgroup-v2.rst | 16 +++++ include/linux/memcontrol.h | 4 ++ mm/memcontrol.c | 94 +++++++++++++++++++++++-- 3 files changed, 107 insertions(+), 7 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 5f12f203822e..c60226daa193 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1374,6 +1374,22 @@ PAGE_SIZE multiple when read back. The total amount of swap currently being used by the cgroup and its descendants. + memory.swap.high + A read-write single value file which exists on non-root + cgroups. The default is "max". + + Swap usage throttle limit. If a cgroup's swap usage exceeds + this limit, all its further allocations will be throttled to + allow userspace to implement custom out-of-memory procedures. + + This limit marks a point of no return for the cgroup. It is NOT + designed to manage the amount of swapping a workload does + during regular operation. Compare to memory.swap.max, which + prohibits swapping past a set amount, but lets the cgroup + continue unimpeded as long as other memory can be reclaimed. + + Healthy workloads are not expected to reach this limit. + memory.swap.max A read-write single value file which exists on non-root cgroups. The default is "max". diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index b478a4e83297..882bda952a5c 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -45,6 +45,7 @@ enum memcg_memory_event { MEMCG_MAX, MEMCG_OOM, MEMCG_OOM_KILL, + MEMCG_SWAP_HIGH, MEMCG_SWAP_MAX, MEMCG_SWAP_FAIL, MEMCG_NR_MEMORY_EVENTS, @@ -212,6 +213,9 @@ struct mem_cgroup { /* Upper bound of normal memory consumption range */ unsigned long high; + /* Upper bound of swap consumption range */ + unsigned long swap_high; + /* Range enforcement for interrupt charges */ struct work_struct high_work; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 66dd87bb9e0f..a3d13b30e3d6 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2353,12 +2353,34 @@ static u64 mem_find_max_overage(struct mem_cgroup *memcg) return max_overage; } +static u64 swap_find_max_overage(struct mem_cgroup *memcg) +{ + u64 overage, max_overage = 0; + struct mem_cgroup *max_cg; + + do { + overage = calculate_overage(page_counter_read(&memcg->swap), + READ_ONCE(memcg->swap_high)); + if (overage > max_overage) { + max_overage = overage; + max_cg = memcg; + } + } while ((memcg = parent_mem_cgroup(memcg)) && + !mem_cgroup_is_root(memcg)); + + if (max_overage) + memcg_memory_event(max_cg, MEMCG_SWAP_HIGH); + + return max_overage; +} + /* * Get the number of jiffies that we should penalise a mischievous cgroup which * is exceeding its memory.high by checking both it and its ancestors. */ static unsigned long calculate_high_delay(struct mem_cgroup *memcg, unsigned int nr_pages, + unsigned char cost_shift, u64 max_overage) { unsigned long penalty_jiffies; @@ -2366,6 +2388,9 @@ static unsigned long calculate_high_delay(struct mem_cgroup *memcg, if (!max_overage) return 0; + if (cost_shift) + max_overage >>= cost_shift; + /* * We use overage compared to memory.high to calculate the number of * jiffies to sleep (penalty_jiffies). Ideally this value should be @@ -2411,9 +2436,16 @@ void mem_cgroup_handle_over_high(void) * memory.high is breached and reclaim is unable to keep up. Throttle * allocators proactively to slow down excessive growth. */ - penalty_jiffies = calculate_high_delay(memcg, nr_pages, + penalty_jiffies = calculate_high_delay(memcg, nr_pages, 0, mem_find_max_overage(memcg)); + /* + * Make the swap curve more gradual, swap can be considered "cheaper", + * and is allocated in larger chunks. We want the delays to be gradual. + */ + penalty_jiffies += calculate_high_delay(memcg, nr_pages, 2, + swap_find_max_overage(memcg)); + /* * Clamp the max delay per usermode return so as to still keep the * application moving forwards and also permit diagnostics, albeit @@ -2604,12 +2636,23 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, * reclaim, the cost of mismatch is negligible. */ do { - if (page_counter_read(&memcg->memory) > READ_ONCE(memcg->high)) { - /* Don't bother a random interrupted task */ - if (in_interrupt()) { + bool mem_high, swap_high; + + mem_high = page_counter_read(&memcg->memory) > + READ_ONCE(memcg->high); + swap_high = page_counter_read(&memcg->swap) > + READ_ONCE(memcg->swap_high); + + /* Don't bother a random interrupted task */ + if (in_interrupt()) { + if (mem_high) { schedule_work(&memcg->high_work); break; } + continue; + } + + if (mem_high || swap_high) { current->memcg_nr_pages_over_high += batch; set_notify_resume(current); break; @@ -5076,6 +5119,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) WRITE_ONCE(memcg->high, PAGE_COUNTER_MAX); memcg->soft_limit = PAGE_COUNTER_MAX; + WRITE_ONCE(memcg->swap_high, PAGE_COUNTER_MAX); if (parent) { memcg->swappiness = mem_cgroup_swappiness(parent); memcg->oom_kill_disable = parent->oom_kill_disable; @@ -5229,6 +5273,7 @@ static void mem_cgroup_css_reset(struct cgroup_subsys_state *css) page_counter_set_low(&memcg->memory, 0); WRITE_ONCE(memcg->high, PAGE_COUNTER_MAX); memcg->soft_limit = PAGE_COUNTER_MAX; + WRITE_ONCE(memcg->swap_high, PAGE_COUNTER_MAX); memcg_wb_domain_size_changed(memcg); } @@ -7136,10 +7181,13 @@ bool mem_cgroup_swap_full(struct page *page) if (!memcg) return false; - for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) - if (page_counter_read(&memcg->swap) * 2 >= - READ_ONCE(memcg->swap.max)) + for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) { + unsigned long usage = page_counter_read(&memcg->swap); + + if (usage * 2 >= READ_ONCE(memcg->swap_high) || + usage * 2 >= READ_ONCE(memcg->swap.max)) return true; + } return false; } @@ -7169,6 +7217,30 @@ static u64 swap_current_read(struct cgroup_subsys_state *css, return (u64)page_counter_read(&memcg->swap) * PAGE_SIZE; } +static int swap_high_show(struct seq_file *m, void *v) +{ + unsigned long high = READ_ONCE(mem_cgroup_from_seq(m)->swap_high); + + return seq_puts_memcg_tunable(m, high); +} + +static ssize_t swap_high_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); + unsigned long high; + int err; + + buf = strstrip(buf); + err = page_counter_memparse(buf, "max", &high); + if (err) + return err; + + WRITE_ONCE(memcg->swap_high, high); + + return nbytes; +} + static int swap_max_show(struct seq_file *m, void *v) { return seq_puts_memcg_tunable(m, @@ -7196,6 +7268,8 @@ static int swap_events_show(struct seq_file *m, void *v) { struct mem_cgroup *memcg = mem_cgroup_from_seq(m); + seq_printf(m, "high %lu\n", + atomic_long_read(&memcg->memory_events[MEMCG_SWAP_HIGH])); seq_printf(m, "max %lu\n", atomic_long_read(&memcg->memory_events[MEMCG_SWAP_MAX])); seq_printf(m, "fail %lu\n", @@ -7210,6 +7284,12 @@ static struct cftype swap_files[] = { .flags = CFTYPE_NOT_ON_ROOT, .read_u64 = swap_current_read, }, + { + .name = "swap.high", + .flags = CFTYPE_NOT_ON_ROOT, + .seq_show = swap_high_show, + .write = swap_high_write, + }, { .name = "swap.max", .flags = CFTYPE_NOT_ON_ROOT,