From patchwork Mon Aug 17 14:08:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 11718335 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DDBD513B1 for ; Mon, 17 Aug 2020 14:10:10 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 93BCA20748 for ; Mon, 17 Aug 2020 14:10:10 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="S3LHxYV8" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 93BCA20748 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id BEF046B0006; Mon, 17 Aug 2020 10:10:09 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id BC7F26B0007; Mon, 17 Aug 2020 10:10:09 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ADCF96B0008; Mon, 17 Aug 2020 10:10:09 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0129.hostedemail.com [216.40.44.129]) by kanga.kvack.org (Postfix) with ESMTP id 9959E6B0006 for ; Mon, 17 Aug 2020 10:10:09 -0400 (EDT) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 3CA36180AD830 for ; Mon, 17 Aug 2020 14:10:09 +0000 (UTC) X-FDA: 77160244938.23.sink20_191793a27017 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin23.hostedemail.com (Postfix) with ESMTP id 7FA4D37635 for ; Mon, 17 Aug 2020 14:10:02 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,longman@redhat.com,,RULES_HIT:4423:30051:30054:30070:30090,0,RBL:205.139.110.61:@redhat.com:.lbl8.mailshell.net-62.18.0.100 66.10.201.10;04yg3t4qswrpdjyin499eughmjtzoopni8y35zapqntz1mwbdfjxd4x9oz4ir1x.8h4wk8snodkborxqahx14ex1f6bph1o1ifrix5jw9doyzbgmxqynqqpb6eeiasj.4-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: sink20_191793a27017 X-Filterd-Recvd-Size: 7912 Received: from us-smtp-delivery-1.mimecast.com (us-smtp-2.mimecast.com [205.139.110.61]) by imf06.hostedemail.com (Postfix) with ESMTP for ; Mon, 17 Aug 2020 14:10:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1597673401; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:in-reply-to:in-reply-to:references:references; bh=gLC3iaZ7OOAbkDpDBx+l3EQ6N48p5E+2TCrsju7XHF8=; b=S3LHxYV88MwC3cQ9g/bF1j1ctloTVJb7TDiXDjuHaWKrLjakMf63jZL1KvNKF2gRlVS0pB BNlrXpKvMP3NSiY9290QnOL3GB1wPfRxDSCbFG9KF7U9dXZ/QsjKi03RLHK3UfSdhLzNbN YnhQpeJRZdYDw47ilJ5e8NbclMRs+Io= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-466-fyIpzfgoMP2ZcJRcTktqDA-1; Mon, 17 Aug 2020 10:09:59 -0400 X-MC-Unique: fyIpzfgoMP2ZcJRcTktqDA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 846CA801AD9; Mon, 17 Aug 2020 14:09:57 +0000 (UTC) Received: from llong.com (ovpn-118-35.rdu2.redhat.com [10.10.118.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id B8EA821E8F; Mon, 17 Aug 2020 14:09:55 +0000 (UTC) From: Waiman Long To: Andrew Morton , Johannes Weiner , Michal Hocko , Vladimir Davydov , Jonathan Corbet , Alexey Dobriyan , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Waiman Long Subject: [RFC PATCH 5/8] memcg: Allow direct per-task memory limit checking Date: Mon, 17 Aug 2020 10:08:28 -0400 Message-Id: <20200817140831.30260-6-longman@redhat.com> In-Reply-To: <20200817140831.30260-1-longman@redhat.com> References: <20200817140831.30260-1-longman@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Rspamd-Queue-Id: 7FA4D37635 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam02 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Up to now, the PR_SET_MEMCONTROL prctl(2) call enables user-specified action only if the total memory consumption in the memory cgroup exceeds memory.high by the additional memory threshold specified. There are cases where a user may want direct memory consumption control for certain applications even if the total cgroup memory consumption has not exceeded the limit yet. One way of doing that is to create one memory cgroup per application. However, if an application call other helper applications, these helper applications will fall into the same cgroup breaking the one application per cgroup rule. Another alternative is to enable user to enable direct per-task memory limit checking which is what this patch is about. That is for special use cases and is not recommended for general use as memory reclaim may not be triggered even if the per-task memory limit has been exceeded. Signed-off-by: Waiman Long --- include/uapi/linux/prctl.h | 4 ++- mm/memcontrol.c | 52 +++++++++++++++++++++++++++----------- 2 files changed, 40 insertions(+), 16 deletions(-) diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index ef8d84c94b4a..7ba40e10737d 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -265,13 +265,15 @@ struct prctl_mm_map { /* Flags for PR_SET_MEMCONTROL */ # define PR_MEMFLAG_SIGCONT (1UL << 0) /* Continuous signal delivery */ +# define PR_MEMFLAG_DIRECT (1UL << 1) /* Direct memory limit */ # define PR_MEMFLAG_RSS_ANON (1UL << 8) /* Check anonymous pages */ # define PR_MEMFLAG_RSS_FILE (1UL << 9) /* Check file pages */ # define PR_MEMFLAG_RSS_SHMEM (1UL << 10) /* Check shmem pages */ # define PR_MEMFLAG_RSS (PR_MEMFLAG_RSS_ANON |\ PR_MEMFLAG_RSS_FILE |\ PR_MEMFLAG_RSS_SHMEM) -# define PR_MEMFLAG_MASK (PR_MEMFLAG_SIGCONT | PR_MEMFLAG_RSS) +# define PR_MEMFLAG_MASK (PR_MEMFLAG_SIGCONT | PR_MEMFLAG_RSS |\ + PR_MEMFLAG_DIRECT) /* Action word masks */ # define PR_MEMACT_MASK 0xff diff --git a/mm/memcontrol.c b/mm/memcontrol.c index aa76bae7f408..6488f8a10d66 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2640,27 +2640,27 @@ get_rss_counter(struct mm_struct *mm, int mm_bit, u16 flags, int rss_bit) * Return true if an action has been taken or further check is not needed, * false otherwise. */ -static bool __mem_cgroup_over_high_action(struct mem_cgroup *memcg, u8 action) +static bool __mem_cgroup_over_high_action(struct mem_cgroup *memcg, u8 action, + u16 flags) { - unsigned long mem; + unsigned long mem = 0; bool ret = false; struct mm_struct *mm = get_task_mm(current); u8 signal = READ_ONCE(current->memcg_over_high_signal); - u16 flags = READ_ONCE(current->memcg_over_high_flags); - u32 limit = READ_ONCE(current->memcg_over_high_climit); + u32 limit; if (!mm) return true; /* No more check is needed */ - if (READ_ONCE(current->memcg_over_limit)) - WRITE_ONCE(current->memcg_over_limit, false); - if ((action == PR_MEMACT_SIGNAL) && !signal) goto out; - mem = page_counter_read(&memcg->memory); - if (mem <= memcg->memory.high + limit) - goto out; + if (memcg) { + mem = page_counter_read(&memcg->memory); + limit = READ_ONCE(current->memcg_over_high_climit); + if (mem <= memcg->memory.high + limit) + goto out; + } /* * Check RSS memory if any of the PR_MEMFLAG_RSS flags is set. @@ -2706,20 +2706,34 @@ static bool __mem_cgroup_over_high_action(struct mem_cgroup *memcg, u8 action) out: mmput(mm); - return ret; + /* + * We only need to do direct per-task memory limit checking once. + */ + return memcg ? ret : true; } /* * Return true if an action has been taken or further check is not needed, * false otherwise. */ -static inline bool mem_cgroup_over_high_action(struct mem_cgroup *memcg) +static inline bool mem_cgroup_over_high_action(struct mem_cgroup *memcg, + bool mem_high) { u8 action = READ_ONCE(current->memcg_over_high_action); + u16 flags = READ_ONCE(current->memcg_over_high_flags); if (!action) return true; /* No more check is needed */ - return __mem_cgroup_over_high_action(memcg, action); + + if (READ_ONCE(current->memcg_over_limit)) + WRITE_ONCE(current->memcg_over_limit, false); + + if (flags & PR_MEMFLAG_DIRECT) + memcg = NULL; /* Direct per-task memory limit checking */ + else if (!mem_high) + return false; + + return __mem_cgroup_over_high_action(memcg, action, flags); } /* @@ -2907,8 +2921,8 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, swap_high = page_counter_read(&memcg->swap) > READ_ONCE(memcg->swap.high); - if (mem_high && !taken) - taken = mem_cgroup_over_high_action(memcg); + if (!taken) + taken = mem_cgroup_over_high_action(memcg, mem_high); /* Don't bother a random interrupted task */ if (in_interrupt()) { @@ -7103,6 +7117,14 @@ long mem_cgroup_over_high_set(struct task_struct *task, unsigned long action, (sig >= _NSIG)) return -EINVAL; + /* + * PR_MEMFLAG_DIRECT can only be set if any of the PR_MEMFLAG_RSS flag + * is set and limit2 is non-zero. + */ + if ((flags & PR_MEMFLAG_DIRECT) && + (!(flags & PR_MEMFLAG_RSS) || !limit2)) + return -EINVAL; + WRITE_ONCE(task->memcg_over_high_action, cmd); WRITE_ONCE(task->memcg_over_high_signal, sig); WRITE_ONCE(task->memcg_over_high_flags, flags);