From patchwork Thu Jun 27 15:47:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Maarten Lankhorst
X-Patchwork-Id: 13714714
From: Maarten Lankhorst
To: intel-xe@lists.freedesktop.org, linux-kernel@vger.kernel.org,
 dri-devel@lists.freedesktop.org, Tejun Heo, Zefan Li,
 Johannes Weiner, Andrew Morton, Michal Hocko, Roman Gushchin,
 Shakeel Butt, Muchun Song
Cc: Friedrich Vock, cgroups@vger.kernel.org, linux-mm@kvack.org,
 Maarten Lankhorst
Subject: [RFC PATCH 1/6] mm/page_counter: Move calculating protection values to page_counter
Date: Thu, 27 Jun 2024 17:47:20 +0200
Message-ID: <20240627154754.74828-2-maarten.lankhorst@linux.intel.com>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20240627154754.74828-1-maarten.lankhorst@linux.intel.com>
References: 
<20240627154754.74828-1-maarten.lankhorst@linux.intel.com>

The effective protection calculation is a lot of math, and there is
nothing memcontrol-specific about it. Moving it to page_counter makes
it easier to reuse from the drm cgroup controller.

Signed-off-by: Maarten Lankhorst
Acked-by: Roman Gushchin
Acked-by: Shakeel Butt
---
 include/linux/page_counter.h |   4 +
 mm/memcontrol.c              | 154 +------------------------------
 mm/page_counter.c            | 173 +++++++++++++++++++++++++++++++++++
 3 files changed, 180 insertions(+), 151 deletions(-)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index 8cd858d912c4..904c52f97284 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -81,4 +81,8 @@ static inline void page_counter_reset_watermark(struct page_counter *counter)
 	counter->watermark = page_counter_read(counter);
 }
 
+void page_counter_calculate_protection(struct page_counter *root,
+				       struct page_counter *counter,
+				       bool recursive_protection);
+
 #endif /* _LINUX_PAGE_COUNTER_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 71fe2a95b8bd..9454e1a3120e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7316,122 +7316,6 @@ struct cgroup_subsys memory_cgrp_subsys = {
 	.early_init = 0,
 };
 
-/*
- * This function calculates an individual cgroup's effective
- * protection which is derived from its own memory.min/low, its
- * parent's and siblings' settings, as well as the actual memory
- * distribution in the tree.
- *
- * The following rules apply to the effective protection values:
- *
- * 1. At the first level of reclaim, effective protection is equal to
- *    the declared protection in memory.min and memory.low.
- *
- * 2. To enable safe delegation of the protection configuration, at
- *    subsequent levels the effective protection is capped to the
- *    parent's effective protection.
- *
- * 3. 
To make complex and dynamic subtrees easier to configure, the - * user is allowed to overcommit the declared protection at a given - * level. If that is the case, the parent's effective protection is - * distributed to the children in proportion to how much protection - * they have declared and how much of it they are utilizing. - * - * This makes distribution proportional, but also work-conserving: - * if one cgroup claims much more protection than it uses memory, - * the unused remainder is available to its siblings. - * - * 4. Conversely, when the declared protection is undercommitted at a - * given level, the distribution of the larger parental protection - * budget is NOT proportional. A cgroup's protection from a sibling - * is capped to its own memory.min/low setting. - * - * 5. However, to allow protecting recursive subtrees from each other - * without having to declare each individual cgroup's fixed share - * of the ancestor's claim to protection, any unutilized - - * "floating" - protection from up the tree is distributed in - * proportion to each cgroup's *usage*. This makes the protection - * neutral wrt sibling cgroups and lets them compete freely over - * the shared parental protection budget, but it protects the - * subtree as a whole from neighboring subtrees. - * - * Note that 4. and 5. are not in conflict: 4. is about protecting - * against immediate siblings whereas 5. is about protecting against - * neighboring subtrees. - */ -static unsigned long effective_protection(unsigned long usage, - unsigned long parent_usage, - unsigned long setting, - unsigned long parent_effective, - unsigned long siblings_protected) -{ - unsigned long protected; - unsigned long ep; - - protected = min(usage, setting); - /* - * If all cgroups at this level combined claim and use more - * protection than what the parent affords them, distribute - * shares in proportion to utilization. - * - * We are using actual utilization rather than the statically - * claimed protection in order to be work-conserving: claimed - * but unused protection is available to siblings that would - * otherwise get a smaller chunk than what they claimed. - */ - if (siblings_protected > parent_effective) - return protected * parent_effective / siblings_protected; - - /* - * Ok, utilized protection of all children is within what the - * parent affords them, so we know whatever this child claims - * and utilizes is effectively protected. - * - * If there is unprotected usage beyond this value, reclaim - * will apply pressure in proportion to that amount. - * - * If there is unutilized protection, the cgroup will be fully - * shielded from reclaim, but we do return a smaller value for - * protection than what the group could enjoy in theory. This - * is okay. With the overcommit distribution above, effective - * protection is always dependent on how memory is actually - * consumed among the siblings anyway. - */ - ep = protected; - - /* - * If the children aren't claiming (all of) the protection - * afforded to them by the parent, distribute the remainder in - * proportion to the (unprotected) memory of each cgroup. That - * way, cgroups that aren't explicitly prioritized wrt each - * other compete freely over the allowance, but they are - * collectively protected from neighboring trees. - * - * We're using unprotected memory for the weight so that if - * some cgroups DO claim explicit protection, we don't protect - * the same bytes twice. 
- * - * Check both usage and parent_usage against the respective - * protected values. One should imply the other, but they - * aren't read atomically - make sure the division is sane. - */ - if (!(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_RECURSIVE_PROT)) - return ep; - if (parent_effective > siblings_protected && - parent_usage > siblings_protected && - usage > protected) { - unsigned long unclaimed; - - unclaimed = parent_effective - siblings_protected; - unclaimed *= usage - protected; - unclaimed /= parent_usage - siblings_protected; - - ep += unclaimed; - } - - return ep; -} - /** * mem_cgroup_calculate_protection - check if memory consumption is in the normal range * @root: the top ancestor of the sub-tree being checked @@ -7443,8 +7327,8 @@ static unsigned long effective_protection(unsigned long usage, void mem_cgroup_calculate_protection(struct mem_cgroup *root, struct mem_cgroup *memcg) { - unsigned long usage, parent_usage; - struct mem_cgroup *parent; + bool recursive_protection = + cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_RECURSIVE_PROT; if (mem_cgroup_disabled()) return; @@ -7452,39 +7336,7 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root, if (!root) root = root_mem_cgroup; - /* - * Effective values of the reclaim targets are ignored so they - * can be stale. Have a look at mem_cgroup_protection for more - * details. - * TODO: calculation should be more robust so that we do not need - * that special casing. - */ - if (memcg == root) - return; - - usage = page_counter_read(&memcg->memory); - if (!usage) - return; - - parent = parent_mem_cgroup(memcg); - - if (parent == root) { - memcg->memory.emin = READ_ONCE(memcg->memory.min); - memcg->memory.elow = READ_ONCE(memcg->memory.low); - return; - } - - parent_usage = page_counter_read(&parent->memory); - - WRITE_ONCE(memcg->memory.emin, effective_protection(usage, parent_usage, - READ_ONCE(memcg->memory.min), - READ_ONCE(parent->memory.emin), - atomic_long_read(&parent->memory.children_min_usage))); - - WRITE_ONCE(memcg->memory.elow, effective_protection(usage, parent_usage, - READ_ONCE(memcg->memory.low), - READ_ONCE(parent->memory.elow), - atomic_long_read(&parent->memory.children_low_usage))); + page_counter_calculate_protection(&root->memory, &memcg->memory, recursive_protection); } static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg, diff --git a/mm/page_counter.c b/mm/page_counter.c index db20d6452b71..8ee49cbf71be 100644 --- a/mm/page_counter.c +++ b/mm/page_counter.c @@ -262,3 +262,176 @@ int page_counter_memparse(const char *buf, const char *max, return 0; } + + +/* + * This function calculates an individual page counter's effective + * protection which is derived from its own memory.min/low, its + * parent's and siblings' settings, as well as the actual memory + * distribution in the tree. + * + * The following rules apply to the effective protection values: + * + * 1. At the first level of reclaim, effective protection is equal to + * the declared protection in memory.min and memory.low. + * + * 2. To enable safe delegation of the protection configuration, at + * subsequent levels the effective protection is capped to the + * parent's effective protection. + * + * 3. To make complex and dynamic subtrees easier to configure, the + * user is allowed to overcommit the declared protection at a given + * level. If that is the case, the parent's effective protection is + * distributed to the children in proportion to how much protection + * they have declared and how much of it they are utilizing. 
+ * + * This makes distribution proportional, but also work-conserving: + * if one counter claims much more protection than it uses memory, + * the unused remainder is available to its siblings. + * + * 4. Conversely, when the declared protection is undercommitted at a + * given level, the distribution of the larger parental protection + * budget is NOT proportional. A counter's protection from a sibling + * is capped to its own memory.min/low setting. + * + * 5. However, to allow protecting recursive subtrees from each other + * without having to declare each individual counter's fixed share + * of the ancestor's claim to protection, any unutilized - + * "floating" - protection from up the tree is distributed in + * proportion to each counter's *usage*. This makes the protection + * neutral wrt sibling cgroups and lets them compete freely over + * the shared parental protection budget, but it protects the + * subtree as a whole from neighboring subtrees. + * + * Note that 4. and 5. are not in conflict: 4. is about protecting + * against immediate siblings whereas 5. is about protecting against + * neighboring subtrees. + */ +static unsigned long effective_protection(unsigned long usage, + unsigned long parent_usage, + unsigned long setting, + unsigned long parent_effective, + unsigned long siblings_protected, + bool recursive_protection) +{ + unsigned long protected; + unsigned long ep; + + protected = min(usage, setting); + /* + * If all cgroups at this level combined claim and use more + * protection than what the parent affords them, distribute + * shares in proportion to utilization. + * + * We are using actual utilization rather than the statically + * claimed protection in order to be work-conserving: claimed + * but unused protection is available to siblings that would + * otherwise get a smaller chunk than what they claimed. + */ + if (siblings_protected > parent_effective) + return protected * parent_effective / siblings_protected; + + /* + * Ok, utilized protection of all children is within what the + * parent affords them, so we know whatever this child claims + * and utilizes is effectively protected. + * + * If there is unprotected usage beyond this value, reclaim + * will apply pressure in proportion to that amount. + * + * If there is unutilized protection, the cgroup will be fully + * shielded from reclaim, but we do return a smaller value for + * protection than what the group could enjoy in theory. This + * is okay. With the overcommit distribution above, effective + * protection is always dependent on how memory is actually + * consumed among the siblings anyway. + */ + ep = protected; + + /* + * If the children aren't claiming (all of) the protection + * afforded to them by the parent, distribute the remainder in + * proportion to the (unprotected) memory of each cgroup. That + * way, cgroups that aren't explicitly prioritized wrt each + * other compete freely over the allowance, but they are + * collectively protected from neighboring trees. + * + * We're using unprotected memory for the weight so that if + * some cgroups DO claim explicit protection, we don't protect + * the same bytes twice. + * + * Check both usage and parent_usage against the respective + * protected values. One should imply the other, but they + * aren't read atomically - make sure the division is sane. 
+ */
+	if (!recursive_protection)
+		return ep;
+
+	if (parent_effective > siblings_protected &&
+	    parent_usage > siblings_protected &&
+	    usage > protected) {
+		unsigned long unclaimed;
+
+		unclaimed = parent_effective - siblings_protected;
+		unclaimed *= usage - protected;
+		unclaimed /= parent_usage - siblings_protected;
+
+		ep += unclaimed;
+	}
+
+	return ep;
+}
+
+
+/**
+ * page_counter_calculate_protection - check if memory consumption is in the normal range
+ * @root: the top ancestor of the sub-tree being checked
+ * @counter: the page counter to check
+ * @recursive_protection: whether to use memory_recursiveprot behavior
+ *
+ * Calculates elow/emin thresholds for the given page_counter.
+ *
+ * WARNING: This function is not stateless! It can only be used as part
+ * of a top-down tree iteration, not for isolated queries.
+ */
+void page_counter_calculate_protection(struct page_counter *root,
+				       struct page_counter *counter,
+				       bool recursive_protection)
+{
+	unsigned long usage, parent_usage;
+	struct page_counter *parent = counter->parent;
+
+	/*
+	 * Effective values of the reclaim targets are ignored so they
+	 * can be stale. Have a look at mem_cgroup_protection for more
+	 * details.
+	 * TODO: calculation should be more robust so that we do not need
+	 * that special casing.
+	 */
+	if (root == counter)
+		return;
+
+	usage = page_counter_read(counter);
+	if (!usage)
+		return;
+
+	if (parent == root) {
+		counter->emin = READ_ONCE(counter->min);
+		counter->elow = READ_ONCE(counter->low);
+		return;
+	}
+
+	parent_usage = page_counter_read(parent);
+
+	WRITE_ONCE(counter->emin, effective_protection(usage, parent_usage,
+						       READ_ONCE(counter->min),
+						       READ_ONCE(parent->emin),
+						       atomic_long_read(&parent->children_min_usage),
+						       recursive_protection));
+
+	WRITE_ONCE(counter->elow, effective_protection(usage, parent_usage,
+						       READ_ONCE(counter->low),
+						       READ_ONCE(parent->elow),
+						       atomic_long_read(&parent->children_low_usage),
+						       recursive_protection));
+}
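
For anyone who wants to sanity-check the distribution rules in the
comment above, here is a minimal userspace sketch of the same
arithmetic. This is illustration only: toy_effective_protection(),
min_ul() and the example numbers are hypothetical and not part of the
patch; only the formulas mirror effective_protection().

#include <stdio.h>

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Same math as effective_protection() in mm/page_counter.c. */
static unsigned long toy_effective_protection(unsigned long usage,
					      unsigned long parent_usage,
					      unsigned long setting,
					      unsigned long parent_effective,
					      unsigned long siblings_protected,
					      int recursive_protection)
{
	unsigned long protected = min_ul(usage, setting);
	unsigned long ep;

	/* Rule 3: overcommitted claims are scaled by utilization. */
	if (siblings_protected > parent_effective)
		return protected * parent_effective / siblings_protected;

	ep = protected;
	if (!recursive_protection)
		return ep;

	/* Rule 5: hand out unclaimed protection in proportion to usage. */
	if (parent_effective > siblings_protected &&
	    parent_usage > siblings_protected &&
	    usage > protected) {
		unsigned long unclaimed = parent_effective - siblings_protected;

		unclaimed *= usage - protected;
		unclaimed /= parent_usage - siblings_protected;
		ep += unclaimed;
	}

	return ep;
}

int main(void)
{
	/*
	 * Overcommit (rule 3): the parent affords 100 pages, two
	 * children each claim and use 80, so siblings_protected is
	 * 160 and each child ends up with 80 * 100 / 160 = 50.
	 */
	printf("overcommit:  ep = %lu\n",
	       toy_effective_protection(80, 160, 80, 100, 160, 0));

	/*
	 * Undercommit with recursive protection (rule 5): the parent
	 * affords 100, the children claim only 40 in total; this
	 * child claims 20 and uses 50 of the parent's 80 used pages,
	 * so it keeps its 20 plus a usage-proportional share of the
	 * unclaimed 60: 20 + 60 * (50 - 20) / (80 - 40) = 65.
	 */
	printf("undercommit: ep = %lu\n",
	       toy_effective_protection(50, 80, 20, 100, 40, 1));

	return 0;
}

Running it prints ep = 50 for the overcommit case and ep = 65 for the
undercommit case, matching what rules 3 and 5 give by hand.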