From patchwork Thu Mar 23 04:00:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yosry Ahmed X-Patchwork-Id: 13184895 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 76B2CC6FD1D for ; Thu, 23 Mar 2023 04:00:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230145AbjCWEAy (ORCPT ); Thu, 23 Mar 2023 00:00:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37412 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229463AbjCWEAw (ORCPT ); Thu, 23 Mar 2023 00:00:52 -0400 Received: from mail-pj1-x104a.google.com (mail-pj1-x104a.google.com [IPv6:2607:f8b0:4864:20::104a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EDD0F1F92C for ; Wed, 22 Mar 2023 21:00:43 -0700 (PDT) Received: by mail-pj1-x104a.google.com with SMTP id m9-20020a17090a7f8900b0023769205928so370047pjl.6 for ; Wed, 22 Mar 2023 21:00:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679544043; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=zlRHMwQCwPxTz7URsIPFYEiZ1y/NkKG6JVNDPEffcgo=; b=q0IiHI38SIToezuWJ5YCNMq/43EaxUo3i4AMEPHcC9bZCfWe3qfaYbxabY9EbcCoqm LUESQ/1v1D2aalE9SO+twkVNP5hnW+UNiSPbsUMxGnfaQG+8mDUTyeXBv18fXpGW3SWD GCVDR2eWTWBw72p7FFHYiZQYGqUWBUdEyjEquw53iNIuEWGhIQhSBNfbqBw3sfwQmtJy zq0COAvO1AYz/HqEKQJev5lhSkkjfsk/GYKhQDhBPMJgp6hmFHeDaNzuC/4om1icb3Uk IS34ZPkCnh/Dxdz7tz/xvLp+bFG8mdt7D5IhxCaGQj3LQws3RSkaD2yIkaXnXsqGxGWA GdAQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679544043; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to 
:date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=zlRHMwQCwPxTz7URsIPFYEiZ1y/NkKG6JVNDPEffcgo=; b=NSK+YHM2c1Hf5xz2knqMlBtL2iy2OBkagte16rcORf1XP/CKNSS+O/TujPePsFOMfU 16TMQ+HpuAEOgapJ2S+ycLJLDy77dHu63D5ZP4ebS1UuP0roAjZUG/toTr6snfgnYtsH YZ8mt6ZBM070uEiLmRw2sNVjmsupmPEMBtljMljaBwYKRYicktggNaxHFXThpOUSs+Wb F/VbyjFSfhItcc3CEr8Vm26STP0Nokp+pK5ZLoxe2BGvt0brPKhfFoMR8DrIh7C/i/pE gznNHeBP/yRuxprcyC64xBObx8R5F1cFFJz1FiiTLkvYNrptORlKN0ENA9Ahla6uGqqX bCNQ== X-Gm-Message-State: AO0yUKUDgFlCL+J9joIzMq8uK/X+P3HUStIec/1gu4mGZL6TrdjrHUgQ 0O91RM3xRLHXglbX4U29onSv+9T9VtHM2z13 X-Google-Smtp-Source: AK7set97jrlAfVvPpHOrM2Vr0RCmWVnsZU6sMMMwHyygC2dGwvgADEuqjxobUXV27YUvQlS1Att+r/ZLiyOif2dS X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327]) (user=yosryahmed job=sendgmr) by 2002:a17:902:e745:b0:1a0:4aa3:3a9a with SMTP id p5-20020a170902e74500b001a04aa33a9amr1983132plf.2.1679544043324; Wed, 22 Mar 2023 21:00:43 -0700 (PDT) Date: Thu, 23 Mar 2023 04:00:31 +0000 In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com> Mime-Version: 1.0 References: <20230323040037.2389095-1-yosryahmed@google.com> X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog Message-ID: <20230323040037.2389095-2-yosryahmed@google.com> Subject: [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock From: Yosry Ahmed To: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton Cc: Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, Yosry Ahmed Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-State: RFC Currently, when sleeping is not allowed during rstat flushing, we hold the global rstat lock with interrupts disabled throughout the entire flush operation. 
Flushing is an O(# cgroups * # cpus) operation, and having interrupts disabled throughout is dangerous. For some contexts, we may not want to sleep, but can be interrupted (e.g. while holding a spinlock or RCU read lock). As such, do not disable interrupts throughout rstat flushing, only when holding the percpu lock. This breaks down the O(# cgroups * # cpus) duration with interrupts disabled to a series of O(# cgroups) durations. Furthermore, if a cpu is spinning waiting for the global rstat lock, it doesn't need to spin with interrupts disabled anymore. If the caller of rstat flushing needs interrupts to be disabled, it's up to them to decide that, and it should be fine to hold the global rstat lock with interrupts disabled. There is currently a single context that may invoke rstat flushing with interrupts disabled, the mem_cgroup_flush_stats() call in mem_cgroup_usage(), if called from mem_cgroup_threshold(). To make it safe to hold the global rstat lock with interrupts enabled, make sure we only flush from in_task() contexts. The side effect of that is that we read stale stats in interrupt context, but this should be okay, as flushing in interrupt context is dangerous anyway as it is an expensive operation, so reading stale stats is safer. 
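To make the interrupts-off arithmetic above concrete, here is a minimal userspace sketch in C. It is not kernel code: longest_irqs_off_before() and longest_irqs_off_after() are hypothetical names, and each (cpu, cgroup) pair is assumed to cost one unit of work. The sketch only counts the longest stretch of work done with interrupts disabled under each locking scheme.

```c
#include <assert.h>	/* only needed by callers that check the results */
#include <stddef.h>

/*
 * Hypothetical model of the old scheme: the global lock is taken with
 * interrupts disabled, so every unit of work in the flush belongs to a
 * single interrupts-off stretch.
 */
size_t longest_irqs_off_before(size_t nr_cgroups, size_t nr_cpus)
{
	size_t run = 0;

	/* models spin_lock_irq(&cgroup_rstat_lock): irqs off from here */
	for (size_t cpu = 0; cpu < nr_cpus; cpu++)
		for (size_t cgrp = 0; cgrp < nr_cgroups; cgrp++)
			run++;
	/* models spin_unlock_irq(&cgroup_rstat_lock): irqs back on */

	return run;
}

/*
 * Hypothetical model of the new scheme: interrupts are only disabled
 * while the percpu lock is held, so each cpu's pass is its own
 * interrupts-off stretch.
 */
size_t longest_irqs_off_after(size_t nr_cgroups, size_t nr_cpus)
{
	size_t longest = 0;

	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		size_t run = 0;

		/* models taking the percpu lock with irqs disabled */
		for (size_t cgrp = 0; cgrp < nr_cgroups; cgrp++)
			run++;
		/* models dropping the percpu lock: irqs back on */

		if (run > longest)
			longest = run;
	}

	return longest;
}
```

With 100 cgroups and 8 cpus, the first function reports one stretch of 800 units, while the second reports a longest stretch of 100 units, repeated once per cpu. The total work is unchanged; only the length of each interrupts-off window shrinks.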
Signed-off-by: Yosry Ahmed --- kernel/cgroup/rstat.c | 40 +++++++++++++++++++++++++++++----------- 1 file changed, 29 insertions(+), 11 deletions(-) diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c index 831f1f472bb8..af11e28bd055 100644 --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -7,6 +7,7 @@ #include #include +/* This lock should only be held from task context */ static DEFINE_SPINLOCK(cgroup_rstat_lock); static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock); @@ -210,14 +211,24 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep) /* if @may_sleep, play nice and yield if necessary */ if (may_sleep && (need_resched() || spin_needbreak(&cgroup_rstat_lock))) { - spin_unlock_irq(&cgroup_rstat_lock); + spin_unlock(&cgroup_rstat_lock); if (!cond_resched()) cpu_relax(); - spin_lock_irq(&cgroup_rstat_lock); + spin_lock(&cgroup_rstat_lock); } } } +static bool should_skip_flush(void) +{ + /* + * We acquire cgroup_rstat_lock without disabling interrupts, so we + * should not try to acquire from interrupt contexts to avoid deadlocks. + * It can be expensive to flush stats in interrupt context anyway. + */ + return !in_task(); +} + /** * cgroup_rstat_flush - flush stats in @cgrp's subtree * @cgrp: target cgroup @@ -229,15 +240,18 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep) * This also gets all cgroups in the subtree including @cgrp off the * ->updated_children lists. * - * This function may block. + * This function is safe to call from contexts that disable interrupts, but + * @may_sleep must be set to false, otherwise the function may block. 
*/ __bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp) { - might_sleep(); + if (should_skip_flush()) + return; - spin_lock_irq(&cgroup_rstat_lock); + might_sleep(); + spin_lock(&cgroup_rstat_lock); cgroup_rstat_flush_locked(cgrp, true); - spin_unlock_irq(&cgroup_rstat_lock); + spin_unlock(&cgroup_rstat_lock); } /** @@ -248,11 +262,12 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp) */ void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp) { - unsigned long flags; + if (should_skip_flush()) + return; - spin_lock_irqsave(&cgroup_rstat_lock, flags); + spin_lock(&cgroup_rstat_lock); cgroup_rstat_flush_locked(cgrp, false); - spin_unlock_irqrestore(&cgroup_rstat_lock, flags); + spin_unlock(&cgroup_rstat_lock); } /** @@ -267,8 +282,11 @@ void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp) void cgroup_rstat_flush_hold(struct cgroup *cgrp) __acquires(&cgroup_rstat_lock) { + if (should_skip_flush()) + return; + might_sleep(); - spin_lock_irq(&cgroup_rstat_lock); + spin_lock(&cgroup_rstat_lock); cgroup_rstat_flush_locked(cgrp, true); } @@ -278,7 +296,7 @@ void cgroup_rstat_flush_hold(struct cgroup *cgrp) void cgroup_rstat_flush_release(void) __releases(&cgroup_rstat_lock) { - spin_unlock_irq(&cgroup_rstat_lock); + spin_unlock(&cgroup_rstat_lock); } int cgroup_rstat_init(struct cgroup *cgrp) From patchwork Thu Mar 23 04:00:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yosry Ahmed X-Patchwork-Id: 13184897 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 279EAC76196 for ; Thu, 23 Mar 2023 04:01:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230219AbjCWEA6 (ORCPT ); Thu, 23 Mar 2023 00:00:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37638 "EHLO 
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229625AbjCWEAx (ORCPT ); Thu, 23 Mar 2023 00:00:53 -0400 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0AD6C20042 for ; Wed, 22 Mar 2023 21:00:46 -0700 (PDT) Received: by mail-pf1-x449.google.com with SMTP id y186-20020a62cec3000000b00627df3d6ec4so6826678pfg.12 for ; Wed, 22 Mar 2023 21:00:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679544045; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=0V6KepxSWjC0U8YUMhyf5sEsW80RflfKR4tDJnFyjUM=; b=nvFwhwj7qGyMPQaTPrzYFw+T0XNSe66OQzM5b4pjwLTwVBRy9Yg/AbwpYFnRMIM5gG L465fbRhjctsBw/iYduTrtP7LUUTB2pfnwQVaagEQM0GyAtfd40drA654+PpJ27hrtIB BUWcp/c32KQ98SFLCZ9FSHYNM6vaide1p2wRQYwdJvG/bU3/l2g1YAjLINeXoxbcrV2W 3/G5Y/6fhAJDfUsgxXfVWzP78YqVFCMwh/+/7IKP9AVvA+7qfhsyQkLxneMASA+CreO0 VfuwrktKMl3lyTDdGpOXeKO210ebUx1dz4XnfSvMg+6i1Ne/G/0TP1oqkrL/txGwmyNx VJtg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679544045; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=0V6KepxSWjC0U8YUMhyf5sEsW80RflfKR4tDJnFyjUM=; b=LvB2bXoo8ToYRmsC1oDbmiFkHl6C2+ERbyrMZZm4olBm3/qMva9h430DM90WK4tNn1 FbjN+y1QE35wByYVHZz69yXA3TBK6EjuM9bWOVzGn7/VPY2BEphyZGaDfvItzdQNgPel z/ofw7iKq5s0d3rO9pIED28WAh2tX3WfjRMO8SKVbz1uesrBDQcBVjluFOGoH6dtFSXc xJcI9I/qu0PWpgiKp15TsSGWQsxTh53HCckEYo2phy+QIqqFQm7abi6QZf9WOVOlpMrP +fd5VYwp98ajki8RmH2Ptsvp6Fe16DSfIPSAf0W53M/XuND4yVKpYWoYwfeNIlJmQxhj 9Q/Q== X-Gm-Message-State: AO0yUKWs63T5k5lx33hMBDFbfySOVA3cHUS+CP22Y/SW4pA7D9N4Y1tW eyTJnAJ3FvP63vgbWd8HbyTDJrjFfNf+u8yR X-Google-Smtp-Source: AK7set+NjW/4AXEB5d0IG7XiVNOSp7doL59qiTn9uvLqiFnoG/BqWGCoWZHV7SR3pQA/2uxuP1t6d8Miz+ziUaX1 X-Received: from 
yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327]) (user=yosryahmed job=sendgmr) by 2002:a05:6a00:851:b0:628:30d:2d2f with SMTP id q17-20020a056a00085100b00628030d2d2fmr2879444pfk.5.1679544045305; Wed, 22 Mar 2023 21:00:45 -0700 (PDT) Date: Thu, 23 Mar 2023 04:00:32 +0000 In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com> Mime-Version: 1.0 References: <20230323040037.2389095-1-yosryahmed@google.com> X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog Message-ID: <20230323040037.2389095-3-yosryahmed@google.com> Subject: [RFC PATCH 2/7] memcg: do not disable interrupts when holding stats_flush_lock From: Yosry Ahmed To: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton Cc: Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, Yosry Ahmed Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-State: RFC The rstat flushing code was modified so that we do not disable interrupts when we hold the global rstat lock. Do the same for stats_flush_lock on the memcg side to avoid unnecessarily disabling interrupts throughout flushing. Since the code exclusively uses trylock to acquire this lock, it should be fine to hold from interrupt contexts or normal contexts without disabling interrupts as a deadlock cannot occur. For interrupt contexts we will return immediately without flushing anyway. 
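The trylock argument above can be sketched in userspace C. This is an illustration under stated assumptions, not the memcg code: the lock is modeled with a C11 atomic_flag, the "interrupt" is modeled as a nested call made while the lock is held, and model_trylock(), model_unlock(), model_flush(), flushes_done and nested_trylock_failures are all hypothetical names.

```c
#include <assert.h>	/* only needed by callers that check the results */
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for stats_flush_lock. */
atomic_flag flush_lock_model = ATOMIC_FLAG_INIT;

int flushes_done;		/* flushes that actually ran */
int nested_trylock_failures;	/* nested attempts that backed off */

/* Returns true if the lock was free and is now held by the caller. */
bool model_trylock(void)
{
	return !atomic_flag_test_and_set_explicit(&flush_lock_model,
						   memory_order_acquire);
}

void model_unlock(void)
{
	atomic_flag_clear_explicit(&flush_lock_model, memory_order_release);
}

/*
 * Returns true if this call performed the flush. When
 * @simulate_nested_interrupt is set, a second flush attempt is made while
 * the lock is held, standing in for an interrupt arriving on the same cpu.
 */
bool model_flush(bool simulate_nested_interrupt)
{
	if (!model_trylock())
		return false;	/* lock busy: give up instead of spinning */

	if (simulate_nested_interrupt) {
		/*
		 * A blocking lock would spin forever here, because the
		 * holder is the context that was interrupted. trylock
		 * simply fails and the nested caller returns.
		 */
		if (!model_flush(false))
			nested_trylock_failures++;
	}

	flushes_done++;
	model_unlock();
	return true;
}
```

Calling model_flush(true) completes one flush and records one failed nested attempt; a later model_flush(false) succeeds normally because the lock was released. The nested caller never waits on the lock, which is why the lock can be held without disabling interrupts.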
Signed-off-by: Yosry Ahmed --- mm/memcontrol.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 5abffe6f8389..e0e92b38fa51 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -636,15 +636,17 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val) static void __mem_cgroup_flush_stats(void) { - unsigned long flag; - - if (!spin_trylock_irqsave(&stats_flush_lock, flag)) + /* + * This lock can be acquired from interrupt context, but we only acquire + * using trylock so it should be fine as we cannot cause a deadlock. + */ + if (!spin_trylock(&stats_flush_lock)) return; flush_next_time = jiffies_64 + 2*FLUSH_TIME; cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup); atomic_set(&stats_flush_threshold, 0); - spin_unlock_irqrestore(&stats_flush_lock, flag); + spin_unlock(&stats_flush_lock); } void mem_cgroup_flush_stats(void) From patchwork Thu Mar 23 04:00:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yosry Ahmed X-Patchwork-Id: 13184901 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4DFEC77B62 for ; Thu, 23 Mar 2023 04:01:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230263AbjCWEBC (ORCPT ); Thu, 23 Mar 2023 00:01:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37640 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229728AbjCWEAx (ORCPT ); Thu, 23 Mar 2023 00:00:53 -0400 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DDB772004C for ; Wed, 22 Mar 2023 21:00:47 -0700 (PDT) Received: by mail-yw1-x1149.google.com with 
SMTP id 00721157ae682-54463468d06so207650777b3.7 for ; Wed, 22 Mar 2023 21:00:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679544047; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=736+92Vy7w+whzoBxwgGPc/WG8vZYtmz8qFOhZmUk9w=; b=GwAMe3Sdh/YfT7TszWBGAWb/Omm+6/db9kDSH3rx0U2jTcaU6cdo97INaHv86kxNQA jhXBaZOARcF2wynwqEfwzWZoINYtyHtPiv9iRsKtX2yswvpMBD6qg4pgoYL8XYK3JdrY NUg095+EPmMHIupEKGzGCL9BJH8z0FwgmxWYVXMQd4hSYkBDyIXOQyRVbTCpsq34KRPV CyG3E5BvF3wTZXEl8tQiaUrIBU2ScbP5rypiT3jSEorEK7YISHgjHIMI4g2IycEeQTfl s5/P1y3KHUJjZSE6PJ3e2OlZHSMu+2KD/QDk8kFiQ8Hi9jpOcpijxBB3B0Ib/9iMoeNk 6zZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679544047; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=736+92Vy7w+whzoBxwgGPc/WG8vZYtmz8qFOhZmUk9w=; b=g9ELb/Oktmj+t4U4P8dlBlDxYPjmh/sOUdSPcmoKnsvEzQERCytMZdC1loq5YEppkc sUjlRPqyK6+SNXlexpUrU0WXGI8ObZGJtPgIUBDq0UivYTQoJPT7qJyIgmEFxTZ66QHd WHwl/iVcI7LWbMTRkF+6Kn9jBdLlP4zog9AjLwHwkJz4A83klK/zdYz+kv65RjrHtzDm y1IXA1fPWSTvwEqRUskj3RNunr/m0yD0yl16qRAVaNxtEwnrj4zd/xf0xLmKLQZoy2Rz XvnjUR77wv++8D5F6xncmu2p96akSGaM2pRnZCPL0WVoHaIJzhEp9mdK5yj+v3FXCl77 gPsg== X-Gm-Message-State: AAQBX9d1CeDhI4Z7Jth5mYrASSkLt/jcNPNl7/lnYYIVCCVHubGppKPq NEict3wlEyl0t/8apCl1Gb8wh3cvK7+rXRFS X-Google-Smtp-Source: AKy350YCGOdNWpxPNSJcwpjqKe6Apct4Cbzrlv7LWeidWe3sKdVG3fkxwH2jR1od8/69JvygqRYEaeYcvlRO26C1 X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327]) (user=yosryahmed job=sendgmr) by 2002:a81:ad1b:0:b0:541:69bc:8626 with SMTP id l27-20020a81ad1b000000b0054169bc8626mr1137749ywh.10.1679544047097; Wed, 22 Mar 2023 21:00:47 -0700 (PDT) Date: Thu, 23 Mar 2023 04:00:33 +0000 In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com> Mime-Version: 1.0 References: 
<20230323040037.2389095-1-yosryahmed@google.com> X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog Message-ID: <20230323040037.2389095-4-yosryahmed@google.com> Subject: [RFC PATCH 3/7] cgroup: rstat: remove cgroup_rstat_flush_irqsafe() From: Yosry Ahmed To: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton Cc: Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, Yosry Ahmed Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-State: RFC The naming of cgroup_rstat_flush_irqsafe() can be confusing. It can read like "irqsave", which means that it disables interrupts throughout, but it actually doesn't. It is just "safe" to call from contexts with interrupts disabled. Furthermore, this is only used today by mem_cgroup_flush_stats(), which will be changed by a later patch to optionally allow sleeping. Simplify the code and make it easier to reason about by instead passing in an explicit @may_sleep argument to cgroup_rstat_flush(), which gets passed directly to cgroup_rstat_flush_locked(). 
Signed-off-by: Yosry Ahmed --- block/blk-cgroup.c | 2 +- include/linux/cgroup.h | 3 +-- kernel/cgroup/cgroup.c | 4 ++-- kernel/cgroup/rstat.c | 24 +++++------------------- mm/memcontrol.c | 6 +++--- 5 files changed, 12 insertions(+), 27 deletions(-) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index bd50b55bdb61..3fe313ce5e6b 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -1043,7 +1043,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v) if (!seq_css(sf)->parent) blkcg_fill_root_iostats(); else - cgroup_rstat_flush(blkcg->css.cgroup); + cgroup_rstat_flush(blkcg->css.cgroup, true); rcu_read_lock(); hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) { diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 3410aecffdb4..743c345b6a3f 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -691,8 +691,7 @@ static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen) * cgroup scalable recursive statistics. */ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu); -void cgroup_rstat_flush(struct cgroup *cgrp); -void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp); +void cgroup_rstat_flush(struct cgroup *cgrp, bool may_sleep); void cgroup_rstat_flush_hold(struct cgroup *cgrp); void cgroup_rstat_flush_release(void); diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 935e8121b21e..58df0fc379f6 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -5393,7 +5393,7 @@ static void css_release_work_fn(struct work_struct *work) if (ss) { /* css release path */ if (!list_empty(&css->rstat_css_node)) { - cgroup_rstat_flush(cgrp); + cgroup_rstat_flush(cgrp, true); list_del_rcu(&css->rstat_css_node); } @@ -5406,7 +5406,7 @@ static void css_release_work_fn(struct work_struct *work) /* cgroup release path */ TRACE_CGROUP_PATH(release, cgrp); - cgroup_rstat_flush(cgrp); + cgroup_rstat_flush(cgrp, true); spin_lock_irq(&css_set_lock); for (tcgrp = cgroup_parent(cgrp); tcgrp; 
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c index af11e28bd055..fe8690bb1e1c 100644 --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -243,30 +243,16 @@ static bool should_skip_flush(void) * This function is safe to call from contexts that disable interrupts, but * @may_sleep must be set to false, otherwise the function may block. */ -__bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp) +__bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp, bool may_sleep) { if (should_skip_flush()) return; - might_sleep(); - spin_lock(&cgroup_rstat_lock); - cgroup_rstat_flush_locked(cgrp, true); - spin_unlock(&cgroup_rstat_lock); -} - -/** - * cgroup_rstat_flush_irqsafe - irqsafe version of cgroup_rstat_flush() - * @cgrp: target cgroup - * - * This function can be called from any context. - */ -void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp) -{ - if (should_skip_flush()) - return; + if (may_sleep) + might_sleep(); spin_lock(&cgroup_rstat_lock); - cgroup_rstat_flush_locked(cgrp, false); + cgroup_rstat_flush_locked(cgrp, may_sleep); spin_unlock(&cgroup_rstat_lock); } @@ -325,7 +311,7 @@ void cgroup_rstat_exit(struct cgroup *cgrp) { int cpu; - cgroup_rstat_flush(cgrp); + cgroup_rstat_flush(cgrp, true); /* sanity check */ for_each_possible_cpu(cpu) { diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e0e92b38fa51..72cd44f88d97 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -644,7 +644,7 @@ static void __mem_cgroup_flush_stats(void) return; flush_next_time = jiffies_64 + 2*FLUSH_TIME; - cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup); + cgroup_rstat_flush(root_mem_cgroup->css.cgroup, false); atomic_set(&stats_flush_threshold, 0); spin_unlock(&stats_flush_lock); } @@ -7745,7 +7745,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objcg) break; } - cgroup_rstat_flush(memcg->css.cgroup); + cgroup_rstat_flush(memcg->css.cgroup, true); pages = memcg_page_state(memcg, MEMCG_ZSWAP_B) / PAGE_SIZE; if (pages < max) continue; @@ -7810,7 
+7810,7 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size) static u64 zswap_current_read(struct cgroup_subsys_state *css, struct cftype *cft) { - cgroup_rstat_flush(css->cgroup); + cgroup_rstat_flush(css->cgroup, true); return memcg_page_state(mem_cgroup_from_css(css), MEMCG_ZSWAP_B); } From patchwork Thu Mar 23 04:00:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yosry Ahmed X-Patchwork-Id: 13184898 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78A18C6FD1C for ; Thu, 23 Mar 2023 04:01:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229714AbjCWEA7 (ORCPT ); Thu, 23 Mar 2023 00:00:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229847AbjCWEAx (ORCPT ); Thu, 23 Mar 2023 00:00:53 -0400 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B144820076 for ; Wed, 22 Mar 2023 21:00:49 -0700 (PDT) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-54476ef9caeso207621307b3.6 for ; Wed, 22 Mar 2023 21:00:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679544049; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=xrG0SezuE3D1BzHk1bOIN+tOwhU+wFsyLck18RLDkiM=; b=cgw7YjSYw+yZCrYg4MWNWyL1ocWVlMfhI6U/IH5SNNI/KXZEirBQTnVHw1Jk19ca7x YZMRDXY2GIXRShVJyUBfGcwrm5OQ54Bn3XOEBif3BbHD9ArewU0b2vLyFm38bp8+NRx8 4lwk3DHRAIpY5JhlVhZi+qJFnS5DTPOFtneU65+9wwMhEPN3ScjeohcSLADEGreopxgz 
hPSmYPd9gVS979yLkwNh9j+9QGCIU5M1yATespEadqsIAym+pG18x2O6VFUlYJCK8ejc v8yJF8c/6m0MiyUkNdlDzwM4KjC9ikuULQ0wGXuneSrTmETcMqRGnUz3Mhll67JgWKef njxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679544049; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xrG0SezuE3D1BzHk1bOIN+tOwhU+wFsyLck18RLDkiM=; b=YJgb2oqH/5hSnT2A2YaFxFhVkcpqwDjCnae5ToAnPyOSU6GUKt3UdDO43ZYJ0/vDOB pfxVRH9oyAW/y1k1gK5J2lSLrWbSbVhXg36h3z4N/zABiQFoNOHldgGwK2VO6CJQ+pLk P+OKqRiIARDfK4AQJX7BBw/ledC3f1GjmLcVT43mcjx+HoXiD6OA7Zybb7k4pR5DGr8+ 1lgV6NQQ8feep3doPxL5iUcDr0Duua2UfFx4fIiP8HExXerluWCWmRxNJisZPhDMPTTi Kr2r2XgzrVpdbpu81P8UMuqGpvJII+F7nPOjwoP/jnwVz+qqEayIxY4HeRf+ws1Zhrec 1jQg== X-Gm-Message-State: AAQBX9c9i1B5UU2X6ANognv+aafARMl/XUn6hnzkcfPLMrWXZHFTX+G8 /vDxK3RzUC/M49sDu0xY7Eu6uDOIO1K5W64u X-Google-Smtp-Source: AKy350bTQGWCAJk1730fCXLD+nAem5ueAaNE2q7H+XY1RCF27byWvmJXxhgyQR2PhcI0/Laslob9u/xZD8mfS05D X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327]) (user=yosryahmed job=sendgmr) by 2002:a81:b149:0:b0:544:bb1e:f9cf with SMTP id p70-20020a81b149000000b00544bb1ef9cfmr1185027ywh.4.1679544048914; Wed, 22 Mar 2023 21:00:48 -0700 (PDT) Date: Thu, 23 Mar 2023 04:00:34 +0000 In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com> Mime-Version: 1.0 References: <20230323040037.2389095-1-yosryahmed@google.com> X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog Message-ID: <20230323040037.2389095-5-yosryahmed@google.com> Subject: [RFC PATCH 4/7] memcg: sleep during flushing stats in safe contexts From: Yosry Ahmed To: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton Cc: Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, Yosry Ahmed Precedence: bulk List-ID: 
X-Mailing-List: bpf@vger.kernel.org X-Patchwork-State: RFC Currently, all contexts that flush memcg stats do so with sleeping not allowed. Some of these contexts are perfectly safe to sleep in, such as reading cgroup files from userspace or the background periodic flusher. Enable choosing whether sleeping is allowed or not when flushing memcg stats, and allow sleeping in safe contexts to avoid unnecessarily performing a lot of work without sleeping. Signed-off-by: Yosry Ahmed --- include/linux/memcontrol.h | 8 ++++---- mm/memcontrol.c | 35 ++++++++++++++++++++++------------- mm/vmscan.c | 2 +- mm/workingset.c | 3 ++- 4 files changed, 29 insertions(+), 19 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index b6eda2ab205d..0c7b286f2caf 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1036,8 +1036,8 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec, return x; } -void mem_cgroup_flush_stats(void); -void mem_cgroup_flush_stats_delayed(void); +void mem_cgroup_flush_stats(bool may_sleep); +void mem_cgroup_flush_stats_delayed(bool may_sleep); void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, int val); @@ -1531,11 +1531,11 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec, return node_page_state(lruvec_pgdat(lruvec), idx); } -static inline void mem_cgroup_flush_stats(void) +static inline void mem_cgroup_flush_stats(bool may_sleep) { } -static inline void mem_cgroup_flush_stats_delayed(void) +static inline void mem_cgroup_flush_stats_delayed(bool may_sleep) { } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 72cd44f88d97..39a9c7a978ae 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -634,7 +634,7 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val) } } -static void __mem_cgroup_flush_stats(void) +static void __mem_cgroup_flush_stats(bool may_sleep) { /* * This lock can be acquired from interrupt 
context, but we only acquire @@ -644,26 +644,26 @@ static void __mem_cgroup_flush_stats(void) return; flush_next_time = jiffies_64 + 2*FLUSH_TIME; - cgroup_rstat_flush(root_mem_cgroup->css.cgroup, false); + cgroup_rstat_flush(root_mem_cgroup->css.cgroup, may_sleep); atomic_set(&stats_flush_threshold, 0); spin_unlock(&stats_flush_lock); } -void mem_cgroup_flush_stats(void) +void mem_cgroup_flush_stats(bool may_sleep) { if (atomic_read(&stats_flush_threshold) > num_online_cpus()) - __mem_cgroup_flush_stats(); + __mem_cgroup_flush_stats(may_sleep); } -void mem_cgroup_flush_stats_delayed(void) +void mem_cgroup_flush_stats_delayed(bool may_sleep) { if (time_after64(jiffies_64, flush_next_time)) - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(may_sleep); } static void flush_memcg_stats_dwork(struct work_struct *w) { - __mem_cgroup_flush_stats(); + __mem_cgroup_flush_stats(true); queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME); } @@ -1570,7 +1570,7 @@ static void memory_stat_format(struct mem_cgroup *memcg, char *buf, int bufsize) * * Current memory state: */ - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(true); for (i = 0; i < ARRAY_SIZE(memory_stats); i++) { u64 size; @@ -3671,7 +3671,11 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) unsigned long val; if (mem_cgroup_is_root(memcg)) { - mem_cgroup_flush_stats(); + /* + * mem_cgroup_threshold() calls here from irqsafe context. + * Don't sleep. 
+ */ + mem_cgroup_flush_stats(false); val = memcg_page_state(memcg, NR_FILE_PAGES) + memcg_page_state(memcg, NR_ANON_MAPPED); if (swap) @@ -4014,7 +4018,7 @@ static int memcg_numa_stat_show(struct seq_file *m, void *v) int nid; struct mem_cgroup *memcg = mem_cgroup_from_seq(m); - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(true); for (stat = stats; stat < stats + ARRAY_SIZE(stats); stat++) { seq_printf(m, "%s=%lu", stat->name, @@ -4090,7 +4094,7 @@ static int memcg_stat_show(struct seq_file *m, void *v) BUILD_BUG_ON(ARRAY_SIZE(memcg1_stat_names) != ARRAY_SIZE(memcg1_stats)); - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(true); for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) { unsigned long nr; @@ -4594,7 +4598,12 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages, struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css); struct mem_cgroup *parent; - mem_cgroup_flush_stats(); + /* + * wb_writeback() takes a spinlock and calls + * wb_over_bg_thresh()->mem_cgroup_wb_stats(). + * Do not sleep. + */ + mem_cgroup_flush_stats(false); *pdirty = memcg_page_state(memcg, NR_FILE_DIRTY); *pwriteback = memcg_page_state(memcg, NR_WRITEBACK); @@ -6596,7 +6605,7 @@ static int memory_numa_stat_show(struct seq_file *m, void *v) int i; struct mem_cgroup *memcg = mem_cgroup_from_seq(m); - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(true); for (i = 0; i < ARRAY_SIZE(memory_stats); i++) { int nid; diff --git a/mm/vmscan.c b/mm/vmscan.c index 9c1c5e8b24b8..59d1830d08ac 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2845,7 +2845,7 @@ static void prepare_scan_count(pg_data_t *pgdat, struct scan_control *sc) * Flush the memory cgroup stats, so that we read accurate per-memcg * lruvec stats for heuristics. */ - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(false); /* * Determine the scan balance between anon and file LRUs. 
diff --git a/mm/workingset.c b/mm/workingset.c
index 00c6f4d9d9be..042eabbb43f6 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -462,7 +462,8 @@ void workingset_refault(struct folio *folio, void *shadow)

 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);

-	mem_cgroup_flush_stats_delayed();
+	/* Do not sleep with RCU lock held */
+	mem_cgroup_flush_stats_delayed(false);
 	/*
 	 * Compare the distance to the existing workingset size. We
 	 * don't activate pages that couldn't stay resident even if

From patchwork Thu Mar 23 04:00:35 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13184899
Date: Thu, 23 Mar 2023 04:00:35 +0000
In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com>
References: <20230323040037.2389095-1-yosryahmed@google.com>
Message-ID: <20230323040037.2389095-6-yosryahmed@google.com>
Subject: [RFC PATCH 5/7] vmscan: memcg: sleep when flushing stats during reclaim
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
 Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Yosry Ahmed
X-Patchwork-State: RFC

Memory reclaim should be a sleepable context. Allow sleeping when
flushing memcg stats to avoid unnecessarily performing a lot of work
without sleeping. This can slow down reclaim code if flushing stats
takes too long, but there are already multiple cond_resched() calls in
the reclaim code.

Signed-off-by: Yosry Ahmed
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 59d1830d08ac..bae35cfb33c8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2845,7 +2845,7 @@ static void prepare_scan_count(pg_data_t *pgdat, struct scan_control *sc)
 	 * Flush the memory cgroup stats, so that we read accurate per-memcg
 	 * lruvec stats for heuristics.
 	 */
-	mem_cgroup_flush_stats(false);
+	mem_cgroup_flush_stats(true);

 	/*
 	 * Determine the scan balance between anon and file LRUs.
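
Reviewer's illustration (not part of the series): the boolean passed to
mem_cgroup_flush_stats() above tells the flusher whether the caller may
sleep. The userspace sketch below only models that contract. All names in
it (sketch_flush_stats, sketch_yield, struct flush_result) are
hypothetical, it takes no locks, and it is not the kernel implementation;
sketch_yield() merely marks the spot where a real flusher might offer to
reschedule.

```c
#include <stdbool.h>
#include <stddef.h>

/* Result of one flush pass in this toy model. */
struct flush_result {
	long total;        /* sum of all per-CPU deltas that were flushed */
	int yield_points;  /* how many times the flusher offered to reschedule */
};

/* Hypothetical hook: stands in for a reschedule point; here it only counts. */
static void sketch_yield(struct flush_result *res)
{
	res->yield_points++;
}

/*
 * Fold every per-CPU pending delta into a total and clear it.
 * When may_sleep is true, offer a reschedule point after each CPU;
 * when it is false, run the whole pass without yielding.
 */
static struct flush_result sketch_flush_stats(long pending[], size_t ncpus,
					      bool may_sleep)
{
	struct flush_result res = { 0, 0 };

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		res.total += pending[cpu];
		pending[cpu] = 0;
		if (may_sleep)
			sketch_yield(&res);
	}
	return res;
}
```

In this model a caller in a sleepable context (such as reclaim in the
patch above) would pass true, while a caller holding a spinlock or an RCU
read lock would pass false and get the same totals with no yield points.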
From patchwork Thu Mar 23 04:00:36 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13184900
Date: Thu, 23 Mar 2023 04:00:36 +0000
In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com>
References: <20230323040037.2389095-1-yosryahmed@google.com>
Message-ID: <20230323040037.2389095-7-yosryahmed@google.com>
Subject: [RFC PATCH 6/7] workingset: memcg: sleep when flushing stats in workingset_refault()
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
 Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Yosry Ahmed
X-Patchwork-State: RFC

In workingset_refault(), we call mem_cgroup_flush_stats_delayed() to
flush stats within an RCU read section and with sleeping disallowed.
Move the call to mem_cgroup_flush_stats_delayed() above the RCU read
section and allow sleeping to avoid unnecessarily performing a lot of
work without sleeping.

Signed-off-by: Yosry Ahmed
---
A lot of code paths call into workingset_refault(), so I am not at all
sure whether it's okay to sleep in all contexts. Feedback here would be
very helpful.

---
 mm/workingset.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index 042eabbb43f6..410bc6684ea7 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -406,6 +406,8 @@ void workingset_refault(struct folio *folio, void *shadow)
 	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
 	eviction <<= bucket_order;

+	/* Flush stats (and potentially sleep) before holding RCU read lock */
+	mem_cgroup_flush_stats_delayed(true);
 	rcu_read_lock();
 	/*
 	 * Look up the memcg associated with the stored ID. It might
@@ -461,9 +463,6 @@ void workingset_refault(struct folio *folio, void *shadow)
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);

 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
-
-	/* Do not sleep with RCU lock held */
-	mem_cgroup_flush_stats_delayed(false);
 	/*
 	 * Compare the distance to the existing workingset size. We
 	 * don't activate pages that couldn't stay resident even if

From patchwork Thu Mar 23 04:00:37 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13184902
Date: Thu, 23 Mar 2023 04:00:37 +0000
In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com>
References: <20230323040037.2389095-1-yosryahmed@google.com>
Message-ID: <20230323040037.2389095-8-yosryahmed@google.com>
Subject: [RFC PATCH 7/7] memcg: do not modify rstat tree for zero updates
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
 Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Yosry Ahmed
X-Patchwork-State: RFC

In some situations, we may end up calling memcg_rstat_updated() with a
value of 0, which means the stat was not actually updated. One example
is failing to reclaim any pages in shrink_folio_list(). Do not add the
cgroup to the rstat updated tree in this case, to avoid unnecessarily
flushing it.

Signed-off-by: Yosry Ahmed
---
 mm/memcontrol.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 39a9c7a978ae..7afd29399409 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -618,6 +618,9 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 {
 	unsigned int x;

+	if (!val)
+		return;
+
 	cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());

 	x = __this_cpu_add_return(stats_updates, abs(val));
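
Reviewer's illustration (not part of the series): the early return added
to memcg_rstat_updated() can be modelled in userspace as below. The types
and names (struct sketch_cgroup, sketch_rstat_updated) are hypothetical
stand-ins; on_updated_list only approximates membership in the rstat
updated tree, and pending approximates the accumulated update magnitude.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a cgroup's rstat bookkeeping. */
struct sketch_cgroup {
	bool on_updated_list;  /* models "cgroup is in the rstat updated tree" */
	unsigned int pending;  /* models the accumulated magnitude of updates */
};

/*
 * Record a stat change of val. A change of zero is not a real update,
 * so return before marking the cgroup as needing a flush.
 */
static void sketch_rstat_updated(struct sketch_cgroup *cg, int val)
{
	if (!val)
		return;

	cg->on_updated_list = true;
	cg->pending += (unsigned int)abs(val);
}
```

With the check in place, a cgroup that only ever sees zero-valued updates
never gets marked, so a later flush would have nothing to visit for it.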