From patchwork Thu Mar 23 04:00:31 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13184904
Date: Thu, 23 Mar 2023 04:00:31 +0000
In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com>
References: <20230323040037.2389095-1-yosryahmed@google.com>
X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog
Message-ID: <20230323040037.2389095-2-yosryahmed@google.com>
Subject: [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
 Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Yosry Ahmed
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Currently, when sleeping is not allowed during rstat flushing, we hold
the global rstat lock with interrupts disabled throughout the entire
flush operation.
Flushing is an O(# cgroups * # cpus) operation, and having interrupts
disabled throughout is dangerous.

For some contexts, we may not want to sleep, but can be interrupted
(e.g. while holding a spinlock or RCU read lock). As such, do not
disable interrupts throughout rstat flushing, only when holding the
percpu lock. This breaks down the O(# cgroups * # cpus) duration with
interrupts disabled to a series of O(# cgroups) durations.

Furthermore, if a cpu is spinning while waiting for the global rstat
lock, it doesn't need to spin with interrupts disabled anymore.

If the caller of rstat flushing needs interrupts to be disabled, it's up
to them to decide that, and it should be fine to hold the global rstat
lock with interrupts disabled. There is currently a single context that
may invoke rstat flushing with interrupts disabled: the
mem_cgroup_flush_stats() call in mem_cgroup_usage(), if called from
mem_cgroup_threshold().

To make it safe to hold the global rstat lock with interrupts enabled,
make sure we only flush from in_task() contexts. The side effect of that
is that we read stale stats in interrupt context, but this should be
okay: flushing is an expensive operation and therefore dangerous in
interrupt context anyway, so reading stale stats is safer.
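
To illustrate the claim about irqs-off durations, here is a minimal
userspace sketch. It is a model only and uses no kernel API: struct
irq_model and all the model_*() and longest_irqs_off_*() names are
hypothetical, and one "step" is assumed to stand for flushing one cgroup
on one cpu. It counts the longest uninterrupted span with interrupts
disabled before and after the change.

```c
#include <assert.h>

/*
 * Userspace model only: nothing here is kernel API. One "step" stands
 * for flushing one cgroup's stats on one cpu.
 */
struct irq_model {
	int irqs_off;		/* 1 while the model has "interrupts" disabled */
	long current_span;	/* steps done since interrupts were disabled */
	long longest_span;	/* longest uninterrupted irqs-off span seen */
};

static void model_irq_disable(struct irq_model *m)
{
	m->irqs_off = 1;
	m->current_span = 0;
}

static void model_irq_enable(struct irq_model *m)
{
	assert(m->irqs_off);
	if (m->current_span > m->longest_span)
		m->longest_span = m->current_span;
	m->irqs_off = 0;
}

static void model_flush_step(struct irq_model *m)
{
	if (m->irqs_off)
		m->current_span++;
}

/* Before: interrupts stay off for as long as the global lock is held. */
long longest_irqs_off_before(int ncgroups, int ncpus)
{
	struct irq_model m = { 0, 0, 0 };
	int cpu, cg;

	model_irq_disable(&m);	/* models spin_lock_irq() on the global lock */
	for (cpu = 0; cpu < ncpus; cpu++)
		for (cg = 0; cg < ncgroups; cg++)
			model_flush_step(&m);
	model_irq_enable(&m);	/* models spin_unlock_irq() */
	return m.longest_span;
}

/* After: interrupts are off only around each percpu critical section. */
long longest_irqs_off_after(int ncgroups, int ncpus)
{
	struct irq_model m = { 0, 0, 0 };
	int cpu, cg;

	/* the global lock is taken here with interrupts left enabled */
	for (cpu = 0; cpu < ncpus; cpu++) {
		model_irq_disable(&m);	/* models taking the percpu lock */
		for (cg = 0; cg < ncgroups; cg++)
			model_flush_step(&m);
		model_irq_enable(&m);	/* models dropping the percpu lock */
	}
	return m.longest_span;
}
```

With 100 cgroups and 8 cpus, the model gives a longest irqs-off span of
800 steps before the change and 100 steps after it, i.e. one
O(# cgroups) window per cpu instead of a single O(# cgroups * # cpus)
window.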

Signed-off-by: Yosry Ahmed
---
 kernel/cgroup/rstat.c | 40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 831f1f472bb8..af11e28bd055 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -7,6 +7,7 @@
 #include
 #include
 
+/* This lock should only be held from task context */
 static DEFINE_SPINLOCK(cgroup_rstat_lock);
 static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);
 
@@ -210,14 +211,24 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
 		/* if @may_sleep, play nice and yield if necessary */
 		if (may_sleep && (need_resched() ||
 				  spin_needbreak(&cgroup_rstat_lock))) {
-			spin_unlock_irq(&cgroup_rstat_lock);
+			spin_unlock(&cgroup_rstat_lock);
 			if (!cond_resched())
 				cpu_relax();
-			spin_lock_irq(&cgroup_rstat_lock);
+			spin_lock(&cgroup_rstat_lock);
 		}
 	}
 }
 
+static bool should_skip_flush(void)
+{
+	/*
+	 * We acquire cgroup_rstat_lock without disabling interrupts, so we
+	 * should not try to acquire from interrupt contexts to avoid deadlocks.
+	 * It can be expensive to flush stats in interrupt context anyway.
+	 */
+	return !in_task();
+}
+
 /**
  * cgroup_rstat_flush - flush stats in @cgrp's subtree
  * @cgrp: target cgroup
@@ -229,15 +240,18 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
  * This also gets all cgroups in the subtree including @cgrp off the
  * ->updated_children lists.
  *
- * This function may block.
+ * This function is safe to call from contexts that disable interrupts, but
+ * @may_sleep must be set to false, otherwise the function may block.
  */
 __bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp)
 {
-	might_sleep();
+	if (should_skip_flush())
+		return;
 
-	spin_lock_irq(&cgroup_rstat_lock);
+	might_sleep();
+	spin_lock(&cgroup_rstat_lock);
 	cgroup_rstat_flush_locked(cgrp, true);
-	spin_unlock_irq(&cgroup_rstat_lock);
+	spin_unlock(&cgroup_rstat_lock);
 }
 
 /**
@@ -248,11 +262,12 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp)
  */
 void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp)
 {
-	unsigned long flags;
+	if (should_skip_flush())
+		return;
 
-	spin_lock_irqsave(&cgroup_rstat_lock, flags);
+	spin_lock(&cgroup_rstat_lock);
 	cgroup_rstat_flush_locked(cgrp, false);
-	spin_unlock_irqrestore(&cgroup_rstat_lock, flags);
+	spin_unlock(&cgroup_rstat_lock);
 }
 
 /**
@@ -267,8 +282,11 @@ void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp)
 void cgroup_rstat_flush_hold(struct cgroup *cgrp)
 	__acquires(&cgroup_rstat_lock)
 {
+	if (should_skip_flush())
+		return;
+
 	might_sleep();
-	spin_lock_irq(&cgroup_rstat_lock);
+	spin_lock(&cgroup_rstat_lock);
 	cgroup_rstat_flush_locked(cgrp, true);
 }
 
@@ -278,7 +296,7 @@ void cgroup_rstat_flush_hold(struct cgroup *cgrp)
 void cgroup_rstat_flush_release(void)
 	__releases(&cgroup_rstat_lock)
 {
-	spin_unlock_irq(&cgroup_rstat_lock);
+	spin_unlock(&cgroup_rstat_lock);
 }
 
 int cgroup_rstat_init(struct cgroup *cgrp)
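
As a rough illustration of the side effect of should_skip_flush(), the
userspace sketch below models a reader that flushes before reading. It
is not kernel code: model_in_task is a hypothetical stand-in for
in_task(), and the model_*() helpers and counters are made up for this
example. When the guard skips the flush, the reader simply sees the last
flushed value.

```c
#include <stdbool.h>

/* Hypothetical stand-in for in_task(); set by model_read() below. */
static bool model_in_task = true;

static long model_pending;	/* updates not yet folded into the total */
static long model_total;	/* value that readers see */

/* Mirrors the shape of should_skip_flush(): skip outside task context. */
static bool model_should_skip_flush(void)
{
	return !model_in_task;
}

/* Record an update that only becomes visible after a flush. */
void model_update(long delta)
{
	model_pending += delta;
}

/* Fold pending updates into the total; returns true if a flush ran. */
bool model_flush(void)
{
	if (model_should_skip_flush())
		return false;

	model_total += model_pending;
	model_pending = 0;
	return true;
}

/* Try to flush, then read; in_task selects the modeled calling context. */
long model_read(bool in_task)
{
	model_in_task = in_task;
	model_flush();
	return model_total;
}
```

After model_update(5), a call to model_read(false) still returns the old
total (the stale read described in the commit message), while
model_read(true) flushes first and returns the updated value.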