From patchwork Wed Jul 31 19:47:24 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jesper Dangaard Brouer
X-Patchwork-Id: 13749140
Subject: [PATCH V9 2/2] cgroup/rstat: add tracepoints for ongoing flusher waits
From: Jesper Dangaard Brouer
To: tj@kernel.org, cgroups@vger.kernel.org, yosryahmed@google.com, shakeel.butt@linux.dev
Cc: Jesper Dangaard Brouer, hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Wed, 31 Jul 2024 21:47:24 +0200
Message-ID: <172245523597.3147408.4165443154847854225.stgit@firesoul>
In-Reply-To: <172245504313.3147408.12138439169548255896.stgit@firesoul>
References: <172245504313.3147408.12138439169548255896.stgit@firesoul>
User-Agent: StGit/1.5

These tracepoints proved useful for measuring ongoing flusher wait
times and for confirming that races do occur in production.

Signed-off-by: Jesper Dangaard Brouer
---
 include/trace/events/cgroup.h |   56 +++++++++++++++++++++++++++++++++++++++++
 kernel/cgroup/rstat.c         |   21 ++++++++++++---
 2 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/cgroup.h b/include/trace/events/cgroup.h
index af2755bda6eb..81f57fa751c4 100644
--- a/include/trace/events/cgroup.h
+++ b/include/trace/events/cgroup.h
@@ -296,6 +296,62 @@ DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_unlock_fastpath,
 	TP_ARGS(cgrp, cpu, contended)
 );
 
+DECLARE_EVENT_CLASS(cgroup_ongoing,
+
+	TP_PROTO(struct cgroup *cgrp, struct cgroup *cgrp_ongoing, \
+		 long res, unsigned int race, ktime_t ts),
+
+	TP_ARGS(cgrp, cgrp_ongoing, res, race, ts),
+
+	TP_STRUCT__entry(
+		__field(	int,		root			)
+		__field(	int,		level			)
+		__field(	u64,		id			)
+		__field(	u64,		id_ongoing		)
+		__field(	ktime_t,	ts			)
+		__field(	long,		res			)
+		__field(	u64,		race			)
+	),
+
+	TP_fast_assign(
+		__entry->root = cgrp->root->hierarchy_id;
+		__entry->id = cgroup_id(cgrp);
+		__entry->level = cgrp->level;
+		__entry->id_ongoing = cgroup_id(cgrp_ongoing);
+		__entry->res = res;
+		__entry->race = race;
+		__entry->ts = ts;
+	),
+
+	TP_printk("root=%d id=%llu level=%d ongoing_flusher=%llu res=%ld race=%llu ts=%lld",
+		  __entry->root, __entry->id, __entry->level,
+		  __entry->id_ongoing, __entry->res, __entry->race, __entry->ts)
+);
+
+DEFINE_EVENT(cgroup_ongoing, cgroup_ongoing_flusher,
+
+	TP_PROTO(struct cgroup *cgrp, struct cgroup *cgrp_ongoing, \
+		 long res, unsigned int race, ktime_t ts),
+
+	TP_ARGS(cgrp, cgrp_ongoing, res, race, ts)
+);
+
+DEFINE_EVENT(cgroup_ongoing, cgroup_ongoing_flusher_wait,
+
+	TP_PROTO(struct cgroup *cgrp, struct cgroup *cgrp_ongoing, \
+		 long res, unsigned int race, ktime_t ts),
+
+	TP_ARGS(cgrp, cgrp_ongoing, res, race, ts)
+);
+
+DEFINE_EVENT(cgroup_ongoing, cgroup_ongoing_flusher_yield,
+
+	TP_PROTO(struct cgroup *cgrp, struct cgroup *cgrp_ongoing, \
+		 long res, unsigned int race, ktime_t ts),
+
+	TP_ARGS(cgrp, cgrp_ongoing, res, race, ts)
+);
+
 #endif /* _TRACE_CGROUP_H */
 
 /* This part must be outside protection */
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 463f9807ec7e..c343506b2c7b 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -328,6 +328,7 @@ static inline void __cgroup_rstat_unlock(struct cgroup *cgrp, int cpu_in_loop)
 static bool cgroup_rstat_trylock_flusher(struct cgroup *cgrp)
 {
 	struct cgroup *ongoing;
+	unsigned int race = 0;
 	bool locked;
 
 	/*
@@ -338,17 +339,25 @@ static bool cgroup_rstat_trylock_flusher(struct cgroup *cgrp)
 retry:
 	ongoing = READ_ONCE(cgrp_rstat_ongoing_flusher);
 	if (ongoing && cgroup_is_descendant(cgrp, ongoing)) {
-		wait_for_completion_interruptible_timeout(
+		ktime_t ts = ktime_get_mono_fast_ns();
+		long res = 0;
+
+		trace_cgroup_ongoing_flusher(cgrp, ongoing, 0, race, ts);
+
+		res = wait_for_completion_interruptible_timeout(
 			&ongoing->flush_done, MAX_WAIT);
-		/* TODO: Add tracepoint here */
+		trace_cgroup_ongoing_flusher_wait(cgrp, ongoing, res, race, ts);
+
 		return false;
 	}
 
 	locked = __cgroup_rstat_trylock(cgrp, -1);
 	if (!locked) {
 		/* Contended: Handle losing race for ongoing flusher */
-		if (!ongoing && READ_ONCE(cgrp_rstat_ongoing_flusher))
+		if (!ongoing && READ_ONCE(cgrp_rstat_ongoing_flusher)) {
+			race++;
 			goto retry;
+		}
 
 		__cgroup_rstat_lock(cgrp, -1, true);
 	}
@@ -357,7 +366,8 @@ static bool cgroup_rstat_trylock_flusher(struct cgroup *cgrp)
 	 * Due to lock yielding, we might obtain lock while another
 	 * ongoing flusher (that isn't a parent) owns ongoing_flusher.
 	 */
-	if (!READ_ONCE(cgrp_rstat_ongoing_flusher)) {
+	ongoing = READ_ONCE(cgrp_rstat_ongoing_flusher);
+	if (!ongoing) {
 		/*
 		 * Limit to top-level as lock yielding allows others to obtain
 		 * lock without being ongoing_flusher. Leading to cgroup that
@@ -368,6 +378,9 @@ static bool cgroup_rstat_trylock_flusher(struct cgroup *cgrp)
 			reinit_completion(&cgrp->flush_done);
 			WRITE_ONCE(cgrp_rstat_ongoing_flusher, cgrp);
 		}
+	} else {
+		/* Detect multiple flushers as ongoing yielded lock */
+		trace_cgroup_ongoing_flusher_yield(cgrp, ongoing, 0, 0, 0);
 	}
 	return true;
 }