From patchwork Fri Jul 19 13:07:47 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jesper Dangaard Brouer
X-Patchwork-Id: 13737326
Subject: [PATCH V8 2/2] cgroup/rstat: add tracepoints for ongoing flusher waits
From: Jesper Dangaard Brouer
To: tj@kernel.org, cgroups@vger.kernel.org, yosryahmed@google.com, shakeel.butt@linux.dev
Cc: Jesper Dangaard Brouer, hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Fri, 19 Jul 2024 15:07:47 +0200
Message-ID: <172139440730.3084888.16497707303868810863.stgit@firesoul>
In-Reply-To: <172139415725.3084888.13770938453137383953.stgit@firesoul>
References: <172139415725.3084888.13770938453137383953.stgit@firesoul>
User-Agent: StGit/1.5

These tracepoints proved practical for measuring ongoing flusher
wait-time behavior and for confirming that races do occur in production.

Signed-off-by: Jesper Dangaard Brouer
---
V8: Add TP for detecting ongoing_flusher yielding lock

 include/trace/events/cgroup.h |   56 +++++++++++++++++++++++++++++++++++++++++
 kernel/cgroup/rstat.c         |   18 +++++++++++--
 2 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/include/trace/events/cgroup.h b/include/trace/events/cgroup.h
index af2755bda6eb..81f57fa751c4 100644
--- a/include/trace/events/cgroup.h
+++ b/include/trace/events/cgroup.h
@@ -296,6 +296,62 @@ DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_unlock_fastpath,
 	TP_ARGS(cgrp, cpu, contended)
 );
 
+DECLARE_EVENT_CLASS(cgroup_ongoing,
+
+	TP_PROTO(struct cgroup *cgrp, struct cgroup *cgrp_ongoing, \
+		 long res, unsigned int race, ktime_t ts),
+
+	TP_ARGS(cgrp, cgrp_ongoing, res, race, ts),
+
+	TP_STRUCT__entry(
+		__field(	int,		root			)
+		__field(	int,		level			)
+		__field(	u64,		id			)
+		__field(	u64,		id_ongoing		)
+		__field(	ktime_t,	ts			)
+		__field(	long,		res			)
+		__field(	u64,		race			)
+	),
+
+	TP_fast_assign(
+		__entry->root = cgrp->root->hierarchy_id;
+		__entry->id = cgroup_id(cgrp);
+		__entry->level = cgrp->level;
+		__entry->id_ongoing = cgroup_id(cgrp_ongoing);
+		__entry->res = res;
+		__entry->race = race;
+		__entry->ts = ts;
+	),
+
+	TP_printk("root=%d id=%llu level=%d ongoing_flusher=%llu res=%ld race=%llu ts=%lld",
+		  __entry->root, __entry->id, __entry->level,
+		  __entry->id_ongoing, __entry->res, __entry->race, __entry->ts)
+);
+
+DEFINE_EVENT(cgroup_ongoing, cgroup_ongoing_flusher,
+
+	TP_PROTO(struct cgroup *cgrp, struct cgroup *cgrp_ongoing, \
+		 long res, unsigned int race, ktime_t ts),
+
+	TP_ARGS(cgrp, cgrp_ongoing, res, race, ts)
+);
+
+DEFINE_EVENT(cgroup_ongoing, cgroup_ongoing_flusher_wait,
+
+	TP_PROTO(struct cgroup *cgrp, struct cgroup *cgrp_ongoing, \
+		 long res, unsigned int race, ktime_t ts),
+
+	TP_ARGS(cgrp, cgrp_ongoing, res, race, ts)
+);
+
+DEFINE_EVENT(cgroup_ongoing, cgroup_ongoing_flusher_yield,
+
+	TP_PROTO(struct cgroup *cgrp, struct cgroup *cgrp_ongoing, \
+		 long res, unsigned int race, ktime_t ts),
+
+	TP_ARGS(cgrp, cgrp_ongoing, res, race, ts)
+);
+
 #endif /* _TRACE_CGROUP_H */
 
 /* This part must be outside protection */
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index eaa138f2da2f..cf344c0e71b3 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -328,6 +328,7 @@ static inline void __cgroup_rstat_unlock(struct cgroup *cgrp, int cpu_in_loop)
 static bool cgroup_rstat_trylock_flusher(struct cgroup *cgrp)
 {
 	struct cgroup *ongoing;
+	unsigned int race = 0;
 	bool locked;
 
 	/*
@@ -338,17 +339,25 @@ static bool cgroup_rstat_trylock_flusher(struct cgroup *cgrp)
 retry:
 	ongoing = READ_ONCE(cgrp_rstat_ongoing_flusher);
 	if (ongoing && cgroup_is_descendant(cgrp, ongoing)) {
-		wait_for_completion_interruptible_timeout(
+		ktime_t ts = ktime_get_mono_fast_ns();
+		long res = 0;
+
+		trace_cgroup_ongoing_flusher(cgrp, ongoing, 0, race, ts);
+
+		res = wait_for_completion_interruptible_timeout(
 				&ongoing->flush_done, MAX_WAIT);
-		/* TODO: Add tracepoint here */
+		trace_cgroup_ongoing_flusher_wait(cgrp, ongoing, res, race, ts);
+
 		return false;
 	}
 
 	locked = __cgroup_rstat_trylock(cgrp, -1);
 	if (!locked) {
 		/* Contended: Handle losing race for ongoing flusher */
-		if (!ongoing && READ_ONCE(cgrp_rstat_ongoing_flusher))
+		if (!ongoing && READ_ONCE(cgrp_rstat_ongoing_flusher)) {
+			race++;
 			goto retry;
+		}
 
 		__cgroup_rstat_lock(cgrp, -1, true);
 	}
@@ -369,6 +378,9 @@ static bool cgroup_rstat_trylock_flusher(struct cgroup *cgrp)
 			reinit_completion(&cgrp->flush_done);
 			WRITE_ONCE(cgrp_rstat_ongoing_flusher, cgrp);
 		}
+	} else {
+		/* Detect multiple flushers as ongoing yielded lock */
+		trace_cgroup_ongoing_flusher_yield(cgrp, ongoing, 0, 0, 0);
 	}
 	return true;
 }
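
Not part of the patch: a minimal userspace sketch of how the fields printed by the cgroup_ongoing event class might be consumed, assuming the trace text follows the TP_printk() format above. The struct and helper names (ongoing_event, parse_ongoing_event, wait_outcome) are hypothetical. The res classification relies on the return convention of wait_for_completion_interruptible_timeout(): positive means completed, zero means timed out, negative means interrupted. The cgroup_ongoing_flusher and cgroup_ongoing_flusher_wait events of one wait carry the same ts, so a consumer could pair them on (id, ts). The yield event passes 0 for res, race and ts, so the classification is not meaningful there.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Fields printed by the cgroup_ongoing event class (hypothetical mirror). */
struct ongoing_event {
	int root;
	int level;
	uint64_t id;
	uint64_t id_ongoing;
	long res;
	uint64_t race;
	int64_t ts;
};

/*
 * Parse the field part of one trace line.
 * Returns 0 on success, -1 if the seven fields are not all present.
 */
static int parse_ongoing_event(const char *line, struct ongoing_event *ev)
{
	/* Skip any prefix (task, CPU, timestamp, event name) before the fields. */
	const char *p = strstr(line, "root=");

	if (!p)
		return -1;
	if (sscanf(p, "root=%d id=%" SCNu64 " level=%d ongoing_flusher=%" SCNu64
		      " res=%ld race=%" SCNu64 " ts=%" SCNd64,
		   &ev->root, &ev->id, &ev->level, &ev->id_ongoing,
		   &ev->res, &ev->race, &ev->ts) != 7)
		return -1;
	return 0;
}

/* Classify the res value recorded by cgroup_ongoing_flusher_wait. */
static const char *wait_outcome(long res)
{
	if (res > 0)
		return "completed";	/* remaining jiffies before the timeout */
	if (res == 0)
		return "timeout";
	return "interrupted";		/* negative errno */
}
```

A caller would feed each line of the trace output to parse_ongoing_event() and, for the _wait event, report wait_outcome(ev.res) together with ev.race to see how often the retry path was taken before waiting.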