From patchwork Thu Jul 12 17:29:42 2018
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 10522057
From: Johannes Weiner
To: Ingo Molnar, Peter Zijlstra, Andrew Morton, Linus Torvalds
Cc: Tejun Heo, Suren Baghdasaryan, Vinayak Menon, Christopher Lameter,
	Mike Galbraith, Shakeel Butt, linux-mm@kvack.org,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [RFC PATCH 10/10] psi: aggregate ongoing stall events when somebody reads pressure
Date: Thu, 12 Jul 2018 13:29:42 -0400
Message-Id: <20180712172942.10094-11-hannes@cmpxchg.org>
In-Reply-To: <20180712172942.10094-1-hannes@cmpxchg.org>
References: <20180712172942.10094-1-hannes@cmpxchg.org>

Right now, psi reports pressure and stall times of already concluded
stall events. For most use cases this is current enough, but certain
highly latency-sensitive applications, like the Android OOM killer,
might want to know about and react to stall states before they have
even concluded (e.g. a prolonged reclaim cycle).

This patch changes the procfs/cgroupfs interface such that when the
pressure metrics are read, the current per-cpu states, if any, are
taken into account as well. Any ongoing states are concluded, their
time is snapshotted, and they are then restarted. This requires
holding each CPU's rq lock while its state is updated, to avoid
corruption from concurrent scheduler updates. Because a single read
takes the rq lock of every online CPU, it could use some form of rq
lock ratelimiting or avoidance.
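The conclude, snapshot, and restart step can be illustrated outside the kernel. This is a minimal user-space sketch, not the psi code: `struct stall_timer`, `stall_begin()`, `stall_end()` and `stall_collect()` are hypothetical names, timestamps are passed in by the caller instead of being read with `cpu_clock()`, and the rq lock the patch takes around the update is omitted. The `ondemand` flag plays the role of the new `psi_update_stats()` parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one tracked state; not a psi structure. */
struct stall_timer {
	bool active;		/* a stall is currently in progress */
	uint64_t start;		/* when the current stall segment began */
	uint64_t total;		/* stall time concluded but not yet collected */
};

static void stall_begin(struct stall_timer *t, uint64_t now)
{
	assert(!t->active);
	t->active = true;
	t->start = now;
}

static void stall_end(struct stall_timer *t, uint64_t now)
{
	assert(t->active);
	t->total += now - t->start;
	t->active = false;
}

/*
 * Return and reset the collected stall time. With ondemand set, an
 * ongoing stall is concluded at 'now', its time is added, and it is
 * restarted at 'now', so the next collection does not count the same
 * interval twice.
 */
static uint64_t stall_collect(struct stall_timer *t, uint64_t now,
			      bool ondemand)
{
	uint64_t sample;

	if (ondemand && t->active) {
		t->total += now - t->start;
		t->start = now;
	}
	sample = t->total;
	t->total = 0;
	return sample;
}
```

For example, with a stall that begins at time 100, a lazy collection at 150 returns 0, an on-demand collection at 170 returns 70 and restarts the stall at 170, and after the stall ends at 200 the next collection returns only the remaining 30, so no interval is counted twice.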
Requested-by: Suren Baghdasaryan
Not-yet-signed-off-by: Johannes Weiner
---
 kernel/sched/psi.c | 56 +++++++++++++++++++++++++++++++++++++---------
 1 file changed, 46 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 53e0b7b83e2e..5a6c6057f775 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -190,7 +190,7 @@ static void calc_avgs(unsigned long avg[3], u64 time, int missed_periods)
 	}
 }
 
-static bool psi_update_stats(struct psi_group *group)
+static bool psi_update_stats(struct psi_group *group, bool ondemand)
 {
 	u64 some[NR_PSI_RESOURCES] = { 0, };
 	u64 full[NR_PSI_RESOURCES] = { 0, };
@@ -200,8 +200,6 @@ static bool psi_update_stats(struct psi_group *group)
 	int cpu;
 	int r;
 
-	mutex_lock(&group->stat_lock);
-
 	/*
 	 * Collect the per-cpu time buckets and average them into a
 	 * single time sample that is normalized to wallclock time.
@@ -218,10 +216,36 @@ static bool psi_update_stats(struct psi_group *group)
 	for_each_online_cpu(cpu) {
 		struct psi_group_cpu *groupc = per_cpu_ptr(group->cpus, cpu);
 		unsigned long nonidle;
+		struct rq_flags rf;
+		struct rq *rq;
+		u64 now;
 
-		if (!groupc->nonidle_time)
+		if (!groupc->nonidle_time && !groupc->nonidle)
 			continue;
 
+		/*
+		 * We come here for two things: 1) periodic per-cpu
+		 * bucket flushing and averaging and 2) when the user
+		 * wants to read a pressure file. For flushing and
+		 * averaging, which is relatively infrequent, we can
+		 * be lazy and tolerate some raciness with concurrent
+		 * updates to the per-cpu counters. However, if a user
+		 * polls the pressure state, we want to give them the
+		 * most uptodate information we have, including any
+		 * currently active state which hasn't been timed yet,
+		 * because in case of an iowait or a reclaim run, that
+		 * can be significant.
+		 */
+		if (ondemand) {
+			rq = cpu_rq(cpu);
+			rq_lock_irq(rq, &rf);
+
+			now = cpu_clock(cpu);
+
+			groupc->nonidle_time += now - groupc->nonidle_start;
+			groupc->nonidle_start = now;
+		}
+
 		nonidle = nsecs_to_jiffies(groupc->nonidle_time);
 		groupc->nonidle_time = 0;
 		nonidle_total += nonidle;
@@ -229,13 +253,27 @@ static bool psi_update_stats(struct psi_group *group)
 		for (r = 0; r < NR_PSI_RESOURCES; r++) {
 			struct psi_resource *res = &groupc->res[r];
 
+			if (ondemand && res->state != PSI_NONE) {
+				bool is_full = res->state == PSI_FULL;
+
+				res->times[is_full] += now - res->state_start;
+				res->state_start = now;
+			}
+
 			some[r] += (res->times[0] + res->times[1]) * nonidle;
 			full[r] += res->times[1] * nonidle;
 
-			/* It's racy, but we can tolerate some error */
 			res->times[0] = 0;
 			res->times[1] = 0;
 		}
+
+		if (ondemand)
+			rq_unlock_irq(rq, &rf);
+	}
+
+	for (r = 0; r < NR_PSI_RESOURCES; r++) {
+		do_div(some[r], max(nonidle_total, 1UL));
+		do_div(full[r], max(nonidle_total, 1UL));
 	}
 
 	/*
@@ -249,12 +287,10 @@ static bool psi_update_stats(struct psi_group *group)
 	 * activity, thus no data, and clock ticks are sporadic. The
 	 * below handles both.
 	 */
+	mutex_lock(&group->stat_lock);
 
 	/* total= */
 	for (r = 0; r < NR_PSI_RESOURCES; r++) {
-		do_div(some[r], max(nonidle_total, 1UL));
-		do_div(full[r], max(nonidle_total, 1UL));
-
 		group->some[r] += some[r];
 		group->full[r] += full[r];
 	}
@@ -301,7 +337,7 @@ static void psi_clock(struct work_struct *work)
 	 * go - see calc_avgs() and missed_periods.
 	 */
 
-	nonidle = psi_update_stats(group);
+	nonidle = psi_update_stats(group, false);
 
 	if (nonidle) {
 		unsigned long delay = 0;
@@ -570,7 +606,7 @@ int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 	if (psi_disabled)
 		return -EOPNOTSUPP;
 
-	psi_update_stats(group);
+	psi_update_stats(group, true);
 
 	for (w = 0; w < 3; w++) {
 		avg[0][w] = group->avg_some[res][w];
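The some/full arithmetic in `psi_update_stats()` in the diff above weights each CPU's stall times by that CPU's non-idle time and divides the sums by the total non-idle time, with `max(nonidle_total, 1UL)` guarding against division by zero when every CPU was idle. The sketch below reproduces only that weighting, under stated assumptions: `struct cpu_sample`, `struct stall_avg` and `aggregate_stalls()` are hypothetical names, `times[0]`/`times[1]` follow the patch's convention (time in a partial "some" stall and time in a "full" stall), the per-cpu values are taken as already snapshotted so no locking is shown, and units are left abstract (the patch converts the non-idle weight to jiffies with `nsecs_to_jiffies()`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cpu snapshot; not the kernel's struct psi_group_cpu. */
struct cpu_sample {
	uint64_t nonidle;	/* non-idle time of this CPU in the period */
	uint64_t times[2];	/* [0]: "some" but not full, [1]: full */
};

struct stall_avg {
	uint64_t some;
	uint64_t full;
};

/* Weight each CPU's stall times by its non-idle time, then normalize. */
static struct stall_avg aggregate_stalls(const struct cpu_sample *cpus,
					 size_t nr_cpus)
{
	struct stall_avg avg = { 0, 0 };
	uint64_t nonidle_total = 0;
	size_t i;

	assert(cpus != NULL || nr_cpus == 0);

	for (i = 0; i < nr_cpus; i++) {
		avg.some += (cpus[i].times[0] + cpus[i].times[1]) * cpus[i].nonidle;
		avg.full += cpus[i].times[1] * cpus[i].nonidle;
		nonidle_total += cpus[i].nonidle;
	}

	/* Same guard as max(nonidle_total, 1UL): an all-idle period yields 0. */
	if (nonidle_total < 1)
		nonidle_total = 1;

	/* Integer division truncates, as do_div() does. */
	avg.some /= nonidle_total;
	avg.full /= nonidle_total;
	return avg;
}
```

With two CPUs, one non-idle for 30 units with times {4, 2} and one non-idle for 10 units with times {0, 2}, the result is some = (6 * 30 + 2 * 10) / 40 = 5 and full = (2 * 30 + 2 * 10) / 40 = 2. Weighting by non-idle time keeps a mostly idle CPU from diluting the pressure observed on a busy one.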