From patchwork Wed Aug 17 02:05:46 2016
From: Wanpeng Li
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra, Rik van Riel, Paolo Bonzini, Radim Krcmar, Mike Galbraith, Frederic Weisbecker, Thomas Gleixner
Subject: [PATCH v3] sched/cputime: Resync steal time when guest & host lose sync
Date: Wed, 17 Aug 2016 10:05:46 +0800
Message-Id: <1471399546-4069-1-git-send-email-wanpeng.li@hotmail.com>

From: Wanpeng Li

Commit:

  57430218317e ("sched/cputime: Count actually elapsed irq & softirq time")

... triggered a regression:

| An i5 laptop, 4 pCPUs, 4vCPUs for one full dynticks guest, there are four
| cpu hog processes(for loop) running in the guest, I hot-unplug the pCPUs
| on host one by one until there is only one left, then observe the top in
| guest, there are 100% st for cpu0(housekeeping), and 75% st for other cpus
| (nohz full mode). However, w/o this commit, 75% for all the four cpus.

When a guest is interrupted for a longer amount of time, missed clock
ticks are not redelivered later. Because of that, we should not limit
the amount of steal time accounted to the amount of time that the
calling functions think has passed.

However, the interval returned by account_other_time() is NOT rounded
down to the nearest jiffy, while the base interval in get_vtime_delta()
that it is subtracted from is, so the max cputime limit is still
required there to avoid underflow.

This patch fixes the regression by limiting the account_other_time()
call in get_vtime_delta() to avoid underflow, and by letting the other
three call sites (the account_other_time() and
steal_account_process_time() calls in irqtime_account_process_tick(),
account_process_tick() and account_idle_ticks()) account however much
steal time the host told us elapsed.

Suggested-by: Rik van Riel
Suggested-by: Paolo Bonzini
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Paolo Bonzini
Cc: Radim Krcmar
Cc: Mike Galbraith
Cc: Frederic Weisbecker
Cc: Thomas Gleixner
Signed-off-by: Wanpeng Li
---
v2 -> v3:
 * update code comments
v1 -> v2:
 * add code comments and update the changelog

 kernel/sched/cputime.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 9858266..2b9e5e5 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -263,6 +263,11 @@ void account_idle_time(cputime_t cputime)
 	cpustat[CPUTIME_IDLE] += (__force u64) cputime;
 }
 
+/*
+ * When a guest is interrupted for a longer amount of time, missed clock
+ * ticks are not redelivered later. Due to that, this function may on
+ * occasion account more time than the calling functions think elapsed.
+ */
 static __always_inline cputime_t steal_account_process_time(cputime_t maxtime)
 {
 #ifdef CONFIG_PARAVIRT
@@ -371,7 +376,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 	 * idle, or potentially user or system time. Due to rounding,
 	 * other time can exceed ticks occasionally.
 	 */
-	other = account_other_time(cputime);
+	other = account_other_time(ULONG_MAX);
 	if (other >= cputime)
 		return;
 	cputime -= other;
@@ -486,7 +491,7 @@ void account_process_tick(struct task_struct *p, int user_tick)
 	}
 
 	cputime = cputime_one_jiffy;
-	steal = steal_account_process_time(cputime);
+	steal = steal_account_process_time(ULONG_MAX);
 
 	if (steal >= cputime)
 		return;
@@ -516,7 +521,7 @@ void account_idle_ticks(unsigned long ticks)
 	}
 
 	cputime = jiffies_to_cputime(ticks);
-	steal = steal_account_process_time(cputime);
+	steal = steal_account_process_time(ULONG_MAX);
 
 	if (steal >= cputime)
 		return;
@@ -694,6 +699,13 @@ static cputime_t get_vtime_delta(struct task_struct *tsk)
 	unsigned long now = READ_ONCE(jiffies);
 	cputime_t delta, other;
 
+	/*
+	 * Unlike tick based timing, vtime based timing never has lost
+	 * ticks, and no need for steal time accounting to make up for
+	 * lost ticks. Vtime accounts a rounded version of actual
+	 * elapsed time. Limit account_other_time to prevent rounding
+	 * errors from causing elapsed vtime to go negative.
+	 */
 	delta = jiffies_to_cputime(now - tsk->vtime_snap);
 	other = account_other_time(delta);
 	WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_INACTIVE);