From patchwork Thu Oct 22 10:25:17 2015
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 7464361
Date: Thu, 22 Oct 2015 12:25:17 +0200 (CEST)
From: Thomas Gleixner
To: Ding Tianhong
Cc: Hanjun Guo, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, Yang Yingliang
Subject: Re: Problem about CPU stalling in hrtimer_intterrupts()
In-Reply-To: <5628AC58.2030509@huawei.com>
References: <56288585.40204@huawei.com> <5628AC58.2030509@huawei.com>

On Thu, 22 Oct 2015, Ding Tianhong wrote:
> On 2015/10/22 15:43, Thomas Gleixner wrote:
> >> Jan 01 00:03:32 Linux kernel: i:0 basenow.tv64:4809284991830
> >> hrtimer_get_softexpires_tv64(timer):4440120000000 ccpu0
> >> timer:ffffffdffdec6138, timer->function:ffffffc000129b84
> >> Jan 01 00:03:32 Linux kernel: i:0 basenow.tv64:4809284991830
> >> hrtimer_get_softexpires_tv64(timer):4440120000000 ccpu0
> This problem could only occur on the system with 32 cores, when I
> cut the cores to 16, this problem disappeared, so I think there is
> some parallel problem when the 32 core set clock time together:
> I try to reproduce the scene:
>
> 1.do_settimeofday64
> 2.update tk time
> 3.update base time offset
> 4.update expires_next
>
> the 3 and 4 will be called in softirq, but the hrtimer_interrupt may
> break the order and run before 3, I am not sure whether this could
> make the problem, do we need to update base time and expires_next in
> the hrtimer_interrupt? maybe I miss something, thanks for any
> suggestion.

Base offset is updated in hrtimer_interrupt as well.
hrtimer_update_base() does that.
So that's not the problem.

Can you apply the patch below and enable the hrtimer tracepoints and
collect trace data across the point where the problem happens?

Thanks,

	tglx
----
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 44d2cc0436f4..614f8d272cb0 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -575,8 +575,14 @@ static void timekeeping_update(struct timekeeper *tk, unsigned int action)
 	update_fast_timekeeper(&tk->tkr_mono, &tk_fast_mono);
 	update_fast_timekeeper(&tk->tkr_raw,  &tk_fast_raw);
 
-	if (action & TK_CLOCK_WAS_SET)
+	if (action & TK_CLOCK_WAS_SET) {
 		tk->clock_was_set_seq++;
+		trace_printk("TK: Seq: %u R: %lld B: %lld T: %lld\n",
+			     tk->clock_was_set_seq,
+			     tk->offs_real,
+			     tk->offs_boot,
+			     tk->offs_tai);
+	}
 	/*
 	 * The mirroring of the data to the shadow-timekeeper needs
 	 * to happen last here to ensure we don't over-write the
@@ -1954,6 +1960,11 @@ ktime_t ktime_get_update_offsets_now(unsigned int *cwsseq, ktime_t *offs_real,
 		base = ktime_add_ns(base, nsecs);
 
 		if (*cwsseq != tk->clock_was_set_seq) {
+			trace_printk("HR: Seq: %u R: %lld B: %lld T: %lld\n",
+				     tk->clock_was_set_seq,
+				     tk->offs_real,
+				     tk->offs_boot,
+				     tk->offs_tai);
 			*cwsseq = tk->clock_was_set_seq;
 			*offs_real = tk->offs_real;
 			*offs_boot = tk->offs_boot;
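
For readers following the second hunk: the check it instruments is a
sequence-count comparison. The writer bumps clock_was_set_seq when the
clock is set, and the hrtimer side copies the offsets only when its
cached sequence differs. The following is a minimal user-space sketch
of that pattern, not kernel code. struct tk_model and both model_*
functions are hypothetical names invented for illustration, and plain
long long values stand in for ktime_t. Locking and the seqcount retry
loop are left out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for the kernel's struct timekeeper. */
struct tk_model {
	unsigned int clock_was_set_seq;
	long long offs_real;
	long long offs_boot;
	long long offs_tai;
};

/*
 * Writer side (models the TK_CLOCK_WAS_SET branch): change an offset,
 * bump the sequence and emit a line shaped like the "TK:" trace.
 */
static void model_clock_was_set(struct tk_model *tk, long long delta_real)
{
	assert(tk);
	tk->offs_real += delta_real;
	tk->clock_was_set_seq++;
	printf("TK: Seq: %u R: %lld B: %lld T: %lld\n",
	       tk->clock_was_set_seq, tk->offs_real,
	       tk->offs_boot, tk->offs_tai);
}

/*
 * Reader side (models the check in ktime_get_update_offsets_now()):
 * refresh the cached offsets only if the sequence changed.
 * Returns 1 if the cache was refreshed, 0 if it was already current.
 */
static int model_update_offsets(const struct tk_model *tk,
				unsigned int *cwsseq, long long *offs_real,
				long long *offs_boot, long long *offs_tai)
{
	assert(tk && cwsseq && offs_real && offs_boot && offs_tai);

	if (*cwsseq == tk->clock_was_set_seq)
		return 0;

	printf("HR: Seq: %u R: %lld B: %lld T: %lld\n",
	       tk->clock_was_set_seq, tk->offs_real,
	       tk->offs_boot, tk->offs_tai);
	*cwsseq = tk->clock_was_set_seq;
	*offs_real = tk->offs_real;
	*offs_boot = tk->offs_boot;
	*offs_tai = tk->offs_tai;
	return 1;
}
```

In the model, every "TK:" line should be followed by one "HR:" line
carrying the same sequence number and offsets before the reader's
cached values are used again. That pairing is what the two
trace_printk() calls make visible in the real trace.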