From patchwork Mon Jun 6 23:40:14 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Emilio Cota X-Patchwork-Id: 9159569 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 48F3460777 for ; Mon, 6 Jun 2016 23:40:55 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3970624B44 for ; Mon, 6 Jun 2016 23:40:55 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2ABA028210; Mon, 6 Jun 2016 23:40:55 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 4CABB24B44 for ; Mon, 6 Jun 2016 23:40:53 +0000 (UTC) Received: from localhost ([::1]:46057 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bA48Y-0004mb-U6 for patchwork-qemu-devel@patchwork.kernel.org; Mon, 06 Jun 2016 19:40:50 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57858) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bA48F-0004mG-JO for qemu-devel@nongnu.org; Mon, 06 Jun 2016 19:40:32 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bA48B-0005NO-RQ for qemu-devel@nongnu.org; Mon, 06 Jun 2016 19:40:30 -0400 Received: from out2-smtp.messagingengine.com ([66.111.4.26]:59015) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bA489-0005HN-BK for qemu-devel@nongnu.org; Mon, 06 Jun 2016 19:40:27 -0400 Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 996E720F6D; Mon, 6 Jun 2016 19:40:14 -0400 (EDT) Received: from frontend2 ([10.202.2.161]) by compute5.internal (MEProxy); Mon, 06 Jun 2016 19:40:14 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=braap.org; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-sasl-enc :x-sasl-enc; s=mesmtp; bh=f8LOmpIUMN/zLqkhOO8sI2vf9vs=; b=04ahfG 088KEgd6ZKIMcAioMoafDtcZyJ+jFepIKHRYp3XflD/gbJr3xRc0tv8z6r3VvO78 +ylDrudDnW6iqk3QMaVDPceiUHxJcOJS+I31Usmn+F24WHN8KVsSwh2XVVUp9pAF v6LeRQxiBF0lcVH4Tpj9agfwZ9r7KlNk7SzOc= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references :subject:to:x-sasl-enc:x-sasl-enc; s=smtpout; bh=f8LOmpIUMN/zLqk hOO8sI2vf9vs=; b=jD+LVtKyW+pkqikiXjh6JLBHY9OH8xgYcaBt76pQaQhZbFb Pa5jsicNCN6+Y0rueLyWC3y+fungR2ypCzq32fsv/gH3TGLNUCHlORWyxxd3TZWB jbGxVcoDne3qI6HUx+z7FuC1aNhqk6oE2CO/4NTGsWNFcsG1BSPymWNyqTNw= X-Sasl-enc: WlKMlMXvsNk7/QauOxRLGhV5HqwDJGD2iATPZnw3Hqm7 1465256414 Received: from localhost (flamenco.cs.columbia.edu [128.59.20.216]) by mail.messagingengine.com (Postfix) with ESMTPA id 40FAFCCDAC; Mon, 6 Jun 2016 19:40:14 -0400 (EDT) Date: Mon, 6 Jun 2016 19:40:14 -0400 From: "Emilio G. Cota" To: Sergey Fedorov Message-ID: <20160606234014.GA4418@flamenco> References: <1464138802-23503-1-git-send-email-cota@braap.org> <1464138802-23503-9-git-send-email-cota@braap.org> <5749E02A.3080909@gmail.com> <20160603172245.GA8361@flamenco> <5751BE73.2090402@gmail.com> <5751C25F.6080801@gmail.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <5751C25F.6080801@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 66.111.4.26 Subject: Re: [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent frequency distributions of data X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: MTTCG Devel , Paolo Bonzini , Alex =?iso-8859-1?Q?Benn=E9e?= , QEMU Developers , Richard Henderson Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP On Fri, Jun 03, 2016 at 20:46:07 +0300, Sergey Fedorov wrote: > On 03/06/16 20:29, Sergey Fedorov wrote: > > On 03/06/16 20:22, Emilio G. Cota wrote: > >> On Sat, May 28, 2016 at 21:15:06 +0300, Sergey Fedorov wrote: > >>> On 25/05/16 04:13, Emilio G. Cota wrote: > >>> (snip) > >>>> +double qdist_avg(const struct qdist *dist) > >>>> +{ > >>>> + unsigned long count; > >>>> + size_t i; > >>>> + double ret = 0; > >>>> + > >>>> + count = qdist_sample_count(dist); > >>>> + if (!count) { > >>>> + return NAN; > >>>> + } > >>>> + for (i = 0; i < dist->n; i++) { > >>>> + struct qdist_entry *e = &dist->entries[i]; > >>>> + > >>>> + ret += e->x * e->count / count; > >>> Please use Welford’s method or something like that, see > >>> http://stackoverflow.com/a/1346890. > >> Yes, the way the mean is computed right now, we might suffer > >> from underflow if count is huge. But I'd rather take that, than the > >> perf penalty of an iterative method (such as the one used > >> in Welford's). Note that we might have huge amounts of > >> items, e.g. one item per head bucket in qht's occupancy qdist > >> (and 0.5M head buckets is easy to achieve). > >> > >> If we were to use an iterative method, we'd need to do something > >> like: > >> > >> double qdist_avg(const struct qdist *dist) > >> { > >> size_t i, j; > >> double ret = 0; > >> > >> if (!qdist_sample_count(dist)) { > >> return NAN; > >> } > >> /* compute moving average to prevent under/overflow */ > >> for (i = 0; i < dist->n; i++) { > >> struct qdist_entry *e = &dist->entries[i]; > >> > >> for (j = 0; j < e->count; j++) { > >> > >> ret += (e->x - ret) / (i + j + 1); > >> } > >> } > >> return ret; > >> } > >> > >> Note that skipping the inner loop would be incorrect. > > Ah, it's a shame. I'm wondering if there is some other algorithm that > > could work for us? > > Maybe something like > https://en.wikipedia.org/wiki/Kahan_summation_algorithm could help? That algorithm is overkill for what we're doing. Pairwise summation should suffice: Emilio diff --git a/util/qdist.c b/util/qdist.c index 3343640..909bd2b 100644 --- a/util/qdist.c +++ b/util/qdist.c @@ -367,20 +367,34 @@ unsigned long qdist_sample_count(const struct qdist *dist) return count; } +static double qdist_pairwise_avg(const struct qdist *dist, size_t index, + size_t n, unsigned long count) +{ + if (n <= 2) { + size_t i; + double ret = 0; + + for (i = 0; i < n; i++) { + struct qdist_entry *e = &dist->entries[index + i]; + + ret += e->x * e->count / count; + } + return ret; + } else { + size_t n2 = n / 2; + + return qdist_pairwise_avg(dist, index, n2, count) + + qdist_pairwise_avg(dist, index + n2, n - n2, count); + } +} + double qdist_avg(const struct qdist *dist) { unsigned long count; - size_t i; - double ret = 0; count = qdist_sample_count(dist); if (!count) { return NAN; } - for (i = 0; i < dist->n; i++) { - struct qdist_entry *e = &dist->entries[i]; - - ret += e->x * e->count / count; - } - return ret; + return qdist_pairwise_avg(dist, 0, dist->n, count); }