From patchwork Wed Jun 8 00:02:24 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Emilio Cota
X-Patchwork-Id: 9162891
Date: Tue, 7 Jun 2016 20:02:24 -0400
From: "Emilio G.
 Cota"
To: Sergey Fedorov
Message-ID: <20160608000224.GB16255@flamenco>
References: <1464138802-23503-1-git-send-email-cota@braap.org>
 <1464138802-23503-9-git-send-email-cota@braap.org>
 <5749E02A.3080909@gmail.com>
 <20160607010545.GB4418@flamenco>
 <5756EEC0.8090502@gmail.com>
In-Reply-To: <5756EEC0.8090502@gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Subject: Re: [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent
 frequency distributions of data
Cc: MTTCG Devel, Paolo Bonzini, Alex Bennée, QEMU Developers,
 Richard Henderson

On Tue, Jun 07, 2016 at 18:56:48 +0300, Sergey Fedorov wrote:
> On 07/06/16 04:05, Emilio G. Cota wrote:
> > On Sat, May 28, 2016 at 21:15:06 +0300, Sergey Fedorov wrote:
> >> On 25/05/16 04:13, Emilio G. Cota wrote:
> >>> diff --git a/util/qdist.c b/util/qdist.c
> >>> new file mode 100644
> >>> index 0000000..3343640
> >>> --- /dev/null
> >>> +++ b/util/qdist.c
> >>> @@ -0,0 +1,386 @@
> >> (snip)
> >>> +
> >>> +void qdist_add(struct qdist *dist, double x, long count)
> >>> +{
> >>> +    struct qdist_entry *entry = NULL;
> >>> +
> >>> +    if (dist->entries) {
> >>> +        struct qdist_entry e;
> >>> +
> >>> +        e.x = x;
> >>> +        entry = bsearch(&e, dist->entries, dist->n, sizeof(e), qdist_cmp);
> >>> +    }
> >>> +
> >>> +    if (entry) {
> >>> +        entry->count += count;
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    dist->entries = g_realloc(dist->entries,
> >>> +                              sizeof(*dist->entries) * (dist->n + 1));
> >> Repeated doubling?
> > Can you please elaborate?
>
> I mean dynamic array with a growth factor of 2
> [https://en.wikipedia.org/wiki/Dynamic_array].

Changed; see the diff at the end of this message.

> >> (snip)
> >>> +static char *qdist_pr_internal(const struct qdist *dist)
> >>> +{
> >>> +    double min, max, step;
> >>> +    GString *s = g_string_new("");
> >>> +    size_t i;
> >>> +
> >>> +    /* if only one entry, its printout will be either full or empty */
> >>> +    if (dist->n == 1) {
> >>> +        if (dist->entries[0].count) {
> >>> +            g_string_append_unichar(s, qdist_blocks[QDIST_NR_BLOCK_CODES - 1]);
> >>> +        } else {
> >>> +            g_string_append_c(s, ' ');
> >>> +        }
> >>> +        goto out;
> >>> +    }
> >>> +
> >>> +    /* get min and max counts */
> >>> +    min = dist->entries[0].count;
> >>> +    max = min;
> >>> +    for (i = 0; i < dist->n; i++) {
> >>> +        struct qdist_entry *e = &dist->entries[i];
> >>> +
> >>> +        if (e->count < min) {
> >>> +            min = e->count;
> >>> +        }
> >>> +        if (e->count > max) {
> >>> +            max = e->count;
> >>> +        }
> >>> +    }
> >>> +
> >>> +    /* floor((count - min) * step) will give us the block index */
> >>> +    step = (QDIST_NR_BLOCK_CODES - 1) / (max - min);
> >>> +
> >>> +    for (i = 0; i < dist->n; i++) {
> >>> +        struct qdist_entry *e = &dist->entries[i];
> >>> +        int index;
> >>> +
> >>> +        /* make an exception with 0; instead of using block[0], print a space */
> >>> +        if (e->count) {
> >>> +            index = (int)((e->count - min) * step);
> >> So "e->count == min" gives us one eighth block instead of just space?
> > Yes, only 0 can print a space.
>
> So our scale is not linear. I think some users might get confused by this.

That's correct. I think special-casing 0 makes sense though, since it
increases the signal-to-noise ratio of the histogram. For example:

1) 0 as ' ':
TB hash occupancy 31.84% avg chain occ. Histogram: [0,10)%|▆ █  ▅▁▃▁▁|[90,100]%
TB hash avg chain 1.015 buckets. Histogram: 1|█▁▁|3

2) 0 as '1/8':
TB hash occupancy 32.07% avg chain occ. Histogram: [0,10)%|▆▁█▁▁▅▁▃▁▁|[90,100]%
TB hash avg chain 1.015 buckets. Histogram: 1|▇▁▁|3

I think in these examples most users would be less confused by 1)
than by 2).

(snip)
> >>> +        to->n = from->n;
> >>> +        memcpy(to->entries, from->entries, sizeof(*to->entries) * to->n);
> >>> +        return;
> >>> +    }
> >>> +
> >>> + rebin:
>
> By the way, here's a space before the 'rebin' label.

Yes, I always do this. It prevents diff from mistaking the label for a
function definition, and thus wrongly using the label as context. See:
  https://lkml.org/lkml/2010/6/16/312

> >>> +    j_min = 0;
> >>> +    for (i = 0; i < n; i++) {
> >>> +        double x;
> >>> +        double left, right;
> >>> +
> >>> +        left = xmin + i * step;
> >>> +        right = xmin + (i + 1) * step;
> >>> +
> >>> +        /* Add x, even if it might not get any counts later */
> >>> +        x = left;
> >> This way we round down to the left margin of each bin like this:
> >>
> >> xmin [*---*---*---*---*] xmax  -- from
> >>       |  /|  /|  /|  /
> >>       | / | / | / | /
> >>       |/  |/  |/  |/
> >>       |   |   |   |
> >>       V   V   V   V
> >>      [*   *   *   *]           -- to
> > (snip)
> >> xmin [*----*----*----*] xmax  -- from
> >>     \   /\   /\   /\   /
> >>      \ /  \ /  \ /  \ /
> >>       |    |    |    |
> >>       V    V    V    V
> >>      [*    *    *    *]       -- to
> >>
> >> I'm not sure which is the more correct option from the mathematical
> >> point of view; but multiple-binning with the last variant of the
> >> algorithm we would still give the same result.
> > There's no "right" or "wrong" way as long as we're consistent
> > and we print the right counts in the right bins. I think the
> > convention I chose is simple enough, and leads to simple printing
> > of the labels. But yes other alternatives would be OK here.
>
> Well, if we go ahead with my last suggestion the code would look like this:
>
>  rebin:
>     /* We do the binning using the following scheme:
>      *
>      * xmin [*----*----*----*] xmax  -- from
>      *     \   /\   /\   /\   /
>      *      \ /  \ /  \ /  \ /
>      *       |    |    |    |
>      *       V    V    V    V
>      *      [*    *    *    *]       -- to
>      *
>      */
>     step = (xmax - xmin) / (n - 1);
>     j = 0;
>     for (i = 0; i < n; i++) {
>         double x;
>         double right;
>
>         x = xmin + i * step;
>         right = x + 0.5 * step;
>
>         /* Add x, even if it might not get any counts later */
>         qdist_add(to, x, 0);
>
>         /* To avoid double-counting we capture [left, right) ranges */
>         while (from->entries[j].x < right && j < from->n) {
>             qdist_add(to, x, from->entries[j].count);
>             j++;
>         }
>     }
>     assert(j == from->n);
> }
>
> Actually it's simpler than current version.

The behaviour isn't the same though. With this, the two outer bins
(leftmost and rightmost) are unnecessarily large, since half of each
lies outside the range of the input data.

For example, if the data is between 0 and 100 and n=5 (i.e. step=25),
it makes no sense to report the first bin as [-12.5,12.5). If we then
truncate the unnecessary edges, we'd have [0,12.5), but then the second
bin is [12.5,37.5). Bins of unequal size are possible (although a bit
unusual) in histograms, but given our Unicode-based representation,
we're limited to same-width bars.
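
To make the bin-width argument concrete, here is a small standalone sketch that computes the range each bin covers under the two conventions. The names (`struct bin_range`, `left_margin_bin`, `centered_bin`) are hypothetical and not part of qdist. The left-margin step of `(xmax - xmin) / n` is an assumption, since the quoted hunk does not show how `step` is computed; the centered step of `(xmax - xmin) / (n - 1)` comes from the suggested code.

```c
#include <stddef.h>

/* Hypothetical helper type: the half-open range [lo, hi) covered by a bin. */
struct bin_range {
    double lo;
    double hi;
};

/*
 * Left-margin convention: n equal-width bins tile [xmin, xmax], and each
 * bin is labelled by its left edge.
 * Assumption: step = (xmax - xmin) / n (not shown in the quoted code).
 */
static struct bin_range left_margin_bin(double xmin, double xmax,
                                        size_t n, size_t i)
{
    double step = (xmax - xmin) / n;
    struct bin_range r = { xmin + i * step, xmin + (i + 1) * step };

    return r;
}

/*
 * Centered convention: n bin centers are spread from xmin to xmax, so the
 * first and last bins extend half a step beyond the data range.
 * Requires n >= 2.
 */
static struct bin_range centered_bin(double xmin, double xmax,
                                     size_t n, size_t i)
{
    double step = (xmax - xmin) / (n - 1);
    double x = xmin + i * step;
    struct bin_range r = { x - 0.5 * step, x + 0.5 * step };

    return r;
}
```

With data in [0, 100] and n=5, `centered_bin(0, 100, 5, 0)` yields [-12.5, 12.5) and bin 1 yields [12.5, 37.5), whereas `left_margin_bin(0, 100, 5, 0)` yields [0, 20) and every bin stays inside the data range.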
		Emilio

diff --git a/include/qemu/qdist.h b/include/qemu/qdist.h
index 6d8b701..f30050c 100644
--- a/include/qemu/qdist.h
+++ b/include/qemu/qdist.h
@@ -29,6 +29,7 @@ struct qdist_entry {
 struct qdist {
     struct qdist_entry *entries;
     size_t n;
+    size_t size;
 };
 
 #define QDIST_PR_BORDER BIT(0)
diff --git a/util/qdist.c b/util/qdist.c
index dc9dbd1..3b54354 100644
--- a/util/qdist.c
+++ b/util/qdist.c
@@ -16,6 +16,7 @@
 void qdist_init(struct qdist *dist)
 {
     dist->entries = NULL;
+    dist->size = 0;
     dist->n = 0;
 }
 
@@ -58,8 +59,11 @@ void qdist_add(struct qdist *dist, double x, long count)
         return;
     }
 
-    dist->entries = g_realloc(dist->entries,
-                              sizeof(*dist->entries) * (dist->n + 1));
+    if (unlikely(dist->n == dist->size)) {
+        dist->size = dist->size ? dist->size * 2 : 1;
+        dist->entries = g_realloc(dist->entries,
+                                  sizeof(*dist->entries) * (dist->size));
+    }
     dist->n++;
     entry = &dist->entries[dist->n - 1];
     entry->x = x;
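
As a rough illustration of what the doubling in the diff above buys, the following standalone sketch counts how many reallocations a run of appends triggers. It is not the qdist code: `struct vec`, `vec_push` and `count_reallocs` are hypothetical names, and plain `realloc` stands in for `g_realloc` so that the example builds without GLib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct qdist: n entries in use, size allocated. */
struct vec {
    double *entries;
    size_t n;
    size_t size;
};

/* Append x, doubling the capacity when full. Returns 1 if it reallocated. */
static int vec_push(struct vec *v, double x)
{
    int grew = 0;

    if (v->n == v->size) {
        size_t new_size = v->size ? v->size * 2 : 1;
        double *p = realloc(v->entries, sizeof(*p) * new_size);

        if (p == NULL) {
            abort(); /* mirrors g_realloc, which terminates on failure */
        }
        v->entries = p;
        v->size = new_size;
        grew = 1;
    }
    assert(v->n < v->size);
    v->entries[v->n++] = x;
    return grew;
}

/* Count how many reallocations a sequence of n_pushes appends triggers. */
static size_t count_reallocs(size_t n_pushes)
{
    struct vec v = { NULL, 0, 0 };
    size_t reallocs = 0;
    size_t i;

    for (i = 0; i < n_pushes; i++) {
        reallocs += vec_push(&v, (double)i);
    }
    free(v.entries);
    return reallocs;
}
```

Under these assumptions, 1000 appends trigger 11 reallocations (capacities 1, 2, 4, ..., 1024), whereas growing by one entry per append, as in the code being replaced, calls the allocator once per new entry.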