Message ID | 20160608000224.GB16255@flamenco (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 08/06/16 03:02, Emilio G. Cota wrote: > On Tue, Jun 07, 2016 at 18:56:48 +0300, Sergey Fedorov wrote: >> On 07/06/16 04:05, Emilio G. Cota wrote: >>> On Sat, May 28, 2016 at 21:15:06 +0300, Sergey Fedorov wrote: >>>> On 25/05/16 04:13, Emilio G. Cota wrote: >>>>> diff --git a/util/qdist.c b/util/qdist.c >>>>> new file mode 100644 >>>>> index 0000000..3343640 >>>>> --- /dev/null >>>>> +++ b/util/qdist.c >>>>> @@ -0,0 +1,386 @@ >>>> (snip) >>>>> + >>>>> +void qdist_add(struct qdist *dist, double x, long count) >>>>> +{ >>>>> + struct qdist_entry *entry = NULL; >>>>> + >>>>> + if (dist->entries) { >>>>> + struct qdist_entry e; >>>>> + >>>>> + e.x = x; >>>>> + entry = bsearch(&e, dist->entries, dist->n, sizeof(e), qdist_cmp); >>>>> + } >>>>> + >>>>> + if (entry) { >>>>> + entry->count += count; >>>>> + return; >>>>> + } >>>>> + >>>>> + dist->entries = g_realloc(dist->entries, >>>>> + sizeof(*dist->entries) * (dist->n + 1)); >>>> Repeated doubling? >>> Can you please elaborate? >> I mean dynamic array with a growth factor of 2 >> [https://en.wikipedia.org/wiki/Dynamic_array]. > Changed to: > > diff --git a/include/qemu/qdist.h b/include/qemu/qdist.h > index 6d8b701..f30050c 100644 > --- a/include/qemu/qdist.h > +++ b/include/qemu/qdist.h > @@ -29,6 +29,7 @@ struct qdist_entry { > struct qdist { > struct qdist_entry *entries; > size_t n; > + size_t size; > }; > > #define QDIST_PR_BORDER BIT(0) > diff --git a/util/qdist.c b/util/qdist.c > index dc9dbd1..3b54354 100644 > --- a/util/qdist.c > +++ b/util/qdist.c > @@ -16,6 +16,7 @@ > void qdist_init(struct qdist *dist) > { > dist->entries = NULL; > + dist->size = 0; > dist->n = 0; > } > > @@ -58,8 +59,11 @@ void qdist_add(struct qdist *dist, double x, long count) > return; > } > > - dist->entries = g_realloc(dist->entries, > - sizeof(*dist->entries) * (dist->n + 1)); > + if (unlikely(dist->n == dist->size)) { > + dist->size = dist->size ? dist->size * 2 : 1; We could initialize 'dist->size' to 1 and allocate a 1-entry 'dist->entries' array in qdist_init() to avoid this ternary operation ;-) Otherwise looks good. > + dist->entries = g_realloc(dist->entries, > + sizeof(*dist->entries) * (dist->size)); > + } > dist->n++; > entry = &dist->entries[dist->n - 1]; > entry->x = x; > > >>>> (snip) >>>>> +static char *qdist_pr_internal(const struct qdist *dist) >>>>> +{ >>>>> + double min, max, step; >>>>> + GString *s = g_string_new(""); >>>>> + size_t i; >>>>> + >>>>> + /* if only one entry, its printout will be either full or empty */ >>>>> + if (dist->n == 1) { >>>>> + if (dist->entries[0].count) { >>>>> + g_string_append_unichar(s, qdist_blocks[QDIST_NR_BLOCK_CODES - 1]); >>>>> + } else { >>>>> + g_string_append_c(s, ' '); >>>>> + } >>>>> + goto out; >>>>> + } >>>>> + >>>>> + /* get min and max counts */ >>>>> + min = dist->entries[0].count; >>>>> + max = min; >>>>> + for (i = 0; i < dist->n; i++) { >>>>> + struct qdist_entry *e = &dist->entries[i]; >>>>> + >>>>> + if (e->count < min) { >>>>> + min = e->count; >>>>> + } >>>>> + if (e->count > max) { >>>>> + max = e->count; >>>>> + } >>>>> + } >>>>> + >>>>> + /* floor((count - min) * step) will give us the block index */ >>>>> + step = (QDIST_NR_BLOCK_CODES - 1) / (max - min); >>>>> + >>>>> + for (i = 0; i < dist->n; i++) { >>>>> + struct qdist_entry *e = &dist->entries[i]; >>>>> + int index; >>>>> + >>>>> + /* make an exception with 0; instead of using block[0], print a space */ >>>>> + if (e->count) { >>>>> + index = (int)((e->count - min) * step); >>>> So "e->count == min" gives us one eighth block instead of just space? >>> Yes, only 0 can print a space. >> So our scale is not linear. I think some users might get confused by this. > That's correct. I think special-casing 0 makes sense though, since > it increases the signal-to-noise ratio of the histogram. For example: > > 1) 0 as ' ': > TB hash occupancy 31.84% avg chain occ. Histogram: [0,10)%|▆ █ ▅▁▃▁▁|[90,100]% > TB hash avg chain 1.015 buckets. Histogram: 1|█▁▁|3 > > 2) 0 as '1/8': > TB hash occupancy 32.07% avg chain occ. Histogram: [0,10)%|▆▁█▁▁▅▁▃▁▁|[90,100]% > TB hash avg chain 1.015 buckets. Histogram: 1|▇▁▁|3 > > I think in these examples most users would be less confused by 1) than by 2). I was meaning to represent all bars whose value < 1/8 as a space, not only whose value is pure zero. Otherwise we can see 1/8 bar where the actual value is negligibly differ from zero as in the second example. > > (snip) >>>>> + to->n = from->n; >>>>> + memcpy(to->entries, from->entries, sizeof(*to->entries) * to->n); >>>>> + return; >>>>> + } >>>>> + >>>>> + rebin: >> By the way, here's a space before the 'rebin' label. > Yes, I always do this. > It prevents diff from mistaking the label for a function definition, > and thus wrongly using the label as context. See: > https://lkml.org/lkml/2010/6/16/312 Cool! > > >>>>> + j_min = 0; >>>>> + for (i = 0; i < n; i++) { >>>>> + double x; >>>>> + double left, right; >>>>> + >>>>> + left = xmin + i * step; >>>>> + right = xmin + (i + 1) * step; >>>>> + >>>>> + /* Add x, even if it might not get any counts later */ >>>>> + x = left; >>>> This way we round down to the left margin of each bin like this: >>>> >>>> xmin [*---*---*---*---*] xmax -- from >>>> | /| /| /| / >>>> | / | / | / | / >>>> |/ |/ |/ |/ >>>> | | | | >>>> V V V V >>>> [* * * *] -- to >>> (snip) >>>> xmin [*----*----*----*] xmax -- from >>>> \ /\ /\ /\ / >>>> \ / \ / \ / \ / >>>> | | | | >>>> V V V V >>>> [* * * *] -- to >>>> >>>> I'm not sure which is the more correct option from the mathematical >>>> point of view; but multiple-binning with the last variant of the >>>> algorithm we would still give the same result. >>> There's no "right" or "wrong" way as long as we're consistent >>> and we print the right counts in the right bins. I think the >>> convention I chose is simple enough, and leads to simple printing >>> of the labels. But yes other alternatives would be OK here. >> Well, if we go ahead with my last suggestion the code would look like this: >> >> rebin: >> /* We do the binning using the following scheme: >> * >> * xmin [*----*----*----*] xmax -- from >> * \ /\ /\ /\ / >> * \ / \ / \ / \ / >> * | | | | >> * V V V V >> * [* * * *] -- to >> * >> */ >> step = (xmax - xmin) / (n - 1); >> j = 0; >> for (i = 0; i < n; i++) { >> double x; >> double right; >> >> x = xmin + i * step; >> right = x + 0.5 * step; >> >> /* Add x, even if it might not get any counts later */ >> qdist_add(to, x, 0); >> >> /* To avoid double-counting we capture [left, right) ranges */ >> while (from->entries[j].x < right && j < from->n) { >> qdist_add(to, x, from->entries[j].count); >> j++; >> } >> } >> assert(j == from->n); >> } >> >> Actually it's simpler than current version. > The behaviour isn't the same though. With this we have > that the two outer bins (leftmost and rightmost) are unnecessarily > large (since they're out of the range of the input data). > > For example, assume the data is between 0 and 100 and n=5 (i.e. step=25), > it makes no sense to report the first bin as [-12.5,12.5). If we > then truncate the unnecessary edges, we'd have [0,12.5), but > then the second bin is [12.5,37.5). Bins of unequal size are > possible (although a bit unusual) in histograms, but given > our Unicode-based representation, we're limited to same-width bars. That is why I noted that I'm not sure what is the most correct from mathematical point of view. Maybe consider the second option? I.e. rounding to the middle of each bin with: x = left + step / 2; which would give the picture like this: xmin [*---*---*---*---*] xmax -- from | | | | | \ / \ / \ / \ / | | | | V V V V [* * * *] -- to Anyway, you may consider if you like whether it's possible to apply some simplifications from my code to the final version. Kind regards, Sergey
diff --git a/include/qemu/qdist.h b/include/qemu/qdist.h index 6d8b701..f30050c 100644 --- a/include/qemu/qdist.h +++ b/include/qemu/qdist.h @@ -29,6 +29,7 @@ struct qdist_entry { struct qdist { struct qdist_entry *entries; size_t n; + size_t size; }; #define QDIST_PR_BORDER BIT(0) diff --git a/util/qdist.c b/util/qdist.c index dc9dbd1..3b54354 100644 --- a/util/qdist.c +++ b/util/qdist.c @@ -16,6 +16,7 @@ void qdist_init(struct qdist *dist) { dist->entries = NULL; + dist->size = 0; dist->n = 0; } @@ -58,8 +59,11 @@ void qdist_add(struct qdist *dist, double x, long count) return; } - dist->entries = g_realloc(dist->entries, - sizeof(*dist->entries) * (dist->n + 1)); + if (unlikely(dist->n == dist->size)) { + dist->size = dist->size ? dist->size * 2 : 1; + dist->entries = g_realloc(dist->entries, + sizeof(*dist->entries) * (dist->size)); + } dist->n++; entry = &dist->entries[dist->n - 1]; entry->x = x;