From patchwork Tue Oct 9 23:13:32 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 10633381 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 561A115E2 for ; Tue, 9 Oct 2018 23:13:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 41924298EB for ; Tue, 9 Oct 2018 23:13:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 304C6299AD; Tue, 9 Oct 2018 23:13:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A47D1298EB for ; Tue, 9 Oct 2018 23:13:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726564AbeJJGcu (ORCPT ); Wed, 10 Oct 2018 02:32:50 -0400 Received: from cloud.peff.net ([104.130.231.41]:34870 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1725750AbeJJGcu (ORCPT ); Wed, 10 Oct 2018 02:32:50 -0400 Received: (qmail 14404 invoked by uid 109); 9 Oct 2018 23:13:34 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with SMTP; Tue, 09 Oct 2018 23:13:34 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 11300 invoked by uid 111); 9 Oct 2018 23:12:42 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.94) with (ECDHE-RSA-AES256-GCM-SHA384 encrypted) SMTP; Tue, 09 Oct 2018 19:12:42 -0400 Authentication-Results: peff.net; auth=none Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Tue, 09 Oct 2018 19:13:32 -0400 Date: Tue, 9 Oct 2018 19:13:32 -0400 From: Jeff King To: Derrick Stolee Cc: SZEDER =?utf-8?b?R8OhYm9y?= , =?utf-8?b?w4Z2YXIg?= =?utf-8?b?QXJuZmrDtnLDsA==?= Bjarmason , Stefan Beller , git , Duy Nguyen Subject: [PoC -- do not apply 1/3] initial tree-bitmap proof of concept Message-ID: <20181009231331.GA23730@sigill.intra.peff.net> References: <20181009231250.GA19342@sigill.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20181009231250.GA19342@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Jeff King --- Makefile | 1 + t/helper/test-tree-bitmap.c | 167 ++++++++++++++++++++++++++++++++++++ 2 files changed, 168 insertions(+) create mode 100644 t/helper/test-tree-bitmap.c diff --git a/Makefile b/Makefile index 13e1c52478..f6e823f2d6 100644 --- a/Makefile +++ b/Makefile @@ -751,6 +751,7 @@ TEST_PROGRAMS_NEED_X += test-parse-options TEST_PROGRAMS_NEED_X += test-pkt-line TEST_PROGRAMS_NEED_X += test-svn-fe TEST_PROGRAMS_NEED_X += test-tool +TEST_PROGRAMS_NEED_X += test-tree-bitmap TEST_PROGRAMS = $(patsubst %,t/helper/%$X,$(TEST_PROGRAMS_NEED_X)) diff --git a/t/helper/test-tree-bitmap.c b/t/helper/test-tree-bitmap.c new file mode 100644 index 0000000000..bc5cf0e514 --- /dev/null +++ b/t/helper/test-tree-bitmap.c @@ -0,0 +1,167 @@ +#include "cache.h" +#include "revision.h" +#include "diffcore.h" +#include "argv-array.h" +#include "ewah/ewok.h" + +/* map of pathnames to bit positions */ +struct pathmap_entry { + struct hashmap_entry ent; + unsigned pos; + char path[FLEX_ARRAY]; +}; + +static int pathmap_entry_hashcmp(const void *unused_cmp_data, + const void *entry, + const void *entry_or_key, + const void *keydata) +{ + const struct pathmap_entry *a = entry; + const struct pathmap_entry *b = entry_or_key; + const char *key = keydata; + + return strcmp(a->path, key ? key : b->path); +} + +static int pathmap_entry_strcmp(const void *va, const void *vb) +{ + struct pathmap_entry *a = *(struct pathmap_entry **)va; + struct pathmap_entry *b = *(struct pathmap_entry **)vb; + return strcmp(a->path, b->path); +} + +struct walk_paths_data { + struct hashmap *paths; + struct commit *commit; +}; + +static void walk_paths(diff_format_fn_t fn, struct hashmap *paths) +{ + struct argv_array argv = ARGV_ARRAY_INIT; + struct rev_info revs; + struct walk_paths_data data; + struct commit *commit; + + argv_array_pushl(&argv, "rev-list", + "--all", "-t", "--no-renames", + NULL); + init_revisions(&revs, NULL); + setup_revisions(argv.argc, argv.argv, &revs, NULL); + revs.diffopt.output_format = DIFF_FORMAT_CALLBACK; + revs.diffopt.format_callback = fn; + revs.diffopt.format_callback_data = &data; + + data.paths = paths; + + prepare_revision_walk(&revs); + while ((commit = get_revision(&revs))) { + data.commit = commit; + diff_tree_combined_merge(commit, 0, &revs); + } + + reset_revision_walk(); + argv_array_clear(&argv); +} + +static void collect_commit_paths(struct diff_queue_struct *q, + struct diff_options *opts, + void *vdata) +{ + struct walk_paths_data *data = vdata; + int i; + + for (i = 0; i < q->nr; i++) { + struct diff_filepair *p = q->queue[i]; + const char *path = p->one->path; + struct pathmap_entry *entry; + struct hashmap_entry lookup; + + hashmap_entry_init(&lookup, strhash(path)); + entry = hashmap_get(data->paths, &lookup, path); + if (entry) + continue; /* already present */ + + FLEX_ALLOC_STR(entry, path, path); + entry->ent = lookup; + hashmap_put(data->paths, entry); + } +} + +/* assign a bit position to all possible paths */ +static void collect_paths(struct hashmap *paths) +{ + struct pathmap_entry **sorted; + size_t i, n; + struct hashmap_iter iter; + struct pathmap_entry *entry; + + /* grab all unique paths */ + hashmap_init(paths, pathmap_entry_hashcmp, NULL, 0); + walk_paths(collect_commit_paths, paths); + + /* and assign them bits in sorted order */ + n = hashmap_get_size(paths); + ALLOC_ARRAY(sorted, n); + i = 0; + for (entry = hashmap_iter_first(paths, &iter); + entry; + entry = hashmap_iter_next(&iter)) { + assert(i < n); + sorted[i++] = entry; + } + QSORT(sorted, i, pathmap_entry_strcmp); + for (i = 0; i < n; i++) + sorted[i]->pos = i; + free(sorted); +} + +/* generate the bitmap for a single commit */ +static void generate_bitmap(struct diff_queue_struct *q, + struct diff_options *opts, + void *vdata) +{ + struct walk_paths_data *data = vdata; + struct bitmap *bitmap = bitmap_new(); + struct ewah_bitmap *ewah; + struct strbuf out = STRBUF_INIT; + size_t i; + + for (i = 0; i < q->nr; i++) { + struct diff_filepair *p = q->queue[i]; + const char *path = p->one->path; + struct pathmap_entry *entry; + struct hashmap_entry lookup; + + hashmap_entry_init(&lookup, strhash(path)); + entry = hashmap_get(data->paths, &lookup, path); + if (!entry) + BUG("mysterious path appeared: %s", path); + + bitmap_set(bitmap, entry->pos); + } + + ewah = bitmap_to_ewah(bitmap); + ewah_serialize_strbuf(ewah, &out); + fwrite(out.buf, 1, out.len, stdout); + + trace_printf("bitmap %s %u %u", + oid_to_hex(&data->commit->object.oid), + (unsigned)q->nr, + (unsigned)out.len); + + strbuf_release(&out); + ewah_free(ewah); + bitmap_free(bitmap); +} + +int cmd_main(int argc, const char **argv) +{ + struct hashmap paths; + + setup_git_directory(); + collect_paths(&paths); + + walk_paths(generate_bitmap, &paths); + + return 0; +} From patchwork Tue Oct 9 23:14:05 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 10633383 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DD68415E2 for ; Tue, 9 Oct 2018 23:14:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CC1D629B35 for ; Tue, 9 Oct 2018 23:14:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BFFCB29B45; Tue, 9 Oct 2018 23:14:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 551D929B35 for ; Tue, 9 Oct 2018 23:14:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726903AbeJJGdX (ORCPT ); Wed, 10 Oct 2018 02:33:23 -0400 Received: from cloud.peff.net ([104.130.231.41]:34882 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1725750AbeJJGdX (ORCPT ); Wed, 10 Oct 2018 02:33:23 -0400 Received: (qmail 14469 invoked by uid 109); 9 Oct 2018 23:14:07 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with SMTP; Tue, 09 Oct 2018 23:14:07 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 11327 invoked by uid 111); 9 Oct 2018 23:13:15 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.94) with (ECDHE-RSA-AES256-GCM-SHA384 encrypted) SMTP; Tue, 09 Oct 2018 19:13:15 -0400 Authentication-Results: peff.net; auth=none Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Tue, 09 Oct 2018 19:14:05 -0400 Date: Tue, 9 Oct 2018 19:14:05 -0400 From: Jeff King To: Derrick Stolee Cc: SZEDER =?utf-8?b?R8OhYm9y?= , =?utf-8?b?w4Z2YXIg?= =?utf-8?b?QXJuZmrDtnLDsA==?= Bjarmason , Stefan Beller , git , Duy Nguyen Subject: [PoC -- do not apply 2/3] test-tree-bitmap: add "dump" mode Message-ID: <20181009231405.GB23730@sigill.intra.peff.net> References: <20181009231250.GA19342@sigill.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20181009231250.GA19342@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This teaches "gen" mode (formerly the only mode) to include the list of paths, and to prefix each bitmap with its matching oid. The "dump" mode can then read that back in and generate the list of changed paths. This should be almost identical to: git rev-list --all | git diff-tree --stdin --name-only -t The one difference is the sort order: git's diff output is in tree-sort order, so a subtree "foo" sorts like "foo/", which is after "foo.bar". Whereas the bitmap path list has a true byte sort, which puts "foo.bar" after "foo". Signed-off-by: Jeff King --- t/helper/test-tree-bitmap.c | 104 +++++++++++++++++++++++++++++++++++- 1 file changed, 102 insertions(+), 2 deletions(-) diff --git a/t/helper/test-tree-bitmap.c b/t/helper/test-tree-bitmap.c index bc5cf0e514..6f8833344a 100644 --- a/t/helper/test-tree-bitmap.c +++ b/t/helper/test-tree-bitmap.c @@ -112,6 +112,14 @@ static void collect_paths(struct hashmap *paths) QSORT(sorted, i, pathmap_entry_strcmp); for (i = 0; i < n; i++) sorted[i]->pos = i; + + /* dump it while we have the sorted order in memory */ + for (i = 0; i < n; i++) { + printf("%s", sorted[i]->path); + putchar('\0'); + } + putchar('\0'); + free(sorted); } @@ -142,6 +150,8 @@ static void generate_bitmap(struct diff_queue_struct *q, ewah = bitmap_to_ewah(bitmap); ewah_serialize_strbuf(ewah, &out); + + fwrite(data->commit->object.oid.hash, 1, GIT_SHA1_RAWSZ, stdout); fwrite(out.buf, 1, out.len, stdout); trace_printf("bitmap %s %u %u", @@ -154,14 +164,104 @@ static void generate_bitmap(struct diff_queue_struct *q, bitmap_free(bitmap); } -int cmd_main(int argc, const char **argv) +static void do_gen(void) { struct hashmap paths; - setup_git_directory(); collect_paths(&paths); walk_paths(generate_bitmap, &paths); +} + +static void show_path(size_t pos, void *data) +{ + const char **paths = data; + + /* assert(pos < nr_paths), but we didn't pass the latter in */ + printf("%s\n", paths[pos]); +} + +static void do_dump(void) +{ + struct strbuf in = STRBUF_INIT; + const char *cur; + size_t remain; + + const char **paths = NULL; + size_t alloc_paths = 0, nr_paths = 0; + + /* slurp stdin; in the real world we'd mmap all this */ + strbuf_read(&in, 0, 0); + cur = in.buf; + remain = in.len; + + /* read path for each bit; in the real world this would be separate */ + while (remain) { + const char *end = memchr(cur, '\0', remain); + if (!end) { + error("truncated input while reading path"); + goto out; + } + if (end == cur) { + /* empty field signals end of paths */ + cur++; + remain--; + break; + } + + ALLOC_GROW(paths, nr_paths + 1, alloc_paths); + paths[nr_paths++] = cur; + + remain -= end - cur + 1; + cur = end + 1; + } + + /* read the bitmap for each commit */ + while (remain) { + struct object_id oid; + struct ewah_bitmap *ewah; + ssize_t len; + + if (remain < GIT_SHA1_RAWSZ) { + error("truncated input reading oid"); + goto out; + } + hashcpy(oid.hash, (const unsigned char *)cur); + cur += GIT_SHA1_RAWSZ; + remain -= GIT_SHA1_RAWSZ; + + ewah = ewah_new(); + len = ewah_read_mmap(ewah, cur, remain); + if (len < 0) { + ewah_free(ewah); + goto out; + } + + printf("%s\n", oid_to_hex(&oid)); + ewah_each_bit(ewah, show_path, paths); + + ewah_free(ewah); + cur += len; + remain -= len; + } + +out: + free(paths); + strbuf_release(&in); +} + +int cmd_main(int argc, const char **argv) +{ + const char *usage_msg = "test-tree-bitmap "; + + if (!argv[1]) + usage(usage_msg); + else if (!strcmp(argv[1], "gen")) + do_gen(); + else if (!strcmp(argv[1], "dump")) + do_dump(); + else + usage(usage_msg); return 0; } From patchwork Tue Oct 9 23:14:41 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 10633385 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E248A13AD for ; Tue, 9 Oct 2018 23:14:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CCC5D29998 for ; Tue, 9 Oct 2018 23:14:45 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BD30A29B3E; Tue, 9 Oct 2018 23:14:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3E99229998 for ; Tue, 9 Oct 2018 23:14:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726867AbeJJGeA (ORCPT ); Wed, 10 Oct 2018 02:34:00 -0400 Received: from cloud.peff.net ([104.130.231.41]:34894 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1725750AbeJJGeA (ORCPT ); Wed, 10 Oct 2018 02:34:00 -0400 Received: (qmail 14519 invoked by uid 109); 9 Oct 2018 23:14:43 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with SMTP; Tue, 09 Oct 2018 23:14:43 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 11351 invoked by uid 111); 9 Oct 2018 23:13:51 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.94) with (ECDHE-RSA-AES256-GCM-SHA384 encrypted) SMTP; Tue, 09 Oct 2018 19:13:51 -0400 Authentication-Results: peff.net; auth=none Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Tue, 09 Oct 2018 19:14:41 -0400 Date: Tue, 9 Oct 2018 19:14:41 -0400 From: Jeff King To: Derrick Stolee Cc: SZEDER =?utf-8?b?R8OhYm9y?= , =?utf-8?b?w4Z2YXIg?= =?utf-8?b?QXJuZmrDtnLDsA==?= Bjarmason , Stefan Beller , git , Duy Nguyen Subject: [PoC -- do not apply 3/3] test-tree-bitmap: replace ewah with custom rle encoding Message-ID: <20181009231441.GC23730@sigill.intra.peff.net> References: <20181009231250.GA19342@sigill.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20181009231250.GA19342@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The rules are basically: - each bitmap is a series of counts of runs of 0/1 - each count is one of our standard varints - each bitmap must have at least one initial count of zeroes (which may itself be a zero-length count, if the first bit is set) - a zero-length count anywhere else marks the end of the bitmap For a sparse bitmap, these will tend to be quite short, because long runs are encoded as fairly small counts. The worst case is an alternate 0/1/0/1 bitmap, where we will spend a full byte to specify each bit (thus bloating it by a factor of 8 over an uncompressed bitmap). Signed-off-by: Jeff King --- t/helper/test-tree-bitmap.c | 105 +++++++++++++++++++++++++++++++----- 1 file changed, 91 insertions(+), 14 deletions(-) diff --git a/t/helper/test-tree-bitmap.c b/t/helper/test-tree-bitmap.c index 6f8833344a..36f19ed464 100644 --- a/t/helper/test-tree-bitmap.c +++ b/t/helper/test-tree-bitmap.c @@ -3,6 +3,7 @@ #include "diffcore.h" #include "argv-array.h" #include "ewah/ewok.h" +#include "varint.h" /* map of pathnames to bit positions */ struct pathmap_entry { @@ -123,6 +124,49 @@ static void collect_paths(struct hashmap *paths) free(sorted); } +static void strbuf_add_varint(struct strbuf *out, uintmax_t val) +{ + size_t len; + strbuf_grow(out, 16); /* enough for any varint */ + len = encode_varint(val, (unsigned char *)out->buf + out->len); + strbuf_setlen(out, out->len + len); +} + +static void bitmap_to_rle(struct strbuf *out, struct bitmap *bitmap) +{ + int curval = 0; /* count zeroes, then ones, then zeroes, etc */ + size_t run = 0; + size_t word; + size_t orig_len = out->len; + + for (word = 0; word < bitmap->word_alloc; word++) { + int bit; + + for (bit = 0; bit < BITS_IN_EWORD; bit++) { + int val = !!(bitmap->words[word] & (((eword_t)1) << bit)); + if (val == curval) + run++; + else { + strbuf_add_varint(out, run); + curval = 1 - curval; /* flip 0/1 */ + run = 1; + } + } + } + + /* + * complete the run, but do not bother with trailing zeroes, unless we + * failed to write even an initial run of 0's. + */ + if (curval && run) + strbuf_add_varint(out, run); + else if (orig_len == out->len) + strbuf_add_varint(out, 0); + + /* signal end-of-input with an empty run */ + strbuf_add_varint(out, 0); +} + /* generate the bitmap for a single commit */ static void generate_bitmap(struct diff_queue_struct *q, struct diff_options *opts, @@ -130,7 +174,6 @@ static void generate_bitmap(struct diff_queue_struct *q, { struct walk_paths_data *data = vdata; struct bitmap *bitmap = bitmap_new(); - struct ewah_bitmap *ewah; struct strbuf out = STRBUF_INIT; size_t i; @@ -148,8 +191,7 @@ static void generate_bitmap(struct diff_queue_struct *q, bitmap_set(bitmap, entry->pos); } - ewah = bitmap_to_ewah(bitmap); - ewah_serialize_strbuf(ewah, &out); + bitmap_to_rle(&out, bitmap); fwrite(data->commit->object.oid.hash, 1, GIT_SHA1_RAWSZ, stdout); fwrite(out.buf, 1, out.len, stdout); @@ -160,7 +202,6 @@ static void generate_bitmap(struct diff_queue_struct *q, (unsigned)out.len); strbuf_release(&out); - ewah_free(ewah); bitmap_free(bitmap); } @@ -181,6 +222,51 @@ static void show_path(size_t pos, void *data) printf("%s\n", paths[pos]); } +static size_t rle_each_bit(const unsigned char *in, size_t len, + void (*fn)(size_t, void *), void *data) +{ + int curval = 0; /* look for zeroes first, then ones, etc */ + const unsigned char *cur = in; + const unsigned char *end = in + len; + size_t pos; + + /* we always have a first run, even if it's 0 zeroes */ + pos = decode_varint(&cur); + + /* + * ugh, varint does not seem to have a way to prevent reading past + * the end of the buffer. We'll do a length check after each one, + * so the worst case is bounded. + */ + if (cur > end) { + error("input underflow in rle"); + return len; + } + + while (1) { + size_t run = decode_varint(&cur); + + if (cur > end) { + error("input underflow in rle"); + return len; + } + + if (!run) + break; /* empty run signals end */ + + curval = 1 - curval; /* flip 0/1 */ + if (curval) { + /* we have a run of 1's; deliver them */ + size_t i; + for (i = 0; i < run; i++) + fn(pos + i, data); + } + pos += run; + } + + return cur - in; +} + static void do_dump(void) { struct strbuf in = STRBUF_INIT; @@ -219,7 +305,6 @@ static void do_dump(void) /* read the bitmap for each commit */ while (remain) { struct object_id oid; - struct ewah_bitmap *ewah; ssize_t len; if (remain < GIT_SHA1_RAWSZ) { @@ -230,17 +315,9 @@ static void do_dump(void) cur += GIT_SHA1_RAWSZ; remain -= GIT_SHA1_RAWSZ; - ewah = ewah_new(); - len = ewah_read_mmap(ewah, cur, remain); - if (len < 0) { - ewah_free(ewah); - goto out; - } - printf("%s\n", oid_to_hex(&oid)); - ewah_each_bit(ewah, show_path, paths); + len = rle_each_bit((const unsigned char *)cur, remain, show_path, paths); - ewah_free(ewah); cur += len; remain -= len; }