From patchwork Fri Dec 20 22:05:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee via GitGitGadget X-Patchwork-Id: 11306379 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3C064139A for ; Fri, 20 Dec 2019 22:05:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0EE6D2146E for ; Fri, 20 Dec 2019 22:05:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="cAGjSyo/" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727590AbfLTWF2 (ORCPT ); Fri, 20 Dec 2019 17:05:28 -0500 Received: from mail-ed1-f67.google.com ([209.85.208.67]:37752 "EHLO mail-ed1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727539AbfLTWFZ (ORCPT ); Fri, 20 Dec 2019 17:05:25 -0500 Received: by mail-ed1-f67.google.com with SMTP id cy15so9853217edb.4 for ; Fri, 20 Dec 2019 14:05:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=5jXK0NxFrW0RhtPRb+zP4ffc0skGZ1HBnPaormCuV1o=; b=cAGjSyo/OWZYU66aI2VD4YjycdcN7xtaaviLTHc5dpR56BneNeQbYCLdk1HfIzwe1Z jXFk3dhsyHXGEaavBTVNJk7DbXKh3VmcXlUP4k48NcBithhRSHWO5N7P5jdC0FZZcVKb i1zhr4HzXuvHsKH8BjD7dKag9/pwGLx6UYRVzpJKQo8Hx/Vcugc9tphccif+iDMCUHoz Jnzmhs26szFrqRS5GI4PCuLzy0oA7qzk/FPsI4l9Oh1UNXsm6D4lJijUnJpdC7I2qsVg hILrOiE1hflh8gBppOJasGsdTuKfX/8GCNAPwWxOhkRFNiltUKCPlF0UXtw7/J8NZHZX epmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=5jXK0NxFrW0RhtPRb+zP4ffc0skGZ1HBnPaormCuV1o=; b=alJbiPeNrn53COx7edhQp1XSW0Qej61J1+nbv1bpdTWd8PAEtHKM6ZYsAxwrqtrZZL DlKFzgslnM05Q8+w9xjgXXD4Perz3MTcABjiIanorptZjzyIMHMfGtNGhTZBwy4ODdEZ 33044x8TAZrIIKIvg4z+dHR4jzftgDAFZP5zrDgXZ+e37U7YqDzPm0NrNhsMWA9zIqKP hE9FYs11BXZTeVBgvo3TlaD/L/yW5Yw5AqMR50F0+D1jYiw4egKgZfMib+AJ+hbZwnRx wavGY0NzKERzY9wk47d84wfv5+IP1mDwFrWU5IlPOmf/tlFgo/xN1EAxcHtdLhknW4kq MGvQ== X-Gm-Message-State: APjAAAV2t8DDT3HklKGX9/FnSjIycWlkctK9MZYjy8A3RjNcfp9L2vQg QjDcRC3slD9REYVD7UPvt4+PjIUC X-Google-Smtp-Source: APXvYqytVLAXAne0RkwRSVqete5NXMTMFNSSeWfjkfj+IolggZ+NI60ChMn19Hfr+ZMS7+Qx+5dDgA== X-Received: by 2002:a17:906:ce2e:: with SMTP id sd14mr19037426ejb.190.1576879523262; Fri, 20 Dec 2019 14:05:23 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id b13sm89213ejl.5.2019.12.20.14.05.22 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 20 Dec 2019 14:05:22 -0800 (PST) Message-Id: <6bdde5e4f0ceb54546978e3e9cdde00045d45468.1576879520.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Garima Singh via GitGitGadget" Date: Fri, 20 Dec 2019 22:05:12 +0000 Subject: [PATCH 1/9] commit-graph: add --changed-paths option to write Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: stolee@gmail.com, szeder.dev@gmail.com, jonathantanmy@google.com, jeffhost@microsoft.com, me@ttaylorr.com, peff@peff.net, Junio C Hamano , Garima Singh Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Garima Singh Add --changed-paths option to git commit-graph write. This option will soon allow users to compute bloom filters for the paths changed between a commit and its first significant parent, and write this information into the commit-graph file. Note: This commit does not change any behavior. It only introduces the option and passes down the appropriate flag to the commit-graph. RFC Notes: 1. We named the option --changed-paths to capture what the option does, instead of how it does it. The current implementation does this using bloom filters. We believe using --changed-paths however keeps the implementation open to other data structures. All thoughts and suggestions for the name and this approach are welcome 2. Currently, a subsequent commit in this series will add tests that exercise this option. I plan to split that test commit across the series as appropriate. Helped-by: Derrick Stolee Signed-off-by: Garima Singh --- Documentation/git-commit-graph.txt | 5 +++++ builtin/commit-graph.c | 9 +++++++-- commit-graph.h | 3 ++- 3 files changed, 14 insertions(+), 3 deletions(-) diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index bcd85c1976..1efe6e5c5a 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -54,6 +54,11 @@ or `--stdin-packs`.) With the `--append` option, include all commits that are present in the existing commit-graph file. + +With the `--changed-paths` option, compute and write information about the +paths changed between a commit and it's first parent. This operation can +take a while on large repositories. It provides significant performance gains +for getting file based history logs with `git log` ++ With the `--split` option, write the commit-graph as a chain of multiple commit-graph files stored in `/info/commit-graphs`. The new commits not already in the commit-graph are added in a new "tip" file. This file diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index e0c6fc4bbf..9bd1e11161 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -9,7 +9,7 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph verify [--object-dir ] [--shallow] [--[no-]progress]"), - N_("git commit-graph write [--object-dir ] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] "), + N_("git commit-graph write [--object-dir ] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--changed-paths] [--[no-]progress] "), NULL }; @@ -19,7 +19,7 @@ static const char * const builtin_commit_graph_verify_usage[] = { }; static const char * const builtin_commit_graph_write_usage[] = { - N_("git commit-graph write [--object-dir ] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] "), + N_("git commit-graph write [--object-dir ] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--changed-paths] [--[no-]progress] "), NULL }; @@ -32,6 +32,7 @@ static struct opts_commit_graph { int split; int shallow; int progress; + int enable_bloom_filters; } opts; static int graph_verify(int argc, const char **argv) @@ -110,6 +111,8 @@ static int graph_write(int argc, const char **argv) N_("start walk at commits listed by stdin")), OPT_BOOL(0, "append", &opts.append, N_("include all commits already in the commit-graph file")), + OPT_BOOL(0, "changed-paths", &opts.enable_bloom_filters, + N_("enable computation for changed paths")), OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")), OPT_BOOL(0, "split", &opts.split, N_("allow writing an incremental commit-graph file")), @@ -143,6 +146,8 @@ static int graph_write(int argc, const char **argv) flags |= COMMIT_GRAPH_WRITE_SPLIT; if (opts.progress) flags |= COMMIT_GRAPH_WRITE_PROGRESS; + if (opts.enable_bloom_filters) + flags |= COMMIT_GRAPH_WRITE_BLOOM_FILTERS; read_replace_refs = 0; diff --git a/commit-graph.h b/commit-graph.h index 7f5c933fa2..952a4b83be 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -76,7 +76,8 @@ enum commit_graph_write_flags { COMMIT_GRAPH_WRITE_PROGRESS = (1 << 1), COMMIT_GRAPH_WRITE_SPLIT = (1 << 2), /* Make sure that each OID in the input is a valid commit OID. */ - COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3) + COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3), + COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 4) }; struct split_commit_graph_opts { From patchwork Fri Dec 20 22:05:13 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee via GitGitGadget X-Patchwork-Id: 11306395 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 704A06C1 for ; Fri, 20 Dec 2019 22:05:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4432B21655 for ; Fri, 20 Dec 2019 22:05:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="snVPbPIV" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727584AbfLTWF2 (ORCPT ); Fri, 20 Dec 2019 17:05:28 -0500 Received: from mail-ed1-f67.google.com ([209.85.208.67]:46994 "EHLO mail-ed1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727473AbfLTWF1 (ORCPT ); Fri, 20 Dec 2019 17:05:27 -0500 Received: by mail-ed1-f67.google.com with SMTP id m8so9821976edi.13 for ; Fri, 20 Dec 2019 14:05:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=n+fhWLRRV8jUrq3jgjTXCVMg0G/wpmNbhsUWvsWJmdg=; b=snVPbPIVGAm6s3mv44QCDyyiYbDuhFjgsk1Xwzbq8rxP1JscFS3b4SXd/2ESTKquz5 FiVZnFWSeDpWCRt4YgwxTBOMTax9I3Qzv5KLTvnQWmedj49jylh3nF+dXfJJTUayi8Oz k/b2lM3UwTd0TRrsfNskyjCECtZCFhncsklq+g+jJ5aCaU2kPscFoU3W+p3qLG6uuNqY 8CO4GAerYc79K/jesq7kYzyWk88JnQOr9aXDduiDpQEhT4eFnOm/GTCHtSYL4JVaBPH2 motEqV8BagZ2wRlBsUN8R6esn7F62FTuwJ0pIjvvHub0ZxyDpMFKwhUzrIR4Dd5CsX5i PZgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=n+fhWLRRV8jUrq3jgjTXCVMg0G/wpmNbhsUWvsWJmdg=; b=X7xvP20Z4Ee4yf8BeQwC1rb8i1n71xX4+ht6uqrLl/HV5suVfeM2pU4wOn0BAOpmWH FzbgRIa6U/exaseIVkJQS3Jw1cV5ImXlk/Nr6p8I1KDMd9CrG7/T/1mPGi/w3RX5wwXS dvgA18kp5F+7JvenXLSRKtpXyDrXDVVZR+ndBFsyKi1xE26f6Zy0reYS4s197hd0tXSD mRZrK0IIvfJvtwe4x3RSa3NsN+AxF7uMHa5jfPfuINJdrrlFXg8E6E7j3SD5CR0Xq/lO qPMcxjQrU1FCEMLpzNSsXMrmWALmy1hsFcjNSBD4Y7E7DNgqAtYVD9QKFl2o9Y647JuA TC2g== X-Gm-Message-State: APjAAAWUeVnBaKBxKR4IJfeN5TssmGYjA208zKR6H3CLVhTGhNx3rSF2 MY2FlL/wu/fEPWLfLNp+brKsK2wO X-Google-Smtp-Source: APXvYqx+79Cu8GhthflJMCkm+Oo2gofjqN/EzD8MsBHeIrIjPWq6KVNUXiLyi79cjpfoCgdhpBLrGA== X-Received: by 2002:a50:e108:: with SMTP id h8mr18910851edl.196.1576879524043; Fri, 20 Dec 2019 14:05:24 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id e2sm1248520eja.37.2019.12.20.14.05.23 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 20 Dec 2019 14:05:23 -0800 (PST) Message-Id: In-Reply-To: References: From: "Garima Singh via GitGitGadget" Date: Fri, 20 Dec 2019 22:05:13 +0000 Subject: [PATCH 2/9] commit-graph: write changed paths bloom filters Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: stolee@gmail.com, szeder.dev@gmail.com, jonathantanmy@google.com, jeffhost@microsoft.com, me@ttaylorr.com, peff@peff.net, Junio C Hamano , Garima Singh Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Garima Singh The changed path bloom filters help determine which paths changed between a commit and its first parent. We already have the "--changed-paths" option for the "git commit-graph write" subcommand, now actually compute them under that option. The COMMIT_GRAPH_WRITE_BLOOM_FILTERS flag enables this computation. RFC Notes: Here are some details about the implementation and I would love to know your thoughts and suggestions for improvements here. For details on what bloom filters are and how they work, please refer to Dr. Derrick Stolee's blog post [1]. [1] https://devblogs.microsoft.com/devops/super-charging-the-git-commit-graph-iv-bloom-filters/ 1. The implementation sticks to the recommended values of 7 and 10 for the number of hashes and the size of each entry, as described in the blog. The implementation while not completely open to it at the moment, is flexible enough to allow for tweaking these settings in the future. Note: The performance gains we have observed so far with these values is significant enough to not that we did not need to tweak these settings. The cover letter of this series has the details and the commit where we have git log use bloom filters. 2. As described in the blog and the linked technical paper therin, we do not need 7 independent hashing functions. We use the Murmur3 hashing scheme - seed it twice and then combine those to procure an arbitrary number of hash values. 3. The filters are sized according to the number of changes in the each commit, with minimum size of one 64 bit word. [Call for advice] We currently cap writing bloom filters for commits with atmost 512 changed files. In the current implementation, we compute the diff, and then just throw it away once we see it has more than 512 changes. Any suggestiongs on how to reduce the work we are doing in this case are more than welcome. [Call for advice] Would the git community like this commit to be split up into more granular commits? This commit could possibly be split out further with the bloom.c code in its own commit, to be used by the commit-graph in a subsequent commit. While I prefer it being contained in one commit this way, I am open to suggestions. [Call for advice] Would a technical document explaining the exact details of the bloom filter implemenation and the hashing calculations be helpful? I will be adding details into Documentation/technical/commit-graph-format.txt, but the bloom filter code is an independent subsystem and could be used outside of the commit-graph feature. Is it worth a separate document, or should we apply "You Ain't Gonna Need It" principles? [Call for advice] I plan to add unit tests for bloom.c, specifically to ensure that the hash algorithm and bloom key calculations are stable across versions. Signed-off-by: Garima Singh Helped-by: Derrick Stolee --- Makefile | 1 + bloom.c | 201 +++++++++++++++++++++++++++++++++++++++++++++++++ bloom.h | 46 +++++++++++ commit-graph.c | 32 +++++++- 4 files changed, 279 insertions(+), 1 deletion(-) create mode 100644 bloom.c create mode 100644 bloom.h diff --git a/Makefile b/Makefile index 42a061d3fb..9d5e26f5d6 100644 --- a/Makefile +++ b/Makefile @@ -838,6 +838,7 @@ LIB_OBJS += base85.o LIB_OBJS += bisect.o LIB_OBJS += blame.o LIB_OBJS += blob.o +LIB_OBJS += bloom.o LIB_OBJS += branch.o LIB_OBJS += bulk-checkin.o LIB_OBJS += bundle.o diff --git a/bloom.c b/bloom.c new file mode 100644 index 0000000000..08328cc381 --- /dev/null +++ b/bloom.c @@ -0,0 +1,201 @@ +#include "git-compat-util.h" +#include "bloom.h" +#include "commit-graph.h" +#include "object-store.h" +#include "diff.h" +#include "diffcore.h" +#include "revision.h" +#include "hashmap.h" + +#define BITS_PER_BLOCK 64 + +define_commit_slab(bloom_filter_slab, struct bloom_filter); + +struct bloom_filter_slab bloom_filters; + +struct pathmap_hash_entry { + struct hashmap_entry entry; + const char path[FLEX_ARRAY]; +}; + +static uint32_t rotate_right(uint32_t value, int32_t count) +{ + uint32_t mask = 8 * sizeof(uint32_t) - 1; + count &= mask; + return ((value >> count) | (value << ((-count) & mask))); +} + +static uint32_t seed_murmur3(uint32_t seed, const char *data, int len) +{ + const uint32_t c1 = 0xcc9e2d51; + const uint32_t c2 = 0x1b873593; + const int32_t r1 = 15; + const int32_t r2 = 13; + const uint32_t m = 5; + const uint32_t n = 0xe6546b64; + int i; + uint32_t k1 = 0; + const char *tail; + + int len4 = len / sizeof(uint32_t); + + const uint32_t *blocks = (const uint32_t*)data; + + uint32_t k; + for (i = 0; i < len4; i++) + { + k = blocks[i]; + k *= c1; + k = rotate_right(k, r1); + k *= c2; + + seed ^= k; + seed = rotate_right(seed, r2) * m + n; + } + + tail = (data + len4 * sizeof(uint32_t)); + + switch (len & (sizeof(uint32_t) - 1)) + { + case 3: + k1 ^= ((uint32_t)tail[2]) << 16; + /*-fallthrough*/ + case 2: + k1 ^= ((uint32_t)tail[1]) << 8; + /*-fallthrough*/ + case 1: + k1 ^= ((uint32_t)tail[0]) << 0; + k1 *= c1; + k1 = rotate_right(k1, r1); + k1 *= c2; + seed ^= k1; + break; + } + + seed ^= (uint32_t)len; + seed ^= (seed >> 16); + seed *= 0x85ebca6b; + seed ^= (seed >> 13); + seed *= 0xc2b2ae35; + seed ^= (seed >> 16); + + return seed; +} + +static inline uint64_t get_bitmask(uint32_t pos) +{ + return ((uint64_t)1) << (pos & (BITS_PER_BLOCK - 1)); +} + +void fill_bloom_key(const char *data, + int len, + struct bloom_key *key, + struct bloom_filter_settings *settings) +{ + int i; + uint32_t seed0 = 0x293ae76f; + uint32_t seed1 = 0x7e646e2c; + + uint32_t hash0 = seed_murmur3(seed0, data, len); + uint32_t hash1 = seed_murmur3(seed1, data, len); + + key->hashes = (uint32_t *)xcalloc(settings->num_hashes, sizeof(uint32_t)); + for (i = 0; i < settings->num_hashes; i++) + key->hashes[i] = hash0 + i * hash1; +} + +static void add_key_to_filter(struct bloom_key *key, + struct bloom_filter *filter, + struct bloom_filter_settings *settings) +{ + int i; + uint64_t mod = filter->len * BITS_PER_BLOCK; + + for (i = 0; i < settings->num_hashes; i++) { + uint64_t hash_mod = key->hashes[i] % mod; + uint64_t block_pos = hash_mod / BITS_PER_BLOCK; + + filter->data[block_pos] |= get_bitmask(hash_mod); + } +} + +void load_bloom_filters(void) +{ + init_bloom_filter_slab(&bloom_filters); +} + +struct bloom_filter *get_bloom_filter(struct repository *r, + struct commit *c) +{ + struct bloom_filter *filter; + struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS; + int i; + struct rev_info revs; + const char *revs_argv[] = {NULL, "HEAD", NULL}; + + filter = bloom_filter_slab_at(&bloom_filters, c); + init_revisions(&revs, NULL); + revs.diffopt.flags.recursive = 1; + + setup_revisions(2, revs_argv, &revs, NULL); + + if (c->parents) + diff_tree_oid(&c->parents->item->object.oid, &c->object.oid, "", &revs.diffopt); + else + diff_tree_oid(NULL, &c->object.oid, "", &revs.diffopt); + diffcore_std(&revs.diffopt); + + if (diff_queued_diff.nr <= 512) { + struct hashmap pathmap; + struct pathmap_hash_entry* e; + struct hashmap_iter iter; + hashmap_init(&pathmap, NULL, NULL, 0); + + for (i = 0; i < diff_queued_diff.nr; i++) { + const char* path = diff_queued_diff.queue[i]->two->path; + const char* p = path; + + /* + * Add each leading directory of the changed file, i.e. for + * 'dir/subdir/file' add 'dir' and 'dir/subdir' as well, so + * the Bloom filter could be used to speed up commands like + * 'git log dir/subdir', too. + * + * Note that directories are added without the trailing '/'. + */ + do { + char* last_slash = strrchr(p, '/'); + + FLEX_ALLOC_STR(e, path, path); + hashmap_entry_init(&e->entry, strhash(p)); + hashmap_add(&pathmap, &e->entry); + + if (!last_slash) + last_slash = (char*)p; + *last_slash = '\0'; + + } while (*p); + + diff_free_filepair(diff_queued_diff.queue[i]); + } + + filter->len = (hashmap_get_size(&pathmap) * settings.bits_per_entry + BITS_PER_BLOCK - 1) / BITS_PER_BLOCK; + filter->data = xcalloc(filter->len, sizeof(uint64_t)); + + hashmap_for_each_entry(&pathmap, &iter, e, entry) { + struct bloom_key key; + fill_bloom_key(e->path, strlen(e->path), &key, &settings); + add_key_to_filter(&key, filter, &settings); + } + + hashmap_free_entries(&pathmap, struct pathmap_hash_entry, entry); + } else { + filter->data = NULL; + filter->len = 0; + } + + free(diff_queued_diff.queue); + DIFF_QUEUE_CLEAR(&diff_queued_diff); + + return filter; +} \ No newline at end of file diff --git a/bloom.h b/bloom.h new file mode 100644 index 0000000000..ba8ae70b67 --- /dev/null +++ b/bloom.h @@ -0,0 +1,46 @@ +#ifndef BLOOM_H +#define BLOOM_H + +struct commit; +struct repository; + +struct bloom_filter_settings { + uint32_t hash_version; + uint32_t num_hashes; + uint32_t bits_per_entry; +}; + +#define DEFAULT_BLOOM_FILTER_SETTINGS { 1, 7, 10 } + +/* + * A bloom_filter struct represents a data segment to + * use when testing hash values. The 'len' member + * dictates how many uint64_t entries are stored in + * 'data'. + */ +struct bloom_filter { + uint64_t *data; + int len; +}; + +/* + * A bloom_key represents the k hash values for a + * given hash input. These can be precomputed and + * stored in a bloom_key for re-use when testing + * against a bloom_filter. + */ +struct bloom_key { + uint32_t *hashes; +}; + +void load_bloom_filters(void); + +struct bloom_filter *get_bloom_filter(struct repository *r, + struct commit *c); + +void fill_bloom_key(const char *data, + int len, + struct bloom_key *key, + struct bloom_filter_settings *settings); + +#endif diff --git a/commit-graph.c b/commit-graph.c index e771394aff..61e60ff98a 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -16,6 +16,7 @@ #include "hashmap.h" #include "replace-object.h" #include "progress.h" +#include "bloom.h" #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */ #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ @@ -794,9 +795,11 @@ struct write_commit_graph_context { unsigned append:1, report_progress:1, split:1, - check_oids:1; + check_oids:1, + bloom:1; const struct split_commit_graph_opts *split_opts; + uint32_t total_bloom_filter_size; }; static void write_graph_chunk_fanout(struct hashfile *f, @@ -1139,6 +1142,28 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) stop_progress(&ctx->progress); } +static void compute_bloom_filters(struct write_commit_graph_context *ctx) +{ + int i; + struct progress *progress = NULL; + + load_bloom_filters(); + + if (ctx->report_progress) + progress = start_progress( + _("Computing commit diff Bloom filters"), + ctx->commits.nr); + + for (i = 0; i < ctx->commits.nr; i++) { + struct commit *c = ctx->commits.list[i]; + struct bloom_filter *filter = get_bloom_filter(ctx->r, c); + ctx->total_bloom_filter_size += sizeof(uint64_t) * filter->len; + display_progress(progress, i + 1); + } + + stop_progress(&progress); +} + static int add_ref_to_list(const char *refname, const struct object_id *oid, int flags, void *cb_data) @@ -1791,6 +1816,8 @@ int write_commit_graph(const char *obj_dir, ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; ctx->check_oids = flags & COMMIT_GRAPH_WRITE_CHECK_OIDS ? 1 : 0; ctx->split_opts = split_opts; + ctx->bloom = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0; + ctx->total_bloom_filter_size = 0; if (ctx->split) { struct commit_graph *g; @@ -1885,6 +1912,9 @@ int write_commit_graph(const char *obj_dir, compute_generation_numbers(ctx); + if (ctx->bloom) + compute_bloom_filters(ctx); + res = write_commit_graph_file(ctx); if (ctx->split) From patchwork Fri Dec 20 22:05:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee via GitGitGadget X-Patchwork-Id: 11306381 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E68746C1 for ; Fri, 20 Dec 2019 22:05:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C363E218AC for ; Fri, 20 Dec 2019 22:05:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="jtaXD3pQ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727602AbfLTWF3 (ORCPT ); Fri, 20 Dec 2019 17:05:29 -0500 Received: from mail-ed1-f67.google.com ([209.85.208.67]:33196 "EHLO mail-ed1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727567AbfLTWF1 (ORCPT ); Fri, 20 Dec 2019 17:05:27 -0500 Received: by mail-ed1-f67.google.com with SMTP id r21so9853238edq.0 for ; Fri, 20 Dec 2019 14:05:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=gOPv5Og/2dZIAjomx53PBGA50t4FG/TL63zOrdFjvhQ=; b=jtaXD3pQQrfWOXCwQ+NypOHgB0ImkdNYESmplymLFlvQaVar/hI+VbeiczIJAtJ5Xd 1BvIlWn3OxlMhMfSDXoJ0L9cHs+kJ4py3FvTEsorVniXq66YX+c/7LHqkXEewIdrerFv Rh0quaGjbxpkOTm8pU/XlCO4AGw/I9BRXI+Uszv+YstmKVHav3sqI48BoyH89sabOttl 21NdDS2E3YBJTmPh0VQfYx4KeWqlyFeZDeZwATdxaXHmfVH5ZWJJSCvy8/GHl4hwVTZ9 7MegW6JxO3T18Oc0xTIlWndke+OXSNEL3imVij+QfxxcPOdJo5vpVz+GpLkfdpB7Kx8E na7Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=gOPv5Og/2dZIAjomx53PBGA50t4FG/TL63zOrdFjvhQ=; b=oZKjDyvKVMLi4GaK+clZn1kLKKZmGQp0vH37FjrbUwTxiubOpHSEJOZ42k7/frLZ4Z cj+1vZjCClWiH1X80yMe0jGXrpa1dL129rbTwkZSwuiJib3mI2nDTkFkTw5wRag5S+lO ATnU6D+69lBaB4MwOMPHpnXKRQnDcH7heeTAIUUifbOgXFxeZKRBSayqAmLbo9W1mFs1 TgJa/dpT5xWIpkl36ZwIKt5PibOWkKZxdU+YulTNw7sKuxGcO/q3yNrAgICjGPpzpITh nnYjdlPYtKsu5qyJa+GSWijiYRzkTm/GIwVrIWvOqpgCWDPflxvW26KwlquAGBfkrJQ6 vckQ== X-Gm-Message-State: APjAAAWm3C1jHifm8ylYkBfPympjDbasfchcovimSelVycdWnN6cSEQT 6N3NBJBfYsidUwDYvf18XLnynfj+ X-Google-Smtp-Source: APXvYqynIcMOVFcPDzl9DzeWHxUWVqJzJ+hr8dxg4zhSRPi7NVS+5UgS8PBVeClh3zWn+KrxUIQY+g== X-Received: by 2002:a05:6402:3046:: with SMTP id bu6mr18415908edb.139.1576879524948; Fri, 20 Dec 2019 14:05:24 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id s18sm1218948ejj.86.2019.12.20.14.05.24 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 20 Dec 2019 14:05:24 -0800 (PST) Message-Id: In-Reply-To: References: From: "Garima Singh via GitGitGadget" Date: Fri, 20 Dec 2019 22:05:14 +0000 Subject: [PATCH 3/9] commit-graph: use MAX_NUM_CHUNKS Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: stolee@gmail.com, szeder.dev@gmail.com, jonathantanmy@google.com, jeffhost@microsoft.com, me@ttaylorr.com, peff@peff.net, Junio C Hamano , Garima Singh Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Garima Singh This is a minor cleanup to make it easier to change the number of chunks being written to the commit-graph in the future. Signed-off-by: Garima Singh --- commit-graph.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 61e60ff98a..8c4941eeaa 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -24,6 +24,7 @@ #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */ #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */ #define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */ +#define MAX_NUM_CHUNKS 5 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16) @@ -1381,8 +1382,8 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) int fd; struct hashfile *f; struct lock_file lk = LOCK_INIT; - uint32_t chunk_ids[6]; - uint64_t chunk_offsets[6]; + uint32_t chunk_ids[MAX_NUM_CHUNKS + 1]; + uint64_t chunk_offsets[MAX_NUM_CHUNKS + 1]; const unsigned hashsz = the_hash_algo->rawsz; struct strbuf progress_title = STRBUF_INIT; int num_chunks = 3; From patchwork Fri Dec 20 22:05:15 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee via GitGitGadget X-Patchwork-Id: 11306383 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 806501892 for ; Fri, 20 Dec 2019 22:05:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5F79F218AC for ; Fri, 20 Dec 2019 22:05:30 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="u77WK7cH" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727600AbfLTWF2 (ORCPT ); Fri, 20 Dec 2019 17:05:28 -0500 Received: from mail-ed1-f68.google.com ([209.85.208.68]:35527 "EHLO mail-ed1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727569AbfLTWF1 (ORCPT ); Fri, 20 Dec 2019 17:05:27 -0500 Received: by mail-ed1-f68.google.com with SMTP id f8so9859028edv.2 for ; Fri, 20 Dec 2019 14:05:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=nAlSjZATuzpSGNPDUtmJz/VIFttWV/K0yn7P/eWBBak=; b=u77WK7cH/lO5zUMiRK+NouwTk9pe7D67/2JnZw7BWnkiS9cYTNbzh9v5I5rIKLRRQ0 88YHTGjD3xmsMNn7VrISLtUELsuHYwk7JlFs21kWyZlmv2HOQXSDez4RSkNxBBkmdduT dNFzzcYK2Qx7FKqZvqYEaKIfFXVg2zjNTQIf0dj2FJY45hggGku+4SBnF5dEZBcBuq6S 1H2rtaQnleqt9GwlU2KnET/wqlVhhVJIASBzgfPtP0Iyh2GjnUOek4aqERgaG2KvMqV2 vuQsodpBqGmWUC8xrCzXmE+atMEb8BZBIFSLohKNvRBGHWki7RfJZeO9h5wOficda+FY YSkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=nAlSjZATuzpSGNPDUtmJz/VIFttWV/K0yn7P/eWBBak=; b=qbd4jzyClmT176+1OX81gKdp1KFwsfktyOfPPRXjXnfOOKn/tfBassf6RyHJXjpww+ DSsBqKCwsssY659tcB+jQ6ceCJT0OsR7Dx+21JyZQ67nZxdf4D8z8sl8Gu5h7MSlbHqD YmeDLhG/twZ6GsiuYp10dB8LjW8Bhn7pc1RD9y52umzt8S/XWivcAY7GTHy6S+L0l6vc 2Z/Ppe/fhdUjpdVbwK+yrAGpyfGBZEzJC9Zy30CmGSuKJPqkSvMvgP+tgxNhOlO05ZM1 2FJB5NEND9f4h3lVEO1TkL259W1aVDwZE39qO/SJ+IoOrxIaS7IY5jJX+B74L3N0w5iM EM8A== X-Gm-Message-State: APjAAAWK2z4FcXZpKFq+YKV2s2flRVO6Yo500pL4YufdxtVpy5TGEoj2 QQoC8i+RlNIfHePBgUURbF4UWhIk X-Google-Smtp-Source: APXvYqyLbV0+ckuhJdeqi1GUIs4ibMuvu5v9IVQlauqQR5c+pJr2lUi6RXh6/AWYorVIYVsnmBah0Q== X-Received: by 2002:a17:906:3798:: with SMTP id n24mr18642547ejc.15.1576879525747; Fri, 20 Dec 2019 14:05:25 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id y6sm1115208edm.84.2019.12.20.14.05.25 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 20 Dec 2019 14:05:25 -0800 (PST) Message-Id: <3182a11f7c07af834ba71dc7861742458754eb91.1576879520.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Garima Singh via GitGitGadget" Date: Fri, 20 Dec 2019 22:05:15 +0000 Subject: [PATCH 4/9] commit-graph: document bloom filter format Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: stolee@gmail.com, szeder.dev@gmail.com, jonathantanmy@google.com, jeffhost@microsoft.com, me@ttaylorr.com, peff@peff.net, Junio C Hamano , Garima Singh Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Garima Singh Update the technical documentation for commit-graph-format with BIDX and BDAT chunk information. RFC Notes: 1. [Call for advice] We specifically mention that we are using bloom filters in this technical document. Should this document also be made open to other data structures in the future, with versioning information? 2. [Call for advice] We are also not describing the explicit nature of how we store the bloom filter binary data. Would it be useful to document details about the hash algorithm, the number of hashes and the specific seed values we are using in a separate document, or perhaps in a separate section in this document? Helped-by: Derrick Stolee Signed-off-by: Garima Singh --- Documentation/technical/commit-graph-format.txt | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index a4f17441ae..6497f19f08 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -17,6 +17,9 @@ metadata, including: - The parents of the commit, stored using positional references within the graph file. +- The bloom filter of the commit carrying the paths that were changed between + the commit and it's first parent. + These positional references are stored as unsigned 32-bit integers corresponding to the array position within the list of commit OIDs. Due to some special constants we use to track parents, we can store at most @@ -93,6 +96,20 @@ CHUNK DATA: positions for the parents until reaching a value with the most-significant bit on. The other bits correspond to the position of the last parent. + Bloom Filter Index (ID: {'B', 'I', 'D', 'X'}) [Optional] + For each commit we store the offset of its bloom filter in the BDAT chunk + as follows: + BIDX[i] = number of 8-byte words in all the bloom filters from commit 0 to + commit i (inclusive) + + Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional] + * It starts with three 32 bit integers for the + - version of the hash algorithm being used + - the number of hashes used in the computation + - the number of bits per entry + * The rest of the chunk is the concatenation of all the computed bloom + filters for the commits in lexicographic order. + Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional] This list of H-byte hashes describe a set of B commit-graph files that form a commit-graph chain. The graph position for the ith commit in this From patchwork Fri Dec 20 22:05:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee via GitGitGadget X-Patchwork-Id: 11306391 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AC8C517EF for ; Fri, 20 Dec 2019 22:05:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 818CB218AC for ; Fri, 20 Dec 2019 22:05:36 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="TtzGNp5p" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727611AbfLTWFc (ORCPT ); Fri, 20 Dec 2019 17:05:32 -0500 Received: from mail-ed1-f67.google.com ([209.85.208.67]:35528 "EHLO mail-ed1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727572AbfLTWF3 (ORCPT ); Fri, 20 Dec 2019 17:05:29 -0500 Received: by mail-ed1-f67.google.com with SMTP id f8so9859059edv.2 for ; Fri, 20 Dec 2019 14:05:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=jvzyRr0y4T3PrWPI48LkRBUpMJffE0pXek+8LCZXFTc=; b=TtzGNp5pGt790CTYKRl/hlRJtZ/9dI1EWJqDzZEY7P+X3Os4CgzJU59GJtvnhmwVrg AoHq5Tbo6NeBHGhRMJ2/YdBqllQkaIIozT5tFDa+0MCgLBzZ9XlpZVYi+hcoPg+GUQON l3RBKKo1kquijDorgDJ+NGoDRLh36EykuSDxIHGWSGn23EhR45utCD1BSlAdhN3dH5By TIVeAGIjjJZm8N23YY/996Ca91+HBlDx788nPvpAZI/RZLdzPWA4YT1/cXtJpXA30U9k QcCH9jc2OAhJnRkjwFZ700gJ/c++E/aeK8B1B3Jw36ImH7cjw1K4pMu0pOyTZ8WfM6pt Pxzw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=jvzyRr0y4T3PrWPI48LkRBUpMJffE0pXek+8LCZXFTc=; b=A1WK9prSp/YTNyIQdF5M7WyshU9CtbiERjxvETe18GHaAhiOO+HFtR7kyYcRB7dxrm vXQNTuJyRuzCbyxX9nOMTCsV2RCE30fePPO6yaOLh+rXalNpp3w8/0EHvgQwy241DK97 lQGb8MLEHqPPOtBX95eGgmsy0iVKL4bPavHpzNupQd5F+RxEKWo+fKy+PBkgF9nQMWGf u48GnmhHC44x5jJES9yR8q11mIR8gRfVJ0QezmKeYHmVEgryDSgQj40nZAvGA1cgQzD6 PGDztf1bAM52X5W7ZW32bO1p+eSRSuaMQ89k1YJL9bR4s5uFYs9OvhR9aYu/A5nMKYfq za1g== X-Gm-Message-State: APjAAAV/xiMCU5mF6SNTO65H4YBkUTKtjVFU1d8XvVx5Ymt9/YntxTpf 5OPO8qh1xopBtxzOfytrXb03QeHf X-Google-Smtp-Source: APXvYqw0xtKQgRJUqZuZ4N2vrd0CjUQkqpq7+uTpz6JuHs7qhiP29TpWyrfzV56VJ3VQ2fiRCd/g9g== X-Received: by 2002:a17:906:f286:: with SMTP id gu6mr18517322ejb.146.1576879526580; Fri, 20 Dec 2019 14:05:26 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id va15sm1268038ejb.18.2019.12.20.14.05.25 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 20 Dec 2019 14:05:26 -0800 (PST) Message-Id: <7648021072ca11153ac65c90f0ebed5973f20e1a.1576879520.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Garima Singh via GitGitGadget" Date: Fri, 20 Dec 2019 22:05:16 +0000 Subject: [PATCH 5/9] commit-graph: write changed path bloom filters to commit-graph file. Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: stolee@gmail.com, szeder.dev@gmail.com, jonathantanmy@google.com, jeffhost@microsoft.com, me@ttaylorr.com, peff@peff.net, Junio C Hamano , Garima Singh Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Garima Singh Write bloom filters to the commit-graph using the format described in Documentation/technical/commit-graph-format.txt Helped-by: Derrick Stolee Signed-off-by: Garima Singh --- commit-graph.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++- commit-graph.h | 5 ++++ 2 files changed, 85 insertions(+), 1 deletion(-) diff --git a/commit-graph.c b/commit-graph.c index 8c4941eeaa..def2ade166 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -24,7 +24,9 @@ #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */ #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */ #define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */ -#define MAX_NUM_CHUNKS 5 +#define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */ +#define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */ +#define MAX_NUM_CHUNKS 7 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16) @@ -282,6 +284,32 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, chunk_repeated = 1; else graph->chunk_base_graphs = data + chunk_offset; + break; + + case GRAPH_CHUNKID_BLOOMINDEXES: + if (graph->chunk_bloom_indexes) + chunk_repeated = 1; + else + graph->chunk_bloom_indexes = data + chunk_offset; + break; + + case GRAPH_CHUNKID_BLOOMDATA: + if (graph->chunk_bloom_data) + chunk_repeated = 1; + else { + uint32_t hash_version; + graph->chunk_bloom_data = data + chunk_offset; + hash_version = get_be32(data + chunk_offset); + + if (hash_version != 1) + break; + + graph->settings = xmalloc(sizeof(struct bloom_filter_settings)); + graph->settings->hash_version = hash_version; + graph->settings->num_hashes = get_be32(data + chunk_offset + 4); + graph->settings->bits_per_entry = get_be32(data + chunk_offset + 8); + } + break; } if (chunk_repeated) { @@ -996,6 +1024,39 @@ static void write_graph_chunk_extra_edges(struct hashfile *f, } } +static void write_graph_chunk_bloom_indexes(struct hashfile *f, + struct write_commit_graph_context *ctx) +{ + struct commit **list = ctx->commits.list; + struct commit **last = ctx->commits.list + ctx->commits.nr; + uint32_t cur_pos = 0; + + while (list < last) { + struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); + cur_pos += filter->len; + hashwrite_be32(f, cur_pos); + list++; + } +} + +static void write_graph_chunk_bloom_data(struct hashfile *f, + struct write_commit_graph_context *ctx, + struct bloom_filter_settings *settings) +{ + struct commit **first = ctx->commits.list; + struct commit **last = ctx->commits.list + ctx->commits.nr; + + hashwrite_be32(f, settings->hash_version); + hashwrite_be32(f, settings->num_hashes); + hashwrite_be32(f, settings->bits_per_entry); + + while (first < last) { + struct bloom_filter *filter = get_bloom_filter(ctx->r, *first); + hashwrite(f, filter->data, filter->len * sizeof(uint64_t)); + first++; + } +} + static int oid_compare(const void *_a, const void *_b) { const struct object_id *a = (const struct object_id *)_a; @@ -1388,6 +1449,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) struct strbuf progress_title = STRBUF_INIT; int num_chunks = 3; struct object_id file_hash; + struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; if (ctx->split) { struct strbuf tmp_file = STRBUF_INIT; @@ -1432,6 +1494,12 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) chunk_ids[num_chunks] = GRAPH_CHUNKID_EXTRAEDGES; num_chunks++; } + if (ctx->bloom) { + chunk_ids[num_chunks] = GRAPH_CHUNKID_BLOOMINDEXES; + num_chunks++; + chunk_ids[num_chunks] = GRAPH_CHUNKID_BLOOMDATA; + num_chunks++; + } if (ctx->num_commit_graphs_after > 1) { chunk_ids[num_chunks] = GRAPH_CHUNKID_BASE; num_chunks++; @@ -1450,6 +1518,13 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) 4 * ctx->num_extra_edges; num_chunks++; } + if (ctx->bloom) { + chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] + sizeof(uint32_t) * ctx->commits.nr; + num_chunks++; + + chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] + sizeof(uint32_t) * 3 + ctx->total_bloom_filter_size; + num_chunks++; + } if (ctx->num_commit_graphs_after > 1) { chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] + hashsz * (ctx->num_commit_graphs_after - 1); @@ -1487,6 +1562,10 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) write_graph_chunk_data(f, hashsz, ctx); if (ctx->num_extra_edges) write_graph_chunk_extra_edges(f, ctx); + if (ctx->bloom) { + write_graph_chunk_bloom_indexes(f, ctx); + write_graph_chunk_bloom_data(f, ctx, &bloom_settings); + } if (ctx->num_commit_graphs_after > 1 && write_graph_chunk_base(f, ctx)) { return -1; diff --git a/commit-graph.h b/commit-graph.h index 952a4b83be..2202ad91ae 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -10,6 +10,7 @@ #define GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" struct commit; +struct bloom_filter_settings; char *get_commit_graph_filename(const char *obj_dir); int open_commit_graph(const char *graph_file, int *fd, struct stat *st); @@ -58,6 +59,10 @@ struct commit_graph { const unsigned char *chunk_commit_data; const unsigned char *chunk_extra_edges; const unsigned char *chunk_base_graphs; + const unsigned char *chunk_bloom_indexes; + const unsigned char *chunk_bloom_data; + + struct bloom_filter_settings *settings; }; struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st); From patchwork Fri Dec 20 22:05:17 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee via GitGitGadget X-Patchwork-Id: 11306393 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1C0FA139A for ; Fri, 20 Dec 2019 22:05:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E4DFB218AC for ; Fri, 20 Dec 2019 22:05:36 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="N/pcDop3" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727585AbfLTWFb (ORCPT ); Fri, 20 Dec 2019 17:05:31 -0500 Received: from mail-ed1-f67.google.com ([209.85.208.67]:37758 "EHLO mail-ed1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727473AbfLTWFa (ORCPT ); Fri, 20 Dec 2019 17:05:30 -0500 Received: by mail-ed1-f67.google.com with SMTP id cy15so9853334edb.4 for ; Fri, 20 Dec 2019 14:05:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=9hkApjfOp2Gt6TekmUQkVxOm6TDztfxvSwVC+YHKR44=; b=N/pcDop3Mq1skxBgczMcWE0zCSCa2RZ7yQrqK3ja748pMZDn0waeNeiHlvfo/v+bhb PzlqNA0yjB7z7eI4g9Y9LZWl6wBzJzYtaWST5Sk3YLAS+QURKI0oM8j9LLLtmJQhRtJv TZLoNOE9pMCra9wd6TnQhpKyoVqL1wwn/4IvcZzxHLHieTCcFr0SLqPY1bxe4+o9JSkE JErEWwQQYiILOUM5LfHYFs3Ajnj+txiVnsIX6Or2Nue9Pvpt7scxYzn7RjiVlble+bMk jetJyZijO9erzAw2vuInp1I99eYvLMT6HGbeBihnsnnZ1Lr5b6fpM6zQOTDaJEHxKeTL AM9A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=9hkApjfOp2Gt6TekmUQkVxOm6TDztfxvSwVC+YHKR44=; b=cYJVXNVQVgvOdT+M9pVsKCSnNGOtPR5gBmuRWYDmBEthqQAHPmRk7Ploz3p3WWRRyE 2LBdCj7N48GIiDtgnVAzn95OAUZogdOHUHe5hfoKL1uMplsOl/iyxca/rpb02CFIcXkZ NN7k9SCmYu2G1edp72vvLQPSgTXccg5FDVtAuWUMxPBMNMQwlo6d3O2jizzjEVPmp8be RjRwulb2scjaV1mzRQkKmBQUuO8oJ3MZ1vx2CCiP3u5kjH7Pv3ggtnVoBLmRYNOcxW6G PhsgExb8ZodAqoPAeC8gUg/iynu8edicE2piTznlh0WOiRzqJVx2qpKeuwKrOPteh8Ks grDA== X-Gm-Message-State: APjAAAXesNvYksOziFKM0HZBh69rKVc2BTv7J8IDZnnMFkC1AsS0WJEh KC16qEFoPQW93dI0Ifw7bZA6bjxB X-Google-Smtp-Source: APXvYqzsfM0wF1deS3rJABnQyQ7WHMJHJBWdEZy3ptNIffkOS1AEasCzlbtoraRpHdJKZ1Me3j2+SA== X-Received: by 2002:aa7:ce87:: with SMTP id y7mr18740632edv.82.1576879527350; Fri, 20 Dec 2019 14:05:27 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id s19sm1132673edr.55.2019.12.20.14.05.26 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 20 Dec 2019 14:05:26 -0800 (PST) Message-Id: <85bfdfa59c48891343e3eeb740a4b3554405909a.1576879520.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Garima Singh via GitGitGadget" Date: Fri, 20 Dec 2019 22:05:17 +0000 Subject: [PATCH 6/9] commit-graph: test commit-graph write --changed-paths Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: stolee@gmail.com, szeder.dev@gmail.com, jonathantanmy@google.com, jeffhost@microsoft.com, me@ttaylorr.com, peff@peff.net, Junio C Hamano , Garima Singh Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Garima Singh Add tests for the --changed-paths feature when writing commit-graphs. RFC Notes: I plan to split this test across some of the earlier commits as appropriate. Helped-by: Derrick Stolee Signed-off-by: Garima Singh --- t/helper/test-read-graph.c | 4 + t/t5325-commit-graph-bloom.sh | 255 ++++++++++++++++++++++++++++++++++ 2 files changed, 259 insertions(+) create mode 100755 t/t5325-commit-graph-bloom.sh diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c index d2884efe0a..aff597c7a3 100644 --- a/t/helper/test-read-graph.c +++ b/t/helper/test-read-graph.c @@ -45,6 +45,10 @@ int cmd__read_graph(int argc, const char **argv) printf(" commit_metadata"); if (graph->chunk_extra_edges) printf(" extra_edges"); + if (graph->chunk_bloom_indexes) + printf(" bloom_indexes"); + if (graph->chunk_bloom_data) + printf(" bloom_data"); printf("\n"); UNLEAK(graph); diff --git a/t/t5325-commit-graph-bloom.sh b/t/t5325-commit-graph-bloom.sh new file mode 100755 index 0000000000..d7ef0e7fb3 --- /dev/null +++ b/t/t5325-commit-graph-bloom.sh @@ -0,0 +1,255 @@ +#!/bin/sh + +test_description='commit graph with bloom filters' +. ./test-lib.sh + +test_expect_success 'setup repo' ' + git init && + git config core.commitGraph true && + git config gc.writeCommitGraph false && + infodir=".git/objects/info" && + graphdir="$infodir/commit-graphs" && + test_oid_init +' + +graph_read_expect() { + OPTIONAL="" + NUM_CHUNKS=5 + if test ! -z $2 + then + OPTIONAL=" $2" + NUM_CHUNKS=$((NUM_CHUNKS + $(echo "$2" | wc -w))) + fi + cat >expect <<- EOF + header: 43475048 1 1 $NUM_CHUNKS 0 + num_commits: $1 + chunks: oid_fanout oid_lookup commit_metadata bloom_indexes bloom_data + EOF + test-tool read-graph >output && + test_cmp expect output +} + +test_expect_success 'create commits and write commit-graph' ' + for i in $(test_seq 3) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git commit-graph write --reachable --changed-paths && + test_path_is_file $infodir/commit-graph && + graph_read_expect 3 +' + +graph_git_two_modes() { + git -c core.commitGraph=true $1 >output + git -c core.commitGraph=false $1 >expect + test_cmp expect output +} + +graph_git_behavior() { + MSG=$1 + BRANCH=$2 + COMPARE=$3 + test_expect_success "check normal git operations: $MSG" ' + graph_git_two_modes "log --oneline $BRANCH" && + graph_git_two_modes "log --topo-order $BRANCH" && + graph_git_two_modes "log --graph $COMPARE..$BRANCH" && + graph_git_two_modes "branch -vv" && + graph_git_two_modes "merge-base -a $BRANCH $COMPARE" + ' +} + +graph_git_behavior 'graph exists' commits/3 commits/1 + +verify_chain_files_exist() { + for hash in $(cat $1/commit-graph-chain) + do + test_path_is_file $1/graph-$hash.graph || return 1 + done +} + +test_expect_success 'add more commits, and write a new base graph' ' + git reset --hard commits/1 && + for i in $(test_seq 4 5) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git reset --hard commits/2 && + for i in $(test_seq 6 10) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git reset --hard commits/2 && + git merge commits/4 && + git branch merge/1 && + git reset --hard commits/4 && + git merge commits/6 && + git branch merge/2 && + git commit-graph write --reachable --changed-paths && + graph_read_expect 12 +' + +test_expect_success 'fork and fail to base a chain on a commit-graph file' ' + test_when_finished rm -rf fork && + git clone . fork && + ( + cd fork && + rm .git/objects/info/commit-graph && + echo "$(pwd)/../.git/objects" >.git/objects/info/alternates && + test_commit new-commit && + git commit-graph write --reachable --split --changed-paths && + test_path_is_file $graphdir/commit-graph-chain && + test_line_count = 1 $graphdir/commit-graph-chain && + verify_chain_files_exist $graphdir + ) +' + +test_expect_success 'add three more commits, write a tip graph' ' + git reset --hard commits/3 && + git merge merge/1 && + git merge commits/5 && + git merge merge/2 && + git branch merge/3 && + git commit-graph write --reachable --split --changed-paths && + test_path_is_missing $infodir/commit-graph && + test_path_is_file $graphdir/commit-graph-chain && + ls $graphdir/graph-*.graph >graph-files && + test_line_count = 2 graph-files && + verify_chain_files_exist $graphdir +' + +graph_git_behavior 'split commit-graph: merge 3 vs 2' merge/3 merge/2 + +test_expect_success 'add one commit, write a tip graph' ' + test_commit 11 && + git branch commits/11 && + git commit-graph write --reachable --split --changed-paths && + test_path_is_missing $infodir/commit-graph && + test_path_is_file $graphdir/commit-graph-chain && + ls $graphdir/graph-*.graph >graph-files && + test_line_count = 3 graph-files && + verify_chain_files_exist $graphdir +' + +graph_git_behavior 'three-layer commit-graph: commit 11 vs 6' commits/11 commits/6 + +test_expect_success 'add one commit, write a merged graph' ' + test_commit 12 && + git branch commits/12 && + git commit-graph write --reachable --split --changed-paths && + test_path_is_file $graphdir/commit-graph-chain && + test_line_count = 2 $graphdir/commit-graph-chain && + ls $graphdir/graph-*.graph >graph-files && + test_line_count = 2 graph-files && + verify_chain_files_exist $graphdir +' + +graph_git_behavior 'merged commit-graph: commit 12 vs 6' commits/12 commits/6 + +test_expect_success 'create fork and chain across alternate' ' + git clone . fork && + ( + cd fork && + git config core.commitGraph true && + rm -rf $graphdir && + echo "$(pwd)/../.git/objects" >.git/objects/info/alternates && + test_commit 13 && + git branch commits/13 && + git commit-graph write --reachable --split --changed-paths && + test_path_is_file $graphdir/commit-graph-chain && + test_line_count = 3 $graphdir/commit-graph-chain && + ls $graphdir/graph-*.graph >graph-files && + test_line_count = 1 graph-files && + git -c core.commitGraph=true rev-list HEAD >expect && + git -c core.commitGraph=false rev-list HEAD >actual && + test_cmp expect actual && + test_commit 14 && + git commit-graph write --reachable --split --changed-paths --object-dir=.git/objects/ && + test_line_count = 3 $graphdir/commit-graph-chain && + ls $graphdir/graph-*.graph >graph-files && + test_line_count = 1 graph-files + ) +' + +graph_git_behavior 'alternate: commit 13 vs 6' commits/13 commits/6 + +test_expect_success 'test merge stragety constants' ' + git clone . merge-2 && + ( + cd merge-2 && + git config core.commitGraph true && + test_line_count = 2 $graphdir/commit-graph-chain && + test_commit 14 && + git commit-graph write --reachable --split --changed-paths --size-multiple=2 && + test_line_count = 3 $graphdir/commit-graph-chain + + ) && + git clone . merge-10 && + ( + cd merge-10 && + git config core.commitGraph true && + test_line_count = 2 $graphdir/commit-graph-chain && + test_commit 14 && + git commit-graph write --reachable --split --changed-paths --size-multiple=10 && + test_line_count = 1 $graphdir/commit-graph-chain && + ls $graphdir/graph-*.graph >graph-files && + test_line_count = 1 graph-files + ) && + git clone . merge-10-expire && + ( + cd merge-10-expire && + git config core.commitGraph true && + test_line_count = 2 $graphdir/commit-graph-chain && + test_commit 15 && + git commit-graph write --reachable --split --changed-paths --size-multiple=10 --expire-time=1980-01-01 && + test_line_count = 1 $graphdir/commit-graph-chain && + ls $graphdir/graph-*.graph >graph-files && + test_line_count = 3 graph-files + ) && + git clone --no-hardlinks . max-commits && + ( + cd max-commits && + git config core.commitGraph true && + test_line_count = 2 $graphdir/commit-graph-chain && + test_commit 16 && + test_commit 17 && + git commit-graph write --reachable --split --changed-paths --max-commits=1 && + test_line_count = 1 $graphdir/commit-graph-chain && + ls $graphdir/graph-*.graph >graph-files && + test_line_count = 1 graph-files + ) +' + +test_expect_success 'remove commit-graph-chain file after flattening' ' + git clone . flatten && + ( + cd flatten && + test_line_count = 2 $graphdir/commit-graph-chain && + git commit-graph write --reachable && + test_path_is_missing $graphdir/commit-graph-chain && + ls $graphdir >graph-files && + test_must_be_empty graph-files + ) +' + +graph_git_behavior 'graph exists' merge/octopus commits/12 + +test_expect_success 'split across alternate where alternate is not split' ' + git commit-graph write --reachable && + test_path_is_file .git/objects/info/commit-graph && + cp .git/objects/info/commit-graph . && + git clone --no-hardlinks . alt-split && + ( + cd alt-split && + rm -f .git/objects/info/commit-graph && + echo "$(pwd)"/../.git/objects >.git/objects/info/alternates && + test_commit 18 && + git commit-graph write --reachable --split --changed-paths && + test_line_count = 1 $graphdir/commit-graph-chain + ) && + test_cmp commit-graph .git/objects/info/commit-graph +' + +test_done From patchwork Fri Dec 20 22:05:18 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee via GitGitGadget X-Patchwork-Id: 11306387 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 611D96C1 for ; Fri, 20 Dec 2019 22:05:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 35D4D21655 for ; Fri, 20 Dec 2019 22:05:34 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="BU+Y0WKk" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727613AbfLTWFd (ORCPT ); Fri, 20 Dec 2019 17:05:33 -0500 Received: from mail-ed1-f65.google.com ([209.85.208.65]:40711 "EHLO mail-ed1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727601AbfLTWFa (ORCPT ); Fri, 20 Dec 2019 17:05:30 -0500 Received: by mail-ed1-f65.google.com with SMTP id b8so9837374edx.7 for ; Fri, 20 Dec 2019 14:05:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=ytFH1tokHe1sswnBkZvQhsRyP6lKPRoIEjFKXG4uLOM=; b=BU+Y0WKk6MaXgqGL5JGdP91wiCfnZiCY/tfLZHuF5MniakyP+Zz0V9Kayw5woCJnmp QF9Kdq0hJdG56tYDhBEeA26gCpvH2hkSeHJfBZk04pS/iCuJCzpuce3/nHLu8gl+kUem 7dscI7BmQyZN5HpcZDAF8yPNDEFc9C5wadyGAUCDcdxtuIsKhvHO16Dv29GYfbIg5TUD xoUczEHlIVXpaHO+P/uOOHIU9TTnf0lH6DdWm9g6+vcz2a10IFFmIpzxbLGIy7AAUH/6 p2vcqK9o2BW6KMM9ssvyuCSs5s2MWJkTkEL0u+wnPF9myVi8Qximgv5Y+YEPMAH41lc3 otqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=ytFH1tokHe1sswnBkZvQhsRyP6lKPRoIEjFKXG4uLOM=; b=i8UOvIigOe0sAu5v6Ar1R6s2dPNHnQDFx9DB0ZXfkklyGFhPJg7wryLPJfVfRfLjkb DzeGAA2xt1B8TuFGe/jU1WezY057ThDyLT3Cx6Em3VsyKtZ6kNkVlf/d0zgS3XANQGFv pVsePZJiTv1m0v73i2JGOO09q6L/q3giV5miIL7dwd2Yd68fzKF4IeV6Jh742J82VQs0 Zh/mgJtlk51OzIUluD05nQ3BT76g99EkTGNuIaqmr4kY0no3oaxQo4PAreYMvNd05DUZ sLeaKqOeSOD1LcQtmc29yzThj4Q/7Q49W7Qx9WYXX3wFOIBr7WzGBxyxX5IREVa/bRh8 1/mw== X-Gm-Message-State: APjAAAWGS3E2zPdIfiFMxMB5FmSJ+7nky3O+P7SA39/ii89Br2zPayVb ppD2kmdHoQ+zc56Y1gfP/diqK9fO X-Google-Smtp-Source: APXvYqwRpz+AxM2EaEs7x9air5WLry7jlhWsb8bQKoScNhd5bOygY2CnGhLDHDWQr8BtxsAkkojTAg== X-Received: by 2002:a17:906:8590:: with SMTP id v16mr1961924ejx.202.1576879528168; Fri, 20 Dec 2019 14:05:28 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id k15sm1240785ejc.35.2019.12.20.14.05.27 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 20 Dec 2019 14:05:27 -0800 (PST) Message-Id: <1e2acb37ad710cb0d1c09ed163fdd4473a27335c.1576879520.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Garima Singh via GitGitGadget" Date: Fri, 20 Dec 2019 22:05:18 +0000 Subject: [PATCH 7/9] commit-graph: reuse existing bloom filters during write. Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: stolee@gmail.com, szeder.dev@gmail.com, jonathantanmy@google.com, jeffhost@microsoft.com, me@ttaylorr.com, peff@peff.net, Junio C Hamano , Garima Singh Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Garima Singh Read previously computed bloom filters from the commit-graph file if possible to avoid recomputing during commit-graph write. Reading from the commit-graph is based on the format in which bloom filters are written in the commit graph file. See method `fill_filter_from_graph` in bloom.c For reading the bloom filter for commit at lexicographic position i: 1. Read BIDX[i] which essentially gives us the starting index in BDAT for filter of commit i+1 (called the next_index in the code) 2. For i>0, read BIDX[i-1] which will give us the starting index in BDAT for filter of commit i (called the prev_index in the code) For i = 0, prev_index will be 0. The first lexicographic commit's filter will start at BDAT. 3. The length of the filter will be next_index - prev_index, because BIDX[i] gives the cumulative 8-byte words including the ith commit's filter. We toggle whether bloom filters should be recomputed based on the compute_if_null flag. Helped-by: Derrick Stolee Signed-off-by: Garima Singh --- bloom.c | 40 ++++++++++++++++++++++++++++++++++++++-- bloom.h | 3 ++- commit-graph.c | 6 +++--- 3 files changed, 43 insertions(+), 6 deletions(-) diff --git a/bloom.c b/bloom.c index 08328cc381..86b1005802 100644 --- a/bloom.c +++ b/bloom.c @@ -1,5 +1,7 @@ #include "git-compat-util.h" #include "bloom.h" +#include "commit.h" +#include "commit-slab.h" #include "commit-graph.h" #include "object-store.h" #include "diff.h" @@ -119,13 +121,35 @@ static void add_key_to_filter(struct bloom_key *key, } } +static void fill_filter_from_graph(struct commit_graph *g, + struct bloom_filter *filter, + struct commit *c) +{ + uint32_t lex_pos, prev_index, next_index; + + while (c->graph_pos < g->num_commits_in_base) + g = g->base_graph; + + lex_pos = c->graph_pos - g->num_commits_in_base; + + next_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos); + if (lex_pos) + prev_index = get_be32(g->chunk_bloom_indexes + 4 * (lex_pos - 1)); + else + prev_index = 0; + + filter->len = next_index - prev_index; + filter->data = (uint64_t *)(g->chunk_bloom_data + 8 * prev_index + 12); +} + void load_bloom_filters(void) { init_bloom_filter_slab(&bloom_filters); } struct bloom_filter *get_bloom_filter(struct repository *r, - struct commit *c) + struct commit *c, + int compute_if_null) { struct bloom_filter *filter; struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS; @@ -134,6 +158,18 @@ struct bloom_filter *get_bloom_filter(struct repository *r, const char *revs_argv[] = {NULL, "HEAD", NULL}; filter = bloom_filter_slab_at(&bloom_filters, c); + + if (!filter->data) { + load_commit_graph_info(r, c); + if (c->graph_pos != COMMIT_NOT_FROM_GRAPH && r->objects->commit_graph->chunk_bloom_indexes) { + fill_filter_from_graph(r->objects->commit_graph, filter, c); + return filter; + } + } + + if (filter->data || !compute_if_null) + return filter; + init_revisions(&revs, NULL); revs.diffopt.flags.recursive = 1; @@ -198,4 +234,4 @@ struct bloom_filter *get_bloom_filter(struct repository *r, DIFF_QUEUE_CLEAR(&diff_queued_diff); return filter; -} \ No newline at end of file +} diff --git a/bloom.h b/bloom.h index ba8ae70b67..101d689bbd 100644 --- a/bloom.h +++ b/bloom.h @@ -36,7 +36,8 @@ struct bloom_key { void load_bloom_filters(void); struct bloom_filter *get_bloom_filter(struct repository *r, - struct commit *c); + struct commit *c, + int compute_if_null); void fill_bloom_key(const char *data, int len, diff --git a/commit-graph.c b/commit-graph.c index def2ade166..0580ce75d5 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1032,7 +1032,7 @@ static void write_graph_chunk_bloom_indexes(struct hashfile *f, uint32_t cur_pos = 0; while (list < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); + struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0); cur_pos += filter->len; hashwrite_be32(f, cur_pos); list++; @@ -1051,7 +1051,7 @@ static void write_graph_chunk_bloom_data(struct hashfile *f, hashwrite_be32(f, settings->bits_per_entry); while (first < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *first); + struct bloom_filter *filter = get_bloom_filter(ctx->r, *first, 0); hashwrite(f, filter->data, filter->len * sizeof(uint64_t)); first++; } @@ -1218,7 +1218,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) for (i = 0; i < ctx->commits.nr; i++) { struct commit *c = ctx->commits.list[i]; - struct bloom_filter *filter = get_bloom_filter(ctx->r, c); + struct bloom_filter *filter = get_bloom_filter(ctx->r, c, 1); ctx->total_bloom_filter_size += sizeof(uint64_t) * filter->len; display_progress(progress, i + 1); } From patchwork Fri Dec 20 22:05:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Derrick Stolee via GitGitGadget X-Patchwork-Id: 11306385 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 29A196C1 for ; Fri, 20 Dec 2019 22:05:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E836F218AC for ; Fri, 20 Dec 2019 22:05:32 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="SjtC915o" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727609AbfLTWFc (ORCPT ); Fri, 20 Dec 2019 17:05:32 -0500 Received: from mail-ed1-f68.google.com ([209.85.208.68]:43707 "EHLO mail-ed1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727567AbfLTWFb (ORCPT ); Fri, 20 Dec 2019 17:05:31 -0500 Received: by mail-ed1-f68.google.com with SMTP id dc19so9829341edb.10 for ; Fri, 20 Dec 2019 14:05:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=MNI8K+6ckk6lzOLvdRYflsgweImhuQh9MCR5egEQClE=; b=SjtC915oTCQRaS92tXSJ+DO8GxV8VWcqB0xEBv32Jh5QWO+aFJ0X5QD7Ge9uMMbBXq FJ46BDd8Q8GdfPISxAg/LrFsgrL5qZ19yZAAK5Km+OPx2LLUl31OZ87nX13RSHD2v8Gj 914tiBRhrDGp7In7vvNRWmUS8+WDZf+nT+W47zQUCS8yDH8ziYM7r293G9d8fXKWMP0h o7ufIsVP2wSrj8QdCNHM9AwOgR8DV2XWADeG8+LjdZJj6Ae0bj9qXOTW5E239b0DwWc1 Oq/st68FjWbIDQxPZivpvzktRiqKpH/dpeWLmOZmQmcfOCJSakaiuCYH5t0gGPNBtjaL VG4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=MNI8K+6ckk6lzOLvdRYflsgweImhuQh9MCR5egEQClE=; b=O6pnY6tgwqu3FSNAeanpV0e1idbc8uHD28G07Ol3cGvegBR1aKmHH5UN99xOC2s8zO MmN06ZLbYqMsfybwnDyLE0bcIG0DUuztD1KoaXGuz8guBI4GUGs5v6PFt1zbM7bmGew7 zGGgPhbUQqVk56jp06O57u+h2CDkMT6fl3JklNN7AL4vHhdxN/pYjU8kqEWtnbd1t+NQ GxGAEE48b3sWS0S2U9ZdE0CeexkUYod+HkN0upLZrJj1vFhPpXm6SoA9VV29Uu4eekwy ttyiLRXVDRMnctSBrxC5hliW2iHlRTWJrxFIp/HtragoZFQ+Rw2wvs4tIuZqu8at9rAf 2rHg== X-Gm-Message-State: APjAAAXrWFVYo7zeLNp193ETM5Mxjb1S7+usB5wS+8/ugfzB/J48M7Eq fVR/scD9sv24d0KaCsIK7aiNT79j X-Google-Smtp-Source: APXvYqzHfS+8l7Tr5x84EPCidqRrWZVALKimzGSV6dTl7WJLsI3hKyHVAaGPnGH+Rhzmbf/e/F4RqQ== X-Received: by 2002:a17:906:8511:: with SMTP id i17mr18949269ejx.267.1576879529049; Fri, 20 Dec 2019 14:05:29 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id y19sm1250173ejw.65.2019.12.20.14.05.28 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 20 Dec 2019 14:05:28 -0800 (PST) Message-Id: <72a2bbf6765a1e99a3a23372f801099a07fe11a5.1576879520.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Garima Singh via GitGitGadget" Date: Fri, 20 Dec 2019 22:05:19 +0000 Subject: [PATCH 8/9] revision.c: use bloom filters to speed up path based revision walks MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: stolee@gmail.com, szeder.dev@gmail.com, jonathantanmy@google.com, jeffhost@microsoft.com, me@ttaylorr.com, peff@peff.net, Junio C Hamano , Garima Singh Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Garima Singh If bloom filters have been written to the commit-graph file, revision walk will use them to speed up revision walks for a particular path. Note: The current implementation does this in the case of single pathspec case only. We load the bloom filters during the prepare_revision_walk step when dealing with a single pathspec. While comparing trees in rev_compare_trees(), if the bloom filter says that the file is not different between the two trees, we don't need to compute the expensive diff. This is where we get our performance gains. Performance Gains: We tested the performance of `git log --path` on the git repo, the linux and some internal large repos, with a variety of paths of varying depths. On the git and linux repos: we observed a 2x to 5x speed up. On a large internal repo with files seated 6-10 levels deep in the tree: we observed 10x to 20x speed ups, with some paths going up to 28 times faster. RFC Notes: I plan to collect the folloowing statistics around this usage of bloom filters and trace them out using trace2. - number of bloom filter queries, - number of "No" responses (file hasn't changed) - number of "Maybe" responses (file may have changed) - number of "Commit not parsed" cases (commit had too many changes to have a bloom filter written out, currently our limit is 512 diffs) Helped-by: Derrick Stolee Helped-by: Jonathan Tan Signed-off-by: Garima Singh --- bloom.c | 20 ++++++++++++ bloom.h | 4 +++ revision.c | 67 +++++++++++++++++++++++++++++++++++++-- revision.h | 5 +++ t/t4216-log-bloom.sh | 74 ++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 168 insertions(+), 2 deletions(-) create mode 100755 t/t4216-log-bloom.sh diff --git a/bloom.c b/bloom.c index 86b1005802..0c7505d3d6 100644 --- a/bloom.c +++ b/bloom.c @@ -235,3 +235,23 @@ struct bloom_filter *get_bloom_filter(struct repository *r, return filter; } + +int bloom_filter_contains(struct bloom_filter *filter, + struct bloom_key *key, + struct bloom_filter_settings *settings) +{ + int i; + uint64_t mod = filter->len * BITS_PER_BLOCK; + + if (!mod) + return 1; + + for (i = 0; i < settings->num_hashes; i++) { + uint64_t hash_mod = key->hashes[i] % mod; + uint64_t block_pos = hash_mod / BITS_PER_BLOCK; + if (!(filter->data[block_pos] & get_bitmask(hash_mod))) + return 0; + } + + return 1; +} diff --git a/bloom.h b/bloom.h index 101d689bbd..9bdacd0a8e 100644 --- a/bloom.h +++ b/bloom.h @@ -44,4 +44,8 @@ void fill_bloom_key(const char *data, struct bloom_key *key, struct bloom_filter_settings *settings); +int bloom_filter_contains(struct bloom_filter *filter, + struct bloom_key *key, + struct bloom_filter_settings *settings); + #endif diff --git a/revision.c b/revision.c index 39a25e7a5d..01f5330740 100644 --- a/revision.c +++ b/revision.c @@ -29,6 +29,7 @@ #include "prio-queue.h" #include "hashmap.h" #include "utf8.h" +#include "bloom.h" volatile show_early_output_fn_t show_early_output; @@ -624,11 +625,34 @@ static void file_change(struct diff_options *options, options->flags.has_changes = 1; } +static int check_maybe_different_in_bloom_filter(struct rev_info *revs, + struct commit *commit, + struct bloom_key *key, + struct bloom_filter_settings *settings) +{ + struct bloom_filter *filter; + + if (!revs->repo->objects->commit_graph) + return -1; + if (commit->generation == GENERATION_NUMBER_INFINITY) + return -1; + if (!key || !settings) + return -1; + + filter = get_bloom_filter(revs->repo, commit, 0); + + if (!filter || !filter->len) + return 1; + + return bloom_filter_contains(filter, key, settings); +} + static int rev_compare_tree(struct rev_info *revs, - struct commit *parent, struct commit *commit) + struct commit *parent, struct commit *commit, int nth_parent) { struct tree *t1 = get_commit_tree(parent); struct tree *t2 = get_commit_tree(commit); + int bloom_ret = 1; if (!t1) return REV_TREE_NEW; @@ -653,6 +677,16 @@ static int rev_compare_tree(struct rev_info *revs, return REV_TREE_SAME; } + if (revs->pruning.pathspec.nr == 1 && !nth_parent) { + bloom_ret = check_maybe_different_in_bloom_filter(revs, + commit, + revs->bloom_key, + revs->bloom_filter_settings); + + if (bloom_ret == 0) + return REV_TREE_SAME; + } + tree_difference = REV_TREE_SAME; revs->pruning.flags.has_changes = 0; if (diff_tree_oid(&t1->object.oid, &t2->object.oid, "", @@ -855,7 +889,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit) die("cannot simplify commit %s (because of %s)", oid_to_hex(&commit->object.oid), oid_to_hex(&p->object.oid)); - switch (rev_compare_tree(revs, p, commit)) { + switch (rev_compare_tree(revs, p, commit, nth_parent)) { case REV_TREE_SAME: if (!revs->simplify_history || !relevant_commit(p)) { /* Even if a merge with an uninteresting @@ -3342,6 +3376,33 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit) } } +static void prepare_to_use_bloom_filter(struct rev_info *revs) +{ + struct pathspec_item *pi; + const char *path; + size_t len; + + if (!revs->commits) + return; + + parse_commit(revs->commits->item); + + if (!revs->repo->objects->commit_graph) + return; + + revs->bloom_filter_settings = revs->repo->objects->commit_graph->settings; + if (!revs->bloom_filter_settings) + return; + + pi = &revs->pruning.pathspec.items[0]; + path = pi->match; + len = strlen(path); + + load_bloom_filters(); + revs->bloom_key = xmalloc(sizeof(struct bloom_key)); + fill_bloom_key(path, len, revs->bloom_key, revs->bloom_filter_settings); +} + int prepare_revision_walk(struct rev_info *revs) { int i; @@ -3391,6 +3452,8 @@ int prepare_revision_walk(struct rev_info *revs) simplify_merges(revs); if (revs->children.name) set_children(revs); + if (revs->pruning.pathspec.nr == 1) + prepare_to_use_bloom_filter(revs); return 0; } diff --git a/revision.h b/revision.h index a1a804bd3d..65dc11e8f1 100644 --- a/revision.h +++ b/revision.h @@ -56,6 +56,8 @@ struct repository; struct rev_info; struct string_list; struct saved_parents; +struct bloom_key; +struct bloom_filter_settings; define_shared_commit_slab(revision_sources, char *); struct rev_cmdline_info { @@ -291,6 +293,9 @@ struct rev_info { struct revision_sources *sources; struct topo_walk_info *topo_walk_info; + + struct bloom_key *bloom_key; + struct bloom_filter_settings *bloom_filter_settings; }; int ref_excluded(struct string_list *, const char *path); diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh new file mode 100755 index 0000000000..d42f077998 --- /dev/null +++ b/t/t4216-log-bloom.sh @@ -0,0 +1,74 @@ +#!/bin/sh + +test_description='git log for a path with bloom filters' +. ./test-lib.sh + +test_expect_success 'setup repo' ' + git init && + git config core.commitGraph true && + git config gc.writeCommitGraph false && + infodir=".git/objects/info" && + graphdir="$infodir/commit-graphs" && + test_oid_init +' + +test_expect_success 'create 9 commits and repack' ' + test_commit c1 file1 && + test_commit c2 file2 && + test_commit c3 file3 && + test_commit c4 file1 && + test_commit c5 file2 && + test_commit c6 file3 && + test_commit c7 file1 && + test_commit c8 file2 && + test_commit c9 file3 +' + +printf "c7\nc4\nc1" > expect_file1 + +test_expect_success 'log without bloom filters' ' + git log --pretty="format:%s" -- file1 > actual && + test_cmp expect_file1 actual +' + +printf "c8\nc7\nc5\nc4\nc2\nc1" > expect_file1_file2 + +test_expect_success 'multi-path log without bloom filters' ' + git log --pretty="format:%s" -- file1 file2 > actual && + test_cmp expect_file1_file2 actual +' + +graph_read_expect() { + OPTIONAL="" + NUM_CHUNKS=5 + if test ! -z $2 + then + OPTIONAL=" $2" + NUM_CHUNKS=$((3 + $(echo "$2" | wc -w))) + fi + cat >expect <<- EOF + header: 43475048 1 1 $NUM_CHUNKS 0 + num_commits: $1 + chunks: oid_fanout oid_lookup commit_metadata bloom_indexes bloom_data$OPTIONAL + EOF + test-tool read-graph >output && + test_cmp expect output +} + +test_expect_success 'write commit graph with bloom filters' ' + git commit-graph write --reachable --changed-paths && + test_path_is_file $infodir/commit-graph && + graph_read_expect "9" +' + +test_expect_success 'log using bloom filters' ' + git log --pretty="format:%s" -- file1 > actual && + test_cmp expect_file1 actual +' + +test_expect_success 'multi-path log using bloom filters' ' + git log --pretty="format:%s" -- file1 file2 > actual && + test_cmp expect_file1_file2 actual +' + +test_done From patchwork Fri Dec 20 22:05:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee via GitGitGadget X-Patchwork-Id: 11306389 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F236617EF for ; Fri, 20 Dec 2019 22:05:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D0AA9218AC for ; Fri, 20 Dec 2019 22:05:34 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="m7/GfE9K" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727615AbfLTWFd (ORCPT ); Fri, 20 Dec 2019 17:05:33 -0500 Received: from mail-ed1-f67.google.com ([209.85.208.67]:47005 "EHLO mail-ed1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727604AbfLTWFc (ORCPT ); Fri, 20 Dec 2019 17:05:32 -0500 Received: by mail-ed1-f67.google.com with SMTP id m8so9822146edi.13 for ; Fri, 20 Dec 2019 14:05:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=q7UcMuQt+q25b4TZtxCocB+fPmX1EuIgEXFmxv7wSug=; b=m7/GfE9KEOhY2QWpk3HVA33LBuWRitGRzraNOav5FsjyStzFxKN0DTnCxmSEEjCZqe D8/hJYpQsH1C178csqnPGuKi+WuX8tgPrxi5qe+t/XyM53t3dpu9KUQEghtNdaAbK35Y JVyIs8KcNa27ZwL2KJmi+RjuiHQyBKhgAMB+A6TFzt++CdHj8NgvbCAT4aRAzlCF4+cd r42515EqTPGzDMxV94vvodFd73v5jAPxFfaekSbHmqcyi/ogppdI5KR1MehGNvTO1Ogh H/PAtpWHGT3lOzAtdzGuKEu7onChQildXQOV6dvbk7Aw6BDnVTgDKJ9jwYvVV/CqrjI0 nCkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=q7UcMuQt+q25b4TZtxCocB+fPmX1EuIgEXFmxv7wSug=; b=kuRQgYkSHZJeGUKeNicYRjENfS45RDmJ6Fw1warbR5nLJhPvdggA9POnMYpwRd/++E pCzH9IIABPA2cZyRNNfFqKGSld9/4iMsJZowPS4wwL18jLTUOh6L9JWUU9FRhZgFv93j NgbFZF4ulCflH6fZQWeRogK2B7B6KHZ/fG+pFZNmjCw1oivB3F7e2rl9jHQMLDP2uiSB IJKWgsWkRFeDJybhOkmD8aZFxgNfnZvThAFrOpKgU9zbU8rMlzmDbAt/c1ysbdG8Nk9p fr9MI1YnGYsXdB14KpfNGvAcbTvWy5647Ptxm6oeEDjGeaWP5YIpXvMreTnV58JD88yj mjJA== X-Gm-Message-State: APjAAAXMjIN0UOYK+j786wqoT954CHfnwLJ1UVS0ijwDuM3tCkE2+HWq zOUkqbm/61Fj30CvG4CMrjV03nmT X-Google-Smtp-Source: APXvYqxWRJdtC10tLD/8sd0689Q0fne+LF6EfdhtX1yO1zCpgkU6zRq8KUMCr+f2OQKMSI+FrlDEig== X-Received: by 2002:a50:ec1a:: with SMTP id g26mr18798945edr.164.1576879529728; Fri, 20 Dec 2019 14:05:29 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id e20sm1232185ejq.62.2019.12.20.14.05.29 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 20 Dec 2019 14:05:29 -0800 (PST) Message-Id: In-Reply-To: References: From: "Garima Singh via GitGitGadget" Date: Fri, 20 Dec 2019 22:05:20 +0000 Subject: [PATCH 9/9] commit-graph: add GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS test flag Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: stolee@gmail.com, szeder.dev@gmail.com, jonathantanmy@google.com, jeffhost@microsoft.com, me@ttaylorr.com, peff@peff.net, Junio C Hamano , Garima Singh Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Garima Singh Add GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS test flag to the test setup suite in order to toggle writing bloom filters when running any of the git tests. If set to true, we will compute and write bloom filters every time a test calls `git commit-graph write`. The test suite passes when GIT_TEST_COMMIT_GRAPH and GIT_COMMIT_GRAPH_BLOOM_FILTERS are enabled. Helped-by: Derrick Stolee Signed-off-by: Garima Singh --- builtin/commit-graph.c | 2 +- ci/run-build-and-tests.sh | 1 + commit-graph.h | 1 + t/README | 3 +++ t/t4216-log-bloom.sh | 3 +++ t/t5318-commit-graph.sh | 2 ++ t/t5324-split-commit-graph.sh | 1 + t/t5325-commit-graph-bloom.sh | 3 +++ 8 files changed, 15 insertions(+), 1 deletion(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 9bd1e11161..97167959b2 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -146,7 +146,7 @@ static int graph_write(int argc, const char **argv) flags |= COMMIT_GRAPH_WRITE_SPLIT; if (opts.progress) flags |= COMMIT_GRAPH_WRITE_PROGRESS; - if (opts.enable_bloom_filters) + if (opts.enable_bloom_filters || git_env_bool(GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS, 0)) flags |= COMMIT_GRAPH_WRITE_BLOOM_FILTERS; read_replace_refs = 0; diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh index ff0ef7f08e..19d0846d34 100755 --- a/ci/run-build-and-tests.sh +++ b/ci/run-build-and-tests.sh @@ -19,6 +19,7 @@ linux-gcc) export GIT_TEST_OE_SIZE=10 export GIT_TEST_OE_DELTA_SIZE=5 export GIT_TEST_COMMIT_GRAPH=1 + export GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS=1 export GIT_TEST_MULTI_PACK_INDEX=1 make test ;; diff --git a/commit-graph.h b/commit-graph.h index 2202ad91ae..d914e6abf1 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -8,6 +8,7 @@ #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH" #define GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" +#define GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS "GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS" struct commit; struct bloom_filter_settings; diff --git a/t/README b/t/README index caa125ba9a..399b190437 100644 --- a/t/README +++ b/t/README @@ -378,6 +378,9 @@ GIT_TEST_COMMIT_GRAPH=, when true, forces the commit-graph to be written after every 'git commit' command, and overrides the 'core.commitGraph' setting to true. +GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS=, when true, forces commit-graph +write to compute and write bloom filters for every 'git commit-graph write' + GIT_TEST_FSMONITOR=$PWD/t7519/fsmonitor-all exercises the fsmonitor code path for utilizing a file system monitor to speed up detecting new or changed files. diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index d42f077998..0e092b387c 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -3,6 +3,9 @@ test_description='git log for a path with bloom filters' . ./test-lib.sh +GIT_TEST_COMMIT_GRAPH=0 +GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS=0 + test_expect_success 'setup repo' ' git init && git config core.commitGraph true && diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 3f03de6018..613228bb12 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -3,6 +3,8 @@ test_description='commit graph' . ./test-lib.sh +GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS=0 + test_expect_success 'setup full repo' ' mkdir full && cd "$TRASH_DIRECTORY/full" && diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index c24823431f..181ca7e0cb 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -4,6 +4,7 @@ test_description='split commit graph' . ./test-lib.sh GIT_TEST_COMMIT_GRAPH=0 +GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS=0 test_expect_success 'setup repo' ' git init && diff --git a/t/t5325-commit-graph-bloom.sh b/t/t5325-commit-graph-bloom.sh index d7ef0e7fb3..a9c9e9fef6 100755 --- a/t/t5325-commit-graph-bloom.sh +++ b/t/t5325-commit-graph-bloom.sh @@ -3,6 +3,9 @@ test_description='commit graph with bloom filters' . ./test-lib.sh +GIT_TEST_COMMIT_GRAPH=0 +GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS=0 + test_expect_success 'setup repo' ' git init && git config core.commitGraph true &&