From patchwork Tue Aug 11 20:52:14 2020
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 11709665
Date: Tue, 11 Aug 2020 16:52:14 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com, gitster@pobox.com
Subject: [PATCH v3 14/14] builtin/commit-graph.c: introduce '--max-new-filters='
Message-ID: <09f6871f66bff838c067a3e0d23cd4622171f3bd.1597178915.git.me@ttaylorr.com>
X-Mailing-List: git@vger.kernel.org

Introduce a command-line flag and configuration variable to fill in the
'max_new_filters' variable introduced by the previous patch.

The command-line option '--max-new-filters' takes precedence over
'commitGraph.maxNewFilters', which provides the default value.
'--no-max-new-filters' can also be provided, which sets the value back
to '-1', indicating that an unlimited number of new Bloom filters may be
generated. (OPT_INTEGER only allows the '--no-' variant to reset the
value to '0', so a custom callback is used instead.)

Signed-off-by: Taylor Blau
Signed-off-by: Derrick Stolee
---
 Documentation/config/commitgraph.txt |  4 +++
 Documentation/git-commit-graph.txt   |  4 +++
 bloom.c                              | 15 +++++++++++
 builtin/commit-graph.c               | 39 +++++++++++++++++++++++++---
 commit-graph.c                       | 27 ++++++++++++++++---
 commit-graph.h                       |  1 +
 t/t4216-log-bloom.sh                 | 19 ++++++++++++++
 7 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt
index cff0797b54..4582c39fc4 100644
--- a/Documentation/config/commitgraph.txt
+++ b/Documentation/config/commitgraph.txt
@@ -1,3 +1,7 @@
+commitGraph.maxNewFilters::
+	Specifies the default value for the `--max-new-filters` option of `git
+	commit-graph write` (cf. linkgit:git-commit-graph[1]).
+
 commitGraph.readChangedPaths::
 	If true, then git will use the changed-path Bloom filters in the
 	commit-graph file (if it exists, and they are present). Defaults to
diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 17405c73a9..9c887d5d79 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -67,6 +67,10 @@ this option is given, future commit-graph writes will automatically assume
 that this option was intended. Use `--no-changed-paths` to stop storing this
 data.
 +
+With the `--max-new-filters=` option, generate at most `n` new Bloom
+filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is
+enforced. Overrides the `commitGraph.maxNewFilters` configuration.
++
 With the `--split[=]` option, write the commit-graph as a
 chain of multiple commit-graph files stored in
 `/info/commit-graphs`. Commit-graph layers are merged based on the
diff --git a/bloom.c b/bloom.c
index ed54e96e57..8d07209c6b 100644
--- a/bloom.c
+++ b/bloom.c
@@ -51,6 +51,21 @@ static int load_bloom_filter_from_graph(struct commit_graph *g,
 	else
 		start_index = 0;
 
+	if ((start_index == end_index) &&
+	    (g->bloom_large && !bitmap_get(g->bloom_large, lex_pos))) {
+		/*
+		 * If the filter is zero-length, either (1) the filter has no
+		 * changes, (2) the filter has too many changes, or (3) it
+		 * wasn't computed (e.g., due to '--max-new-filters').
+		 *
+		 * If either (1) or (2) is the case, the 'large' bit will be set
+		 * for this Bloom filter. If it is unset, then it wasn't
+		 * computed. In that case, return nothing, since we don't have
+		 * that filter in the graph.
+		 */
+		return 0;
+	}
+
 	filter->len = end_index - start_index;
 	filter->data = (unsigned char *)(g->chunk_bloom_data +
 					sizeof(unsigned char) * start_index +
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 38f5f57d15..3500a6e1f1 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -13,7 +13,8 @@ static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph verify [--object-dir ] [--shallow] [--[no-]progress]"),
 	N_("git commit-graph write [--object-dir ] [--append] "
 	   "[--split[=]] [--reachable|--stdin-packs|--stdin-commits] "
-	   "[--changed-paths] [--[no-]progress] "),
+	   "[--changed-paths] [--[no-]max-new-filters ] [--[no-]progress] "
+	   ""),
 	NULL
 };
 
@@ -25,7 +26,8 @@ static const char * const builtin_commit_graph_verify_usage[] = {
 static const char * const builtin_commit_graph_write_usage[] = {
 	N_("git commit-graph write [--object-dir ] [--append] "
 	   "[--split[=]] [--reachable|--stdin-packs|--stdin-commits] "
-	   "[--changed-paths] [--[no-]progress] "),
+	   "[--changed-paths] [--[no-]max-new-filters ] [--[no-]progress] "
+	   ""),
 	NULL
 };
 
@@ -162,6 +164,23 @@ static int read_one_commit(struct oidset *commits, struct progress *progress,
 	return 0;
 }
 
+static int write_option_max_new_filters(const struct option *opt,
+					const char *arg,
+					int unset)
+{
+	int *to = opt->value;
+	if (unset)
+		*to = -1;
+	else {
+		const char *s;
+		*to = strtol(arg, (char **)&s, 10);
+		if (*s)
+			return error(_("%s expects a numerical value"),
+				     optname(opt, opt->flags));
+	}
+	return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
 	struct string_list pack_indexes = STRING_LIST_INIT_NODUP;
@@ -197,6 +216,9 @@ static int graph_write(int argc, const char **argv)
 			N_("maximum ratio between two levels of a split commit-graph")),
 		OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time,
 			N_("only expire files older than a given date-time")),
+		OPT_CALLBACK_F(0, "max-new-filters", &write_opts.max_new_filters,
+			NULL, N_("maximum number of changed-path Bloom filters to compute"),
+			0, write_option_max_new_filters),
 		OPT_END(),
 	};
 
@@ -205,6 +227,7 @@ static int graph_write(int argc, const char **argv)
 	write_opts.size_multiple = 2;
 	write_opts.max_commits = 0;
 	write_opts.expire_time = 0;
+	write_opts.max_new_filters = -1;
 
 	trace2_cmd_mode("write");
 
@@ -270,6 +293,16 @@ static int graph_write(int argc, const char **argv)
 	return result;
 }
 
+static int git_commit_graph_config(const char *var, const char *value, void *cb)
+{
+	if (!strcmp(var, "commitgraph.maxnewfilters")) {
+		write_opts.max_new_filters = git_config_int(var, value);
+		return 0;
+	}
+
+	return git_default_config(var, value, cb);
+}
+
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 {
 	static struct option builtin_commit_graph_options[] = {
@@ -283,7 +316,7 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 		usage_with_options(builtin_commit_graph_usage,
 				   builtin_commit_graph_options);
 
-	git_config(git_default_config, NULL);
+	git_config(git_commit_graph_config, &opts);
 	argc = parse_options(argc, argv, prefix,
 			     builtin_commit_graph_options,
 			     builtin_commit_graph_usage,
diff --git a/commit-graph.c b/commit-graph.c
index 6886f319a5..4aae5471e3 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -954,7 +954,8 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit
 }
 
 static int get_bloom_filter_large_in_graph(struct commit_graph *g,
-					   const struct commit *c)
+					   const struct commit *c,
+					   uint32_t max_changed_paths)
 {
 	uint32_t graph_pos = commit_graph_position(c);
 	if (graph_pos == COMMIT_NOT_FROM_GRAPH)
@@ -965,6 +966,17 @@ static int get_bloom_filter_large_in_graph(struct commit_graph *g,
 	if (!(g && g->bloom_large))
 		return 0;
 
+	if (g->bloom_filter_settings->max_changed_paths != max_changed_paths) {
+		/*
+		 * Force all commits which are subject to a different
+		 * 'max_changed_paths' limit to be recomputed from scratch.
+		 *
+		 * Note that this could likely be improved, but is ignored since
+		 * all real-world graphs set the maximum number of changed paths
+		 * at 512.
+		 */
+		return 0;
+	}
 	return bitmap_get(g->bloom_large, graph_pos - g->num_commits_in_base);
 }
 
@@ -1470,6 +1482,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 	int i;
 	struct progress *progress = NULL;
 	int *sorted_commits;
+	int max_new_filters;
 
 	init_bloom_filters();
 	ctx->bloom_large = bitmap_word_alloc(ctx->commits.nr / BITS_IN_EWORD + 1);
@@ -1486,10 +1499,15 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 		ctx->order_by_pack ? commit_pos_cmp : commit_gen_cmp,
 		&ctx->commits);
 
+	max_new_filters = ctx->opts->max_new_filters >= 0 ?
+		ctx->opts->max_new_filters : ctx->commits.nr;
+
 	for (i = 0; i < ctx->commits.nr; i++) {
 		int pos = sorted_commits[i];
 		struct commit *c = ctx->commits.list[pos];
-		if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph, c)) {
+		if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph,
+						    c,
+						    ctx->bloom_settings->max_changed_paths)) {
 			bitmap_set(ctx->bloom_large, pos);
 			ctx->count_bloom_filter_known_large++;
 		} else {
@@ -1497,7 +1515,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 			struct bloom_filter *filter = get_or_compute_bloom_filter(
 				ctx->r,
 				c,
-				1,
+				ctx->count_bloom_filter_computed < max_new_filters,
 				ctx->bloom_settings,
 				&computed);
 			if (computed) {
@@ -1507,7 +1525,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 					ctx->count_bloom_filter_found_large++;
 				}
 			}
-			ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len;
+			if (filter)
+				ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len;
 		}
 		display_progress(progress, i + 1);
 	}
diff --git a/commit-graph.h b/commit-graph.h
index af08c4505d..75ef83708b 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -114,6 +114,7 @@ struct commit_graph_opts {
 	int max_commits;
 	timestamp_t expire_time;
 	enum commit_graph_split_flags flags;
+	int max_new_filters;
 };
 
 /*
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 6859d85369..3aab8ffbe3 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -286,4 +286,23 @@ test_expect_success 'Bloom generation does not recompute too-large filters' '
 	)
 '
 
+test_expect_success 'Bloom generation is limited by --max-new-filters' '
+	(
+		cd limits &&
+		test_commit c2 filter &&
+		test_commit c3 filter &&
+		test_commit c4 no-filter &&
+		test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=2" \
+			2 0 2
+	)
+'
+
+test_expect_success 'Bloom generation backfills previously-skipped filters' '
+	(
+		cd limits &&
+		test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=1" \
+			2 0 1
+	)
+'
+
 test_done
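The precedence rules in the commit message ('commitGraph.maxNewFilters' supplies the default, '--max-new-filters' overrides it, '--no-max-new-filters' resets to -1) can be illustrated with a standalone C sketch. It does not use git's parse-options or config APIs; 'parse_max_new_filters()', 'resolve_max_new_filters()' and 'enum cli_kind' are hypothetical names for illustration only. The sketch also rejects an empty argument and values that do not fit in an 'int', which goes beyond the '*s' check in the patch's callback; treat that stricter parsing as an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical constant: -1 means "no limit", as in the patch. */
#define MAX_NEW_FILTERS_UNLIMITED (-1)

/* Hypothetical summary of what the command line said. */
enum cli_kind {
	CLI_NONE,  /* option not given */
	CLI_VALUE, /* --max-new-filters=<value> */
	CLI_UNSET  /* --no-max-new-filters */
};

/* Strict decimal parse: returns 0 and stores the value, or -1 on error. */
static int parse_max_new_filters(const char *arg, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(arg, &end, 10);
	/* Reject empty input, trailing garbage and values outside 'int'. */
	if (end == arg || *end != '\0' || errno == ERANGE ||
	    v < INT_MIN || v > INT_MAX)
		return -1;
	*out = (int)v;
	return 0;
}

/*
 * Start from "unlimited", let the config value replace it, then let the
 * command line win. Returns 0 on success, -1 if a value fails to parse.
 */
static int resolve_max_new_filters(const char *config_value,
				   enum cli_kind cli, const char *cli_arg,
				   int *out)
{
	int value = MAX_NEW_FILTERS_UNLIMITED;

	assert(out != NULL);
	if (config_value && parse_max_new_filters(config_value, &value))
		return -1;
	if (cli == CLI_VALUE && parse_max_new_filters(cli_arg, &value))
		return -1;
	if (cli == CLI_UNSET)
		value = MAX_NEW_FILTERS_UNLIMITED;
	*out = value;
	return 0;
}
```

For example, a config value of "5" combined with '--no-max-new-filters' resolves to -1, while the same config value combined with '--max-new-filters=2' resolves to 2.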
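The budget applied in compute_bloom_filters() (a negative 'max_new_filters' means "compute for every commit"; otherwise a new filter is computed only while 'count_bloom_filter_computed' is below the limit) can be modeled in isolation. This is a sketch, not git's code: the hypothetical 'struct sketch_commit' and 'struct sketch_counts' stand in for git's commit list and write context, "computing" a filter just marks the commit, and filters computed in one run are assumed to be reusable by the next, which mirrors the idea behind the 'backfills previously-skipped filters' test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a commit as seen by the write loop. */
struct sketch_commit {
	int known_large; /* 'large' bit already recorded in an existing graph */
	int has_filter;  /* a filter exists and can be reused as-is */
};

struct sketch_counts {
	int known_large;
	int reused;
	int computed;
	int skipped; /* left without a filter because the budget ran out */
};

/*
 * A negative limit means "no limit", expressed (as in the patch) as the
 * number of commits. A new filter is computed only while fewer than
 * 'limit' filters have been computed during this run.
 */
static struct sketch_counts compute_with_budget(struct sketch_commit *commits,
						size_t nr, int max_new_filters)
{
	struct sketch_counts counts = { 0, 0, 0, 0 };
	int limit = max_new_filters >= 0 ? max_new_filters : (int)nr;
	size_t i;

	assert(commits != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		struct sketch_commit *c = &commits[i];

		if (c->known_large) {
			counts.known_large++;
		} else if (c->has_filter) {
			counts.reused++;
		} else if (counts.computed < limit) {
			/* Stand-in for diffing the commit and hashing its paths. */
			c->has_filter = 1;
			counts.computed++;
		} else {
			counts.skipped++;
		}
	}
	return counts;
}
```

With four commits, one of them already known to be large, a limit of 2 computes two filters and skips one; a second run with a limit of 1 reuses those two and computes the remaining one.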
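The check added to load_bloom_filter_from_graph() only has to disambiguate zero-length entries: a zero-length filter whose 'large' bit is set was stored deliberately, while one whose bit is unset was never computed. When the graph carries no 'large' bitmap at all, the 'g->bloom_large &&' guard means a zero-length entry is still loaded. A standalone sketch of that decision, using a hypothetical 'lookup_filter()' and a plain array of 64-bit words in place of git's 'struct bitmap':

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result labels for this sketch. */
enum filter_lookup {
	FILTER_NOT_IN_GRAPH, /* zero-length and 'large' bit unset: not computed */
	FILTER_LOADED        /* use bytes [start, end), possibly zero-length */
};

/* Read bit 'pos' from an array of 64-bit words; out-of-range reads as 0. */
static int bit_is_set(const uint64_t *words, size_t nr_words, size_t pos)
{
	size_t word = pos / 64;

	if (!words || word >= nr_words)
		return 0;
	return (int)((words[word] >> (pos % 64)) & 1);
}

static enum filter_lookup lookup_filter(uint32_t start, uint32_t end,
					const uint64_t *large, size_t nr_words,
					size_t pos)
{
	/* Index offsets are cumulative; assume well-formed input. */
	assert(start <= end);
	/* Same shape as the new check: only a zero-length entry is ambiguous. */
	if (start == end && large && !bit_is_set(large, nr_words, pos))
		return FILTER_NOT_IN_GRAPH;
	return FILTER_LOADED;
}
```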