From patchwork Mon Aug 3 18:57:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11698593 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 44F68912 for ; Mon, 3 Aug 2020 18:57:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 89E8D22B45 for ; Mon, 3 Aug 2020 18:57:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="rcF1PLRu" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726764AbgHCS5P (ORCPT ); Mon, 3 Aug 2020 14:57:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46198 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726130AbgHCS5O (ORCPT ); Mon, 3 Aug 2020 14:57:14 -0400 Received: from mail-qv1-xf44.google.com (mail-qv1-xf44.google.com [IPv6:2607:f8b0:4864:20::f44]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E6A79C06174A for ; Mon, 3 Aug 2020 11:57:12 -0700 (PDT) Received: by mail-qv1-xf44.google.com with SMTP id x7so1551856qvi.5 for ; Mon, 03 Aug 2020 11:57:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=zUM4DdQvNAfC0jqSwbQhkZMTy0Goa44IBJQwjfO6bpw=; b=rcF1PLRuWMuVKrFO3YY5QJgQofDGZHF/LUz5hCDeuafEo868piezCIOiQdeFAx1HQx DmogCEh3TAnNRIIpMADAWXybZUv/IaMr6JcLimgXOp6xa1113bEVdlqPZczln02DhlVi pAUQYesRdVl3me9r10cmBQPxgt9klVx5spmNU9Zw3vUiW2SmEDbcvvp0gp9+U6RqpMBh DcXt3RaE9XnI9Dnorgr0iJ7jDiUL18WKXosw5gGm4tAYvR2qCOZLPjqAqfotyYV82mvX ltWTHnrZ5wG8No5BXPdanCMdM30R7C8t//2tZhuVezsG3ZQGZiowYwpFPJnytMS/q+PP lw+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=zUM4DdQvNAfC0jqSwbQhkZMTy0Goa44IBJQwjfO6bpw=; b=i2EKm9ExXbo1QuexWTOC3zI4MpTU/2XgAGdvBfd98NiGiRrayMfUm3Kt6bo+fuDURO YI+gPwHbNbBGyUvnc20iPBFhoLE2CddkaL6W1jSnBPvQO3FZZ3UkDkAFyEy9TmUF80Mc 086DZEegzq/ALfrXno7MfRWkbfeey/x5+Ub1IbefBZrdm97obhjrQ1z7RUhh5uA02d7/ jWq8Lofx2tJ4ULGkl/t9x3CfgvCX+5vJ9OExvtVCcQCrmafL+Jm96KMBS3LxdXzc24e2 J4OAk1ituYH3s5rrvfOGTuuNJDbE/s+meDBq9eCY175M1oPrbTj3MyxdJjS6J6t7fKDN kB/g== X-Gm-Message-State: AOAM531dTvbdCA2fDVojzr57BFkkS3Zocb3E2bly9ManmUoxlfrs7kM5 //8dme6MnRtRM/+KJ5lrAbjERcp1LOTfyA== X-Google-Smtp-Source: ABdhPJwG1GRgs7lVnxQ/5kPhHbTYd2gd9wZ5B1hwdjNI2BsvFU2ZPk2RMyvio+56WyMOSRtdFgJiig== X-Received: by 2002:a05:6214:612:: with SMTP id z18mr7973952qvw.25.1596481031550; Mon, 03 Aug 2020 11:57:11 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:3475:b417:c07c:c811]) by smtp.gmail.com with ESMTPSA id h24sm19517288qkk.72.2020.08.03.11.57.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Aug 2020 11:57:10 -0700 (PDT) Date: Mon, 3 Aug 2020 14:57:09 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com Subject: [PATCH 01/10] commit-graph: introduce 'get_bloom_filter_settings()' Message-ID: <08479793c1274d5ee0f063578bb0f4d93c910fa9.1596480582.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Many places in the code often need a pointer to the commit-graph's 'struct bloom_filter_settings', in which case they often take the value from the top-most commit-graph. In the non-split case, this works as expected. In the split case, however, things get a little tricky. Not all layers in a chain of incremental commit-graphs are required to themselves have Bloom data, and so whether or not some part of the code uses Bloom filters depends entirely on whether or not the top-most level of the commit-graph chain has Bloom filters. This has been the behavior since Bloom filters were introduced, and has been codified into the tests since a759bfa9ee (t4216: add end to end tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130 requires that Bloom filters are not used in exactly the case described earlier. There is no reason that this needs to be the case, since it is perfectly valid for commits in an earlier layer to have Bloom filters when commits in a newer layer do not. Since Bloom settings are guaranteed to be the same for any layer in a chain that has Bloom data, it is sufficient to traverse the '->base_graph' pointer until either (1) a non-null 'struct bloom_filter_settings *' is found, or (2) until we are at the root of the commit-graph chain. Introduce a 'get_bloom_filter_settings()' function that does just this, and use it instead of purely dereferencing the top-most graph's '->bloom_filter_settings' pointer. While we're at it, add an additional test in t5324 to guard against code in the commit-graph writing machinery that doesn't correctly handle a NULL 'struct bloom_filter *'. Co-authored-by: Derrick Stolee Signed-off-by: Taylor Blau --- blame.c | 6 ++++-- bloom.c | 6 +++--- commit-graph.c | 11 +++++++++++ commit-graph.h | 2 ++ revision.c | 2 +- t/t4216-log-bloom.sh | 9 ++++++--- t/t5324-split-commit-graph.sh | 13 +++++++++++++ 7 files changed, 40 insertions(+), 9 deletions(-) diff --git a/blame.c b/blame.c index 82fa16d658..3e5f8787bc 100644 --- a/blame.c +++ b/blame.c @@ -2891,16 +2891,18 @@ void setup_blame_bloom_data(struct blame_scoreboard *sb, const char *path) { struct blame_bloom_data *bd; + struct bloom_filter_settings *bs; if (!sb->repo->objects->commit_graph) return; - if (!sb->repo->objects->commit_graph->bloom_filter_settings) + bs = get_bloom_filter_settings(sb->repo); + if (!bs) return; bd = xmalloc(sizeof(struct blame_bloom_data)); - bd->settings = sb->repo->objects->commit_graph->bloom_filter_settings; + bd->settings = bs; bd->alloc = 4; bd->nr = 0; diff --git a/bloom.c b/bloom.c index 1a573226e7..cd9380ac62 100644 --- a/bloom.c +++ b/bloom.c @@ -38,7 +38,7 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, while (graph_pos < g->num_commits_in_base) g = g->base_graph; - /* The commit graph commit 'c' lives in doesn't carry bloom filters. */ + /* The commit graph commit 'c' lives in doesn't carry Bloom filters. */ if (!g->chunk_bloom_indexes) return 0; @@ -195,8 +195,8 @@ struct bloom_filter *get_bloom_filter(struct repository *r, if (!filter->data) { load_commit_graph_info(r, c); if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH && - r->objects->commit_graph->chunk_bloom_indexes) - load_bloom_filter_from_graph(r->objects->commit_graph, filter, c); + load_bloom_filter_from_graph(r->objects->commit_graph, filter, c)) + return filter; } if (filter->data) diff --git a/commit-graph.c b/commit-graph.c index e51c91dd5b..d4b06811be 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -660,6 +660,17 @@ int generation_numbers_enabled(struct repository *r) return !!first_generation; } +struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r) +{ + struct commit_graph *g = r->objects->commit_graph; + while (g) { + if (g->bloom_filter_settings) + return g->bloom_filter_settings; + g = g->base_graph; + } + return NULL; +} + static void close_commit_graph_one(struct commit_graph *g) { if (!g) diff --git a/commit-graph.h b/commit-graph.h index 09a97030dc..0677dd1031 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -87,6 +87,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); */ int generation_numbers_enabled(struct repository *r); +struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r); + enum commit_graph_write_flags { COMMIT_GRAPH_WRITE_APPEND = (1 << 0), COMMIT_GRAPH_WRITE_PROGRESS = (1 << 1), diff --git a/revision.c b/revision.c index 6de29cdf7a..e244beed05 100644 --- a/revision.c +++ b/revision.c @@ -684,7 +684,7 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs) if (!revs->repo->objects->commit_graph) return; - revs->bloom_filter_settings = revs->repo->objects->commit_graph->bloom_filter_settings; + revs->bloom_filter_settings = get_bloom_filter_settings(revs->repo); if (!revs->bloom_filter_settings) return; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index c21cc160f3..c9f9bdf1ba 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -60,7 +60,7 @@ setup () { test_bloom_filters_used () { log_args=$1 - bloom_trace_prefix="statistics:{\"filter_not_present\":0,\"maybe\"" + bloom_trace_prefix="statistics:{\"filter_not_present\":${2:-0},\"maybe\"" setup "$log_args" && grep -q "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.perf" && test_cmp log_wo_bloom log_w_bloom && @@ -134,8 +134,11 @@ test_expect_success 'setup - add commit-graph to the chain without Bloom filters test_line_count = 2 .git/objects/info/commit-graphs/commit-graph-chain ' -test_expect_success 'Do not use Bloom filters if the latest graph does not have Bloom filters.' ' - test_bloom_filters_not_used "-- A/B" +test_expect_success 'use Bloom filters even if the latest graph does not have Bloom filters' ' + # Ensure that the number of empty filters is equal to the number of + # filters in the latest graph layer to prove that they are loaded (and + # ignored). + test_bloom_filters_used "-- A/B" 3 ' test_expect_success 'setup - add commit-graph to the chain with Bloom filters' ' diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 9b850ea907..5bdfd53ef9 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -425,4 +425,17 @@ done <<\EOF 0600 -r-------- EOF +test_expect_success '--split=replace with partial Bloom data' ' + rm -rf $graphdir $infodir/commit-graph && + git reset --hard commits/3 && + git rev-list -1 HEAD~2 >a && + git rev-list -1 HEAD~1 >b && + git commit-graph write --split=no-merge --stdin-commits --changed-paths graph-files && + test_line_count = 1 graph-files && + verify_chain_files_exist $graphdir +' + test_done From patchwork Mon Aug 3 18:57:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11698595 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6A02F912 for ; Mon, 3 Aug 2020 18:57:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B651822BF3 for ; Mon, 3 Aug 2020 18:57:18 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="0jb+iYI5" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726770AbgHCS5S (ORCPT ); Mon, 3 Aug 2020 14:57:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46210 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726130AbgHCS5R (ORCPT ); Mon, 3 Aug 2020 14:57:17 -0400 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 54C1DC06174A for ; Mon, 3 Aug 2020 11:57:16 -0700 (PDT) Received: by mail-qt1-x842.google.com with SMTP id e5so15018264qth.5 for ; Mon, 03 Aug 2020 11:57:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=L5XHEvTDrEK98ejT2znG00tQPN0JOZt0+wPibc7SDdU=; b=0jb+iYI5a7MhXlkqksKGPkYFQM+cnOON9Vs2fM0xKFTPI26bNFWpLpByydOVQcUdBB Sn8VHPKuDBvTm9r/zSc+ihs1jDMhscABy0MXXilVlVn8WuALkDvGoPDCJI2u6lQZLmke kwjafTl+i35R8gBzsMMpJhCy4jEYxqqyJ/WgqyNhqkcRO0DR/4NtIH9b5l+mGPniTIhP kndBHmumfW3Ta+oXv9CJHdxqs2ko0NuGCnSwHORUpxsmwbFYnRlzfIokkG4hsNaZwfyV apmL8ec7Nd8wIlEnzhs+TNYpaC6Cee8D22Um6MJNQYJmHjPlc3wGj6jTwhfBRHTBdnsN 6syg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=L5XHEvTDrEK98ejT2znG00tQPN0JOZt0+wPibc7SDdU=; b=HlKZOumegBdubAod5LwUT03TzXskpd3iX6Dp8yxpK7EogWvUQzqM5k50arV5WOwtt4 4Ut7L3zxZTtOTLMtD5uTkx4Jzy5JYN5oJRBY68a9Im0PVAsHMefuvJS1Ja52nKcMh46w 3Zu1spne6wGUpS62Bpbmi1xDXZXP/kstKuM6lzjHQy47Iu3LqCxV8wgoL0Ee++qnTm1h Q6nFkodAZ5onCPUQxU7tW8nqQ+kIj5TG8APQ931l96+qgcwksRbPG+Hx3OtOymZQ1ePW YgRveOAH050tlLXcWyok46Fu9eUdSDew5AEZkgZCl3btq8N/DiKI7VC3JV5DiTDGCsA+ /Ykw== X-Gm-Message-State: AOAM530UcnH/nTMO/ReZG5YMa+NG4e7lG0OCWfX7BZsB2zBIjQ8DuUt7 3fxiGgjSUhVwrBiiSkyEUw2YIg0IDcmEng== X-Google-Smtp-Source: ABdhPJw0YFpyreaPvVGZDMD/Nb+iUUJaigu1sB6kRoVHvVXwkO8AIMSdruBBrycUhrwC7Jbi2Kl2Ag== X-Received: by 2002:ac8:73d1:: with SMTP id v17mr18187280qtp.51.1596481035200; Mon, 03 Aug 2020 11:57:15 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:3475:b417:c07c:c811]) by smtp.gmail.com with ESMTPSA id y7sm23447052qta.36.2020.08.03.11.57.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Aug 2020 11:57:14 -0700 (PDT) Date: Mon, 3 Aug 2020 14:57:13 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com Subject: [PATCH 02/10] commit-graph: pass a 'struct repository *' in more places Message-ID: <52f8f7424eddc04b01882d73c39552e8e22a7178.1596480582.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In a future commit, some commit-graph internals will want access to 'r->settings', but we only have the 'struct object_directory *' corresponding to that repository. Add an additional parameter to pass the repository around in more places. In the next patch, we will remove the object directory (and instead reference it with 'r->odb'). Signed-off-by: Taylor Blau --- builtin/commit-graph.c | 2 +- commit-graph.c | 18 +++++++++++------- commit-graph.h | 6 ++++-- fuzz-commit-graph.c | 5 +++-- 4 files changed, 19 insertions(+), 12 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 523501f217..ba5584463f 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -106,7 +106,7 @@ static int graph_verify(int argc, const char **argv) FREE_AND_NULL(graph_name); if (open_ok) - graph = load_commit_graph_one_fd_st(fd, &st, odb); + graph = load_commit_graph_one_fd_st(the_repository, fd, &st, odb); else graph = read_commit_graph_one(the_repository, odb); diff --git a/commit-graph.c b/commit-graph.c index d4b06811be..81a6f2a8ce 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -224,7 +224,8 @@ int open_commit_graph(const char *graph_file, int *fd, struct stat *st) return 1; } -struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, +struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, + int fd, struct stat *st, struct object_directory *odb) { void *graph_map; @@ -240,7 +241,7 @@ struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, } graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd); - ret = parse_commit_graph(graph_map, graph_size); + ret = parse_commit_graph(r, graph_map, graph_size); if (ret) ret->odb = odb; @@ -280,7 +281,8 @@ static int verify_commit_graph_lite(struct commit_graph *g) return 0; } -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size) +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size) { const unsigned char *data, *chunk_lookup; uint32_t i; @@ -445,7 +447,9 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size) return NULL; } -static struct commit_graph *load_commit_graph_one(const char *graph_file, + +static struct commit_graph *load_commit_graph_one(struct repository *r, + const char *graph_file, struct object_directory *odb) { @@ -457,7 +461,7 @@ static struct commit_graph *load_commit_graph_one(const char *graph_file, if (!open_ok) return NULL; - g = load_commit_graph_one_fd_st(fd, &st, odb); + g = load_commit_graph_one_fd_st(r, fd, &st, odb); if (g) g->filename = xstrdup(graph_file); @@ -469,7 +473,7 @@ static struct commit_graph *load_commit_graph_v1(struct repository *r, struct object_directory *odb) { char *graph_name = get_commit_graph_filename(odb); - struct commit_graph *g = load_commit_graph_one(graph_name, odb); + struct commit_graph *g = load_commit_graph_one(r, graph_name, odb); free(graph_name); return g; @@ -550,7 +554,7 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, valid = 0; for (odb = r->objects->odb; odb; odb = odb->next) { char *graph_name = get_split_graph_filename(odb, line.buf); - struct commit_graph *g = load_commit_graph_one(graph_name, odb); + struct commit_graph *g = load_commit_graph_one(r, graph_name, odb); free(graph_name); diff --git a/commit-graph.h b/commit-graph.h index 0677dd1031..d9acb22bac 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -75,11 +75,13 @@ struct commit_graph { struct bloom_filter_settings *bloom_filter_settings; }; -struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, +struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, + int fd, struct stat *st, struct object_directory *odb); struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb); -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size); /* * Return 1 if and only if the repository has a commit-graph diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c index 430817214d..e7cf6d5b0f 100644 --- a/fuzz-commit-graph.c +++ b/fuzz-commit-graph.c @@ -1,7 +1,8 @@ #include "commit-graph.h" #include "repository.h" -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size); int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); @@ -10,7 +11,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) struct commit_graph *g; initialize_the_repository(); - g = parse_commit_graph((void *)data, size); + g = parse_commit_graph(the_repository, (void *)data, size); repo_clear(the_repository); free_commit_graph(g); From patchwork Mon Aug 3 18:57:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11698597 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 08BF1912 for ; Mon, 3 Aug 2020 18:57:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 582B522B45 for ; Mon, 3 Aug 2020 18:57:21 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="0TmGlyMy" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726802AbgHCS5U (ORCPT ); Mon, 3 Aug 2020 14:57:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46222 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726130AbgHCS5U (ORCPT ); Mon, 3 Aug 2020 14:57:20 -0400 Received: from mail-qt1-x844.google.com (mail-qt1-x844.google.com [IPv6:2607:f8b0:4864:20::844]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 11280C06174A for ; Mon, 3 Aug 2020 11:57:20 -0700 (PDT) Received: by mail-qt1-x844.google.com with SMTP id w9so29030508qts.6 for ; Mon, 03 Aug 2020 11:57:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=LMuh9652CGyiO0bmz3tWXIVImi3zNm3ghnf+7Y714js=; b=0TmGlyMyqvK+V1LJ7tLNH0wRalU7FBTutcKzPtq14bag7bNBhuk1vDsBkLms2k06Lk 5tjS4p8alfg/PDXb3nHmAh86XgJPBRmuetuhLIrCO4pixZxbJMmr/tfCg9HNhcJ20D6u Wjv5YrQEwjRjcpnF3LZdtiPc1DhaWwpdQhjQciAAiWncy8DHliy0Xu9uRTBJSslFBpWH sfGyLcYnETpCmDFhM7gUp4OTMyl15r7NlXPAEghzkggfx7fyuPWZL5Z00YbI3SKsxgGb bi+GQsjIQ5g1igH8wBe2Yen1Nz7sfnh/k0mF1ppvxYNFixdtA56MHfD5xTbqeVlQ8tbC /yhw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=LMuh9652CGyiO0bmz3tWXIVImi3zNm3ghnf+7Y714js=; b=L0b5YRQzFbW6FWsaVnl7VxsKUliXFTO2cqT4Dz5EnwkWibi37xRbnslGJQrODKjExm 9Nl5TPK4Vrq2yy8CXAG3QbiSut+U6Gf7bltsyR+MwntfVKb+dzfxXZ58Pls1hh+6sce9 Eq34kGGFJ+yHAqRUHoM+Q/S1gjAevvv5NfoiE/FMXIUayfav4XS3B2xrcV77iQBTdfSt y/FgDTw6ToOEsNfQW6EIT2kbnQgka/GKvhoHrecwh44Y5uMNUp0cwcz4AWl/Wyg3Aa5w t29l9SYpFugJPjegr7Ee5Aj+mdkyifrBYNDgv5TRByTyIkYNAK/KwXToYldF+7YiP54j ZaFw== X-Gm-Message-State: AOAM530tRYByhlRVGVZlaw4r99SSUh2F70tOAWLHvoHVt9I8Lw3IkeMh jkU4CVvlXCOJal1mEVZHaNNLp5d6KqvKGw== X-Google-Smtp-Source: ABdhPJwCO1i9H4l2kHs2fMSJZCaDXmw6HqRWFq2ALkFoQJi39ouuRIgneNN20Y9J+TmBo+PSfPp3zA== X-Received: by 2002:ac8:d82:: with SMTP id s2mr18368388qti.164.1596481038868; Mon, 03 Aug 2020 11:57:18 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:3475:b417:c07c:c811]) by smtp.gmail.com with ESMTPSA id p123sm18761565qkd.26.2020.08.03.11.57.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Aug 2020 11:57:18 -0700 (PDT) Date: Mon, 3 Aug 2020 14:57:17 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com Subject: [PATCH 03/10] t4216: use an '&&'-chain Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In a759bfa9ee (t4216: add end to end tests for git log with Bloom filters, 2020-04-06), a 'rm' invocation was added without a corresponding '&&' chain. When 'trace.perf' already exists, everything works fine. However, the function can be executed without 'trace.perf' on disk (eg., when the subset of tests run is altered with '--run'), and so the bare 'rm' complains about a missing file. To remove some noise from the test log, invoke 'rm' with '-f', at which point it is sensible to place the 'rm -f' in an '&&'-chain, which is both (1) our usual style, and (2) avoids a broken chain in the future if more commands are added at the beginning of the function. Helped-by: Eric Sunshine Helped-by: Jeff King Signed-off-by: Taylor Blau --- t/t4216-log-bloom.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index c9f9bdf1ba..fe19f6a60c 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -53,7 +53,7 @@ sane_unset GIT_TRACE2_PERF_BRIEF sane_unset GIT_TRACE2_CONFIG_PARAMS setup () { - rm "$TRASH_DIRECTORY/trace.perf" + rm -f "$TRASH_DIRECTORY/trace.perf" && git -c core.commitGraph=false log --pretty="format:%s" $1 >log_wo_bloom && GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.perf" git -c core.commitGraph=true log --pretty="format:%s" $1 >log_w_bloom } From patchwork Mon Aug 3 18:57:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11698599 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E26B513B1 for ; Mon, 3 Aug 2020 18:57:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3E6BF22B45 for ; Mon, 3 Aug 2020 18:57:26 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="xfbjt72b" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726823AbgHCS5Z (ORCPT ); Mon, 3 Aug 2020 14:57:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46234 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726130AbgHCS5Z (ORCPT ); Mon, 3 Aug 2020 14:57:25 -0400 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E4776C06174A for ; Mon, 3 Aug 2020 11:57:24 -0700 (PDT) Received: by mail-qt1-x842.google.com with SMTP id d27so29035093qtg.4 for ; Mon, 03 Aug 2020 11:57:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=AtctLu256av06ik5IHDWiQKbEbZ8S9fTFgXOJvN+Mvg=; b=xfbjt72bw5MiqOKOa5MS7i2Ij1d6H07p5WHmdlZFWJErt6oaaXhVaozbeCGOsPuArC Di9GTr89Dk77JB/5eJb2YfvI+2sepip8frny3UVApjAnXNnk3CenpOW2lxCrLQjILVpt nObePyx3DJUdkltCY3wXezoXgyRRvH4vjlUPQZyUOJV0Zg6hw8w2D7ERSoqM0XRmsHET kQCu2TJBxVO54fo0Szg56fv3LOHJQNuUC/zYg3WLhfJww+zrIuFzBhFcQr0NF+6hqPBk /WGruTbehtdq4q/M59T2GSZV3j8BkjUMKiisGkA2s5dffktc39eLgknJ1q2ItzwHiz8U QuVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=AtctLu256av06ik5IHDWiQKbEbZ8S9fTFgXOJvN+Mvg=; b=tfKIY6Spk4QZ3AyavHJJGwCotzwI3CfmvTrK6/Un5swtuHFtmiS2ApTtrNcLJctW6u nlhkmFCl18eDwOjKr+JGec31BqklTQSo5NGR1Teqg3qHKLqAXVbsLlciHJ8jiePsMpWD 4jLH2+p9Qpt9ArIU1RC+k4WoU99Hakd6mLDs5Gf2Ojy8IsO4OK/QKQ2RyOMw9xDdyLm7 gGmXOvbwsPY9HH/Ej2DwC2lRnuG1vdmRRuBS6iGxPdwOEj4oQTQl5IKD2d1334hrAnkx zuOHiH818avOry5B05eS6IQ1baOMgdeeN4icP9D4TQ3rd9sdYqHB1LBXgP5pnJMtVcVh +81g== X-Gm-Message-State: AOAM5334/uxb9PdAerVgYCSM32IbwGDq+sWf+T5IL59edF6kZV+fOqA6 iQ4Eyc5zzKOUOflYCjkrYkqnlYPwOyX1vw== X-Google-Smtp-Source: ABdhPJyIvzyDxOz4uylLYl2JUiBkbbCzEYmXXIhEto3kc9KDzePgF+abbCbeYnIlM+loTSrINnyBMg== X-Received: by 2002:aed:2a82:: with SMTP id t2mr18012654qtd.280.1596481043747; Mon, 03 Aug 2020 11:57:23 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:3475:b417:c07c:c811]) by smtp.gmail.com with ESMTPSA id k24sm22986844qtb.26.2020.08.03.11.57.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Aug 2020 11:57:23 -0700 (PDT) Date: Mon, 3 Aug 2020 14:57:22 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com Subject: [PATCH 04/10] t/helper/test-read-graph.c: prepare repo settings Message-ID: <42a0ca9549090c879d5c4d9cb53732d8dc4f0ef5.1596480582.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The read-graph test-tool is used by a number of the commit-graph test to assert various properties about a commit-graph. Previously, this program never ran 'prepare_repo_settings()'. There was no need to do so, since none of the commit-graph machinery is affected by the repo settings. In the next patch, the commit-graph machinery's behavior will become dependent on the repo settings, and so loading them before running the rest of the test tool is critical. As such, teach the test tool to call 'prepare_repo_settings()'. Signed-off-by: Taylor Blau --- t/helper/test-read-graph.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c index 6d0c962438..5f585a1725 100644 --- a/t/helper/test-read-graph.c +++ b/t/helper/test-read-graph.c @@ -12,11 +12,12 @@ int cmd__read_graph(int argc, const char **argv) setup_git_directory(); odb = the_repository->objects->odb; + prepare_repo_settings(the_repository); + graph = read_commit_graph_one(the_repository, odb); if (!graph) return 1; - printf("header: %08x %d %d %d %d\n", ntohl(*(uint32_t*)graph->data), *(unsigned char*)(graph->data + 4), From patchwork Mon Aug 3 18:57:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11698601 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3D0F8912 for ; Mon, 3 Aug 2020 18:57:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8C40422BF3 for ; Mon, 3 Aug 2020 18:57:30 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="ahk8D3Nc" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726881AbgHCS5a (ORCPT ); Mon, 3 Aug 2020 14:57:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46250 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726130AbgHCS53 (ORCPT ); Mon, 3 Aug 2020 14:57:29 -0400 Received: from mail-qk1-x744.google.com (mail-qk1-x744.google.com [IPv6:2607:f8b0:4864:20::744]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 59281C06174A for ; Mon, 3 Aug 2020 11:57:29 -0700 (PDT) Received: by mail-qk1-x744.google.com with SMTP id x69so36171726qkb.1 for ; Mon, 03 Aug 2020 11:57:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=N0zqsq6o//WQWC8IODzWqdojyNhjf7P9OwKi320vYn4=; b=ahk8D3NcTZT5hhnmYptZXDAYNBXzT1DG8nKg5El1LPTzO+DyMbhDkdYAMXos24m7Dl csdGin89w7ZuMYCaSDb/rSLktxXJfIVKF/m/83O1jFz0WxWmbWyYhZdWOB3/4LSTDVKz 5fGhGhY1/huu0RKtRbT+l9aVEDB8GBwCEk+dPBvMZvhhYxGnt5B8TGjyYCktEzdTXukU ykFyN5UyaZqO+jHp6uSNnKwU52OSSmNbwv+e83BomBTiPSNZm5N2MtBUApEYbYG3Nbin wUnj77mr/AruloO0kD+RuFcEbSmHFz2smSzc5sV9BaqTODBfFmVDGzJLFUOePdsiyi/j dxHw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=N0zqsq6o//WQWC8IODzWqdojyNhjf7P9OwKi320vYn4=; b=rFv3okZHyxBphQOlcWJmgreowFUuv4ao8qV9ImBJ1CJhB18bGjRAJNXK2xADDluGiR NhuWkgZYT+ahwnaTBbarO/gz5tpxMwPYCY7JNPYEPpXZ3dUfrQFs95Q7ICNsnCs/Q/OM GRGm9WGl+rntfTyed9P6psKPrxiOiHZRVw3kWLoGiUnQ7t4C5WaIdtxcVX03+2zAceMw tPRExkGmlb/kREXdW8ooQRn93VN1PPm7mi+VoLSxSzFzSWiAmHPjBAFctcUyEsF/bGab jI3tQdEj3zkSzUhGK5SaZi58NLv3o0MSyTs9SAjyAdfAm4Y85jQX/D5u6yEYQ41wn0aA 0nAA== X-Gm-Message-State: AOAM532olGw0WP06zJTko3uvogY4PMQg09z3/LN6rpDbk2Je7+o3GkfC GPmEt1+LuOYLWfxSOYk559zdbY5LWnTpWg== X-Google-Smtp-Source: ABdhPJwyfNhHULzEI/+6dpUuTfb2GNAzw2VfoXDjd3wLAp3qNXXxZZQw2c3xnwt0sNzH3XgpsPaI3w== X-Received: by 2002:a37:d201:: with SMTP id f1mr17554589qkj.188.1596481048165; Mon, 03 Aug 2020 11:57:28 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:3475:b417:c07c:c811]) by smtp.gmail.com with ESMTPSA id g61sm22242058qtd.65.2020.08.03.11.57.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Aug 2020 11:57:27 -0700 (PDT) Date: Mon, 3 Aug 2020 14:57:26 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com Subject: [PATCH 05/10] commit-graph: respect 'commitgraph.readChangedPaths' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Git uses the 'core.commitGraph' configuration value to control whether or not the commit graph is used when parsing commits or performing a traversal. Now that commit-graphs can also contain a section for changed-path Bloom filters, administrators that already have commit-graphs may find it convenient to use those graphs without relying on their changed-path Bloom filters. This can happen, for example, during a staged roll-out, or in the event of an incident. Introduce 'commitgraph.readChangedPaths' to control whether or not Bloom filters are read. Note that this configuration is independent from both: - 'core.commitGraph', to allow flexibility in using all parts of a commit-graph _except_ for its Bloom filters. - The '--changed-paths' option for 'git commit-graph write', to allow reading and writing Bloom filters to be controlled independently. When the variable is set, pretend as if no Bloom data was specified at all. This avoids adding additional special-casing outside of the commit-graph internals. Suggested-by: Derrick Stolee Signed-off-by: Taylor Blau --- Documentation/config.txt | 2 ++ Documentation/config/commitgraph.txt | 4 ++++ commit-graph.c | 6 ++++-- repo-settings.c | 3 +++ repository.h | 1 + t/t4216-log-bloom.sh | 4 +++- 6 files changed, 17 insertions(+), 3 deletions(-) create mode 100644 Documentation/config/commitgraph.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index ef0768b91a..78883c6e63 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -340,6 +340,8 @@ include::config/column.txt[] include::config/commit.txt[] +include::config/commitgraph.txt[] + include::config/credential.txt[] include::config/completion.txt[] diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt new file mode 100644 index 0000000000..bb78e72f1b --- /dev/null +++ b/Documentation/config/commitgraph.txt @@ -0,0 +1,4 @@ +commitgraph.readChangedPaths:: + If true, then git will use the changed-path Bloom filters in the + commit-graph file (if it exists, and they are present). Defaults to + true. See linkgit:git-commit-graph[1] for more information. diff --git a/commit-graph.c b/commit-graph.c index 81a6f2a8ce..cb9d7fea04 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -320,6 +320,8 @@ struct commit_graph *parse_commit_graph(struct repository *r, return NULL; } + prepare_repo_settings(r); + graph = alloc_commit_graph(); graph->hash_len = the_hash_algo->rawsz; @@ -396,14 +398,14 @@ struct commit_graph *parse_commit_graph(struct repository *r, case GRAPH_CHUNKID_BLOOMINDEXES: if (graph->chunk_bloom_indexes) chunk_repeated = 1; - else + else if (r->settings.commit_graph_read_changed_paths) graph->chunk_bloom_indexes = data + chunk_offset; break; case GRAPH_CHUNKID_BLOOMDATA: if (graph->chunk_bloom_data) chunk_repeated = 1; - else { + else if (r->settings.commit_graph_read_changed_paths) { uint32_t hash_version; graph->chunk_bloom_data = data + chunk_offset; hash_version = get_be32(data + chunk_offset); diff --git a/repo-settings.c b/repo-settings.c index 0918408b34..9e551bc03d 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -17,9 +17,12 @@ void prepare_repo_settings(struct repository *r) if (!repo_config_get_bool(r, "core.commitgraph", &value)) r->settings.core_commit_graph = value; + if (!repo_config_get_bool(r, "commitgraph.readchangedpaths", &value)) + r->settings.commit_graph_read_changed_paths = value; if (!repo_config_get_bool(r, "gc.writecommitgraph", &value)) r->settings.gc_write_commit_graph = value; UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1); + UPDATE_DEFAULT_BOOL(r->settings.commit_graph_read_changed_paths, 1); UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1); if (!repo_config_get_int(r, "index.version", &value)) diff --git a/repository.h b/repository.h index 3c1f7d54bd..81759b7d27 100644 --- a/repository.h +++ b/repository.h @@ -29,6 +29,7 @@ struct repo_settings { int initialized; int core_commit_graph; + int commit_graph_read_changed_paths; int gc_write_commit_graph; int fetch_write_commit_graph; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index fe19f6a60c..b3d1f596f8 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -90,7 +90,9 @@ do "--ancestry-path side..master" do test_expect_success "git log option: $option for path: $path" ' - test_bloom_filters_used "$option -- $path" + test_bloom_filters_used "$option -- $path" && + test_config commitgraph.readChangedPaths false && + test_bloom_filters_not_used "$option -- $path" ' done done From patchwork Mon Aug 3 18:57:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11698603 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1D11613B1 for ; Mon, 3 Aug 2020 18:57:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6EECF22BF3 for ; Mon, 3 Aug 2020 18:57:34 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="hkz9kYFw" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726942AbgHCS5d (ORCPT ); Mon, 3 Aug 2020 14:57:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46262 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726130AbgHCS5d (ORCPT ); Mon, 3 Aug 2020 14:57:33 -0400 Received: from mail-qv1-xf44.google.com (mail-qv1-xf44.google.com [IPv6:2607:f8b0:4864:20::f44]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5465CC06174A for ; Mon, 3 Aug 2020 11:57:33 -0700 (PDT) Received: by mail-qv1-xf44.google.com with SMTP id l13so11482632qvt.10 for ; Mon, 03 Aug 2020 11:57:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=XySvQ/33jS+7MSIqB0IzojUu84o+bNSIGkfxH8mq4q0=; b=hkz9kYFwAntxayUlguONG9SvGm26Q1RainmBFX2pI7wHNXC9Gxly4ZEKOdDNXrGJ2i klm7qwDEdsc5vewSp+bhtWTww5rQjp9fh99iWY0WRx6Vwyx5ydYUzQqdAhHnQkAC7doP GAJ4Eto9BFVexrbrisxdwPQK0aW1qlRg5dS4lR0ZViuLqVdEdwNOnsJOym5KakVtP2H3 Fz8VqVqFmuvSnP3htsrFtLrYDmfCpWRJfrjCjYNMSE+hmGWpG8yv7pW2UU7Gd/lz2akj h1GQMIklHK3zboF2kxTRFHguJn5xfXpJrSS+r1YL0IEQ93YxHstl2wJKfzRDDuUT5QpP qskA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=XySvQ/33jS+7MSIqB0IzojUu84o+bNSIGkfxH8mq4q0=; b=UJ8JU7QoKl0duwlHD656QJyg60STe26I0m/AFi+lCDUbWWpe/u7UscYfqeAETtlaMA pXz4iuoiwMNEvfhYw9+pgy/RvxRECNabzo/vmUXS/oRqMdAcJK9pE1ryZgqd/LxZ3bwM eoPMygIOVzAxr1zt5JIMRiqQ8O2epnojQCeNpMaM/wABWIcoCVXIwMXj3+txNQa/4YZd I3xjt9BLjGq6BpmW+Bt5bAsaZM/ppB34AaWgOshT1BqiEHuFtyYbrs+GETqc9VuiLVTn D+b0AVUO/Ujly+gEVC7QkraTf7maUGvI2+nWd2wWw842fj0V+ONDQeAk9sNkzAij9n04 aMWQ== X-Gm-Message-State: AOAM531nq8IUVzVpq5j1cG4aeWbdIZ7Wm2mo5cJpoLJJXofW11SGLRUu Xe0W9yV7Up9u/HLsJHoUeOKuanbdLXcX7g== X-Google-Smtp-Source: ABdhPJypusE8kpE2dCoZ9xf9TxiT6dRAwtvxhPgHcVpnNDXBvkQ4FJ2Qup9uBtzHR1HNRBEGef/7rw== X-Received: by 2002:a0c:a342:: with SMTP id u60mr7563845qvu.48.1596481052197; Mon, 03 Aug 2020 11:57:32 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:3475:b417:c07c:c811]) by smtp.gmail.com with ESMTPSA id s184sm21372495qkf.50.2020.08.03.11.57.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Aug 2020 11:57:31 -0700 (PDT) Date: Mon, 3 Aug 2020 14:57:30 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com Subject: [PATCH 06/10] commit-graph.c: sort index into commits list Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org For locality, 'compute_bloom_filters()' sorts the commits for which it wants to compute Bloom filters in a preferred order (cf., 3d11275505 (commit-graph: examine commits by generation number, 2020-03-30) for details). The subsequent patch will want to recover the new graph position of each commit. Since the 'packed_commit_list' already stores a double-pointer, avoid a 'COPY_ARRAY' and instead keep track of an index into the original list. (Use an integer index instead of a memory address, since this involves a needlessly confusing triple-pointer). Alter the two sorting routines 'commit_pos_cmp' and 'commit_gen_cmp' to take into account the packed_commit_list they are sorting with respect to. Since 'compute_bloom_filters()' is the only caller for each of those comparison functions, no other call-sites need updating. Signed-off-by: Taylor Blau --- commit-graph.c | 44 ++++++++++++++++++++++++-------------------- 1 file changed, 24 insertions(+), 20 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index cb9d7fea04..d6ea556649 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -79,10 +79,18 @@ static void set_commit_pos(struct repository *r, const struct object_id *oid) *commit_pos_at(&commit_pos, commit) = max_pos++; } -static int commit_pos_cmp(const void *va, const void *vb) +struct packed_commit_list { + struct commit **list; + int nr; + int alloc; +}; + +static int commit_pos_cmp(const void *va, const void *vb, void *ctx) { - const struct commit *a = *(const struct commit **)va; - const struct commit *b = *(const struct commit **)vb; + struct packed_commit_list *commits = ctx; + + const struct commit *a = commits->list[*(int *)va]; + const struct commit *b = commits->list[*(int *)vb]; return commit_pos_at(&commit_pos, a) - commit_pos_at(&commit_pos, b); } @@ -139,10 +147,12 @@ static struct commit_graph_data *commit_graph_data_at(const struct commit *c) return data; } -static int commit_gen_cmp(const void *va, const void *vb) +static int commit_gen_cmp(const void *va, const void *vb, void *ctx) { - const struct commit *a = *(const struct commit **)va; - const struct commit *b = *(const struct commit **)vb; + struct packed_commit_list *commits = ctx; + + const struct commit *a = commits->list[*(int *)va]; + const struct commit *b = commits->list[*(int *)vb]; uint32_t generation_a = commit_graph_generation(a); uint32_t generation_b = commit_graph_generation(b); @@ -923,12 +933,6 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c); } -struct packed_commit_list { - struct commit **list; - int nr; - int alloc; -}; - struct packed_oid_list { struct object_id *list; int nr; @@ -1389,7 +1393,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) { int i; struct progress *progress = NULL; - struct commit **sorted_commits; + int *sorted_commits; init_bloom_filters(); @@ -1399,15 +1403,15 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->commits.nr); ALLOC_ARRAY(sorted_commits, ctx->commits.nr); - COPY_ARRAY(sorted_commits, ctx->commits.list, ctx->commits.nr); - - if (ctx->order_by_pack) - QSORT(sorted_commits, ctx->commits.nr, commit_pos_cmp); - else - QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp); + for (i = 0; i < ctx->commits.nr; i++) + sorted_commits[i] = i; + QSORT_S(sorted_commits, ctx->commits.nr, + ctx->order_by_pack ? commit_pos_cmp : commit_gen_cmp, + &ctx->commits); for (i = 0; i < ctx->commits.nr; i++) { - struct commit *c = sorted_commits[i]; + int pos = sorted_commits[i]; + struct commit *c = ctx->commits.list[pos]; struct bloom_filter *filter = get_bloom_filter(ctx->r, c, 1); ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; display_progress(progress, i + 1); From patchwork Mon Aug 3 18:57:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11698605 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CBA5C912 for ; Mon, 3 Aug 2020 18:57:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1B36822B45 for ; Mon, 3 Aug 2020 18:57:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="z2QdCo+c" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726968AbgHCS5h (ORCPT ); Mon, 3 Aug 2020 14:57:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46270 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726130AbgHCS5h (ORCPT ); Mon, 3 Aug 2020 14:57:37 -0400 Received: from mail-qv1-xf44.google.com (mail-qv1-xf44.google.com [IPv6:2607:f8b0:4864:20::f44]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C7EE6C06174A for ; Mon, 3 Aug 2020 11:57:36 -0700 (PDT) Received: by mail-qv1-xf44.google.com with SMTP id a19so11844975qvy.3 for ; Mon, 03 Aug 2020 11:57:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=ZxXVLs5DonXr+kjvzSFyk6wm7fSdLNDNaNqVO0Z40YE=; b=z2QdCo+clwKciSWCCkHmd6/lfgC8ZFfJ1O3CBEJs3MYjR+Wd0l5WCAEDF3u+p+bsFL j7md6CRIJ3TkFM6t+VDLE34JBqLyVbbqfwHKCqzs5oF2LLwtPJhyL2KsmHTI7jXFGmOk UrzTo7TDE/wn23LwDR4J9O757FR2PLaXG90w37ts5ORinmjwbPGHUrjkJAUsDqEA4FNh 4pEp40MNRBRH8NdQOZE9bFyRyYZAhl5mB0q3Hj9F/su2v7u+F08yJQ+2+swqwe2mwYn1 24VLVtDt4+Ch4tG9oXv9oXF1bAzyrFNvYcWe80oeYUaG8EgHnpa48gj4uk1keNQ39ClN X8qQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=ZxXVLs5DonXr+kjvzSFyk6wm7fSdLNDNaNqVO0Z40YE=; b=pjy4/for6zZ5ak8SxrcgME0+/eFTphUM4WBylecb16C87RSgHKqBpCKPvBnKpR/Cs0 lnl6zE/mFhI/D2AtBTueZo0ORZ/2x2S2UytlfesKSu4J2/Bu1wPaJ8I88WMOm7rUmgUI 03Bna4VanVQN5z7wzaavZ3kw1SbJH6WJuvoRqrrZhxsva/GktnA/1EEY1cJYdu5X103D +8Fq6DphQgI9jupxGi+dhq9qvyjr2KeBQyOgX7wKo/BBbbVQsSzhsUbkOpNjIeo9LZH8 bkM99Cu64x6goKlzt7Mcc+zI3UPnoY7FDH7qNcbr+Qbs527NJiLLyp6G6W5rZ81HEz9j aFdQ== X-Gm-Message-State: AOAM5329Wd9sDgqmvjGCME12RQi8F5SeQYnrEFdcJkl3sZpQaiSIv25V OR3ddOtwrluwa7t5lKSrv5IJyMUh+ujaoQ== X-Google-Smtp-Source: ABdhPJxgAZS29Nw9W4i6elY1RxSW60XjnhK57Sv2UYBSO7PJgtfRjEJEsVt2yKosJTZLfRUJSuyd8A== X-Received: by 2002:a0c:b591:: with SMTP id g17mr18495982qve.4.1596481055440; Mon, 03 Aug 2020 11:57:35 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:3475:b417:c07c:c811]) by smtp.gmail.com with ESMTPSA id m17sm24148669qkn.45.2020.08.03.11.57.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Aug 2020 11:57:34 -0700 (PDT) Date: Mon, 3 Aug 2020 14:57:33 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com Subject: [PATCH 07/10] commit-graph: add large-filters bitmap chunk Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When a commit has 512 changed paths or greater, the commit-graph machinery represents it as a zero-length filter since having many entries in the Bloom filter has undesirable effects on the false positivity rate. In addition to these too-large filters, the commit-graph machinery also represents commits with no filter and commits with no changed paths in the same way. When writing a commit-graph that aggregates several incremental commit-graph layers (eg., with '--split=replace'), the commit-graph machinery first computes all of the Bloom filters that it wants to write but does not already know about from existing graph layers. Because we overload the zero-length filter in the above fashion, this leads to recomputing large filters over and over again. This is already undesirable, since it means that we are wasting considerable effort to discover that a commit has at least 512 changed paths, only to throw that effort away (and then repeat the process the next time a roll-up is performed). In a subsequent patch, we will add a '--max-new-filters' option, which specifies an upper-bound on the number of new filters we are willing to compute from scratch. Suppose that there are 'N' too-large filters, and we specify '--max-new-filters=M'. If 'N >= M', it is unlikely that any filters will be generated, since we'll spend most of our effort on filters that we ultimately throw away. If 'N < M', filters will trickle in over time, but only at most 'M - N' per-write. To address this, add a new chunk which encodes a bitmap where the ith bit is on iff the ith commit has zero or at least 512 changed paths. When computing Bloom filters, first consult the relevant bitmap (in the case that we are rolling up existing layers) to see if computing the Bloom filter from scratch would be a waste of time. This patch implements a new chunk instead of extending the existing BIDX and BDAT chunks because modifying these chunks would confuse old clients. For example, consider setting the most-significant bit in an entry in the BIDX chunk to indicate that that filter is too-large. New clients would understand how to interpret this, but old clients would treat that entry as a really large filter. By avoiding the need to move to new versions of the BDAT and BIDX chunk, we can give ourselves more time to consider whether or not other modifications to these chunks are worthwhile without holding up this change. Another approach would be to introduce a new BIDX chunk (say, one identified by 'BID2') which is identical to the existing BIDX chunk, except the most-significant bit of each offset is interpreted as "this filter is too big" iff looking at a BID2 chunk. This avoids having to write a bitmap, but forces older clients to rewrite their commit-graphs (as well as reduces the theoretical largest Bloom filters we couldl write, and forces us to maintain the code necessary to translate BIDX chunks to BID2 ones). Separately from this patch, I implemented this alternate approach and did not find it to be advantageous. Signed-off-by: Taylor Blau --- .../technical/commit-graph-format.txt | 9 ++ commit-graph.c | 85 ++++++++++++++++++- commit-graph.h | 2 + t/t4216-log-bloom.sh | 20 ++++- 4 files changed, 112 insertions(+), 4 deletions(-) diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index 440541045d..a7191c03d3 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -123,6 +123,15 @@ CHUNK DATA: of length zero. * The BDAT chunk is present if and only if BIDX is present. + Large Bloom Filters (ID: {'B', 'F', 'X', 'L'}) [Optional] + * It contains a list of 'eword_t's (the length of this list is determined by + the width of the chunk) which is a bitmap. The 'i'th bit is set exactly + when the 'i'th commit in the graph has a changed-path Bloom filter with + zero entries (either because the commit is empty, or because it contains + more than 512 changed paths). + * The BFXL chunk is present on newer versions of Git iff the BIDX and BDAT + chunks are also present. + Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional] This list of H-byte hashes describe a set of B commit-graph files that form a commit-graph chain. The graph position for the ith commit in this diff --git a/commit-graph.c b/commit-graph.c index d6ea556649..c870ffe0ed 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -41,8 +41,9 @@ void git_test_write_commit_graph_or_die(void) #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */ #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */ #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */ +#define GRAPH_CHUNKID_BLOOMLARGE 0x4246584c /* "BFXL" */ #define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */ -#define MAX_NUM_CHUNKS 7 +#define MAX_NUM_CHUNKS 8 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16) @@ -429,6 +430,17 @@ struct commit_graph *parse_commit_graph(struct repository *r, graph->bloom_filter_settings->bits_per_entry = get_be32(data + chunk_offset + 8); } break; + + case GRAPH_CHUNKID_BLOOMLARGE: + if (graph->bloom_large.word_alloc) + chunk_repeated = 1; + else if (r->settings.commit_graph_read_changed_paths) { + uint32_t alloc = get_be64(chunk_lookup + 4) - chunk_offset; + + graph->bloom_large.word_alloc = alloc; + graph->bloom_large.words = (eword_t *)(data + chunk_offset); + } + break; } if (chunk_repeated) { @@ -443,6 +455,8 @@ struct commit_graph *parse_commit_graph(struct repository *r, /* We need both the bloom chunks to exist together. Else ignore the data */ graph->chunk_bloom_indexes = NULL; graph->chunk_bloom_data = NULL; + graph->bloom_large.words = NULL; + graph->bloom_large.word_alloc = 0; FREE_AND_NULL(graph->bloom_filter_settings); } @@ -933,6 +947,21 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c); } +static int get_bloom_filter_large_in_graph(struct commit_graph *g, + const struct commit *c) +{ + uint32_t graph_pos = commit_graph_position(c); + if (graph_pos == COMMIT_NOT_FROM_GRAPH) + return 0; + + while (g && graph_pos < g->num_commits_in_base) + g = g->base_graph; + + if (!(g && g->bloom_large.word_alloc)) + return 0; + return bitmap_get(&g->bloom_large, graph_pos - g->num_commits_in_base); +} + struct packed_oid_list { struct object_id *list; int nr; @@ -969,6 +998,11 @@ struct write_commit_graph_context { const struct split_commit_graph_opts *split_opts; size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; + struct bitmap *bloom_large; + + int count_bloom_filter_known_large; + int count_bloom_filter_found_large; + int count_bloom_filter_computed; }; static int write_graph_chunk_fanout(struct hashfile *f, @@ -1231,6 +1265,15 @@ static int write_graph_chunk_bloom_data(struct hashfile *f, return 0; } +static int write_graph_chunk_bloom_large(struct hashfile *f, + struct write_commit_graph_context *ctx) +{ + trace2_region_enter("commit-graph", "bloom_large", ctx->r); + hashwrite(f, ctx->bloom_large->words, ctx->bloom_large->word_alloc * sizeof(eword_t)); + trace2_region_leave("commit-graph", "bloom_large", ctx->r); + return 0; +} + static int oid_compare(const void *_a, const void *_b) { const struct object_id *a = (const struct object_id *)_a; @@ -1389,6 +1432,24 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) stop_progress(&ctx->progress); } +static void trace2_bloom_filter_write_statistics(struct write_commit_graph_context *ctx) +{ + struct json_writer jw = JSON_WRITER_INIT; + + jw_object_begin(&jw, 0); + jw_object_intmax(&jw, "filter_known_large", + ctx->count_bloom_filter_known_large); + jw_object_intmax(&jw, "filter_found_large", + ctx->count_bloom_filter_found_large); + jw_object_intmax(&jw, "filter_computed", + ctx->count_bloom_filter_computed); + jw_end(&jw); + + trace2_data_json("commit-graph", the_repository, "bloom_statistics", &jw); + + jw_release(&jw); +} + static void compute_bloom_filters(struct write_commit_graph_context *ctx) { int i; @@ -1396,6 +1457,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) int *sorted_commits; init_bloom_filters(); + ctx->bloom_large = bitmap_word_alloc(ctx->commits.nr / BITS_IN_EWORD); if (ctx->report_progress) progress = start_delayed_progress( @@ -1412,11 +1474,24 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) for (i = 0; i < ctx->commits.nr; i++) { int pos = sorted_commits[i]; struct commit *c = ctx->commits.list[pos]; - struct bloom_filter *filter = get_bloom_filter(ctx->r, c, 1); - ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; + if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph, c)) { + bitmap_set(ctx->bloom_large, pos); + ctx->count_bloom_filter_known_large++; + } else { + struct bloom_filter *filter = get_bloom_filter(ctx->r, c, 1); + if (!filter->len) { + bitmap_set(ctx->bloom_large, pos); + ctx->count_bloom_filter_found_large++; + } + ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; + ctx->count_bloom_filter_computed++; + } display_progress(progress, i + 1); } + if (trace2_is_enabled()) + trace2_bloom_filter_write_statistics(ctx); + free(sorted_commits); stop_progress(&progress); } @@ -1739,6 +1814,10 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) + ctx->total_bloom_filter_data_size; chunks[num_chunks].write_fn = write_graph_chunk_bloom_data; num_chunks++; + chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMLARGE; + chunks[num_chunks].size = sizeof(eword_t) * ctx->bloom_large->word_alloc; + chunks[num_chunks].write_fn = write_graph_chunk_bloom_large; + num_chunks++; } if (ctx->num_commit_graphs_after > 1) { chunks[num_chunks].id = GRAPH_CHUNKID_BASE; diff --git a/commit-graph.h b/commit-graph.h index d9acb22bac..afbc5fa41e 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -4,6 +4,7 @@ #include "git-compat-util.h" #include "object-store.h" #include "oidset.h" +#include "ewah/ewok.h" #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH" #define GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE "GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE" @@ -71,6 +72,7 @@ struct commit_graph { const unsigned char *chunk_base_graphs; const unsigned char *chunk_bloom_indexes; const unsigned char *chunk_bloom_data; + struct bitmap bloom_large; struct bloom_filter_settings *bloom_filter_settings; }; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index b3d1f596f8..231dc8a3a7 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -33,7 +33,7 @@ test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' git commit-graph write --reachable --changed-paths ' graph_read_expect () { - NUM_CHUNKS=5 + NUM_CHUNKS=6 cat >expect <<- EOF header: 43475048 1 1 $NUM_CHUNKS 0 num_commits: $1 @@ -195,5 +195,23 @@ test_expect_success 'correctly report changes over limit' ' done ) ' +test_bloom_filters_computed () { + commit_graph_args=$1 + bloom_trace_prefix="{\"filter_known_large\":$2,\"filter_found_large\":$3,\"filter_computed\":$4" + rm -f "$TRASH_DIRECTORY/trace.event" && + GIT_TRACE2_EVENT="$TRASH_DIRECTORY/trace.event" git commit-graph write $commit_graph_args && + grep "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.event" +} + +test_expect_success 'Bloom generation does not recompute too-large filters' ' + ( + cd 513changes && + git commit-graph write --reachable --changed-paths \ + --split=replace && + test_commit c1 filter && + test_bloom_filters_computed "--reachable --changed-paths --split=replace" \ + 1 0 1 + ) +' test_done From patchwork Mon Aug 3 18:57:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11698607 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 93427912 for ; Mon, 3 Aug 2020 18:57:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DE2E522B45 for ; Mon, 3 Aug 2020 18:57:42 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="ux3NmyfZ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726998AbgHCS5m (ORCPT ); Mon, 3 Aug 2020 14:57:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46282 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726130AbgHCS5l (ORCPT ); Mon, 3 Aug 2020 14:57:41 -0400 Received: from mail-qv1-xf44.google.com (mail-qv1-xf44.google.com [IPv6:2607:f8b0:4864:20::f44]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8FB11C06174A for ; Mon, 3 Aug 2020 11:57:40 -0700 (PDT) Received: by mail-qv1-xf44.google.com with SMTP id j10so11150945qvo.13 for ; Mon, 03 Aug 2020 11:57:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=vOk5ic67xvfei8DJOzwHeQejYjg9ZmCch1Q+Sa7yk0g=; b=ux3NmyfZDDSF7poGfzdF8RHM8ALAvqhxK7hkOZjHAT4qkdk5JnQ3Ru8bBkLms7XB6J scQKlyHmd+KOiTT0nmKLLlezQz7tmckGD/iDQlRkBf9vAqG1o2UOjEEFDun2/Zwjp8up tSIhtAgQPkC2lQCHMqyWF7KSoIO71+QferQ4TlA2MqLucKBemXrv2UmaQuYJE0ra3lAE lhAOXdOstj2ionL5Vc0kCdcxLe3yrgIho+2v4oPJ6lMdTnbKr/7Sdaf2KabYIOhgy8/y x4vntr3hn+Kn9ha9rYHeRo/9iIYiep40zZUXvzZZXtV9tHW3foQrkswNLry6NV1mqAMU /++w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=vOk5ic67xvfei8DJOzwHeQejYjg9ZmCch1Q+Sa7yk0g=; b=ACwhVXazA+1PjmOvX3mEAdCurFv/Y6tFCBfyNf9dipVklo/ssw0mhk2sTKCZwvuZd9 OO3tKVnQLHFfM0zJwomN8LsQA8/dvJqZGE9mJu3Lulr9xAGBqYNlWyBV7RL5NMO+lb8G XpswkYADqdVl4uS284hKTKMlwn/QLFdupQzEfZeNAkt9bVDdh8a2OOqVnN97cOnEoO3a RtH42W7uIgEho1NWdCtKzWKHgB2TrDaDj6aYm1tIaZFfIKshbfwAiizLFvcJ0l7+PCjI 6u+sO99Gh6CTImhYqPxmqlPix/xAf5rw5GkMkxA/1b1kYTx+xARetvE7qP1MaG9Uv2Wm TsIw== X-Gm-Message-State: AOAM532z6rhIxLkuRlqM1mhWwkQKSsGDZ2c/Y2rjp/bxTtGdTMvUaFg9 9WjjFTV1prlmpHD/7xFm5TGY7xnSL38bxg== X-Google-Smtp-Source: ABdhPJwVlPjBAJnN7VIUdyePY7VcirucOIhDRnN14E+sKIPH5X5mDMsxMas3/ln+Ah05+S+DB3dsRA== X-Received: by 2002:a0c:b604:: with SMTP id f4mr11724578qve.68.1596481059101; Mon, 03 Aug 2020 11:57:39 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:3475:b417:c07c:c811]) by smtp.gmail.com with ESMTPSA id p53sm24029164qtc.85.2020.08.03.11.57.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Aug 2020 11:57:38 -0700 (PDT) Date: Mon, 3 Aug 2020 14:57:37 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com Subject: [PATCH 08/10] bloom: split 'get_bloom_filter()' in two Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org 'get_bloom_filter' takes a flag to control whether it will compute a Bloom filter if the requested one is missing. In the next patch, we'll add yet another flag to this method, which would force all but one caller to specify an extra 'NULL' parameter at the end. Instead of doing this, split 'get_bloom_filter' into two functions: 'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only looks up a Bloom filter (and does not compute one if it's missing, thus dropping the 'compute_if_not_present' flag). The latter does compute missing Bloom filters, with an additional parameter to store whether or not it needed to do so. This simplifies many call-sites, since the majority of existing callers to 'get_bloom_filter' do not want missing Bloom filters to be computed (so they can drop the parameter entirely and use the simpler version of the function). Signed-off-by: Taylor Blau --- blame.c | 2 +- bloom.c | 13 ++++++++++--- bloom.h | 9 ++++++--- commit-graph.c | 6 +++--- line-log.c | 2 +- revision.c | 2 +- t/helper/test-bloom.c | 2 +- 7 files changed, 23 insertions(+), 13 deletions(-) diff --git a/blame.c b/blame.c index 3e5f8787bc..756285fca7 100644 --- a/blame.c +++ b/blame.c @@ -1275,7 +1275,7 @@ static int maybe_changed_path(struct repository *r, if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY) return 1; - filter = get_bloom_filter(r, origin->commit, 0); + filter = get_bloom_filter(r, origin->commit); if (!filter) return 1; diff --git a/bloom.c b/bloom.c index cd9380ac62..a8a21762f4 100644 --- a/bloom.c +++ b/bloom.c @@ -177,9 +177,10 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data, return strcmp(e1->path, e2->path); } -struct bloom_filter *get_bloom_filter(struct repository *r, - struct commit *c, - int compute_if_not_present) +struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, + struct commit *c, + int compute_if_not_present, + int *computed) { struct bloom_filter *filter; struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS; @@ -187,6 +188,9 @@ struct bloom_filter *get_bloom_filter(struct repository *r, struct diff_options diffopt; int max_changes = 512; + if (computed) + *computed = 0; + if (!bloom_filters.slab_size) return NULL; @@ -273,6 +277,9 @@ struct bloom_filter *get_bloom_filter(struct repository *r, filter->len = 0; } + if (computed) + *computed = 1; + free(diff_queued_diff.queue); DIFF_QUEUE_CLEAR(&diff_queued_diff); diff --git a/bloom.h b/bloom.h index d8fbb0fbf1..2fdc2918f8 100644 --- a/bloom.h +++ b/bloom.h @@ -80,9 +80,12 @@ void add_key_to_filter(const struct bloom_key *key, void init_bloom_filters(void); -struct bloom_filter *get_bloom_filter(struct repository *r, - struct commit *c, - int compute_if_not_present); +struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, + struct commit *c, + int compute_if_not_present, + int *computed); + +#define get_bloom_filter(r, c) get_or_compute_bloom_filter((r), (c), 0, NULL) int bloom_filter_contains(const struct bloom_filter *filter, const struct bloom_key *key, diff --git a/commit-graph.c b/commit-graph.c index c870ffe0ed..2e765b26d5 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1214,7 +1214,7 @@ static int write_graph_chunk_bloom_indexes(struct hashfile *f, uint32_t cur_pos = 0; while (list < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0); + struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); size_t len = filter ? filter->len : 0; cur_pos += len; display_progress(ctx->progress, ++ctx->progress_cnt); @@ -1253,7 +1253,7 @@ static int write_graph_chunk_bloom_data(struct hashfile *f, hashwrite_be32(f, ctx->bloom_settings->bits_per_entry); while (list < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0); + struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); size_t len = filter ? filter->len : 0; display_progress(ctx->progress, ++ctx->progress_cnt); @@ -1478,7 +1478,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) bitmap_set(ctx->bloom_large, pos); ctx->count_bloom_filter_known_large++; } else { - struct bloom_filter *filter = get_bloom_filter(ctx->r, c, 1); + struct bloom_filter *filter = get_or_compute_bloom_filter(ctx->r, c, 1, NULL); if (!filter->len) { bitmap_set(ctx->bloom_large, pos); ctx->count_bloom_filter_found_large++; diff --git a/line-log.c b/line-log.c index c53692834d..9e58fd185a 100644 --- a/line-log.c +++ b/line-log.c @@ -1159,7 +1159,7 @@ static int bloom_filter_check(struct rev_info *rev, return 1; if (!rev->bloom_filter_settings || - !(filter = get_bloom_filter(rev->repo, commit, 0))) + !(filter = get_bloom_filter(rev->repo, commit))) return 1; if (!range) diff --git a/revision.c b/revision.c index e244beed05..73b59d2134 100644 --- a/revision.c +++ b/revision.c @@ -756,7 +756,7 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs, if (commit_graph_generation(commit) == GENERATION_NUMBER_INFINITY) return -1; - filter = get_bloom_filter(revs->repo, commit, 0); + filter = get_bloom_filter(revs->repo, commit); if (!filter) { count_bloom_filter_not_present++; diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c index f0aa80b98e..76a28a2199 100644 --- a/t/helper/test-bloom.c +++ b/t/helper/test-bloom.c @@ -39,7 +39,7 @@ static void get_bloom_filter_for_commit(const struct object_id *commit_oid) struct bloom_filter *filter; setup_git_directory(); c = lookup_commit(the_repository, commit_oid); - filter = get_bloom_filter(the_repository, c, 1); + filter = get_or_compute_bloom_filter(the_repository, c, 1, NULL); print_bloom_filter(filter); } From patchwork Mon Aug 3 18:57:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11698609 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0146B912 for ; Mon, 3 Aug 2020 18:57:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4AF0722B45 for ; Mon, 3 Aug 2020 18:57:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="LUWRA1YK" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727012AbgHCS5o (ORCPT ); Mon, 3 Aug 2020 14:57:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46294 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726130AbgHCS5o (ORCPT ); Mon, 3 Aug 2020 14:57:44 -0400 Received: from mail-qv1-xf29.google.com (mail-qv1-xf29.google.com [IPv6:2607:f8b0:4864:20::f29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1AA52C06174A for ; Mon, 3 Aug 2020 11:57:44 -0700 (PDT) Received: by mail-qv1-xf29.google.com with SMTP id a19so11845142qvy.3 for ; Mon, 03 Aug 2020 11:57:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=Hbhm77ll047MbjjIMnqf5YD9r2bu7O0Hp8srHXLyFPY=; b=LUWRA1YKyDe0x4N6zG5O2ACkFQsRwKTPBTps4GqXasG3+aOZuHMcrOrv5sY7UhdGXS m8Uy2utpOaxqH/3nXV7x/AzDbARgyy6oeWgtUqZyjdJ430jCwHORwaEt19vyJMOpsd/m FohVBPRtkk7Ie7+gKoAjh96HdF4zUtScO/TLQuhJX8EbUfCDDyYF7i2g1e/20hiqzbP8 NRpgpW/aOq0BVAn9vTGYuRh0AGJv1w1Gzt8BocmfpI0Xmp43E2Wb2utzR+ib3xUW1Zkn C0rgIYKoNxgOwmjQ72uLJwVrMR/rifFhIbNlplPnjy8aXlN/rUFJc22QgCicwEs5YZUS 68dQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=Hbhm77ll047MbjjIMnqf5YD9r2bu7O0Hp8srHXLyFPY=; b=aN4GlscZBXwIVhuklW2n0Ie8bMXMN1Q58yeyNWhi6vuYcN6Gf2YtSH6NAKsi8kboLU DaJkX9cG3uxk2uFbmVc2KBkQ0baCyoj/T4uaXnS1txGYDRWtMJ2eYRd/ORLlsXKEG76s IpyNYaiS/RpIH8knpolNmN2Q0KuEBF/A/QVzL2dn4OKO44YTypy0h5in/ApIbRGSUT0g 2I2H0+7lVjYjPv7wjFt5kwASfuOodj/KG/JgTBtZE3eNXXIpcrpHXYzOWZeKvDEFv0Pm Fnw+OLvz/RmmSFeweOC47jd7p9Ti4/xHg8/ieuP0dSRC3cn1zIXYXPJC3PnscjqwRFwj bYvA== X-Gm-Message-State: AOAM530PrDM3HXJSolWLigTNqvatuE8ITe8H4ezsUAD5KYi0+DxnY7fI 3pizbauG4EYft2fWo3zPO01ogGaolMaT2w== X-Google-Smtp-Source: ABdhPJwEjbFcjdW/mn2MsYZm6o/hHlSFJ1fkPMTX7SWDdXvn0pLi4HPV4HQfNNJrCBr6uzjmQ4DFgA== X-Received: by 2002:a0c:d7c9:: with SMTP id g9mr18919588qvj.83.1596481062822; Mon, 03 Aug 2020 11:57:42 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:3475:b417:c07c:c811]) by smtp.gmail.com with ESMTPSA id p189sm20210395qkb.61.2020.08.03.11.57.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Aug 2020 11:57:42 -0700 (PDT) Date: Mon, 3 Aug 2020 14:57:41 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com Subject: [PATCH 09/10] commit-graph: rename 'split_commit_graph_opts' Message-ID: <309e76bb174d64ffae6454f2bc9968c6c2632aff.1596480582.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In a subsequent commit, additional options will be added to the commit-graph API, which themselves do not have to do with splitting. Rename the 'split_commit_graph_opts' structure to the more-generic 'commit_graph_opts' to encompass both. Suggsted-by: Derrick Stolee Signed-off-by: Taylor Blau --- builtin/commit-graph.c | 20 ++++++++++---------- commit-graph.c | 40 ++++++++++++++++++++-------------------- commit-graph.h | 6 +++--- 3 files changed, 33 insertions(+), 33 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index ba5584463f..38f5f57d15 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -119,7 +119,7 @@ static int graph_verify(int argc, const char **argv) } extern int read_replace_refs; -static struct split_commit_graph_opts split_opts; +static struct commit_graph_opts write_opts; static int write_option_parse_split(const struct option *opt, const char *arg, int unset) @@ -187,24 +187,24 @@ static int graph_write(int argc, const char **argv) OPT_BOOL(0, "changed-paths", &opts.enable_changed_paths, N_("enable computation for changed paths")), OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")), - OPT_CALLBACK_F(0, "split", &split_opts.flags, NULL, + OPT_CALLBACK_F(0, "split", &write_opts.flags, NULL, N_("allow writing an incremental commit-graph file"), PARSE_OPT_OPTARG | PARSE_OPT_NONEG, write_option_parse_split), - OPT_INTEGER(0, "max-commits", &split_opts.max_commits, + OPT_INTEGER(0, "max-commits", &write_opts.max_commits, N_("maximum number of commits in a non-base split commit-graph")), - OPT_INTEGER(0, "size-multiple", &split_opts.size_multiple, + OPT_INTEGER(0, "size-multiple", &write_opts.size_multiple, N_("maximum ratio between two levels of a split commit-graph")), - OPT_EXPIRY_DATE(0, "expire-time", &split_opts.expire_time, + OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time, N_("only expire files older than a given date-time")), OPT_END(), }; opts.progress = isatty(2); opts.enable_changed_paths = -1; - split_opts.size_multiple = 2; - split_opts.max_commits = 0; - split_opts.expire_time = 0; + write_opts.size_multiple = 2; + write_opts.max_commits = 0; + write_opts.expire_time = 0; trace2_cmd_mode("write"); @@ -232,7 +232,7 @@ static int graph_write(int argc, const char **argv) odb = find_odb(the_repository, opts.obj_dir); if (opts.reachable) { - if (write_commit_graph_reachable(odb, flags, &split_opts)) + if (write_commit_graph_reachable(odb, flags, &write_opts)) return 1; return 0; } @@ -261,7 +261,7 @@ static int graph_write(int argc, const char **argv) opts.stdin_packs ? &pack_indexes : NULL, opts.stdin_commits ? &commits : NULL, flags, - &split_opts)) + &write_opts)) result = 1; cleanup: diff --git a/commit-graph.c b/commit-graph.c index 2e765b26d5..8626024faa 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -995,7 +995,7 @@ struct write_commit_graph_context { changed_paths:1, order_by_pack:1; - const struct split_commit_graph_opts *split_opts; + const struct commit_graph_opts *opts; size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; struct bitmap *bloom_large; @@ -1327,8 +1327,8 @@ static void close_reachable(struct write_commit_graph_context *ctx) { int i; struct commit *commit; - enum commit_graph_split_flags flags = ctx->split_opts ? - ctx->split_opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; + enum commit_graph_split_flags flags = ctx->opts ? + ctx->opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; if (ctx->report_progress) ctx->progress = start_delayed_progress( @@ -1520,7 +1520,7 @@ static int add_ref_to_set(const char *refname, int write_commit_graph_reachable(struct object_directory *odb, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts) + const struct commit_graph_opts *opts) { struct oidset commits = OIDSET_INIT; struct refs_cb_data data; @@ -1537,7 +1537,7 @@ int write_commit_graph_reachable(struct object_directory *odb, stop_progress(&data.progress); result = write_commit_graph(odb, NULL, &commits, - flags, split_opts); + flags, opts); oidset_clear(&commits); return result; @@ -1652,8 +1652,8 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx) static void copy_oids_to_commits(struct write_commit_graph_context *ctx) { uint32_t i; - enum commit_graph_split_flags flags = ctx->split_opts ? - ctx->split_opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; + enum commit_graph_split_flags flags = ctx->opts ? + ctx->opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; ctx->num_extra_edges = 0; if (ctx->report_progress) @@ -1951,13 +1951,13 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) int max_commits = 0; int size_mult = 2; - if (ctx->split_opts) { - max_commits = ctx->split_opts->max_commits; + if (ctx->opts) { + max_commits = ctx->opts->max_commits; - if (ctx->split_opts->size_multiple) - size_mult = ctx->split_opts->size_multiple; + if (ctx->opts->size_multiple) + size_mult = ctx->opts->size_multiple; - flags = ctx->split_opts->flags; + flags = ctx->opts->flags; } g = ctx->r->objects->commit_graph; @@ -2135,8 +2135,8 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx) size_t dirnamelen; timestamp_t expire_time = time(NULL); - if (ctx->split_opts && ctx->split_opts->expire_time) - expire_time = ctx->split_opts->expire_time; + if (ctx->opts && ctx->opts->expire_time) + expire_time = ctx->opts->expire_time; if (!ctx->split) { char *chain_file_name = get_chain_filename(ctx->odb); unlink(chain_file_name); @@ -2187,7 +2187,7 @@ int write_commit_graph(struct object_directory *odb, struct string_list *pack_indexes, struct oidset *commits, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts) + const struct commit_graph_opts *opts) { struct write_commit_graph_context *ctx; uint32_t i, count_distinct = 0; @@ -2203,7 +2203,7 @@ int write_commit_graph(struct object_directory *odb, ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0; ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0; ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; - ctx->split_opts = split_opts; + ctx->opts = opts; ctx->total_bloom_filter_data_size = 0; if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS) @@ -2243,15 +2243,15 @@ int write_commit_graph(struct object_directory *odb, } } - if (ctx->split_opts) - replace = ctx->split_opts->flags & COMMIT_GRAPH_SPLIT_REPLACE; + if (ctx->opts) + replace = ctx->opts->flags & COMMIT_GRAPH_SPLIT_REPLACE; } ctx->approx_nr_objects = approximate_object_count(); ctx->oids.alloc = ctx->approx_nr_objects / 32; - if (ctx->split && split_opts && ctx->oids.alloc > split_opts->max_commits) - ctx->oids.alloc = split_opts->max_commits; + if (ctx->split && opts && ctx->oids.alloc > opts->max_commits) + ctx->oids.alloc = opts->max_commits; if (ctx->append) { prepare_commit_graph_one(ctx->r, ctx->odb); diff --git a/commit-graph.h b/commit-graph.h index afbc5fa41e..4cd991cf26 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -107,7 +107,7 @@ enum commit_graph_split_flags { COMMIT_GRAPH_SPLIT_REPLACE = 2 }; -struct split_commit_graph_opts { +struct commit_graph_opts { int size_multiple; int max_commits; timestamp_t expire_time; @@ -122,12 +122,12 @@ struct split_commit_graph_opts { */ int write_commit_graph_reachable(struct object_directory *odb, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts); + const struct commit_graph_opts *opts); int write_commit_graph(struct object_directory *odb, struct string_list *pack_indexes, struct oidset *commits, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts); + const struct commit_graph_opts *opts); #define COMMIT_GRAPH_VERIFY_SHALLOW (1 << 0) From patchwork Mon Aug 3 18:57:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11698611 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D4453912 for ; Mon, 3 Aug 2020 18:57:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2A09D207DF for ; Mon, 3 Aug 2020 18:57:49 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="yjWGdjFo" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727024AbgHCS5s (ORCPT ); Mon, 3 Aug 2020 14:57:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46306 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726130AbgHCS5r (ORCPT ); Mon, 3 Aug 2020 14:57:47 -0400 Received: from mail-qt1-x829.google.com (mail-qt1-x829.google.com [IPv6:2607:f8b0:4864:20::829]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AD316C06174A for ; Mon, 3 Aug 2020 11:57:47 -0700 (PDT) Received: by mail-qt1-x829.google.com with SMTP id x12so20538334qtp.1 for ; Mon, 03 Aug 2020 11:57:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=JeRO2NgYO+kIRBpTg0Hz4axY3e+jf7fqBJOKkN/8VyI=; b=yjWGdjFo66nzcmKkF6AumIc15tCsalw/ehRFZGL/7I7JjNi+43slDuBRj7XAiPPiET 5FoLfJt0njNAqj3hpXeQw+8HiA//96OhaW6U8j5Im8XClCsuvThjiT/VnWBrDL5bIuA3 HBurGh+0+WEWB7PBQI5uuknHfQkK1I9JQbE4zhdqsALqMtHmLrbOPm1BHa2RQLjCvnZp XDQP90dhl8CbTzJqLpOSlbJFqxDDObcW+DgHtweVNiup4JOOoK7iC2dRqUg/XtO3WGw2 6hkGb11H+bTx9fN14pCin29PQ1UOyuaLOecwbXd2uYqXr4Lo08fJZxxDdleV03biKMTN SNcQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=JeRO2NgYO+kIRBpTg0Hz4axY3e+jf7fqBJOKkN/8VyI=; b=khBFn6sKJ4vkKCTfS5pNKKrxtS3ucDcbUfNp5pZG0PvGnAt3E77Z0M7ZTIU0DmhoRm v5QYjAyVY2NqmFOE/SLH/u5DtPmU5YI6jif7HOkRIWhZbHQ/W/4v4pZ3DO7GnHgtJBW/ B1iWn7apUh8mOCaB/fwYCT9aECwDS6jNHsQCpHspd/o+ZyjJo5+n2PGC+8q31vigEGII xvjFVB2SGSxdC2g6+HWECJWsSpmHJ6+LIgFHF6CnZMXAD3q3x7p54BctST7uxHB6cwzd MSbSXHewYOIcaIPRzWAQMQJnASx0Rp9XEZUtCchIUDkNgFQMqOfe8o192lB4Etz9KiPg dTMQ== X-Gm-Message-State: AOAM5329KF6R8ytHCoX8N+QSZo297B1ArKfcsC7CLKtEy9LzGkNp1hDt hk6am4/EFQvQ3msH0c95bcQP0/lkPr6SXw== X-Google-Smtp-Source: ABdhPJwnOwWKGpTkaP7K+B/KDsu2Ms9YQDn1BRZcADtYPSej3azoB1rGwOwoyW0JwnoXT2dUyCbC/A== X-Received: by 2002:ac8:51d7:: with SMTP id d23mr18062724qtn.382.1596481066345; Mon, 03 Aug 2020 11:57:46 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:3475:b417:c07c:c811]) by smtp.gmail.com with ESMTPSA id i7sm18616588qtb.27.2020.08.03.11.57.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Aug 2020 11:57:45 -0700 (PDT) Date: Mon, 3 Aug 2020 14:57:44 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com Subject: [PATCH 10/10] builtin/commit-graph.c: introduce '--max-new-filters=' Message-ID: <8b670571dc24973cc5e894b866a68236d8fbfa4f.1596480582.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Introduce a command-line flag and configuration variable to fill in the 'max_new_filters' variable introduced by the previous patch. The command-line option '--max-new-filters' takes precedence over 'graph.maxNewFilters', which is the default value. '--no-max-new-filters' can also be provided, which sets the value back to '-1', indicating that an unlimited number of new Bloom filters may be generated. (OPT_INTEGER only allows setting the '--no-' variant back to '0', hence a custom callback was used instead). Signed-off-by: Taylor Blau --- Documentation/config/commitgraph.txt | 4 +++ Documentation/git-commit-graph.txt | 4 +++ bloom.c | 15 +++++++++++ builtin/commit-graph.c | 39 +++++++++++++++++++++++++--- commit-graph.c | 22 +++++++++++----- commit-graph.h | 1 + t/t4216-log-bloom.sh | 19 ++++++++++++++ 7 files changed, 95 insertions(+), 9 deletions(-) diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt index bb78e72f1b..46f48d1aa8 100644 --- a/Documentation/config/commitgraph.txt +++ b/Documentation/config/commitgraph.txt @@ -1,3 +1,7 @@ +commitgraph.maxNewFilters:: + Specifies the default value for the `--max-new-filters` option of `git + commit-graph write` (c.f., linkgit:git-commit-graph[1]). + commitgraph.readChangedPaths:: If true, then git will use the changed-path Bloom filters in the commit-graph file (if it exists, and they are present). Defaults to diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index 17405c73a9..e8034ff0ac 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -67,6 +67,10 @@ this option is given, future commit-graph writes will automatically assume that this option was intended. Use `--no-changed-paths` to stop storing this data. + +With the `--max-new-filters=` option, generate at most `n` new Bloom +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is +enforced. Overrides the `graph.maxNewFilters` configuration. ++ With the `--split[=]` option, write the commit-graph as a chain of multiple commit-graph files stored in `/info/commit-graphs`. Commit-graph layers are merged based on the diff --git a/bloom.c b/bloom.c index a8a21762f4..1742beac9e 100644 --- a/bloom.c +++ b/bloom.c @@ -51,6 +51,21 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, else start_index = 0; + if ((start_index == end_index) && + (g->bloom_large.word_alloc && !bitmap_get(&g->bloom_large, lex_pos))) { + /* + * If the filter is zero-length, either (1) the filter has no + * changes, (2) the filter has too many changes, or (3) it + * wasn't computed (eg., due to '--max-new-filters'). + * + * If either (1) or (2) is the case, the 'large' bit will be set + * for this Bloom filter. If it is unset, then it wasn't + * computed. In that case, return nothing, since we don't have + * that filter in the graph. + */ + return 0; + } + filter->len = end_index - start_index; filter->data = (unsigned char *)(g->chunk_bloom_data + sizeof(unsigned char) * start_index + diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 38f5f57d15..aaf54a76db 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -13,7 +13,8 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph verify [--object-dir ] [--shallow] [--[no-]progress]"), N_("git commit-graph write [--object-dir ] [--append] " "[--split[=]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] "), + "[--changed-paths] [--[no-]max-new-filters ] [--[no-]progress] " + ""), NULL }; @@ -25,7 +26,8 @@ static const char * const builtin_commit_graph_verify_usage[] = { static const char * const builtin_commit_graph_write_usage[] = { N_("git commit-graph write [--object-dir ] [--append] " "[--split[=]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] "), + "[--changed-paths] [--[no-]max-new-filters ] [--[no-]progress] " + ""), NULL }; @@ -162,6 +164,23 @@ static int read_one_commit(struct oidset *commits, struct progress *progress, return 0; } +static int write_option_max_new_filters(const struct option *opt, + const char *arg, + int unset) +{ + int *to = opt->value; + if (unset) + *to = -1; + else { + const char *s; + *to = strtol(arg, (char **)&s, 10); + if (*s) + return error(_("%s expects a numerical value"), + optname(opt, opt->flags)); + } + return 0; +} + static int graph_write(int argc, const char **argv) { struct string_list pack_indexes = STRING_LIST_INIT_NODUP; @@ -197,6 +216,9 @@ static int graph_write(int argc, const char **argv) N_("maximum ratio between two levels of a split commit-graph")), OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time, N_("only expire files older than a given date-time")), + OPT_CALLBACK_F(0, "max-new-filters", &write_opts.max_new_filters, + NULL, N_("maximum number of Bloom filters to compute"), + 0, write_option_max_new_filters), OPT_END(), }; @@ -205,6 +227,7 @@ static int graph_write(int argc, const char **argv) write_opts.size_multiple = 2; write_opts.max_commits = 0; write_opts.expire_time = 0; + write_opts.max_new_filters = -1; trace2_cmd_mode("write"); @@ -270,6 +293,16 @@ static int graph_write(int argc, const char **argv) return result; } +static int git_commit_graph_config(const char *var, const char *value, void *cb) +{ + if (!strcmp(var, "graph.maxnewfilters")) { + write_opts.max_new_filters = git_config_int(var, value); + return 0; + } + + return git_default_config(var, value, cb); +} + int cmd_commit_graph(int argc, const char **argv, const char *prefix) { static struct option builtin_commit_graph_options[] = { @@ -283,7 +316,7 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix) usage_with_options(builtin_commit_graph_usage, builtin_commit_graph_options); - git_config(git_default_config, NULL); + git_config(git_commit_graph_config, &opts); argc = parse_options(argc, argv, prefix, builtin_commit_graph_options, builtin_commit_graph_usage, diff --git a/commit-graph.c b/commit-graph.c index 8626024faa..87ea0437e2 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1455,6 +1455,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) int i; struct progress *progress = NULL; int *sorted_commits; + int max_new_filters; init_bloom_filters(); ctx->bloom_large = bitmap_word_alloc(ctx->commits.nr / BITS_IN_EWORD); @@ -1471,6 +1472,9 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->order_by_pack ? commit_pos_cmp : commit_gen_cmp, &ctx->commits); + max_new_filters = ctx->opts->max_new_filters >= 0 ? + ctx->opts->max_new_filters : ctx->commits.nr; + for (i = 0; i < ctx->commits.nr; i++) { int pos = sorted_commits[i]; struct commit *c = ctx->commits.list[pos]; @@ -1478,13 +1482,19 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) bitmap_set(ctx->bloom_large, pos); ctx->count_bloom_filter_known_large++; } else { - struct bloom_filter *filter = get_or_compute_bloom_filter(ctx->r, c, 1, NULL); - if (!filter->len) { - bitmap_set(ctx->bloom_large, pos); - ctx->count_bloom_filter_found_large++; + int computed = 0; + struct bloom_filter *filter = get_or_compute_bloom_filter( + ctx->r, c, + ctx->count_bloom_filter_computed < max_new_filters, + &computed); + if (computed) { + ctx->count_bloom_filter_computed++; + if (filter && !filter->len) { + bitmap_set(ctx->bloom_large, pos); + ctx->count_bloom_filter_found_large++; + } } - ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; - ctx->count_bloom_filter_computed++; + ctx->total_bloom_filter_data_size += sizeof(unsigned char) * (filter ? filter->len : 0); } display_progress(progress, i + 1); } diff --git a/commit-graph.h b/commit-graph.h index 4cd991cf26..0842d13744 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -112,6 +112,7 @@ struct commit_graph_opts { int max_commits; timestamp_t expire_time; enum commit_graph_split_flags flags; + int max_new_filters; }; /* diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 231dc8a3a7..985bea23b2 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -214,4 +214,23 @@ test_expect_success 'Bloom generation does not recompute too-large filters' ' ) ' +test_expect_success 'Bloom generation is limited by --max-new-filters' ' + ( + cd 513changes && + test_commit c2 filter && + test_commit c3 filter && + test_commit c4 no-filter && + test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=2" \ + 1 0 2 + ) +' + +test_expect_success 'Bloom generation backfills previously-skipped filters' ' + ( + cd 513changes && + test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=1" \ + 1 0 1 + ) +' + test_done