From patchwork Wed Sep 9 15:22:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11765949 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4CBB9618 for ; Wed, 9 Sep 2020 17:19:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 28BDC21D80 for ; Wed, 9 Sep 2020 17:19:50 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="ruHoZabo" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730099AbgIIRTo (ORCPT ); Wed, 9 Sep 2020 13:19:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45962 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726535AbgIIP1Y (ORCPT ); Wed, 9 Sep 2020 11:27:24 -0400 Received: from mail-qk1-x742.google.com (mail-qk1-x742.google.com [IPv6:2607:f8b0:4864:20::742]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C81DAC0619C9 for ; Wed, 9 Sep 2020 08:22:48 -0700 (PDT) Received: by mail-qk1-x742.google.com with SMTP id q5so2801198qkc.2 for ; Wed, 09 Sep 2020 08:22:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=FhZG+r1AQwjgsxulvANHzQaEJA/C1Zm7QPuO1vm3olc=; b=ruHoZabo1WTtDDaK5IlHA32Xa/K7BxrB+o3mm1O4R9FxVcS51WLrhjL00DcRfIhazi d9aMS7Ly4VTdv3qaMEFBz58hK1GRJBUxTawCfdPvtgX2CIM0XsqjzpOtDFRjQMFRle7l 3qM1YQXBXxjy0X9j+UuHfJDL4kSQaPyrRdvhHGVgiDwG2v5fsZcQkINij9NKrHQ+uNkL SQFvR9XFEvQSDiYbe4rY22GntUiksAMeqkRFIRYcayPwenihUJT9BJvJcZF4/RZdhsyp jKyHZGOu1nIAiPvfHrCCIriv7jiJJ6qT5BUhrlQZySUqyNHxgreTd4XCGCifNhEIkzB8 bROQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=FhZG+r1AQwjgsxulvANHzQaEJA/C1Zm7QPuO1vm3olc=; b=J3oAaLHyqyWDHUzY4hhDjB3B/BvJHj3jgGKiVNBds7X7rlBvlbOXPPazqUVriOQiUM OESXaXfejwPlnxaHvdDBBMDuGS6rY976wBp1tC9hlDYBX0S+wsbeVxMzpIZslfDy1Rws 1dP24sA+4dOFxCt6Kt21mD5TyWiOnlWHfLK4u3rqOtUYUx+JHJkRZ3i5w0R8DDM2Qz3+ VH4t0QcWIwca4gDJLh9LO7VT8pD5P/7tROJfM3wGhOImDM3R3s03gNeH/FkTYtN81M+J 87hkIlwBaWPU5Yciogha2j28Lm0lm/WzQ5XzAIQgXo3rV77pNK03x+9ZFpVs2R1miWUw kRKg== X-Gm-Message-State: AOAM532+eQVnDhvllC0fGXndKbl1CoHwnePtHjkb91AyA7iUh5TRdNSL R/tdbtGlCC9D15CGwKrZtgNt4K07Igj+UXyw X-Google-Smtp-Source: ABdhPJxfXxKDKACKGkXkOPjyOjQLVX7rubtvPfxMgL4+TVatFU7NHfcE6NR9exhD259c90L68DzQ3g== X-Received: by 2002:a05:620a:1473:: with SMTP id j19mr1883346qkl.369.1599664967453; Wed, 09 Sep 2020 08:22:47 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:10e2:cf5e:922:2af0]) by smtp.gmail.com with ESMTPSA id u55sm3540810qtu.42.2020.09.09.08.22.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Sep 2020 08:22:46 -0700 (PDT) Date: Wed, 9 Sep 2020 11:22:44 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH 01/12] commit-graph: introduce 'get_bloom_filter_settings()' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Many places in the code often need a pointer to the commit-graph's 'struct bloom_filter_settings', in which case they often take the value from the top-most commit-graph. In the non-split case, this works as expected. In the split case, however, things get a little tricky. Not all layers in a chain of incremental commit-graphs are required to themselves have Bloom data, and so whether or not some part of the code uses Bloom filters depends entirely on whether or not the top-most level of the commit-graph chain has Bloom filters. This has been the behavior since Bloom filters were introduced, and has been codified into the tests since a759bfa9ee (t4216: add end to end tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130 requires that Bloom filters are not used in exactly the case described earlier. There is no reason that this needs to be the case, since it is perfectly valid for commits in an earlier layer to have Bloom filters when commits in a newer layer do not. Since Bloom settings are guaranteed in practice to be the same for any layer in a chain that has Bloom data, it is sufficient to traverse the '->base_graph' pointer until either (1) a non-null 'struct bloom_filter_settings *' is found, or (2) until we are at the root of the commit-graph chain. Introduce a 'get_bloom_filter_settings()' function that does just this, and use it instead of purely dereferencing the top-most graph's '->bloom_filter_settings' pointer. While we're at it, add an additional test in t5324 to guard against code in the commit-graph writing machinery that doesn't correctly handle a NULL 'struct bloom_filter *'. Co-authored-by: Derrick Stolee Signed-off-by: Taylor Blau --- blame.c | 6 ++++-- bloom.c | 6 +++--- commit-graph.c | 11 +++++++++++ commit-graph.h | 2 ++ revision.c | 5 +---- t/t4216-log-bloom.sh | 9 ++++++--- t/t5324-split-commit-graph.sh | 13 +++++++++++++ 7 files changed, 40 insertions(+), 12 deletions(-) diff --git a/blame.c b/blame.c index 1be1cd82a2..903e23af23 100644 --- a/blame.c +++ b/blame.c @@ -2892,16 +2892,18 @@ void setup_blame_bloom_data(struct blame_scoreboard *sb, const char *path) { struct blame_bloom_data *bd; + struct bloom_filter_settings *bs; if (!sb->repo->objects->commit_graph) return; - if (!sb->repo->objects->commit_graph->bloom_filter_settings) + bs = get_bloom_filter_settings(sb->repo); + if (!bs) return; bd = xmalloc(sizeof(struct blame_bloom_data)); - bd->settings = sb->repo->objects->commit_graph->bloom_filter_settings; + bd->settings = bs; bd->alloc = 4; bd->nr = 0; diff --git a/bloom.c b/bloom.c index 1a573226e7..cd9380ac62 100644 --- a/bloom.c +++ b/bloom.c @@ -38,7 +38,7 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, while (graph_pos < g->num_commits_in_base) g = g->base_graph; - /* The commit graph commit 'c' lives in doesn't carry bloom filters. */ + /* The commit graph commit 'c' lives in doesn't carry Bloom filters. */ if (!g->chunk_bloom_indexes) return 0; @@ -195,8 +195,8 @@ struct bloom_filter *get_bloom_filter(struct repository *r, if (!filter->data) { load_commit_graph_info(r, c); if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH && - r->objects->commit_graph->chunk_bloom_indexes) - load_bloom_filter_from_graph(r->objects->commit_graph, filter, c); + load_bloom_filter_from_graph(r->objects->commit_graph, filter, c)) + return filter; } if (filter->data) diff --git a/commit-graph.c b/commit-graph.c index 0ed003e218..6a36ed0b06 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -667,6 +667,17 @@ int generation_numbers_enabled(struct repository *r) return !!first_generation; } +struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r) +{ + struct commit_graph *g = r->objects->commit_graph; + while (g) { + if (g->bloom_filter_settings) + return g->bloom_filter_settings; + g = g->base_graph; + } + return NULL; +} + static void close_commit_graph_one(struct commit_graph *g) { if (!g) diff --git a/commit-graph.h b/commit-graph.h index 09a97030dc..0677dd1031 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -87,6 +87,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); */ int generation_numbers_enabled(struct repository *r); +struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r); + enum commit_graph_write_flags { COMMIT_GRAPH_WRITE_APPEND = (1 << 0), COMMIT_GRAPH_WRITE_PROGRESS = (1 << 1), diff --git a/revision.c b/revision.c index 08c2ad23af..857274408c 100644 --- a/revision.c +++ b/revision.c @@ -680,10 +680,7 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs) repo_parse_commit(revs->repo, revs->commits->item); - if (!revs->repo->objects->commit_graph) - return; - - revs->bloom_filter_settings = revs->repo->objects->commit_graph->bloom_filter_settings; + revs->bloom_filter_settings = get_bloom_filter_settings(revs->repo); if (!revs->bloom_filter_settings) return; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 4bb9e9dbe2..715912ad0f 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -65,7 +65,7 @@ setup () { test_bloom_filters_used () { log_args=$1 - bloom_trace_prefix="statistics:{\"filter_not_present\":0,\"maybe\"" + bloom_trace_prefix="statistics:{\"filter_not_present\":${2:-0},\"maybe\"" setup "$log_args" && grep -q "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.perf" && test_cmp log_wo_bloom log_w_bloom && @@ -139,8 +139,11 @@ test_expect_success 'setup - add commit-graph to the chain without Bloom filters test_line_count = 2 .git/objects/info/commit-graphs/commit-graph-chain ' -test_expect_success 'Do not use Bloom filters if the latest graph does not have Bloom filters.' ' - test_bloom_filters_not_used "-- A/B" +test_expect_success 'use Bloom filters even if the latest graph does not have Bloom filters' ' + # Ensure that the number of empty filters is equal to the number of + # filters in the latest graph layer to prove that they are loaded (and + # ignored). + test_bloom_filters_used "-- A/B" 3 ' test_expect_success 'setup - add commit-graph to the chain with Bloom filters' ' diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 18216463c7..c334ee9155 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -427,4 +427,17 @@ done <<\EOF 0600 -r-------- EOF +test_expect_success '--split=replace with partial Bloom data' ' + rm -rf $graphdir $infodir/commit-graph && + git reset --hard commits/3 && + git rev-list -1 HEAD~2 >a && + git rev-list -1 HEAD~1 >b && + git commit-graph write --split=no-merge --stdin-commits --changed-paths graph-files && + test_line_count = 1 graph-files && + verify_chain_files_exist $graphdir +' + test_done From patchwork Wed Sep 9 15:22:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11765945 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0F1AF618 for ; Wed, 9 Sep 2020 17:19:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E370621D7E for ; Wed, 9 Sep 2020 17:19:46 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="Fgs0yulu" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730500AbgIIRTp (ORCPT ); Wed, 9 Sep 2020 13:19:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45960 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726883AbgIIP1Y (ORCPT ); Wed, 9 Sep 2020 11:27:24 -0400 Received: from mail-qt1-x844.google.com (mail-qt1-x844.google.com [IPv6:2607:f8b0:4864:20::844]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2D5D3C0619CA for ; Wed, 9 Sep 2020 08:22:56 -0700 (PDT) Received: by mail-qt1-x844.google.com with SMTP id y11so2205253qtn.9 for ; Wed, 09 Sep 2020 08:22:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=cPefqbMWPlnCd/bpwOqnWQD/4KM4Ty9r1KCsjRikXYI=; b=Fgs0yulu+gvCuz7l4g0dMUY93O8tdCefqx+VcxOegkKCbif2TMv1zBp7cSv6XmEzYn 1eK73XOq00w4l0EFHP+ujziCxY/wF+C5Y87181TFlrBL8vNIRvU1uXlDOV9D+CbVWCvi DmPOAE8MVKvQvBGH3UWYHySFZFFQCxLs6SeyNDCrUS6KNcVEqS7DDpt0/w5V6hzjeaYi cT/bk66Gtb6Jg5PpBPrT2uQBSmfOtwphHSLloNeARFWBidLj7AC8MbXa4Ha7k0NBKti8 DnKzxuUXd+N9Se8+tL+QAmeDRnVBIQZ+WD5Nnuxj8jVtgP7EwquPqaCT0dbtqAgw/8tE FI/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=cPefqbMWPlnCd/bpwOqnWQD/4KM4Ty9r1KCsjRikXYI=; b=UB6RWbRvhyR+6z/ahonW6C3vCbts/ZA7JEqXSrgh8q/OOxyHBtE1/LTys2K0fhS94i z/lDDA/HvYt2gurjcPIIti1XSypH87yciCL/qnsumKlxfYS0lkw7ldfqOPA6Jz3e3gaf JKO6xHoPInfIm10+uMLx3TXmtPh9nNJo9sRy3imx7vNwkMl2L4wKmJu/QwRD2OD2ZXKr pq/JkM3LY0O2DNV7TWcL3W11UekkYp7tUtqZGRG1MSzoUzlX+g1ePw9jL5M6pLpl0A7h DaMKLSbcnrPUkWDgt4LaO+jYD4o1WhwaU5P7b71NEvcu30kVf+CkPjbltPonwHkiKDBr 0c5A== X-Gm-Message-State: AOAM532wJXJputaGpurAZDE2d2wqZnUFai9TeYY5f7c72yVv2s3N2Qs0 xKnZpG5Apqh93C8y6XEYIwxIAHOUM/7zmgWh X-Google-Smtp-Source: ABdhPJzJsiUM2T9uH/0K0hB8QvVxW0ibRXlmic8wPG3AtAG9oO/g5IHvFIu1ZBeFBiTNqJdGnkM+rg== X-Received: by 2002:ac8:12c1:: with SMTP id b1mr3845397qtj.148.1599664973461; Wed, 09 Sep 2020 08:22:53 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:10e2:cf5e:922:2af0]) by smtp.gmail.com with ESMTPSA id g131sm2777880qkb.135.2020.09.09.08.22.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Sep 2020 08:22:52 -0700 (PDT) Date: Wed, 9 Sep 2020 11:22:50 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH 02/12] t4216: use an '&&'-chain Message-ID: <44df936e51e88e41d5ed9aa02d0bd3ca86696f09.1599664389.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In a759bfa9ee (t4216: add end to end tests for git log with Bloom filters, 2020-04-06), a 'rm' invocation was added without a corresponding '&&' chain. When 'trace.perf' already exists, everything works fine. However, the function can be executed without 'trace.perf' on disk (eg., when the subset of tests run is altered with '--run'), and so the bare 'rm' complains about a missing file. To remove some noise from the test log, invoke 'rm' with '-f', at which point it is sensible to place the 'rm -f' in an '&&'-chain, which is both (1) our usual style, and (2) avoids a broken chain in the future if more commands are added at the beginning of the function. Helped-by: Eric Sunshine Helped-by: Jeff King Signed-off-by: Taylor Blau --- t/t4216-log-bloom.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 715912ad0f..cd89c75002 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -58,7 +58,7 @@ sane_unset GIT_TRACE2_PERF_BRIEF sane_unset GIT_TRACE2_CONFIG_PARAMS setup () { - rm "$TRASH_DIRECTORY/trace.perf" + rm -f "$TRASH_DIRECTORY/trace.perf" && git -c core.commitGraph=false log --pretty="format:%s" $1 >log_wo_bloom && GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.perf" git -c core.commitGraph=true log --pretty="format:%s" $1 >log_w_bloom } From patchwork Wed Sep 9 15:22:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11765643 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7EEEB138E for ; Wed, 9 Sep 2020 15:29:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5C03422275 for ; Wed, 9 Sep 2020 15:29:37 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="fORIFwlm" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729663AbgIIP3U (ORCPT ); Wed, 9 Sep 2020 11:29:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45946 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728442AbgIIP1Y (ORCPT ); Wed, 9 Sep 2020 11:27:24 -0400 Received: from mail-qk1-x744.google.com (mail-qk1-x744.google.com [IPv6:2607:f8b0:4864:20::744]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 70B9FC0619CB for ; Wed, 9 Sep 2020 08:23:00 -0700 (PDT) Received: by mail-qk1-x744.google.com with SMTP id q63so2002506qkf.3 for ; Wed, 09 Sep 2020 08:23:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=Ye5G3SmcAbeoLvNM8waHHba3R+gSvPixSydParYIhyg=; b=fORIFwlmqglP9fObDFglm1FudL/XWUGofxmVlAQVS4zSKKzsOeBVuXimsXJujmwGdg ISQJZNrO3KviXoSJFdX3ICcSNhvPdf1mmvpiJfhosAGkEZJlJyXWhMSw76pnrX6y+WVR PHiQwqYT8qt0cvKqwBWr9JMiLn2mc+Gt/IISjOSc6T/KkauYB5YWDRquVRCyOTNWzoVF 5/oFGOsfjSXEpAkuzsJ8/3nZbMDta86ZGxAGfFyBvvFLbxUC7UZaoanZ1oZwiydYDQ1j yeuvCKdatWE/XL0/lPNEWRp4CI483bsjU9rgUe1PjlKbCwX77hInD5zm6KtBt88d668k 1uHQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=Ye5G3SmcAbeoLvNM8waHHba3R+gSvPixSydParYIhyg=; b=WBNjqK3nqyivB7gWFN9AEZHhSJSfDp7Jc7k+UKkv/0vqv8qgGsSm/RkxDZvsmSGt6q aRlHRpazM3JWLEN4c+rH5Zi4LVpACMi/3fJCclLvJJjxGQyZSc++Z84USXOq+QM2W24t hqbE4cF3/T38/xVjkK38JUw60T/ldIjv/CJFCCKtwOPMmU4jX5uU3FFMfPGkVEZObVM9 Jjat/UCj4LOV0Ao1cGm6RHYGWZ8x7p5K/udp8RtiXl1fYKibQstELOzesQfr4Vqyxrdb ju2+ERPsWn7O6dgDyRMAdQAwlviQ5CJhUiqLnNzX9+mMXqWl71U0dXwfSESPNj5zAl8l PWaQ== X-Gm-Message-State: AOAM5306BhlCLqjJLpsEZD2pD91oW9R3nYKY0VZY8GDp0BCrbpHlV3ot uKbHn1mh7UQzG9el/0qVw+HIC8og+yFAXLxA X-Google-Smtp-Source: ABdhPJxGlx2oe3AH1K8Th2cP1nzh/lVuHXFvMPH6wBO8n+SfJXhNb46ZdJDrzGuw6A0h0tEaV1KHpw== X-Received: by 2002:a37:6248:: with SMTP id w69mr3567104qkb.448.1599664979197; Wed, 09 Sep 2020 08:22:59 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:10e2:cf5e:922:2af0]) by smtp.gmail.com with ESMTPSA id j1sm3371152qtk.91.2020.09.09.08.22.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Sep 2020 08:22:58 -0700 (PDT) Date: Wed, 9 Sep 2020 11:22:56 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH 03/12] commit-graph: pass a 'struct repository *' in more places Message-ID: <0817b0f618700ced34af6c597eee7312993b0d1f.1599664389.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In a future commit, some commit-graph internals will want access to 'r->settings', but we only have the 'struct object_directory *' corresponding to that repository. Add an additional parameter to pass the repository around in more places. Signed-off-by: Taylor Blau --- builtin/commit-graph.c | 2 +- commit-graph.c | 17 ++++++++++------- commit-graph.h | 6 ++++-- fuzz-commit-graph.c | 5 +++-- 4 files changed, 18 insertions(+), 12 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 523501f217..ba5584463f 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -106,7 +106,7 @@ static int graph_verify(int argc, const char **argv) FREE_AND_NULL(graph_name); if (open_ok) - graph = load_commit_graph_one_fd_st(fd, &st, odb); + graph = load_commit_graph_one_fd_st(the_repository, fd, &st, odb); else graph = read_commit_graph_one(the_repository, odb); diff --git a/commit-graph.c b/commit-graph.c index 6a36ed0b06..72a838bd00 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -231,7 +231,8 @@ int open_commit_graph(const char *graph_file, int *fd, struct stat *st) return 1; } -struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, +struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, + int fd, struct stat *st, struct object_directory *odb) { void *graph_map; @@ -247,7 +248,7 @@ struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, } graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd); - ret = parse_commit_graph(graph_map, graph_size); + ret = parse_commit_graph(r, graph_map, graph_size); if (ret) ret->odb = odb; @@ -287,7 +288,8 @@ static int verify_commit_graph_lite(struct commit_graph *g) return 0; } -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size) +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size) { const unsigned char *data, *chunk_lookup; uint32_t i; @@ -452,7 +454,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size) return NULL; } -static struct commit_graph *load_commit_graph_one(const char *graph_file, +static struct commit_graph *load_commit_graph_one(struct repository *r, + const char *graph_file, struct object_directory *odb) { @@ -464,7 +467,7 @@ static struct commit_graph *load_commit_graph_one(const char *graph_file, if (!open_ok) return NULL; - g = load_commit_graph_one_fd_st(fd, &st, odb); + g = load_commit_graph_one_fd_st(r, fd, &st, odb); if (g) g->filename = xstrdup(graph_file); @@ -476,7 +479,7 @@ static struct commit_graph *load_commit_graph_v1(struct repository *r, struct object_directory *odb) { char *graph_name = get_commit_graph_filename(odb); - struct commit_graph *g = load_commit_graph_one(graph_name, odb); + struct commit_graph *g = load_commit_graph_one(r, graph_name, odb); free(graph_name); return g; @@ -557,7 +560,7 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, valid = 0; for (odb = r->objects->odb; odb; odb = odb->next) { char *graph_name = get_split_graph_filename(odb, line.buf); - struct commit_graph *g = load_commit_graph_one(graph_name, odb); + struct commit_graph *g = load_commit_graph_one(r, graph_name, odb); free(graph_name); diff --git a/commit-graph.h b/commit-graph.h index 0677dd1031..d9acb22bac 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -75,11 +75,13 @@ struct commit_graph { struct bloom_filter_settings *bloom_filter_settings; }; -struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, +struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, + int fd, struct stat *st, struct object_directory *odb); struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb); -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size); /* * Return 1 if and only if the repository has a commit-graph diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c index 430817214d..e7cf6d5b0f 100644 --- a/fuzz-commit-graph.c +++ b/fuzz-commit-graph.c @@ -1,7 +1,8 @@ #include "commit-graph.h" #include "repository.h" -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size); int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); @@ -10,7 +11,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) struct commit_graph *g; initialize_the_repository(); - g = parse_commit_graph((void *)data, size); + g = parse_commit_graph(the_repository, (void *)data, size); repo_clear(the_repository); free_commit_graph(g); From patchwork Wed Sep 9 15:23:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11765927 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0D734618 for ; Wed, 9 Sep 2020 17:18:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DE40721919 for ; Wed, 9 Sep 2020 17:18:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="K78X6TWG" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730605AbgIIRSb (ORCPT ); Wed, 9 Sep 2020 13:18:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46004 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729992AbgIIP17 (ORCPT ); Wed, 9 Sep 2020 11:27:59 -0400 Received: from mail-qk1-x744.google.com (mail-qk1-x744.google.com [IPv6:2607:f8b0:4864:20::744]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C5FDFC0619CF for ; Wed, 9 Sep 2020 08:23:07 -0700 (PDT) Received: by mail-qk1-x744.google.com with SMTP id 16so2787799qkf.4 for ; Wed, 09 Sep 2020 08:23:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=LIRW4oc5ufASPAzIyTY4rQFIS6pi6+of9Ao6fxrspCs=; b=K78X6TWG2ihZUpuLFIgsQXW7PZSD/o5lQxrmGGXl7dM9I+ROz4zGVetIWeWpEwYu4u 4vASEirh2W92NmIUsRKe+yKiKuNMjsUDwsdHQ/TZtFcuWBBzLa05qk6JF/FfIlLITr4K hZOZ6SIa/xDJn0LuOwyDCTFNRLzgdv4ZnzensI0HYHNDJLC3EBgZvWOKGQVTmB505UNQ acJhag8qX4I+WfTzj7SxMg/zyWF950MUUnk7UXMbZz6/LTj1JvLmAQNquGTyRumkvp4F hTnpuqi8vzm3QuORnfnnTUbW0kwskTC6bYL7L3NoDjGTTGvyYgAD7hIKLY6CgaC1VDaT UJxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=LIRW4oc5ufASPAzIyTY4rQFIS6pi6+of9Ao6fxrspCs=; b=gYFzncK77YlmyB6wQVswRa0GCkXCnSvSPkJ0lJpEzsqPX1luWTFxWWa0bm7N8+P0/O h+R6hdFD+tBwaWCPakPf5e1I+rao7dlJHAxlxV9k3bM5rc1q8mHlSD8XnOUPPbHVuKIy LoCRdK+ZioFIOrmK/UbYxQKZJzemdS6l0Z+JClRy4n0PzwGgNlU0VEjj7WGjdvFJkH6/ VkhG5S1Ytfp0VrP6nq1IDhFvXNi+a68kYprRgqFIJObUqURBWaxCkHofMSBMCmwMlhHX WSFTvYysHM65fR5i+X9MJJGWhG4TyM+i5rfPhDrd9ZFqmLLo33aM7EtRDf6ZQ53+5jyI dEog== X-Gm-Message-State: AOAM533nbqa+mG3xEFGpYPogBNQ6abQctEQK5eUTVZUvzyZHuhNrjfzZ 3XkBhsP4lL6ZDnxU71U5YfktzibaVNQrHuZO X-Google-Smtp-Source: ABdhPJxf8mV+fiugbBqfPXJBFJ/upNenezyVturGdn9A2UFhmjD7L7wXWdvFpOBigzHFxlGcmNOoiw== X-Received: by 2002:a05:620a:1107:: with SMTP id o7mr3671711qkk.191.1599664986715; Wed, 09 Sep 2020 08:23:06 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:10e2:cf5e:922:2af0]) by smtp.gmail.com with ESMTPSA id z29sm3336763qtj.79.2020.09.09.08.23.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Sep 2020 08:23:06 -0700 (PDT) Date: Wed, 9 Sep 2020 11:23:03 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH 04/12] t/helper/test-read-graph.c: prepare repo settings Message-ID: <809972ba5f261c72146efa0461c19cb34daaa513.1599664389.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The read-graph test-tool is used by a number of the commit-graph test to assert various properties about a commit-graph. Previously, this program never ran 'prepare_repo_settings()'. There was no need to do so, since none of the commit-graph machinery is affected by the repo settings. In the next patch, the commit-graph machinery's behavior will become dependent on the repo settings, and so loading them before running the rest of the test tool is critical. As such, teach the test tool to call 'prepare_repo_settings()'. Signed-off-by: Taylor Blau --- t/helper/test-read-graph.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c index 6d0c962438..5f585a1725 100644 --- a/t/helper/test-read-graph.c +++ b/t/helper/test-read-graph.c @@ -12,11 +12,12 @@ int cmd__read_graph(int argc, const char **argv) setup_git_directory(); odb = the_repository->objects->odb; + prepare_repo_settings(the_repository); + graph = read_commit_graph_one(the_repository, odb); if (!graph) return 1; - printf("header: %08x %d %d %d %d\n", ntohl(*(uint32_t*)graph->data), *(unsigned char*)(graph->data + 4), From patchwork Wed Sep 9 15:23:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11765929 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6AC0092C for ; Wed, 9 Sep 2020 17:18:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4B49D21919 for ; Wed, 9 Sep 2020 17:18:44 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="Vogp7pck" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730157AbgIIRSa (ORCPT ); Wed, 9 Sep 2020 13:18:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45942 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727010AbgIIP17 (ORCPT ); Wed, 9 Sep 2020 11:27:59 -0400 Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8B1A7C0619D0 for ; Wed, 9 Sep 2020 08:23:14 -0700 (PDT) Received: by mail-qk1-x743.google.com with SMTP id w186so2805775qkd.1 for ; Wed, 09 Sep 2020 08:23:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=ZqqrSe8DWrSv3n9OxhpstTZbIPqLuNiAMEVti0nm9wQ=; b=Vogp7pckCMvf0cldxYCrAg1FtN/cejS6yW1axgiqqnan7tOtqqCrscGwaaMgxvgY3Y qdoQxA1c/rXnMKV9+zS66aXbwgfSkXluaaxVvSIeFGayHa/hr832lRY1EYqja/RmfNQT 5z8KzVzS6aiRny3LE8As6/tZyEDRWCyqfa+fQbCZny87r5xDgtqOCr7jXgQ5EkZxHmDQ 14CzzV14lQZoWNOWfFgbO0BCeHODFir65fGWQS5MfuDqsdRYsnWWBBQc0CFy7e9AbOMZ i7GfALwWlsN9yzuIWbci1ltZU1YaWICIZh5U6ejTTLZR/SA00ZRw3NPdnfxjaEaLCahF FCWQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=ZqqrSe8DWrSv3n9OxhpstTZbIPqLuNiAMEVti0nm9wQ=; b=VbbZr31xnLWTvc1F5ltRwVKe79Wotgq1AffoVx00lnWCMXFFAKHwcwMPyu8pdwfPt3 r/8UVyzQEJTQ/MEZYnjZVX2y3VH3B1Hax3HBuwFnzRtGghtkCsoQPEtpDLHA9A74iMJx Ev4XDfj+lBMMSbeQdUnkLVpIhQ9Pr1JRyHM4MDWXjNV0sb28wuZdx6ID2i4aqxDwkL52 FjuL1q06SNblffEQA3Cc6GW3uOujs6EpppmMhi7cb0Sfj+WanKYSnObb+CFmwk310gaL w2YkBBiOBkyzQLT5dECD2FQXVXv0BqyZlglPvA6yBvWmemcG9UUbNERiVo1Ldx07mmhq 9XJA== X-Gm-Message-State: AOAM530ohq8kfHezdz3x2hBDR11wX56vfHgkgCl4YfDyqgUEnIbFHOEv KrPuI3NHppUnkYAaDp501GQoFfUdCqWJPWX5 X-Google-Smtp-Source: ABdhPJxfiq/oqbLf1D4iLrBYxuqWn0X8bh9Tq0VtsmmGihWoYl6kWI4RRyLl+q6lLuGghyGBmfR1aA== X-Received: by 2002:a37:ad08:: with SMTP id f8mr3605698qkm.207.1599664993305; Wed, 09 Sep 2020 08:23:13 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:10e2:cf5e:922:2af0]) by smtp.gmail.com with ESMTPSA id p37sm3409361qtp.31.2020.09.09.08.23.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Sep 2020 08:23:12 -0700 (PDT) Date: Wed, 9 Sep 2020 11:23:10 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH 05/12] commit-graph: respect 'commitGraph.readChangedPaths' Message-ID: <63c4a4901b6d695a29a44d0c734eea04087e71ca.1599664389.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Git uses the 'core.commitGraph' configuration value to control whether or not the commit graph is used when parsing commits or performing a traversal. Now that commit-graphs can also contain a section for changed-path Bloom filters, administrators that already have commit-graphs may find it convenient to use those graphs without relying on their changed-path Bloom filters. This can happen, for example, during a staged roll-out, or in the event of an incident. Introduce 'commitGraph.readChangedPaths' to control whether or not Bloom filters are read. Note that this configuration is independent from both: - 'core.commitGraph', to allow flexibility in using all parts of a commit-graph _except_ for its Bloom filters. - The '--changed-paths' option for 'git commit-graph write', to allow reading and writing Bloom filters to be controlled independently. When the variable is set, pretend as if no Bloom data was specified at all. This avoids adding additional special-casing outside of the commit-graph internals. Suggested-by: Derrick Stolee Signed-off-by: Taylor Blau --- Documentation/config.txt | 2 ++ Documentation/config/commitgraph.txt | 4 ++++ commit-graph.c | 6 ++++-- repo-settings.c | 3 +++ repository.h | 1 + t/t4216-log-bloom.sh | 4 +++- 6 files changed, 17 insertions(+), 3 deletions(-) create mode 100644 Documentation/config/commitgraph.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index 3042d80978..770ae79b82 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -340,6 +340,8 @@ include::config/column.txt[] include::config/commit.txt[] +include::config/commitgraph.txt[] + include::config/credential.txt[] include::config/completion.txt[] diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt new file mode 100644 index 0000000000..cff0797b54 --- /dev/null +++ b/Documentation/config/commitgraph.txt @@ -0,0 +1,4 @@ +commitGraph.readChangedPaths:: + If true, then git will use the changed-path Bloom filters in the + commit-graph file (if it exists, and they are present). Defaults to + true. See linkgit:git-commit-graph[1] for more information. diff --git a/commit-graph.c b/commit-graph.c index 72a838bd00..ea54d108b9 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -327,6 +327,8 @@ struct commit_graph *parse_commit_graph(struct repository *r, return NULL; } + prepare_repo_settings(r); + graph = alloc_commit_graph(); graph->hash_len = the_hash_algo->rawsz; @@ -403,14 +405,14 @@ struct commit_graph *parse_commit_graph(struct repository *r, case GRAPH_CHUNKID_BLOOMINDEXES: if (graph->chunk_bloom_indexes) chunk_repeated = 1; - else + else if (r->settings.commit_graph_read_changed_paths) graph->chunk_bloom_indexes = data + chunk_offset; break; case GRAPH_CHUNKID_BLOOMDATA: if (graph->chunk_bloom_data) chunk_repeated = 1; - else { + else if (r->settings.commit_graph_read_changed_paths) { uint32_t hash_version; graph->chunk_bloom_data = data + chunk_offset; hash_version = get_be32(data + chunk_offset); diff --git a/repo-settings.c b/repo-settings.c index aa61a35338..88ccce2036 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -17,9 +17,12 @@ void prepare_repo_settings(struct repository *r) if (!repo_config_get_bool(r, "core.commitgraph", &value)) r->settings.core_commit_graph = value; + if (!repo_config_get_bool(r, "commitgraph.readchangedpaths", &value)) + r->settings.commit_graph_read_changed_paths = value; if (!repo_config_get_bool(r, "gc.writecommitgraph", &value)) r->settings.gc_write_commit_graph = value; UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1); + UPDATE_DEFAULT_BOOL(r->settings.commit_graph_read_changed_paths, 1); UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1); if (!repo_config_get_int(r, "index.version", &value)) diff --git a/repository.h b/repository.h index 628c834367..bacf843d46 100644 --- a/repository.h +++ b/repository.h @@ -30,6 +30,7 @@ struct repo_settings { int initialized; int core_commit_graph; + int commit_graph_read_changed_paths; int gc_write_commit_graph; int fetch_write_commit_graph; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index cd89c75002..fc7693806c 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -95,7 +95,9 @@ do "--ancestry-path side..master" do test_expect_success "git log option: $option for path: $path" ' - test_bloom_filters_used "$option -- $path" + test_bloom_filters_used "$option -- $path" && + test_config commitgraph.readChangedPaths false && + test_bloom_filters_not_used "$option -- $path" ' done done From patchwork Wed Sep 9 15:23:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11765933 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 16DEE618 for ; Wed, 9 Sep 2020 17:19:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id ECBE92166E for ; Wed, 9 Sep 2020 17:19:02 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="HEgSpWm6" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730189AbgIIRS7 (ORCPT ); Wed, 9 Sep 2020 13:18:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46126 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730068AbgIIP17 (ORCPT ); Wed, 9 Sep 2020 11:27:59 -0400 Received: from mail-qt1-x843.google.com (mail-qt1-x843.google.com [IPv6:2607:f8b0:4864:20::843]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C23AC0619D1 for ; Wed, 9 Sep 2020 08:23:23 -0700 (PDT) Received: by mail-qt1-x843.google.com with SMTP id r8so2174109qtp.13 for ; Wed, 09 Sep 2020 08:23:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=FPOqANpWsbvGcEr6wxQXoaCC/xQdjTeq/HASE97f3aE=; b=HEgSpWm6POXRZ61ga2A1Bp0I8MwuvfwP5is+P/y0nibEg6u1KpfsmHFAAbgI9bJ4lB Q9WPdyNgQvh1RBidXev+/Hrz8KZXBTG90qy/IgGSwAHlKOj9hueWm2qFjrs4vIUk9dki zkW52NCXOAac4mdTmhZA/WQzU2nTbm9hZWTEvGVrb5UKRbkRicHFfN99BppM7kVyU111 EgDXBWV+8JP3PzhZKgWgB2vbvGzZvgU5cUj2vJN2ReGW6u5hfTyDGaK0KQ14mXBmICaB TQPxObyWWuqniiKAhJudNVFzxGW8F8hZ7h9WOGiLR4g71Ui9VSxvhpcQP5MdCKaI7NBt DRZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=FPOqANpWsbvGcEr6wxQXoaCC/xQdjTeq/HASE97f3aE=; b=UR2rGe0jf7N+x1OblzMXbGosWQVu1E6agSAjHLlPcl181uSyK3PO089L9eV2GZ/ZRA gEd7ux7gj8po9fvc251UR1vhy6m23fj48BrCCdjGi2BDdxCUiwwMzNPmyeZaWkEwpmtU fzmgAugEc42xUqK/I9dRf2PE8Sz0kojAASPmf++5RecoYzZAzIntF/CliGC4wqynJcay EZaOAYTdGV1y0FadxRovco9wRwsW9qaC60JK48/oSeLRdoihm/tI/HPIgrlq8/+8WB86 IiMFHSKOP6qIjG0Xu725xk116XNdZKJzpilv/9WrQVbK/tRD0ATLOvFuIKf3l11JnpvW Gv0w== X-Gm-Message-State: AOAM532K9PSNB9dKUIglSSIvV/5gALtzRKX8TJ1HW+8BmOQCwvysktoH zAlf2/rz+QuaDTCi9lHX3nRhqSuXq43A+jYp X-Google-Smtp-Source: ABdhPJwsVEOJNrCajyawMCOtmRp/nZJOJUbAav9k6lCYik3IsQgKrfJxaNKMwl/FB1cGuKXYChMTmQ== X-Received: by 2002:ac8:c4e:: with SMTP id l14mr3764994qti.149.1599665002277; Wed, 09 Sep 2020 08:23:22 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:10e2:cf5e:922:2af0]) by smtp.gmail.com with ESMTPSA id p68sm2899973qkd.124.2020.09.09.08.23.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Sep 2020 08:23:21 -0700 (PDT) Date: Wed, 9 Sep 2020 11:23:18 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH 06/12] commit-graph.c: store maximum changed paths Message-ID: <9bd699560b5489d64ebc830f0b3cdbba42c373f1.1599664389.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org For now, we assume that there is a fixed constant describing the maximum number of changed paths we are willing to store in a Bloom filter. Prepare for that to (at least partially) not be the case by making it a member of the 'struct bloom_filter_settings'. This will be helpful in the subsequent patches by reducing the size of test cases that exercise storing too many changed paths, as well as preparing for an eventual future in which this value might change. This patch alone does not cause newly generated Bloom filters to use a custom upper-bound on the maximum number of changed paths a single Bloom filter can hold, that will occur in a later patch. Signed-off-by: Taylor Blau --- bloom.h | 11 ++++++++++- commit-graph.c | 3 +++ t/t4216-log-bloom.sh | 4 ++-- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/bloom.h b/bloom.h index d8fbb0fbf1..0b9b59a6fe 100644 --- a/bloom.h +++ b/bloom.h @@ -28,9 +28,18 @@ struct bloom_filter_settings { * that contain n*b bits. */ uint32_t bits_per_entry; + + /* + * The maximum number of changed paths per commit + * before declaring a Bloom filter to be too-large. + * + * Not written to the commit-graph file. + */ + uint32_t max_changed_paths; }; -#define DEFAULT_BLOOM_FILTER_SETTINGS { 1, 7, 10 } +#define DEFAULT_BLOOM_MAX_CHANGES 512 +#define DEFAULT_BLOOM_FILTER_SETTINGS { 1, 7, 10, DEFAULT_BLOOM_MAX_CHANGES } #define BITS_PER_WORD 8 #define BLOOMDATA_CHUNK_HEADER_SIZE 3 * sizeof(uint32_t) diff --git a/commit-graph.c b/commit-graph.c index ea54d108b9..55af498aa0 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1201,6 +1201,7 @@ static void trace2_bloom_filter_settings(struct write_commit_graph_context *ctx) jw_object_intmax(&jw, "hash_version", ctx->bloom_settings->hash_version); jw_object_intmax(&jw, "num_hashes", ctx->bloom_settings->num_hashes); jw_object_intmax(&jw, "bits_per_entry", ctx->bloom_settings->bits_per_entry); + jw_object_intmax(&jw, "max_changed_paths", ctx->bloom_settings->max_changed_paths); jw_end(&jw); trace2_data_json("bloom", ctx->r, "settings", &jw); @@ -1669,6 +1670,8 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) bloom_settings.bits_per_entry); bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", bloom_settings.num_hashes); + bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", + bloom_settings.max_changed_paths); ctx->bloom_settings = &bloom_settings; } diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index fc7693806c..47ddf2641f 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -174,11 +174,11 @@ test_expect_success 'persist filter settings' ' GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=9 \ GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY=15 \ git commit-graph write --reachable --changed-paths && - grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2.txt && + grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15" trace2.txt && GIT_TRACE2_EVENT="$(pwd)/trace2-auto.txt" \ GIT_TRACE2_EVENT_NESTING=5 \ git commit-graph write --reachable --changed-paths && - grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2-auto.txt + grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15" trace2-auto.txt ' test_expect_success 'correctly report changes over limit' ' From patchwork Wed Sep 9 15:23:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11765939 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DF0EF92C for ; Wed, 9 Sep 2020 17:19:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B8EB82166E for ; Wed, 9 Sep 2020 17:19:16 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="E3PWezdN" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730592AbgIIRTD (ORCPT ); Wed, 9 Sep 2020 13:19:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45948 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730082AbgIIP17 (ORCPT ); Wed, 9 Sep 2020 11:27:59 -0400 Received: from mail-qk1-x732.google.com (mail-qk1-x732.google.com [IPv6:2607:f8b0:4864:20::732]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1B17AC0619D3 for ; Wed, 9 Sep 2020 08:23:31 -0700 (PDT) Received: by mail-qk1-x732.google.com with SMTP id w186so2806486qkd.1 for ; Wed, 09 Sep 2020 08:23:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=ed/p05KNT2ha2ye02bbF8wtMQRHBsX0sBud6bq5eOfc=; b=E3PWezdNf+mHlPlcT59Myz5u1wTaORLJb9Wsu14D+7iBh8sUNrafH56p5kzPSlt7Eq I4u8XDEYHPDhqjMzkItP7GbtXB4XXzEL9ilSzo09PhkexfLUlsDAFMoy4UOUoc1OJHsR HCYYLQmiNKug7Cru4gUdTorWR+DBlfuYbgqPX3gZqRURO4E4A2kENiv3sX64kQJGHqOf aGD6RaV0uxnnE+503EggwJto4wFAcL1/pWW4I3ujX5Ej5Y14cvsbaU1xRgLAbpx51uRf QGKXApoQ2xy3TzBtoedg1sCDQYsD2DDHzYt4oVebJZpLl7f9VyTLrNVGHJigObe5WNIS YllA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=ed/p05KNT2ha2ye02bbF8wtMQRHBsX0sBud6bq5eOfc=; b=J2wFNtcB6TbC324zEoBp4utGO0ptXs7PH7MbgOeMbZ3G64+hH+rzoU4y+ei6166bXa XK7cQSPVIHJzHkdqSNSxqWTe3rW70B9P8TYDYX5I92A2XIibCS3yYNqcMiZo3lPCRMN1 mcqvR5Kh6bmHNCtjnZiv8zTic3DgxOSXKuxa57AP91oHtN9t1OF2ONKaVqtnYp6U8HIi o2e2kDq+fjoa2h0I1O+lB/HCqly6lWbjWBosp6OepCA8fmxq8Ioj1E38kDZNjacPMWNv Gtkk31kPnjMsnurFyUowN3P5+jqu1bbzbjlth0sEEm0OXUzbK7jeDq5k3M/E0bbBCowR 5dKw== X-Gm-Message-State: AOAM532tRQQf64fpMADLKm1fYrtZso6YdvC5KFbmZkaxXsk+OYDwWt6r B7QEuvzyPkYBZbIvfzfUtuxIFO6dUN1GblVi X-Google-Smtp-Source: ABdhPJzruS1DtTYs71sWyyHaxqyx5jf5yvOuzBOczvlV4zxYlZtmETQOWVAY8YytQrfWEcAPeBm7UQ== X-Received: by 2002:a05:620a:55d:: with SMTP id o29mr3822870qko.12.1599665009017; Wed, 09 Sep 2020 08:23:29 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:10e2:cf5e:922:2af0]) by smtp.gmail.com with ESMTPSA id a20sm3041300qtw.45.2020.09.09.08.23.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Sep 2020 08:23:28 -0700 (PDT) Date: Wed, 9 Sep 2020 11:23:25 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH 07/12] bloom: split 'get_bloom_filter()' in two Message-ID: <7475ce47ebcdb9c666e658ab8727b2d1df384c8d.1599664389.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org 'get_bloom_filter' takes a flag to control whether it will compute a Bloom filter if the requested one is missing. In the next patch, we'll add yet another parameter to this method, which would force all but one caller to specify an extra 'NULL' parameter at the end. Instead of doing this, split 'get_bloom_filter' into two functions: 'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only looks up a Bloom filter (and does not compute one if it's missing, thus dropping the 'compute_if_not_present' flag). The latter does compute missing Bloom filters, with an additional parameter to store whether or not it needed to do so. This simplifies many call-sites, since the majority of existing callers to 'get_bloom_filter' do not want missing Bloom filters to be computed (so they can drop the parameter entirely and use the simpler version of the function). While we're at it, instrument the new 'get_or_compute_bloom_filter()' with counters in the 'write_commit_graph_context' struct which store the number of filters that we did and didn't compute, as well as filters that were truncated. It would be nice to drop the 'compute_if_not_present' flag entirely, since all remaining callers of 'get_or_compute_bloom_filter' pass it as '1', but this will change in a future patch and hence cannot be removed. Signed-off-by: Taylor Blau --- blame.c | 2 +- bloom.c | 16 +++++++++++++--- bloom.h | 16 +++++++++++++--- commit-graph.c | 42 +++++++++++++++++++++++++++++++++++++++--- line-log.c | 2 +- revision.c | 2 +- t/helper/test-bloom.c | 3 ++- 7 files changed, 70 insertions(+), 13 deletions(-) diff --git a/blame.c b/blame.c index 903e23af23..e5ba35dbd1 100644 --- a/blame.c +++ b/blame.c @@ -1276,7 +1276,7 @@ static int maybe_changed_path(struct repository *r, if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY) return 1; - filter = get_bloom_filter(r, origin->commit, 0); + filter = get_bloom_filter(r, origin->commit); if (!filter) return 1; diff --git a/bloom.c b/bloom.c index cd9380ac62..393c61b4bc 100644 --- a/bloom.c +++ b/bloom.c @@ -177,9 +177,10 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data, return strcmp(e1->path, e2->path); } -struct bloom_filter *get_bloom_filter(struct repository *r, - struct commit *c, - int compute_if_not_present) +struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, + struct commit *c, + int compute_if_not_present, + enum bloom_filter_computed *computed) { struct bloom_filter *filter; struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS; @@ -187,6 +188,9 @@ struct bloom_filter *get_bloom_filter(struct repository *r, struct diff_options diffopt; int max_changes = 512; + if (computed) + *computed = BLOOM_NOT_COMPUTED; + if (!bloom_filters.slab_size) return NULL; @@ -271,8 +275,14 @@ struct bloom_filter *get_bloom_filter(struct repository *r, diff_free_filepair(diff_queued_diff.queue[i]); filter->data = NULL; filter->len = 0; + + if (computed) + *computed |= BLOOM_TRUNC_LARGE; } + if (computed) + *computed |= BLOOM_COMPUTED; + free(diff_queued_diff.queue); DIFF_QUEUE_CLEAR(&diff_queued_diff); diff --git a/bloom.h b/bloom.h index 0b9b59a6fe..e2e035ad14 100644 --- a/bloom.h +++ b/bloom.h @@ -89,9 +89,19 @@ void add_key_to_filter(const struct bloom_key *key, void init_bloom_filters(void); -struct bloom_filter *get_bloom_filter(struct repository *r, - struct commit *c, - int compute_if_not_present); +enum bloom_filter_computed { + BLOOM_NOT_COMPUTED = (1 << 0), + BLOOM_COMPUTED = (1 << 1), + BLOOM_TRUNC_LARGE = (1 << 2), +}; + +struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, + struct commit *c, + int compute_if_not_present, + enum bloom_filter_computed *computed); + +#define get_bloom_filter(r, c) get_or_compute_bloom_filter( \ + (r), (c), 0, NULL) int bloom_filter_contains(const struct bloom_filter *filter, const struct bloom_key *key, diff --git a/commit-graph.c b/commit-graph.c index 55af498aa0..b8b2c7ca65 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -971,6 +971,10 @@ struct write_commit_graph_context { const struct split_commit_graph_opts *split_opts; size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; + + int count_bloom_filter_computed; + int count_bloom_filter_not_computed; + int count_bloom_filter_trunc_large; }; static int write_graph_chunk_fanout(struct hashfile *f, @@ -1182,7 +1186,7 @@ static int write_graph_chunk_bloom_indexes(struct hashfile *f, uint32_t cur_pos = 0; while (list < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0); + struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); size_t len = filter ? filter->len : 0; cur_pos += len; display_progress(ctx->progress, ++ctx->progress_cnt); @@ -1222,7 +1226,7 @@ static int write_graph_chunk_bloom_data(struct hashfile *f, hashwrite_be32(f, ctx->bloom_settings->bits_per_entry); while (list < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0); + struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); size_t len = filter ? filter->len : 0; display_progress(ctx->progress, ++ctx->progress_cnt); @@ -1392,6 +1396,24 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) stop_progress(&ctx->progress); } +static void trace2_bloom_filter_write_statistics(struct write_commit_graph_context *ctx) +{ + struct json_writer jw = JSON_WRITER_INIT; + + jw_object_begin(&jw, 0); + jw_object_intmax(&jw, "filter_computed", + ctx->count_bloom_filter_computed); + jw_object_intmax(&jw, "filter_not_computed", + ctx->count_bloom_filter_not_computed); + jw_object_intmax(&jw, "filter_trunc_large", + ctx->count_bloom_filter_trunc_large); + jw_end(&jw); + + trace2_data_json("commit-graph", the_repository, "bloom_statistics", &jw); + + jw_release(&jw); +} + static void compute_bloom_filters(struct write_commit_graph_context *ctx) { int i; @@ -1414,12 +1436,26 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp); for (i = 0; i < ctx->commits.nr; i++) { + enum bloom_filter_computed computed = 0; struct commit *c = sorted_commits[i]; - struct bloom_filter *filter = get_bloom_filter(ctx->r, c, 1); + struct bloom_filter *filter = get_or_compute_bloom_filter( + ctx->r, + c, + 1, + &computed); + if (computed & BLOOM_COMPUTED) { + ctx->count_bloom_filter_computed++; + if (computed & BLOOM_TRUNC_LARGE) + ctx->count_bloom_filter_trunc_large++; + } else if (computed & BLOOM_NOT_COMPUTED) + ctx->count_bloom_filter_not_computed++; ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; display_progress(progress, i + 1); } + if (trace2_is_enabled()) + trace2_bloom_filter_write_statistics(ctx); + free(sorted_commits); stop_progress(&progress); } diff --git a/line-log.c b/line-log.c index bf73ea95ac..68eeb425f8 100644 --- a/line-log.c +++ b/line-log.c @@ -1159,7 +1159,7 @@ static int bloom_filter_check(struct rev_info *rev, return 1; if (!rev->bloom_filter_settings || - !(filter = get_bloom_filter(rev->repo, commit, 0))) + !(filter = get_bloom_filter(rev->repo, commit))) return 1; if (!range) diff --git a/revision.c b/revision.c index 857274408c..f4be5d1650 100644 --- a/revision.c +++ b/revision.c @@ -751,7 +751,7 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs, if (commit_graph_generation(commit) == GENERATION_NUMBER_INFINITY) return -1; - filter = get_bloom_filter(revs->repo, commit, 0); + filter = get_bloom_filter(revs->repo, commit); if (!filter) { count_bloom_filter_not_present++; diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c index 5e77d56f59..9f7bb729fc 100644 --- a/t/helper/test-bloom.c +++ b/t/helper/test-bloom.c @@ -39,7 +39,8 @@ static void get_bloom_filter_for_commit(const struct object_id *commit_oid) struct bloom_filter *filter; setup_git_directory(); c = lookup_commit(the_repository, commit_oid); - filter = get_bloom_filter(the_repository, c, 1); + filter = get_or_compute_bloom_filter(the_repository, c, 1, + NULL); print_bloom_filter(filter); } From patchwork Wed Sep 9 15:23:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11765931 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C9620618 for ; Wed, 9 Sep 2020 17:18:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9B56A21D7F for ; Wed, 9 Sep 2020 17:18:57 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="VNx4oq4p" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730988AbgIIRSs (ORCPT ); Wed, 9 Sep 2020 13:18:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45956 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726801AbgIIP17 (ORCPT ); Wed, 9 Sep 2020 11:27:59 -0400 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BA876C0619E4 for ; Wed, 9 Sep 2020 08:23:38 -0700 (PDT) Received: by mail-qt1-x842.google.com with SMTP id k25so2233055qtu.4 for ; Wed, 09 Sep 2020 08:23:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=BiRjnylEnwVA2iK7QdJpzD0m8pgZZ1JnlR/W6/KtEAM=; b=VNx4oq4p6ZONNoye5f3Xv6hfTDw+WIPmk4KObTJMGWDibPjNjuHyq7EBwatTjt+CQc FKnF9nW0rSLzuevKUkuEVDYLz1yWPzi21ucnSyd4WUU8qHEgEVJtMkV/7U+G5GQ1GDQK UfTYjyBvxQWDCRxA0i8emhuA/d1I6jvy9N88Jew/HXY3Y5Xaez3AOZmLRS3XKSo2U1TE O/tIGThWMhkL+N9dlXIi43Nm0nKcmgDtFs/tTRxG6efL43sfrcc0iaZO8apB/St5pm4G yIYtKUa+45kZ9vQae7rV95QFZYqDZ0XxYOJJXs6m3offek1S3Zux4QcFUPSyZYg0UIZJ 8R6w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=BiRjnylEnwVA2iK7QdJpzD0m8pgZZ1JnlR/W6/KtEAM=; b=Zz2YpZxcqMaDd2pePYJSB/uCAKXm8uc25mPE+tywXnE3VXJOklQqo3Joa6ThcIn9sc 6QW741+wKBbguxHuJlW2QLGgzi1FMOk07J/KcKkbjpfuI9HX0iQCzTgouigwu1AZTTI/ rrnY849T+h8/IPxkx3XAiPnH7J3YLIsUeIwCi8mQ340fpgQFLs16eURCwyiBXBf4+OY4 T8HlwbivhZYD8OFc3/ipVB5XSvHyGYHpkwIR8ul+mYzF+hI0205FMtz1A18hKATTpuVK yitLTreUrgqEn22RkPadvIwsvp5z7u0tuEfWizxSwXvMgNWF5XvNEMXb4z9GI9XGqgBb fYEQ== X-Gm-Message-State: AOAM5311FCQmTXa8yPVZ5b9uTW4dwSf9xO4nrIzXWJ/8BfKYsLpOa9bY Mt4PxNDqjQ93gPDqRMgiMQKjvflqHOxoz3DI X-Google-Smtp-Source: ABdhPJzsoVN+h1ljXBkarqz8H/nyFU29DKUao09QSyhs47EE58TQovLnyCq8lMfqBnM+5IdNOaxCdw== X-Received: by 2002:ac8:794f:: with SMTP id r15mr3631963qtt.147.1599665015147; Wed, 09 Sep 2020 08:23:35 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:10e2:cf5e:922:2af0]) by smtp.gmail.com with ESMTPSA id f8sm3134094qtx.81.2020.09.09.08.23.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Sep 2020 08:23:34 -0700 (PDT) Date: Wed, 9 Sep 2020 11:23:32 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH 08/12] bloom: use provided 'struct bloom_filter_settings' Message-ID: <2841ecf152fc42a9955e7609c174959d525f0830.1599664389.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When 'get_or_compute_bloom_filter()' needs to compute a Bloom filter from scratch, it looks to the default 'struct bloom_filter_settings' in order to determine the maximum number of changed paths, number of bits per entry, and so on. All of these values have so far been constant, and so there was no need to pass in a pointer from the caller (eg., the one that is stored in the 'struct write_commit_graph_context'). Start passing in a 'struct bloom_filter_settings *' instead of using the default values to respect graph-specific settings (eg., in the case of setting 'GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS'). In order to have an initialized value for these settings, move its initialization to earlier in the commit-graph write. Signed-off-by: Taylor Blau --- bloom.c | 13 ++++++------- bloom.h | 3 ++- commit-graph.c | 21 ++++++++++----------- t/helper/test-bloom.c | 1 + 4 files changed, 19 insertions(+), 19 deletions(-) diff --git a/bloom.c b/bloom.c index 393c61b4bc..2d6aef9098 100644 --- a/bloom.c +++ b/bloom.c @@ -180,13 +180,12 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data, struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, struct commit *c, int compute_if_not_present, + const struct bloom_filter_settings *settings, enum bloom_filter_computed *computed) { struct bloom_filter *filter; - struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS; int i; struct diff_options diffopt; - int max_changes = 512; if (computed) *computed = BLOOM_NOT_COMPUTED; @@ -211,7 +210,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, repo_diff_setup(r, &diffopt); diffopt.flags.recursive = 1; diffopt.detect_rename = 0; - diffopt.max_changes = max_changes; + diffopt.max_changes = settings->max_changed_paths; diff_setup_done(&diffopt); /* ensure commit is parsed so we have parent information */ @@ -223,7 +222,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_tree_oid(NULL, &c->object.oid, "", &diffopt); diffcore_std(&diffopt); - if (diffopt.num_changes <= max_changes) { + if (diffopt.num_changes <= settings->max_changed_paths) { struct hashmap pathmap; struct pathmap_hash_entry *e; struct hashmap_iter iter; @@ -260,13 +259,13 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_free_filepair(diff_queued_diff.queue[i]); } - filter->len = (hashmap_get_size(&pathmap) * settings.bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; + filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; filter->data = xcalloc(filter->len, sizeof(unsigned char)); hashmap_for_each_entry(&pathmap, &iter, e, entry) { struct bloom_key key; - fill_bloom_key(e->path, strlen(e->path), &key, &settings); - add_key_to_filter(&key, filter, &settings); + fill_bloom_key(e->path, strlen(e->path), &key, settings); + add_key_to_filter(&key, filter, settings); } hashmap_free_entries(&pathmap, struct pathmap_hash_entry, entry); diff --git a/bloom.h b/bloom.h index e2e035ad14..c6d77e8393 100644 --- a/bloom.h +++ b/bloom.h @@ -98,10 +98,11 @@ enum bloom_filter_computed { struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, struct commit *c, int compute_if_not_present, + const struct bloom_filter_settings *settings, enum bloom_filter_computed *computed); #define get_bloom_filter(r, c) get_or_compute_bloom_filter( \ - (r), (c), 0, NULL) + (r), (c), 0, NULL, NULL) int bloom_filter_contains(const struct bloom_filter *filter, const struct bloom_key *key, diff --git a/commit-graph.c b/commit-graph.c index b8b2c7ca65..500f29525a 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1442,6 +1442,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->r, c, 1, + ctx->bloom_settings, &computed); if (computed & BLOOM_COMPUTED) { ctx->count_bloom_filter_computed++; @@ -1699,17 +1700,6 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) int num_chunks = 3; uint64_t chunk_offset; struct object_id file_hash; - struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; - - if (!ctx->bloom_settings) { - bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", - bloom_settings.bits_per_entry); - bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", - bloom_settings.num_hashes); - bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", - bloom_settings.max_changed_paths); - ctx->bloom_settings = &bloom_settings; - } if (ctx->split) { struct strbuf tmp_file = STRBUF_INIT; @@ -2155,6 +2145,7 @@ int write_commit_graph(struct object_directory *odb, uint32_t i, count_distinct = 0; int res = 0; int replace = 0; + struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; if (!commit_graph_compatible(the_repository)) return 0; @@ -2168,6 +2159,14 @@ int write_commit_graph(struct object_directory *odb, ctx->split_opts = split_opts; ctx->total_bloom_filter_data_size = 0; + bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", + bloom_settings.bits_per_entry); + bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", + bloom_settings.num_hashes); + bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", + bloom_settings.max_changed_paths); + ctx->bloom_settings = &bloom_settings; + if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS) ctx->changed_paths = 1; if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) { diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c index 9f7bb729fc..46e97b04eb 100644 --- a/t/helper/test-bloom.c +++ b/t/helper/test-bloom.c @@ -40,6 +40,7 @@ static void get_bloom_filter_for_commit(const struct object_id *commit_oid) setup_git_directory(); c = lookup_commit(the_repository, commit_oid); filter = get_or_compute_bloom_filter(the_repository, c, 1, + &settings, NULL); print_bloom_filter(filter); } From patchwork Wed Sep 9 15:23:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11765935 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ADE17618 for ; Wed, 9 Sep 2020 17:19:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 87C5321919 for ; Wed, 9 Sep 2020 17:19:08 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="dprUjl1V" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726440AbgIIRTG (ORCPT ); Wed, 9 Sep 2020 13:19:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46752 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729296AbgIIP17 (ORCPT ); Wed, 9 Sep 2020 11:27:59 -0400 Received: from mail-qt1-x841.google.com (mail-qt1-x841.google.com [IPv6:2607:f8b0:4864:20::841]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 65638C0619E5 for ; Wed, 9 Sep 2020 08:23:47 -0700 (PDT) Received: by mail-qt1-x841.google.com with SMTP id v54so2216006qtj.7 for ; Wed, 09 Sep 2020 08:23:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=Wa1BtU9l6GiqicBRM3GDhThj4OrG3T20TsdMMvOs4t8=; b=dprUjl1VsgncVoM1Y13PR3tlV9ePabn8lwVvBfmOZGwhvHX/lY7meJtoc8yQ1xGRWb pBupX0RLVjXegMCjojGaLzamVSSo8LC4iJUeEh5k7TJ/fSrOlVJvP/f2RFnneJczIgF+ YUpnsDPMsExNCRnMInH00eJGZtOd3skAwquK1L26cU3In3BapLtySPjIs2KCyz8M13Ne R1aqq5HZI5j8UoRQGYIm4EqM/C8BsWf7Ss7xwvxB5CxRriALBE68GeWXKM5jfVBPg28C f0Ifhp4kTdwy1GYjNMkz9DvahblebaTqmSICcXBXy10aXjLNzKV3yxFLdNnNVHfcvn4D Z+zg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=Wa1BtU9l6GiqicBRM3GDhThj4OrG3T20TsdMMvOs4t8=; b=Uyme/SFiz/DqzBMcaW9WxR3Gx6xzH+13I2HKs2QyK/s6KwWZBVj1j3GMnyzLVjVDH1 iBGWnCIjo/7bH69MMe6lpFCw/fSvSnuXKS0yjO/OGKVx0reKsqWVuxySlcTeo/lYfaNK 41wsGnLlBh6xCG/0w0QQXVD9Gtl7ranUtkzZh/Z8MZo4Niq/NddyDxW5WbKhv1RSYG+k 2a3AzzarOH+49SQzrrYhuxkNS8xORNaFMRDlieME0QGFklsW5OR3JcArLMb7tnc0MNR4 MIYsPMu0xrGtwYwI6Alyu1s84Ct8UpPlbjPKHPq4gNs14dVyQUxmNtkZAd1ccXS+ecHM m/zw== X-Gm-Message-State: AOAM533biza1hlzWTiPPfh3Fm7GrnYSL+T6ltRF+8USifUo4+KpXMcVB uLj86QzmmM9n2WMGWygQkf9Bt/nPJ5cOX2gI X-Google-Smtp-Source: ABdhPJwTzreHrIXGriDt1doCAE1OVrVIYM/r5SkO5gLH8U4FRAEIR+RF/y/uweTxF3Upye/ZkePnGg== X-Received: by 2002:aed:2f42:: with SMTP id l60mr3802262qtd.234.1599665025960; Wed, 09 Sep 2020 08:23:45 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:10e2:cf5e:922:2af0]) by smtp.gmail.com with ESMTPSA id w59sm3279355qtd.1.2020.09.09.08.23.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Sep 2020 08:23:45 -0700 (PDT) Date: Wed, 9 Sep 2020 11:23:42 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH 09/12] bloom/diff: properly short-circuit on max_changes Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee Commit e3696980 (diff: halt tree-diff early after max_changes, 2020-03-30) intended to create a mechanism to short-circuit a diff calculation after a certain number of paths were modified. By incrementing a "num_changes" counter throughout the recursive ll_diff_tree_paths(), this was supposed to match the number of changes that would be written into the changed-path Bloom filters. Unfortunately, this was not implemented correctly and instead misses simple cases like file modifications. This then does not stop very large changed-path filters from being written (unless they add or remove many files). To start, change the implementation in ll_diff_tree_paths() to instead use the global diff_queue_diff struct's 'nr' member as the count. This is a way to simplify the logic instead of making more mistakes in the complicated diff code. This has a drawback: the diff_queue_diff struct only lists the paths corresponding to blob changes, not their leading directories. Thus, get_or_compute_bloom_filter() needs an additional check to see if the hashmap with the leading directories becomes too large. One reason why this was not caught by test cases was that the test in t4216-log-bloom.sh that was supposed to check this "too many changes" condition only checked this on the initial commit of a repository. The old logic counted these values correctly. Update this test in a few ways: 1. Use GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS to reduce the limit, allowing smaller commits to engage with this logic. 2. Create several interesting cases of edits, adds, removes, and mode changes (in the second commit). By testing both sides of the inequality with the *_MAX_CHANGED_PATHS variable, we can see that the count is exactly correct, so none of these changes are missed or over-counted. 3. Use the trace2 data value filter_found_large to verify that these commits are on the correct side of the limit. Another way to verify the behavior is correct is through performance tests. By testing on my local copies of the Git repository and the Linux kernel repository, I could measure the effect of these short-circuits when computing a fresh commit-graph file with changed-path Bloom filters using the command GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=N time \ git commit-graph write --reachable --changed-paths and reporting the wall time and resulting commit-graph size. For Git, the results are | | N=1 | N=10 | N=512 | |--------|----------------|----------------|----------------| | HEAD~1 | 10.90s 9.18MB | 11.11s 9.34MB | 11.31s 9.35MB | | HEAD | 9.21s 8.62MB | 11.11s 9.29MB | 11.29s 9.34MB | For Linux, the results are | | N=1 | N=20 | N=512 | |--------|----------------|---------------|---------------| | HEAD~1 | 61.28s 64.3MB | 76.9s 72.6MB | 77.6s 72.6MB | | HEAD | 49.44s 56.3MB | 68.7s 65.9MB | 69.2s 65.9MB | Naturally, the improvement becomes much less as the limit grows, as fewer commits satisfy the short-circuit. Reported-by: SZEDER Gábor Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- bloom.c | 9 ++++- diff.h | 2 - t/t4216-log-bloom.sh | 88 +++++++++++++++++++++++++++++++++++++++----- tree-diff.c | 5 +-- 4 files changed, 88 insertions(+), 16 deletions(-) diff --git a/bloom.c b/bloom.c index 2d6aef9098..db9fb82437 100644 --- a/bloom.c +++ b/bloom.c @@ -222,7 +222,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_tree_oid(NULL, &c->object.oid, "", &diffopt); diffcore_std(&diffopt); - if (diffopt.num_changes <= settings->max_changed_paths) { + if (diff_queued_diff.nr <= settings->max_changed_paths) { struct hashmap pathmap; struct pathmap_hash_entry *e; struct hashmap_iter iter; @@ -259,6 +259,12 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_free_filepair(diff_queued_diff.queue[i]); } + if (hashmap_get_size(&pathmap) > settings->max_changed_paths) { + if (computed) + *computed |= BLOOM_TRUNC_LARGE; + goto cleanup; + } + filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; filter->data = xcalloc(filter->len, sizeof(unsigned char)); @@ -268,6 +274,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, add_key_to_filter(&key, filter, settings); } + cleanup: hashmap_free_entries(&pathmap, struct pathmap_hash_entry, entry); } else { for (i = 0; i < diff_queued_diff.nr; i++) diff --git a/diff.h b/diff.h index e0c0af6286..1d32b71885 100644 --- a/diff.h +++ b/diff.h @@ -287,8 +287,6 @@ struct diff_options { /* If non-zero, then stop computing after this many changes. */ int max_changes; - /* For internal use only. */ - int num_changes; int ita_invisible_in_index; /* white-space error highlighting */ diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 47ddf2641f..f375e752cd 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -182,20 +182,90 @@ test_expect_success 'persist filter settings' ' ' test_expect_success 'correctly report changes over limit' ' - git init 513changes && + git init limits && ( - cd 513changes && - for i in $(test_seq 1 513) + cd limits && + mkdir d && + mkdir d/e && + + for i in $(test_seq 1 2) do - echo $i >file$i.txt || return 1 + printf $i >d/file$i.txt && + printf $i >d/e/file$i.txt || return 1 done && - git add . && + + mkdir mode && + printf bash >mode/script.sh && + + mkdir foo && + touch foo/bar && + touch foo.txt && + + git add d foo foo.txt mode && git commit -m "files" && - git commit-graph write --reachable --changed-paths && - for i in $(test_seq 1 513) + + # Commit has 7 file and 4 directory adds + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ + GIT_TRACE2_EVENT="$(pwd)/trace" \ + git commit-graph write --reachable --changed-paths && + grep "\"max_changed_paths\":10" trace && + grep "\"filter_computed\":1" trace && + grep "\"filter_trunc_large\":1" trace && + + for path in $(git ls-tree -r --name-only HEAD) do - git -c core.commitGraph=false log -- file$i.txt >expect && - git log -- file$i.txt >actual && + git -c commitGraph.readChangedPaths=false log \ + -- $path >expect && + git log -- $path >actual && + test_cmp expect actual || return 1 + done && + + # Make a variety of path changes + printf new1 >d/e/file1.txt && + printf new2 >d/file2.txt && + rm d/e/file2.txt && + rm -r foo && + printf text >foo && + mkdir f && + printf new1 >f/file1.txt && + + # including a mode-only change (counts as modified) + git update-index --chmod=+x mode/script.sh && + + git add foo d f && + git commit -m "complicated" && + + # start from scratch and rebuild + rm -f .git/objects/info/commit-graph && + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ + GIT_TRACE2_EVENT="$(pwd)/trace-edit" \ + git commit-graph write --reachable --changed-paths && + grep "\"max_changed_paths\":10" trace-edit && + grep "\"filter_computed\":2" trace-edit && + grep "\"filter_trunc_large\":2" trace-edit && + + for path in $(git ls-tree -r --name-only HEAD) + do + git -c commitGraph.readChangedPaths=false log \ + -- $path >expect && + git log -- $path >actual && + test_cmp expect actual || return 1 + done && + + # start from scratch and rebuild + rm -f .git/objects/info/commit-graph && + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=11 \ + GIT_TRACE2_EVENT="$(pwd)/trace-update" \ + git commit-graph write --reachable --changed-paths && + grep "\"max_changed_paths\":11" trace-update && + grep "\"filter_computed\":2" trace-update && + grep "\"filter_trunc_large\":0" trace-update && + + for path in $(git ls-tree -r --name-only HEAD) + do + git -c commitGraph.readChangedPaths=false log \ + -- $path >expect && + git log -- $path >actual && test_cmp expect actual || return 1 done ) diff --git a/tree-diff.c b/tree-diff.c index 6ebad1a46f..7cebbb327e 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -434,7 +434,7 @@ static struct combine_diff_path *ll_diff_tree_paths( if (diff_can_quit_early(opt)) break; - if (opt->max_changes && opt->num_changes > opt->max_changes) + if (opt->max_changes && diff_queued_diff.nr > opt->max_changes) break; if (opt->pathspec.nr) { @@ -521,7 +521,6 @@ static struct combine_diff_path *ll_diff_tree_paths( /* t↓ */ update_tree_entry(&t); - opt->num_changes++; } /* t > p[imin] */ @@ -539,7 +538,6 @@ static struct combine_diff_path *ll_diff_tree_paths( skip_emit_tp: /* ∀ pi=p[imin] pi↓ */ update_tp_entries(tp, nparent); - opt->num_changes++; } } @@ -557,7 +555,6 @@ struct combine_diff_path *diff_tree_paths( const struct object_id **parents_oid, int nparent, struct strbuf *base, struct diff_options *opt) { - opt->num_changes = 0; p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt); /* From patchwork Wed Sep 9 15:23:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11765937 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A32BC618 for ; Wed, 9 Sep 2020 17:19:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6B104221E8 for ; Wed, 9 Sep 2020 17:19:10 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="JbNZmBRq" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731077AbgIIRTI (ORCPT ); Wed, 9 Sep 2020 13:19:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46762 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730140AbgIIP17 (ORCPT ); Wed, 9 Sep 2020 11:27:59 -0400 Received: from mail-qk1-x742.google.com (mail-qk1-x742.google.com [IPv6:2607:f8b0:4864:20::742]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 638CFC0619E6 for ; Wed, 9 Sep 2020 08:23:53 -0700 (PDT) Received: by mail-qk1-x742.google.com with SMTP id q5so2804901qkc.2 for ; Wed, 09 Sep 2020 08:23:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=j8mXLV36Vu0cS8mGSAouizOGGHumepgUHUXVe8CyoiY=; b=JbNZmBRq+cR8uBhpmbwsjvmVySicO2ffND9P49w5gt1B93YThQTA/hSEOdGQfXYzkD wcDeJcc1iMEyREZohqLTEOVcs96fnCKIfWMorZ5PDLsKI/9Zs/peP/usKyNfteTnmhjg 5tTFB75O5vjiaJzANIR2z045E1ptrxu6EzR/z4ktWeT0PAfyl0cfHiEFmBbQ0t8kp/cz fwgCTFZw//GpNiVvztKIfggzjO0NLp4WdjyvgTSKmnCKjagW7+N6u+HF4HJKy0IHRdtq imlQWHc4DKj5RW3GOxl6MIoFeFN2xE5p+6A/zjjF8uklGknPEw//le7AmO11PkvCHdjd lFDw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=j8mXLV36Vu0cS8mGSAouizOGGHumepgUHUXVe8CyoiY=; b=Cm5n0M68gqg1hpNDKg2XRxllYbYk++M2r749263KOsE0+ui7Tjh3JsjReqO+12CZLK DWbNs6mfU4pv2jyvV36H71Rqrw5yvM/926lR+4zd7jvQy8RPD+tShPXlobhbhg6b7VDm Gv16liOC0vGn+1l+/fvv7o9Euo43vKDwxd3KcdNFDiHVD063W9xx370FHJKjqr1zCgUD CSL6joPGCeywoxmGxauqPsbC/DNkBauVfqmjXyDAUi+4Jt8SzuT9trUreHIZS4qqQZUA SQK8SuHafF1o4fI+Y6Sf9KIy6krU67IXyL1cAp+Ac1rNMbkdRBgFtYKn/K40+GUnfDxS DI1Q== X-Gm-Message-State: AOAM531KzpifXmOgN9e5Nzh4Ra4TlZ+1u1qbyNrWjQJJZgZS8lBw8Kpd 6FqK46oZe6j+KmeM0+kKlpE3TfcFZ8u2XccL X-Google-Smtp-Source: ABdhPJwG0qCVR9FUlUOs/ROt7Qm+X1hZ0xHmdJ43E04s/OMaYWe3Fd+/Mf1iLqbHPhzj8FSqYa59Jg== X-Received: by 2002:a37:4711:: with SMTP id u17mr3552477qka.54.1599665032103; Wed, 09 Sep 2020 08:23:52 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:10e2:cf5e:922:2af0]) by smtp.gmail.com with ESMTPSA id k52sm3591723qtc.56.2020.09.09.08.23.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Sep 2020 08:23:51 -0700 (PDT) Date: Wed, 9 Sep 2020 11:23:48 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH 10/12] bloom: encode out-of-bounds filters as non-empty Message-ID: <1c993b83980e77594b0de2b1884cbc4cf484f9ab.1599664389.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When a changed-path Bloom filter has either zero, or more than a certain number (commonly 512) of entries, the commit-graph machinery encodes it as "missing". More specifically, it sets the indices adjacent in the BIDX chunk as equal to each other to indicate a "length 0" filter; that is, that the filter occupies zero bytes on disk. This has heretofore been fine, since the commit-graph machinery has no need to care about these filters with too few or too many changed paths. Both cases act like no filter has been generated at all, and so there is no need to store them. In a subsequent commit, however, the commit-graph machinery will learn to only compute Bloom filters for some commits in the current commit-graph layer. This is a change from the current implementation which computes Bloom filters for all commits that are in the layer being written. Critically for this patch, only computing some of the Bloom filters means adding a third state for length 0 Bloom filters: zero entries, too many entries, or "hasn't been computed". It will be important for that future patch to distinguish between "not representable" (i.e., zero or too-many changed paths), and "hasn't been computed". In particular, we don't want to waste time recomputing filters that have already been computed. To that end, change how we store Bloom filters in the "computed but not representable" category: - Bloom filters with no entries are stored as a single byte with all bits low (i.e., all queries to that Bloom filter will return "definitely not") - Bloom filters with too many entries are stored as a single byte with all bits set high (i.e., all queries to that Bloom filter will return "maybe"). These rules are sufficient to not incur a behavior change by changing the on-disk representation of these two classes. Likewise, no specification changes are necessary for the commit-graph format, either: - Filters that were previously empty will be recomputed and stored according to the new rules, and - old clients reading filters generated by new clients will interpret the filters correctly and be none the wiser to how they were generated. Clients will invoke the Bloom machinery in more cases than before, but this can be addressed by returning a NULL filter when all bits are set high. This can be addressed in a future patch. Finally, note that this does increase the size of on-disk commit-graphs, but far less than other proposals. In particular, this is generally more efficient than storing a bitmap for which commits haven't computed their Bloom filters. Storing a bitmap incurs a penalty of one bit per commit, whereas storing explicit filters as above incurs a penalty of one byte per too-large or too-small commit. In practice, these boundary commits likely occupy a small proportion of the overall number of commits, and so the size penalty is likely smaller than storing a bitmap for all commits. A test to exercise filters which contain too many changed path entries will be introduced in a subsequent patch. Suggested-by: SZEDER Gábor Suggested-by: Jakub Narębski Helped-by: Derrick Stolee Signed-off-by: Taylor Blau Signed-off-by: Taylor Blau --- .../technical/commit-graph-format.txt | 2 +- bloom.c | 10 +++++++-- bloom.h | 1 + commit-graph.c | 5 +++++ t/t0095-bloom.sh | 4 ++-- t/t4216-log-bloom.sh | 21 +++++++++++++++++-- 6 files changed, 36 insertions(+), 7 deletions(-) diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index 6ddbceba15..6585f1948a 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -125,7 +125,7 @@ CHUNK DATA: * The rest of the chunk is the concatenation of all the computed Bloom filters for the commits in lexicographic order. * Note: Commits with no changes or more than 512 changes have Bloom filters - of length zero. + of length one, with either all bits set to zero or one respectively. * The BDAT chunk is present if and only if BIDX is present. Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional] diff --git a/bloom.c b/bloom.c index db9fb82437..194b6ab8ad 100644 --- a/bloom.c +++ b/bloom.c @@ -266,6 +266,11 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, } filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; + if (!filter->len) { + if (computed) + *computed |= BLOOM_TRUNC_SMALL; + filter->len = 1; + } filter->data = xcalloc(filter->len, sizeof(unsigned char)); hashmap_for_each_entry(&pathmap, &iter, e, entry) { @@ -279,8 +284,9 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, } else { for (i = 0; i < diff_queued_diff.nr; i++) diff_free_filepair(diff_queued_diff.queue[i]); - filter->data = NULL; - filter->len = 0; + filter->data = xmalloc(1); + filter->data[0] = 0xFF; + filter->len = 1; if (computed) *computed |= BLOOM_TRUNC_LARGE; diff --git a/bloom.h b/bloom.h index c6d77e8393..70a8840896 100644 --- a/bloom.h +++ b/bloom.h @@ -93,6 +93,7 @@ enum bloom_filter_computed { BLOOM_NOT_COMPUTED = (1 << 0), BLOOM_COMPUTED = (1 << 1), BLOOM_TRUNC_LARGE = (1 << 2), + BLOOM_TRUNC_SMALL = (1 << 3), }; struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff --git a/commit-graph.c b/commit-graph.c index 500f29525a..d402743e61 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -974,6 +974,7 @@ struct write_commit_graph_context { int count_bloom_filter_computed; int count_bloom_filter_not_computed; + int count_bloom_filter_trunc_small; int count_bloom_filter_trunc_large; }; @@ -1405,6 +1406,8 @@ static void trace2_bloom_filter_write_statistics(struct write_commit_graph_conte ctx->count_bloom_filter_computed); jw_object_intmax(&jw, "filter_not_computed", ctx->count_bloom_filter_not_computed); + jw_object_intmax(&jw, "filter_trunc_small", + ctx->count_bloom_filter_trunc_small); jw_object_intmax(&jw, "filter_trunc_large", ctx->count_bloom_filter_trunc_large); jw_end(&jw); @@ -1446,6 +1449,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) &computed); if (computed & BLOOM_COMPUTED) { ctx->count_bloom_filter_computed++; + if (computed & BLOOM_TRUNC_SMALL) + ctx->count_bloom_filter_trunc_small++; if (computed & BLOOM_TRUNC_LARGE) ctx->count_bloom_filter_trunc_large++; } else if (computed & BLOOM_NOT_COMPUTED) diff --git a/t/t0095-bloom.sh b/t/t0095-bloom.sh index 232ba2c485..4d0e512bcb 100755 --- a/t/t0095-bloom.sh +++ b/t/t0095-bloom.sh @@ -71,8 +71,8 @@ test_expect_success 'get bloom filters for commit with no changes' ' git init && git commit --allow-empty -m "c0" && cat >expect <<-\EOF && - Filter_Length:0 - Filter_Data: + Filter_Length:1 + Filter_Data:00| EOF test-tool bloom get_filter_for_commit "$(git rev-parse HEAD)" >actual && test_cmp expect actual diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index f375e752cd..a56327ffd4 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -30,6 +30,7 @@ test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' rm file_to_be_deleted && git add . && git commit -m "file removed" && + git commit --allow-empty -m "empty" && git commit-graph write --reachable --changed-paths && test_oid_cache <<-EOF @@ -49,7 +50,7 @@ graph_read_expect () { } test_expect_success 'commit-graph write wrote out the bloom chunks' ' - graph_read_expect 15 + graph_read_expect 16 ' # Turn off any inherited trace2 settings for this test. @@ -156,7 +157,7 @@ test_expect_success 'setup - add commit-graph to the chain with Bloom filters' ' test_bloom_filters_used_when_some_filters_are_missing () { log_args=$1 - bloom_trace_prefix="statistics:{\"filter_not_present\":3,\"maybe\":6,\"definitely_not\":8" + bloom_trace_prefix="statistics:{\"filter_not_present\":3,\"maybe\":6,\"definitely_not\":9" setup "$log_args" && grep -q "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.perf" && test_cmp log_wo_bloom log_w_bloom @@ -271,4 +272,20 @@ test_expect_success 'correctly report changes over limit' ' ) ' +test_expect_success 'correctly report commits with no changed paths' ' + git init small && + test_when_finished "rm -fr small" && + ( + cd small && + + git commit --allow-empty -m "initial commit" && + + GIT_TRACE2_EVENT="$(pwd)/trace" \ + git commit-graph write --reachable --changed-paths && + grep "\"filter_computed\":1" trace && + grep "\"filter_trunc_small\":1" trace && + grep "\"filter_trunc_large\":0" trace + ) +' + test_done From patchwork Wed Sep 9 15:23:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11765647 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2EC8D14F6 for ; Wed, 9 Sep 2020 15:32:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 125E72226B for ; Wed, 9 Sep 2020 15:32:11 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="zMiwWxms" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730111AbgIIPba (ORCPT ); Wed, 9 Sep 2020 11:31:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46916 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730187AbgIIP2o (ORCPT ); Wed, 9 Sep 2020 11:28:44 -0400 Received: from mail-qk1-x72a.google.com (mail-qk1-x72a.google.com [IPv6:2607:f8b0:4864:20::72a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 021D3C0619E7 for ; Wed, 9 Sep 2020 08:24:00 -0700 (PDT) Received: by mail-qk1-x72a.google.com with SMTP id v123so2756631qkd.9 for ; Wed, 09 Sep 2020 08:23:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=OAbpYrR7JQ7H2Q7ASpqpu6/ti1dOif58BT380vCKMTY=; b=zMiwWxmsKMjDrb7ukIUJsu2JQEvnwtf5E8GgOebaOsOi81BCHsjUIYy16mp/ydyYGx D6UvmogdeGJKpU791W/KnJrYzwunwwd3uMyy2oRTEwXPX5I9i9ABmx/003+BocKOVCnv bwaiYYdV+rtwWvcokXP/cTwjL33S8hFwBiRqBd03PJpgoWGA80wuoRI0wEiKhOqKUp4z InqB/M1l995PN3sRYdQzTusk8RbMBTvcjauO7Cp6GiBZ1sYdQ1bAI1Xr0LAnxUZZPSIH Sbn7P4KUm708BL4RrVUfoAcEw+U0iKRYNxv3d/lZYKKdxQUw2N+GfFxc4dCCvSGFupfN HuPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=OAbpYrR7JQ7H2Q7ASpqpu6/ti1dOif58BT380vCKMTY=; b=Zx/cOnkQkwZ38k4kEmWVKEYDM4SyW5yG/ZiAMtS9gl4x/XEo97kawQdmRw5Qjtw0KL A+09lI6WhiyrMbD1YaZrxKqLzG4o6ALX10QuQ48wGIBSj8ehaQA3P9SlRNfRVbQ6EDU7 iVhla5KBRnwEsBWlPsciFJDrOLqjHgHrssgPTjIIMH0tILnAltM9gWSk7CoSPu4MPQAc Xjai+eCDJwq/y9JbzRNVffGTVNx/FZ2L+87+msWzmShjqmZlPlPEaiS0MUbAWDJDh3e5 bf35dMbkdMHFeURlMbvwB4oKmeS3ZACGk4VdxvXqDIPujhcmFEWIqvku0tqm2EqbYr5P qlWQ== X-Gm-Message-State: AOAM53087ah0o/+7R+sgxoRALiAlK36KcwOHpI2a58nMtuN7hUmbHW4b wnAaSBuCw2GQ7tegOcgSdqHmgJDQ8yefEwq7 X-Google-Smtp-Source: ABdhPJywZPhLVndgtIhiNPbeblVrpxDZRM6ARgnjNUmeca3fXid8yoo3KEwhAmxQVyqxmIb0Vb8OHg== X-Received: by 2002:a37:b307:: with SMTP id c7mr3797257qkf.33.1599665038692; Wed, 09 Sep 2020 08:23:58 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:10e2:cf5e:922:2af0]) by smtp.gmail.com with ESMTPSA id c13sm3310600qtq.5.2020.09.09.08.23.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Sep 2020 08:23:57 -0700 (PDT) Date: Wed, 9 Sep 2020 11:23:55 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH 11/12] commit-graph: rename 'split_commit_graph_opts' Message-ID: <2816028f44e9d9dde8f360355e9a61b8f2d26dd9.1599664389.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In the subsequent commit, additional options will be added to the commit-graph API which have nothing to do with splitting. Rename the 'split_commit_graph_opts' structure to the more-generic 'commit_graph_opts' to encompass both. Likewise, rename the 'flags' member to instead be 'split_flags' to clarify that it only has to do with the behavior implied by '--split'. Suggested-by: Derrick Stolee Signed-off-by: Taylor Blau --- builtin/commit-graph.c | 20 ++++++++++---------- commit-graph.c | 40 ++++++++++++++++++++-------------------- commit-graph.h | 8 ++++---- 3 files changed, 34 insertions(+), 34 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index ba5584463f..f3243bd982 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -119,7 +119,7 @@ static int graph_verify(int argc, const char **argv) } extern int read_replace_refs; -static struct split_commit_graph_opts split_opts; +static struct commit_graph_opts write_opts; static int write_option_parse_split(const struct option *opt, const char *arg, int unset) @@ -187,24 +187,24 @@ static int graph_write(int argc, const char **argv) OPT_BOOL(0, "changed-paths", &opts.enable_changed_paths, N_("enable computation for changed paths")), OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")), - OPT_CALLBACK_F(0, "split", &split_opts.flags, NULL, + OPT_CALLBACK_F(0, "split", &write_opts.split_flags, NULL, N_("allow writing an incremental commit-graph file"), PARSE_OPT_OPTARG | PARSE_OPT_NONEG, write_option_parse_split), - OPT_INTEGER(0, "max-commits", &split_opts.max_commits, + OPT_INTEGER(0, "max-commits", &write_opts.max_commits, N_("maximum number of commits in a non-base split commit-graph")), - OPT_INTEGER(0, "size-multiple", &split_opts.size_multiple, + OPT_INTEGER(0, "size-multiple", &write_opts.size_multiple, N_("maximum ratio between two levels of a split commit-graph")), - OPT_EXPIRY_DATE(0, "expire-time", &split_opts.expire_time, + OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time, N_("only expire files older than a given date-time")), OPT_END(), }; opts.progress = isatty(2); opts.enable_changed_paths = -1; - split_opts.size_multiple = 2; - split_opts.max_commits = 0; - split_opts.expire_time = 0; + write_opts.size_multiple = 2; + write_opts.max_commits = 0; + write_opts.expire_time = 0; trace2_cmd_mode("write"); @@ -232,7 +232,7 @@ static int graph_write(int argc, const char **argv) odb = find_odb(the_repository, opts.obj_dir); if (opts.reachable) { - if (write_commit_graph_reachable(odb, flags, &split_opts)) + if (write_commit_graph_reachable(odb, flags, &write_opts)) return 1; return 0; } @@ -261,7 +261,7 @@ static int graph_write(int argc, const char **argv) opts.stdin_packs ? &pack_indexes : NULL, opts.stdin_commits ? &commits : NULL, flags, - &split_opts)) + &write_opts)) result = 1; cleanup: diff --git a/commit-graph.c b/commit-graph.c index d402743e61..dcc27b74e3 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -968,7 +968,7 @@ struct write_commit_graph_context { changed_paths:1, order_by_pack:1; - const struct split_commit_graph_opts *split_opts; + const struct commit_graph_opts *opts; size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; @@ -1292,8 +1292,8 @@ static void close_reachable(struct write_commit_graph_context *ctx) { int i; struct commit *commit; - enum commit_graph_split_flags flags = ctx->split_opts ? - ctx->split_opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; + enum commit_graph_split_flags flags = ctx->opts ? + ctx->opts->split_flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; if (ctx->report_progress) ctx->progress = start_delayed_progress( @@ -1490,7 +1490,7 @@ static int add_ref_to_set(const char *refname, int write_commit_graph_reachable(struct object_directory *odb, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts) + const struct commit_graph_opts *opts) { struct oidset commits = OIDSET_INIT; struct refs_cb_data data; @@ -1507,7 +1507,7 @@ int write_commit_graph_reachable(struct object_directory *odb, stop_progress(&data.progress); result = write_commit_graph(odb, NULL, &commits, - flags, split_opts); + flags, opts); oidset_clear(&commits); return result; @@ -1622,8 +1622,8 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx) static void copy_oids_to_commits(struct write_commit_graph_context *ctx) { uint32_t i; - enum commit_graph_split_flags flags = ctx->split_opts ? - ctx->split_opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; + enum commit_graph_split_flags flags = ctx->opts ? + ctx->opts->split_flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; ctx->num_extra_edges = 0; if (ctx->report_progress) @@ -1908,13 +1908,13 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) int max_commits = 0; int size_mult = 2; - if (ctx->split_opts) { - max_commits = ctx->split_opts->max_commits; + if (ctx->opts) { + max_commits = ctx->opts->max_commits; - if (ctx->split_opts->size_multiple) - size_mult = ctx->split_opts->size_multiple; + if (ctx->opts->size_multiple) + size_mult = ctx->opts->size_multiple; - flags = ctx->split_opts->flags; + flags = ctx->opts->split_flags; } g = ctx->r->objects->commit_graph; @@ -2092,8 +2092,8 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx) size_t dirnamelen; timestamp_t expire_time = time(NULL); - if (ctx->split_opts && ctx->split_opts->expire_time) - expire_time = ctx->split_opts->expire_time; + if (ctx->opts && ctx->opts->expire_time) + expire_time = ctx->opts->expire_time; if (!ctx->split) { char *chain_file_name = get_chain_filename(ctx->odb); unlink(chain_file_name); @@ -2144,7 +2144,7 @@ int write_commit_graph(struct object_directory *odb, struct string_list *pack_indexes, struct oidset *commits, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts) + const struct commit_graph_opts *opts) { struct write_commit_graph_context *ctx; uint32_t i, count_distinct = 0; @@ -2161,7 +2161,7 @@ int write_commit_graph(struct object_directory *odb, ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0; ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0; ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; - ctx->split_opts = split_opts; + ctx->opts = opts; ctx->total_bloom_filter_data_size = 0; bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", @@ -2209,15 +2209,15 @@ int write_commit_graph(struct object_directory *odb, } } - if (ctx->split_opts) - replace = ctx->split_opts->flags & COMMIT_GRAPH_SPLIT_REPLACE; + if (ctx->opts) + replace = ctx->opts->split_flags & COMMIT_GRAPH_SPLIT_REPLACE; } ctx->approx_nr_objects = approximate_object_count(); ctx->oids.alloc = ctx->approx_nr_objects / 32; - if (ctx->split && split_opts && ctx->oids.alloc > split_opts->max_commits) - ctx->oids.alloc = split_opts->max_commits; + if (ctx->split && opts && ctx->oids.alloc > opts->max_commits) + ctx->oids.alloc = opts->max_commits; if (ctx->append) { prepare_commit_graph_one(ctx->r, ctx->odb); diff --git a/commit-graph.h b/commit-graph.h index d9acb22bac..b7914b0a7a 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -105,11 +105,11 @@ enum commit_graph_split_flags { COMMIT_GRAPH_SPLIT_REPLACE = 2 }; -struct split_commit_graph_opts { +struct commit_graph_opts { int size_multiple; int max_commits; timestamp_t expire_time; - enum commit_graph_split_flags flags; + enum commit_graph_split_flags split_flags; }; /* @@ -120,12 +120,12 @@ struct split_commit_graph_opts { */ int write_commit_graph_reachable(struct object_directory *odb, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts); + const struct commit_graph_opts *opts); int write_commit_graph(struct object_directory *odb, struct string_list *pack_indexes, struct oidset *commits, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts); + const struct commit_graph_opts *opts); #define COMMIT_GRAPH_VERIFY_SHALLOW (1 << 0) From patchwork Wed Sep 9 15:24:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11765915 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 58F6513B1 for ; Wed, 9 Sep 2020 17:17:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3B2CA21D7E for ; Wed, 9 Sep 2020 17:17:21 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="K/DA++s8" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730193AbgIIRQG (ORCPT ); Wed, 9 Sep 2020 13:16:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46974 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729779AbgIIP2y (ORCPT ); Wed, 9 Sep 2020 11:28:54 -0400 Received: from mail-qk1-x72c.google.com (mail-qk1-x72c.google.com [IPv6:2607:f8b0:4864:20::72c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BC32FC061240 for ; Wed, 9 Sep 2020 08:24:05 -0700 (PDT) Received: by mail-qk1-x72c.google.com with SMTP id q5so2805634qkc.2 for ; Wed, 09 Sep 2020 08:24:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=9tiopOSX+4C9scUTdRLxaax+ljcX4O9qltrrnS2V54s=; b=K/DA++s80DuHPVnsuCYYmF41hvdOopXgw+ilFV6MThzP33UQr6BhRWHyfQem8G0fbO 7wNF1T57X3LgFIftqP7qcxwEU7JdgZFhQ4x4ihI1+MeGN/Nr1NDyo7fH6zBrmXls9xj+ fDdIG5ZMNe4gZnXR4dxdxXY71Jm/PnxtV2ZI9gtp5/69M3N5yitrkwdJ4TlXGasKIuQj HFujkuBcmMILz67rqTTBIpnA7m8z2Balf0Ol1ckbWHFQ2cLT8ESEpsgLuqwWs2DsMQrq 2a1f0a2nnFyANITradyMAw9fODy+mfU1yUfn5+WGhQsOlDdoNGcBc+d8JH4o4qj8StjF yXYQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=9tiopOSX+4C9scUTdRLxaax+ljcX4O9qltrrnS2V54s=; b=gwEJWt8f3QN+SgOiz23LND2pTc6xoqrxUcXJUcDofLBGxD192h5dpucq/SY+yFwIs1 PhPNy6CUiCnLhvNsvaD8gCFa9vYHJUGsmBUEa2ySHn2T98bCMkfLDhOtmQRruXOcIR84 VgnQHPCj/KPHdk4pz3Xa6QS5U2a26jjiZlCAEBioppbTpiBHcQTr5Ziz5wlQ8E/WUkBS 8mDcB9jgSK/SG20Gnmiy25VLcI8WmEBFWSkBxk4kB7QQw1zetlztLAiX/cZDQF1RjwTT EFFIbIlLNd78hdpsFRQ6zQJLV8Zn1iBvKF9zMGeyPryWG6KzTWDK4nhPWK4y8jQU9Yet Au5A== X-Gm-Message-State: AOAM532AfEb8tC/3dFby8IY787E76usiLdCKa33RF/Dlort1GN21sXqp ezXnrV+g7yRwCIKTnmqI4qU3R79SEnWlxREo X-Google-Smtp-Source: ABdhPJx+qt0UuKm6F69JC0nJ9E9v6o0CySrRXnR0V5RJ9z3gyR6/WYCGT3E33chqfWUCMRW5i7DoqA== X-Received: by 2002:a05:620a:148:: with SMTP id e8mr3846301qkn.186.1599665043985; Wed, 09 Sep 2020 08:24:03 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:10e2:cf5e:922:2af0]) by smtp.gmail.com with ESMTPSA id k20sm3339959qtb.34.2020.09.09.08.24.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Sep 2020 08:24:03 -0700 (PDT) Date: Wed, 9 Sep 2020 11:24:00 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH 12/12] builtin/commit-graph.c: introduce '--max-new-filters=' Message-ID: <4ff11cec37d17d788a3ee076b7c3de1c873a5fbd.1599664389.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Introduce a command-line flag and configuration variable to specify the maximum number of new Bloom filters that a 'git commit-graph write' is willing to compute from scratch. Prior to this patch, a commit-graph write with '--changed-paths' would compute Bloom filters for all selected commits which haven't already been computed (i.e., by a previous commit-graph write with '--split' such that a roll-up or replacement is performed). This behavior can cause prohibitively-long commit-graph writes for a variety of reasons: * There may be lots of filters whose diffs take a long time to generate (for example, they have close to the maximum number of changes, diffing itself takes a long time, etc). * Old-style commit-graphs (which encode filters with too many entries as not having been computed at all) cause us to waste time recomputing filters that appear to have not been computed only to discover that they are too-large. This can make the upper-bound of the time it takes for 'git commit-graph write --changed-paths' to be rather unpredictable. To make this command behave more predictably, introduce '--max-new-filters=' to allow computing at most '' Bloom filters from scratch. This lets "computing" already-known filters proceed quickly, while bounding the number of slow tasks that Git is willing to do. Signed-off-by: Taylor Blau Signed-off-by: Taylor Blau Signed-off-by: Taylor Blau --- Documentation/config/commitgraph.txt | 4 +++ Documentation/git-commit-graph.txt | 6 ++++ bloom.c | 7 ++-- builtin/commit-graph.c | 39 ++++++++++++++++++-- commit-graph.c | 9 +++-- commit-graph.h | 1 + t/t4216-log-bloom.sh | 53 ++++++++++++++++++++++++++++ 7 files changed, 110 insertions(+), 9 deletions(-) diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt index cff0797b54..4582c39fc4 100644 --- a/Documentation/config/commitgraph.txt +++ b/Documentation/config/commitgraph.txt @@ -1,3 +1,7 @@ +commitGraph.maxNewFilters:: + Specifies the default value for the `--max-new-filters` option of `git + commit-graph write` (c.f., linkgit:git-commit-graph[1]). + commitGraph.readChangedPaths:: If true, then git will use the changed-path Bloom filters in the commit-graph file (if it exists, and they are present). Defaults to diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index 17405c73a9..81a2e65903 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -67,6 +67,12 @@ this option is given, future commit-graph writes will automatically assume that this option was intended. Use `--no-changed-paths` to stop storing this data. + +With the `--max-new-filters=` option, generate at most `n` new Bloom +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is +enforced. Commits whose filters are not calculated are stored as a +length zero Bloom filter, and their bit is marked in the `BFXL` chunk. +Overrides the `commitGraph.maxNewFilters` configuration. ++ With the `--split[=]` option, write the commit-graph as a chain of multiple commit-graph files stored in `/info/commit-graphs`. Commit-graph layers are merged based on the diff --git a/bloom.c b/bloom.c index 194b6ab8ad..022dd6e0f9 100644 --- a/bloom.c +++ b/bloom.c @@ -197,12 +197,11 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, if (!filter->data) { load_commit_graph_info(r, c); - if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH && - load_bloom_filter_from_graph(r->objects->commit_graph, filter, c)) - return filter; + if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH) + load_bloom_filter_from_graph(r->objects->commit_graph, filter, c); } - if (filter->data) + if (filter->data && filter->len) return filter; if (!compute_if_not_present) return NULL; diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index f3243bd982..e7a1539b08 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -13,7 +13,8 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph verify [--object-dir ] [--shallow] [--[no-]progress]"), N_("git commit-graph write [--object-dir ] [--append] " "[--split[=]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] "), + "[--changed-paths] [--[no-]max-new-filters ] [--[no-]progress] " + ""), NULL }; @@ -25,7 +26,8 @@ static const char * const builtin_commit_graph_verify_usage[] = { static const char * const builtin_commit_graph_write_usage[] = { N_("git commit-graph write [--object-dir ] [--append] " "[--split[=]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] "), + "[--changed-paths] [--[no-]max-new-filters ] [--[no-]progress] " + ""), NULL }; @@ -162,6 +164,23 @@ static int read_one_commit(struct oidset *commits, struct progress *progress, return 0; } +static int write_option_max_new_filters(const struct option *opt, + const char *arg, + int unset) +{ + int *to = opt->value; + if (unset) + *to = -1; + else { + const char *s; + *to = strtol(arg, (char **)&s, 10); + if (*s) + return error(_("%s expects a numerical value"), + optname(opt, opt->flags)); + } + return 0; +} + static int graph_write(int argc, const char **argv) { struct string_list pack_indexes = STRING_LIST_INIT_NODUP; @@ -197,6 +216,9 @@ static int graph_write(int argc, const char **argv) N_("maximum ratio between two levels of a split commit-graph")), OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time, N_("only expire files older than a given date-time")), + OPT_CALLBACK_F(0, "max-new-filters", &write_opts.max_new_filters, + NULL, N_("maximum number of changed-path Bloom filters to compute"), + 0, write_option_max_new_filters), OPT_END(), }; @@ -205,6 +227,7 @@ static int graph_write(int argc, const char **argv) write_opts.size_multiple = 2; write_opts.max_commits = 0; write_opts.expire_time = 0; + write_opts.max_new_filters = -1; trace2_cmd_mode("write"); @@ -270,6 +293,16 @@ static int graph_write(int argc, const char **argv) return result; } +static int git_commit_graph_config(const char *var, const char *value, void *cb) +{ + if (!strcmp(var, "commitgraph.maxnewfilters")) { + write_opts.max_new_filters = git_config_int(var, value); + return 0; + } + + return git_default_config(var, value, cb); +} + int cmd_commit_graph(int argc, const char **argv, const char *prefix) { static struct option builtin_commit_graph_options[] = { @@ -283,7 +316,7 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix) usage_with_options(builtin_commit_graph_usage, builtin_commit_graph_options); - git_config(git_default_config, NULL); + git_config(git_commit_graph_config, &opts); argc = parse_options(argc, argv, prefix, builtin_commit_graph_options, builtin_commit_graph_usage, diff --git a/commit-graph.c b/commit-graph.c index dcc27b74e3..1d9f8cc7e9 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1422,6 +1422,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) int i; struct progress *progress = NULL; struct commit **sorted_commits; + int max_new_filters; init_bloom_filters(); @@ -1438,13 +1439,16 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) else QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp); + max_new_filters = ctx->opts && ctx->opts->max_new_filters >= 0 ? + ctx->opts->max_new_filters : ctx->commits.nr; + for (i = 0; i < ctx->commits.nr; i++) { enum bloom_filter_computed computed = 0; struct commit *c = sorted_commits[i]; struct bloom_filter *filter = get_or_compute_bloom_filter( ctx->r, c, - 1, + ctx->count_bloom_filter_computed < max_new_filters, ctx->bloom_settings, &computed); if (computed & BLOOM_COMPUTED) { @@ -1455,7 +1459,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->count_bloom_filter_trunc_large++; } else if (computed & BLOOM_NOT_COMPUTED) ctx->count_bloom_filter_not_computed++; - ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; + ctx->total_bloom_filter_data_size += filter + ? sizeof(unsigned char) * filter->len : 0; display_progress(progress, i + 1); } diff --git a/commit-graph.h b/commit-graph.h index b7914b0a7a..a22bd86701 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -110,6 +110,7 @@ struct commit_graph_opts { int max_commits; timestamp_t expire_time; enum commit_graph_split_flags split_flags; + int max_new_filters; }; /* diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index a56327ffd4..24deb8104a 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -287,5 +287,58 @@ test_expect_success 'correctly report commits with no changed paths' ' grep "\"filter_trunc_large\":0" trace ) ' +test_bloom_filters_computed () { + commit_graph_args=$1 + rm -f "$TRASH_DIRECTORY/trace.event" && + GIT_TRACE2_EVENT="$TRASH_DIRECTORY/trace.event" git commit-graph write \ + $commit_graph_args && + grep "\"filter_not_computed\":$2" "$TRASH_DIRECTORY/trace.event" && + grep "\"filter_trunc_large\":$3" "$TRASH_DIRECTORY/trace.event" && + grep "\"filter_computed\":$4" "$TRASH_DIRECTORY/trace.event" +} + +test_expect_success 'Bloom generation is limited by --max-new-filters' ' + ( + cd limits && + test_commit c2 filter && + test_commit c3 filter && + test_commit c4 no-filter && + test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=2" \ + 3 0 2 + ) +' + +test_expect_success 'Bloom generation backfills previously-skipped filters' ' + ( + cd limits && + test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=1" \ + 4 0 1 + ) +' + +test_expect_success 'Bloom generation backfills empty commits' ' + git init empty && + test_when_finished "rm -fr empty" && + ( + cd empty && + for i in $(test_seq 1 6) + do + git commit --allow-empty -m "$i" + done && + + # Generate Bloom filters for empty commits 1-6, two at a time. + test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 4 0 2 && + test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 4 0 2 && + test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 4 0 2 && + + # Finally, make sure that once all commits have filters, that + # none are subsequently recomputed. + test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 6 0 0 + ) +' test_done