From patchwork Tue Aug 11 20:51:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11709639 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D010313B1 for ; Tue, 11 Aug 2020 20:51:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B057420774 for ; Tue, 11 Aug 2020 20:51:24 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="TRPtvSaI" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726255AbgHKUvX (ORCPT ); Tue, 11 Aug 2020 16:51:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53290 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725987AbgHKUvX (ORCPT ); Tue, 11 Aug 2020 16:51:23 -0400 Received: from mail-qv1-xf42.google.com (mail-qv1-xf42.google.com [IPv6:2607:f8b0:4864:20::f42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CB27EC06174A for ; Tue, 11 Aug 2020 13:51:22 -0700 (PDT) Received: by mail-qv1-xf42.google.com with SMTP id s15so35251qvv.7 for ; Tue, 11 Aug 2020 13:51:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=FcwVKi0u+RGOZEkffFbj0JHqYJPk8trEu7Vdc1VPFmQ=; b=TRPtvSaITU7ScuJQDDdhW+c0hvDfmjynx6WzEXsScs0asQ75VMcP+AUBw5NvH3Ys3K G/3zAPIo8cRvQ4fYYMZOQe7PkXINEf+Y6JE9sownASp3gu4TqMxisC5AIT8ASESFanul Lkj7TUv8E3ZvdlYg9DuPJzASW+hHDOYv3Iiu0THBz4PY99GqGwlmOGzptB2sLy80W+JQ HlUrs0p8u1OQD6jFipfOoC2fFOOSObO8uEcJacubjxvDvVyiq6DVhgUOebI140rd8BQv 1vGuk7YhftFjgXJ9nuV1U1CZt/4Ojpt8Zf+ypvUXargDwL5uCPnpqdGnWNSUCzrusZKV YLJA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=FcwVKi0u+RGOZEkffFbj0JHqYJPk8trEu7Vdc1VPFmQ=; b=gGZj2uT50ouidaR3uR6sSVPniQ6fItpIQSmJga+Ab/bUSo7UYHfzBKphaPW34Mc4ax B3cwgqixNLdCNgGkO3Aeq00Sj9k5V29nanSlRy2pwv/ks0APulzOHkz9vVGT+AaTQVbZ wQ2823dB4jXu9qvLxhQZVepyHSatA4l42zDORwevoM+1AqLkgFuA/K79OcU4gcyGMeCE PnjX0CFtU51AIYRXjuc7+8f8eU3H2ULtAfAO5/r8V7PppsSBLuRM+819zmUyykI217CY Lbvy33OgpdnreYaFgy0sAhhZY0g1kgLRu5ZFAZKi1+LCIycHD+Fogjcm6jWe/0WsBpLk Mwqw== X-Gm-Message-State: AOAM533s1S341ZmNiW0MlYN9nFW2boClldNm7nzmDONdnQdtImInZWyF ghzlDnofTWPh+FUcB/B84qz6f6mMypvpRys+ X-Google-Smtp-Source: ABdhPJw+O/IcWTJ2+LSuzqqW8H17CK00sdmKyPb5ouyTTg+4wHrC57rT/3uslvzWNoEPvTQBRU8U4A== X-Received: by 2002:a0c:8b51:: with SMTP id d17mr3323335qvc.107.1597179080617; Tue, 11 Aug 2020 13:51:20 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a92f:57be:59a6:7cb2]) by smtp.gmail.com with ESMTPSA id a6sm28588qka.5.2020.08.11.13.51.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 13:51:19 -0700 (PDT) Date: Tue, 11 Aug 2020 16:51:19 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com, gitster@pobox.com Subject: [PATCH v3 01/14] commit-graph: introduce 'get_bloom_filter_settings()' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Many places in the code often need a pointer to the commit-graph's 'struct bloom_filter_settings', in which case they often take the value from the top-most commit-graph. In the non-split case, this works as expected. In the split case, however, things get a little tricky. Not all layers in a chain of incremental commit-graphs are required to themselves have Bloom data, and so whether or not some part of the code uses Bloom filters depends entirely on whether or not the top-most level of the commit-graph chain has Bloom filters. This has been the behavior since Bloom filters were introduced, and has been codified into the tests since a759bfa9ee (t4216: add end to end tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130 requires that Bloom filters are not used in exactly the case described earlier. There is no reason that this needs to be the case, since it is perfectly valid for commits in an earlier layer to have Bloom filters when commits in a newer layer do not. Since Bloom settings are guaranteed to be the same for any layer in a chain that has Bloom data, it is sufficient to traverse the '->base_graph' pointer until either (1) a non-null 'struct bloom_filter_settings *' is found, or (2) until we are at the root of the commit-graph chain. Introduce a 'get_bloom_filter_settings()' function that does just this, and use it instead of purely dereferencing the top-most graph's '->bloom_filter_settings' pointer. While we're at it, add an additional test in t5324 to guard against code in the commit-graph writing machinery that doesn't correctly handle a NULL 'struct bloom_filter *'. Co-authored-by: Derrick Stolee Signed-off-by: Taylor Blau --- blame.c | 6 ++++-- bloom.c | 6 +++--- commit-graph.c | 11 +++++++++++ commit-graph.h | 2 ++ revision.c | 5 +---- t/t4216-log-bloom.sh | 9 ++++++--- t/t5324-split-commit-graph.sh | 13 +++++++++++++ 7 files changed, 40 insertions(+), 12 deletions(-) diff --git a/blame.c b/blame.c index 82fa16d658..3e5f8787bc 100644 --- a/blame.c +++ b/blame.c @@ -2891,16 +2891,18 @@ void setup_blame_bloom_data(struct blame_scoreboard *sb, const char *path) { struct blame_bloom_data *bd; + struct bloom_filter_settings *bs; if (!sb->repo->objects->commit_graph) return; - if (!sb->repo->objects->commit_graph->bloom_filter_settings) + bs = get_bloom_filter_settings(sb->repo); + if (!bs) return; bd = xmalloc(sizeof(struct blame_bloom_data)); - bd->settings = sb->repo->objects->commit_graph->bloom_filter_settings; + bd->settings = bs; bd->alloc = 4; bd->nr = 0; diff --git a/bloom.c b/bloom.c index 1a573226e7..cd9380ac62 100644 --- a/bloom.c +++ b/bloom.c @@ -38,7 +38,7 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, while (graph_pos < g->num_commits_in_base) g = g->base_graph; - /* The commit graph commit 'c' lives in doesn't carry bloom filters. */ + /* The commit graph commit 'c' lives in doesn't carry Bloom filters. */ if (!g->chunk_bloom_indexes) return 0; @@ -195,8 +195,8 @@ struct bloom_filter *get_bloom_filter(struct repository *r, if (!filter->data) { load_commit_graph_info(r, c); if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH && - r->objects->commit_graph->chunk_bloom_indexes) - load_bloom_filter_from_graph(r->objects->commit_graph, filter, c); + load_bloom_filter_from_graph(r->objects->commit_graph, filter, c)) + return filter; } if (filter->data) diff --git a/commit-graph.c b/commit-graph.c index e51c91dd5b..d4b06811be 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -660,6 +660,17 @@ int generation_numbers_enabled(struct repository *r) return !!first_generation; } +struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r) +{ + struct commit_graph *g = r->objects->commit_graph; + while (g) { + if (g->bloom_filter_settings) + return g->bloom_filter_settings; + g = g->base_graph; + } + return NULL; +} + static void close_commit_graph_one(struct commit_graph *g) { if (!g) diff --git a/commit-graph.h b/commit-graph.h index 09a97030dc..0677dd1031 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -87,6 +87,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); */ int generation_numbers_enabled(struct repository *r); +struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r); + enum commit_graph_write_flags { COMMIT_GRAPH_WRITE_APPEND = (1 << 0), COMMIT_GRAPH_WRITE_PROGRESS = (1 << 1), diff --git a/revision.c b/revision.c index 3dcf689341..be600186ee 100644 --- a/revision.c +++ b/revision.c @@ -680,10 +680,7 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs) repo_parse_commit(revs->repo, revs->commits->item); - if (!revs->repo->objects->commit_graph) - return; - - revs->bloom_filter_settings = revs->repo->objects->commit_graph->bloom_filter_settings; + revs->bloom_filter_settings = get_bloom_filter_settings(revs->repo); if (!revs->bloom_filter_settings) return; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index c21cc160f3..c9f9bdf1ba 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -60,7 +60,7 @@ setup () { test_bloom_filters_used () { log_args=$1 - bloom_trace_prefix="statistics:{\"filter_not_present\":0,\"maybe\"" + bloom_trace_prefix="statistics:{\"filter_not_present\":${2:-0},\"maybe\"" setup "$log_args" && grep -q "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.perf" && test_cmp log_wo_bloom log_w_bloom && @@ -134,8 +134,11 @@ test_expect_success 'setup - add commit-graph to the chain without Bloom filters test_line_count = 2 .git/objects/info/commit-graphs/commit-graph-chain ' -test_expect_success 'Do not use Bloom filters if the latest graph does not have Bloom filters.' ' - test_bloom_filters_not_used "-- A/B" +test_expect_success 'use Bloom filters even if the latest graph does not have Bloom filters' ' + # Ensure that the number of empty filters is equal to the number of + # filters in the latest graph layer to prove that they are loaded (and + # ignored). + test_bloom_filters_used "-- A/B" 3 ' test_expect_success 'setup - add commit-graph to the chain with Bloom filters' ' diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 9b850ea907..5bdfd53ef9 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -425,4 +425,17 @@ done <<\EOF 0600 -r-------- EOF +test_expect_success '--split=replace with partial Bloom data' ' + rm -rf $graphdir $infodir/commit-graph && + git reset --hard commits/3 && + git rev-list -1 HEAD~2 >a && + git rev-list -1 HEAD~1 >b && + git commit-graph write --split=no-merge --stdin-commits --changed-paths graph-files && + test_line_count = 1 graph-files && + verify_chain_files_exist $graphdir +' + test_done From patchwork Tue Aug 11 20:51:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11709641 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B005E109B for ; Tue, 11 Aug 2020 20:51:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 94B6A2076C for ; Tue, 11 Aug 2020 20:51:27 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="mp76nl5e" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726274AbgHKUv1 (ORCPT ); Tue, 11 Aug 2020 16:51:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53300 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725987AbgHKUv1 (ORCPT ); Tue, 11 Aug 2020 16:51:27 -0400 Received: from mail-qk1-x741.google.com (mail-qk1-x741.google.com [IPv6:2607:f8b0:4864:20::741]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CC802C06174A for ; Tue, 11 Aug 2020 13:51:26 -0700 (PDT) Received: by mail-qk1-x741.google.com with SMTP id d14so121435qke.13 for ; Tue, 11 Aug 2020 13:51:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=LMuh9652CGyiO0bmz3tWXIVImi3zNm3ghnf+7Y714js=; b=mp76nl5e8c5ubUiP1rUOgmPTvo06ceSaGO2UPkEypTbN7G9xGl876w/feYXXHZwkOw vMj2WizX6QXG/SfxqaKaEhfo3syLIQChLo4sJRd3JM1Vn1wNX1+gVc4rm+XBoAHUR+7c QiVbu6z+mpBIwG8hqyXX5wjnJ6DnGqriZM6YurYai2gJtJMHp7lfyEXesZcjpuyoOPDT zQ52ByLuiQWVmqS4H0oqx09gKb7QtT+uDzpAF4q+Vzvo+gfTFx3l3DTRDxFS/HD+mZqp y2VhUh1PZmfi2k/RjC/UdmdntKmLzBK+666BSghIp06E6bB97ifygRE1nDT9GoJCwWVk TNUg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=LMuh9652CGyiO0bmz3tWXIVImi3zNm3ghnf+7Y714js=; b=XuKGIz8JkjRtNpdbPRUcbXAD6gcdQsfJLXkxBSdtkLPIZfzXxq7qwRpAvL9ZfQLd/U XnLY2F6wcgecmJva4EQc6WG6E3aymXnt/4toUGvcK8V+Q0VlxXp1y3rJf8AoaP8CfKM8 ItLPT5NciPKpdp9ZjoAvv3IaJW4RVjiw7EB+kNYSNbDMeJQ/WJbHZmdJbKfDt+OBlWdl JXljiKLS0DXTMy/mb+HzsDmvqu8xQ/+SfyKC0fQ0BgvjEM3DTAXHI8WeOElXoe0w8wSn SmltUi/5budKdMoGm2b7ZhWtCmaid1BCbF+xovEhiMjgFiVyptPJp2sQgAprjSBpAMPC xi8Q== X-Gm-Message-State: AOAM533D3JYJhLRo+A0vnb0DpRSA1c/boS+/UA9lq1nltUC1EXFYRVuJ B54A0qlhbq6LSszZ0HU+4to76GkH7nnB9F0C X-Google-Smtp-Source: ABdhPJwGYvYfGNh8H3wfXhh25QSh5Ua+wEUriuS006xPontzpp9LmrB3CU6Ii5CGMmh+ZihCarMMsQ== X-Received: by 2002:a37:9b95:: with SMTP id d143mr2819789qke.272.1597179085706; Tue, 11 Aug 2020 13:51:25 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a92f:57be:59a6:7cb2]) by smtp.gmail.com with ESMTPSA id i68sm8618qkb.58.2020.08.11.13.51.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 13:51:25 -0700 (PDT) Date: Tue, 11 Aug 2020 16:51:23 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com, gitster@pobox.com Subject: [PATCH v3 02/14] t4216: use an '&&'-chain Message-ID: <9fc8b17d6fe9e1f7b177627e67420443e7a2e98c.1597178915.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In a759bfa9ee (t4216: add end to end tests for git log with Bloom filters, 2020-04-06), a 'rm' invocation was added without a corresponding '&&' chain. When 'trace.perf' already exists, everything works fine. However, the function can be executed without 'trace.perf' on disk (eg., when the subset of tests run is altered with '--run'), and so the bare 'rm' complains about a missing file. To remove some noise from the test log, invoke 'rm' with '-f', at which point it is sensible to place the 'rm -f' in an '&&'-chain, which is both (1) our usual style, and (2) avoids a broken chain in the future if more commands are added at the beginning of the function. Helped-by: Eric Sunshine Helped-by: Jeff King Signed-off-by: Taylor Blau --- t/t4216-log-bloom.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index c9f9bdf1ba..fe19f6a60c 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -53,7 +53,7 @@ sane_unset GIT_TRACE2_PERF_BRIEF sane_unset GIT_TRACE2_CONFIG_PARAMS setup () { - rm "$TRASH_DIRECTORY/trace.perf" + rm -f "$TRASH_DIRECTORY/trace.perf" && git -c core.commitGraph=false log --pretty="format:%s" $1 >log_wo_bloom && GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.perf" git -c core.commitGraph=true log --pretty="format:%s" $1 >log_w_bloom } From patchwork Tue Aug 11 20:51:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11709643 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 031A913B1 for ; Tue, 11 Aug 2020 20:51:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DA57A20781 for ; Tue, 11 Aug 2020 20:51:31 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="To3+7wEe" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726366AbgHKUvb (ORCPT ); Tue, 11 Aug 2020 16:51:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53312 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725987AbgHKUva (ORCPT ); Tue, 11 Aug 2020 16:51:30 -0400 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A0B25C06174A for ; Tue, 11 Aug 2020 13:51:30 -0700 (PDT) Received: by mail-qt1-x842.google.com with SMTP id c12so10538252qtn.9 for ; Tue, 11 Aug 2020 13:51:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=V9PKuOid9ZDCDhUVkEnzx8062h6610ZwJ92cJXhzY/I=; b=To3+7wEeQ+ZoAiecrBCN4Jy3SeuaKDEYoEUxcmwk9DhzAgSaBIB4fVq6j897412vLc BUYk9kRNBGkAPo6Yhn7mKsFmDLWBIVZmj9G86AagrMAxHXlH5VC+6q0mF1W/MmweWhDm xdOfVkmdBkgD0/3Nf+90yTqUowGKH6MYrS9NVWD5Ekrcz/17PUTYmYXBnYvli9QlV9MM 6ojiS+vNFg3RDytnwhpb2wDg5tBOKP3OT3UsF6kLqM2eHv8ODHnbvaVrTf6wnFCoHVqW XeO8OCHP1TllmBoBRLWk9SWKh6doSwvDNJJwqvep+Wll9bNvBnYFQM4vMuQlG0hUiHRb vDhA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=V9PKuOid9ZDCDhUVkEnzx8062h6610ZwJ92cJXhzY/I=; b=cCTzv1sgZd7x5KDTmLSjhMSSSidfqghYjiql2H1G6ktycUZw7pJF6ALre6tkzMy+wr frFGQoqWN1DvHBKcvNLoOOGpvq55ENNT4FOI2UjqsoyUhJ7tUZQNbWMaMszXl6XiScu3 aX4IF7LfxZioUf1BmB+YGJXvA8Fl2a6vBm9f7/oxmSHVCE8XzLgAwasXqRb2ZighovHP vBd9mZJ0mZQZoxHH4sWYEUQrOhGFwRasP6Lc3ZnotV27mgO86rrNeiO8OZrfEcjT0KY1 SpNv9mgc93gNpx/uceJU1U+uwod6VtDSB/cPcE8bH1+31OzoB9dKWCVlzcostON49ya/ 5EJA== X-Gm-Message-State: AOAM531inmCPN9/um8UbMoUg1Xn7XOD5Kc57XWebG++y8Ghur27sFRg0 WxCeA+n+qxHZZnmLmQxkQNlWKiAq/DNYTgIk X-Google-Smtp-Source: ABdhPJwl6CHmkbyVB63HkF54pgRfyyB7HFfHgAzG0J9VreCWvbsN7g4HTEKB7f7f/Gu2T522ZewaBQ== X-Received: by 2002:ac8:4511:: with SMTP id q17mr3023471qtn.117.1597179089546; Tue, 11 Aug 2020 13:51:29 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a92f:57be:59a6:7cb2]) by smtp.gmail.com with ESMTPSA id d8sm20917170qtr.12.2020.08.11.13.51.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 13:51:28 -0700 (PDT) Date: Tue, 11 Aug 2020 16:51:27 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com, gitster@pobox.com Subject: [PATCH v3 03/14] commit-graph: pass a 'struct repository *' in more places Message-ID: <8dbe4838b725e21d7bfe16eb2c4e15e68f620dd0.1597178915.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In a future commit, some commit-graph internals will want access to 'r->settings', but we only have the 'struct object_directory *' corresponding to that repository. Add an additional parameter to pass the repository around in more places. Signed-off-by: Taylor Blau --- builtin/commit-graph.c | 2 +- commit-graph.c | 17 ++++++++++------- commit-graph.h | 6 ++++-- fuzz-commit-graph.c | 5 +++-- 4 files changed, 18 insertions(+), 12 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 523501f217..ba5584463f 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -106,7 +106,7 @@ static int graph_verify(int argc, const char **argv) FREE_AND_NULL(graph_name); if (open_ok) - graph = load_commit_graph_one_fd_st(fd, &st, odb); + graph = load_commit_graph_one_fd_st(the_repository, fd, &st, odb); else graph = read_commit_graph_one(the_repository, odb); diff --git a/commit-graph.c b/commit-graph.c index d4b06811be..0c1030641c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -224,7 +224,8 @@ int open_commit_graph(const char *graph_file, int *fd, struct stat *st) return 1; } -struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, +struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, + int fd, struct stat *st, struct object_directory *odb) { void *graph_map; @@ -240,7 +241,7 @@ struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, } graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd); - ret = parse_commit_graph(graph_map, graph_size); + ret = parse_commit_graph(r, graph_map, graph_size); if (ret) ret->odb = odb; @@ -280,7 +281,8 @@ static int verify_commit_graph_lite(struct commit_graph *g) return 0; } -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size) +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size) { const unsigned char *data, *chunk_lookup; uint32_t i; @@ -445,7 +447,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size) return NULL; } -static struct commit_graph *load_commit_graph_one(const char *graph_file, +static struct commit_graph *load_commit_graph_one(struct repository *r, + const char *graph_file, struct object_directory *odb) { @@ -457,7 +460,7 @@ static struct commit_graph *load_commit_graph_one(const char *graph_file, if (!open_ok) return NULL; - g = load_commit_graph_one_fd_st(fd, &st, odb); + g = load_commit_graph_one_fd_st(r, fd, &st, odb); if (g) g->filename = xstrdup(graph_file); @@ -469,7 +472,7 @@ static struct commit_graph *load_commit_graph_v1(struct repository *r, struct object_directory *odb) { char *graph_name = get_commit_graph_filename(odb); - struct commit_graph *g = load_commit_graph_one(graph_name, odb); + struct commit_graph *g = load_commit_graph_one(r, graph_name, odb); free(graph_name); return g; @@ -550,7 +553,7 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, valid = 0; for (odb = r->objects->odb; odb; odb = odb->next) { char *graph_name = get_split_graph_filename(odb, line.buf); - struct commit_graph *g = load_commit_graph_one(graph_name, odb); + struct commit_graph *g = load_commit_graph_one(r, graph_name, odb); free(graph_name); diff --git a/commit-graph.h b/commit-graph.h index 0677dd1031..d9acb22bac 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -75,11 +75,13 @@ struct commit_graph { struct bloom_filter_settings *bloom_filter_settings; }; -struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, +struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, + int fd, struct stat *st, struct object_directory *odb); struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb); -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size); /* * Return 1 if and only if the repository has a commit-graph diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c index 430817214d..e7cf6d5b0f 100644 --- a/fuzz-commit-graph.c +++ b/fuzz-commit-graph.c @@ -1,7 +1,8 @@ #include "commit-graph.h" #include "repository.h" -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size); int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); @@ -10,7 +11,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) struct commit_graph *g; initialize_the_repository(); - g = parse_commit_graph((void *)data, size); + g = parse_commit_graph(the_repository, (void *)data, size); repo_clear(the_repository); free_commit_graph(g); From patchwork Tue Aug 11 20:51:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11709645 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 594EC109B for ; Tue, 11 Aug 2020 20:51:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3F92020781 for ; Tue, 11 Aug 2020 20:51:36 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="kucQ2WL3" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726426AbgHKUvf (ORCPT ); Tue, 11 Aug 2020 16:51:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53324 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725987AbgHKUvf (ORCPT ); Tue, 11 Aug 2020 16:51:35 -0400 Received: from mail-qv1-xf43.google.com (mail-qv1-xf43.google.com [IPv6:2607:f8b0:4864:20::f43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 07620C06174A for ; Tue, 11 Aug 2020 13:51:35 -0700 (PDT) Received: by mail-qv1-xf43.google.com with SMTP id cs12so52594qvb.2 for ; Tue, 11 Aug 2020 13:51:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=AtctLu256av06ik5IHDWiQKbEbZ8S9fTFgXOJvN+Mvg=; b=kucQ2WL3arY0ZauFZIeUpdFyYSylVBWGGuqogIyGbKoxj/D91sd5qHVwHXFiTMLduO UcwCU1pjMZJTGc5jXZqrDYVvMpI9DIFgs4ztonEJLmkSiI/LC3HJAK7MWYLGIDmaG+Hr wKFqV0TlzzXnHFxa5iw5nr7SYLgMmgZa1d9BALNjfisD398xjXT1+kyCjB864KTp6HBV uz8IBgu60hxvHKcXYQF35hKGUwZBTOik90qHmfq6Hf3FgKALhHxCMHo7YN+V+VmP+Klb nJO4pcLCy9PY8i7A27vwG4nc0R6fELxH59oxIBZoKML+IKMENETdpDLk46YlJ/GvR661 zLkg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=AtctLu256av06ik5IHDWiQKbEbZ8S9fTFgXOJvN+Mvg=; b=Iu/t5/qF8hq8ybWfPIh0g2GvdYtSS8fCxkjKfC/f94TqndCNlRS7uaaALjlgjTMLlU /+hfW3z32JYsI+x/Wp4OERk5EJ9XcAWoHxiZIehAO1i1ACSe0MOyt0VZ5v3c5aauEXzq 0iT7u/YJMKbiNEqj11fsFwpE16RNlsSCqZFgiKk9U1BZNP3cYvxvNTLoKyxFABd8C9sV 2fnbcdZNr4n+2S8hZCcTUEL2QbzujDxaBC2MJjcXzHTZFcSg4IccCQ4XUNDiGc7onIgY 4coqxg/iqtPmaT9MwmP3o+uiO8vBQXeLns+euw7/HUcxtgZeqNCXs6PKkV9m/pr80DXW /Meg== X-Gm-Message-State: AOAM531rFgS8Ce7/d3Lihtd89o/Q/ttC1IuU+66SLjb5sm40pwqPLsBv pUCrc7j+PUxC6FfmMaw/MEvGrr0zsYWHlQsG X-Google-Smtp-Source: ABdhPJzaSVVUWcrP4nClizXKtY/E4HPdvVVvgEjDpyf8hbKRjK/nrczZWXg2BW7iVEBW/MMncyfxOw== X-Received: by 2002:a0c:b791:: with SMTP id l17mr3426646qve.44.1597179093960; Tue, 11 Aug 2020 13:51:33 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a92f:57be:59a6:7cb2]) by smtp.gmail.com with ESMTPSA id x137sm14652qkb.47.2020.08.11.13.51.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 13:51:33 -0700 (PDT) Date: Tue, 11 Aug 2020 16:51:32 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com, gitster@pobox.com Subject: [PATCH v3 04/14] t/helper/test-read-graph.c: prepare repo settings Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The read-graph test-tool is used by a number of the commit-graph test to assert various properties about a commit-graph. Previously, this program never ran 'prepare_repo_settings()'. There was no need to do so, since none of the commit-graph machinery is affected by the repo settings. In the next patch, the commit-graph machinery's behavior will become dependent on the repo settings, and so loading them before running the rest of the test tool is critical. As such, teach the test tool to call 'prepare_repo_settings()'. Signed-off-by: Taylor Blau --- t/helper/test-read-graph.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c index 6d0c962438..5f585a1725 100644 --- a/t/helper/test-read-graph.c +++ b/t/helper/test-read-graph.c @@ -12,11 +12,12 @@ int cmd__read_graph(int argc, const char **argv) setup_git_directory(); odb = the_repository->objects->odb; + prepare_repo_settings(the_repository); + graph = read_commit_graph_one(the_repository, odb); if (!graph) return 1; - printf("header: %08x %d %d %d %d\n", ntohl(*(uint32_t*)graph->data), *(unsigned char*)(graph->data + 4), From patchwork Tue Aug 11 20:51:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11709647 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 56285109B for ; Tue, 11 Aug 2020 20:51:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3B29C2076C for ; Tue, 11 Aug 2020 20:51:41 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="0if0YExF" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726454AbgHKUvk (ORCPT ); Tue, 11 Aug 2020 16:51:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53338 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725987AbgHKUvj (ORCPT ); Tue, 11 Aug 2020 16:51:39 -0400 Received: from mail-qt1-x843.google.com (mail-qt1-x843.google.com [IPv6:2607:f8b0:4864:20::843]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BD02EC06174A for ; Tue, 11 Aug 2020 13:51:39 -0700 (PDT) Received: by mail-qt1-x843.google.com with SMTP id e5so10538878qth.5 for ; Tue, 11 Aug 2020 13:51:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=OaOZ9gT79tjrUv9cgKcQTqD6M81VIYGGJlB88j6wVR4=; b=0if0YExFt9OX2XIo+Qn3/3V+sTKmcUFjsU8De5sPTYRNmv/NZiJ6CeQ33xUWGe0HCB x8CNaavL7l4Yviac5IldRYHPoJF19fBnUlVc3/oBRBp6pR2csxHEaW0/KBt2TJ+aKMgs Qxj+VtiG4CkQ8UVAMFUX7O1GcmuXyoLi+i7gm2bjLAH2Mv7X0ZnYZmcdBkRsunhPiSOI dkGKzY8J0F08A0acQ14xN+3I9Mwsr27ZwFScLeXBQ9/KDM8D73Ia0ayM8cWu/CB2cv9i OrgDlGq73xA9EjFBosJd6JHRepMr6tk1g7I40mlrPlkL+ybHp48RRFoibuxb6m9BHbhQ WuUQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=OaOZ9gT79tjrUv9cgKcQTqD6M81VIYGGJlB88j6wVR4=; b=qPO8O84r1uJOiYNa2J8lIQCCQBlxucs5RZnheaiTx6oDasXNKQUCm2zASylYsHEXht u+ePCXZBoqs0lS0bhh8tc9INFQquvnDsXuptXWZuqD3J9X0QZiduM++JjDpC7qJzxNbM H1Qqt38yZGkdXMsdoOsbORHYZeffetLaJx6cbCiah/kBjnZukDNYfLHaSfoskcHxjeir bir8pdFSiB72NcQg83mYhDp12dhBlmtvE2Ka6y/9Oay0PrwJ1E2zEG2hYSBAyjUiGLEU jYMgh4zV0EPgRgoMweMtO+3DogyYUScPwlERaSIsdiZFfgro86lggROfJFTLup2uJt/l dbNQ== X-Gm-Message-State: AOAM531ZOKKI7k0s3jbj/ifOS+YvqdKavgNdQfRD/QGxGR4NgFVvAu7g lpArudRa/NyJTHZZi3wS2gBxJaXNmEIPT7os X-Google-Smtp-Source: ABdhPJyRCpst7GGRjQeoswrfH+OFVAeSya5izfbH4AhiASjOBckHeJjHN9KBqxFHZRhKBMR8K/xm2w== X-Received: by 2002:aed:3203:: with SMTP id y3mr2980512qtd.265.1597179098654; Tue, 11 Aug 2020 13:51:38 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a92f:57be:59a6:7cb2]) by smtp.gmail.com with ESMTPSA id y3sm21923210qtj.55.2020.08.11.13.51.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 13:51:38 -0700 (PDT) Date: Tue, 11 Aug 2020 16:51:36 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com, gitster@pobox.com Subject: [PATCH v3 05/14] commit-graph: respect 'commitGraph.readChangedPaths' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Git uses the 'core.commitGraph' configuration value to control whether or not the commit graph is used when parsing commits or performing a traversal. Now that commit-graphs can also contain a section for changed-path Bloom filters, administrators that already have commit-graphs may find it convenient to use those graphs without relying on their changed-path Bloom filters. This can happen, for example, during a staged roll-out, or in the event of an incident. Introduce 'commitGraph.readChangedPaths' to control whether or not Bloom filters are read. Note that this configuration is independent from both: - 'core.commitGraph', to allow flexibility in using all parts of a commit-graph _except_ for its Bloom filters. - The '--changed-paths' option for 'git commit-graph write', to allow reading and writing Bloom filters to be controlled independently. When the variable is set, pretend as if no Bloom data was specified at all. This avoids adding additional special-casing outside of the commit-graph internals. Suggested-by: Derrick Stolee Signed-off-by: Taylor Blau --- Documentation/config.txt | 2 ++ Documentation/config/commitgraph.txt | 4 ++++ commit-graph.c | 6 ++++-- repo-settings.c | 3 +++ repository.h | 1 + t/t4216-log-bloom.sh | 4 +++- 6 files changed, 17 insertions(+), 3 deletions(-) create mode 100644 Documentation/config/commitgraph.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index ef0768b91a..78883c6e63 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -340,6 +340,8 @@ include::config/column.txt[] include::config/commit.txt[] +include::config/commitgraph.txt[] + include::config/credential.txt[] include::config/completion.txt[] diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt new file mode 100644 index 0000000000..cff0797b54 --- /dev/null +++ b/Documentation/config/commitgraph.txt @@ -0,0 +1,4 @@ +commitGraph.readChangedPaths:: + If true, then git will use the changed-path Bloom filters in the + commit-graph file (if it exists, and they are present). Defaults to + true. See linkgit:git-commit-graph[1] for more information. diff --git a/commit-graph.c b/commit-graph.c index 0c1030641c..a516e93d71 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -320,6 +320,8 @@ struct commit_graph *parse_commit_graph(struct repository *r, return NULL; } + prepare_repo_settings(r); + graph = alloc_commit_graph(); graph->hash_len = the_hash_algo->rawsz; @@ -396,14 +398,14 @@ struct commit_graph *parse_commit_graph(struct repository *r, case GRAPH_CHUNKID_BLOOMINDEXES: if (graph->chunk_bloom_indexes) chunk_repeated = 1; - else + else if (r->settings.commit_graph_read_changed_paths) graph->chunk_bloom_indexes = data + chunk_offset; break; case GRAPH_CHUNKID_BLOOMDATA: if (graph->chunk_bloom_data) chunk_repeated = 1; - else { + else if (r->settings.commit_graph_read_changed_paths) { uint32_t hash_version; graph->chunk_bloom_data = data + chunk_offset; hash_version = get_be32(data + chunk_offset); diff --git a/repo-settings.c b/repo-settings.c index 0918408b34..9e551bc03d 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -17,9 +17,12 @@ void prepare_repo_settings(struct repository *r) if (!repo_config_get_bool(r, "core.commitgraph", &value)) r->settings.core_commit_graph = value; + if (!repo_config_get_bool(r, "commitgraph.readchangedpaths", &value)) + r->settings.commit_graph_read_changed_paths = value; if (!repo_config_get_bool(r, "gc.writecommitgraph", &value)) r->settings.gc_write_commit_graph = value; UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1); + UPDATE_DEFAULT_BOOL(r->settings.commit_graph_read_changed_paths, 1); UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1); if (!repo_config_get_int(r, "index.version", &value)) diff --git a/repository.h b/repository.h index 3c1f7d54bd..81759b7d27 100644 --- a/repository.h +++ b/repository.h @@ -29,6 +29,7 @@ struct repo_settings { int initialized; int core_commit_graph; + int commit_graph_read_changed_paths; int gc_write_commit_graph; int fetch_write_commit_graph; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index fe19f6a60c..b3d1f596f8 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -90,7 +90,9 @@ do "--ancestry-path side..master" do test_expect_success "git log option: $option for path: $path" ' - test_bloom_filters_used "$option -- $path" + test_bloom_filters_used "$option -- $path" && + test_config commitgraph.readChangedPaths false && + test_bloom_filters_not_used "$option -- $path" ' done done From patchwork Tue Aug 11 20:51:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11709649 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DAA33109B for ; Tue, 11 Aug 2020 20:51:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C06DB20774 for ; Tue, 11 Aug 2020 20:51:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="eQY4gKRE" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726473AbgHKUvp (ORCPT ); Tue, 11 Aug 2020 16:51:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53354 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725987AbgHKUvo (ORCPT ); Tue, 11 Aug 2020 16:51:44 -0400 Received: from mail-qv1-xf41.google.com (mail-qv1-xf41.google.com [IPv6:2607:f8b0:4864:20::f41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9DBCDC06174A for ; Tue, 11 Aug 2020 13:51:44 -0700 (PDT) Received: by mail-qv1-xf41.google.com with SMTP id w2so20005qvh.12 for ; Tue, 11 Aug 2020 13:51:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=6laf6kKuVSjfW1rY/Bcz72cLLx2hYo5Wjv1pbceIgxc=; b=eQY4gKREGS8uFZhv6cDRcqKkj2CtgZ9CrfLFepzjr6MA+O/ct7yNfs1S1AHDh9Iskp eh74eRqGCFSza4ysvW359XqP9MMlVoHtGIpBPYL79sqUsRqxMg+Y65Z3L4Br+z8ziUG8 pnS2PBjJq72QPuqAYWHB3D5ZLFQX18AQ+yPljYnCHjfmCQDO9ab8n6I6vXYMpUe70VOP 5CWygIO/plTUGM5t1FhFusDmQ91D+/K2kprDaziBzfvnU8/AH6h27Pa8wKPF9dMwKsV5 6ghtRmeNLYv8UAfZYBhwgr2vE1c0kUgTHkyva43Ju0VbJ/yZrShAH2h69613XSiRR6pD Gk8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=6laf6kKuVSjfW1rY/Bcz72cLLx2hYo5Wjv1pbceIgxc=; b=EOhPVjbC7zu3Dqz2OfZP5fLzW/RUVBtzKRlZ2iNMZbbwhPjvG8xCygem4pjNZgm2fl wF8uvXS/wlObiQwKEUOHLONNo99EOrjYXi7h9rCPwhSn1hrgm4RnlRhm+103RADuER+Z jm3FpWCnH9zlOblljhi06rHCmtXMMeFAR6fdlnu/qI9DebfQcjBiUBrE6caxsK2aY4mv n1tzCvd2gbSNvE/gMJafRZQgGKii3N1qIM8n09egqC16uwfqaG6k3tjNmIW63pldPGgz UHNJBVuTBPEgmgx7JZoj0UzXKxW/U6WnIgFYEUCXl2e/enrkWCFI7Vk1ru3cljVWFrCL hf2A== X-Gm-Message-State: AOAM533XRj4Tzf5gbUOsmZkk/3Euwbh8dQZBnWdVjnpVFXxS2XGQ86r8 5aIUYBDTegif+6KKvlJJjQ+OEzBqc6zTtOYn X-Google-Smtp-Source: ABdhPJzpG2dFkeyB/HNw/OJoJ6lBJ1yrQUkcl+/iIxV3PJiCGPcQtR4k3XN7z92UawuXe5lrsOvUsA== X-Received: by 2002:ad4:438f:: with SMTP id s15mr3432989qvr.164.1597179103505; Tue, 11 Aug 2020 13:51:43 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a92f:57be:59a6:7cb2]) by smtp.gmail.com with ESMTPSA id b23sm20702087qtp.41.2020.08.11.13.51.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 13:51:42 -0700 (PDT) Date: Tue, 11 Aug 2020 16:51:41 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com, gitster@pobox.com Subject: [PATCH v3 06/14] commit-graph.c: store maximum changed paths Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org For now, we assume that there is a fixed constant describing the maximum number of changed paths we are willing to store in a Bloom filter. Prepare for that to (at least partially) not be the case by making it a member of the 'struct bloom_filter_settings'. This will be helpful in the subsequent patches by reducing the size of test cases that exercise storing too many changed paths, as well as preparing for an eventual future in which this value might change. This patch alone does not cause newly generated Bloom filters to use a custom upper-bound on the maximum number of changed paths a single Bloom filter can hold, that will occur in a later patch. Signed-off-by: Taylor Blau --- bloom.h | 11 ++++++++++- commit-graph.c | 3 +++ t/t4216-log-bloom.sh | 4 ++-- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/bloom.h b/bloom.h index d8fbb0fbf1..0b9b59a6fe 100644 --- a/bloom.h +++ b/bloom.h @@ -28,9 +28,18 @@ struct bloom_filter_settings { * that contain n*b bits. */ uint32_t bits_per_entry; + + /* + * The maximum number of changed paths per commit + * before declaring a Bloom filter to be too-large. + * + * Not written to the commit-graph file. + */ + uint32_t max_changed_paths; }; -#define DEFAULT_BLOOM_FILTER_SETTINGS { 1, 7, 10 } +#define DEFAULT_BLOOM_MAX_CHANGES 512 +#define DEFAULT_BLOOM_FILTER_SETTINGS { 1, 7, 10, DEFAULT_BLOOM_MAX_CHANGES } #define BITS_PER_WORD 8 #define BLOOMDATA_CHUNK_HEADER_SIZE 3 * sizeof(uint32_t) diff --git a/commit-graph.c b/commit-graph.c index a516e93d71..86dd4b979e 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1194,6 +1194,7 @@ static void trace2_bloom_filter_settings(struct write_commit_graph_context *ctx) jw_object_intmax(&jw, "hash_version", ctx->bloom_settings->hash_version); jw_object_intmax(&jw, "num_hashes", ctx->bloom_settings->num_hashes); jw_object_intmax(&jw, "bits_per_entry", ctx->bloom_settings->bits_per_entry); + jw_object_intmax(&jw, "max_changed_paths", ctx->bloom_settings->max_changed_paths); jw_end(&jw); trace2_data_json("bloom", ctx->r, "settings", &jw); @@ -1662,6 +1663,8 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) bloom_settings.bits_per_entry); bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", bloom_settings.num_hashes); + bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", + bloom_settings.max_changed_paths); ctx->bloom_settings = &bloom_settings; } diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index b3d1f596f8..eb2bcc51f0 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -169,11 +169,11 @@ test_expect_success 'persist filter settings' ' GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=9 \ GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY=15 \ git commit-graph write --reachable --changed-paths && - grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2.txt && + grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15" trace2.txt && GIT_TRACE2_EVENT="$(pwd)/trace2-auto.txt" \ GIT_TRACE2_EVENT_NESTING=5 \ git commit-graph write --reachable --changed-paths && - grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2-auto.txt + grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15" trace2-auto.txt ' test_expect_success 'correctly report changes over limit' ' From patchwork Tue Aug 11 20:51:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11709651 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D4CAE109B for ; Tue, 11 Aug 2020 20:51:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B678C20781 for ; Tue, 11 Aug 2020 20:51:49 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="Vj/MC60O" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726479AbgHKUvs (ORCPT ); Tue, 11 Aug 2020 16:51:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53366 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725987AbgHKUvs (ORCPT ); Tue, 11 Aug 2020 16:51:48 -0400 Received: from mail-qk1-x72b.google.com (mail-qk1-x72b.google.com [IPv6:2607:f8b0:4864:20::72b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6809FC06174A for ; Tue, 11 Aug 2020 13:51:48 -0700 (PDT) Received: by mail-qk1-x72b.google.com with SMTP id n129so164894qkd.6 for ; Tue, 11 Aug 2020 13:51:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=YDi48ppg/58iZSGrMm41yujADK+orXTZvhekbsxyFW0=; b=Vj/MC60Osp62Jl9pG4V6Vui20qahtF8dBcR6wd7VYb2EYXq5GPpiYJ4ibHB/r39iG4 RRZGTK+5pFKmSl92JXE/lZpYNCs1MGt6rNEXgP3Dy6Kmbh4bAbFQj8S0QbsGiOnc3AIE 2L+RVnhOyXGhrah2TqlLRJbn1IM2y2zs7Y+SzH/oTBw5Ft9mJf6ubnPJguAE0mq5Hf2d p14HCeSY4jvrLDJI4bCzUHqosDhWuNHB2cH0SdLDRztzcxhLYJclAp4drbY44t1tVAjp 6W1FFCu0TgKiMp7jIxGRM5M+izN2pqvpY1mPQ4+sUZL5phQPXgbWofYEZjRfts+EMj0m Z6aA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=YDi48ppg/58iZSGrMm41yujADK+orXTZvhekbsxyFW0=; b=WGGI9zH4qI38H3gYXSLj945Iq4rHaPRorH8nLmecTTkWoEpqPIw5ZaeU34alWX/waG 6ikk5czBpmtwrT+SpTAwk0R2k1fIliWq/DFiO+uwJ10gEvblm5LH5yJo+0qeKR35pBzN 7twmdpU3XLpcxzXIYfMnrhZaBSX1jXN+pOSIeYv+EVLPLg+39rWbrUJFLNIWQVpT+ZqM cwLgy/9dRWMvJxMHx3wERhbdunC7MWC3eipbwvydkgdRdmKC3S2mn1OJ4WPkQmMFyBsY zXvRPXELWL8MDcQ+/fJ1sPDnW+SR2JqyH72LYhFCVaVCfmGDCPG2D7NDjBv1XJmkX/A5 63KQ== X-Gm-Message-State: AOAM533oIBGW8gI1pHXpQmor6t3vbydmRfN4RSPSmwbQCpk1kh1ttxTE 7MXAn7/9Sk9q4vf0t6VfNp9+F5V9MSX7l3Ld X-Google-Smtp-Source: ABdhPJxjMW/oFGSWH/Qt+amahKH2Rp4GrmdG67ovLN7B5a20lB+einF65gt0Ul28tvUySHA/cmcrLQ== X-Received: by 2002:a37:a187:: with SMTP id k129mr2906881qke.196.1597179107015; Tue, 11 Aug 2020 13:51:47 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a92f:57be:59a6:7cb2]) by smtp.gmail.com with ESMTPSA id k31sm20175327qtd.60.2020.08.11.13.51.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 13:51:46 -0700 (PDT) Date: Tue, 11 Aug 2020 16:51:45 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com, gitster@pobox.com Subject: [PATCH v3 07/14] bloom: split 'get_bloom_filter()' in two Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org 'get_bloom_filter' takes a flag to control whether it will compute a Bloom filter if the requested one is missing. In the next patch, we'll add yet another parameter to this method, which would force all but one caller to specify an extra 'NULL' parameter at the end. Instead of doing this, split 'get_bloom_filter' into two functions: 'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only looks up a Bloom filter (and does not compute one if it's missing, thus dropping the 'compute_if_not_present' flag). The latter does compute missing Bloom filters, with an additional parameter to store whether or not it needed to do so. This simplifies many call-sites, since the majority of existing callers to 'get_bloom_filter' do not want missing Bloom filters to be computed (so they can drop the parameter entirely and use the simpler version of the function). While we're at it, instrument the new 'get_or_compute_bloom_filter()' with two counters in the 'write_commit_graph_context' struct which store the number of filters that we computed, and the number of those which were too large to store. It would be nice to drop the 'compute_if_not_present' flag entirely, since all remaining callers of 'get_or_compute_bloom_filter' pass it as '1', but this will change in a future patch and hence cannot be removed. Signed-off-by: Taylor Blau --- blame.c | 2 +- bloom.c | 13 ++++++++++--- bloom.h | 10 +++++++--- commit-graph.c | 38 +++++++++++++++++++++++++++++++++++--- line-log.c | 2 +- revision.c | 2 +- t/helper/test-bloom.c | 3 ++- 7 files changed, 57 insertions(+), 13 deletions(-) diff --git a/blame.c b/blame.c index 3e5f8787bc..756285fca7 100644 --- a/blame.c +++ b/blame.c @@ -1275,7 +1275,7 @@ static int maybe_changed_path(struct repository *r, if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY) return 1; - filter = get_bloom_filter(r, origin->commit, 0); + filter = get_bloom_filter(r, origin->commit); if (!filter) return 1; diff --git a/bloom.c b/bloom.c index cd9380ac62..a8a21762f4 100644 --- a/bloom.c +++ b/bloom.c @@ -177,9 +177,10 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data, return strcmp(e1->path, e2->path); } -struct bloom_filter *get_bloom_filter(struct repository *r, - struct commit *c, - int compute_if_not_present) +struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, + struct commit *c, + int compute_if_not_present, + int *computed) { struct bloom_filter *filter; struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS; @@ -187,6 +188,9 @@ struct bloom_filter *get_bloom_filter(struct repository *r, struct diff_options diffopt; int max_changes = 512; + if (computed) + *computed = 0; + if (!bloom_filters.slab_size) return NULL; @@ -273,6 +277,9 @@ struct bloom_filter *get_bloom_filter(struct repository *r, filter->len = 0; } + if (computed) + *computed = 1; + free(diff_queued_diff.queue); DIFF_QUEUE_CLEAR(&diff_queued_diff); diff --git a/bloom.h b/bloom.h index 0b9b59a6fe..baa91926db 100644 --- a/bloom.h +++ b/bloom.h @@ -89,9 +89,13 @@ void add_key_to_filter(const struct bloom_key *key, void init_bloom_filters(void); -struct bloom_filter *get_bloom_filter(struct repository *r, - struct commit *c, - int compute_if_not_present); +struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, + struct commit *c, + int compute_if_not_present, + int *computed); + +#define get_bloom_filter(r, c) get_or_compute_bloom_filter( \ + (r), (c), 0, NULL) int bloom_filter_contains(const struct bloom_filter *filter, const struct bloom_key *key, diff --git a/commit-graph.c b/commit-graph.c index 86dd4b979e..ba2a2cfb22 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -964,6 +964,9 @@ struct write_commit_graph_context { const struct split_commit_graph_opts *split_opts; size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; + + int count_bloom_filter_found_large; + int count_bloom_filter_computed; }; static int write_graph_chunk_fanout(struct hashfile *f, @@ -1175,7 +1178,7 @@ static int write_graph_chunk_bloom_indexes(struct hashfile *f, uint32_t cur_pos = 0; while (list < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0); + struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); size_t len = filter ? filter->len : 0; cur_pos += len; display_progress(ctx->progress, ++ctx->progress_cnt); @@ -1215,7 +1218,7 @@ static int write_graph_chunk_bloom_data(struct hashfile *f, hashwrite_be32(f, ctx->bloom_settings->bits_per_entry); while (list < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0); + struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); size_t len = filter ? filter->len : 0; display_progress(ctx->progress, ++ctx->progress_cnt); @@ -1385,6 +1388,22 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) stop_progress(&ctx->progress); } +static void trace2_bloom_filter_write_statistics(struct write_commit_graph_context *ctx) +{ + struct json_writer jw = JSON_WRITER_INIT; + + jw_object_begin(&jw, 0); + jw_object_intmax(&jw, "filter_found_large", + ctx->count_bloom_filter_found_large); + jw_object_intmax(&jw, "filter_computed", + ctx->count_bloom_filter_computed); + jw_end(&jw); + + trace2_data_json("commit-graph", the_repository, "bloom_statistics", &jw); + + jw_release(&jw); +} + static void compute_bloom_filters(struct write_commit_graph_context *ctx) { int i; @@ -1407,12 +1426,25 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp); for (i = 0; i < ctx->commits.nr; i++) { + int computed = 0; struct commit *c = sorted_commits[i]; - struct bloom_filter *filter = get_bloom_filter(ctx->r, c, 1); + struct bloom_filter *filter = get_or_compute_bloom_filter( + ctx->r, + c, + 1, + &computed); + if (computed) { + ctx->count_bloom_filter_computed++; + if (filter && !filter->len) + ctx->count_bloom_filter_found_large++; + } ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; display_progress(progress, i + 1); } + if (trace2_is_enabled()) + trace2_bloom_filter_write_statistics(ctx); + free(sorted_commits); stop_progress(&progress); } diff --git a/line-log.c b/line-log.c index bf73ea95ac..68eeb425f8 100644 --- a/line-log.c +++ b/line-log.c @@ -1159,7 +1159,7 @@ static int bloom_filter_check(struct rev_info *rev, return 1; if (!rev->bloom_filter_settings || - !(filter = get_bloom_filter(rev->repo, commit, 0))) + !(filter = get_bloom_filter(rev->repo, commit))) return 1; if (!range) diff --git a/revision.c b/revision.c index be600186ee..7f58ecc411 100644 --- a/revision.c +++ b/revision.c @@ -751,7 +751,7 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs, if (commit_graph_generation(commit) == GENERATION_NUMBER_INFINITY) return -1; - filter = get_bloom_filter(revs->repo, commit, 0); + filter = get_bloom_filter(revs->repo, commit); if (!filter) { count_bloom_filter_not_present++; diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c index f0aa80b98e..531af439c2 100644 --- a/t/helper/test-bloom.c +++ b/t/helper/test-bloom.c @@ -39,7 +39,8 @@ static void get_bloom_filter_for_commit(const struct object_id *commit_oid) struct bloom_filter *filter; setup_git_directory(); c = lookup_commit(the_repository, commit_oid); - filter = get_bloom_filter(the_repository, c, 1); + filter = get_or_compute_bloom_filter(the_repository, c, 1, + NULL); print_bloom_filter(filter); } From patchwork Tue Aug 11 20:51:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11709655 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5237A109B for ; Tue, 11 Aug 2020 20:51:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3163120781 for ; Tue, 11 Aug 2020 20:51:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="Jyv/dV3W" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726501AbgHKUv5 (ORCPT ); Tue, 11 Aug 2020 16:51:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53390 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725987AbgHKUv4 (ORCPT ); Tue, 11 Aug 2020 16:51:56 -0400 Received: from mail-qk1-x742.google.com (mail-qk1-x742.google.com [IPv6:2607:f8b0:4864:20::742]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2DBC8C06174A for ; Tue, 11 Aug 2020 13:51:56 -0700 (PDT) Received: by mail-qk1-x742.google.com with SMTP id g26so182749qka.3 for ; Tue, 11 Aug 2020 13:51:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=F8i3CrolrB3qgXoLmj1WcoctSOJOAoCyWhuWJxecEPA=; b=Jyv/dV3WJCsbx2oRcBGn0L/Lhs9EIVZ2xakHIvL/xRJnwKxMctf9rfwp4PHHk2d98+ 8Ubal2q+eQutB6D9aMbNK08FjIt2J+l6Qc7hf1Q9l5A5Gq2qo7fo+KORZQe9XMz4DdVZ riXgd6NEE7l3g8U/bJEtk/jlxsXXlHc9Ex4bsQrmssr6e+589alXeinLJyf9dNEb7ABT vUC0G8JRXWqZrv4m+BB6kCVAi4w/p/S+rrg1Jc0t1VdyEXgV5KRdt/UBUg4h5J9tbSth giAvNkcDqXNARWFLYsrovsmnKEuk4+tIlTCb1qXF+C0lL4YP/J7qZKWvxN11vN2c3Ob0 tYAQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=F8i3CrolrB3qgXoLmj1WcoctSOJOAoCyWhuWJxecEPA=; b=FLtadGKw3+SaspfmuI3FfKK99MMQlXX1ti4j+Yu0sTbWA2O1epnYdNF8TN6LJxNQkc +/vX8W+cMLDIY8u/NS23HopJmTvT93DU/iVMafPVcCdMbFsR2X8Gm8bUNspoAivKjkKv 9t73n7VeMj1Ly11PJFo8U9JLRyimT/SA+YYYsmNBWa9C7oim33f0x2xAGCD07kOHxCj4 DpKEpRtlyLx9KVhsAD7Joz1vFhO2zglcFylKnhqgnDeM2hmHTTwmmpRxhG3QFY+3pF8x VlG7vfcOwYj7kosQyix2q1oNSn62tLOh128ACv1cSNV4EMAOrdNT5sT5XqSETBhbjO35 9IQw== X-Gm-Message-State: AOAM531MlzFJtRZrxvSXGtDsn8rxpmZVWZ2ogTEo1xT2+XWILW8JchK1 mYgWTW4CYy3yvw4t8Hi9b2OcXs7/yrxOTErl X-Google-Smtp-Source: ABdhPJyHy6f4PScoTzlymZfJH/lK7dCa8piZ+7wbCOMNoKL0QrwjvGXQmsDHgi6230b8jYDPuTu0tA== X-Received: by 2002:a37:44b:: with SMTP id 72mr3019610qke.494.1597179114965; Tue, 11 Aug 2020 13:51:54 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a92f:57be:59a6:7cb2]) by smtp.gmail.com with ESMTPSA id q34sm21757089qtk.32.2020.08.11.13.51.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 13:51:54 -0700 (PDT) Date: Tue, 11 Aug 2020 16:51:53 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com, gitster@pobox.com Subject: [PATCH v3 08/14] bloom: use provided 'struct bloom_filter_settings' Message-ID: <4f08177dbe0f3cddfcc7cddbdb65446f410b6ecf.1597178915.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When 'get_or_compute_bloom_filter()' needs to compute a Bloom filter from scratch, it looks to the default 'struct bloom_filter_settings' in order to determine the maximum number of changed paths, number of bits per entry, and so on. All of these values have so far been constant, and so there was no need to pass in a pointer from the caller (eg., the one that is stored in the 'struct write_commit_graph_context'). Start passing in a 'struct bloom_filter_settings *' instead of using the default values to respect graph-specific settings (eg., in the case of setting 'GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS'). In order to have an initialized value for these settings, move its initialization to earlier in the commit-graph write. Signed-off-by: Taylor Blau --- bloom.c | 13 ++++++------- bloom.h | 3 ++- commit-graph.c | 21 ++++++++++----------- t/helper/test-bloom.c | 1 + 4 files changed, 19 insertions(+), 19 deletions(-) diff --git a/bloom.c b/bloom.c index a8a21762f4..0cf1962dc5 100644 --- a/bloom.c +++ b/bloom.c @@ -180,13 +180,12 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data, struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, struct commit *c, int compute_if_not_present, + const struct bloom_filter_settings *settings, int *computed) { struct bloom_filter *filter; - struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS; int i; struct diff_options diffopt; - int max_changes = 512; if (computed) *computed = 0; @@ -211,7 +210,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, repo_diff_setup(r, &diffopt); diffopt.flags.recursive = 1; diffopt.detect_rename = 0; - diffopt.max_changes = max_changes; + diffopt.max_changes = settings->max_changed_paths; diff_setup_done(&diffopt); /* ensure commit is parsed so we have parent information */ @@ -223,7 +222,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_tree_oid(NULL, &c->object.oid, "", &diffopt); diffcore_std(&diffopt); - if (diffopt.num_changes <= max_changes) { + if (diffopt.num_changes <= settings->max_changed_paths) { struct hashmap pathmap; struct pathmap_hash_entry *e; struct hashmap_iter iter; @@ -260,13 +259,13 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_free_filepair(diff_queued_diff.queue[i]); } - filter->len = (hashmap_get_size(&pathmap) * settings.bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; + filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; filter->data = xcalloc(filter->len, sizeof(unsigned char)); hashmap_for_each_entry(&pathmap, &iter, e, entry) { struct bloom_key key; - fill_bloom_key(e->path, strlen(e->path), &key, &settings); - add_key_to_filter(&key, filter, &settings); + fill_bloom_key(e->path, strlen(e->path), &key, settings); + add_key_to_filter(&key, filter, settings); } hashmap_free_entries(&pathmap, struct pathmap_hash_entry, entry); diff --git a/bloom.h b/bloom.h index baa91926db..3f19e3fca4 100644 --- a/bloom.h +++ b/bloom.h @@ -92,10 +92,11 @@ void init_bloom_filters(void); struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, struct commit *c, int compute_if_not_present, + const struct bloom_filter_settings *settings, int *computed); #define get_bloom_filter(r, c) get_or_compute_bloom_filter( \ - (r), (c), 0, NULL) + (r), (c), 0, NULL, NULL) int bloom_filter_contains(const struct bloom_filter *filter, const struct bloom_key *key, diff --git a/commit-graph.c b/commit-graph.c index ba2a2cfb22..48d4697f54 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1432,6 +1432,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->r, c, 1, + ctx->bloom_settings, &computed); if (computed) { ctx->count_bloom_filter_computed++; @@ -1688,17 +1689,6 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) int num_chunks = 3; uint64_t chunk_offset; struct object_id file_hash; - struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; - - if (!ctx->bloom_settings) { - bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", - bloom_settings.bits_per_entry); - bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", - bloom_settings.num_hashes); - bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", - bloom_settings.max_changed_paths); - ctx->bloom_settings = &bloom_settings; - } if (ctx->split) { struct strbuf tmp_file = STRBUF_INIT; @@ -2144,6 +2134,7 @@ int write_commit_graph(struct object_directory *odb, uint32_t i, count_distinct = 0; int res = 0; int replace = 0; + struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; if (!commit_graph_compatible(the_repository)) return 0; @@ -2157,6 +2148,14 @@ int write_commit_graph(struct object_directory *odb, ctx->split_opts = split_opts; ctx->total_bloom_filter_data_size = 0; + bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", + bloom_settings.bits_per_entry); + bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", + bloom_settings.num_hashes); + bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", + bloom_settings.max_changed_paths); + ctx->bloom_settings = &bloom_settings; + if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS) ctx->changed_paths = 1; if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) { diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c index 531af439c2..4af949164c 100644 --- a/t/helper/test-bloom.c +++ b/t/helper/test-bloom.c @@ -40,6 +40,7 @@ static void get_bloom_filter_for_commit(const struct object_id *commit_oid) setup_git_directory(); c = lookup_commit(the_repository, commit_oid); filter = get_or_compute_bloom_filter(the_repository, c, 1, + &settings, NULL); print_bloom_filter(filter); } From patchwork Tue Aug 11 20:51:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11709657 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A065013B1 for ; Tue, 11 Aug 2020 20:52:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 81A5720781 for ; Tue, 11 Aug 2020 20:52:01 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="L5b8sOPA" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726506AbgHKUwA (ORCPT ); Tue, 11 Aug 2020 16:52:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53398 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725987AbgHKUwA (ORCPT ); Tue, 11 Aug 2020 16:52:00 -0400 Received: from mail-qk1-x742.google.com (mail-qk1-x742.google.com [IPv6:2607:f8b0:4864:20::742]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C5924C06174A for ; Tue, 11 Aug 2020 13:51:59 -0700 (PDT) Received: by mail-qk1-x742.google.com with SMTP id 2so140946qkf.10 for ; Tue, 11 Aug 2020 13:51:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=UkG0VbkkbfTo4r4sgGsVzU492HVkMStyXADq+pBNjSA=; b=L5b8sOPAdEdmYhfcyHXzZnEZLzKBZSu2To9fxTXs8BpenpdA1xAdvOKg9e1LPqJ7MM ltbCAnhMDhbUgO681DEou1CF8LagokOk+UYhuxQR507iPYfLtJ+27snmBe4ELjllJRHO 7Elv5tH4RG8B9YJK1qO6+nxOxoIFW/vSax83fv6ZY69M7Vd68Mu0h0hNUSIaAZnS70D7 HqV1CPVdohpj2OADPTCUFp1a4oyyrlaE9EZ0sI6ElABNgiSW3wcxCGdio2lmmvrSmQg2 wtgdbSmuoAmgB6iwLVM4nhClgjRhFMWmnvJqSu+r63wXvA+Wza27P/keizS6MAZyU8ba pfKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=UkG0VbkkbfTo4r4sgGsVzU492HVkMStyXADq+pBNjSA=; b=gjfnM07asEP0xBIuSr71FUJVeEp3wUqZQF/RzD81A4MJ3ZPDFszGJgOBh4JXHpHHaz kw03JfsDARtAxB2bimlGM6c5y+qb8wetWtF9bNeXo3w0CEjTflsHAHCquxTF5afH9Iju 1QiCEKGXH1bq9wFU0YCVPODkUC8IE5v2EpJ1luJdPWXYrFkt7cXFK4sawCeCKXqQkRkT w4q1EYyXGCkR7pCToNgnYMROa2YL7JqVOUT3poMI6YojIGH+Jf//6PPFByz8nnrD3lSz LHVOBYG5b9UE0PkrtKcjLcK0dCmRx6c26rGBTbfYpOc6OvryGUQ4Qu+aA5mGAbwuXBD0 HZ7g== X-Gm-Message-State: AOAM533Cmek91osP5NEHlZdEky52Q2BWBhx0McvNQWrqQAadXCevFvE5 TpSw6lgaj0GE8wGlS/KDG3LPIjFV2dWWuj22 X-Google-Smtp-Source: ABdhPJzBeI7D/R/YtqOBHouSy2EUu+MW1Fo57cqmmkAA36F/7xf0VAXCaFXCYs0s4BFawHDSJb6loA== X-Received: by 2002:a37:a7cc:: with SMTP id q195mr2997937qke.110.1597179118503; Tue, 11 Aug 2020 13:51:58 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a92f:57be:59a6:7cb2]) by smtp.gmail.com with ESMTPSA id y3sm21924245qtj.55.2020.08.11.13.51.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 13:51:57 -0700 (PDT) Date: Tue, 11 Aug 2020 16:51:56 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com, gitster@pobox.com Subject: [PATCH v3 09/14] bloom/diff: properly short-circuit on max_changes Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee Commit e3696980 (diff: halt tree-diff early after max_changes, 2020-03-30) intended to create a mechanism to short-circuit a diff calculation after a certain number of paths were modified. By incrementing a "num_changes" counter throughout the recursive ll_diff_tree_paths(), this was supposed to match the number of changes that would be written into the changed-path Bloom filters. Unfortunately, this was not implemented correctly and instead misses simple cases like file modifications. This then does not stop very large changed-path filters from being written (unless they add or remove many files). To start, change the implementation in ll_diff_tree_paths() to instead use the global diff_queue_diff struct's 'nr' member as the count. This is a way to simplify the logic instead of making more mistakes in the complicated diff code. This has a drawback: the diff_queue_diff struct only lists the paths corresponding to blob changes, not their leading directories. Thus, get_or_compute_bloom_filter() needs an additional check to see if the hashmap with the leading directories becomes too large. One reason why this was not caught by test cases was that the test in t4216-log-bloom.sh that was supposed to check this "too many changes" condition only checked this on the initial commit of a repository. The old logic counted these values correctly. Update this test in a few ways: 1. Use GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS to reduce the limit, allowing smaller commits to engage with this logic. 2. Create several interesting cases of edits, adds, removes, and mode changes (in the second commit). By testing both sides of the inequality with the *_MAX_CHANGED_PATHS variable, we can see that the count is exactly correct, so none of these changes are missed or over-counted. 3. Use the trace2 data value filter_found_large to verify that these commits are on the correct side of the limit. Another way to verify the behavior is correct is through performance tests. By testing on my local copies of the Git repository and the Linux kernel repository, I could measure the effect of these short-circuits when computing a fresh commit-graph file with changed-path Bloom filters using the command GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=N time \ git commit-graph write --reachable --changed-paths and reporting the wall time and resulting commit-graph size. For Git, the results are | | N=1 | N=10 | N=512 | |--------|----------------|----------------|----------------| | HEAD~1 | 10.90s 9.18MB | 11.11s 9.34MB | 11.31s 9.35MB | | HEAD | 9.21s 8.62MB | 11.11s 9.29MB | 11.29s 9.34MB | For Linux, the results are | | N=1 | N=20 | N=512 | |--------|----------------|---------------|---------------| | HEAD~1 | 61.28s 64.3MB | 76.9s 72.6MB | 77.6s 72.6MB | | HEAD | 49.44s 56.3MB | 68.7s 65.9MB | 69.2s 65.9MB | Naturally, the improvement becomes much less as the limit grows, as fewer commits satisfy the short-circuit. Reported-by: SZEDER Gábor Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- bloom.c | 6 +++- diff.h | 2 -- t/t4216-log-bloom.sh | 85 +++++++++++++++++++++++++++++++++++++++----- tree-diff.c | 5 +-- 4 files changed, 82 insertions(+), 16 deletions(-) diff --git a/bloom.c b/bloom.c index 0cf1962dc5..ed54e96e57 100644 --- a/bloom.c +++ b/bloom.c @@ -222,7 +222,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_tree_oid(NULL, &c->object.oid, "", &diffopt); diffcore_std(&diffopt); - if (diffopt.num_changes <= settings->max_changed_paths) { + if (diff_queued_diff.nr <= settings->max_changed_paths) { struct hashmap pathmap; struct pathmap_hash_entry *e; struct hashmap_iter iter; @@ -259,6 +259,9 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_free_filepair(diff_queued_diff.queue[i]); } + if (hashmap_get_size(&pathmap) > settings->max_changed_paths) + goto cleanup; + filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; filter->data = xcalloc(filter->len, sizeof(unsigned char)); @@ -268,6 +271,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, add_key_to_filter(&key, filter, settings); } + cleanup: hashmap_free_entries(&pathmap, struct pathmap_hash_entry, entry); } else { for (i = 0; i < diff_queued_diff.nr; i++) diff --git a/diff.h b/diff.h index e0c0af6286..1d32b71885 100644 --- a/diff.h +++ b/diff.h @@ -287,8 +287,6 @@ struct diff_options { /* If non-zero, then stop computing after this many changes. */ int max_changes; - /* For internal use only. */ - int num_changes; int ita_invisible_in_index; /* white-space error highlighting */ diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index eb2bcc51f0..21b67677ef 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -177,20 +177,87 @@ test_expect_success 'persist filter settings' ' ' test_expect_success 'correctly report changes over limit' ' - git init 513changes && + git init limits && ( - cd 513changes && - for i in $(test_seq 1 513) + cd limits && + mkdir d && + mkdir d/e && + + for i in $(test_seq 1 2) do - echo $i >file$i.txt || return 1 + printf $i >d/file$i.txt && + printf $i >d/e/file$i.txt || return 1 done && - git add . && + + mkdir mode && + printf bash >mode/script.sh && + + mkdir foo && + touch foo/bar && + touch foo.txt && + + git add d foo foo.txt mode && git commit -m "files" && - git commit-graph write --reachable --changed-paths && - for i in $(test_seq 1 513) + + # Commit has 7 file and 4 directory adds + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ + GIT_TRACE2_EVENT="$(pwd)/trace" \ + git commit-graph write --reachable --changed-paths && + grep "\"max_changed_paths\":10" trace && + grep "\"filter_found_large\":1" trace && + + for path in $(git ls-tree -r --name-only HEAD) do - git -c core.commitGraph=false log -- file$i.txt >expect && - git log -- file$i.txt >actual && + git -c commitGraph.readChangedPaths=false log \ + -- $path >expect && + git log -- $path >actual && + test_cmp expect actual || return 1 + done && + + # Make a variety of path changes + printf new1 >d/e/file1.txt && + printf new2 >d/file2.txt && + rm d/e/file2.txt && + rm -r foo && + printf text >foo && + mkdir f && + printf new1 >f/file1.txt && + + # including a mode-only change (counts as modified) + git update-index --chmod=+x mode/script.sh && + + git add foo d f && + git commit -m "complicated" && + + # start from scratch and rebuild + rm -f .git/objects/info/commit-graph && + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ + GIT_TRACE2_EVENT="$(pwd)/trace-edit" \ + git commit-graph write --reachable --changed-paths && + grep "\"max_changed_paths\":10" trace-edit && + grep "\"filter_found_large\":2" trace-edit && + + for path in $(git ls-tree -r --name-only HEAD) + do + git -c commitGraph.readChangedPaths=false log \ + -- $path >expect && + git log -- $path >actual && + test_cmp expect actual || return 1 + done && + + # start from scratch and rebuild + rm -f .git/objects/info/commit-graph && + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=11 \ + GIT_TRACE2_EVENT="$(pwd)/trace-update" \ + git commit-graph write --reachable --changed-paths && + grep "\"max_changed_paths\":11" trace-update && + grep "\"filter_found_large\":0" trace-update && + + for path in $(git ls-tree -r --name-only HEAD) + do + git -c commitGraph.readChangedPaths=false log \ + -- $path >expect && + git log -- $path >actual && test_cmp expect actual || return 1 done ) diff --git a/tree-diff.c b/tree-diff.c index 6ebad1a46f..7cebbb327e 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -434,7 +434,7 @@ static struct combine_diff_path *ll_diff_tree_paths( if (diff_can_quit_early(opt)) break; - if (opt->max_changes && opt->num_changes > opt->max_changes) + if (opt->max_changes && diff_queued_diff.nr > opt->max_changes) break; if (opt->pathspec.nr) { @@ -521,7 +521,6 @@ static struct combine_diff_path *ll_diff_tree_paths( /* t↓ */ update_tree_entry(&t); - opt->num_changes++; } /* t > p[imin] */ @@ -539,7 +538,6 @@ static struct combine_diff_path *ll_diff_tree_paths( skip_emit_tp: /* ∀ pi=p[imin] pi↓ */ update_tp_entries(tp, nparent); - opt->num_changes++; } } @@ -557,7 +555,6 @@ struct combine_diff_path *diff_tree_paths( const struct object_id **parents_oid, int nparent, struct strbuf *base, struct diff_options *opt) { - opt->num_changes = 0; p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt); /* From patchwork Tue Aug 11 20:52:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11709659 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7839D13B1 for ; Tue, 11 Aug 2020 20:52:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5B8B320774 for ; Tue, 11 Aug 2020 20:52:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="MoGrmvn7" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726515AbgHKUwF (ORCPT ); Tue, 11 Aug 2020 16:52:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53414 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725987AbgHKUwF (ORCPT ); Tue, 11 Aug 2020 16:52:05 -0400 Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0EB27C06174A for ; Tue, 11 Aug 2020 13:52:05 -0700 (PDT) Received: by mail-qk1-x743.google.com with SMTP id p4so198936qkf.0 for ; Tue, 11 Aug 2020 13:52:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=VnfpCipl9Slkz8EUOG39zqbhG2/9zqI8fjgty/iURK8=; b=MoGrmvn7UzRA4fw/TaoreO3ZGlZQ3dbHMMieUKmywKgP+P/3cyd7QXNQlgL7FgVN0t SfXUPLLJ0ZhUkvVVnQPxR6icni9J01pAMFDHeegpharm7BtBICpSXjx74SDbWhYmMErq jwnq0oLN0i6nFou4+C8YM6NfTbZ+AeTiCJRyRA7x0kqXepLCDbjd/F7y4cyJAJ1CNw3U zk953xIeS15WNo5vd0vg9ltDLL8SexvUPbFsZAY/CLtZUDcMQFVziXQ9v4HHtYsLppOE m10c2njPZHwuc7kKch/1bT+IjxRR6qrkKhyP6LndqfhZPfsm65P6lBA2/kiCXu9O/XdB IVyg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=VnfpCipl9Slkz8EUOG39zqbhG2/9zqI8fjgty/iURK8=; b=eGFcu71clfsnR9fTnabmQF+HseoD6QLHrUQOCLC1JUKUoMmC3zmTvU+R8ei1A4bJqR vPLLlPC0PwYvfG0FzsinGW+13LdfD7OKS6ZP4PbZhLZSBoTJ69jhq+4vCEun/w+Ypg+7 NW3wwzRCflOhIpvmH8BlShGfOwsFSxBIoc9P6UpoDL4PpQbe4zFP0n8YMYQWqd/MXfdH vrDiYRLLz7K9oYI6swZuILeaHi28QizbHxmohpGdC0kxa6nYMvT3aTRimRFBY+bZvg3y YdL+Z3Y7UZAjpfNAETX/S8f7eRs+Nt3z5L9+kv2hoDLi/HBZrmGeyUKqyGXeZreWIPUo PCiw== X-Gm-Message-State: AOAM533wxwihUb+EDSrEGcv7ZCiKZ07KKnkcqNLc7U0Vjbb+3G15gTzc 2clBpSzhGu7H9WrM0L2nLWRQr1CsojenpSW8 X-Google-Smtp-Source: ABdhPJwuSYvUQjvd0tfKRm2KEu2F/xHVmEQehhAW/86xFFBiv5x41O8MFBoCWKXMOE3X1+yiO14Kew== X-Received: by 2002:a05:620a:12af:: with SMTP id x15mr2835232qki.441.1597179123899; Tue, 11 Aug 2020 13:52:03 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a92f:57be:59a6:7cb2]) by smtp.gmail.com with ESMTPSA id j72sm24404qke.20.2020.08.11.13.52.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 13:52:03 -0700 (PDT) Date: Tue, 11 Aug 2020 16:52:01 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com, gitster@pobox.com Subject: [PATCH v3 10/14] commit-graph.c: sort index into commits list Message-ID: <23fd52c3b84e518af7c22aa3edd0447dbaeac3d7.1597178915.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org For locality, 'compute_bloom_filters()' sorts the commits for which it wants to compute Bloom filters in a preferred order (cf., 3d11275505 (commit-graph: examine commits by generation number, 2020-03-30) for details). A future patch will want to recover the new graph position of each commit. Since the 'packed_commit_list' already stores a double-pointer, avoid a 'COPY_ARRAY' and instead keep track of an index into the original list. (Use an integer index instead of a memory address, since this involves a needlessly confusing triple-pointer). Alter the two sorting routines 'commit_pos_cmp' and 'commit_gen_cmp' to take into account the packed_commit_list they are sorting with respect to. Since 'compute_bloom_filters()' is the only caller for each of those comparison functions, no other call-sites need updating. Signed-off-by: Taylor Blau --- commit-graph.c | 43 ++++++++++++++++++++++++------------------- 1 file changed, 24 insertions(+), 19 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 48d4697f54..0d70545149 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -79,10 +79,18 @@ static void set_commit_pos(struct repository *r, const struct object_id *oid) *commit_pos_at(&commit_pos, commit) = max_pos++; } -static int commit_pos_cmp(const void *va, const void *vb) +struct packed_commit_list { + struct commit **list; + int nr; + int alloc; +}; + +static int commit_pos_cmp(const void *va, const void *vb, void *ctx) { - const struct commit *a = *(const struct commit **)va; - const struct commit *b = *(const struct commit **)vb; + struct packed_commit_list *commits = ctx; + + const struct commit *a = commits->list[*(int *)va]; + const struct commit *b = commits->list[*(int *)vb]; return commit_pos_at(&commit_pos, a) - commit_pos_at(&commit_pos, b); } @@ -139,10 +147,12 @@ static struct commit_graph_data *commit_graph_data_at(const struct commit *c) return data; } -static int commit_gen_cmp(const void *va, const void *vb) +static int commit_gen_cmp(const void *va, const void *vb, void *ctx) { - const struct commit *a = *(const struct commit **)va; - const struct commit *b = *(const struct commit **)vb; + struct packed_commit_list *commits = ctx; + + const struct commit *a = commits->list[*(int *)va]; + const struct commit *b = commits->list[*(int *)vb]; uint32_t generation_a = commit_graph_generation(a); uint32_t generation_b = commit_graph_generation(b); @@ -922,11 +932,6 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c); } -struct packed_commit_list { - struct commit **list; - int nr; - int alloc; -}; struct packed_oid_list { struct object_id *list; @@ -1408,7 +1413,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) { int i; struct progress *progress = NULL; - struct commit **sorted_commits; + int *sorted_commits; init_bloom_filters(); @@ -1418,16 +1423,16 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->commits.nr); ALLOC_ARRAY(sorted_commits, ctx->commits.nr); - COPY_ARRAY(sorted_commits, ctx->commits.list, ctx->commits.nr); - - if (ctx->order_by_pack) - QSORT(sorted_commits, ctx->commits.nr, commit_pos_cmp); - else - QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp); + for (i = 0; i < ctx->commits.nr; i++) + sorted_commits[i] = i; + QSORT_S(sorted_commits, ctx->commits.nr, + ctx->order_by_pack ? commit_pos_cmp : commit_gen_cmp, + &ctx->commits); for (i = 0; i < ctx->commits.nr; i++) { int computed = 0; - struct commit *c = sorted_commits[i]; + int pos = sorted_commits[i]; + struct commit *c = ctx->commits.list[pos]; struct bloom_filter *filter = get_or_compute_bloom_filter( ctx->r, c, From patchwork Tue Aug 11 20:51:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11709653 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B649313B1 for ; Tue, 11 Aug 2020 20:51:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 964B020774 for ; Tue, 11 Aug 2020 20:51:54 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="Cbjc4S2r" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726488AbgHKUvx (ORCPT ); Tue, 11 Aug 2020 16:51:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53378 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725987AbgHKUvx (ORCPT ); Tue, 11 Aug 2020 16:51:53 -0400 Received: from mail-qv1-xf44.google.com (mail-qv1-xf44.google.com [IPv6:2607:f8b0:4864:20::f44]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BFC80C06174A for ; Tue, 11 Aug 2020 13:51:52 -0700 (PDT) Received: by mail-qv1-xf44.google.com with SMTP id r19so23718qvw.11 for ; Tue, 11 Aug 2020 13:51:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=cQ0YqZZa9lODCoFb9xpqUaBGz+eMOXt0L0kMZRhg/Ic=; b=Cbjc4S2rukJm6KsQz1M/zTF0MUOITgoNiDRLcaLJnJu6W6CsE5EZexQjcwxDR9mO6T V3j0BIQ9xLGS4ubrm52CCG2N08RUcamFPW+fVwQy3FKqU2/E3ZDMCCJBfpjMmPh3wCuF Eni3pP3+MoPEVtvrc53L2FZmMw7RD4FNn+F7qqERi4UenkuBJmhOnGFjBkuicUNDH1Vd +My/6BR70SNkhdXnw9hmNh4xpK049X9Pyxth/DBEG7R76OnQzNV9ey+yvpyKWOcEwc/o JExKf1mi6mY+cIdPsRg//h0D0HQvE/B7ofp55Mcq2ElAdwOV+bchwnBiYYUTShaH20nY mEAQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=cQ0YqZZa9lODCoFb9xpqUaBGz+eMOXt0L0kMZRhg/Ic=; b=I1/WuS+nbyO/FlMRsKCnbAiJLK+RtjSAodCgj1kv/T5dFOWzLvqcn2OfxLuZKRHpj/ l2J/GS/O1eoYWFhl6PP3AizWjUCIbPqE8Y7ebSnO0djHNvYRo7FRmDMHjVlnzOarhQXx ejW0/uAgBpDXE05bhveoKacZ6dbLGwg+cNoDqUkmZa7D5ZOcVsRdVQd4xEXgzWhgdUKF TARna8VqRYhpXnDeZLWZ42fYnHvPQXmc/chc5IYKpaXQ4I8wsUZ8DwOP/zXZxKeyI6LF 4x6r+er7f3L2OzrorDnloYuLXCcoSZkaXI9RIQrKRPRdl60FNd5ZcO+4g/XlXOT5JMDf xgUg== X-Gm-Message-State: AOAM532OIM0R372dExx/AS6Ybq35Bnh5pfrr1qWXdZAp1hFLnRQwSR8X ZMD0GvCZUk0Y73+FIXc+L0T/Wd3xQQk7y1A2 X-Google-Smtp-Source: ABdhPJw8PWpJXMa/Ghx9FXftDjASUcJdiAX4GqMq7fcXj+udcnJeH0Qmx3XgGZpw5jck/V2bpwv21g== X-Received: by 2002:ad4:444e:: with SMTP id l14mr3160668qvt.111.1597179111461; Tue, 11 Aug 2020 13:51:51 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a92f:57be:59a6:7cb2]) by smtp.gmail.com with ESMTPSA id s5sm18559863qke.120.2020.08.11.13.51.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 13:51:50 -0700 (PDT) Date: Tue, 11 Aug 2020 16:51:49 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com, gitster@pobox.com Subject: [PATCH v3 11/14] csum-file.h: introduce 'hashwrite_be64()' Message-ID: <4800cd373eefb8d71094254d90fd0304067e464f.1597178915.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org A small handful of writers who wish to encode 64-bit values in network order have worked around the lack of such a helper by calling the 32-bit variant twice. The subsequent commit will add another caller who wants to write a 64-bit value. To ease their (and the existing caller's) pain, introduce a helper to do just that, and convert existing call-sites. Suggested-by: Derrick Stolee Signed-off-by: Taylor Blau --- commit-graph.c | 8 ++------ csum-file.h | 6 ++++++ midx.c | 3 +-- 3 files changed, 9 insertions(+), 8 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 0d70545149..8964453433 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1784,12 +1784,8 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) chunk_offset = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH; for (i = 0; i <= num_chunks; i++) { - uint32_t chunk_write[3]; - - chunk_write[0] = htonl(chunks[i].id); - chunk_write[1] = htonl(chunk_offset >> 32); - chunk_write[2] = htonl(chunk_offset & 0xffffffff); - hashwrite(f, chunk_write, 12); + hashwrite_be32(f, chunks[i].id); + hashwrite_be64(f, chunk_offset); chunk_offset += chunks[i].size; } diff --git a/csum-file.h b/csum-file.h index f9cbd317fb..b026ec7766 100644 --- a/csum-file.h +++ b/csum-file.h @@ -62,4 +62,10 @@ static inline void hashwrite_be32(struct hashfile *f, uint32_t data) hashwrite(f, &data, sizeof(data)); } +static inline void hashwrite_be64(struct hashfile *f, uint64_t data) +{ + hashwrite_be32(f, data >> 32); + hashwrite_be32(f, data & 0xffffffffUL); +} + #endif diff --git a/midx.c b/midx.c index a5fb797ede..51ca27cf34 100644 --- a/midx.c +++ b/midx.c @@ -775,8 +775,7 @@ static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_off if (!(offset >> 31)) continue; - hashwrite_be32(f, offset >> 32); - hashwrite_be32(f, offset & 0xffffffffUL); + hashwrite_be64(f, offset); written += 2 * sizeof(uint32_t); nr_large_offset--; From patchwork Tue Aug 11 20:52:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11709661 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 178D1109B for ; Tue, 11 Aug 2020 20:52:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E4D202076C for ; Tue, 11 Aug 2020 20:52:11 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="VOfDKzMi" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726521AbgHKUwL (ORCPT ); Tue, 11 Aug 2020 16:52:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53430 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725987AbgHKUwK (ORCPT ); Tue, 11 Aug 2020 16:52:10 -0400 Received: from mail-qt1-x831.google.com (mail-qt1-x831.google.com [IPv6:2607:f8b0:4864:20::831]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 947DFC06174A for ; Tue, 11 Aug 2020 13:52:10 -0700 (PDT) Received: by mail-qt1-x831.google.com with SMTP id c12so10539753qtn.9 for ; Tue, 11 Aug 2020 13:52:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=zvT0HGAuUzTk4bbsAnxQujCI3XYmFjA7nYj2hrHZmZw=; b=VOfDKzMiwY7lwnqVHQc3Fc3/CvhgomiKJWFRfaFXLSC94BmhEPmToo98DRk5/dokXo En7xbilG8N3bNp1/Kr6PYjX+4hqAXurDego04sPYpPwxLIFHrTp+RrSk/nGLE5sjkhqR vlFzzUAkpFsAX9hSmnor9+xym/cpfzhFugAPhq94DLYvp2tlIqWMJUgjgiJ29ovZ9dfc koDlwaImvFLRA17+0CHHGXYsZ5M+fGBaj/pla/1o/Y0K/Xck+GhzM+fYRzEvjJx1FKHk 5h5esQOkqaeymnuDRre2yEKe6fBFLYGSbJVmNnICqbA1jD2od9+5hSOMzYbszaII/bbL idig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=zvT0HGAuUzTk4bbsAnxQujCI3XYmFjA7nYj2hrHZmZw=; b=HXoSv2FOTaWVMTTa/SD5IRJdKEiIKNBdWYVt0qhPQrGmi79n1JYA9bDvJskg19f+lG H6LjUoAmdpoEOwBpaOcoZcJmpsvAWT3mGWcTwLIuWIx0mVj18TAMI3Bvd93ay1ZrIRUD vENgMrB2uAnsqCXS7z+h8k2H9sKF4dwNL5YAVli9TFDvW91xmtvn3h58MvlSiSq1SlUO 0TL6kXlnXR4rr8l6Z/aO2KcdXlXSUAMElpPpJRB5aimW+/Gv/ayPLv/DomhduuAckwl/ Ldmk5j/UA0hNSblN7atP57mEtmwAbaaT+EuHe+0ufdjkjWuwMShmo6AdB4czwrvPDWdH Ra1g== X-Gm-Message-State: AOAM531q+R7hGyz7iKfgKptfO+bV5r0HQv9qN70lSmfwHrOBu/gn+/I3 ehyAJ34UUGbjw6SmJ9Yr8yo+BRA4yApS5d4O X-Google-Smtp-Source: ABdhPJwPesl0xCH7UlWwvM1CDPDvZYdS3fhWUz8/Tj845987IGPdzU5QtG4yseUs9qIsMJfqPdmSdA== X-Received: by 2002:ac8:7094:: with SMTP id y20mr3147540qto.52.1597179128725; Tue, 11 Aug 2020 13:52:08 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a92f:57be:59a6:7cb2]) by smtp.gmail.com with ESMTPSA id d57sm21076110qte.91.2020.08.11.13.52.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 13:52:08 -0700 (PDT) Date: Tue, 11 Aug 2020 16:52:07 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com, gitster@pobox.com Subject: [PATCH v3 12/14] commit-graph: add large-filters bitmap chunk Message-ID: <619e0c619dd12e2bea2b80c7d249ba66fe2a315c.1597178915.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When a commit has more than a certain number of changed paths (commonly 512), the commit-graph machinery represents it as a zero-length filter. This is done since having many entries in the Bloom filter has undesirable effects on the false positivity rate. In addition to these too-large filters, the commit-graph machinery also represents commits with no filter and commits with no changed paths in the same way. When writing a commit-graph that aggregates several incremental commit-graph layers (eg., with '--split=replace'), the commit-graph machinery first computes all of the Bloom filters that it wants to write but does not already know about from existing graph layers. Because we overload the zero-length filter in the above fashion, this leads to recomputing large filters over and over again. This is already undesirable, since it means that we are wasting considerable effort to discover that a commit with too many changed paths, only to throw that effort away (and then repeat the process the next time a roll-up is performed). In a subsequent patch, we will add a '--max-new-filters=' option, which specifies an upper-bound on the number of new filters we are willing to compute from scratch. Suppose that there are 'N' too-large filters, and we specify '--max-new-filters=M'. If 'N >= M', it is unlikely that any filters will be generated, since we'll spend most of our effort on filters that we ultimately throw away. If 'N < M', filters will trickle in over time, but only at most 'M - N' per-write. To address this, add a new chunk which encodes a bitmap where the ith bit is on iff the ith commit has zero or at least 512 changed paths. Likewise store the maximum number of changed paths we are willing to store in order to prepare for eventually making this value more easily customizable. When computing Bloom filters, first consult the relevant bitmap (in the case that we are rolling up existing layers) to see if computing the Bloom filter from scratch would be a waste of time. This patch implements a new chunk instead of extending the existing BIDX and BDAT chunks because modifying these chunks would confuse old clients. (Eg., setting the most-significant bit in the BIDX chunk would confuse old clients and require a version bump). To allow using the existing bitmap code with 64-bit words, we write the data in network byte order from the 64-bit words. This means we also need to read the array from the commit-graph file by translating each word from network byte order using get_be64() when loading the commit graph. (Note that this *could* be delayed until first-use, but a later patch will rely on this being initialized early, so we assume the up-front cost when parsing instead of delaying initialization). By avoiding the need to move to new versions of the BDAT and BIDX chunk, we can give ourselves more time to consider whether or not other modifications to these chunks are worthwhile without holding up this change. Another approach would be to introduce a new BIDX chunk (say, one identified by 'BID2') which is identical to the existing BIDX chunk, except the most-significant bit of each offset is interpreted as "this filter is too big" iff looking at a BID2 chunk. This avoids having to write a bitmap, but forces older clients to rewrite their commit-graphs (as well as reduces the theoretical largest Bloom filters we couldl write, and forces us to maintain the code necessary to translate BIDX chunks to BID2 ones). Separately from this patch, I implemented this alternate approach and did not find it to be advantageous. Signed-off-by: Taylor Blau Signed-off-by: Taylor Blau --- .../technical/commit-graph-format.txt | 12 +++ bloom.h | 2 +- commit-graph.c | 96 ++++++++++++++++--- commit-graph.h | 4 + t/t4216-log-bloom.sh | 25 ++++- 5 files changed, 124 insertions(+), 15 deletions(-) diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index 440541045d..5f2d9ab4d7 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -123,6 +123,18 @@ CHUNK DATA: of length zero. * The BDAT chunk is present if and only if BIDX is present. + Large Bloom Filters (ID: {'B', 'F', 'X', 'L'}) [Optional] + * It starts with a 32-bit unsigned integer specifying the maximum number of + changed-paths that can be stored in a single Bloom filter. + * It then contains a list of 64-bit words (the length of this list is + determined by the width of the chunk) which is a bitmap. The 'i'th bit is + set exactly when the 'i'th commit in the graph has a changed-path Bloom + filter with zero entries (either because the commit is empty, or because + it contains more than 512 changed paths). + * The BFXL chunk is present only when the BIDX and BDAT chunks are + also present. + + Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional] This list of H-byte hashes describe a set of B commit-graph files that form a commit-graph chain. The graph position for the ith commit in this diff --git a/bloom.h b/bloom.h index 3f19e3fca4..464d9b57de 100644 --- a/bloom.h +++ b/bloom.h @@ -33,7 +33,7 @@ struct bloom_filter_settings { * The maximum number of changed paths per commit * before declaring a Bloom filter to be too-large. * - * Not written to the commit-graph file. + * Written to the 'BFXL' chunk (instead of 'BDAT'). */ uint32_t max_changed_paths; }; diff --git a/commit-graph.c b/commit-graph.c index 8964453433..ea0583298c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -41,8 +41,9 @@ void git_test_write_commit_graph_or_die(void) #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */ #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */ #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */ +#define GRAPH_CHUNKID_BLOOMLARGE 0x4246584c /* "BFXL" */ #define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */ -#define MAX_NUM_CHUNKS 7 +#define MAX_NUM_CHUNKS 8 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16) @@ -429,6 +430,24 @@ struct commit_graph *parse_commit_graph(struct repository *r, graph->bloom_filter_settings->bits_per_entry = get_be32(data + chunk_offset + 8); } break; + + case GRAPH_CHUNKID_BLOOMLARGE: + if (graph->chunk_bloom_large_filters) + chunk_repeated = 1; + else if (r->settings.commit_graph_read_changed_paths) { + size_t alloc = get_be64(chunk_lookup + 4) - chunk_offset - sizeof(uint32_t); + graph->chunk_bloom_large_filters = data + chunk_offset + sizeof(uint32_t); + graph->bloom_filter_settings->max_changed_paths = get_be32(data + chunk_offset); + if (alloc) { + size_t j; + graph->bloom_large = bitmap_word_alloc(alloc); + + for (j = 0; j < graph->bloom_large->word_alloc; j++) + graph->bloom_large->words[j] = get_be64( + graph->chunk_bloom_large_filters + j * sizeof(eword_t)); + } + } + break; } if (chunk_repeated) { @@ -443,7 +462,9 @@ struct commit_graph *parse_commit_graph(struct repository *r, /* We need both the bloom chunks to exist together. Else ignore the data */ graph->chunk_bloom_indexes = NULL; graph->chunk_bloom_data = NULL; + graph->chunk_bloom_large_filters = NULL; FREE_AND_NULL(graph->bloom_filter_settings); + bitmap_free(graph->bloom_large); } hashcpy(graph->oid.hash, graph->data + graph->data_len - graph->hash_len); @@ -932,6 +953,20 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c); } +static int get_bloom_filter_large_in_graph(struct commit_graph *g, + const struct commit *c) +{ + uint32_t graph_pos = commit_graph_position(c); + if (graph_pos == COMMIT_NOT_FROM_GRAPH) + return 0; + + while (g && graph_pos < g->num_commits_in_base) + g = g->base_graph; + + if (!(g && g->bloom_large)) + return 0; + return bitmap_get(g->bloom_large, graph_pos - g->num_commits_in_base); +} struct packed_oid_list { struct object_id *list; @@ -970,8 +1005,10 @@ struct write_commit_graph_context { size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; + int count_bloom_filter_known_large; int count_bloom_filter_found_large; int count_bloom_filter_computed; + struct bitmap *bloom_large; }; static int write_graph_chunk_fanout(struct hashfile *f, @@ -1235,6 +1272,23 @@ static int write_graph_chunk_bloom_data(struct hashfile *f, return 0; } +static int write_graph_chunk_bloom_large(struct hashfile *f, + struct write_commit_graph_context *ctx) +{ + size_t i, alloc = ctx->commits.nr / BITS_IN_EWORD; + if (ctx->commits.nr % BITS_IN_EWORD) + alloc++; + if (alloc > ctx->bloom_large->word_alloc) + BUG("write_graph_chunk_bloom_large: bitmap not large enough"); + + trace2_region_enter("commit-graph", "bloom_large", ctx->r); + hashwrite_be32(f, ctx->bloom_settings->max_changed_paths); + for (i = 0; i < ctx->bloom_large->word_alloc; i++) + hashwrite_be64(f, ctx->bloom_large->words[i]); + trace2_region_leave("commit-graph", "bloom_large", ctx->r); + return 0; +} + static int oid_compare(const void *_a, const void *_b) { const struct object_id *a = (const struct object_id *)_a; @@ -1398,6 +1452,8 @@ static void trace2_bloom_filter_write_statistics(struct write_commit_graph_conte struct json_writer jw = JSON_WRITER_INIT; jw_object_begin(&jw, 0); + jw_object_intmax(&jw, "filter_known_large", + ctx->count_bloom_filter_known_large); jw_object_intmax(&jw, "filter_found_large", ctx->count_bloom_filter_found_large); jw_object_intmax(&jw, "filter_computed", @@ -1416,6 +1472,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) int *sorted_commits; init_bloom_filters(); + ctx->bloom_large = bitmap_word_alloc(ctx->commits.nr / BITS_IN_EWORD + 1); if (ctx->report_progress) progress = start_delayed_progress( @@ -1430,21 +1487,28 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) &ctx->commits); for (i = 0; i < ctx->commits.nr; i++) { - int computed = 0; int pos = sorted_commits[i]; struct commit *c = ctx->commits.list[pos]; - struct bloom_filter *filter = get_or_compute_bloom_filter( - ctx->r, - c, - 1, - ctx->bloom_settings, - &computed); - if (computed) { - ctx->count_bloom_filter_computed++; - if (filter && !filter->len) - ctx->count_bloom_filter_found_large++; + if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph, c)) { + bitmap_set(ctx->bloom_large, pos); + ctx->count_bloom_filter_known_large++; + } else { + int computed = 0; + struct bloom_filter *filter = get_or_compute_bloom_filter( + ctx->r, + c, + 1, + ctx->bloom_settings, + &computed); + if (computed) { + ctx->count_bloom_filter_computed++; + if (filter && !filter->len) { + bitmap_set(ctx->bloom_large, pos); + ctx->count_bloom_filter_found_large++; + } + } + ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; } - ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; display_progress(progress, i + 1); } @@ -1764,6 +1828,11 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) + ctx->total_bloom_filter_data_size; chunks[num_chunks].write_fn = write_graph_chunk_bloom_data; num_chunks++; + chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMLARGE; + chunks[num_chunks].size = sizeof(eword_t) * ctx->bloom_large->word_alloc + + sizeof(uint32_t); + chunks[num_chunks].write_fn = write_graph_chunk_bloom_large; + num_chunks++; } if (ctx->num_commit_graphs_after > 1) { chunks[num_chunks].id = GRAPH_CHUNKID_BASE; @@ -2503,6 +2572,7 @@ void free_commit_graph(struct commit_graph *g) } free(g->filename); free(g->bloom_filter_settings); + bitmap_free(g->bloom_large); free(g); } diff --git a/commit-graph.h b/commit-graph.h index d9acb22bac..ddbca1b59d 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -4,6 +4,7 @@ #include "git-compat-util.h" #include "object-store.h" #include "oidset.h" +#include "ewah/ewok.h" #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH" #define GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE "GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE" @@ -71,6 +72,9 @@ struct commit_graph { const unsigned char *chunk_base_graphs; const unsigned char *chunk_bloom_indexes; const unsigned char *chunk_bloom_data; + const unsigned char *chunk_bloom_large_filters; + + struct bitmap *bloom_large; struct bloom_filter_settings *bloom_filter_settings; }; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 21b67677ef..6859d85369 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -33,7 +33,7 @@ test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' git commit-graph write --reachable --changed-paths ' graph_read_expect () { - NUM_CHUNKS=5 + NUM_CHUNKS=6 cat >expect <<- EOF header: 43475048 1 1 $NUM_CHUNKS 0 num_commits: $1 @@ -262,5 +262,28 @@ test_expect_success 'correctly report changes over limit' ' done ) ' +test_bloom_filters_computed () { + commit_graph_args=$1 + bloom_trace_prefix="{\"filter_known_large\":$2,\"filter_found_large\":$3,\"filter_computed\":$4" + rm -f "$TRASH_DIRECTORY/trace.event" && + GIT_TRACE2_EVENT="$TRASH_DIRECTORY/trace.event" git commit-graph write $commit_graph_args && + grep "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.event" +} + +test_expect_success 'Bloom generation does not recompute too-large filters' ' + ( + cd limits && + + # start from scratch and rebuild + rm -f .git/objects/info/commit-graph && + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ + git commit-graph write --reachable --changed-paths \ + --split=replace && + test_commit c1 filter && + + test_bloom_filters_computed "--reachable --changed-paths --split=replace" \ + 2 0 1 + ) +' test_done From patchwork Tue Aug 11 20:52:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11709663 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8436913B1 for ; Tue, 11 Aug 2020 20:52:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 684E420774 for ; Tue, 11 Aug 2020 20:52:16 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="mqc6AKlI" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726527AbgHKUwP (ORCPT ); Tue, 11 Aug 2020 16:52:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53442 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725987AbgHKUwO (ORCPT ); Tue, 11 Aug 2020 16:52:14 -0400 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3282AC06174A for ; Tue, 11 Aug 2020 13:52:14 -0700 (PDT) Received: by mail-qt1-x842.google.com with SMTP id o22so10510846qtt.13 for ; Tue, 11 Aug 2020 13:52:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=d8tq5I8Y6CqRcMAn2r/SwGhChOu93t/sKgN4htBT5jg=; b=mqc6AKlIYgABQY2408SdzehRynZbJxncHfIjX5mdzZCB8rEHwvStlvHCnZIMpIB52R Di3wQNDGqETsXtRK+0ZCeQTRy0MjsehfvO9ZKgV3SEi5ebV0DKrDGe7veOdRvgFi1GwE 74FRaSDb170+QPWgUn8mvvXxZN/QAwIXgg6LEBaFYp3Ju2/k7rQBpRmMZ3910WwApm7m gPFTFuA3unMKuq9+tino/5pk9w/wFJ2QthO3bFj/WQhuzdkwko9HbIKTWe7dzg9qu7wy kdyd2IAsNBUOntMV0s995+dxg1Hts1XX0ZSNlg54bh/h2xvA+7CdhKgU7C3ve5Pnlo71 NE8w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=d8tq5I8Y6CqRcMAn2r/SwGhChOu93t/sKgN4htBT5jg=; b=o7U1Gi2pyrHgxHToJjXMR3jaRD+8pp6bZL2avxENjg0mGgzIDD8+UKbm29O/Om98Tv 8yusBtMr4dpBvrWlqHyLrlH8NzqkM8zxZozwSbQcbm5mHbv+tfPpcoSAcl57KqSISMIj J+guf8UV2RWqT7n0I28yCoFzUAiYPtvy5VQycLppIHdE/TTx+Atry928z88jHUviDJaV cufDEDg3T90Be02CMbN18Q+sEe+VM/0M6Y70U69LlKmDokB6rr+jMHPFfezJilrPngcb Wpz6hmmJpWJcdbH5UgevihGa5dtxnvoPkDmorFIXDDiwnxd0Bekg+mUBm2/WaDrsDNUU jzfw== X-Gm-Message-State: AOAM531Be5Qr6RPVdSn3/pGsbs0duPsMJIARpiBfXkrxX82KkExoyncc Vg5A/lU4bP297YNHyRDGayBzCZWUWEgW0h7e X-Google-Smtp-Source: ABdhPJyF9xqqtBgifFmR4o/grCZWNyYdXm61k2zjRnfrc2/7R0wFfNn+7fgfGQEmnAvUKjUJvX/JWw== X-Received: by 2002:ac8:47c8:: with SMTP id d8mr2898602qtr.32.1597179133025; Tue, 11 Aug 2020 13:52:13 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a92f:57be:59a6:7cb2]) by smtp.gmail.com with ESMTPSA id q7sm16296qkf.35.2020.08.11.13.52.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 13:52:12 -0700 (PDT) Date: Tue, 11 Aug 2020 16:52:11 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com, gitster@pobox.com Subject: [PATCH v3 13/14] commit-graph: rename 'split_commit_graph_opts' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In the subsequent commit, additional options will be added to the commit-graph API which have nothing to do with splitting. Rename the 'split_commit_graph_opts' structure to the more-generic 'commit_graph_opts' to encompass both. Suggsted-by: Derrick Stolee Signed-off-by: Taylor Blau --- builtin/commit-graph.c | 20 ++++++++++---------- commit-graph.c | 40 ++++++++++++++++++++-------------------- commit-graph.h | 6 +++--- 3 files changed, 33 insertions(+), 33 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index ba5584463f..38f5f57d15 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -119,7 +119,7 @@ static int graph_verify(int argc, const char **argv) } extern int read_replace_refs; -static struct split_commit_graph_opts split_opts; +static struct commit_graph_opts write_opts; static int write_option_parse_split(const struct option *opt, const char *arg, int unset) @@ -187,24 +187,24 @@ static int graph_write(int argc, const char **argv) OPT_BOOL(0, "changed-paths", &opts.enable_changed_paths, N_("enable computation for changed paths")), OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")), - OPT_CALLBACK_F(0, "split", &split_opts.flags, NULL, + OPT_CALLBACK_F(0, "split", &write_opts.flags, NULL, N_("allow writing an incremental commit-graph file"), PARSE_OPT_OPTARG | PARSE_OPT_NONEG, write_option_parse_split), - OPT_INTEGER(0, "max-commits", &split_opts.max_commits, + OPT_INTEGER(0, "max-commits", &write_opts.max_commits, N_("maximum number of commits in a non-base split commit-graph")), - OPT_INTEGER(0, "size-multiple", &split_opts.size_multiple, + OPT_INTEGER(0, "size-multiple", &write_opts.size_multiple, N_("maximum ratio between two levels of a split commit-graph")), - OPT_EXPIRY_DATE(0, "expire-time", &split_opts.expire_time, + OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time, N_("only expire files older than a given date-time")), OPT_END(), }; opts.progress = isatty(2); opts.enable_changed_paths = -1; - split_opts.size_multiple = 2; - split_opts.max_commits = 0; - split_opts.expire_time = 0; + write_opts.size_multiple = 2; + write_opts.max_commits = 0; + write_opts.expire_time = 0; trace2_cmd_mode("write"); @@ -232,7 +232,7 @@ static int graph_write(int argc, const char **argv) odb = find_odb(the_repository, opts.obj_dir); if (opts.reachable) { - if (write_commit_graph_reachable(odb, flags, &split_opts)) + if (write_commit_graph_reachable(odb, flags, &write_opts)) return 1; return 0; } @@ -261,7 +261,7 @@ static int graph_write(int argc, const char **argv) opts.stdin_packs ? &pack_indexes : NULL, opts.stdin_commits ? &commits : NULL, flags, - &split_opts)) + &write_opts)) result = 1; cleanup: diff --git a/commit-graph.c b/commit-graph.c index ea0583298c..6886f319a5 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1001,7 +1001,7 @@ struct write_commit_graph_context { changed_paths:1, order_by_pack:1; - const struct split_commit_graph_opts *split_opts; + const struct commit_graph_opts *opts; size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; @@ -1342,8 +1342,8 @@ static void close_reachable(struct write_commit_graph_context *ctx) { int i; struct commit *commit; - enum commit_graph_split_flags flags = ctx->split_opts ? - ctx->split_opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; + enum commit_graph_split_flags flags = ctx->opts ? + ctx->opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; if (ctx->report_progress) ctx->progress = start_delayed_progress( @@ -1543,7 +1543,7 @@ static int add_ref_to_set(const char *refname, int write_commit_graph_reachable(struct object_directory *odb, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts) + const struct commit_graph_opts *opts) { struct oidset commits = OIDSET_INIT; struct refs_cb_data data; @@ -1560,7 +1560,7 @@ int write_commit_graph_reachable(struct object_directory *odb, stop_progress(&data.progress); result = write_commit_graph(odb, NULL, &commits, - flags, split_opts); + flags, opts); oidset_clear(&commits); return result; @@ -1675,8 +1675,8 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx) static void copy_oids_to_commits(struct write_commit_graph_context *ctx) { uint32_t i; - enum commit_graph_split_flags flags = ctx->split_opts ? - ctx->split_opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; + enum commit_graph_split_flags flags = ctx->opts ? + ctx->opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; ctx->num_extra_edges = 0; if (ctx->report_progress) @@ -1962,13 +1962,13 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) int max_commits = 0; int size_mult = 2; - if (ctx->split_opts) { - max_commits = ctx->split_opts->max_commits; + if (ctx->opts) { + max_commits = ctx->opts->max_commits; - if (ctx->split_opts->size_multiple) - size_mult = ctx->split_opts->size_multiple; + if (ctx->opts->size_multiple) + size_mult = ctx->opts->size_multiple; - flags = ctx->split_opts->flags; + flags = ctx->opts->flags; } g = ctx->r->objects->commit_graph; @@ -2146,8 +2146,8 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx) size_t dirnamelen; timestamp_t expire_time = time(NULL); - if (ctx->split_opts && ctx->split_opts->expire_time) - expire_time = ctx->split_opts->expire_time; + if (ctx->opts && ctx->opts->expire_time) + expire_time = ctx->opts->expire_time; if (!ctx->split) { char *chain_file_name = get_chain_filename(ctx->odb); unlink(chain_file_name); @@ -2198,7 +2198,7 @@ int write_commit_graph(struct object_directory *odb, struct string_list *pack_indexes, struct oidset *commits, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts) + const struct commit_graph_opts *opts) { struct write_commit_graph_context *ctx; uint32_t i, count_distinct = 0; @@ -2215,7 +2215,7 @@ int write_commit_graph(struct object_directory *odb, ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0; ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0; ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; - ctx->split_opts = split_opts; + ctx->opts = opts; ctx->total_bloom_filter_data_size = 0; bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", @@ -2263,15 +2263,15 @@ int write_commit_graph(struct object_directory *odb, } } - if (ctx->split_opts) - replace = ctx->split_opts->flags & COMMIT_GRAPH_SPLIT_REPLACE; + if (ctx->opts) + replace = ctx->opts->flags & COMMIT_GRAPH_SPLIT_REPLACE; } ctx->approx_nr_objects = approximate_object_count(); ctx->oids.alloc = ctx->approx_nr_objects / 32; - if (ctx->split && split_opts && ctx->oids.alloc > split_opts->max_commits) - ctx->oids.alloc = split_opts->max_commits; + if (ctx->split && opts && ctx->oids.alloc > opts->max_commits) + ctx->oids.alloc = opts->max_commits; if (ctx->append) { prepare_commit_graph_one(ctx->r, ctx->odb); diff --git a/commit-graph.h b/commit-graph.h index ddbca1b59d..af08c4505d 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -109,7 +109,7 @@ enum commit_graph_split_flags { COMMIT_GRAPH_SPLIT_REPLACE = 2 }; -struct split_commit_graph_opts { +struct commit_graph_opts { int size_multiple; int max_commits; timestamp_t expire_time; @@ -124,12 +124,12 @@ struct split_commit_graph_opts { */ int write_commit_graph_reachable(struct object_directory *odb, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts); + const struct commit_graph_opts *opts); int write_commit_graph(struct object_directory *odb, struct string_list *pack_indexes, struct oidset *commits, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts); + const struct commit_graph_opts *opts); #define COMMIT_GRAPH_VERIFY_SHALLOW (1 << 0) From patchwork Tue Aug 11 20:52:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11709665 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 44FCF109B for ; Tue, 11 Aug 2020 20:52:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 28D8A20781 for ; Tue, 11 Aug 2020 20:52:22 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="vcCJlIr+" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726529AbgHKUwV (ORCPT ); Tue, 11 Aug 2020 16:52:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53460 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725987AbgHKUwU (ORCPT ); Tue, 11 Aug 2020 16:52:20 -0400 Received: from mail-qv1-xf42.google.com (mail-qv1-xf42.google.com [IPv6:2607:f8b0:4864:20::f42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AA4EAC06174A for ; Tue, 11 Aug 2020 13:52:20 -0700 (PDT) Received: by mail-qv1-xf42.google.com with SMTP id y11so47536qvl.4 for ; Tue, 11 Aug 2020 13:52:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=+cZdRC8E7ZSQYMzb3tw3vk7oWazgBS1DHO6N5L/S2yQ=; b=vcCJlIr+hZZqE0PYuUbVua8L7SF6eSeHq4XrKLPg6mY0QZYpWv1E/3kiwcZBUQf6w+ whusKhoN4x72Xxb5SSp/eoHTDVCp8FWeGhv7XhCvilNwCyL1Kh5DQIHOrWegC4RntGSG osyNHBUln3kyB5gOsq8D6blLov/LSnBYKz7qp9/QP9uJMZQHyVLQdmxbJXRbYfazk4IH nRmusMtqCtfpGlZ43WRoHe6GvxAPxh9FTj6ZtneEwMdC5WSTNSp3aXBYDU211GESNqkY gHuqEcYM0/CxeP1C4tBBry09d/Y3nbsjU3zW2ciJiV1wAJoOwYRrNZ27QEIpnt4jrCXl jqIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=+cZdRC8E7ZSQYMzb3tw3vk7oWazgBS1DHO6N5L/S2yQ=; b=eRtZtvi+Zm3TDkoFO0NYxWiUGIx0MAXRVqkP8dt+UI2MAaSRl0b5wfFH9krDliKtCD QLtnclbSm0ze7r8R6iXHTv9vytSkA4ATxOqtP9dCfg4gRAE6c+v2tbquQDXLxoSF1+f+ ulBSr95yoBcUFb4ae3WZy1F86h32UqbPXhGxwK6AiQ6m7FVtkCuJwwQoh8eMkWliVstg S+MsrCZMxaDrzHUDEFpgIAAKLzJ/3p5GiA9NkVfAtMnKwgmA6PbpK9pqv4nykSMZtwL+ 0pdo/gkBpSjNizG0w0GOCOQ0ZfhA4LN1YYyzZFu7uQ4Y3mIa5H12YnwuuvJnYDH0iJDH /TCw== X-Gm-Message-State: AOAM530vOb9y/cRZ4LCNWF+poKYtYh/nGVsMT4nArAVJEuIQt5zglKmv 7QR/7FZwBxQn7gehwglfh3LeJ6I5k/xTjWKz X-Google-Smtp-Source: ABdhPJyJyIM42vogVH+WYiFuzfeKkwCvVB/KWXxxv01cfekkC3u/t4/Iv0wu+547TDPIGevz5n72gg== X-Received: by 2002:a05:6214:10e8:: with SMTP id q8mr3191446qvt.59.1597179136557; Tue, 11 Aug 2020 13:52:16 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a92f:57be:59a6:7cb2]) by smtp.gmail.com with ESMTPSA id l10sm13766048qtl.72.2020.08.11.13.52.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 13:52:15 -0700 (PDT) Date: Tue, 11 Aug 2020 16:52:14 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com, gitster@pobox.com Subject: [PATCH v3 14/14] builtin/commit-graph.c: introduce '--max-new-filters=' Message-ID: <09f6871f66bff838c067a3e0d23cd4622171f3bd.1597178915.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Introduce a command-line flag and configuration variable to fill in the 'max_new_filters' variable introduced by the previous patch. The command-line option '--max-new-filters' takes precedence over 'commitGraph.maxNewFilters', which is the default value. '--no-max-new-filters' can also be provided, which sets the value back to '-1', indicating that an unlimited number of new Bloom filters may be generated. (OPT_INTEGER only allows setting the '--no-' variant back to '0', hence a custom callback was used instead). Signed-off-by: Taylor Blau Signed-off-by: Derrick Stolee --- Documentation/config/commitgraph.txt | 4 +++ Documentation/git-commit-graph.txt | 4 +++ bloom.c | 15 +++++++++++ builtin/commit-graph.c | 39 +++++++++++++++++++++++++--- commit-graph.c | 27 ++++++++++++++++--- commit-graph.h | 1 + t/t4216-log-bloom.sh | 19 ++++++++++++++ 7 files changed, 102 insertions(+), 7 deletions(-) diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt index cff0797b54..4582c39fc4 100644 --- a/Documentation/config/commitgraph.txt +++ b/Documentation/config/commitgraph.txt @@ -1,3 +1,7 @@ +commitGraph.maxNewFilters:: + Specifies the default value for the `--max-new-filters` option of `git + commit-graph write` (c.f., linkgit:git-commit-graph[1]). + commitGraph.readChangedPaths:: If true, then git will use the changed-path Bloom filters in the commit-graph file (if it exists, and they are present). Defaults to diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index 17405c73a9..9c887d5d79 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -67,6 +67,10 @@ this option is given, future commit-graph writes will automatically assume that this option was intended. Use `--no-changed-paths` to stop storing this data. + +With the `--max-new-filters=` option, generate at most `n` new Bloom +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is +enforced. Overrides the `commitGraph.maxNewFilters` configuration. ++ With the `--split[=]` option, write the commit-graph as a chain of multiple commit-graph files stored in `/info/commit-graphs`. Commit-graph layers are merged based on the diff --git a/bloom.c b/bloom.c index ed54e96e57..8d07209c6b 100644 --- a/bloom.c +++ b/bloom.c @@ -51,6 +51,21 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, else start_index = 0; + if ((start_index == end_index) && + (g->bloom_large && !bitmap_get(g->bloom_large, lex_pos))) { + /* + * If the filter is zero-length, either (1) the filter has no + * changes, (2) the filter has too many changes, or (3) it + * wasn't computed (eg., due to '--max-new-filters'). + * + * If either (1) or (2) is the case, the 'large' bit will be set + * for this Bloom filter. If it is unset, then it wasn't + * computed. In that case, return nothing, since we don't have + * that filter in the graph. + */ + return 0; + } + filter->len = end_index - start_index; filter->data = (unsigned char *)(g->chunk_bloom_data + sizeof(unsigned char) * start_index + diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 38f5f57d15..3500a6e1f1 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -13,7 +13,8 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph verify [--object-dir ] [--shallow] [--[no-]progress]"), N_("git commit-graph write [--object-dir ] [--append] " "[--split[=]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] "), + "[--changed-paths] [--[no-]max-new-filters ] [--[no-]progress] " + ""), NULL }; @@ -25,7 +26,8 @@ static const char * const builtin_commit_graph_verify_usage[] = { static const char * const builtin_commit_graph_write_usage[] = { N_("git commit-graph write [--object-dir ] [--append] " "[--split[=]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] "), + "[--changed-paths] [--[no-]max-new-filters ] [--[no-]progress] " + ""), NULL }; @@ -162,6 +164,23 @@ static int read_one_commit(struct oidset *commits, struct progress *progress, return 0; } +static int write_option_max_new_filters(const struct option *opt, + const char *arg, + int unset) +{ + int *to = opt->value; + if (unset) + *to = -1; + else { + const char *s; + *to = strtol(arg, (char **)&s, 10); + if (*s) + return error(_("%s expects a numerical value"), + optname(opt, opt->flags)); + } + return 0; +} + static int graph_write(int argc, const char **argv) { struct string_list pack_indexes = STRING_LIST_INIT_NODUP; @@ -197,6 +216,9 @@ static int graph_write(int argc, const char **argv) N_("maximum ratio between two levels of a split commit-graph")), OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time, N_("only expire files older than a given date-time")), + OPT_CALLBACK_F(0, "max-new-filters", &write_opts.max_new_filters, + NULL, N_("maximum number of changed-path Bloom filters to compute"), + 0, write_option_max_new_filters), OPT_END(), }; @@ -205,6 +227,7 @@ static int graph_write(int argc, const char **argv) write_opts.size_multiple = 2; write_opts.max_commits = 0; write_opts.expire_time = 0; + write_opts.max_new_filters = -1; trace2_cmd_mode("write"); @@ -270,6 +293,16 @@ static int graph_write(int argc, const char **argv) return result; } +static int git_commit_graph_config(const char *var, const char *value, void *cb) +{ + if (!strcmp(var, "commitgraph.maxnewfilters")) { + write_opts.max_new_filters = git_config_int(var, value); + return 0; + } + + return git_default_config(var, value, cb); +} + int cmd_commit_graph(int argc, const char **argv, const char *prefix) { static struct option builtin_commit_graph_options[] = { @@ -283,7 +316,7 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix) usage_with_options(builtin_commit_graph_usage, builtin_commit_graph_options); - git_config(git_default_config, NULL); + git_config(git_commit_graph_config, &opts); argc = parse_options(argc, argv, prefix, builtin_commit_graph_options, builtin_commit_graph_usage, diff --git a/commit-graph.c b/commit-graph.c index 6886f319a5..4aae5471e3 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -954,7 +954,8 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit } static int get_bloom_filter_large_in_graph(struct commit_graph *g, - const struct commit *c) + const struct commit *c, + uint32_t max_changed_paths) { uint32_t graph_pos = commit_graph_position(c); if (graph_pos == COMMIT_NOT_FROM_GRAPH) @@ -965,6 +966,17 @@ static int get_bloom_filter_large_in_graph(struct commit_graph *g, if (!(g && g->bloom_large)) return 0; + if (g->bloom_filter_settings->max_changed_paths != max_changed_paths) { + /* + * Force all commits which are subject to a different + * 'max_changed_paths' limit to be recomputed from scratch. + * + * Note that this could likely be improved, but is ignored since + * all real-world graphs set the maximum number of changed paths + * at 512. + */ + return 0; + } return bitmap_get(g->bloom_large, graph_pos - g->num_commits_in_base); } @@ -1470,6 +1482,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) int i; struct progress *progress = NULL; int *sorted_commits; + int max_new_filters; init_bloom_filters(); ctx->bloom_large = bitmap_word_alloc(ctx->commits.nr / BITS_IN_EWORD + 1); @@ -1486,10 +1499,15 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->order_by_pack ? commit_pos_cmp : commit_gen_cmp, &ctx->commits); + max_new_filters = ctx->opts->max_new_filters >= 0 ? + ctx->opts->max_new_filters : ctx->commits.nr; + for (i = 0; i < ctx->commits.nr; i++) { int pos = sorted_commits[i]; struct commit *c = ctx->commits.list[pos]; - if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph, c)) { + if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph, + c, + ctx->bloom_settings->max_changed_paths)) { bitmap_set(ctx->bloom_large, pos); ctx->count_bloom_filter_known_large++; } else { @@ -1497,7 +1515,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) struct bloom_filter *filter = get_or_compute_bloom_filter( ctx->r, c, - 1, + ctx->count_bloom_filter_computed < max_new_filters, ctx->bloom_settings, &computed); if (computed) { @@ -1507,7 +1525,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->count_bloom_filter_found_large++; } } - ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; + if (filter) + ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; } display_progress(progress, i + 1); } diff --git a/commit-graph.h b/commit-graph.h index af08c4505d..75ef83708b 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -114,6 +114,7 @@ struct commit_graph_opts { int max_commits; timestamp_t expire_time; enum commit_graph_split_flags flags; + int max_new_filters; }; /* diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 6859d85369..3aab8ffbe3 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -286,4 +286,23 @@ test_expect_success 'Bloom generation does not recompute too-large filters' ' ) ' +test_expect_success 'Bloom generation is limited by --max-new-filters' ' + ( + cd limits && + test_commit c2 filter && + test_commit c3 filter && + test_commit c4 no-filter && + test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=2" \ + 2 0 2 + ) +' + +test_expect_success 'Bloom generation backfills previously-skipped filters' ' + ( + cd limits && + test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=1" \ + 2 0 1 + ) +' + test_done