From patchwork Wed Aug 5 17:01:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11701955 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1B6D814DD for ; Wed, 5 Aug 2020 17:03:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F2F5622D2B for ; Wed, 5 Aug 2020 17:03:00 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="I7XTi0Hn" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728384AbgHERCz (ORCPT ); Wed, 5 Aug 2020 13:02:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49142 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728412AbgHERBo (ORCPT ); Wed, 5 Aug 2020 13:01:44 -0400 Received: from mail-qt1-x844.google.com (mail-qt1-x844.google.com [IPv6:2607:f8b0:4864:20::844]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 75622C06174A for ; Wed, 5 Aug 2020 10:01:40 -0700 (PDT) Received: by mail-qt1-x844.google.com with SMTP id o22so33943522qtt.13 for ; Wed, 05 Aug 2020 10:01:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=1BiigwTLfuOeWHvfonIVOOTNqNm25Nmg66yT7L4YJ+U=; b=I7XTi0HnnUcsuAaicSZT6dQyoyNj2L3OXn+PjnmIgCAGaAFSvNP6pGK1LpmG1T/ohb vfh+mOcyxIH9MwmFRsn8ZDkRuJagYscvc2SB1lcbb8Y400a9FYPb2haRWWNbl8JvO6w8 J89u0B2ckQ2dqts7iKVVIJDZhoL6Q2U6bDHgIJBT2ZDRJe/+h2QUOnvegcB6omC3j5Gy 6LuPfRF6atIuxTYmIB0ZzU1Q0DVyki5P8u9thNAB37fYiMfbG4ISvsZ6LyxRbIeIRCnH KFUqiCCIcCSyt72FDHrDCooK+dKDYW1iwm4bkcyjyZzdTyB49c+IxOrCaQo9yakppUR0 JQ8A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=1BiigwTLfuOeWHvfonIVOOTNqNm25Nmg66yT7L4YJ+U=; b=CpvyMwThdHeJ9NS+iiki3WD3kRHHU7OffvJXCJCMK0JlpceFO+9TbPssK2pg3XuUiI ZtAvTLR2tIQoavj3QgF+jXVH8dWcZm/sAcSYbCF5RI2UjBx2xFy9yM/AQubcAenxpnIZ 99EAe0fBUemf57e5da1XR401c6+5x6B2ekWOvWk6cfw9M7OEODBqZ24VxTijzGatEvvU n+Oca1nGBEthy9hfzMH+TUs1PafP5GIIIsyd+hLXQiUJYPzxMV96fWy4nzVRlO2vHkRc uewTM+5zf4GTMM8F31KM9X9m82Ho+IISgAGPMWfWEBBiKbqJStZHRBQtDf6kL79eVUjg 1/vg== X-Gm-Message-State: AOAM530haYEP2IMmzwf0CzeeOLRRELwM+vPMz/oniIsrEuluP062yfBw Fa39l6QmP/ngv2sR4CFTMr92nduDWFF+fg== X-Google-Smtp-Source: ABdhPJzl0D33467qJQd+N2VlQO3ZfQXPUBkMVhfzgDSBq/+PuiqZsIuLL5ujpg9g+LR2aHlvWNzhjg== X-Received: by 2002:ac8:35f9:: with SMTP id l54mr4496932qtb.25.1596646899094; Wed, 05 Aug 2020 10:01:39 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:d118:9acc:fdba:dee7]) by smtp.gmail.com with ESMTPSA id t5sm2076562qkh.46.2020.08.05.10.01.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Aug 2020 10:01:38 -0700 (PDT) Date: Wed, 5 Aug 2020 13:01:37 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com Subject: [PATCH v2 01/14] commit-graph: introduce 'get_bloom_filter_settings()' Message-ID: <001f3385ffd96f24a640ff6670166e67784f1f50.1596646576.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Many places in the code often need a pointer to the commit-graph's 'struct bloom_filter_settings', in which case they often take the value from the top-most commit-graph. In the non-split case, this works as expected. In the split case, however, things get a little tricky. Not all layers in a chain of incremental commit-graphs are required to themselves have Bloom data, and so whether or not some part of the code uses Bloom filters depends entirely on whether or not the top-most level of the commit-graph chain has Bloom filters. This has been the behavior since Bloom filters were introduced, and has been codified into the tests since a759bfa9ee (t4216: add end to end tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130 requires that Bloom filters are not used in exactly the case described earlier. There is no reason that this needs to be the case, since it is perfectly valid for commits in an earlier layer to have Bloom filters when commits in a newer layer do not. Since Bloom settings are guaranteed to be the same for any layer in a chain that has Bloom data, it is sufficient to traverse the '->base_graph' pointer until either (1) a non-null 'struct bloom_filter_settings *' is found, or (2) until we are at the root of the commit-graph chain. Introduce a 'get_bloom_filter_settings()' function that does just this, and use it instead of purely dereferencing the top-most graph's '->bloom_filter_settings' pointer. While we're at it, add an additional test in t5324 to guard against code in the commit-graph writing machinery that doesn't correctly handle a NULL 'struct bloom_filter *'. Co-authored-by: Derrick Stolee Signed-off-by: Taylor Blau --- blame.c | 6 ++++-- bloom.c | 6 +++--- commit-graph.c | 11 +++++++++++ commit-graph.h | 2 ++ revision.c | 5 +---- t/t4216-log-bloom.sh | 9 ++++++--- t/t5324-split-commit-graph.sh | 13 +++++++++++++ 7 files changed, 40 insertions(+), 12 deletions(-) diff --git a/blame.c b/blame.c index 82fa16d658..3e5f8787bc 100644 --- a/blame.c +++ b/blame.c @@ -2891,16 +2891,18 @@ void setup_blame_bloom_data(struct blame_scoreboard *sb, const char *path) { struct blame_bloom_data *bd; + struct bloom_filter_settings *bs; if (!sb->repo->objects->commit_graph) return; - if (!sb->repo->objects->commit_graph->bloom_filter_settings) + bs = get_bloom_filter_settings(sb->repo); + if (!bs) return; bd = xmalloc(sizeof(struct blame_bloom_data)); - bd->settings = sb->repo->objects->commit_graph->bloom_filter_settings; + bd->settings = bs; bd->alloc = 4; bd->nr = 0; diff --git a/bloom.c b/bloom.c index 1a573226e7..cd9380ac62 100644 --- a/bloom.c +++ b/bloom.c @@ -38,7 +38,7 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, while (graph_pos < g->num_commits_in_base) g = g->base_graph; - /* The commit graph commit 'c' lives in doesn't carry bloom filters. */ + /* The commit graph commit 'c' lives in doesn't carry Bloom filters. */ if (!g->chunk_bloom_indexes) return 0; @@ -195,8 +195,8 @@ struct bloom_filter *get_bloom_filter(struct repository *r, if (!filter->data) { load_commit_graph_info(r, c); if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH && - r->objects->commit_graph->chunk_bloom_indexes) - load_bloom_filter_from_graph(r->objects->commit_graph, filter, c); + load_bloom_filter_from_graph(r->objects->commit_graph, filter, c)) + return filter; } if (filter->data) diff --git a/commit-graph.c b/commit-graph.c index e51c91dd5b..d4b06811be 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -660,6 +660,17 @@ int generation_numbers_enabled(struct repository *r) return !!first_generation; } +struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r) +{ + struct commit_graph *g = r->objects->commit_graph; + while (g) { + if (g->bloom_filter_settings) + return g->bloom_filter_settings; + g = g->base_graph; + } + return NULL; +} + static void close_commit_graph_one(struct commit_graph *g) { if (!g) diff --git a/commit-graph.h b/commit-graph.h index 09a97030dc..0677dd1031 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -87,6 +87,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); */ int generation_numbers_enabled(struct repository *r); +struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r); + enum commit_graph_write_flags { COMMIT_GRAPH_WRITE_APPEND = (1 << 0), COMMIT_GRAPH_WRITE_PROGRESS = (1 << 1), diff --git a/revision.c b/revision.c index 6de29cdf7a..c45ed1076e 100644 --- a/revision.c +++ b/revision.c @@ -681,10 +681,7 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs) repo_parse_commit(revs->repo, revs->commits->item); - if (!revs->repo->objects->commit_graph) - return; - - revs->bloom_filter_settings = revs->repo->objects->commit_graph->bloom_filter_settings; + revs->bloom_filter_settings = get_bloom_filter_settings(revs->repo); if (!revs->bloom_filter_settings) return; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index c21cc160f3..c9f9bdf1ba 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -60,7 +60,7 @@ setup () { test_bloom_filters_used () { log_args=$1 - bloom_trace_prefix="statistics:{\"filter_not_present\":0,\"maybe\"" + bloom_trace_prefix="statistics:{\"filter_not_present\":${2:-0},\"maybe\"" setup "$log_args" && grep -q "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.perf" && test_cmp log_wo_bloom log_w_bloom && @@ -134,8 +134,11 @@ test_expect_success 'setup - add commit-graph to the chain without Bloom filters test_line_count = 2 .git/objects/info/commit-graphs/commit-graph-chain ' -test_expect_success 'Do not use Bloom filters if the latest graph does not have Bloom filters.' ' - test_bloom_filters_not_used "-- A/B" +test_expect_success 'use Bloom filters even if the latest graph does not have Bloom filters' ' + # Ensure that the number of empty filters is equal to the number of + # filters in the latest graph layer to prove that they are loaded (and + # ignored). + test_bloom_filters_used "-- A/B" 3 ' test_expect_success 'setup - add commit-graph to the chain with Bloom filters' ' diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 9b850ea907..5bdfd53ef9 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -425,4 +425,17 @@ done <<\EOF 0600 -r-------- EOF +test_expect_success '--split=replace with partial Bloom data' ' + rm -rf $graphdir $infodir/commit-graph && + git reset --hard commits/3 && + git rev-list -1 HEAD~2 >a && + git rev-list -1 HEAD~1 >b && + git commit-graph write --split=no-merge --stdin-commits --changed-paths graph-files && + test_line_count = 1 graph-files && + verify_chain_files_exist $graphdir +' + test_done From patchwork Wed Aug 5 17:02:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11701957 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3BAD314DD for ; Wed, 5 Aug 2020 17:03:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 240DD22D00 for ; Wed, 5 Aug 2020 17:03:34 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="gVh8f808" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728323AbgHERDd (ORCPT ); Wed, 5 Aug 2020 13:03:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49248 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728426AbgHERCe (ORCPT ); Wed, 5 Aug 2020 13:02:34 -0400 Received: from mail-qv1-xf41.google.com (mail-qv1-xf41.google.com [IPv6:2607:f8b0:4864:20::f41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 325F1C061756 for ; Wed, 5 Aug 2020 10:02:23 -0700 (PDT) Received: by mail-qv1-xf41.google.com with SMTP id dd12so16037149qvb.0 for ; Wed, 05 Aug 2020 10:02:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=LMuh9652CGyiO0bmz3tWXIVImi3zNm3ghnf+7Y714js=; b=gVh8f808ZrRDafbEk9g5xKDZJtAi2/mpJtqbDsxNXIrBriGt97LW3OVwIsJDS8juLs kPHFv52B9v0daSNPqD4WL+xX2joDVH+s4baVHYQVOEqL32JTPt5zumLiTrHGeD3pQ1oo 3o6bmlTIjK8eKUIFtOby26TZB2h62ICVIvrNhs6tK4tEgn7L1kqudd5m9vmZbxtWItTn Y9mb019aVlVAeMOmJMAXXPb4DB54SpQUxGgyJAb8JnthrYJmgAVuw7a3pds8XbQnlAPi CQfS2NQs5sTOvHXzvKm39dtcamEmFANSwq9XipNRdqcghVJ4Hl0HheM6oVtRYQ65kPdC GaFw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=LMuh9652CGyiO0bmz3tWXIVImi3zNm3ghnf+7Y714js=; b=MA9TNZ6Flh+d9hsrT9TdvLWRe5MVqwe9/y85hZHRrEnxfdPjasshZgr/n1y2a0hCq5 KfcpM66Ooxcr0F9Sdy/fH8ySrmxPRDXusc/Sn+Bas8O/nhzHeE6GgqfulNApGsyCo2vD Cx/h4X0xiGR7L+JxsWsUEjqsAyM6SIQoTTYp+zU1yZKQVv2i61V+w4wWJ7V7IPd3aHpD wgahZPMOKRSSMAy2NdgI6lyXDJf7G4ostu3wsA7VxI+Y4bvwG8JOt7y90mDqZeVZZD0j ybEVz+i98ikhjMrMr8ufoguOtQYnHI7WQQWwLF4NoykXhyoa83xNaSVaZ/yHpy8AxtgV LxDg== X-Gm-Message-State: AOAM531aRupAYZSzRsaacnMo4B4Xlqg7hVzs6fPdFV0rZCu16DXyWpSK xNvysim8zP8J7WlEZwJL5VD/TFzJHvvXAQ== X-Google-Smtp-Source: ABdhPJx1Ed2atz6oUChitVSNE0d8ZSjPSNw7wnIAT0OPk+REZv1o1veFh8wUU9H9oEXNjsarnu6DZw== X-Received: by 2002:a0c:f14d:: with SMTP id y13mr4370811qvl.136.1596646941979; Wed, 05 Aug 2020 10:02:21 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:d118:9acc:fdba:dee7]) by smtp.gmail.com with ESMTPSA id r6sm2379403qtu.93.2020.08.05.10.02.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Aug 2020 10:02:21 -0700 (PDT) Date: Wed, 5 Aug 2020 13:02:20 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com Subject: [PATCH v2 02/14] t4216: use an '&&'-chain Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In a759bfa9ee (t4216: add end to end tests for git log with Bloom filters, 2020-04-06), a 'rm' invocation was added without a corresponding '&&' chain. When 'trace.perf' already exists, everything works fine. However, the function can be executed without 'trace.perf' on disk (eg., when the subset of tests run is altered with '--run'), and so the bare 'rm' complains about a missing file. To remove some noise from the test log, invoke 'rm' with '-f', at which point it is sensible to place the 'rm -f' in an '&&'-chain, which is both (1) our usual style, and (2) avoids a broken chain in the future if more commands are added at the beginning of the function. Helped-by: Eric Sunshine Helped-by: Jeff King Signed-off-by: Taylor Blau --- t/t4216-log-bloom.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index c9f9bdf1ba..fe19f6a60c 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -53,7 +53,7 @@ sane_unset GIT_TRACE2_PERF_BRIEF sane_unset GIT_TRACE2_CONFIG_PARAMS setup () { - rm "$TRASH_DIRECTORY/trace.perf" + rm -f "$TRASH_DIRECTORY/trace.perf" && git -c core.commitGraph=false log --pretty="format:%s" $1 >log_wo_bloom && GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.perf" git -c core.commitGraph=true log --pretty="format:%s" $1 >log_w_bloom } From patchwork Wed Aug 5 17:02:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11701973 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AB70F1392 for ; Wed, 5 Aug 2020 17:06:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8CE0222D3E for ; Wed, 5 Aug 2020 17:06:57 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="Co2v1TEr" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728465AbgHERGx (ORCPT ); Wed, 5 Aug 2020 13:06:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49260 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728428AbgHERCe (ORCPT ); Wed, 5 Aug 2020 13:02:34 -0400 Received: from mail-qk1-x741.google.com (mail-qk1-x741.google.com [IPv6:2607:f8b0:4864:20::741]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 30CB1C061757 for ; Wed, 5 Aug 2020 10:02:27 -0700 (PDT) Received: by mail-qk1-x741.google.com with SMTP id b14so40619219qkn.4 for ; Wed, 05 Aug 2020 10:02:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=V9PKuOid9ZDCDhUVkEnzx8062h6610ZwJ92cJXhzY/I=; b=Co2v1TEr9/pqXSHysB0JdL9BSJoqGYx6Bya//wt04JYzjnb64GfcQsCtnAoz5aeS1o Vdz/CcDRZZ4lVU9/n2SBTj/dr61EwTidbnBspqYeiFI2M/Y2Zz7zKQWSaxUERPq4lwAT VBvbKJULIKrl0zHkU8i8kwtQmX6UCCdWMFdTaT46tX15tnFr48qAX0d3UrA/m1RW+E36 B68mOsw4fe2rYHBsQCmLgyMh/wi3gKcTx4wFwRxP7qW+0OjebghBQZ3uNi93DT6x5kYx r0HXaoF8eBAgw9hlOVQv9Baxqe0E1g/Oz3Wvlbkj8jiMiBlCuWNcCy/zgsb71QLPoKEa P39g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=V9PKuOid9ZDCDhUVkEnzx8062h6610ZwJ92cJXhzY/I=; b=rRSvcK7Mu6M08MZIjit4G5z/0sgQFH9y8C5LElQmatLrNcWwF+FhB1Dx4QaW7nW7Om wDBflZfrviMCbePOv4qlb3yHsY6DuIgEdwvKxtY/WPjSfgdBUFO7uvCSSEI+V/C85xhm QjLsbwBPNIFBDSf2OI4jwyS4i19Mc0ZjKeYoDbz4HUlPB8DDhPxgTk/6FurxpwGbhzrX QDtTVctlmQFq9aIGUwci+ETyYAevGD1qV0aVcRtB8rxy73IINQyQ+QGHwiPJDV8Dw4tB AxNmc6cqlztLSKWywHILxgE7XU85GmtlYtvmMN49YIRtCP+nvehQrAdrrj20RwSQP3R0 Kl5A== X-Gm-Message-State: AOAM532E9fTSu1m6HN1HA/7IFWoaffzyzWCtO5MjQWr7oXK8p/ywRBGI R6ZcF9Di2bd6LX+pj710bahtiVje3/+opw== X-Google-Smtp-Source: ABdhPJzDRU7WXLPhli6dagIvtv8FzjtJye2ktoBOKSXVCEgCKZQi7FGhKYq6ueIJOjRsD/wpoLCV3w== X-Received: by 2002:a37:9945:: with SMTP id b66mr4388678qke.51.1596646945953; Wed, 05 Aug 2020 10:02:25 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:d118:9acc:fdba:dee7]) by smtp.gmail.com with ESMTPSA id c42sm2907358qte.5.2020.08.05.10.02.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Aug 2020 10:02:25 -0700 (PDT) Date: Wed, 5 Aug 2020 13:02:24 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com Subject: [PATCH v2 03/14] commit-graph: pass a 'struct repository *' in more places Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In a future commit, some commit-graph internals will want access to 'r->settings', but we only have the 'struct object_directory *' corresponding to that repository. Add an additional parameter to pass the repository around in more places. Signed-off-by: Taylor Blau --- builtin/commit-graph.c | 2 +- commit-graph.c | 17 ++++++++++------- commit-graph.h | 6 ++++-- fuzz-commit-graph.c | 5 +++-- 4 files changed, 18 insertions(+), 12 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 523501f217..ba5584463f 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -106,7 +106,7 @@ static int graph_verify(int argc, const char **argv) FREE_AND_NULL(graph_name); if (open_ok) - graph = load_commit_graph_one_fd_st(fd, &st, odb); + graph = load_commit_graph_one_fd_st(the_repository, fd, &st, odb); else graph = read_commit_graph_one(the_repository, odb); diff --git a/commit-graph.c b/commit-graph.c index d4b06811be..0c1030641c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -224,7 +224,8 @@ int open_commit_graph(const char *graph_file, int *fd, struct stat *st) return 1; } -struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, +struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, + int fd, struct stat *st, struct object_directory *odb) { void *graph_map; @@ -240,7 +241,7 @@ struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, } graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd); - ret = parse_commit_graph(graph_map, graph_size); + ret = parse_commit_graph(r, graph_map, graph_size); if (ret) ret->odb = odb; @@ -280,7 +281,8 @@ static int verify_commit_graph_lite(struct commit_graph *g) return 0; } -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size) +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size) { const unsigned char *data, *chunk_lookup; uint32_t i; @@ -445,7 +447,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size) return NULL; } -static struct commit_graph *load_commit_graph_one(const char *graph_file, +static struct commit_graph *load_commit_graph_one(struct repository *r, + const char *graph_file, struct object_directory *odb) { @@ -457,7 +460,7 @@ static struct commit_graph *load_commit_graph_one(const char *graph_file, if (!open_ok) return NULL; - g = load_commit_graph_one_fd_st(fd, &st, odb); + g = load_commit_graph_one_fd_st(r, fd, &st, odb); if (g) g->filename = xstrdup(graph_file); @@ -469,7 +472,7 @@ static struct commit_graph *load_commit_graph_v1(struct repository *r, struct object_directory *odb) { char *graph_name = get_commit_graph_filename(odb); - struct commit_graph *g = load_commit_graph_one(graph_name, odb); + struct commit_graph *g = load_commit_graph_one(r, graph_name, odb); free(graph_name); return g; @@ -550,7 +553,7 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, valid = 0; for (odb = r->objects->odb; odb; odb = odb->next) { char *graph_name = get_split_graph_filename(odb, line.buf); - struct commit_graph *g = load_commit_graph_one(graph_name, odb); + struct commit_graph *g = load_commit_graph_one(r, graph_name, odb); free(graph_name); diff --git a/commit-graph.h b/commit-graph.h index 0677dd1031..d9acb22bac 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -75,11 +75,13 @@ struct commit_graph { struct bloom_filter_settings *bloom_filter_settings; }; -struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, +struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, + int fd, struct stat *st, struct object_directory *odb); struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb); -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size); /* * Return 1 if and only if the repository has a commit-graph diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c index 430817214d..e7cf6d5b0f 100644 --- a/fuzz-commit-graph.c +++ b/fuzz-commit-graph.c @@ -1,7 +1,8 @@ #include "commit-graph.h" #include "repository.h" -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size); int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); @@ -10,7 +11,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) struct commit_graph *g; initialize_the_repository(); - g = parse_commit_graph((void *)data, size); + g = parse_commit_graph(the_repository, (void *)data, size); repo_clear(the_repository); free_commit_graph(g); From patchwork Wed Aug 5 17:02:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11701981 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6BC8814DD for ; Wed, 5 Aug 2020 17:07:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 56D4E22D3E for ; Wed, 5 Aug 2020 17:07:44 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="mu0anZHr" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728477AbgHERHJ (ORCPT ); Wed, 5 Aug 2020 13:07:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49272 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728430AbgHERCf (ORCPT ); Wed, 5 Aug 2020 13:02:35 -0400 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CD010C06179E for ; Wed, 5 Aug 2020 10:02:30 -0700 (PDT) Received: by mail-qt1-x842.google.com with SMTP id e5so20167109qth.5 for ; Wed, 05 Aug 2020 10:02:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=AtctLu256av06ik5IHDWiQKbEbZ8S9fTFgXOJvN+Mvg=; b=mu0anZHrAUBYnXrweJLqPjOiAxQc02RceY2O+269/Cgpl5P4CtCYThG/Vfw5fkhGpl OjpyE5jqxQkTrXpnG7QxncFN8yeGVsPGP3y5r3KE/tLhK1Ucy7RnlUNGNr9aAkkOpeOQ Op0nPK8ySsrI7fF/R4BN5/79fzqpgMT3fMwPs8T70TPL+j7KXuMW9/zPoeq4fM9ONCTj gWkBxSFHV2cmkCvkrafEy/RuTMx9nNDq/RCkis+4dY+sBnrUiJEHw05uvVV1Jd83UVut geWHPLQsoy1d1Bgu2jN8qz41iFmRFic+TbactpwzkwYDUFF2OlrdqchIdqPhowQd2DCd tPJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=AtctLu256av06ik5IHDWiQKbEbZ8S9fTFgXOJvN+Mvg=; b=L0ROvhzIwEfsYNhcpJVZ5HpdyYp6tRlCgz/j+ttPCTSaBHs6buxkSz6Ng4bHQMBmSR VHzkpp4O2RbYYT5zHHiXFWt+d/R75d5fiT3M3Esr/7sRWL958qwhxeJMQjOzuJomOA6J g69GtnpR3HxEFazW6Oj3cz+VnfQyr77GPYF2Ld8cGUhaxzWkbQUwxYmL9co8kBGISJwn HC3K/Ewox75EjfwNY3T0+JFFNtcr8B7xMzMIgvkOScYpGXnsY7C9cF4nonjzp2qU/WGp PMth8IjhCttzRxL1rt0mdgXPUvBCJN83nR879vUhnG4I7iHPr/JYYsxDR5IgdrXxMhlD CzvQ== X-Gm-Message-State: AOAM533jEX275BAvRb0Zu/1hmNWlH3OldHK43MhxpUpH8wXGpCzCqhLo wqVz7wdX90JN5q9w5WJKPs5jRzqWPMJvhw== X-Google-Smtp-Source: ABdhPJzXTfocLWlEWnlDDFqRYsrmZdLbyeSWk/S6WImBC60CYUYs7kjVzCNaukLPfiPoyu6v8mHblA== X-Received: by 2002:ac8:74c7:: with SMTP id j7mr4385355qtr.254.1596646949619; Wed, 05 Aug 2020 10:02:29 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:d118:9acc:fdba:dee7]) by smtp.gmail.com with ESMTPSA id b131sm2094753qkc.121.2020.08.05.10.02.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Aug 2020 10:02:28 -0700 (PDT) Date: Wed, 5 Aug 2020 13:02:28 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com Subject: [PATCH v2 04/14] t/helper/test-read-graph.c: prepare repo settings Message-ID: <038e996ced8b7c6418894a791caeb7f80b2456be.1596646576.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The read-graph test-tool is used by a number of the commit-graph test to assert various properties about a commit-graph. Previously, this program never ran 'prepare_repo_settings()'. There was no need to do so, since none of the commit-graph machinery is affected by the repo settings. In the next patch, the commit-graph machinery's behavior will become dependent on the repo settings, and so loading them before running the rest of the test tool is critical. As such, teach the test tool to call 'prepare_repo_settings()'. Signed-off-by: Taylor Blau --- t/helper/test-read-graph.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c index 6d0c962438..5f585a1725 100644 --- a/t/helper/test-read-graph.c +++ b/t/helper/test-read-graph.c @@ -12,11 +12,12 @@ int cmd__read_graph(int argc, const char **argv) setup_git_directory(); odb = the_repository->objects->odb; + prepare_repo_settings(the_repository); + graph = read_commit_graph_one(the_repository, odb); if (!graph) return 1; - printf("header: %08x %d %d %d %d\n", ntohl(*(uint32_t*)graph->data), *(unsigned char*)(graph->data + 4), From patchwork Wed Aug 5 17:02:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11701979 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 493871392 for ; Wed, 5 Aug 2020 17:07:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 32A8322D2C for ; Wed, 5 Aug 2020 17:07:44 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="QsJmF3Tq" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728471AbgHERG6 (ORCPT ); Wed, 5 Aug 2020 13:06:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49284 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728205AbgHERCf (ORCPT ); Wed, 5 Aug 2020 13:02:35 -0400 Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3BE6EC06179F for ; Wed, 5 Aug 2020 10:02:35 -0700 (PDT) Received: by mail-qk1-x743.google.com with SMTP id d14so42306358qke.13 for ; Wed, 05 Aug 2020 10:02:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=OaOZ9gT79tjrUv9cgKcQTqD6M81VIYGGJlB88j6wVR4=; b=QsJmF3TqO3+FdViREG1kQ7zx/n/d9hEhMxCTGYk1kiiZQgs2cBFQW9vTbWKMVqq7Dd DbVi4a7PE7sfuoUcoww9XniwbIH24J8TyHoiv8fDssllyKA8X6jD5NN577UisIkcpIPz ZYqhREYsEyGivFfFYS20ZPsGgzMK91u1Jn/C99M2IMlxMirlXlkT0D4hIxGoAIR2ufbh 4lmD6wrxzK9lOLqdbG4DLfKBM5gPE8Dg3RkHBqbbiBvigsUn39xiUjiFBBVlW6pphIId /JgjjyBjWhF9ORReZEhPf526DPpnWhkxi7OveYg/B+g2KsCEFZTlY9Y8MaZYrBUFLO5U WALw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=OaOZ9gT79tjrUv9cgKcQTqD6M81VIYGGJlB88j6wVR4=; b=CR3LuNVmkKJuODgs2jXcxiCfRhLww1h+3ixWJ4Pxm37hCmyNqQeRUMNjsSxekCooBf hAR7bZuPKnWTGPS6+PCMH468OdCQvwG9B9+Z7FOGiXJpsReFFtKgsK3idmeLgeTCKISY b8ZLWPwwanYGhb5gkM13JuTkQJfHkzi0H1iKc/sMyuiQ6jxMdkRf/V18xMYsUhnGkDOV FCYSb1SgvxTCZI7XRqvez5ufwR1T4DdcaiWWGMCrvZASSllB9aCYTD1KbX2l+JI+/NAS HIgdWeaGTuATeMfqoHaJBOQxOT+m/ZY1zCuN3zkoCbrnkN6CwKEHQ5i8Fty4CQnQsYpb XQDg== X-Gm-Message-State: AOAM532QDBFaUnau7r52k0oSMG13CuDpo++SDMR30ECjin9U8qVhPVwx 1m1lBzMfTI76HcHe3j/lDaDekhUhEShywQ== X-Google-Smtp-Source: ABdhPJw2o+SikMVAdIGV0T6Qtd0dwCuhkovC6XNZ90x7eSqZCWoXmA4zyQTJZw9+rMEWRmQz8PTwvA== X-Received: by 2002:a37:3c1:: with SMTP id 184mr4262189qkd.312.1596646953788; Wed, 05 Aug 2020 10:02:33 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:d118:9acc:fdba:dee7]) by smtp.gmail.com with ESMTPSA id g4sm1953376qka.25.2020.08.05.10.02.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Aug 2020 10:02:33 -0700 (PDT) Date: Wed, 5 Aug 2020 13:02:31 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com Subject: [PATCH v2 05/14] commit-graph: respect 'commitGraph.readChangedPaths' Message-ID: <404f10319af210a2edf05bd1097ab5a844cc5c37.1596646576.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Git uses the 'core.commitGraph' configuration value to control whether or not the commit graph is used when parsing commits or performing a traversal. Now that commit-graphs can also contain a section for changed-path Bloom filters, administrators that already have commit-graphs may find it convenient to use those graphs without relying on their changed-path Bloom filters. This can happen, for example, during a staged roll-out, or in the event of an incident. Introduce 'commitGraph.readChangedPaths' to control whether or not Bloom filters are read. Note that this configuration is independent from both: - 'core.commitGraph', to allow flexibility in using all parts of a commit-graph _except_ for its Bloom filters. - The '--changed-paths' option for 'git commit-graph write', to allow reading and writing Bloom filters to be controlled independently. When the variable is set, pretend as if no Bloom data was specified at all. This avoids adding additional special-casing outside of the commit-graph internals. Suggested-by: Derrick Stolee Signed-off-by: Taylor Blau --- Documentation/config.txt | 2 ++ Documentation/config/commitgraph.txt | 4 ++++ commit-graph.c | 6 ++++-- repo-settings.c | 3 +++ repository.h | 1 + t/t4216-log-bloom.sh | 4 +++- 6 files changed, 17 insertions(+), 3 deletions(-) create mode 100644 Documentation/config/commitgraph.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index ef0768b91a..78883c6e63 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -340,6 +340,8 @@ include::config/column.txt[] include::config/commit.txt[] +include::config/commitgraph.txt[] + include::config/credential.txt[] include::config/completion.txt[] diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt new file mode 100644 index 0000000000..cff0797b54 --- /dev/null +++ b/Documentation/config/commitgraph.txt @@ -0,0 +1,4 @@ +commitGraph.readChangedPaths:: + If true, then git will use the changed-path Bloom filters in the + commit-graph file (if it exists, and they are present). Defaults to + true. See linkgit:git-commit-graph[1] for more information. diff --git a/commit-graph.c b/commit-graph.c index 0c1030641c..a516e93d71 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -320,6 +320,8 @@ struct commit_graph *parse_commit_graph(struct repository *r, return NULL; } + prepare_repo_settings(r); + graph = alloc_commit_graph(); graph->hash_len = the_hash_algo->rawsz; @@ -396,14 +398,14 @@ struct commit_graph *parse_commit_graph(struct repository *r, case GRAPH_CHUNKID_BLOOMINDEXES: if (graph->chunk_bloom_indexes) chunk_repeated = 1; - else + else if (r->settings.commit_graph_read_changed_paths) graph->chunk_bloom_indexes = data + chunk_offset; break; case GRAPH_CHUNKID_BLOOMDATA: if (graph->chunk_bloom_data) chunk_repeated = 1; - else { + else if (r->settings.commit_graph_read_changed_paths) { uint32_t hash_version; graph->chunk_bloom_data = data + chunk_offset; hash_version = get_be32(data + chunk_offset); diff --git a/repo-settings.c b/repo-settings.c index 0918408b34..9e551bc03d 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -17,9 +17,12 @@ void prepare_repo_settings(struct repository *r) if (!repo_config_get_bool(r, "core.commitgraph", &value)) r->settings.core_commit_graph = value; + if (!repo_config_get_bool(r, "commitgraph.readchangedpaths", &value)) + r->settings.commit_graph_read_changed_paths = value; if (!repo_config_get_bool(r, "gc.writecommitgraph", &value)) r->settings.gc_write_commit_graph = value; UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1); + UPDATE_DEFAULT_BOOL(r->settings.commit_graph_read_changed_paths, 1); UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1); if (!repo_config_get_int(r, "index.version", &value)) diff --git a/repository.h b/repository.h index 3c1f7d54bd..81759b7d27 100644 --- a/repository.h +++ b/repository.h @@ -29,6 +29,7 @@ struct repo_settings { int initialized; int core_commit_graph; + int commit_graph_read_changed_paths; int gc_write_commit_graph; int fetch_write_commit_graph; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index fe19f6a60c..b3d1f596f8 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -90,7 +90,9 @@ do "--ancestry-path side..master" do test_expect_success "git log option: $option for path: $path" ' - test_bloom_filters_used "$option -- $path" + test_bloom_filters_used "$option -- $path" && + test_config commitgraph.readChangedPaths false && + test_bloom_filters_not_used "$option -- $path" ' done done From patchwork Wed Aug 5 17:02:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11701977 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 47FC214B7 for ; Wed, 5 Aug 2020 17:07:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2CF5E22D6D for ; Wed, 5 Aug 2020 17:07:36 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="s+PET7tU" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728498AbgHERHM (ORCPT ); Wed, 5 Aug 2020 13:07:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49296 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728018AbgHERCj (ORCPT ); Wed, 5 Aug 2020 13:02:39 -0400 Received: from mail-qk1-x742.google.com (mail-qk1-x742.google.com [IPv6:2607:f8b0:4864:20::742]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CDB4DC0617A0 for ; Wed, 5 Aug 2020 10:02:38 -0700 (PDT) Received: by mail-qk1-x742.google.com with SMTP id b14so40619874qkn.4 for ; Wed, 05 Aug 2020 10:02:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=6laf6kKuVSjfW1rY/Bcz72cLLx2hYo5Wjv1pbceIgxc=; b=s+PET7tU0V/gdAaE1BJgYK0l+pqPniAOOfnPeAeMUXT+iLPnPPFrm/mTx2uj1cbXjU HOiPGsZ8lO0IMo0it1+H38I4q3Z9Meu/bxGNM+oKA+6cfrz+g36ZqFQ/I0C9/kaBKgsP 8By91qMHkcQY0wNhtBUmwhv91v/2hqYZyiUzLwA5+7s9kzV59W3PbaU5gkJfDjEe4zCV 6t32vljVZaAWaG//E338E3oMbNqeG37Qg99jJTofB+TbzzlOeJxRztOhN31Al4OmOhbk 0a7lixbc1wZtU5cncG3P7sg52FFt1EBM8ZsiJS4NURRwXS68WtHuAU+OozTDSgWtLGF+ T94w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=6laf6kKuVSjfW1rY/Bcz72cLLx2hYo5Wjv1pbceIgxc=; b=Dwrplm99uFR8IWzhIadsmkAHg6PuMDQpExFEDRluxzvtwmSifrMggQ9YxpuV0UQ+Qk +wIFaUqeTM6dXeHqUyOME1bK7rnS5PN1csl8btrA8/C1xE5xDg7Gdc75KDKwLlWNrrVm fbolf+NDu6g+K7A4IhFybZJzY0uZ4b30i+lRSOW2p3VU372gMpoDRbNqNJS3Mc/cbuyw dNvtOtOAKdvQubB3UE+SDxnkGH6F3C5TxRomOc4jJO0Mxwxd1BKUWCloiQ3okma4hoky +ec/44XeD+fddEKnUt84OPgngrBHSIPbwAUe2RL5xO4zoJs/GVe7ZedUeMkwfife2ybu /s5A== X-Gm-Message-State: AOAM531jX0+4VnF6FAqDAdg7Esa34oY99eRL6rK2guv1J084S9obL6+2 qVCiOTGlNdG9qaMNCH1Oqu1NN4hZ26a14A== X-Google-Smtp-Source: ABdhPJzT5r2qX3W6GC0DgSmU/3yxYgTV3tTlHds8BElLCD70v/NXfHAKuaounjx9DDIg0fTQlD/2xA== X-Received: by 2002:a37:6556:: with SMTP id z83mr4038991qkb.406.1596646957499; Wed, 05 Aug 2020 10:02:37 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:d118:9acc:fdba:dee7]) by smtp.gmail.com with ESMTPSA id f14sm1046872qkl.52.2020.08.05.10.02.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Aug 2020 10:02:36 -0700 (PDT) Date: Wed, 5 Aug 2020 13:02:35 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com Subject: [PATCH v2 06/14] commit-graph.c: store maximum changed paths Message-ID: <053991f0482b248c3c5fbaf604edfab796ac2540.1596646576.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org For now, we assume that there is a fixed constant describing the maximum number of changed paths we are willing to store in a Bloom filter. Prepare for that to (at least partially) not be the case by making it a member of the 'struct bloom_filter_settings'. This will be helpful in the subsequent patches by reducing the size of test cases that exercise storing too many changed paths, as well as preparing for an eventual future in which this value might change. This patch alone does not cause newly generated Bloom filters to use a custom upper-bound on the maximum number of changed paths a single Bloom filter can hold, that will occur in a later patch. Signed-off-by: Taylor Blau --- bloom.h | 11 ++++++++++- commit-graph.c | 3 +++ t/t4216-log-bloom.sh | 4 ++-- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/bloom.h b/bloom.h index d8fbb0fbf1..0b9b59a6fe 100644 --- a/bloom.h +++ b/bloom.h @@ -28,9 +28,18 @@ struct bloom_filter_settings { * that contain n*b bits. */ uint32_t bits_per_entry; + + /* + * The maximum number of changed paths per commit + * before declaring a Bloom filter to be too-large. + * + * Not written to the commit-graph file. + */ + uint32_t max_changed_paths; }; -#define DEFAULT_BLOOM_FILTER_SETTINGS { 1, 7, 10 } +#define DEFAULT_BLOOM_MAX_CHANGES 512 +#define DEFAULT_BLOOM_FILTER_SETTINGS { 1, 7, 10, DEFAULT_BLOOM_MAX_CHANGES } #define BITS_PER_WORD 8 #define BLOOMDATA_CHUNK_HEADER_SIZE 3 * sizeof(uint32_t) diff --git a/commit-graph.c b/commit-graph.c index a516e93d71..86dd4b979e 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1194,6 +1194,7 @@ static void trace2_bloom_filter_settings(struct write_commit_graph_context *ctx) jw_object_intmax(&jw, "hash_version", ctx->bloom_settings->hash_version); jw_object_intmax(&jw, "num_hashes", ctx->bloom_settings->num_hashes); jw_object_intmax(&jw, "bits_per_entry", ctx->bloom_settings->bits_per_entry); + jw_object_intmax(&jw, "max_changed_paths", ctx->bloom_settings->max_changed_paths); jw_end(&jw); trace2_data_json("bloom", ctx->r, "settings", &jw); @@ -1662,6 +1663,8 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) bloom_settings.bits_per_entry); bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", bloom_settings.num_hashes); + bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", + bloom_settings.max_changed_paths); ctx->bloom_settings = &bloom_settings; } diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index b3d1f596f8..eb2bcc51f0 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -169,11 +169,11 @@ test_expect_success 'persist filter settings' ' GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=9 \ GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY=15 \ git commit-graph write --reachable --changed-paths && - grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2.txt && + grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15" trace2.txt && GIT_TRACE2_EVENT="$(pwd)/trace2-auto.txt" \ GIT_TRACE2_EVENT_NESTING=5 \ git commit-graph write --reachable --changed-paths && - grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2-auto.txt + grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15" trace2-auto.txt ' test_expect_success 'correctly report changes over limit' ' From patchwork Wed Aug 5 17:02:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11701965 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2EE0D1510 for ; Wed, 5 Aug 2020 17:06:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0B85022D00 for ; Wed, 5 Aug 2020 17:06:12 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="G7yN9MVf" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728488AbgHERFk (ORCPT ); Wed, 5 Aug 2020 13:05:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49314 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728103AbgHERCq (ORCPT ); Wed, 5 Aug 2020 13:02:46 -0400 Received: from mail-qk1-x72c.google.com (mail-qk1-x72c.google.com [IPv6:2607:f8b0:4864:20::72c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 900C4C0617A1 for ; Wed, 5 Aug 2020 10:02:45 -0700 (PDT) Received: by mail-qk1-x72c.google.com with SMTP id p25so1843563qkp.2 for ; Wed, 05 Aug 2020 10:02:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=XxGYE6AhgiSWhnfGqWbnFIHNGIk0jQ+QaGHE5qSqdW0=; b=G7yN9MVfkSc7R4xHI2aa8qC2Lsi0x4j0F/QnacjUk/IBPFyiy/fP2OPPbd3Qt2M/OB VQ6e5F3Q4JxkrFHTOgAg9klwttU9quDPOZOfxup9u+XoXfHcLqyz6AM26hMr8VbmeUzV QsNsAc450/SNyNccc3JZp05hHm/qI01iFUnj7DDuIs+7dKvi1aOso9GsmHJSU5uEI7NC CzbZwVXQNv1/hBq/Kx1nqS/aDqDieKpzsSvDRksmYy3Yr6RVTVEAvcikEyfVv88gdZjb XrNBjR5lzzKboBz73f06/ezWQzK6Pk1xS6kteKqbAva10wgp17OlOjubdquxADQ2x0Fr WOeA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=XxGYE6AhgiSWhnfGqWbnFIHNGIk0jQ+QaGHE5qSqdW0=; b=OGcagy7Vtld65PxITOMMP+DAx+FB8gfNKkLvPdM9eqluo1iQnMgMMEnACVT4QEOYGd guTGhTrIwXetE6ncFha3CXUJ+nq/OLyuhw65IDzaWRmgSOD4aUCzCIkqb17gE0nLuBWp E/YGaDA3hj9FwSaqu5MM/F0xSu/PUMonJzhDVa4oaQNy8xvyydhapTTWjBU4B64dWtvb uWiYmFTCajdI0PW+0Eoyd20qmlZX1jreyyK15gPS223wISnqC6qCuZ4HiNRNhYwiKDwJ kCnjrn2f8A8MmQX9GLQbGssvwiHGbJNd6STTPTH8u4LEOA6jELhNNdi4t6uVNdRE60p2 brvg== X-Gm-Message-State: AOAM530AamIJfSLuHrTHUJtpQiUyk5s5viGBE2VNLY4N2tKsjHIa59dd EsJkHpgUWKikky04/8tIK+suDd4yBNKX0g== X-Google-Smtp-Source: ABdhPJzHTuw140ab2fpleGbfGQqzNKr/tRewpnlsmBMsvdeLG7qv7NyZeoGiKqfvPPRUarb9zrUBkA== X-Received: by 2002:a37:a503:: with SMTP id o3mr4214761qke.162.1596646961228; Wed, 05 Aug 2020 10:02:41 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:d118:9acc:fdba:dee7]) by smtp.gmail.com with ESMTPSA id l45sm1105974qtf.11.2020.08.05.10.02.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Aug 2020 10:02:40 -0700 (PDT) Date: Wed, 5 Aug 2020 13:02:39 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com Subject: [PATCH v2 07/14] bloom: split 'get_bloom_filter()' in two Message-ID: <23525947c8c3bf9fc5faf59d9a7b6da65cec4e12.1596646576.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org 'get_bloom_filter' takes a flag to control whether it will compute a Bloom filter if the requested one is missing. In the next patch, we'll add yet another parameter to this method, which would force all but one caller to specify an extra 'NULL' parameter at the end. Instead of doing this, split 'get_bloom_filter' into two functions: 'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only looks up a Bloom filter (and does not compute one if it's missing, thus dropping the 'compute_if_not_present' flag). The latter does compute missing Bloom filters, with an additional parameter to store whether or not it needed to do so. This simplifies many call-sites, since the majority of existing callers to 'get_bloom_filter' do not want missing Bloom filters to be computed (so they can drop the parameter entirely and use the simpler version of the function). While we're at it, instrument the new 'get_or_compute_bloom_filter()' with two counters in the 'write_commit_graph_context' struct which store the number of filters that we computed, and the number of those which were too large to store. It would be nice to drop the 'compute_if_not_present' flag entirely, since all remaining callers of 'get_or_compute_bloom_filter' pass it as '1', but this will change in a future patch and hence cannot be removed. Signed-off-by: Taylor Blau --- blame.c | 2 +- bloom.c | 13 ++++++++++--- bloom.h | 11 ++++++++--- commit-graph.c | 38 +++++++++++++++++++++++++++++++++++--- line-log.c | 2 +- revision.c | 2 +- t/helper/test-bloom.c | 3 ++- 7 files changed, 58 insertions(+), 13 deletions(-) diff --git a/blame.c b/blame.c index 3e5f8787bc..756285fca7 100644 --- a/blame.c +++ b/blame.c @@ -1275,7 +1275,7 @@ static int maybe_changed_path(struct repository *r, if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY) return 1; - filter = get_bloom_filter(r, origin->commit, 0); + filter = get_bloom_filter(r, origin->commit); if (!filter) return 1; diff --git a/bloom.c b/bloom.c index cd9380ac62..a8a21762f4 100644 --- a/bloom.c +++ b/bloom.c @@ -177,9 +177,10 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data, return strcmp(e1->path, e2->path); } -struct bloom_filter *get_bloom_filter(struct repository *r, - struct commit *c, - int compute_if_not_present) +struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, + struct commit *c, + int compute_if_not_present, + int *computed) { struct bloom_filter *filter; struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS; @@ -187,6 +188,9 @@ struct bloom_filter *get_bloom_filter(struct repository *r, struct diff_options diffopt; int max_changes = 512; + if (computed) + *computed = 0; + if (!bloom_filters.slab_size) return NULL; @@ -273,6 +277,9 @@ struct bloom_filter *get_bloom_filter(struct repository *r, filter->len = 0; } + if (computed) + *computed = 1; + free(diff_queued_diff.queue); DIFF_QUEUE_CLEAR(&diff_queued_diff); diff --git a/bloom.h b/bloom.h index 0b9b59a6fe..c4107a6415 100644 --- a/bloom.h +++ b/bloom.h @@ -89,9 +89,14 @@ void add_key_to_filter(const struct bloom_key *key, void init_bloom_filters(void); -struct bloom_filter *get_bloom_filter(struct repository *r, - struct commit *c, - int compute_if_not_present); +struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, + struct commit *c, + int compute_if_not_present, + int *computed); + +#define DEFAULT_BLOOM_MAX_CHANGES 512 +#define get_bloom_filter(r, c) get_or_compute_bloom_filter( \ + (r), (c), 0, NULL) int bloom_filter_contains(const struct bloom_filter *filter, const struct bloom_key *key, diff --git a/commit-graph.c b/commit-graph.c index 86dd4b979e..ba2a2cfb22 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -964,6 +964,9 @@ struct write_commit_graph_context { const struct split_commit_graph_opts *split_opts; size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; + + int count_bloom_filter_found_large; + int count_bloom_filter_computed; }; static int write_graph_chunk_fanout(struct hashfile *f, @@ -1175,7 +1178,7 @@ static int write_graph_chunk_bloom_indexes(struct hashfile *f, uint32_t cur_pos = 0; while (list < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0); + struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); size_t len = filter ? filter->len : 0; cur_pos += len; display_progress(ctx->progress, ++ctx->progress_cnt); @@ -1215,7 +1218,7 @@ static int write_graph_chunk_bloom_data(struct hashfile *f, hashwrite_be32(f, ctx->bloom_settings->bits_per_entry); while (list < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0); + struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); size_t len = filter ? filter->len : 0; display_progress(ctx->progress, ++ctx->progress_cnt); @@ -1385,6 +1388,22 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) stop_progress(&ctx->progress); } +static void trace2_bloom_filter_write_statistics(struct write_commit_graph_context *ctx) +{ + struct json_writer jw = JSON_WRITER_INIT; + + jw_object_begin(&jw, 0); + jw_object_intmax(&jw, "filter_found_large", + ctx->count_bloom_filter_found_large); + jw_object_intmax(&jw, "filter_computed", + ctx->count_bloom_filter_computed); + jw_end(&jw); + + trace2_data_json("commit-graph", the_repository, "bloom_statistics", &jw); + + jw_release(&jw); +} + static void compute_bloom_filters(struct write_commit_graph_context *ctx) { int i; @@ -1407,12 +1426,25 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp); for (i = 0; i < ctx->commits.nr; i++) { + int computed = 0; struct commit *c = sorted_commits[i]; - struct bloom_filter *filter = get_bloom_filter(ctx->r, c, 1); + struct bloom_filter *filter = get_or_compute_bloom_filter( + ctx->r, + c, + 1, + &computed); + if (computed) { + ctx->count_bloom_filter_computed++; + if (filter && !filter->len) + ctx->count_bloom_filter_found_large++; + } ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; display_progress(progress, i + 1); } + if (trace2_is_enabled()) + trace2_bloom_filter_write_statistics(ctx); + free(sorted_commits); stop_progress(&progress); } diff --git a/line-log.c b/line-log.c index c53692834d..9e58fd185a 100644 --- a/line-log.c +++ b/line-log.c @@ -1159,7 +1159,7 @@ static int bloom_filter_check(struct rev_info *rev, return 1; if (!rev->bloom_filter_settings || - !(filter = get_bloom_filter(rev->repo, commit, 0))) + !(filter = get_bloom_filter(rev->repo, commit))) return 1; if (!range) diff --git a/revision.c b/revision.c index c45ed1076e..b7ec712755 100644 --- a/revision.c +++ b/revision.c @@ -753,7 +753,7 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs, if (commit_graph_generation(commit) == GENERATION_NUMBER_INFINITY) return -1; - filter = get_bloom_filter(revs->repo, commit, 0); + filter = get_bloom_filter(revs->repo, commit); if (!filter) { count_bloom_filter_not_present++; diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c index f0aa80b98e..531af439c2 100644 --- a/t/helper/test-bloom.c +++ b/t/helper/test-bloom.c @@ -39,7 +39,8 @@ static void get_bloom_filter_for_commit(const struct object_id *commit_oid) struct bloom_filter *filter; setup_git_directory(); c = lookup_commit(the_repository, commit_oid); - filter = get_bloom_filter(the_repository, c, 1); + filter = get_or_compute_bloom_filter(the_repository, c, 1, + NULL); print_bloom_filter(filter); } From patchwork Wed Aug 5 17:02:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11701963 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2BF5E1510 for ; Wed, 5 Aug 2020 17:05:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0B1A822D3E for ; Wed, 5 Aug 2020 17:05:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="DHp+dJyQ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728481AbgHERFd (ORCPT ); Wed, 5 Aug 2020 13:05:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49320 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728268AbgHERCq (ORCPT ); Wed, 5 Aug 2020 13:02:46 -0400 Received: from mail-qv1-xf42.google.com (mail-qv1-xf42.google.com [IPv6:2607:f8b0:4864:20::f42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 86527C0617A2 for ; Wed, 5 Aug 2020 10:02:46 -0700 (PDT) Received: by mail-qv1-xf42.google.com with SMTP id b2so10482335qvp.9 for ; Wed, 05 Aug 2020 10:02:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=1eU3I9uZzpxdnR2nm0s+2T/AjhX9gGIMEwKMBanKHvY=; b=DHp+dJyQqSHAhGvB0MZ8lMEhw4kyuup7hbh1TcAaMElIjeNDeP779CMAoXdEKXicHy yXEvrbgNUuR/D4OBSD5EpszmwT6NBrSGMpHUdg76mdKYe2VeRSV1THMxj3E/J24cRSHu u2xdFCVSeGsRbYodnjI07aINk62lAsUxY1WY6eBgZ40W3p+x/tnl4RGyxUMF77ZNxMZY s/OljxENOtz+HhH894OKToBbKQl0NRQeBvLs7uUvIpEkCX2W5sm4KViXgM8R8p2rcO/i okghpUaWf8x07/Ut5OdU3jgK2iv473tut2wGblX7fYc3Ga9ifXoy9oFXAQk1fxtXIUZi RUgw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=1eU3I9uZzpxdnR2nm0s+2T/AjhX9gGIMEwKMBanKHvY=; b=aJ6ZCovl/jtXfojq+VnNKJWRltCzZBhxIm2lVr0MLs4Yw4s/6pJjm7Qs/w8yZmPDe9 3FhiPhYS6ysRsUSg2nHcEBwZ1x1NKuzPybYOCfuMD7sZZ3J7MYrr/vstyJWq+yiB5+QI vfX6tiXMS6VaqJVYg7/QIIXtIDkmp2Uim++HZqunZKpZkvV58AdfPPtIzSwYqZqk6awB GaYlRzDRLz3AdFjyeTR9KixcGkrG+HANhnbiiVM8mZMpi44re+Ns3eHb0sda9P3PgP6D U6KxSOXrOjq8pC3HaY2+ZV9SPDK7yFHPBPHmQ8tmqRdvhTzCZa+lOFnmB4JqMQX1rkRv FZvw== X-Gm-Message-State: AOAM530DYgvMjy9M0UL7EEgsfjKHYcaeYvKUV5lwxz9WgUmqoduCoHy+ jxNQj3TT9DNh0+0a6OZe3FvF9uWx7AvchA== X-Google-Smtp-Source: ABdhPJwiNh/al1VcZYRQbPAHAEceeGwfo/zzDgNhFpXakykjk1GadZgf4Ony8BJajwOj89fPLJipmw== X-Received: by 2002:ad4:5748:: with SMTP id q8mr4776114qvx.217.1596646965127; Wed, 05 Aug 2020 10:02:45 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:d118:9acc:fdba:dee7]) by smtp.gmail.com with ESMTPSA id e129sm1983527qkf.132.2020.08.05.10.02.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Aug 2020 10:02:44 -0700 (PDT) Date: Wed, 5 Aug 2020 13:02:42 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com Subject: [PATCH v2 08/14] bloom: use provided 'struct bloom_filter_settings' Message-ID: <4deb724fc13e580430628f0ac408685394c8689f.1596646576.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When 'get_or_compute_bloom_filter()' needs to compute a Bloom filter from scratch, it looks to the default 'struct bloom_filter_settings' in order to determine the maximum number of changed paths, number of bits per entry, and so on. All of these values have so far been constant, and so there was no need to pass in a pointer from the caller (eg., the one that is stored in the 'struct write_commit_graph_context'). Start passing in a 'struct bloom_filter_settings *' instead of using the default values to respect graph-specific settings (eg., in the case of setting 'GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS'). In order to have an initialized value for these settings, move its initialization to earlier in the commit-graph write. Signed-off-by: Taylor Blau --- bloom.c | 13 ++++++------- bloom.h | 3 ++- commit-graph.c | 21 ++++++++++----------- t/helper/test-bloom.c | 1 + 4 files changed, 19 insertions(+), 19 deletions(-) diff --git a/bloom.c b/bloom.c index a8a21762f4..0cf1962dc5 100644 --- a/bloom.c +++ b/bloom.c @@ -180,13 +180,12 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data, struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, struct commit *c, int compute_if_not_present, + const struct bloom_filter_settings *settings, int *computed) { struct bloom_filter *filter; - struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS; int i; struct diff_options diffopt; - int max_changes = 512; if (computed) *computed = 0; @@ -211,7 +210,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, repo_diff_setup(r, &diffopt); diffopt.flags.recursive = 1; diffopt.detect_rename = 0; - diffopt.max_changes = max_changes; + diffopt.max_changes = settings->max_changed_paths; diff_setup_done(&diffopt); /* ensure commit is parsed so we have parent information */ @@ -223,7 +222,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_tree_oid(NULL, &c->object.oid, "", &diffopt); diffcore_std(&diffopt); - if (diffopt.num_changes <= max_changes) { + if (diffopt.num_changes <= settings->max_changed_paths) { struct hashmap pathmap; struct pathmap_hash_entry *e; struct hashmap_iter iter; @@ -260,13 +259,13 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_free_filepair(diff_queued_diff.queue[i]); } - filter->len = (hashmap_get_size(&pathmap) * settings.bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; + filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; filter->data = xcalloc(filter->len, sizeof(unsigned char)); hashmap_for_each_entry(&pathmap, &iter, e, entry) { struct bloom_key key; - fill_bloom_key(e->path, strlen(e->path), &key, &settings); - add_key_to_filter(&key, filter, &settings); + fill_bloom_key(e->path, strlen(e->path), &key, settings); + add_key_to_filter(&key, filter, settings); } hashmap_free_entries(&pathmap, struct pathmap_hash_entry, entry); diff --git a/bloom.h b/bloom.h index c4107a6415..2353c74a24 100644 --- a/bloom.h +++ b/bloom.h @@ -92,11 +92,12 @@ void init_bloom_filters(void); struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, struct commit *c, int compute_if_not_present, + const struct bloom_filter_settings *settings, int *computed); #define DEFAULT_BLOOM_MAX_CHANGES 512 #define get_bloom_filter(r, c) get_or_compute_bloom_filter( \ - (r), (c), 0, NULL) + (r), (c), 0, NULL, NULL) int bloom_filter_contains(const struct bloom_filter *filter, const struct bloom_key *key, diff --git a/commit-graph.c b/commit-graph.c index ba2a2cfb22..48d4697f54 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1432,6 +1432,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->r, c, 1, + ctx->bloom_settings, &computed); if (computed) { ctx->count_bloom_filter_computed++; @@ -1688,17 +1689,6 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) int num_chunks = 3; uint64_t chunk_offset; struct object_id file_hash; - struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; - - if (!ctx->bloom_settings) { - bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", - bloom_settings.bits_per_entry); - bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", - bloom_settings.num_hashes); - bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", - bloom_settings.max_changed_paths); - ctx->bloom_settings = &bloom_settings; - } if (ctx->split) { struct strbuf tmp_file = STRBUF_INIT; @@ -2144,6 +2134,7 @@ int write_commit_graph(struct object_directory *odb, uint32_t i, count_distinct = 0; int res = 0; int replace = 0; + struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; if (!commit_graph_compatible(the_repository)) return 0; @@ -2157,6 +2148,14 @@ int write_commit_graph(struct object_directory *odb, ctx->split_opts = split_opts; ctx->total_bloom_filter_data_size = 0; + bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", + bloom_settings.bits_per_entry); + bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", + bloom_settings.num_hashes); + bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", + bloom_settings.max_changed_paths); + ctx->bloom_settings = &bloom_settings; + if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS) ctx->changed_paths = 1; if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) { diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c index 531af439c2..4af949164c 100644 --- a/t/helper/test-bloom.c +++ b/t/helper/test-bloom.c @@ -40,6 +40,7 @@ static void get_bloom_filter_for_commit(const struct object_id *commit_oid) setup_git_directory(); c = lookup_commit(the_repository, commit_oid); filter = get_or_compute_bloom_filter(the_repository, c, 1, + &settings, NULL); print_bloom_filter(filter); } From patchwork Wed Aug 5 17:02:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11701959 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CE0DC1575 for ; Wed, 5 Aug 2020 17:05:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9EFB022D2C for ; Wed, 5 Aug 2020 17:05:21 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="dsrKHeRk" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728371AbgHERFQ (ORCPT ); Wed, 5 Aug 2020 13:05:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49142 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728295AbgHERCu (ORCPT ); Wed, 5 Aug 2020 13:02:50 -0400 Received: from mail-qv1-xf43.google.com (mail-qv1-xf43.google.com [IPv6:2607:f8b0:4864:20::f43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 306A5C06174A for ; Wed, 5 Aug 2020 10:02:50 -0700 (PDT) Received: by mail-qv1-xf43.google.com with SMTP id s15so16636001qvv.7 for ; Wed, 05 Aug 2020 10:02:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=UkG0VbkkbfTo4r4sgGsVzU492HVkMStyXADq+pBNjSA=; b=dsrKHeRkGZ/QNV2n8Ty9sR8PykZVu4RmCDk9pn+67CvxTgGpQ7vukS2LRI5UuauTG7 R891fjQxjyd/caQfu1gIjuVn0J6E23YaqzI2XRMTdcXe5r17M6mJe+hx+N6MuclcaWGB 6zpAehs8HYQNTg4vd4u6BhRO1YbpcZkbdzoXwuBQEpC1ROMOearB3XhpnKRUwMABrGZJ I3jYGAVc2dXtodGBBnYoh2pIGAbD82cYlYHJNP0/jf0BHWCd3B/rlSCwaXX4mnJHQ8hq u9UF4Q8XjXvDKp63jpyC9/15H8YNW1N213LDbUTGjWaXjX3Q6zM9xZ9lx4U/qBaDH4f4 MjOg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=UkG0VbkkbfTo4r4sgGsVzU492HVkMStyXADq+pBNjSA=; b=iyAqc2GK6MzmstIKSG9M9twtk6bPXhLZ3n+Y2B8nvCFwBpMvBVr7HuqcVIiS24f0pG hlR5Zq7Ooudg0PgyrhjyZkCk1jGw2WZkcGeLzTSzTixlzSObFl8TyJUD+8iruwiXJ+bG WDuLBen7s1Am72/9kXM4AtRkiLHsndv9QlfqOUpI47x+fczV17xpxBl331vKKmUaactw aHdfKsJdukH3RrfM2ZkSoaT2uZt6Jir0E8olq8EFU5AEMqiIFkUdzHF0XJrWIMPFFPc+ U0CW1b4gIrka4I2pzTuBp45Cvt/chDxlz53RIXDLbyvXBkho7IXX4Zb5M8zf1TCzTDVC knug== X-Gm-Message-State: AOAM531g0d2h520CNSVQaPDOXN55kNA6uW/HYCJOS7hcF40yDaOzPFn3 P73aPpLI8AiAjH6wjteTquv0M6OEcL6vMQ== X-Google-Smtp-Source: ABdhPJwU+tRnqHfbY1UdHaWepOuKgg2OhO9BBqpRhQnAYjOaDaqtB55A34Zv0TUooYAMdbF7er1TAQ== X-Received: by 2002:ad4:5388:: with SMTP id i8mr4627901qvv.202.1596646968683; Wed, 05 Aug 2020 10:02:48 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:d118:9acc:fdba:dee7]) by smtp.gmail.com with ESMTPSA id q29sm2636272qtc.10.2020.08.05.10.02.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Aug 2020 10:02:47 -0700 (PDT) Date: Wed, 5 Aug 2020 13:02:47 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com Subject: [PATCH v2 09/14] bloom/diff: properly short-circuit on max_changes Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee Commit e3696980 (diff: halt tree-diff early after max_changes, 2020-03-30) intended to create a mechanism to short-circuit a diff calculation after a certain number of paths were modified. By incrementing a "num_changes" counter throughout the recursive ll_diff_tree_paths(), this was supposed to match the number of changes that would be written into the changed-path Bloom filters. Unfortunately, this was not implemented correctly and instead misses simple cases like file modifications. This then does not stop very large changed-path filters from being written (unless they add or remove many files). To start, change the implementation in ll_diff_tree_paths() to instead use the global diff_queue_diff struct's 'nr' member as the count. This is a way to simplify the logic instead of making more mistakes in the complicated diff code. This has a drawback: the diff_queue_diff struct only lists the paths corresponding to blob changes, not their leading directories. Thus, get_or_compute_bloom_filter() needs an additional check to see if the hashmap with the leading directories becomes too large. One reason why this was not caught by test cases was that the test in t4216-log-bloom.sh that was supposed to check this "too many changes" condition only checked this on the initial commit of a repository. The old logic counted these values correctly. Update this test in a few ways: 1. Use GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS to reduce the limit, allowing smaller commits to engage with this logic. 2. Create several interesting cases of edits, adds, removes, and mode changes (in the second commit). By testing both sides of the inequality with the *_MAX_CHANGED_PATHS variable, we can see that the count is exactly correct, so none of these changes are missed or over-counted. 3. Use the trace2 data value filter_found_large to verify that these commits are on the correct side of the limit. Another way to verify the behavior is correct is through performance tests. By testing on my local copies of the Git repository and the Linux kernel repository, I could measure the effect of these short-circuits when computing a fresh commit-graph file with changed-path Bloom filters using the command GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=N time \ git commit-graph write --reachable --changed-paths and reporting the wall time and resulting commit-graph size. For Git, the results are | | N=1 | N=10 | N=512 | |--------|----------------|----------------|----------------| | HEAD~1 | 10.90s 9.18MB | 11.11s 9.34MB | 11.31s 9.35MB | | HEAD | 9.21s 8.62MB | 11.11s 9.29MB | 11.29s 9.34MB | For Linux, the results are | | N=1 | N=20 | N=512 | |--------|----------------|---------------|---------------| | HEAD~1 | 61.28s 64.3MB | 76.9s 72.6MB | 77.6s 72.6MB | | HEAD | 49.44s 56.3MB | 68.7s 65.9MB | 69.2s 65.9MB | Naturally, the improvement becomes much less as the limit grows, as fewer commits satisfy the short-circuit. Reported-by: SZEDER Gábor Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- bloom.c | 6 +++- diff.h | 2 -- t/t4216-log-bloom.sh | 85 +++++++++++++++++++++++++++++++++++++++----- tree-diff.c | 5 +-- 4 files changed, 82 insertions(+), 16 deletions(-) diff --git a/bloom.c b/bloom.c index 0cf1962dc5..ed54e96e57 100644 --- a/bloom.c +++ b/bloom.c @@ -222,7 +222,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_tree_oid(NULL, &c->object.oid, "", &diffopt); diffcore_std(&diffopt); - if (diffopt.num_changes <= settings->max_changed_paths) { + if (diff_queued_diff.nr <= settings->max_changed_paths) { struct hashmap pathmap; struct pathmap_hash_entry *e; struct hashmap_iter iter; @@ -259,6 +259,9 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_free_filepair(diff_queued_diff.queue[i]); } + if (hashmap_get_size(&pathmap) > settings->max_changed_paths) + goto cleanup; + filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; filter->data = xcalloc(filter->len, sizeof(unsigned char)); @@ -268,6 +271,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, add_key_to_filter(&key, filter, settings); } + cleanup: hashmap_free_entries(&pathmap, struct pathmap_hash_entry, entry); } else { for (i = 0; i < diff_queued_diff.nr; i++) diff --git a/diff.h b/diff.h index e0c0af6286..1d32b71885 100644 --- a/diff.h +++ b/diff.h @@ -287,8 +287,6 @@ struct diff_options { /* If non-zero, then stop computing after this many changes. */ int max_changes; - /* For internal use only. */ - int num_changes; int ita_invisible_in_index; /* white-space error highlighting */ diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index eb2bcc51f0..21b67677ef 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -177,20 +177,87 @@ test_expect_success 'persist filter settings' ' ' test_expect_success 'correctly report changes over limit' ' - git init 513changes && + git init limits && ( - cd 513changes && - for i in $(test_seq 1 513) + cd limits && + mkdir d && + mkdir d/e && + + for i in $(test_seq 1 2) do - echo $i >file$i.txt || return 1 + printf $i >d/file$i.txt && + printf $i >d/e/file$i.txt || return 1 done && - git add . && + + mkdir mode && + printf bash >mode/script.sh && + + mkdir foo && + touch foo/bar && + touch foo.txt && + + git add d foo foo.txt mode && git commit -m "files" && - git commit-graph write --reachable --changed-paths && - for i in $(test_seq 1 513) + + # Commit has 7 file and 4 directory adds + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ + GIT_TRACE2_EVENT="$(pwd)/trace" \ + git commit-graph write --reachable --changed-paths && + grep "\"max_changed_paths\":10" trace && + grep "\"filter_found_large\":1" trace && + + for path in $(git ls-tree -r --name-only HEAD) do - git -c core.commitGraph=false log -- file$i.txt >expect && - git log -- file$i.txt >actual && + git -c commitGraph.readChangedPaths=false log \ + -- $path >expect && + git log -- $path >actual && + test_cmp expect actual || return 1 + done && + + # Make a variety of path changes + printf new1 >d/e/file1.txt && + printf new2 >d/file2.txt && + rm d/e/file2.txt && + rm -r foo && + printf text >foo && + mkdir f && + printf new1 >f/file1.txt && + + # including a mode-only change (counts as modified) + git update-index --chmod=+x mode/script.sh && + + git add foo d f && + git commit -m "complicated" && + + # start from scratch and rebuild + rm -f .git/objects/info/commit-graph && + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ + GIT_TRACE2_EVENT="$(pwd)/trace-edit" \ + git commit-graph write --reachable --changed-paths && + grep "\"max_changed_paths\":10" trace-edit && + grep "\"filter_found_large\":2" trace-edit && + + for path in $(git ls-tree -r --name-only HEAD) + do + git -c commitGraph.readChangedPaths=false log \ + -- $path >expect && + git log -- $path >actual && + test_cmp expect actual || return 1 + done && + + # start from scratch and rebuild + rm -f .git/objects/info/commit-graph && + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=11 \ + GIT_TRACE2_EVENT="$(pwd)/trace-update" \ + git commit-graph write --reachable --changed-paths && + grep "\"max_changed_paths\":11" trace-update && + grep "\"filter_found_large\":0" trace-update && + + for path in $(git ls-tree -r --name-only HEAD) + do + git -c commitGraph.readChangedPaths=false log \ + -- $path >expect && + git log -- $path >actual && test_cmp expect actual || return 1 done ) diff --git a/tree-diff.c b/tree-diff.c index 6ebad1a46f..7cebbb327e 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -434,7 +434,7 @@ static struct combine_diff_path *ll_diff_tree_paths( if (diff_can_quit_early(opt)) break; - if (opt->max_changes && opt->num_changes > opt->max_changes) + if (opt->max_changes && diff_queued_diff.nr > opt->max_changes) break; if (opt->pathspec.nr) { @@ -521,7 +521,6 @@ static struct combine_diff_path *ll_diff_tree_paths( /* t↓ */ update_tree_entry(&t); - opt->num_changes++; } /* t > p[imin] */ @@ -539,7 +538,6 @@ static struct combine_diff_path *ll_diff_tree_paths( skip_emit_tp: /* ∀ pi=p[imin] pi↓ */ update_tp_entries(tp, nparent); - opt->num_changes++; } } @@ -557,7 +555,6 @@ struct combine_diff_path *diff_tree_paths( const struct object_id **parents_oid, int nparent, struct strbuf *base, struct diff_options *opt) { - opt->num_changes = 0; p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt); /* From patchwork Wed Aug 5 17:02:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11701961 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 645051510 for ; Wed, 5 Aug 2020 17:05:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4D61F22D2C for ; Wed, 5 Aug 2020 17:05:37 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="Tgw7z+R1" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728419AbgHERFZ (ORCPT ); Wed, 5 Aug 2020 13:05:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49342 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728361AbgHERCy (ORCPT ); Wed, 5 Aug 2020 13:02:54 -0400 Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D02B9C0617A3 for ; Wed, 5 Aug 2020 10:02:53 -0700 (PDT) Received: by mail-qk1-x743.google.com with SMTP id g26so42377453qka.3 for ; Wed, 05 Aug 2020 10:02:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=VnfpCipl9Slkz8EUOG39zqbhG2/9zqI8fjgty/iURK8=; b=Tgw7z+R1pP1CmrXus4Gw+lxVj8kmxi65zPlw/u/KvDjsc9q9jVsVS3oQpilF4Uop/D VCFwZjEHRQlaCbaScAVmFFCxa9jK6ahuYp3FHRo8ReWvgkbNOdn1wK3zV3s2O1/XYZyx 6osPRnG1fxJn8RnosECwS4zCMVcqsmpryAZwQ11E227v+RfFkEwR9rNHJ18W4OCulB+5 JE/UzBLntEob5Cgqn6KBl8Zh1OCsI2NbUPrxkVduuUlAlzPC9ZicRK+xFmYtU+Tmf6wD pU0ub5/kqbHOitmOzZItycIdUg0YeJCkZuUPVEStPN7Bj/ucXNRkCZAm1dFBG9WWEDD2 jGgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=VnfpCipl9Slkz8EUOG39zqbhG2/9zqI8fjgty/iURK8=; b=o23J8m13Pkk0yd+HgmtBGExH/eMla3gnG4UnHuP+16ck0/LDSEdplP6BVng1WBMdGb b6QWaVf3cbJOauSIzFy50aBNgU4E4Ellj59WJIalKVeJ3V6iTHP5ZTVtteFQ4qIzsaP3 PH91UM96uWwcdHb35X0nKmIXu7trk8cFhpqAnDfsrx9DTCupVDbeajbkqRCXEH0Mt1l4 poAUYTEImT/TGf1B6MynmIn/V0w6ebLLWPLYTK9ju/xcNeBG9EwDYBjocBRG+8tTx+yC DSR2ll1x/wfAnDa359XipI5XPcw8yvfbqmf31I4vNF+bSsElZF5z4OWwxi90MYuJf0Wp +WHQ== X-Gm-Message-State: AOAM5317hpM3KEwTu3QsTOUJSNNpUg03G4Zi5i4rNLl8SOuI8tC8bgES dT8YOoqk9ztkxDqlOsWbb0wwNbgeGyrJeg== X-Google-Smtp-Source: ABdhPJwEM7a0bZATGsA9auO68PNPkidpwwRF5w400obsVYgVdBWaf+LMRq+/VNXnN1Qj3EcGPGmTJg== X-Received: by 2002:a05:620a:12af:: with SMTP id x15mr4124305qki.441.1596646972568; Wed, 05 Aug 2020 10:02:52 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:d118:9acc:fdba:dee7]) by smtp.gmail.com with ESMTPSA id k134sm1980868qke.60.2020.08.05.10.02.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Aug 2020 10:02:51 -0700 (PDT) Date: Wed, 5 Aug 2020 13:02:50 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com Subject: [PATCH v2 10/14] commit-graph.c: sort index into commits list Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org For locality, 'compute_bloom_filters()' sorts the commits for which it wants to compute Bloom filters in a preferred order (cf., 3d11275505 (commit-graph: examine commits by generation number, 2020-03-30) for details). A future patch will want to recover the new graph position of each commit. Since the 'packed_commit_list' already stores a double-pointer, avoid a 'COPY_ARRAY' and instead keep track of an index into the original list. (Use an integer index instead of a memory address, since this involves a needlessly confusing triple-pointer). Alter the two sorting routines 'commit_pos_cmp' and 'commit_gen_cmp' to take into account the packed_commit_list they are sorting with respect to. Since 'compute_bloom_filters()' is the only caller for each of those comparison functions, no other call-sites need updating. Signed-off-by: Taylor Blau --- commit-graph.c | 43 ++++++++++++++++++++++++------------------- 1 file changed, 24 insertions(+), 19 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 48d4697f54..0d70545149 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -79,10 +79,18 @@ static void set_commit_pos(struct repository *r, const struct object_id *oid) *commit_pos_at(&commit_pos, commit) = max_pos++; } -static int commit_pos_cmp(const void *va, const void *vb) +struct packed_commit_list { + struct commit **list; + int nr; + int alloc; +}; + +static int commit_pos_cmp(const void *va, const void *vb, void *ctx) { - const struct commit *a = *(const struct commit **)va; - const struct commit *b = *(const struct commit **)vb; + struct packed_commit_list *commits = ctx; + + const struct commit *a = commits->list[*(int *)va]; + const struct commit *b = commits->list[*(int *)vb]; return commit_pos_at(&commit_pos, a) - commit_pos_at(&commit_pos, b); } @@ -139,10 +147,12 @@ static struct commit_graph_data *commit_graph_data_at(const struct commit *c) return data; } -static int commit_gen_cmp(const void *va, const void *vb) +static int commit_gen_cmp(const void *va, const void *vb, void *ctx) { - const struct commit *a = *(const struct commit **)va; - const struct commit *b = *(const struct commit **)vb; + struct packed_commit_list *commits = ctx; + + const struct commit *a = commits->list[*(int *)va]; + const struct commit *b = commits->list[*(int *)vb]; uint32_t generation_a = commit_graph_generation(a); uint32_t generation_b = commit_graph_generation(b); @@ -922,11 +932,6 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c); } -struct packed_commit_list { - struct commit **list; - int nr; - int alloc; -}; struct packed_oid_list { struct object_id *list; @@ -1408,7 +1413,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) { int i; struct progress *progress = NULL; - struct commit **sorted_commits; + int *sorted_commits; init_bloom_filters(); @@ -1418,16 +1423,16 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->commits.nr); ALLOC_ARRAY(sorted_commits, ctx->commits.nr); - COPY_ARRAY(sorted_commits, ctx->commits.list, ctx->commits.nr); - - if (ctx->order_by_pack) - QSORT(sorted_commits, ctx->commits.nr, commit_pos_cmp); - else - QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp); + for (i = 0; i < ctx->commits.nr; i++) + sorted_commits[i] = i; + QSORT_S(sorted_commits, ctx->commits.nr, + ctx->order_by_pack ? commit_pos_cmp : commit_gen_cmp, + &ctx->commits); for (i = 0; i < ctx->commits.nr; i++) { int computed = 0; - struct commit *c = sorted_commits[i]; + int pos = sorted_commits[i]; + struct commit *c = ctx->commits.list[pos]; struct bloom_filter *filter = get_or_compute_bloom_filter( ctx->r, c, From patchwork Wed Aug 5 17:02:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11701969 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C130C1575 for ; Wed, 5 Aug 2020 17:06:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AAF0A22D3E for ; Wed, 5 Aug 2020 17:06:28 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="Ulj8/hI6" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728358AbgHERGO (ORCPT ); Wed, 5 Aug 2020 13:06:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49350 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728362AbgHERC5 (ORCPT ); Wed, 5 Aug 2020 13:02:57 -0400 Received: from mail-qk1-x742.google.com (mail-qk1-x742.google.com [IPv6:2607:f8b0:4864:20::742]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 77E33C0617A4 for ; Wed, 5 Aug 2020 10:02:57 -0700 (PDT) Received: by mail-qk1-x742.google.com with SMTP id h7so42308276qkk.7 for ; Wed, 05 Aug 2020 10:02:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=tDKkQ0gKoAGETClPbYVVJ/6NtRbQ+kiQAIhvnw3+RoE=; b=Ulj8/hI6v3JUvOwkSrzOmyUs9DECG47/q9NyozE9RvnMdlsiccAupH2/q0Ouo1XQ8N 7MwR43iNRdZY7QdMsKxLtxKSm6iws3ayO9cth43yD6uT9Ma4J+z2h4H8g4jQ3UXZrDi2 ah6jhUFMdDrP508lh0ZUObWCoEohPHUZz1NBCLI0i5dVhbkmq2kFIxkMzs9TAxiSU6U+ AfBwRZMROWc7ZvPAGVQ9gMGzOl1wdq8oVzQGxOn5moq3wYDJRiG7n10GDH1MXmp4WyoA f+p9WdnwZmXXhJtX+5wccGMcpRKaJnX8t7X0AFjD1WnyO1NUVo4nft3lXMTS3jZmmRAr ZjgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=tDKkQ0gKoAGETClPbYVVJ/6NtRbQ+kiQAIhvnw3+RoE=; b=qDS1kw+nzqo/O8TWkLjwFoptt3Bv4AMu/zgtpfcuN0HWIlSAafhZ+jjl6RcQS6+ZEj UvUO1zNz4SJe5pyMbM6sy8l9G+rfJcvLSIiMJFsirHnTmlwhrhFnQGb7/NoLh+I6YR9S CrGvuyPTPS6SOhq2mUBt0bI6JAP1vIMohpNQQMOFrPxNaDYw7GPrPS1oMrQbrFaBzZzI M1NGeEEDNaJa50+rdebIgbmdJQLzAvTIxiLWcM2Ir22t3zrZjMog4CSQodZkwvs4YPV2 ikHUeGxupo5TaTNr5t9/bfxiNixGY+2JDXq1n/WCWFMbZF5QAWH9Bd0h646bDp6vti0s j1Ig== X-Gm-Message-State: AOAM532lfI0pnd1ZxE6j8siWOdsd97qXBARAw9xYTvLJ+8fQcTaNSD2m 4IbmBZYjX+3O7C2ftTod8HIBojh6G+fBMA== X-Google-Smtp-Source: ABdhPJwXUT51hSNFJilyqy7iR+ZItA3q2aWqteHdo90zInufXTvtGrRF99uyz58UYk0qDGmW//jc0A== X-Received: by 2002:ae9:ef43:: with SMTP id d64mr4569801qkg.326.1596646976211; Wed, 05 Aug 2020 10:02:56 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:d118:9acc:fdba:dee7]) by smtp.gmail.com with ESMTPSA id x12sm2734660qta.67.2020.08.05.10.02.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Aug 2020 10:02:55 -0700 (PDT) Date: Wed, 5 Aug 2020 13:02:54 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com Subject: [PATCH v2 11/14] csum-file.h: introduce 'hashwrite_be64()' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org A small handful of writers who wish to encode 64-bit values in network order have worked around the lack of such a helper by calling the 32-bit variant twice. The subsequent commit will add another caller who wants to write a 64-bit value. To ease their (and the existing caller's) pain, introduce a helper to do just that, and convert existing call-sites. Suggested-by: Derrick Stolee Signed-off-by: Taylor Blau --- commit-graph.c | 8 ++------ csum-file.h | 6 ++++++ midx.c | 3 +-- 3 files changed, 9 insertions(+), 8 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 0d70545149..8964453433 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1784,12 +1784,8 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) chunk_offset = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH; for (i = 0; i <= num_chunks; i++) { - uint32_t chunk_write[3]; - - chunk_write[0] = htonl(chunks[i].id); - chunk_write[1] = htonl(chunk_offset >> 32); - chunk_write[2] = htonl(chunk_offset & 0xffffffff); - hashwrite(f, chunk_write, 12); + hashwrite_be32(f, chunks[i].id); + hashwrite_be64(f, chunk_offset); chunk_offset += chunks[i].size; } diff --git a/csum-file.h b/csum-file.h index f9cbd317fb..b026ec7766 100644 --- a/csum-file.h +++ b/csum-file.h @@ -62,4 +62,10 @@ static inline void hashwrite_be32(struct hashfile *f, uint32_t data) hashwrite(f, &data, sizeof(data)); } +static inline void hashwrite_be64(struct hashfile *f, uint64_t data) +{ + hashwrite_be32(f, data >> 32); + hashwrite_be32(f, data & 0xffffffffUL); +} + #endif diff --git a/midx.c b/midx.c index 6d1584ca51..8b33149ce4 100644 --- a/midx.c +++ b/midx.c @@ -775,8 +775,7 @@ static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_off if (!(offset >> 31)) continue; - hashwrite_be32(f, offset >> 32); - hashwrite_be32(f, offset & 0xffffffffUL); + hashwrite_be64(f, offset); written += 2 * sizeof(uint32_t); nr_large_offset--; From patchwork Wed Aug 5 17:02:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11701967 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4FD5A1731 for ; Wed, 5 Aug 2020 17:06:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 355F622D6F for ; Wed, 5 Aug 2020 17:06:12 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="aIjN3++Y" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728493AbgHERGD (ORCPT ); Wed, 5 Aug 2020 13:06:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49134 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728186AbgHERDL (ORCPT ); Wed, 5 Aug 2020 13:03:11 -0400 Received: from mail-qk1-x741.google.com (mail-qk1-x741.google.com [IPv6:2607:f8b0:4864:20::741]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4AA05C0617A5 for ; Wed, 5 Aug 2020 10:03:01 -0700 (PDT) Received: by mail-qk1-x741.google.com with SMTP id 77so10033979qkm.5 for ; Wed, 05 Aug 2020 10:03:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=dfZPKLJ6r3UEWyGZR3vKueGNOUl/lW3q87JIhDFzLg8=; b=aIjN3++YoJkfU9/RG+X9A5G7gri5HG+/j3rs2rdk8g7lco1HeZWNU8oBH2wH/gnyLQ j5dPPLOPe4JYVaBsw4bCvrQEV0a3vMdyMU2oyuVtn5p6XN5hcKwRwxf3OZH/OX+lVWNv QWh/7q06QbstFS7mUdeByxtiomJ/07j9ZU14FobslCNQ+Tg8lBlUhs1impmRpHyYjPV8 h5oiTMvzuEMESMbqpgnbfGBEnzBaIc1si29OrsuQTb92fu6sQNOWUeziwFba7ncFiGdv RHm1IcrukYntJJnP6cS+6ghkHWhPmn6X578lUWOHYs9jgMaOcf3VCMzxsA33GKa0XLP0 +TQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=dfZPKLJ6r3UEWyGZR3vKueGNOUl/lW3q87JIhDFzLg8=; b=OZycNibSr6C3v3mfVOxVwemxaKxgENC30wpkzZFlMZmaFXsbFtIdKd0xDEl8jcqaC6 /jQAMXKTkhZfadKtSwNU122fNrdcO1/tPeUL0+wVCDHqiFJ5lH7o8CcszePis078tQUv G6Oxq6zUq6o5JPwVcHX9QKXq+kvjRbKEd4m5lm7PenS8xKAET09TToubNHi29VQCkH21 3+32iStHLFPYyhHbSCcozBKR1eVwyL5sEOmmI3IgT7YQFusSOn2MfGpDa2jQbpDgclHY MPiXdYxXnvw/d5qt6BDW/cOqN7BPLjQXeZ3kGKzHRs2jgNWZOesBxGCTtZFC2KVPzRXk Dx+g== X-Gm-Message-State: AOAM5312QHI0eI+53eSppSz5qwf/MY/ftQXF9YGlw06h/J2U2s3vIFNJ KJ65PaZkPMGONL5bREpEB2kG497gZZwglA== X-Google-Smtp-Source: ABdhPJxZpbsk+tX1kS1D7DRcO92uh1DxCabMBNkrbF8Gn1tHtP+qMMCwvMFnWyDFCcZoct9BuP2uhQ== X-Received: by 2002:a37:9a46:: with SMTP id c67mr4506879qke.85.1596646979826; Wed, 05 Aug 2020 10:02:59 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:d118:9acc:fdba:dee7]) by smtp.gmail.com with ESMTPSA id m17sm2311399qkn.45.2020.08.05.10.02.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Aug 2020 10:02:58 -0700 (PDT) Date: Wed, 5 Aug 2020 13:02:58 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com Subject: [PATCH v2 12/14] commit-graph: add large-filters bitmap chunk Message-ID: <100b26d7c8a5e7ce21a950f33791eadab74e8e70.1596646576.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When a commit has more than a certain number of changed paths (commonly 512), the commit-graph machinery represents it as a zero-length filter. This is done since having many entries in the Bloom filter has undesirable effects on the false positivity rate. In addition to these too-large filters, the commit-graph machinery also represents commits with no filter and commits with no changed paths in the same way. When writing a commit-graph that aggregates several incremental commit-graph layers (eg., with '--split=replace'), the commit-graph machinery first computes all of the Bloom filters that it wants to write but does not already know about from existing graph layers. Because we overload the zero-length filter in the above fashion, this leads to recomputing large filters over and over again. This is already undesirable, since it means that we are wasting considerable effort to discover that a commit with too many changed paths, only to throw that effort away (and then repeat the process the next time a roll-up is performed). In a subsequent patch, we will add a '--max-new-filters=' option, which specifies an upper-bound on the number of new filters we are willing to compute from scratch. Suppose that there are 'N' too-large filters, and we specify '--max-new-filters=M'. If 'N >= M', it is unlikely that any filters will be generated, since we'll spend most of our effort on filters that we ultimately throw away. If 'N < M', filters will trickle in over time, but only at most 'M - N' per-write. To address this, add a new chunk which encodes a bitmap where the ith bit is on iff the ith commit has zero or at least 512 changed paths. Likewise store the maximum number of changed paths we are willing to store in order to prepare for eventually making this value more easily customizable. When computing Bloom filters, first consult the relevant bitmap (in the case that we are rolling up existing layers) to see if computing the Bloom filter from scratch would be a waste of time. This patch implements a new chunk instead of extending the existing BIDX and BDAT chunks because modifying these chunks would confuse old clients. (Eg., setting the most-significant bit in the BIDX chunk would confuse old clients and require a version bump). To allow using the existing bitmap code with 64-bit words, we write the data in network byte order from the 64-bit words. This means we also need to read the array from the commit-graph file by translating each word from network byte order using get_be64() upon first use of the bitmap. This is only used when writing the commit-graph, so this is a relatively small operation compared to the other writing code. By avoiding the need to move to new versions of the BDAT and BIDX chunk, we can give ourselves more time to consider whether or not other modifications to these chunks are worthwhile without holding up this change. Another approach would be to introduce a new BIDX chunk (say, one identified by 'BID2') which is identical to the existing BIDX chunk, except the most-significant bit of each offset is interpreted as "this filter is too big" iff looking at a BID2 chunk. This avoids having to write a bitmap, but forces older clients to rewrite their commit-graphs (as well as reduces the theoretical largest Bloom filters we couldl write, and forces us to maintain the code necessary to translate BIDX chunks to BID2 ones). Separately from this patch, I implemented this alternate approach and did not find it to be advantageous. Signed-off-by: Taylor Blau Signed-off-by: Taylor Blau Signed-off-by: Taylor Blau --- .../technical/commit-graph-format.txt | 12 +++ bloom.h | 2 +- commit-graph.c | 101 +++++++++++++++--- commit-graph.h | 5 + t/t4216-log-bloom.sh | 25 ++++- 5 files changed, 130 insertions(+), 15 deletions(-) diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index 440541045d..5f2d9ab4d7 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -123,6 +123,18 @@ CHUNK DATA: of length zero. * The BDAT chunk is present if and only if BIDX is present. + Large Bloom Filters (ID: {'B', 'F', 'X', 'L'}) [Optional] + * It starts with a 32-bit unsigned integer specifying the maximum number of + changed-paths that can be stored in a single Bloom filter. + * It then contains a list of 64-bit words (the length of this list is + determined by the width of the chunk) which is a bitmap. The 'i'th bit is + set exactly when the 'i'th commit in the graph has a changed-path Bloom + filter with zero entries (either because the commit is empty, or because + it contains more than 512 changed paths). + * The BFXL chunk is present only when the BIDX and BDAT chunks are + also present. + + Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional] This list of H-byte hashes describe a set of B commit-graph files that form a commit-graph chain. The graph position for the ith commit in this diff --git a/bloom.h b/bloom.h index 2353c74a24..73238276bc 100644 --- a/bloom.h +++ b/bloom.h @@ -33,7 +33,7 @@ struct bloom_filter_settings { * The maximum number of changed paths per commit * before declaring a Bloom filter to be too-large. * - * Not written to the commit-graph file. + * Written to the 'BFXL' chunk (instead of 'BDAT'). */ uint32_t max_changed_paths; }; diff --git a/commit-graph.c b/commit-graph.c index 8964453433..1fee49d171 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -41,8 +41,9 @@ void git_test_write_commit_graph_or_die(void) #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */ #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */ #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */ +#define GRAPH_CHUNKID_BLOOMLARGE 0x4246584c /* "BFXL" */ #define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */ -#define MAX_NUM_CHUNKS 7 +#define MAX_NUM_CHUNKS 8 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16) @@ -429,6 +430,19 @@ struct commit_graph *parse_commit_graph(struct repository *r, graph->bloom_filter_settings->bits_per_entry = get_be32(data + chunk_offset + 8); } break; + + case GRAPH_CHUNKID_BLOOMLARGE: + if (graph->chunk_bloom_large_filters) + chunk_repeated = 1; + else if (r->settings.commit_graph_read_changed_paths) { + graph->bloom_large_to_alloc = get_be64(chunk_lookup + 4) + - chunk_offset - sizeof(uint32_t); + + graph->bloom_large.word_alloc = 0; /* populate when necessary */ + graph->chunk_bloom_large_filters = data + chunk_offset + sizeof(uint32_t); + graph->bloom_filter_settings->max_changed_paths = get_be32(data + chunk_offset); + } + break; } if (chunk_repeated) { @@ -443,6 +457,7 @@ struct commit_graph *parse_commit_graph(struct repository *r, /* We need both the bloom chunks to exist together. Else ignore the data */ graph->chunk_bloom_indexes = NULL; graph->chunk_bloom_data = NULL; + graph->chunk_bloom_large_filters = NULL; FREE_AND_NULL(graph->bloom_filter_settings); } @@ -932,6 +947,31 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c); } +static int get_bloom_filter_large_in_graph(struct commit_graph *g, + const struct commit *c) +{ + uint32_t graph_pos = commit_graph_position(c); + if (graph_pos == COMMIT_NOT_FROM_GRAPH) + return 0; + + while (g && graph_pos < g->num_commits_in_base) + g = g->base_graph; + + if (!g || !g->bloom_large_to_alloc) + return 0; + + if (!g->bloom_large.word_alloc) { + size_t i; + g->bloom_large.word_alloc = g->bloom_large_to_alloc; + g->bloom_large.words = xmalloc(g->bloom_large_to_alloc * sizeof(eword_t)); + + for (i = 0; i < g->bloom_large_to_alloc; i++) + g->bloom_large.words[i] = get_be64(g->chunk_bloom_large_filters + + i * sizeof(eword_t)); + } + + return bitmap_get(&g->bloom_large, graph_pos - g->num_commits_in_base); +} struct packed_oid_list { struct object_id *list; @@ -970,8 +1010,10 @@ struct write_commit_graph_context { size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; + int count_bloom_filter_known_large; int count_bloom_filter_found_large; int count_bloom_filter_computed; + struct bitmap *bloom_large; }; static int write_graph_chunk_fanout(struct hashfile *f, @@ -1235,6 +1277,23 @@ static int write_graph_chunk_bloom_data(struct hashfile *f, return 0; } +static int write_graph_chunk_bloom_large(struct hashfile *f, + struct write_commit_graph_context *ctx) +{ + size_t i, alloc = ctx->commits.nr / BITS_IN_EWORD; + if (ctx->commits.nr % BITS_IN_EWORD) + alloc++; + if (alloc > ctx->bloom_large->word_alloc) + BUG("write_graph_chunk_bloom_large: bitmap not large enough"); + + trace2_region_enter("commit-graph", "bloom_large", ctx->r); + hashwrite_be32(f, ctx->bloom_settings->max_changed_paths); + for (i = 0; i < ctx->bloom_large->word_alloc; i++) + hashwrite_be64(f, ctx->bloom_large->words[i]); + trace2_region_leave("commit-graph", "bloom_large", ctx->r); + return 0; +} + static int oid_compare(const void *_a, const void *_b) { const struct object_id *a = (const struct object_id *)_a; @@ -1398,6 +1457,8 @@ static void trace2_bloom_filter_write_statistics(struct write_commit_graph_conte struct json_writer jw = JSON_WRITER_INIT; jw_object_begin(&jw, 0); + jw_object_intmax(&jw, "filter_known_large", + ctx->count_bloom_filter_known_large); jw_object_intmax(&jw, "filter_found_large", ctx->count_bloom_filter_found_large); jw_object_intmax(&jw, "filter_computed", @@ -1416,6 +1477,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) int *sorted_commits; init_bloom_filters(); + ctx->bloom_large = bitmap_word_alloc(ctx->commits.nr / BITS_IN_EWORD + 1); if (ctx->report_progress) progress = start_delayed_progress( @@ -1430,21 +1492,28 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) &ctx->commits); for (i = 0; i < ctx->commits.nr; i++) { - int computed = 0; int pos = sorted_commits[i]; struct commit *c = ctx->commits.list[pos]; - struct bloom_filter *filter = get_or_compute_bloom_filter( - ctx->r, - c, - 1, - ctx->bloom_settings, - &computed); - if (computed) { - ctx->count_bloom_filter_computed++; - if (filter && !filter->len) - ctx->count_bloom_filter_found_large++; + if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph, c)) { + bitmap_set(ctx->bloom_large, pos); + ctx->count_bloom_filter_known_large++; + } else { + int computed = 0; + struct bloom_filter *filter = get_or_compute_bloom_filter( + ctx->r, + c, + 1, + ctx->bloom_settings, + &computed); + if (computed) { + ctx->count_bloom_filter_computed++; + if (filter && !filter->len) { + bitmap_set(ctx->bloom_large, pos); + ctx->count_bloom_filter_found_large++; + } + } + ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; } - ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; display_progress(progress, i + 1); } @@ -1764,6 +1833,11 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) + ctx->total_bloom_filter_data_size; chunks[num_chunks].write_fn = write_graph_chunk_bloom_data; num_chunks++; + chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMLARGE; + chunks[num_chunks].size = sizeof(eword_t) * ctx->bloom_large->word_alloc + + sizeof(uint32_t); + chunks[num_chunks].write_fn = write_graph_chunk_bloom_large; + num_chunks++; } if (ctx->num_commit_graphs_after > 1) { chunks[num_chunks].id = GRAPH_CHUNKID_BASE; @@ -2503,6 +2577,7 @@ void free_commit_graph(struct commit_graph *g) } free(g->filename); free(g->bloom_filter_settings); + bitmap_free(g->bloom_large); free(g); } diff --git a/commit-graph.h b/commit-graph.h index d9acb22bac..f4fb996dd5 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -4,6 +4,7 @@ #include "git-compat-util.h" #include "object-store.h" #include "oidset.h" +#include "ewah/ewok.h" #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH" #define GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE "GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE" @@ -71,6 +72,10 @@ struct commit_graph { const unsigned char *chunk_base_graphs; const unsigned char *chunk_bloom_indexes; const unsigned char *chunk_bloom_data; + const unsigned char *chunk_bloom_large_filters; + + size_t bloom_large_to_alloc; + struct bitmap bloom_large; struct bloom_filter_settings *bloom_filter_settings; }; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 21b67677ef..6859d85369 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -33,7 +33,7 @@ test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' git commit-graph write --reachable --changed-paths ' graph_read_expect () { - NUM_CHUNKS=5 + NUM_CHUNKS=6 cat >expect <<- EOF header: 43475048 1 1 $NUM_CHUNKS 0 num_commits: $1 @@ -262,5 +262,28 @@ test_expect_success 'correctly report changes over limit' ' done ) ' +test_bloom_filters_computed () { + commit_graph_args=$1 + bloom_trace_prefix="{\"filter_known_large\":$2,\"filter_found_large\":$3,\"filter_computed\":$4" + rm -f "$TRASH_DIRECTORY/trace.event" && + GIT_TRACE2_EVENT="$TRASH_DIRECTORY/trace.event" git commit-graph write $commit_graph_args && + grep "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.event" +} + +test_expect_success 'Bloom generation does not recompute too-large filters' ' + ( + cd limits && + + # start from scratch and rebuild + rm -f .git/objects/info/commit-graph && + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ + git commit-graph write --reachable --changed-paths \ + --split=replace && + test_commit c1 filter && + + test_bloom_filters_computed "--reachable --changed-paths --split=replace" \ + 2 0 1 + ) +' test_done From patchwork Wed Aug 5 17:03:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11701971 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3837B14B7 for ; Wed, 5 Aug 2020 17:06:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 17EE322D00 for ; Wed, 5 Aug 2020 17:06:54 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="vlnELA/k" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728431AbgHERGl (ORCPT ); Wed, 5 Aug 2020 13:06:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49386 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728409AbgHERDL (ORCPT ); Wed, 5 Aug 2020 13:03:11 -0400 Received: from mail-qt1-x836.google.com (mail-qt1-x836.google.com [IPv6:2607:f8b0:4864:20::836]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 14192C0617A9 for ; Wed, 5 Aug 2020 10:03:06 -0700 (PDT) Received: by mail-qt1-x836.google.com with SMTP id t23so31069950qto.3 for ; Wed, 05 Aug 2020 10:03:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=bLneSICqgel/NNWDRx81PUTLzPw8E0VjoB4oAwwDhps=; b=vlnELA/kjks+FFgAJggrqqFvT+z8R0mBDPrNcQycZG7yNGp+bjeXTucKpi2LPEyGaU GZdXuocKZROYrVycYbGiEKRrz/ecLZqN2N0yeSYww0nsPW3gi3IiqsFzDWrUYVVLId8W a6EZMQ1ndeEKCvYPIYG/onPZKH3Pcbg9ReBmaF61x2ofT8BjFFs1EB1lIScEzTZj/Twb TZGFJSFODRA31XbH5WnDuGZxWoDEQifj7lV607kRUCjHPVqGcZW5hx9067k0ijKq3y8E MVb2RsxlxPjd3DNQWPMssftl1JzEs99zVg9y3fyP5YVfmKdqZpYzKAM2PXqnifBQhILb i3fw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=bLneSICqgel/NNWDRx81PUTLzPw8E0VjoB4oAwwDhps=; b=HPc3shkrjOQRCKIDAhj0ctCgkwiM0SSZ2zniBVD7RdUTHLLxpJ3UDQCpTIrxddWdy0 qQgLHfjHeYBZyMozHj1FxkE5njKtwvyIMai888K+EDbDY9LXneut3xjDZwXV25iRKWe1 wyDzuJpCqm3c3zj94/4kBMZd7tP2aTKesduSbg5KZ6By5YC8oaj/24DnS0f/iKqsulum CgaL8PzdOf1E3Z1s8gdBPb5S2u4VLXtwtAlBgjU+cU8ZT1gRnp/arwddojqoral8Hp9A heSegevW6mLPNWmUnpzNZPShsxh2K3iTTvICfBiur2THi4Aa6jeAkG6e7SNd9rxlNXHW yMsA== X-Gm-Message-State: AOAM532jLaCGxqCyiLU9VvoHZGpdM8qnL8eXcNbXE8Jwjev7nbjMlgE+ RYI2x9va/6r8xIDX2Av53I1s+48IQhv5nQ== X-Google-Smtp-Source: ABdhPJwJ76uaQQhKIsVbLroGutbqSa+GgcYhlSIlStC4S3o7gRXXR2F3JqtzlbmB/YLBH0RUsyFqzA== X-Received: by 2002:aed:21da:: with SMTP id m26mr4394417qtc.197.1596646983374; Wed, 05 Aug 2020 10:03:03 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:d118:9acc:fdba:dee7]) by smtp.gmail.com with ESMTPSA id 3sm2034133qkm.117.2020.08.05.10.03.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Aug 2020 10:03:02 -0700 (PDT) Date: Wed, 5 Aug 2020 13:03:01 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com Subject: [PATCH v2 13/14] commit-graph: rename 'split_commit_graph_opts' Message-ID: <2ee0b8435168226f91ef60916413be0a5da5290d.1596646576.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In the subsequent commit, additional options will be added to the commit-graph API which have nothing to do with splitting. Rename the 'split_commit_graph_opts' structure to the more-generic 'commit_graph_opts' to encompass both. Suggsted-by: Derrick Stolee Signed-off-by: Taylor Blau --- builtin/commit-graph.c | 20 ++++++++++---------- commit-graph.c | 40 ++++++++++++++++++++-------------------- commit-graph.h | 6 +++--- 3 files changed, 33 insertions(+), 33 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index ba5584463f..38f5f57d15 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -119,7 +119,7 @@ static int graph_verify(int argc, const char **argv) } extern int read_replace_refs; -static struct split_commit_graph_opts split_opts; +static struct commit_graph_opts write_opts; static int write_option_parse_split(const struct option *opt, const char *arg, int unset) @@ -187,24 +187,24 @@ static int graph_write(int argc, const char **argv) OPT_BOOL(0, "changed-paths", &opts.enable_changed_paths, N_("enable computation for changed paths")), OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")), - OPT_CALLBACK_F(0, "split", &split_opts.flags, NULL, + OPT_CALLBACK_F(0, "split", &write_opts.flags, NULL, N_("allow writing an incremental commit-graph file"), PARSE_OPT_OPTARG | PARSE_OPT_NONEG, write_option_parse_split), - OPT_INTEGER(0, "max-commits", &split_opts.max_commits, + OPT_INTEGER(0, "max-commits", &write_opts.max_commits, N_("maximum number of commits in a non-base split commit-graph")), - OPT_INTEGER(0, "size-multiple", &split_opts.size_multiple, + OPT_INTEGER(0, "size-multiple", &write_opts.size_multiple, N_("maximum ratio between two levels of a split commit-graph")), - OPT_EXPIRY_DATE(0, "expire-time", &split_opts.expire_time, + OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time, N_("only expire files older than a given date-time")), OPT_END(), }; opts.progress = isatty(2); opts.enable_changed_paths = -1; - split_opts.size_multiple = 2; - split_opts.max_commits = 0; - split_opts.expire_time = 0; + write_opts.size_multiple = 2; + write_opts.max_commits = 0; + write_opts.expire_time = 0; trace2_cmd_mode("write"); @@ -232,7 +232,7 @@ static int graph_write(int argc, const char **argv) odb = find_odb(the_repository, opts.obj_dir); if (opts.reachable) { - if (write_commit_graph_reachable(odb, flags, &split_opts)) + if (write_commit_graph_reachable(odb, flags, &write_opts)) return 1; return 0; } @@ -261,7 +261,7 @@ static int graph_write(int argc, const char **argv) opts.stdin_packs ? &pack_indexes : NULL, opts.stdin_commits ? &commits : NULL, flags, - &split_opts)) + &write_opts)) result = 1; cleanup: diff --git a/commit-graph.c b/commit-graph.c index 1fee49d171..82fca07579 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1006,7 +1006,7 @@ struct write_commit_graph_context { changed_paths:1, order_by_pack:1; - const struct split_commit_graph_opts *split_opts; + const struct commit_graph_opts *opts; size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; @@ -1347,8 +1347,8 @@ static void close_reachable(struct write_commit_graph_context *ctx) { int i; struct commit *commit; - enum commit_graph_split_flags flags = ctx->split_opts ? - ctx->split_opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; + enum commit_graph_split_flags flags = ctx->opts ? + ctx->opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; if (ctx->report_progress) ctx->progress = start_delayed_progress( @@ -1548,7 +1548,7 @@ static int add_ref_to_set(const char *refname, int write_commit_graph_reachable(struct object_directory *odb, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts) + const struct commit_graph_opts *opts) { struct oidset commits = OIDSET_INIT; struct refs_cb_data data; @@ -1565,7 +1565,7 @@ int write_commit_graph_reachable(struct object_directory *odb, stop_progress(&data.progress); result = write_commit_graph(odb, NULL, &commits, - flags, split_opts); + flags, opts); oidset_clear(&commits); return result; @@ -1680,8 +1680,8 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx) static void copy_oids_to_commits(struct write_commit_graph_context *ctx) { uint32_t i; - enum commit_graph_split_flags flags = ctx->split_opts ? - ctx->split_opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; + enum commit_graph_split_flags flags = ctx->opts ? + ctx->opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; ctx->num_extra_edges = 0; if (ctx->report_progress) @@ -1967,13 +1967,13 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) int max_commits = 0; int size_mult = 2; - if (ctx->split_opts) { - max_commits = ctx->split_opts->max_commits; + if (ctx->opts) { + max_commits = ctx->opts->max_commits; - if (ctx->split_opts->size_multiple) - size_mult = ctx->split_opts->size_multiple; + if (ctx->opts->size_multiple) + size_mult = ctx->opts->size_multiple; - flags = ctx->split_opts->flags; + flags = ctx->opts->flags; } g = ctx->r->objects->commit_graph; @@ -2151,8 +2151,8 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx) size_t dirnamelen; timestamp_t expire_time = time(NULL); - if (ctx->split_opts && ctx->split_opts->expire_time) - expire_time = ctx->split_opts->expire_time; + if (ctx->opts && ctx->opts->expire_time) + expire_time = ctx->opts->expire_time; if (!ctx->split) { char *chain_file_name = get_chain_filename(ctx->odb); unlink(chain_file_name); @@ -2203,7 +2203,7 @@ int write_commit_graph(struct object_directory *odb, struct string_list *pack_indexes, struct oidset *commits, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts) + const struct commit_graph_opts *opts) { struct write_commit_graph_context *ctx; uint32_t i, count_distinct = 0; @@ -2220,7 +2220,7 @@ int write_commit_graph(struct object_directory *odb, ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0; ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0; ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; - ctx->split_opts = split_opts; + ctx->opts = opts; ctx->total_bloom_filter_data_size = 0; bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", @@ -2268,15 +2268,15 @@ int write_commit_graph(struct object_directory *odb, } } - if (ctx->split_opts) - replace = ctx->split_opts->flags & COMMIT_GRAPH_SPLIT_REPLACE; + if (ctx->opts) + replace = ctx->opts->flags & COMMIT_GRAPH_SPLIT_REPLACE; } ctx->approx_nr_objects = approximate_object_count(); ctx->oids.alloc = ctx->approx_nr_objects / 32; - if (ctx->split && split_opts && ctx->oids.alloc > split_opts->max_commits) - ctx->oids.alloc = split_opts->max_commits; + if (ctx->split && opts && ctx->oids.alloc > opts->max_commits) + ctx->oids.alloc = opts->max_commits; if (ctx->append) { prepare_commit_graph_one(ctx->r, ctx->odb); diff --git a/commit-graph.h b/commit-graph.h index f4fb996dd5..1d147b7b76 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -110,7 +110,7 @@ enum commit_graph_split_flags { COMMIT_GRAPH_SPLIT_REPLACE = 2 }; -struct split_commit_graph_opts { +struct commit_graph_opts { int size_multiple; int max_commits; timestamp_t expire_time; @@ -125,12 +125,12 @@ struct split_commit_graph_opts { */ int write_commit_graph_reachable(struct object_directory *odb, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts); + const struct commit_graph_opts *opts); int write_commit_graph(struct object_directory *odb, struct string_list *pack_indexes, struct oidset *commits, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts); + const struct commit_graph_opts *opts); #define COMMIT_GRAPH_VERIFY_SHALLOW (1 << 0) From patchwork Wed Aug 5 17:03:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11701983 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6D39C1392 for ; Wed, 5 Aug 2020 17:10:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4C03F22D3E for ; Wed, 5 Aug 2020 17:10:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="pPy9aIqU" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728057AbgHERJm (ORCPT ); Wed, 5 Aug 2020 13:09:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49248 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728349AbgHERDX (ORCPT ); Wed, 5 Aug 2020 13:03:23 -0400 Received: from mail-qv1-xf32.google.com (mail-qv1-xf32.google.com [IPv6:2607:f8b0:4864:20::f32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D9315C0617AA for ; Wed, 5 Aug 2020 10:03:08 -0700 (PDT) Received: by mail-qv1-xf32.google.com with SMTP id b2so10482937qvp.9 for ; Wed, 05 Aug 2020 10:03:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=C8mcdF635Hj3CViDSS3+9IGExrkhj7G6YfsQfTjfGsE=; b=pPy9aIqUaiDCcBzt9OWGT8mM8X9HU/sO1SOPUCui4JLxXmVRtmVAC+/AzJMyx6JmsV MvXxezivk2KGYujs8G158iHKNzigYNyEMnSsKBTVO4PXSycpPvMkznUJqlBNxoSXAX/5 GrqUJIfqG95ZdNmbX82xIH87rbA/cF1qwT8I5NFP4LihDFWSyE7JM0FRzm04w4d9Oph9 yIw4DQzq9uYvAGmgK85XuIBAZUfkhqi9fB2UTz9Elg7t4pLuvJOrU41OoBPPA0YgLzvN h0+syZnLOg46rDxm0d+sb8HiXCbCCr9BiAApXxxBgZMVByZdclA2n1PkHsjNJvwTF95o SLMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=C8mcdF635Hj3CViDSS3+9IGExrkhj7G6YfsQfTjfGsE=; b=KLStpiL3bvFPj4xeLsSKt0nqBQgXOZXZBjPwMWk1u2whbkr674WvezBz1P7MnKYuc4 0rm3GCtHvNhtRSBoPDzfLJgVjDKzARJlsg/kZLjgc1sPhGDZpijVKks6LXkozacRqxN0 TCu3bOf0zhDXB2FQtCsvNoo2PfdNMKOdh+I7QVtyGc/i9ji47jQbKKWpYk/6K7vBYKVU ZJvueOsYU5UOG1Lv/F1K9qiCbFH1RA5zK0M3/9bdWuIUBlIVBXyCt7EUMsg+N6hhJVHq 7XAKcw/3expiYt1Aql3Xfla1ZfFcQ8w2MZRUkkT2fdbomGRKEzy9FZGoG+LFBe3vO4F7 nPzQ== X-Gm-Message-State: AOAM5311FZG42tpdnkwqth8CtevyGJfS8DzHYxA3LHlHEnfXPZCAGnaQ lTSjWimfo3AMB4lGbQJwOIN8UR32hRCSnw== X-Google-Smtp-Source: ABdhPJwJtSjabKvfySo852mGGuQVSKwxSyA8iOx5t7D0jcA8mRPWnn/PJzTq3c7ye+l7tev6spgMNQ== X-Received: by 2002:ad4:4cc3:: with SMTP id i3mr3081508qvz.17.1596646986884; Wed, 05 Aug 2020 10:03:06 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:d118:9acc:fdba:dee7]) by smtp.gmail.com with ESMTPSA id t93sm2519392qtd.97.2020.08.05.10.03.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Aug 2020 10:03:06 -0700 (PDT) Date: Wed, 5 Aug 2020 13:03:05 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com Subject: [PATCH v2 14/14] builtin/commit-graph.c: introduce '--max-new-filters=' Message-ID: <3b66ae4a9c2aa97dd64f88904ad2bf2756ccd9ef.1596646576.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Introduce a command-line flag and configuration variable to fill in the 'max_new_filters' variable introduced by the previous patch. The command-line option '--max-new-filters' takes precedence over 'commitGraph.maxNewFilters', which is the default value. '--no-max-new-filters' can also be provided, which sets the value back to '-1', indicating that an unlimited number of new Bloom filters may be generated. (OPT_INTEGER only allows setting the '--no-' variant back to '0', hence a custom callback was used instead). Signed-off-by: Taylor Blau --- Documentation/config/commitgraph.txt | 4 +++ Documentation/git-commit-graph.txt | 4 +++ bloom.c | 15 +++++++++++ builtin/commit-graph.c | 39 +++++++++++++++++++++++++--- commit-graph.c | 16 +++++++++--- commit-graph.h | 1 + t/t4216-log-bloom.sh | 19 ++++++++++++++ 7 files changed, 91 insertions(+), 7 deletions(-) diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt index cff0797b54..4582c39fc4 100644 --- a/Documentation/config/commitgraph.txt +++ b/Documentation/config/commitgraph.txt @@ -1,3 +1,7 @@ +commitGraph.maxNewFilters:: + Specifies the default value for the `--max-new-filters` option of `git + commit-graph write` (c.f., linkgit:git-commit-graph[1]). + commitGraph.readChangedPaths:: If true, then git will use the changed-path Bloom filters in the commit-graph file (if it exists, and they are present). Defaults to diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index 17405c73a9..9c887d5d79 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -67,6 +67,10 @@ this option is given, future commit-graph writes will automatically assume that this option was intended. Use `--no-changed-paths` to stop storing this data. + +With the `--max-new-filters=` option, generate at most `n` new Bloom +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is +enforced. Overrides the `commitGraph.maxNewFilters` configuration. ++ With the `--split[=]` option, write the commit-graph as a chain of multiple commit-graph files stored in `/info/commit-graphs`. Commit-graph layers are merged based on the diff --git a/bloom.c b/bloom.c index ed54e96e57..d0c0fd049d 100644 --- a/bloom.c +++ b/bloom.c @@ -51,6 +51,21 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, else start_index = 0; + if ((start_index == end_index) && + (g->bloom_large.word_alloc && !bitmap_get(&g->bloom_large, lex_pos))) { + /* + * If the filter is zero-length, either (1) the filter has no + * changes, (2) the filter has too many changes, or (3) it + * wasn't computed (eg., due to '--max-new-filters'). + * + * If either (1) or (2) is the case, the 'large' bit will be set + * for this Bloom filter. If it is unset, then it wasn't + * computed. In that case, return nothing, since we don't have + * that filter in the graph. + */ + return 0; + } + filter->len = end_index - start_index; filter->data = (unsigned char *)(g->chunk_bloom_data + sizeof(unsigned char) * start_index + diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 38f5f57d15..3500a6e1f1 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -13,7 +13,8 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph verify [--object-dir ] [--shallow] [--[no-]progress]"), N_("git commit-graph write [--object-dir ] [--append] " "[--split[=]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] "), + "[--changed-paths] [--[no-]max-new-filters ] [--[no-]progress] " + ""), NULL }; @@ -25,7 +26,8 @@ static const char * const builtin_commit_graph_verify_usage[] = { static const char * const builtin_commit_graph_write_usage[] = { N_("git commit-graph write [--object-dir ] [--append] " "[--split[=]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] "), + "[--changed-paths] [--[no-]max-new-filters ] [--[no-]progress] " + ""), NULL }; @@ -162,6 +164,23 @@ static int read_one_commit(struct oidset *commits, struct progress *progress, return 0; } +static int write_option_max_new_filters(const struct option *opt, + const char *arg, + int unset) +{ + int *to = opt->value; + if (unset) + *to = -1; + else { + const char *s; + *to = strtol(arg, (char **)&s, 10); + if (*s) + return error(_("%s expects a numerical value"), + optname(opt, opt->flags)); + } + return 0; +} + static int graph_write(int argc, const char **argv) { struct string_list pack_indexes = STRING_LIST_INIT_NODUP; @@ -197,6 +216,9 @@ static int graph_write(int argc, const char **argv) N_("maximum ratio between two levels of a split commit-graph")), OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time, N_("only expire files older than a given date-time")), + OPT_CALLBACK_F(0, "max-new-filters", &write_opts.max_new_filters, + NULL, N_("maximum number of changed-path Bloom filters to compute"), + 0, write_option_max_new_filters), OPT_END(), }; @@ -205,6 +227,7 @@ static int graph_write(int argc, const char **argv) write_opts.size_multiple = 2; write_opts.max_commits = 0; write_opts.expire_time = 0; + write_opts.max_new_filters = -1; trace2_cmd_mode("write"); @@ -270,6 +293,16 @@ static int graph_write(int argc, const char **argv) return result; } +static int git_commit_graph_config(const char *var, const char *value, void *cb) +{ + if (!strcmp(var, "commitgraph.maxnewfilters")) { + write_opts.max_new_filters = git_config_int(var, value); + return 0; + } + + return git_default_config(var, value, cb); +} + int cmd_commit_graph(int argc, const char **argv, const char *prefix) { static struct option builtin_commit_graph_options[] = { @@ -283,7 +316,7 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix) usage_with_options(builtin_commit_graph_usage, builtin_commit_graph_options); - git_config(git_default_config, NULL); + git_config(git_commit_graph_config, &opts); argc = parse_options(argc, argv, prefix, builtin_commit_graph_options, builtin_commit_graph_usage, diff --git a/commit-graph.c b/commit-graph.c index 82fca07579..76b1238262 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -948,7 +948,8 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit } static int get_bloom_filter_large_in_graph(struct commit_graph *g, - const struct commit *c) + const struct commit *c, + uint32_t max_changed_paths) { uint32_t graph_pos = commit_graph_position(c); if (graph_pos == COMMIT_NOT_FROM_GRAPH) @@ -1475,6 +1476,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) int i; struct progress *progress = NULL; int *sorted_commits; + int max_new_filters; init_bloom_filters(); ctx->bloom_large = bitmap_word_alloc(ctx->commits.nr / BITS_IN_EWORD + 1); @@ -1491,10 +1493,15 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->order_by_pack ? commit_pos_cmp : commit_gen_cmp, &ctx->commits); + max_new_filters = ctx->opts->max_new_filters >= 0 ? + ctx->opts->max_new_filters : ctx->commits.nr; + for (i = 0; i < ctx->commits.nr; i++) { int pos = sorted_commits[i]; struct commit *c = ctx->commits.list[pos]; - if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph, c)) { + if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph, + c, + ctx->bloom_settings->max_changed_paths)) { bitmap_set(ctx->bloom_large, pos); ctx->count_bloom_filter_known_large++; } else { @@ -1502,7 +1509,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) struct bloom_filter *filter = get_or_compute_bloom_filter( ctx->r, c, - 1, + ctx->count_bloom_filter_computed < max_new_filters, ctx->bloom_settings, &computed); if (computed) { @@ -1512,7 +1519,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->count_bloom_filter_found_large++; } } - ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; + if (filter) + ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; } display_progress(progress, i + 1); } diff --git a/commit-graph.h b/commit-graph.h index 1d147b7b76..47d99ea4bf 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -115,6 +115,7 @@ struct commit_graph_opts { int max_commits; timestamp_t expire_time; enum commit_graph_split_flags flags; + int max_new_filters; }; /* diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 6859d85369..3aab8ffbe3 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -286,4 +286,23 @@ test_expect_success 'Bloom generation does not recompute too-large filters' ' ) ' +test_expect_success 'Bloom generation is limited by --max-new-filters' ' + ( + cd limits && + test_commit c2 filter && + test_commit c3 filter && + test_commit c4 no-filter && + test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=2" \ + 2 0 2 + ) +' + +test_expect_success 'Bloom generation backfills previously-skipped filters' ' + ( + cd limits && + test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=1" \ + 2 0 1 + ) +' + test_done