From patchwork Thu Sep 3 22:46:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11755487 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B1332746 for ; Thu, 3 Sep 2020 22:46:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 86EDD20786 for ; Thu, 3 Sep 2020 22:46:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="QyJxAWDp" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729571AbgICWqF (ORCPT ); Thu, 3 Sep 2020 18:46:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56432 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729438AbgICWqE (ORCPT ); Thu, 3 Sep 2020 18:46:04 -0400 Received: from mail-qk1-x742.google.com (mail-qk1-x742.google.com [IPv6:2607:f8b0:4864:20::742]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 93A3FC061244 for ; Thu, 3 Sep 2020 15:46:04 -0700 (PDT) Received: by mail-qk1-x742.google.com with SMTP id w12so4697283qki.6 for ; Thu, 03 Sep 2020 15:46:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=/4wm95ptT7x1ZcrNfkC2F+QnzhpSTYV3tjDy+cMi/HY=; b=QyJxAWDpHcyuw8vYSxlg7GzCln5K5Yf/NOJRTbXGpeKsTZvb/XXM6E2uYW979/SPoL QIrgY6pEvPpx9oDCy8LKcDSu9TQ4ciaHxOLNPPwAkV8DWsHXDR6WEq2xxrOV9PfG26JO oPxIL6dQU2KfVlhqp3eqi1oXtq9O/cIteYn7VTRHh8yEgz7Kc2Gym6BXq2ZFHswYQbM6 Z537NzTaHRrilGaRkZrcHqd6mvLXV73pc9+fy35gqC73UtKCh0mq3i6fRm2SjC5ByWCR mD/AAdFMDlhEC5hSzr2Dkkcm+qDH6Zu6o6mYuIqZutV1taYemgK8qSh12cQK7lqVEi4Y ImtQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=/4wm95ptT7x1ZcrNfkC2F+QnzhpSTYV3tjDy+cMi/HY=; b=UHjTb13232eyMyriHgo8/WbhW+M5mcmQNmbTeXMIs9Fs4yaqJvDHtfyyDUl+s7ygZU siMw3OLApxARFEuuGzFi4TLazmvxFDPOmRGQ//NfB45UziIHFwrtivhUvqV5iDX6E5lr rX4cmbDCipgGnhio9RXwUB9fdCSMZ+maMtHWtM9ifu0IutOrssOwpe+1MvvDNKNncxwE W37H39I/AYH5Utjvk8lRj7kbZa6VgjlkL5Zoqvq5ohBLSB9IafWAKWhwLDT73i2qeXbw E6meGEtcZ+ECxiX1lFBDQ0fNl9hIhK1g3UUNV083yrr4JJJCyyrfMPBJaguFca96BvxV XA3w== X-Gm-Message-State: AOAM533UZebFvQikGksoFlMd6UOIf5/+3GcsAy4Nm4NpPUBnbmFGXTAX 1A7O3RIsV0F0MZoEjJ2lGZ5qjgg6MV8t98lt X-Google-Smtp-Source: ABdhPJyri80SlN3JN/7arzcsAHZj3ptnYhzxdZCjQUZaTjOCzkwAed++9fb0YQl10KZ9fI7sFLHr3g== X-Received: by 2002:a37:66d1:: with SMTP id a200mr5347623qkc.342.1599173163284; Thu, 03 Sep 2020 15:46:03 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a97c:6dae:76a4:c715]) by smtp.gmail.com with ESMTPSA id 10sm3247492qkk.88.2020.09.03.15.46.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Sep 2020 15:46:02 -0700 (PDT) Date: Thu, 3 Sep 2020 18:46:00 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v4 01/14] commit-graph: introduce 'get_bloom_filter_settings()' Message-ID: <97d80f109f5c470a8ca3d17ef9d451787b0b609e.1599172907.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Many places in the code often need a pointer to the commit-graph's 'struct bloom_filter_settings', in which case they often take the value from the top-most commit-graph. In the non-split case, this works as expected. In the split case, however, things get a little tricky. Not all layers in a chain of incremental commit-graphs are required to themselves have Bloom data, and so whether or not some part of the code uses Bloom filters depends entirely on whether or not the top-most level of the commit-graph chain has Bloom filters. This has been the behavior since Bloom filters were introduced, and has been codified into the tests since a759bfa9ee (t4216: add end to end tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130 requires that Bloom filters are not used in exactly the case described earlier. There is no reason that this needs to be the case, since it is perfectly valid for commits in an earlier layer to have Bloom filters when commits in a newer layer do not. Since Bloom settings are guaranteed to be the same for any layer in a chain that has Bloom data, it is sufficient to traverse the '->base_graph' pointer until either (1) a non-null 'struct bloom_filter_settings *' is found, or (2) until we are at the root of the commit-graph chain. Introduce a 'get_bloom_filter_settings()' function that does just this, and use it instead of purely dereferencing the top-most graph's '->bloom_filter_settings' pointer. While we're at it, add an additional test in t5324 to guard against code in the commit-graph writing machinery that doesn't correctly handle a NULL 'struct bloom_filter *'. Co-authored-by: Derrick Stolee Signed-off-by: Taylor Blau --- blame.c | 6 ++++-- bloom.c | 6 +++--- commit-graph.c | 11 +++++++++++ commit-graph.h | 2 ++ revision.c | 5 +---- t/t4216-log-bloom.sh | 9 ++++++--- t/t5324-split-commit-graph.sh | 13 +++++++++++++ 7 files changed, 40 insertions(+), 12 deletions(-) diff --git a/blame.c b/blame.c index 1be1cd82a2..903e23af23 100644 --- a/blame.c +++ b/blame.c @@ -2892,16 +2892,18 @@ void setup_blame_bloom_data(struct blame_scoreboard *sb, const char *path) { struct blame_bloom_data *bd; + struct bloom_filter_settings *bs; if (!sb->repo->objects->commit_graph) return; - if (!sb->repo->objects->commit_graph->bloom_filter_settings) + bs = get_bloom_filter_settings(sb->repo); + if (!bs) return; bd = xmalloc(sizeof(struct blame_bloom_data)); - bd->settings = sb->repo->objects->commit_graph->bloom_filter_settings; + bd->settings = bs; bd->alloc = 4; bd->nr = 0; diff --git a/bloom.c b/bloom.c index 1a573226e7..cd9380ac62 100644 --- a/bloom.c +++ b/bloom.c @@ -38,7 +38,7 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, while (graph_pos < g->num_commits_in_base) g = g->base_graph; - /* The commit graph commit 'c' lives in doesn't carry bloom filters. */ + /* The commit graph commit 'c' lives in doesn't carry Bloom filters. */ if (!g->chunk_bloom_indexes) return 0; @@ -195,8 +195,8 @@ struct bloom_filter *get_bloom_filter(struct repository *r, if (!filter->data) { load_commit_graph_info(r, c); if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH && - r->objects->commit_graph->chunk_bloom_indexes) - load_bloom_filter_from_graph(r->objects->commit_graph, filter, c); + load_bloom_filter_from_graph(r->objects->commit_graph, filter, c)) + return filter; } if (filter->data) diff --git a/commit-graph.c b/commit-graph.c index 0ed003e218..6a36ed0b06 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -667,6 +667,17 @@ int generation_numbers_enabled(struct repository *r) return !!first_generation; } +struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r) +{ + struct commit_graph *g = r->objects->commit_graph; + while (g) { + if (g->bloom_filter_settings) + return g->bloom_filter_settings; + g = g->base_graph; + } + return NULL; +} + static void close_commit_graph_one(struct commit_graph *g) { if (!g) diff --git a/commit-graph.h b/commit-graph.h index 09a97030dc..0677dd1031 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -87,6 +87,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); */ int generation_numbers_enabled(struct repository *r); +struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r); + enum commit_graph_write_flags { COMMIT_GRAPH_WRITE_APPEND = (1 << 0), COMMIT_GRAPH_WRITE_PROGRESS = (1 << 1), diff --git a/revision.c b/revision.c index 08c2ad23af..857274408c 100644 --- a/revision.c +++ b/revision.c @@ -680,10 +680,7 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs) repo_parse_commit(revs->repo, revs->commits->item); - if (!revs->repo->objects->commit_graph) - return; - - revs->bloom_filter_settings = revs->repo->objects->commit_graph->bloom_filter_settings; + revs->bloom_filter_settings = get_bloom_filter_settings(revs->repo); if (!revs->bloom_filter_settings) return; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 4bb9e9dbe2..715912ad0f 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -65,7 +65,7 @@ setup () { test_bloom_filters_used () { log_args=$1 - bloom_trace_prefix="statistics:{\"filter_not_present\":0,\"maybe\"" + bloom_trace_prefix="statistics:{\"filter_not_present\":${2:-0},\"maybe\"" setup "$log_args" && grep -q "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.perf" && test_cmp log_wo_bloom log_w_bloom && @@ -139,8 +139,11 @@ test_expect_success 'setup - add commit-graph to the chain without Bloom filters test_line_count = 2 .git/objects/info/commit-graphs/commit-graph-chain ' -test_expect_success 'Do not use Bloom filters if the latest graph does not have Bloom filters.' ' - test_bloom_filters_not_used "-- A/B" +test_expect_success 'use Bloom filters even if the latest graph does not have Bloom filters' ' + # Ensure that the number of empty filters is equal to the number of + # filters in the latest graph layer to prove that they are loaded (and + # ignored). + test_bloom_filters_used "-- A/B" 3 ' test_expect_success 'setup - add commit-graph to the chain with Bloom filters' ' diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 18216463c7..c334ee9155 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -427,4 +427,17 @@ done <<\EOF 0600 -r-------- EOF +test_expect_success '--split=replace with partial Bloom data' ' + rm -rf $graphdir $infodir/commit-graph && + git reset --hard commits/3 && + git rev-list -1 HEAD~2 >a && + git rev-list -1 HEAD~1 >b && + git commit-graph write --split=no-merge --stdin-commits --changed-paths graph-files && + test_line_count = 1 graph-files && + verify_chain_files_exist $graphdir +' + test_done From patchwork Thu Sep 3 22:46:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11755489 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 57BA0746 for ; Thu, 3 Sep 2020 22:46:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3F5E62083B for ; Thu, 3 Sep 2020 22:46:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="dQxta5Jq" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729578AbgICWqK (ORCPT ); Thu, 3 Sep 2020 18:46:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56450 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729438AbgICWqJ (ORCPT ); Thu, 3 Sep 2020 18:46:09 -0400 Received: from mail-qt1-x844.google.com (mail-qt1-x844.google.com [IPv6:2607:f8b0:4864:20::844]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 16899C061244 for ; Thu, 3 Sep 2020 15:46:09 -0700 (PDT) Received: by mail-qt1-x844.google.com with SMTP id v54so3261290qtj.7 for ; Thu, 03 Sep 2020 15:46:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=vbGAMmD5p7GqMlErDXBuqILsXE1ccwaXq0IJWJhLIrU=; b=dQxta5JqENPbbwkOZxzJBg6C23ZyrZIZ+2qI1iBCUHDTUGgfqR+ulRI2H9+Y5uidmN DnVRNS+fDVB6wCm177NYbeDEBWLffQLEKPn8OIPs/ZGx9fA9guXgF3rI+xiCuRlyJNNx pzhqRt27FVonekccnhK3rFhLRsw1RcM12x8rTLGI3i9zVsvPFf2R3wkgWS/IuwZxm0je w8gPw3bZiUPtHqYUAQxEvevLuGo533aitEC6CftP89Tny7VR6g2EKJrxX+qs0u2lr5ZV clcYqJyjOkImyXi+g3EVhP50l9xGCg+Nmy1IhirkLzlCJr0yja+lutH9knRJWbOaVOqs +gzw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=vbGAMmD5p7GqMlErDXBuqILsXE1ccwaXq0IJWJhLIrU=; b=GtZUwXcYfHVo7I8CNTbIAHwzt3xBkmbg0QIW+u4IeeSTQhARO15i/0yGD8rEg+swAY WMhmhSPmmEadLdOV7I4b0+pr1HqA+RL4jqqHE9mRu5rl/c45Ro5UguGljuBUIlZ+huxy +0NVwO+KkJy1cHBSsMJUKEj0QKMYSsITH3G16J9vwN7yejdlumfLOjl+YY/i4ufiF0zo 4qEx8t/zucifRYeRPcjQiPs8rlFpI+FJgQt91dxJslIDS5jXwsd+efR+heFZh/0GGzDO dKt5U7hZrLbhTw7JuYmyaYcM+emAHK2xDXmqIPH3+VhUwjn523o9VRWUSbIoHox48RZO s8Og== X-Gm-Message-State: AOAM531ozsUBYeBMlRKiHInGJNgthuQBy2IsUUweDmO1rEZDDqfX0W86 YO5ainP/1w5cZH5dzIhGQCJ4U8gvnOV5FxRI X-Google-Smtp-Source: ABdhPJwm1zkqJKhZi74Nk7/qDHxWq1m/DZNnKRlSujla/+Ev+mmND9mC4iguEvFs0q71vdsY/Y/6TQ== X-Received: by 2002:aed:2fc5:: with SMTP id m63mr5685690qtd.313.1599173168070; Thu, 03 Sep 2020 15:46:08 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a97c:6dae:76a4:c715]) by smtp.gmail.com with ESMTPSA id a3sm3084062qtj.21.2020.09.03.15.46.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Sep 2020 15:46:07 -0700 (PDT) Date: Thu, 3 Sep 2020 18:46:05 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v4 02/14] t4216: use an '&&'-chain Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In a759bfa9ee (t4216: add end to end tests for git log with Bloom filters, 2020-04-06), a 'rm' invocation was added without a corresponding '&&' chain. When 'trace.perf' already exists, everything works fine. However, the function can be executed without 'trace.perf' on disk (eg., when the subset of tests run is altered with '--run'), and so the bare 'rm' complains about a missing file. To remove some noise from the test log, invoke 'rm' with '-f', at which point it is sensible to place the 'rm -f' in an '&&'-chain, which is both (1) our usual style, and (2) avoids a broken chain in the future if more commands are added at the beginning of the function. Helped-by: Eric Sunshine Helped-by: Jeff King Signed-off-by: Taylor Blau --- t/t4216-log-bloom.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 715912ad0f..cd89c75002 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -58,7 +58,7 @@ sane_unset GIT_TRACE2_PERF_BRIEF sane_unset GIT_TRACE2_CONFIG_PARAMS setup () { - rm "$TRASH_DIRECTORY/trace.perf" + rm -f "$TRASH_DIRECTORY/trace.perf" && git -c core.commitGraph=false log --pretty="format:%s" $1 >log_wo_bloom && GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.perf" git -c core.commitGraph=true log --pretty="format:%s" $1 >log_w_bloom } From patchwork Thu Sep 3 22:46:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11755493 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 693CA746 for ; Thu, 3 Sep 2020 22:46:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 46B2120716 for ; Thu, 3 Sep 2020 22:46:26 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="IC8G1JMm" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729595AbgICWqZ (ORCPT ); Thu, 3 Sep 2020 18:46:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56474 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729581AbgICWqO (ORCPT ); Thu, 3 Sep 2020 18:46:14 -0400 Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DB763C061249 for ; Thu, 3 Sep 2020 15:46:13 -0700 (PDT) Received: by mail-qk1-x743.google.com with SMTP id w16so4690424qkj.7 for ; Thu, 03 Sep 2020 15:46:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=MYSeo56uRodtCiJWZQi2tkaNankbkIOuGdGn/m8FMFs=; b=IC8G1JMm3JTbm6y9CFj3RkObXhboQ3BxupUqd0aGPxvdukIdiOaDvf2A028xvcvumZ /0xIDZKHZd408QYp4gl5ORNfaMZWrBGoHz8R2jtchDGLqsVyyKiukSfVdDYGze5ricA6 gmUb2vp6sqyc/ALSXpY1Te0/G44Q+XCEQQr2NmF3RXOK65JdSVJiVGBmzyRWAs3oYsBF 6OJTe0uefkwbMrgeekq8vldtaGOlZjTSfLN8C6ZCM4an0NcCTS24y2oCmje2GELGtO+O yyZ4OzZIYmRu3goCSp+iTO+8MxSwOzoZeRhcZua4Diht/hAG16k28YNz+ulVQ7sLcmVC Fntg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=MYSeo56uRodtCiJWZQi2tkaNankbkIOuGdGn/m8FMFs=; b=RvO8qyZ0qXq8TLlKsncopQztwxIM5pj9nTgSYZC8ErGi4ITcfJCHkJWospT84yGI9w 8yEemDbU2tQiMdL014bXIKUA2vvvZ98ls1txbMld5QtKFSW1PvxR9/h+pmYPV9yXgSY4 T9Enq8Dt2pg+bP7DowMpksT50TEEqlDd7ZiPg9HUqbmMjqHAr/39wUugoRoz7yamTWzS Ozc1g/D0KAfX0uHPQgRm/w+bU5gimTM5TvjCV9byq+zEyeOMtObOIXojrQxCA1LblGHh GhYVUE6eJnrp/Gk4puo1CqktodrhP7lSykm2UrB9bjDVi6nEgxP71HjI1iFAZzIxAIqP KtsA== X-Gm-Message-State: AOAM532HP2aw1PEhhJt6BeODyD4GXt1o0uh48C0c2owWgcYcBqFBYgmy hB7jTEVMspb5USF2Xqk35r5XGSDenm9wazbQ X-Google-Smtp-Source: ABdhPJwY0zC874XeOmuEbdiVBlcP/CjqsEBqwOL+kxK4498FYep4cyZqe+pi8NQHOV8h/6f02xhsTw== X-Received: by 2002:a05:620a:3d0:: with SMTP id r16mr5306501qkm.129.1599173172817; Thu, 03 Sep 2020 15:46:12 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a97c:6dae:76a4:c715]) by smtp.gmail.com with ESMTPSA id f3sm3089943qth.56.2020.09.03.15.46.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Sep 2020 15:46:12 -0700 (PDT) Date: Thu, 3 Sep 2020 18:46:10 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v4 03/14] commit-graph: pass a 'struct repository *' in more places Message-ID: <639a962a4920d35002aa5111d1d983b96277e882.1599172907.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In a future commit, some commit-graph internals will want access to 'r->settings', but we only have the 'struct object_directory *' corresponding to that repository. Add an additional parameter to pass the repository around in more places. Signed-off-by: Taylor Blau --- builtin/commit-graph.c | 2 +- commit-graph.c | 17 ++++++++++------- commit-graph.h | 6 ++++-- fuzz-commit-graph.c | 5 +++-- 4 files changed, 18 insertions(+), 12 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 523501f217..ba5584463f 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -106,7 +106,7 @@ static int graph_verify(int argc, const char **argv) FREE_AND_NULL(graph_name); if (open_ok) - graph = load_commit_graph_one_fd_st(fd, &st, odb); + graph = load_commit_graph_one_fd_st(the_repository, fd, &st, odb); else graph = read_commit_graph_one(the_repository, odb); diff --git a/commit-graph.c b/commit-graph.c index 6a36ed0b06..72a838bd00 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -231,7 +231,8 @@ int open_commit_graph(const char *graph_file, int *fd, struct stat *st) return 1; } -struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, +struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, + int fd, struct stat *st, struct object_directory *odb) { void *graph_map; @@ -247,7 +248,7 @@ struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, } graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd); - ret = parse_commit_graph(graph_map, graph_size); + ret = parse_commit_graph(r, graph_map, graph_size); if (ret) ret->odb = odb; @@ -287,7 +288,8 @@ static int verify_commit_graph_lite(struct commit_graph *g) return 0; } -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size) +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size) { const unsigned char *data, *chunk_lookup; uint32_t i; @@ -452,7 +454,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size) return NULL; } -static struct commit_graph *load_commit_graph_one(const char *graph_file, +static struct commit_graph *load_commit_graph_one(struct repository *r, + const char *graph_file, struct object_directory *odb) { @@ -464,7 +467,7 @@ static struct commit_graph *load_commit_graph_one(const char *graph_file, if (!open_ok) return NULL; - g = load_commit_graph_one_fd_st(fd, &st, odb); + g = load_commit_graph_one_fd_st(r, fd, &st, odb); if (g) g->filename = xstrdup(graph_file); @@ -476,7 +479,7 @@ static struct commit_graph *load_commit_graph_v1(struct repository *r, struct object_directory *odb) { char *graph_name = get_commit_graph_filename(odb); - struct commit_graph *g = load_commit_graph_one(graph_name, odb); + struct commit_graph *g = load_commit_graph_one(r, graph_name, odb); free(graph_name); return g; @@ -557,7 +560,7 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, valid = 0; for (odb = r->objects->odb; odb; odb = odb->next) { char *graph_name = get_split_graph_filename(odb, line.buf); - struct commit_graph *g = load_commit_graph_one(graph_name, odb); + struct commit_graph *g = load_commit_graph_one(r, graph_name, odb); free(graph_name); diff --git a/commit-graph.h b/commit-graph.h index 0677dd1031..d9acb22bac 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -75,11 +75,13 @@ struct commit_graph { struct bloom_filter_settings *bloom_filter_settings; }; -struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, +struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, + int fd, struct stat *st, struct object_directory *odb); struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb); -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size); /* * Return 1 if and only if the repository has a commit-graph diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c index 430817214d..e7cf6d5b0f 100644 --- a/fuzz-commit-graph.c +++ b/fuzz-commit-graph.c @@ -1,7 +1,8 @@ #include "commit-graph.h" #include "repository.h" -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size); int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); @@ -10,7 +11,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) struct commit_graph *g; initialize_the_repository(); - g = parse_commit_graph((void *)data, size); + g = parse_commit_graph(the_repository, (void *)data, size); repo_clear(the_repository); free_commit_graph(g); From patchwork Thu Sep 3 22:46:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11755491 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 40B8D138E for ; Thu, 3 Sep 2020 22:46:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2388A20716 for ; Thu, 3 Sep 2020 22:46:26 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="eTYrmKi0" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729486AbgICWqY (ORCPT ); Thu, 3 Sep 2020 18:46:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56486 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729560AbgICWqT (ORCPT ); Thu, 3 Sep 2020 18:46:19 -0400 Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9F31AC061244 for ; Thu, 3 Sep 2020 15:46:18 -0700 (PDT) Received: by mail-qk1-x743.google.com with SMTP id u3so4676295qkd.9 for ; Thu, 03 Sep 2020 15:46:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=2h+pY4m7Cx+NBzgbnkk/J/U9/BuYE7fviacZqf05fNo=; b=eTYrmKi0TPM0q7pGAihSseCTgal0tP1FlDyIpRit3AMdzA5u/4Y68wEX+hhjyYda5R gTlS9KCnfAcG5pBfnHl6i1jM9BNyMdWdvLBjQ5NlPrk3U5ZsGp/Zzmnk263YoH9s0dqO A5rcsKtPQPK3af9Wo7yf4rz8+k4STOzJfb9ZKelTI6EaaRZ/bxlex9ucxVzhoL2AxQpj G6QPVOyXg2OLQWph8N04ApD2PwGlETbCICEgkB2wSpyLFjimmb49Nb5CZ5MpTSBCFjcT xfQv7IrJhvKmGwccyr7kGq8eRTGrqQbcWZpYI/hx6cAIy67XEFIXbs2vJfJECxVX1/Uq KaYQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=2h+pY4m7Cx+NBzgbnkk/J/U9/BuYE7fviacZqf05fNo=; b=nRwsvcNRUPmBEF6f8b2ycRROzNMJPWAhUUPawsOUuor69ZT5GhfOrMAdilnrtB7h37 ATMVC75JzuUlJdyriOsVMT2Uxlkix1C/8LRYBWhLndzdJiAlVrCQts61D4i+GsR++BoW 0Ktpgz9mxK2Aizv9eFKQx/4mbLASKBq1rcw3iGhV6GXKjWoSYl2QQ9L7ktcExzOU2NCT XWFLQhIKGbHfCzpP7MowBDXtLT6DusOT/ZljViLkcOWOZKLdR7qjVILe7gN5SfvTM/+L IeBMzkGbBBLHX+qzrvMQdsj9g3/3ZuqiitTIhR3msbH+yp8NLslBpXnLxlTJjYxKyYb8 TWKQ== X-Gm-Message-State: AOAM531ZVteW5P+nSCAqsO64aQy+RGawDC6IjHK/K9s9tKk9Ln3FAKH2 9slg2YH+YvXFsWyU875n41vgH/osvq+NG248 X-Google-Smtp-Source: ABdhPJxwdzCIzfPU3zLs1WQnqLqfbqmUSsdERQmTDV8MeWktKxLaP5EBIGjAcy9iJglkQHDk9qMxGA== X-Received: by 2002:a05:620a:21cf:: with SMTP id h15mr5367493qka.123.1599173177555; Thu, 03 Sep 2020 15:46:17 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a97c:6dae:76a4:c715]) by smtp.gmail.com with ESMTPSA id q142sm3065593qke.48.2020.09.03.15.46.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Sep 2020 15:46:16 -0700 (PDT) Date: Thu, 3 Sep 2020 18:46:14 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v4 04/14] t/helper/test-read-graph.c: prepare repo settings Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The read-graph test-tool is used by a number of the commit-graph test to assert various properties about a commit-graph. Previously, this program never ran 'prepare_repo_settings()'. There was no need to do so, since none of the commit-graph machinery is affected by the repo settings. In the next patch, the commit-graph machinery's behavior will become dependent on the repo settings, and so loading them before running the rest of the test tool is critical. As such, teach the test tool to call 'prepare_repo_settings()'. Signed-off-by: Taylor Blau --- t/helper/test-read-graph.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c index 6d0c962438..5f585a1725 100644 --- a/t/helper/test-read-graph.c +++ b/t/helper/test-read-graph.c @@ -12,11 +12,12 @@ int cmd__read_graph(int argc, const char **argv) setup_git_directory(); odb = the_repository->objects->odb; + prepare_repo_settings(the_repository); + graph = read_commit_graph_one(the_repository, odb); if (!graph) return 1; - printf("header: %08x %d %d %d %d\n", ntohl(*(uint32_t*)graph->data), *(unsigned char*)(graph->data + 4), From patchwork Thu Sep 3 22:46:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11755495 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0E0CD746 for ; Thu, 3 Sep 2020 22:46:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DDD6D20786 for ; Thu, 3 Sep 2020 22:46:36 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="MVworPmc" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729602AbgICWqf (ORCPT ); Thu, 3 Sep 2020 18:46:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56502 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729579AbgICWqX (ORCPT ); Thu, 3 Sep 2020 18:46:23 -0400 Received: from mail-qt1-x841.google.com (mail-qt1-x841.google.com [IPv6:2607:f8b0:4864:20::841]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 552A7C061244 for ; Thu, 3 Sep 2020 15:46:23 -0700 (PDT) Received: by mail-qt1-x841.google.com with SMTP id g3so3240528qtq.10 for ; Thu, 03 Sep 2020 15:46:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=cqzwbwviXpVDnWui0q21WP1AZ9olbeMJ0rHHylfp8TQ=; b=MVworPmcxc2Cnf602/1t1HaCkfbaEawgiARYNb70UvOZnvXbqetLJ59ifZcusyy70i ucpA1ump4kyR7VEWghzN7KCTa+qQ8B4AD8SWtDD6PRny8yYIBSLl8xiod6TZGN4guesA M/Fm1uyi8/9X9umiuII567I0p0i+qYh+xoNYVmWc1HVXyTU7Em5H+Jxl2ftDWcSi21zC 96s3NaS7RTK/cIqpXkmg4BdJvNsmpcKn6WLSDJ9znQ2/hHUwsWtfX7ISToJ25hfH2FkA 3hfJMLtId+j/yi2GZi1dxIko9hnngSzhNK7Myz8BYq2V9CvVuTXbAysEozoMXP1b2T+0 Llaw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=cqzwbwviXpVDnWui0q21WP1AZ9olbeMJ0rHHylfp8TQ=; b=XtcuZmrRl+37FXKKaYjL5NWANbsh5+IFP+zm/Q0NEQC2FssKOENe04faJVvSTOnLCr 9Qm/oG3APK7CJx4YfWj0t5eA6OubKEgbwZVEyrjsLwqg0SpR4pO1/0Dj51SvhhAIutxd GsVau7DfVlvqKT6u15xXDndvwZWBGTbj0+eY4GNz/p1tNR8CWwXTdIHSNfnSFFmbLIPz hEbKeL5t63xccXmYhswhetAv7hIL2CEuC9BQ9a2dFRzS6VblogDKSSxJWz3IYaZMc/fM YqaQwVmXoN+QAJdEh58j1LDf3HczBNyY2ZxbLUy+nMN+syUeMUSoCKdUigNHWw+EwmKk 3Ahg== X-Gm-Message-State: AOAM532PbnZT2t3A9LqJpxI4gSOk8hKBT/bd5h9rgY7ZjMiKuQJFj2O8 z+c9wA0laR1IdMvqBeNx24RR++efKDkV9sDG X-Google-Smtp-Source: ABdhPJz5ewqh9izcNHBNyk/ea5Hq8RagxO8r3q3j84OfhkfJ+J1jUMAZnQCjT1fBOs5jZlop43KuXg== X-Received: by 2002:ac8:2612:: with SMTP id u18mr6101042qtu.363.1599173182252; Thu, 03 Sep 2020 15:46:22 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a97c:6dae:76a4:c715]) by smtp.gmail.com with ESMTPSA id c13sm3162897qtq.5.2020.09.03.15.46.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Sep 2020 15:46:21 -0700 (PDT) Date: Thu, 3 Sep 2020 18:46:19 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v4 05/14] commit-graph: respect 'commitGraph.readChangedPaths' Message-ID: <8aff54d83e26953476da7ca9a1676bd945c00aa2.1599172908.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Git uses the 'core.commitGraph' configuration value to control whether or not the commit graph is used when parsing commits or performing a traversal. Now that commit-graphs can also contain a section for changed-path Bloom filters, administrators that already have commit-graphs may find it convenient to use those graphs without relying on their changed-path Bloom filters. This can happen, for example, during a staged roll-out, or in the event of an incident. Introduce 'commitGraph.readChangedPaths' to control whether or not Bloom filters are read. Note that this configuration is independent from both: - 'core.commitGraph', to allow flexibility in using all parts of a commit-graph _except_ for its Bloom filters. - The '--changed-paths' option for 'git commit-graph write', to allow reading and writing Bloom filters to be controlled independently. When the variable is set, pretend as if no Bloom data was specified at all. This avoids adding additional special-casing outside of the commit-graph internals. Suggested-by: Derrick Stolee Signed-off-by: Taylor Blau --- Documentation/config.txt | 2 ++ Documentation/config/commitgraph.txt | 4 ++++ commit-graph.c | 6 ++++-- repo-settings.c | 3 +++ repository.h | 1 + t/t4216-log-bloom.sh | 4 +++- 6 files changed, 17 insertions(+), 3 deletions(-) create mode 100644 Documentation/config/commitgraph.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index 3042d80978..770ae79b82 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -340,6 +340,8 @@ include::config/column.txt[] include::config/commit.txt[] +include::config/commitgraph.txt[] + include::config/credential.txt[] include::config/completion.txt[] diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt new file mode 100644 index 0000000000..cff0797b54 --- /dev/null +++ b/Documentation/config/commitgraph.txt @@ -0,0 +1,4 @@ +commitGraph.readChangedPaths:: + If true, then git will use the changed-path Bloom filters in the + commit-graph file (if it exists, and they are present). Defaults to + true. See linkgit:git-commit-graph[1] for more information. diff --git a/commit-graph.c b/commit-graph.c index 72a838bd00..ea54d108b9 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -327,6 +327,8 @@ struct commit_graph *parse_commit_graph(struct repository *r, return NULL; } + prepare_repo_settings(r); + graph = alloc_commit_graph(); graph->hash_len = the_hash_algo->rawsz; @@ -403,14 +405,14 @@ struct commit_graph *parse_commit_graph(struct repository *r, case GRAPH_CHUNKID_BLOOMINDEXES: if (graph->chunk_bloom_indexes) chunk_repeated = 1; - else + else if (r->settings.commit_graph_read_changed_paths) graph->chunk_bloom_indexes = data + chunk_offset; break; case GRAPH_CHUNKID_BLOOMDATA: if (graph->chunk_bloom_data) chunk_repeated = 1; - else { + else if (r->settings.commit_graph_read_changed_paths) { uint32_t hash_version; graph->chunk_bloom_data = data + chunk_offset; hash_version = get_be32(data + chunk_offset); diff --git a/repo-settings.c b/repo-settings.c index aa61a35338..88ccce2036 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -17,9 +17,12 @@ void prepare_repo_settings(struct repository *r) if (!repo_config_get_bool(r, "core.commitgraph", &value)) r->settings.core_commit_graph = value; + if (!repo_config_get_bool(r, "commitgraph.readchangedpaths", &value)) + r->settings.commit_graph_read_changed_paths = value; if (!repo_config_get_bool(r, "gc.writecommitgraph", &value)) r->settings.gc_write_commit_graph = value; UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1); + UPDATE_DEFAULT_BOOL(r->settings.commit_graph_read_changed_paths, 1); UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1); if (!repo_config_get_int(r, "index.version", &value)) diff --git a/repository.h b/repository.h index 628c834367..bacf843d46 100644 --- a/repository.h +++ b/repository.h @@ -30,6 +30,7 @@ struct repo_settings { int initialized; int core_commit_graph; + int commit_graph_read_changed_paths; int gc_write_commit_graph; int fetch_write_commit_graph; diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index cd89c75002..fc7693806c 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -95,7 +95,9 @@ do "--ancestry-path side..master" do test_expect_success "git log option: $option for path: $path" ' - test_bloom_filters_used "$option -- $path" + test_bloom_filters_used "$option -- $path" && + test_config commitgraph.readChangedPaths false && + test_bloom_filters_not_used "$option -- $path" ' done done From patchwork Thu Sep 3 22:46:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11755497 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B8B71746 for ; Thu, 3 Sep 2020 22:46:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9B2E920791 for ; Thu, 3 Sep 2020 22:46:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="S7RByym+" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729603AbgICWqh (ORCPT ); Thu, 3 Sep 2020 18:46:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56514 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729598AbgICWq2 (ORCPT ); Thu, 3 Sep 2020 18:46:28 -0400 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B07FDC061245 for ; Thu, 3 Sep 2020 15:46:27 -0700 (PDT) Received: by mail-qt1-x842.google.com with SMTP id n18so3307311qtw.0 for ; Thu, 03 Sep 2020 15:46:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=VLl9/4i6W1dXFmg/9v3/RXIEbDmyDfs/63CE/yvVW30=; b=S7RByym+gW4fs/JvSoSaFPeIigYxKvB/AN1p7dx3Od/+hmRgRLmaZgTwA8Lx14ly5W 6YLBZI04bcu4vTJ1Tf5/v/KXLTv6Cy1RAJtviXy+8z4ypdikecH+myvnOV5ktaEJRYE5 IBaFTJAWEdFEEhSSXrg4kCC5B8pJCHTfoUD8WBadUNQBBBsntvW/fPovvTDikLK0hERA EbUU8PXMY0JsElla2KG9nOF5P7Sp3GFXR/sw2c2SbN3gqq9PhbNAIMRSPCYG0g7bRPSG UdFy0aht92ADyp94Xbaib2416ZM8GDLfBTpS6oYj4f5AeeNJsxuE3FlV6E+Hzm7ukql4 deVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=VLl9/4i6W1dXFmg/9v3/RXIEbDmyDfs/63CE/yvVW30=; b=raEGVZuufDGaezUCnYIHsK9fsDzk/Tr+UYJNZJfe2KQm8Ewvcz6fFw35fTJl2jBpMn fEI0deg7SD02ZuegeXyTt6XGeGfTl8IwQ6sEsXwjwnXC+vkqErUppF59zIZmQUSHJBON VsXY1ivb6YRySiSkn0C/I4RzlRdnP3B1LjtX4Q+PgrtRh0X50bnBc9RlvWLvNonyCOdR 2MX6+nQuAdNmehJKwKzsyDdEqXThqNBZnMPHAegyVa/xwjRLW1CSXwIeaVByvIqSzCeb G7PlKZ6JdMxzvpvZBa1pgME8hUYRDdGCWGK7vj5+m4wQAWPgQfY8H6k4gxmFCAVHrAuw tiFw== X-Gm-Message-State: AOAM5324Xt7tMV89AwZmd0R0pmOIyzZ6W2BWFK86xONn37sr4SBPDkC4 51J7uMAQWjQT+kKsKXx+PXvbt3cK/lvRt15X X-Google-Smtp-Source: ABdhPJxrE1prM1DEoweps9yE9lY+u1CGsdgPQ8zd2FW2CtjJnVg6iG//pr6C1J7UlBeTL8s/kW0cZQ== X-Received: by 2002:ac8:794c:: with SMTP id r12mr5806467qtt.162.1599173186601; Thu, 03 Sep 2020 15:46:26 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a97c:6dae:76a4:c715]) by smtp.gmail.com with ESMTPSA id f76sm3070295qke.19.2020.09.03.15.46.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Sep 2020 15:46:26 -0700 (PDT) Date: Thu, 3 Sep 2020 18:46:24 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v4 06/14] commit-graph.c: store maximum changed paths Message-ID: <965489d3617ec03be90d12402fda29fc2b1ead63.1599172908.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org For now, we assume that there is a fixed constant describing the maximum number of changed paths we are willing to store in a Bloom filter. Prepare for that to (at least partially) not be the case by making it a member of the 'struct bloom_filter_settings'. This will be helpful in the subsequent patches by reducing the size of test cases that exercise storing too many changed paths, as well as preparing for an eventual future in which this value might change. This patch alone does not cause newly generated Bloom filters to use a custom upper-bound on the maximum number of changed paths a single Bloom filter can hold, that will occur in a later patch. Signed-off-by: Taylor Blau --- bloom.h | 11 ++++++++++- commit-graph.c | 3 +++ t/t4216-log-bloom.sh | 4 ++-- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/bloom.h b/bloom.h index d8fbb0fbf1..0b9b59a6fe 100644 --- a/bloom.h +++ b/bloom.h @@ -28,9 +28,18 @@ struct bloom_filter_settings { * that contain n*b bits. */ uint32_t bits_per_entry; + + /* + * The maximum number of changed paths per commit + * before declaring a Bloom filter to be too-large. + * + * Not written to the commit-graph file. + */ + uint32_t max_changed_paths; }; -#define DEFAULT_BLOOM_FILTER_SETTINGS { 1, 7, 10 } +#define DEFAULT_BLOOM_MAX_CHANGES 512 +#define DEFAULT_BLOOM_FILTER_SETTINGS { 1, 7, 10, DEFAULT_BLOOM_MAX_CHANGES } #define BITS_PER_WORD 8 #define BLOOMDATA_CHUNK_HEADER_SIZE 3 * sizeof(uint32_t) diff --git a/commit-graph.c b/commit-graph.c index ea54d108b9..55af498aa0 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1201,6 +1201,7 @@ static void trace2_bloom_filter_settings(struct write_commit_graph_context *ctx) jw_object_intmax(&jw, "hash_version", ctx->bloom_settings->hash_version); jw_object_intmax(&jw, "num_hashes", ctx->bloom_settings->num_hashes); jw_object_intmax(&jw, "bits_per_entry", ctx->bloom_settings->bits_per_entry); + jw_object_intmax(&jw, "max_changed_paths", ctx->bloom_settings->max_changed_paths); jw_end(&jw); trace2_data_json("bloom", ctx->r, "settings", &jw); @@ -1669,6 +1670,8 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) bloom_settings.bits_per_entry); bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", bloom_settings.num_hashes); + bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", + bloom_settings.max_changed_paths); ctx->bloom_settings = &bloom_settings; } diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index fc7693806c..47ddf2641f 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -174,11 +174,11 @@ test_expect_success 'persist filter settings' ' GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=9 \ GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY=15 \ git commit-graph write --reachable --changed-paths && - grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2.txt && + grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15" trace2.txt && GIT_TRACE2_EVENT="$(pwd)/trace2-auto.txt" \ GIT_TRACE2_EVENT_NESTING=5 \ git commit-graph write --reachable --changed-paths && - grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2-auto.txt + grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15" trace2-auto.txt ' test_expect_success 'correctly report changes over limit' ' From patchwork Thu Sep 3 22:46:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11755499 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0E07D746 for ; Thu, 3 Sep 2020 22:46:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D8E0520797 for ; Thu, 3 Sep 2020 22:46:41 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="dA8+k35P" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729607AbgICWqi (ORCPT ); Thu, 3 Sep 2020 18:46:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56526 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728107AbgICWqd (ORCPT ); Thu, 3 Sep 2020 18:46:33 -0400 Received: from mail-qk1-x729.google.com (mail-qk1-x729.google.com [IPv6:2607:f8b0:4864:20::729]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 68FA7C061246 for ; Thu, 3 Sep 2020 15:46:32 -0700 (PDT) Received: by mail-qk1-x729.google.com with SMTP id g72so4666332qke.8 for ; Thu, 03 Sep 2020 15:46:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=5K2ayCwmhm5xs2xaD7HBq9tMPKD3BsaVkm5pmV5rXyo=; b=dA8+k35Pjr12oF/J38Kan9ksKzdWVKlFZhGoPGexyJHyCWihCees8qACYql0UVhQWn 2rHvULFFpebblUrIgqRRfvTM10jw7RTNV1YJJJdJAj8L4bJNvgFiud6LHmyTp0wdmqiS R1/+4zjkTgaZRCfSP8yO4TS/kAuowyAY2dwYXhFCaoW2pw0vru7TTivz9BUTs96a25VC 4p4r8eMAej/MLkrcgPIMsq6i41ftLkEjkqkj2QVSperXGbRZm9+gBV5qxTqsX0PZlcfi PdzNZhZymjZyoZ3gZPeN7xM3WIbCrMyHY6h6G93N8j074G/wIn37uhFWCASejqvfEjx1 UeaQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=5K2ayCwmhm5xs2xaD7HBq9tMPKD3BsaVkm5pmV5rXyo=; b=fo/4TL8aFaMZ7DxYOXPk5sd4b6/hhowNnaYSJ/RLo29GUWpNaiLNPYlOr23OAB3RcC dFNtA/WAZUOr5PLZgm43i9y8F/avU3q72ZKmYc0EF6+NuQcS0pUSGcS/zZiPRsdimJeX C6ygUH+5Q4hA3TZUfcJ+ecpbnimxAl0yX1FePy2+77o3wUgOstAnUGwgEu4+mnvG1mwN TzHWHx7XHnzemlYdcMNpDW8bu5GVsV+hUU1GCP7LMJvWrXQkPThmIydVHP/hCsrDMHnG waUc4UMHDrqvwHDEpGYPLP38zpv3ILzFHhQdsPhbPoPd6RETNIlHXlb7+8D73bfPlJpR nc9Q== X-Gm-Message-State: AOAM531Ejbq4w0dFTU4QzXYx3tBsVuD1Pl6s+4Mny9BiWzx3ArZ7vDfx 0zmkErVUR3uT7oRUJksZun+vr9Eeu9Ea8Zl3 X-Google-Smtp-Source: ABdhPJybT7essjEazFStDZvQFwIjtUCed2xM7/bDNehpznpOyMLfuKIn9jZJVEfJDoCrDTFry8jUxA== X-Received: by 2002:a37:b307:: with SMTP id c7mr5476826qkf.33.1599173191287; Thu, 03 Sep 2020 15:46:31 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a97c:6dae:76a4:c715]) by smtp.gmail.com with ESMTPSA id r42sm3187792qtk.29.2020.09.03.15.46.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Sep 2020 15:46:30 -0700 (PDT) Date: Thu, 3 Sep 2020 18:46:28 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v4 07/14] bloom: split 'get_bloom_filter()' in two Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org 'get_bloom_filter' takes a flag to control whether it will compute a Bloom filter if the requested one is missing. In the next patch, we'll add yet another parameter to this method, which would force all but one caller to specify an extra 'NULL' parameter at the end. Instead of doing this, split 'get_bloom_filter' into two functions: 'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only looks up a Bloom filter (and does not compute one if it's missing, thus dropping the 'compute_if_not_present' flag). The latter does compute missing Bloom filters, with an additional parameter to store whether or not it needed to do so. This simplifies many call-sites, since the majority of existing callers to 'get_bloom_filter' do not want missing Bloom filters to be computed (so they can drop the parameter entirely and use the simpler version of the function). While we're at it, instrument the new 'get_or_compute_bloom_filter()' with two counters in the 'write_commit_graph_context' struct which store the number of filters that we computed, and the number of those which were too large to store. It would be nice to drop the 'compute_if_not_present' flag entirely, since all remaining callers of 'get_or_compute_bloom_filter' pass it as '1', but this will change in a future patch and hence cannot be removed. Signed-off-by: Taylor Blau --- blame.c | 2 +- bloom.c | 13 ++++++++++--- bloom.h | 10 +++++++--- commit-graph.c | 38 +++++++++++++++++++++++++++++++++++--- line-log.c | 2 +- revision.c | 2 +- t/helper/test-bloom.c | 3 ++- 7 files changed, 57 insertions(+), 13 deletions(-) diff --git a/blame.c b/blame.c index 903e23af23..e5ba35dbd1 100644 --- a/blame.c +++ b/blame.c @@ -1276,7 +1276,7 @@ static int maybe_changed_path(struct repository *r, if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY) return 1; - filter = get_bloom_filter(r, origin->commit, 0); + filter = get_bloom_filter(r, origin->commit); if (!filter) return 1; diff --git a/bloom.c b/bloom.c index cd9380ac62..a8a21762f4 100644 --- a/bloom.c +++ b/bloom.c @@ -177,9 +177,10 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data, return strcmp(e1->path, e2->path); } -struct bloom_filter *get_bloom_filter(struct repository *r, - struct commit *c, - int compute_if_not_present) +struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, + struct commit *c, + int compute_if_not_present, + int *computed) { struct bloom_filter *filter; struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS; @@ -187,6 +188,9 @@ struct bloom_filter *get_bloom_filter(struct repository *r, struct diff_options diffopt; int max_changes = 512; + if (computed) + *computed = 0; + if (!bloom_filters.slab_size) return NULL; @@ -273,6 +277,9 @@ struct bloom_filter *get_bloom_filter(struct repository *r, filter->len = 0; } + if (computed) + *computed = 1; + free(diff_queued_diff.queue); DIFF_QUEUE_CLEAR(&diff_queued_diff); diff --git a/bloom.h b/bloom.h index 0b9b59a6fe..baa91926db 100644 --- a/bloom.h +++ b/bloom.h @@ -89,9 +89,13 @@ void add_key_to_filter(const struct bloom_key *key, void init_bloom_filters(void); -struct bloom_filter *get_bloom_filter(struct repository *r, - struct commit *c, - int compute_if_not_present); +struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, + struct commit *c, + int compute_if_not_present, + int *computed); + +#define get_bloom_filter(r, c) get_or_compute_bloom_filter( \ + (r), (c), 0, NULL) int bloom_filter_contains(const struct bloom_filter *filter, const struct bloom_key *key, diff --git a/commit-graph.c b/commit-graph.c index 55af498aa0..cabac7f45b 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -971,6 +971,9 @@ struct write_commit_graph_context { const struct split_commit_graph_opts *split_opts; size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; + + int count_bloom_filter_found_large; + int count_bloom_filter_computed; }; static int write_graph_chunk_fanout(struct hashfile *f, @@ -1182,7 +1185,7 @@ static int write_graph_chunk_bloom_indexes(struct hashfile *f, uint32_t cur_pos = 0; while (list < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0); + struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); size_t len = filter ? filter->len : 0; cur_pos += len; display_progress(ctx->progress, ++ctx->progress_cnt); @@ -1222,7 +1225,7 @@ static int write_graph_chunk_bloom_data(struct hashfile *f, hashwrite_be32(f, ctx->bloom_settings->bits_per_entry); while (list < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0); + struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); size_t len = filter ? filter->len : 0; display_progress(ctx->progress, ++ctx->progress_cnt); @@ -1392,6 +1395,22 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) stop_progress(&ctx->progress); } +static void trace2_bloom_filter_write_statistics(struct write_commit_graph_context *ctx) +{ + struct json_writer jw = JSON_WRITER_INIT; + + jw_object_begin(&jw, 0); + jw_object_intmax(&jw, "filter_found_large", + ctx->count_bloom_filter_found_large); + jw_object_intmax(&jw, "filter_computed", + ctx->count_bloom_filter_computed); + jw_end(&jw); + + trace2_data_json("commit-graph", the_repository, "bloom_statistics", &jw); + + jw_release(&jw); +} + static void compute_bloom_filters(struct write_commit_graph_context *ctx) { int i; @@ -1414,12 +1433,25 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp); for (i = 0; i < ctx->commits.nr; i++) { + int computed = 0; struct commit *c = sorted_commits[i]; - struct bloom_filter *filter = get_bloom_filter(ctx->r, c, 1); + struct bloom_filter *filter = get_or_compute_bloom_filter( + ctx->r, + c, + 1, + &computed); + if (computed) { + ctx->count_bloom_filter_computed++; + if (filter && !filter->len) + ctx->count_bloom_filter_found_large++; + } ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; display_progress(progress, i + 1); } + if (trace2_is_enabled()) + trace2_bloom_filter_write_statistics(ctx); + free(sorted_commits); stop_progress(&progress); } diff --git a/line-log.c b/line-log.c index bf73ea95ac..68eeb425f8 100644 --- a/line-log.c +++ b/line-log.c @@ -1159,7 +1159,7 @@ static int bloom_filter_check(struct rev_info *rev, return 1; if (!rev->bloom_filter_settings || - !(filter = get_bloom_filter(rev->repo, commit, 0))) + !(filter = get_bloom_filter(rev->repo, commit))) return 1; if (!range) diff --git a/revision.c b/revision.c index 857274408c..f4be5d1650 100644 --- a/revision.c +++ b/revision.c @@ -751,7 +751,7 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs, if (commit_graph_generation(commit) == GENERATION_NUMBER_INFINITY) return -1; - filter = get_bloom_filter(revs->repo, commit, 0); + filter = get_bloom_filter(revs->repo, commit); if (!filter) { count_bloom_filter_not_present++; diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c index 5e77d56f59..9f7bb729fc 100644 --- a/t/helper/test-bloom.c +++ b/t/helper/test-bloom.c @@ -39,7 +39,8 @@ static void get_bloom_filter_for_commit(const struct object_id *commit_oid) struct bloom_filter *filter; setup_git_directory(); c = lookup_commit(the_repository, commit_oid); - filter = get_bloom_filter(the_repository, c, 1); + filter = get_or_compute_bloom_filter(the_repository, c, 1, + NULL); print_bloom_filter(filter); } From patchwork Thu Sep 3 22:46:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11755501 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8EC92746 for ; Thu, 3 Sep 2020 22:46:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6C62220786 for ; Thu, 3 Sep 2020 22:46:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="Jzt3q7I+" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729610AbgICWqn (ORCPT ); Thu, 3 Sep 2020 18:46:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56548 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729598AbgICWqh (ORCPT ); Thu, 3 Sep 2020 18:46:37 -0400 Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3DE3AC061244 for ; Thu, 3 Sep 2020 15:46:37 -0700 (PDT) Received: by mail-qk1-x743.google.com with SMTP id o5so4652070qke.12 for ; Thu, 03 Sep 2020 15:46:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=uxiuIK+jPRSc7gmqToFl9A5SprpoJ36HUo59Kr/HhjM=; b=Jzt3q7I+M8kQDoFti9FQ0Tpf5W7U+v3BWV31mnOZWZdRYAxeMWeCCia7iv67vY7RAJ y1A9CchoyYobVj6PYxFH3j9/Kre4mkIAc+bxGXo/G7TXZtVvECVV2xw2MGfLSpOpiNsT v/0vpZAZLmnxqXiTGyNxKKcx21lmF1U+iK8elx0m6JIPCXyTFfPk0qY1i08iC/mtaKTQ PeR/z/gjk9HhRj8PL9tZkX8jjZpySw3R+LmJFDWUjCwgk+bjsT7t/hcCETmjjZ6wW2rq gYMWBahGmB7pCVKVX81QDnz3UaehGX4bEbifpZsZDjcAR/jAg2n00MurmIJCZ7dYxprY BT3w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=uxiuIK+jPRSc7gmqToFl9A5SprpoJ36HUo59Kr/HhjM=; b=Y7qXZKO3ChBLXAAH5y82/KtwcTLNJzN8mpX3upaDamD5+Vjt0v+4OnG4JJwRhVOkRn Yj24EfU+8fwZ+pTlPZ6TXly8BWBlNjfjL5LwjAvOGDAKtfjaCpSBuXo7zQ0b8MM8sxLD YQkj102sqg4RQDvtQSouHxTY2PIT51bc0BOrR/gWyPVyDRlwborPJiBivUmE1LVOG0XX QC4gaVwFYGMOrsk/ipBVuFM0J6ojJ340WCjDeC+bYK/MZkiu9GXgTdOyJzkZc8CgMVCS 37QLjUWnj2nGG+4OAIEdRqnGirjr3FOrfDDTtqHHedACEKskCqiLjOrBKOi+7FoYfXli Z35Q== X-Gm-Message-State: AOAM533RL45vntc2YboLj5A3SleyYe2LQGblxrZvTSlLJ7VrJYSUZH49 dh1Es8kEYrDqv7e2znVY+h+upTkdPDq6Vqgx X-Google-Smtp-Source: ABdhPJxVH+AEAEDpgue2XqVCdX1IVd45ngGPT0fNUTYLMaqi8NsdTWUdHNRRhuk2sG/jnSrB0xXnNA== X-Received: by 2002:a05:620a:4d9:: with SMTP id 25mr5250313qks.483.1599173196046; Thu, 03 Sep 2020 15:46:36 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a97c:6dae:76a4:c715]) by smtp.gmail.com with ESMTPSA id i66sm3341816qkc.63.2020.09.03.15.46.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Sep 2020 15:46:35 -0700 (PDT) Date: Thu, 3 Sep 2020 18:46:33 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v4 08/14] bloom: use provided 'struct bloom_filter_settings' Message-ID: <89bedba0899bfd1f9904c1122c2872ef39f69564.1599172908.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When 'get_or_compute_bloom_filter()' needs to compute a Bloom filter from scratch, it looks to the default 'struct bloom_filter_settings' in order to determine the maximum number of changed paths, number of bits per entry, and so on. All of these values have so far been constant, and so there was no need to pass in a pointer from the caller (eg., the one that is stored in the 'struct write_commit_graph_context'). Start passing in a 'struct bloom_filter_settings *' instead of using the default values to respect graph-specific settings (eg., in the case of setting 'GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS'). In order to have an initialized value for these settings, move its initialization to earlier in the commit-graph write. Signed-off-by: Taylor Blau --- bloom.c | 13 ++++++------- bloom.h | 3 ++- commit-graph.c | 21 ++++++++++----------- t/helper/test-bloom.c | 1 + 4 files changed, 19 insertions(+), 19 deletions(-) diff --git a/bloom.c b/bloom.c index a8a21762f4..0cf1962dc5 100644 --- a/bloom.c +++ b/bloom.c @@ -180,13 +180,12 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data, struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, struct commit *c, int compute_if_not_present, + const struct bloom_filter_settings *settings, int *computed) { struct bloom_filter *filter; - struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS; int i; struct diff_options diffopt; - int max_changes = 512; if (computed) *computed = 0; @@ -211,7 +210,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, repo_diff_setup(r, &diffopt); diffopt.flags.recursive = 1; diffopt.detect_rename = 0; - diffopt.max_changes = max_changes; + diffopt.max_changes = settings->max_changed_paths; diff_setup_done(&diffopt); /* ensure commit is parsed so we have parent information */ @@ -223,7 +222,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_tree_oid(NULL, &c->object.oid, "", &diffopt); diffcore_std(&diffopt); - if (diffopt.num_changes <= max_changes) { + if (diffopt.num_changes <= settings->max_changed_paths) { struct hashmap pathmap; struct pathmap_hash_entry *e; struct hashmap_iter iter; @@ -260,13 +259,13 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_free_filepair(diff_queued_diff.queue[i]); } - filter->len = (hashmap_get_size(&pathmap) * settings.bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; + filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; filter->data = xcalloc(filter->len, sizeof(unsigned char)); hashmap_for_each_entry(&pathmap, &iter, e, entry) { struct bloom_key key; - fill_bloom_key(e->path, strlen(e->path), &key, &settings); - add_key_to_filter(&key, filter, &settings); + fill_bloom_key(e->path, strlen(e->path), &key, settings); + add_key_to_filter(&key, filter, settings); } hashmap_free_entries(&pathmap, struct pathmap_hash_entry, entry); diff --git a/bloom.h b/bloom.h index baa91926db..3f19e3fca4 100644 --- a/bloom.h +++ b/bloom.h @@ -92,10 +92,11 @@ void init_bloom_filters(void); struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, struct commit *c, int compute_if_not_present, + const struct bloom_filter_settings *settings, int *computed); #define get_bloom_filter(r, c) get_or_compute_bloom_filter( \ - (r), (c), 0, NULL) + (r), (c), 0, NULL, NULL) int bloom_filter_contains(const struct bloom_filter *filter, const struct bloom_key *key, diff --git a/commit-graph.c b/commit-graph.c index cabac7f45b..7ba9ae26e1 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1439,6 +1439,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->r, c, 1, + ctx->bloom_settings, &computed); if (computed) { ctx->count_bloom_filter_computed++; @@ -1695,17 +1696,6 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) int num_chunks = 3; uint64_t chunk_offset; struct object_id file_hash; - struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; - - if (!ctx->bloom_settings) { - bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", - bloom_settings.bits_per_entry); - bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", - bloom_settings.num_hashes); - bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", - bloom_settings.max_changed_paths); - ctx->bloom_settings = &bloom_settings; - } if (ctx->split) { struct strbuf tmp_file = STRBUF_INIT; @@ -2151,6 +2141,7 @@ int write_commit_graph(struct object_directory *odb, uint32_t i, count_distinct = 0; int res = 0; int replace = 0; + struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; if (!commit_graph_compatible(the_repository)) return 0; @@ -2164,6 +2155,14 @@ int write_commit_graph(struct object_directory *odb, ctx->split_opts = split_opts; ctx->total_bloom_filter_data_size = 0; + bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", + bloom_settings.bits_per_entry); + bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", + bloom_settings.num_hashes); + bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", + bloom_settings.max_changed_paths); + ctx->bloom_settings = &bloom_settings; + if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS) ctx->changed_paths = 1; if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) { diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c index 9f7bb729fc..46e97b04eb 100644 --- a/t/helper/test-bloom.c +++ b/t/helper/test-bloom.c @@ -40,6 +40,7 @@ static void get_bloom_filter_for_commit(const struct object_id *commit_oid) setup_git_directory(); c = lookup_commit(the_repository, commit_oid); filter = get_or_compute_bloom_filter(the_repository, c, 1, + &settings, NULL); print_bloom_filter(filter); } From patchwork Thu Sep 3 22:46:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11755503 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F28FB109A for ; Thu, 3 Sep 2020 22:46:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C9B2320716 for ; Thu, 3 Sep 2020 22:46:46 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="B2GNggjT" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729612AbgICWqq (ORCPT ); Thu, 3 Sep 2020 18:46:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56560 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729583AbgICWqm (ORCPT ); Thu, 3 Sep 2020 18:46:42 -0400 Received: from mail-qt1-x844.google.com (mail-qt1-x844.google.com [IPv6:2607:f8b0:4864:20::844]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F1062C061244 for ; Thu, 3 Sep 2020 15:46:41 -0700 (PDT) Received: by mail-qt1-x844.google.com with SMTP id g3so3241088qtq.10 for ; Thu, 03 Sep 2020 15:46:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=xi/lCT5iYM6j60LMmT2oBI9tLnDpx3gvN3HJ/ZSaKK8=; b=B2GNggjTItZb/i2S0ZJpf1R+E3t3kVEfVYtZnL9680Qu8KDslIFZn8hAQ55IUbeGRr zVLg47dLDSF9YKl2GVNF1hSCTXUS5yzXI46nTopg6RrUPMBemXa2DS2Ezk6cjwer78/B pJ0wqw0iahNZ8kHUH5yaNGvMUR9+S7qz4P380KZKt6Df1i2xxGey4XEy3Mjswk5fuRJI wf0USpH1LlejXvw4kPVu8RewBeJ67t0E3hi4ByxEq+nYbq5FYkEBXMYb+/VJTLjG9Cyu Q1tSPwmX3ySNtokZyttXu27d/0qHv4knuwaEHBJFyyAOhyG8XKfM78Oe58+5d1ssYcyv eJKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=xi/lCT5iYM6j60LMmT2oBI9tLnDpx3gvN3HJ/ZSaKK8=; b=TWyrNFRAUPEiBhzMo3aKuWdL85GiprbMyJZov3gc5/VKP5fVB2o4MLg1V2DxdNP1VT HDDoB80/mxxzgNjv8ED3YYO1QNY7Ok/sRrzfllcxiI206QBMZX3duA7ZRXvC6rZB2FBB NBkDjN1pgaftb2/hCTL9heU3+s+zBRWx0i1VXyXgqeufvHVntb4Rl6jn5wgVh3NOMFEN yg177o0FC1OLQJLkjTaXezgPAOqXm5je2SDKi8Bm5fJV2JE54m1kgiA/VsmonkWAYrUP xzmtSnv1N6khS/4Fe//sN1jgeef3JBesK0VrPTZUocApttu0Mai0h0EAtsV+Ess1ZJUu vX/g== X-Gm-Message-State: AOAM532K61aSKCY0PRvRXuueGclVbOfufYWW84PNgWL2fWfmqRhI3WoF piTCYZAyljU3ySh4bBnI0DAhJcmnUgJkgY0o X-Google-Smtp-Source: ABdhPJy19zXnHPq91ans94hwkIy09tAMAkXofX/kYeSTx3pk1/zGUiZZsUPnJWUB47/B1FeOiKN6Zw== X-Received: by 2002:ac8:1c43:: with SMTP id j3mr5927926qtk.302.1599173200746; Thu, 03 Sep 2020 15:46:40 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a97c:6dae:76a4:c715]) by smtp.gmail.com with ESMTPSA id q126sm3243759qkb.75.2020.09.03.15.46.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Sep 2020 15:46:40 -0700 (PDT) Date: Thu, 3 Sep 2020 18:46:38 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v4 09/14] bloom/diff: properly short-circuit on max_changes Message-ID: <427f129656c0b8112c33e75cd2b49959be42481e.1599172908.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee Commit e3696980 (diff: halt tree-diff early after max_changes, 2020-03-30) intended to create a mechanism to short-circuit a diff calculation after a certain number of paths were modified. By incrementing a "num_changes" counter throughout the recursive ll_diff_tree_paths(), this was supposed to match the number of changes that would be written into the changed-path Bloom filters. Unfortunately, this was not implemented correctly and instead misses simple cases like file modifications. This then does not stop very large changed-path filters from being written (unless they add or remove many files). To start, change the implementation in ll_diff_tree_paths() to instead use the global diff_queue_diff struct's 'nr' member as the count. This is a way to simplify the logic instead of making more mistakes in the complicated diff code. This has a drawback: the diff_queue_diff struct only lists the paths corresponding to blob changes, not their leading directories. Thus, get_or_compute_bloom_filter() needs an additional check to see if the hashmap with the leading directories becomes too large. One reason why this was not caught by test cases was that the test in t4216-log-bloom.sh that was supposed to check this "too many changes" condition only checked this on the initial commit of a repository. The old logic counted these values correctly. Update this test in a few ways: 1. Use GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS to reduce the limit, allowing smaller commits to engage with this logic. 2. Create several interesting cases of edits, adds, removes, and mode changes (in the second commit). By testing both sides of the inequality with the *_MAX_CHANGED_PATHS variable, we can see that the count is exactly correct, so none of these changes are missed or over-counted. 3. Use the trace2 data value filter_found_large to verify that these commits are on the correct side of the limit. Another way to verify the behavior is correct is through performance tests. By testing on my local copies of the Git repository and the Linux kernel repository, I could measure the effect of these short-circuits when computing a fresh commit-graph file with changed-path Bloom filters using the command GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=N time \ git commit-graph write --reachable --changed-paths and reporting the wall time and resulting commit-graph size. For Git, the results are | | N=1 | N=10 | N=512 | |--------|----------------|----------------|----------------| | HEAD~1 | 10.90s 9.18MB | 11.11s 9.34MB | 11.31s 9.35MB | | HEAD | 9.21s 8.62MB | 11.11s 9.29MB | 11.29s 9.34MB | For Linux, the results are | | N=1 | N=20 | N=512 | |--------|----------------|---------------|---------------| | HEAD~1 | 61.28s 64.3MB | 76.9s 72.6MB | 77.6s 72.6MB | | HEAD | 49.44s 56.3MB | 68.7s 65.9MB | 69.2s 65.9MB | Naturally, the improvement becomes much less as the limit grows, as fewer commits satisfy the short-circuit. Reported-by: SZEDER Gábor Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- bloom.c | 6 +++- diff.h | 2 -- t/t4216-log-bloom.sh | 85 +++++++++++++++++++++++++++++++++++++++----- tree-diff.c | 5 +-- 4 files changed, 82 insertions(+), 16 deletions(-) diff --git a/bloom.c b/bloom.c index 0cf1962dc5..ed54e96e57 100644 --- a/bloom.c +++ b/bloom.c @@ -222,7 +222,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_tree_oid(NULL, &c->object.oid, "", &diffopt); diffcore_std(&diffopt); - if (diffopt.num_changes <= settings->max_changed_paths) { + if (diff_queued_diff.nr <= settings->max_changed_paths) { struct hashmap pathmap; struct pathmap_hash_entry *e; struct hashmap_iter iter; @@ -259,6 +259,9 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff_free_filepair(diff_queued_diff.queue[i]); } + if (hashmap_get_size(&pathmap) > settings->max_changed_paths) + goto cleanup; + filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; filter->data = xcalloc(filter->len, sizeof(unsigned char)); @@ -268,6 +271,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, add_key_to_filter(&key, filter, settings); } + cleanup: hashmap_free_entries(&pathmap, struct pathmap_hash_entry, entry); } else { for (i = 0; i < diff_queued_diff.nr; i++) diff --git a/diff.h b/diff.h index e0c0af6286..1d32b71885 100644 --- a/diff.h +++ b/diff.h @@ -287,8 +287,6 @@ struct diff_options { /* If non-zero, then stop computing after this many changes. */ int max_changes; - /* For internal use only. */ - int num_changes; int ita_invisible_in_index; /* white-space error highlighting */ diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 47ddf2641f..e8788749bf 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -182,20 +182,87 @@ test_expect_success 'persist filter settings' ' ' test_expect_success 'correctly report changes over limit' ' - git init 513changes && + git init limits && ( - cd 513changes && - for i in $(test_seq 1 513) + cd limits && + mkdir d && + mkdir d/e && + + for i in $(test_seq 1 2) do - echo $i >file$i.txt || return 1 + printf $i >d/file$i.txt && + printf $i >d/e/file$i.txt || return 1 done && - git add . && + + mkdir mode && + printf bash >mode/script.sh && + + mkdir foo && + touch foo/bar && + touch foo.txt && + + git add d foo foo.txt mode && git commit -m "files" && - git commit-graph write --reachable --changed-paths && - for i in $(test_seq 1 513) + + # Commit has 7 file and 4 directory adds + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ + GIT_TRACE2_EVENT="$(pwd)/trace" \ + git commit-graph write --reachable --changed-paths && + grep "\"max_changed_paths\":10" trace && + grep "\"filter_found_large\":1" trace && + + for path in $(git ls-tree -r --name-only HEAD) do - git -c core.commitGraph=false log -- file$i.txt >expect && - git log -- file$i.txt >actual && + git -c commitGraph.readChangedPaths=false log \ + -- $path >expect && + git log -- $path >actual && + test_cmp expect actual || return 1 + done && + + # Make a variety of path changes + printf new1 >d/e/file1.txt && + printf new2 >d/file2.txt && + rm d/e/file2.txt && + rm -r foo && + printf text >foo && + mkdir f && + printf new1 >f/file1.txt && + + # including a mode-only change (counts as modified) + git update-index --chmod=+x mode/script.sh && + + git add foo d f && + git commit -m "complicated" && + + # start from scratch and rebuild + rm -f .git/objects/info/commit-graph && + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ + GIT_TRACE2_EVENT="$(pwd)/trace-edit" \ + git commit-graph write --reachable --changed-paths && + grep "\"max_changed_paths\":10" trace-edit && + grep "\"filter_found_large\":2" trace-edit && + + for path in $(git ls-tree -r --name-only HEAD) + do + git -c commitGraph.readChangedPaths=false log \ + -- $path >expect && + git log -- $path >actual && + test_cmp expect actual || return 1 + done && + + # start from scratch and rebuild + rm -f .git/objects/info/commit-graph && + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=11 \ + GIT_TRACE2_EVENT="$(pwd)/trace-update" \ + git commit-graph write --reachable --changed-paths && + grep "\"max_changed_paths\":11" trace-update && + grep "\"filter_found_large\":0" trace-update && + + for path in $(git ls-tree -r --name-only HEAD) + do + git -c commitGraph.readChangedPaths=false log \ + -- $path >expect && + git log -- $path >actual && test_cmp expect actual || return 1 done ) diff --git a/tree-diff.c b/tree-diff.c index 6ebad1a46f..7cebbb327e 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -434,7 +434,7 @@ static struct combine_diff_path *ll_diff_tree_paths( if (diff_can_quit_early(opt)) break; - if (opt->max_changes && opt->num_changes > opt->max_changes) + if (opt->max_changes && diff_queued_diff.nr > opt->max_changes) break; if (opt->pathspec.nr) { @@ -521,7 +521,6 @@ static struct combine_diff_path *ll_diff_tree_paths( /* t↓ */ update_tree_entry(&t); - opt->num_changes++; } /* t > p[imin] */ @@ -539,7 +538,6 @@ static struct combine_diff_path *ll_diff_tree_paths( skip_emit_tp: /* ∀ pi=p[imin] pi↓ */ update_tp_entries(tp, nparent); - opt->num_changes++; } } @@ -557,7 +555,6 @@ struct combine_diff_path *diff_tree_paths( const struct object_id **parents_oid, int nparent, struct strbuf *base, struct diff_options *opt) { - opt->num_changes = 0; p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt); /* From patchwork Thu Sep 3 22:46:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11755505 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9DE9B109A for ; Thu, 3 Sep 2020 22:46:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 801372078E for ; Thu, 3 Sep 2020 22:46:49 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="dNmEWDJR" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729615AbgICWqt (ORCPT ); Thu, 3 Sep 2020 18:46:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56572 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729583AbgICWqr (ORCPT ); Thu, 3 Sep 2020 18:46:47 -0400 Received: from mail-qk1-x744.google.com (mail-qk1-x744.google.com [IPv6:2607:f8b0:4864:20::744]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0A03AC061244 for ; Thu, 3 Sep 2020 15:46:47 -0700 (PDT) Received: by mail-qk1-x744.google.com with SMTP id w186so4732422qkd.1 for ; Thu, 03 Sep 2020 15:46:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=FMxZe3e6Xgq5gCG5T0iRw0yuB/plFuoFTEvtL6UqiRs=; b=dNmEWDJRt+oZGIptzW8MMB1SJInTeUhby5hpxBuThCh2S7ueHLEENesL/yatIcBxAf cvvJaKi70ZpPlPLBAP6a2YKzuXncj8nSvDrj1+6Gys8piRsh8NXllc68KuXCVHZXgsrE yxg5AR2ez0U5Fpe4ZPLc2UcUBWGUU7BtAKw5V9oNeyLgc2Q4Z2+Dwf5eqJ6vTF7gCZaU 2/FT/3ZmJ2ZERJS3nISamYGaV2IGl2vXvd9rj6JgFEUFyKcdtLX11G4Bex4KrZPlGfgu ArieNPj0gS/Y/qCH8Oaoux4vDeHiUrRwXA6m2dPOzvPtANKHLW+vfwxw5wn9GZVrzBoV +HLg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=FMxZe3e6Xgq5gCG5T0iRw0yuB/plFuoFTEvtL6UqiRs=; b=jen0hYkCo4joEdA8Z+OCKtLZ9w4ZuwmZEc6mnNznmv3m2oXFDM8y64Fhawm71TZA+0 NpYFpDTqbK+ZCIY0pQmHvXUvmAz2IM5TE2IaDMJwTu3/aiSBh3vLg7vsd9bwTf4B2xQH 3TxkIuucZNKTrYzCKsek5qoittPFnda8BdgR/l6UJ2JVisPRWq150KoIgI2oWljp4inQ prAukho1PVpm46pob0tzYrwekLUOxOKXZpt5xeYeQw5fXGd3pP4c7Rpfrzj6JmdYhjXr RZYqiYvaUkTw4dr1qBq2p5M7zxOfuoUOyPB2slI7tuGGTL6iOU1PZp8VxnLXVj7HgK7O EbVw== X-Gm-Message-State: AOAM5332AYR6VXBqbwdDlp8eczBoxTGfifuYd8z/RE8r4uRpK8WNRr+N h2LXbwzA0FjZzmDJH4PpwFh05S61txFaDzob X-Google-Smtp-Source: ABdhPJzJxeINYchx9gRhjfqExBrNVFmaZBqPX7S3FYhtstd1DRLrY6blHJOKOPsO99aN32ZacL7btA== X-Received: by 2002:a37:8047:: with SMTP id b68mr5069769qkd.299.1599173205981; Thu, 03 Sep 2020 15:46:45 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a97c:6dae:76a4:c715]) by smtp.gmail.com with ESMTPSA id v39sm3186013qtv.47.2020.09.03.15.46.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Sep 2020 15:46:45 -0700 (PDT) Date: Thu, 3 Sep 2020 18:46:42 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v4 10/14] commit-graph.c: sort index into commits list Message-ID: <08b5f185f615de5d43e2397ae58730e82b18b7dd.1599172908.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org For locality, 'compute_bloom_filters()' sorts the commits for which it wants to compute Bloom filters in a preferred order (cf., 3d11275505 (commit-graph: examine commits by generation number, 2020-03-30) for details). A future patch will want to recover the new graph position of each commit. Since the 'packed_commit_list' already stores a double-pointer, avoid a 'COPY_ARRAY' and instead keep track of an index into the original list. (Use an integer index instead of a memory address, since this involves a needlessly confusing triple-pointer). Alter the two sorting routines 'commit_pos_cmp' and 'commit_gen_cmp' to take into account the packed_commit_list they are sorting with respect to. Since 'compute_bloom_filters()' is the only caller for each of those comparison functions, no other call-sites need updating. Signed-off-by: Taylor Blau --- commit-graph.c | 43 ++++++++++++++++++++++++------------------- 1 file changed, 24 insertions(+), 19 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 7ba9ae26e1..35535f4192 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -79,10 +79,18 @@ static void set_commit_pos(struct repository *r, const struct object_id *oid) *commit_pos_at(&commit_pos, commit) = max_pos++; } -static int commit_pos_cmp(const void *va, const void *vb) +struct packed_commit_list { + struct commit **list; + int nr; + int alloc; +}; + +static int commit_pos_cmp(const void *va, const void *vb, void *ctx) { - const struct commit *a = *(const struct commit **)va; - const struct commit *b = *(const struct commit **)vb; + struct packed_commit_list *commits = ctx; + + const struct commit *a = commits->list[*(int *)va]; + const struct commit *b = commits->list[*(int *)vb]; return commit_pos_at(&commit_pos, a) - commit_pos_at(&commit_pos, b); } @@ -139,10 +147,12 @@ static struct commit_graph_data *commit_graph_data_at(const struct commit *c) return data; } -static int commit_gen_cmp(const void *va, const void *vb) +static int commit_gen_cmp(const void *va, const void *vb, void *ctx) { - const struct commit *a = *(const struct commit **)va; - const struct commit *b = *(const struct commit **)vb; + struct packed_commit_list *commits = ctx; + + const struct commit *a = commits->list[*(int *)va]; + const struct commit *b = commits->list[*(int *)vb]; uint32_t generation_a = commit_graph_generation(a); uint32_t generation_b = commit_graph_generation(b); @@ -929,11 +939,6 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c); } -struct packed_commit_list { - struct commit **list; - int nr; - int alloc; -}; struct packed_oid_list { struct object_id *list; @@ -1415,7 +1420,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) { int i; struct progress *progress = NULL; - struct commit **sorted_commits; + int *sorted_commits; init_bloom_filters(); @@ -1425,16 +1430,16 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->commits.nr); ALLOC_ARRAY(sorted_commits, ctx->commits.nr); - COPY_ARRAY(sorted_commits, ctx->commits.list, ctx->commits.nr); - - if (ctx->order_by_pack) - QSORT(sorted_commits, ctx->commits.nr, commit_pos_cmp); - else - QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp); + for (i = 0; i < ctx->commits.nr; i++) + sorted_commits[i] = i; + QSORT_S(sorted_commits, ctx->commits.nr, + ctx->order_by_pack ? commit_pos_cmp : commit_gen_cmp, + &ctx->commits); for (i = 0; i < ctx->commits.nr; i++) { int computed = 0; - struct commit *c = sorted_commits[i]; + int pos = sorted_commits[i]; + struct commit *c = ctx->commits.list[pos]; struct bloom_filter *filter = get_or_compute_bloom_filter( ctx->r, c, From patchwork Thu Sep 3 22:46:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11755507 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EF129746 for ; Thu, 3 Sep 2020 22:46:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D16FA2078E for ; Thu, 3 Sep 2020 22:46:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="d7DPBuzF" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729620AbgICWqx (ORCPT ); Thu, 3 Sep 2020 18:46:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56588 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729583AbgICWqw (ORCPT ); Thu, 3 Sep 2020 18:46:52 -0400 Received: from mail-qk1-x742.google.com (mail-qk1-x742.google.com [IPv6:2607:f8b0:4864:20::742]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C944DC061244 for ; Thu, 3 Sep 2020 15:46:51 -0700 (PDT) Received: by mail-qk1-x742.google.com with SMTP id o5so4652610qke.12 for ; Thu, 03 Sep 2020 15:46:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=RGaZagxN7aJ5DUTKsYEDT6jF/oHXp9qe7iVjJytpYFY=; b=d7DPBuzFFqbAg7ULl+0ii6ePR4BduZpoqRsheXz2I1raprCpLidZ2LInVw3Lhze1gy FUfRx2dYUs3dM9rQTczhchqZxbl8cyPhbnxxgTfF7djtuXvTFqz/uncjsNDVGUMi3Pua GmTU7RYMd350tOk5FAAiK1IyQ/ooajCJQah3Mv4lHML2JE9NibrkZuzbAtD3K5v8hecl GLc3dNlx9JyLefwbmGGbb47aAmtM/emAL7RRpnMClqRXVNAiNi8XI3fe1EBLeilPgoqA ETS9/BaYo5eug50hycST8Iv45n1jmBjJJaOUeVeUZNnxuZXq+7ffDbkIaJ2kbvF4p3xD AUAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=RGaZagxN7aJ5DUTKsYEDT6jF/oHXp9qe7iVjJytpYFY=; b=f3Erb85MPO3q76KHi7ej9wAK2xVFGrKYJ7bF2LXZEfwDadQ5NBhwdli9oY1agwtQP+ he929cp9du55R2Gfybc8nAvZw6NwyxFGtxY1Aje7do0StCbKki1AiW5tuE+61WViUdBW jfQNetHKjcsuRTsiZBv0S+aRldqFLd3G6Jg31r4iV8eyHMerTPU+s2kf8DvbPKK7UjNC KwB8tq7sCbN7N7oSlX98Wkdct6I4xy2xkCYvcHmbzKanCn7yotciiAI+KVjOocCag+ny nYnP6fNnZ1uMBIvU+MRmq/57nLCff9NXMwh1ddlzvW5zOTLZCmDrdtDHXKFPx2iPQY4S +1Fw== X-Gm-Message-State: AOAM533nrZ9BH+HZFTqYbAAKnvl6UWtSvN+6K2dvf0c0+n/xbiUFXGQJ rqfw7qALgnONE9ExDtLcl4gPWwMJ6aJ88lZ8 X-Google-Smtp-Source: ABdhPJxn5Qg+7TIUMdCMdVpAMqCDNUfqLSHzzEFPoqB/q0CLQZcXqgDd++0m+9p+iw52hgYNM/3ulQ== X-Received: by 2002:a05:620a:2014:: with SMTP id c20mr3239616qka.89.1599173210748; Thu, 03 Sep 2020 15:46:50 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a97c:6dae:76a4:c715]) by smtp.gmail.com with ESMTPSA id 7sm3095016qkh.60.2020.09.03.15.46.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Sep 2020 15:46:50 -0700 (PDT) Date: Thu, 3 Sep 2020 18:46:48 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v4 11/14] csum-file.h: introduce 'hashwrite_be64()' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org A small handful of writers who wish to encode 64-bit values in network order have worked around the lack of such a helper by calling the 32-bit variant twice. The subsequent commit will add another caller who wants to write a 64-bit value. To ease their (and the existing caller's) pain, introduce a helper to do just that, and convert existing call-sites. Suggested-by: Derrick Stolee Signed-off-by: Taylor Blau --- commit-graph.c | 8 ++------ csum-file.h | 6 ++++++ midx.c | 3 +-- 3 files changed, 9 insertions(+), 8 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 35535f4192..01d791343a 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1791,12 +1791,8 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) chunk_offset = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH; for (i = 0; i <= num_chunks; i++) { - uint32_t chunk_write[3]; - - chunk_write[0] = htonl(chunks[i].id); - chunk_write[1] = htonl(chunk_offset >> 32); - chunk_write[2] = htonl(chunk_offset & 0xffffffff); - hashwrite(f, chunk_write, 12); + hashwrite_be32(f, chunks[i].id); + hashwrite_be64(f, chunk_offset); chunk_offset += chunks[i].size; } diff --git a/csum-file.h b/csum-file.h index f9cbd317fb..b026ec7766 100644 --- a/csum-file.h +++ b/csum-file.h @@ -62,4 +62,10 @@ static inline void hashwrite_be32(struct hashfile *f, uint32_t data) hashwrite(f, &data, sizeof(data)); } +static inline void hashwrite_be64(struct hashfile *f, uint64_t data) +{ + hashwrite_be32(f, data >> 32); + hashwrite_be32(f, data & 0xffffffffUL); +} + #endif diff --git a/midx.c b/midx.c index e9b2e1253a..32cc5fdc22 100644 --- a/midx.c +++ b/midx.c @@ -789,8 +789,7 @@ static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_off if (!(offset >> 31)) continue; - hashwrite_be32(f, offset >> 32); - hashwrite_be32(f, offset & 0xffffffffUL); + hashwrite_be64(f, offset); written += 2 * sizeof(uint32_t); nr_large_offset--; From patchwork Thu Sep 3 22:46:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11755509 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0CB20746 for ; Thu, 3 Sep 2020 22:46:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D301A20786 for ; Thu, 3 Sep 2020 22:46:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="zm1pi2Id" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729622AbgICWq5 (ORCPT ); Thu, 3 Sep 2020 18:46:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56600 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729468AbgICWq4 (ORCPT ); Thu, 3 Sep 2020 18:46:56 -0400 Received: from mail-qt1-x843.google.com (mail-qt1-x843.google.com [IPv6:2607:f8b0:4864:20::843]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8985FC061244 for ; Thu, 3 Sep 2020 15:46:56 -0700 (PDT) Received: by mail-qt1-x843.google.com with SMTP id n10so3282872qtv.3 for ; Thu, 03 Sep 2020 15:46:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=oaJQtzdsCf45kIF/uBoImpwYsOf4KjIXg2At5AUVAY8=; b=zm1pi2Idrzm22iVvRuvRsVjpCKTDDY1sSXKBSGwKX1zaoF7dyTn3l0BuVIa2T/yHG/ Z0/S4ST/d0WxGiLgthPUY6gzNoMojnFhzi7iGzKwlBGBDuJg7lxoBSMx4zLSViXuJHb9 ksca5c/X4WdImFTjmc6NQ1bJ/ycu2N+yqdNYT5mtb5OB4ouQR+hYwuXHVe8eec72RSuz TPRRPxrHLpAGSpY3Hqak804Y6Jo0Q+scOIC4Gv6fAURABKjWImwYXz4mUxCUPJHHHIPF fNpGnlSaW5CTgPya59XnnnWtXFJqrEntyHRUJKwSpLc+hZJpmFc5UbKKnwqtVz8ABenz pUQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=oaJQtzdsCf45kIF/uBoImpwYsOf4KjIXg2At5AUVAY8=; b=C5Lj+iraADFDmBdcidOUeW0rgjsxGS23t5cjGDbCfC9izWm4RwHQzQThjSngd+ey0Z EysnkGxAzYLcaGevXnSWZLNG/wVz320fDbCzL0HZJ/nrxiNPLGwFmBYR8uZnpI83/WBI WopPcl7I/Kuy7jUpiHUG3bf28yUnGEVUMvtogMfYR1f2DF9KDWRvCtrDMf4OI39qreXN xKdIjS4zhJ9KxJEh/oY+SRVdohEcFP3z/nyrpJZApeWa1+h7szF1Q9Al88U/Tvscrq7E 2/WhV+r7LOSAfsECvU5dy/G9KqFwJmNyrtJbgsmxGAuRsdaX9HcZfyMBVHi9cvuYPkz+ sAkA== X-Gm-Message-State: AOAM530ZjDop15iR4TIkVUzecUEhOGyEflKc8xe/1lTVPgjhmSwHesxR QlKN0cqAUD2Pyl40uYK0MgEFXZxQue3HQG2W X-Google-Smtp-Source: ABdhPJxkrA1jakxFR9fTl9UjB4kTdn5cPxsHdRvKUtdz5AAbnEv4oSEbXVB4eiFOd9IxHl1dr/JSAg== X-Received: by 2002:ac8:7388:: with SMTP id t8mr5939957qtp.187.1599173215222; Thu, 03 Sep 2020 15:46:55 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a97c:6dae:76a4:c715]) by smtp.gmail.com with ESMTPSA id b34sm3242868qtc.73.2020.09.03.15.46.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Sep 2020 15:46:54 -0700 (PDT) Date: Thu, 3 Sep 2020 18:46:52 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v4 12/14] commit-graph: add large-filters bitmap chunk Message-ID: <3063beb58859f997188ae0cca046aaa93eec3d9d.1599172908.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When a commit has more than a certain number of changed paths (commonly 512), the commit-graph machinery represents it as a zero-length filter. In addition to these too-large filters, the commit-graph machinery also represents commits with no filter and commits with no changed paths in the same way. When writing a commit-graph that aggregates several incremental commit-graph layers (eg., with '--split=replace'), the commit-graph machinery first computes all of the Bloom filters that it wants to write but does not already know about from existing graph layers. Because we overload the zero-length filter in the above fashion, this leads to recomputing large filters over and over again. This is already undesirable, since it means that we are wasting considerable effort to discover that a commit with too many changed paths, only to throw that effort away (and then repeat the process the next time a roll-up is performed). In a subsequent patch, we will add a '--max-new-filters=' option, which specifies an upper-bound on the number of new filters we are willing to compute from scratch. Suppose that there are 'N' too-large filters, and we specify '--max-new-filters=M'. If 'N >= M', it is unlikely that any filters will be generated, since we'll spend most of our effort on filters that we ultimately throw away. If 'N < M', filters will trickle in over time, but only at most 'M - N' per-write. To address this, add a new chunk which encodes a bitmap where the ith bit is on iff the ith commit has zero or at least 512 changed paths. Likewise store the maximum number of changed paths we are willing to store in order to prepare for eventually making this value more easily customizable. When computing Bloom filters, first consult the relevant bitmap (in the case that we are rolling up existing layers) to see if computing the Bloom filter from scratch would be a waste of time. This patch implements a new chunk instead of extending the existing BIDX and BDAT chunks because modifying these chunks would confuse old clients. (Eg., setting the most-significant bit in the BIDX chunk would confuse old clients and require a version bump). To allow using the existing bitmap code with 64-bit words, we write the data in network byte order from the 64-bit words. This means we also need to read the array from the commit-graph file by translating each word from network byte order using get_be64() when loading the commit graph. Initialize this bitmap lazily to avoid paying a linear-time cost upon each commit-graph load even if we do not need the bitmaps themselves. By avoiding the need to move to new versions of the BDAT and BIDX chunk, we can give ourselves more time to consider whether or not other modifications to these chunks are worthwhile without holding up this change. Another approach would be to introduce a new BIDX chunk (say, one identified by 'BID2') which is identical to the existing BIDX chunk, except the most-significant bit of each offset is interpreted as "this filter is too big" iff looking at a BID2 chunk. This avoids having to write a bitmap, but forces older clients to rewrite their commit-graphs (as well as reduces the theoretical largest Bloom filters we could write, and forces us to maintain the code necessary to translate BIDX chunks to BID2 ones). Separately from this patch, I implemented this alternate approach and did not find it to be advantageous. Signed-off-by: Taylor Blau --- .../technical/commit-graph-format.txt | 13 +++ bloom.h | 2 +- commit-graph.c | 100 +++++++++++++++--- commit-graph.h | 10 ++ t/t4216-log-bloom.sh | 25 ++++- 5 files changed, 135 insertions(+), 15 deletions(-) diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index 6ddbceba15..db212b8cc5 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -128,6 +128,19 @@ CHUNK DATA: of length zero. * The BDAT chunk is present if and only if BIDX is present. + Large Bloom Filters (ID: {'B', 'F', 'X', 'L'}) [Optional] + * It starts with a 32-bit unsigned integer specifying the maximum number of + changed-paths that can be stored in a single Bloom filter. + * It then contains a list of 64-bit words in network order (the length of + this list is determined by the width of the chunk) which is a bitmap. The + 'i'th bit is set exactly when the 'i'th commit in the graph has a + changed-path Bloom filter with zero entries (either because the commit is + empty, or because it contains more entries than is allowed per filter by + the layer that contains it). + * The BFXL chunk is present only when the BIDX and BDAT chunks are + also present. + + Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional] This list of H-byte hashes describe a set of B commit-graph files that form a commit-graph chain. The graph position for the ith commit in this diff --git a/bloom.h b/bloom.h index 3f19e3fca4..464d9b57de 100644 --- a/bloom.h +++ b/bloom.h @@ -33,7 +33,7 @@ struct bloom_filter_settings { * The maximum number of changed paths per commit * before declaring a Bloom filter to be too-large. * - * Not written to the commit-graph file. + * Written to the 'BFXL' chunk (instead of 'BDAT'). */ uint32_t max_changed_paths; }; diff --git a/commit-graph.c b/commit-graph.c index 01d791343a..68ffa6ec35 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -41,8 +41,9 @@ void git_test_write_commit_graph_or_die(void) #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */ #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */ #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */ +#define GRAPH_CHUNKID_BLOOMLARGE 0x4246584c /* "BFXL" */ #define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */ -#define MAX_NUM_CHUNKS 7 +#define MAX_NUM_CHUNKS 8 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16) @@ -436,6 +437,16 @@ struct commit_graph *parse_commit_graph(struct repository *r, graph->bloom_filter_settings->bits_per_entry = get_be32(data + chunk_offset + 8); } break; + + case GRAPH_CHUNKID_BLOOMLARGE: + if (graph->chunk_bloom_large_filters) + chunk_repeated = 1; + else if (r->settings.commit_graph_read_changed_paths) { + graph->chunk_bloom_large_filters = data + chunk_offset + sizeof(uint32_t); + graph->bloom_large_alloc = get_be64(chunk_lookup + 4) - chunk_offset - sizeof(uint32_t); + graph->bloom_filter_settings->max_changed_paths = get_be32(data + chunk_offset); + } + break; } if (chunk_repeated) { @@ -450,6 +461,8 @@ struct commit_graph *parse_commit_graph(struct repository *r, /* We need both the bloom chunks to exist together. Else ignore the data */ graph->chunk_bloom_indexes = NULL; graph->chunk_bloom_data = NULL; + graph->chunk_bloom_large_filters = NULL; + graph->bloom_large_alloc = 0; FREE_AND_NULL(graph->bloom_filter_settings); } @@ -939,6 +952,32 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c); } +int get_bloom_filter_large_in_graph(struct commit_graph *g, + const struct commit *c) +{ + uint32_t graph_pos; + if (!find_commit_in_graph(c, g, &graph_pos)) + return 0; + + while (g && graph_pos < g->num_commits_in_base) + g = g->base_graph; + + if (!g) + return 0; + + if (!g->bloom_large && g->bloom_large_alloc) { + size_t i; + g->bloom_large = bitmap_word_alloc(g->bloom_large_alloc); + + for (i = 0; i < g->bloom_large->word_alloc; i++) + g->bloom_large->words[i] = get_be64( + g->chunk_bloom_large_filters + i * sizeof(eword_t)); + } + + if (!g->bloom_large) + return 0; + return bitmap_get(g->bloom_large, graph_pos - g->num_commits_in_base); +} struct packed_oid_list { struct object_id *list; @@ -977,8 +1016,10 @@ struct write_commit_graph_context { size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; + int count_bloom_filter_known_large; int count_bloom_filter_found_large; int count_bloom_filter_computed; + struct bitmap *bloom_large; }; static int write_graph_chunk_fanout(struct hashfile *f, @@ -1242,6 +1283,23 @@ static int write_graph_chunk_bloom_data(struct hashfile *f, return 0; } +static int write_graph_chunk_bloom_large(struct hashfile *f, + struct write_commit_graph_context *ctx) +{ + size_t i, alloc = ctx->commits.nr / BITS_IN_EWORD; + if (ctx->commits.nr % BITS_IN_EWORD) + alloc++; + if (alloc > ctx->bloom_large->word_alloc) + BUG("write_graph_chunk_bloom_large: bitmap not large enough"); + + trace2_region_enter("commit-graph", "bloom_large", ctx->r); + hashwrite_be32(f, ctx->bloom_settings->max_changed_paths); + for (i = 0; i < ctx->bloom_large->word_alloc; i++) + hashwrite_be64(f, ctx->bloom_large->words[i]); + trace2_region_leave("commit-graph", "bloom_large", ctx->r); + return 0; +} + static int oid_compare(const void *_a, const void *_b) { const struct object_id *a = (const struct object_id *)_a; @@ -1405,6 +1463,8 @@ static void trace2_bloom_filter_write_statistics(struct write_commit_graph_conte struct json_writer jw = JSON_WRITER_INIT; jw_object_begin(&jw, 0); + jw_object_intmax(&jw, "filter_known_large", + ctx->count_bloom_filter_known_large); jw_object_intmax(&jw, "filter_found_large", ctx->count_bloom_filter_found_large); jw_object_intmax(&jw, "filter_computed", @@ -1423,6 +1483,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) int *sorted_commits; init_bloom_filters(); + ctx->bloom_large = bitmap_word_alloc(ctx->commits.nr / BITS_IN_EWORD + 1); if (ctx->report_progress) progress = start_delayed_progress( @@ -1437,21 +1498,28 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) &ctx->commits); for (i = 0; i < ctx->commits.nr; i++) { - int computed = 0; int pos = sorted_commits[i]; struct commit *c = ctx->commits.list[pos]; - struct bloom_filter *filter = get_or_compute_bloom_filter( - ctx->r, - c, - 1, - ctx->bloom_settings, - &computed); - if (computed) { - ctx->count_bloom_filter_computed++; - if (filter && !filter->len) - ctx->count_bloom_filter_found_large++; + if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph, c)) { + bitmap_set(ctx->bloom_large, pos); + ctx->count_bloom_filter_known_large++; + } else { + int computed = 0; + struct bloom_filter *filter = get_or_compute_bloom_filter( + ctx->r, + c, + 1, + ctx->bloom_settings, + &computed); + if (computed) { + ctx->count_bloom_filter_computed++; + if (filter && !filter->len) { + bitmap_set(ctx->bloom_large, pos); + ctx->count_bloom_filter_found_large++; + } + } + ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; } - ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; display_progress(progress, i + 1); } @@ -1771,6 +1839,11 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) + ctx->total_bloom_filter_data_size; chunks[num_chunks].write_fn = write_graph_chunk_bloom_data; num_chunks++; + chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMLARGE; + chunks[num_chunks].size = sizeof(eword_t) * ctx->bloom_large->word_alloc + + sizeof(uint32_t); + chunks[num_chunks].write_fn = write_graph_chunk_bloom_large; + num_chunks++; } if (ctx->num_commit_graphs_after > 1) { chunks[num_chunks].id = GRAPH_CHUNKID_BASE; @@ -2510,6 +2583,7 @@ void free_commit_graph(struct commit_graph *g) } free(g->filename); free(g->bloom_filter_settings); + bitmap_free(g->bloom_large); free(g); } diff --git a/commit-graph.h b/commit-graph.h index d9acb22bac..9afb1477d5 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -4,6 +4,7 @@ #include "git-compat-util.h" #include "object-store.h" #include "oidset.h" +#include "ewah/ewok.h" #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH" #define GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE "GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE" @@ -50,6 +51,9 @@ void load_commit_graph_info(struct repository *r, struct commit *item); struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit *c); +int get_bloom_filter_large_in_graph(struct commit_graph *g, + const struct commit *c); + struct commit_graph { const unsigned char *data; size_t data_len; @@ -71,6 +75,10 @@ struct commit_graph { const unsigned char *chunk_base_graphs; const unsigned char *chunk_bloom_indexes; const unsigned char *chunk_bloom_data; + const unsigned char *chunk_bloom_large_filters; + + struct bitmap *bloom_large; + size_t bloom_large_alloc; struct bloom_filter_settings *bloom_filter_settings; }; @@ -83,6 +91,8 @@ struct commit_graph *read_commit_graph_one(struct repository *r, struct commit_graph *parse_commit_graph(struct repository *r, void *graph_map, size_t graph_size); +void prepare_commit_graph_bloom_large(struct commit_graph *g); + /* * Return 1 if and only if the repository has a commit-graph * file and generation numbers are computed in that file. diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index e8788749bf..fed4929af3 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -38,7 +38,7 @@ test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' EOF ' graph_read_expect () { - NUM_CHUNKS=5 + NUM_CHUNKS=6 cat >expect <<- EOF header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0 num_commits: $1 @@ -267,5 +267,28 @@ test_expect_success 'correctly report changes over limit' ' done ) ' +test_bloom_filters_computed () { + commit_graph_args=$1 + bloom_trace_prefix="{\"filter_known_large\":$2,\"filter_found_large\":$3,\"filter_computed\":$4" + rm -f "$TRASH_DIRECTORY/trace.event" && + GIT_TRACE2_EVENT="$TRASH_DIRECTORY/trace.event" git commit-graph write $commit_graph_args && + grep "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.event" +} + +test_expect_success 'Bloom generation does not recompute too-large filters' ' + ( + cd limits && + + # start from scratch and rebuild + rm -f .git/objects/info/commit-graph && + GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=10 \ + git commit-graph write --reachable --changed-paths \ + --split=replace && + test_commit c1 filter && + + test_bloom_filters_computed "--reachable --changed-paths --split=replace" \ + 2 0 1 + ) +' test_done From patchwork Thu Sep 3 22:46:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11755511 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5F76F109A for ; Thu, 3 Sep 2020 22:47:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3B5312078E for ; Thu, 3 Sep 2020 22:47:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="hD8Bjt6a" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729624AbgICWrC (ORCPT ); Thu, 3 Sep 2020 18:47:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56612 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729468AbgICWrB (ORCPT ); Thu, 3 Sep 2020 18:47:01 -0400 Received: from mail-qk1-x729.google.com (mail-qk1-x729.google.com [IPv6:2607:f8b0:4864:20::729]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D9274C061244 for ; Thu, 3 Sep 2020 15:47:00 -0700 (PDT) Received: by mail-qk1-x729.google.com with SMTP id f142so4646993qke.13 for ; Thu, 03 Sep 2020 15:47:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=1pgZgw01AY2qiT2UbLacvXpIDOczgfjTcvIITIWv/Z0=; b=hD8Bjt6asK+zEtHduJPsdOOHa5mmLkC7EzHJaC02kI6Rl5QB32PuyWELll2H4HfCUV O79l9hKzULXiPnRexKYiR5MleyXlrWEXkxa8Hm5NHo5qstn958gWZcQpdIKy80PSuD71 ZVW3L+qnEIVSMa5P4b+fNjkxQ1rPqJzUqgN99/haHeifMWdvs8hD3+3sqIiOrpABNx7i UQqT1z/mAB5XjLNhWrEsCiwEFiqx6jkXLZZoaZcfd/FDED6/G2GSV5iA/If5Zp2lNEX2 iAFq7yq71AehmuHDBRxcr8jql5zvacpuh1FW7rJX3sWspi+JJ9GMVG4DvYRHj7nxLa/f ft4A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=1pgZgw01AY2qiT2UbLacvXpIDOczgfjTcvIITIWv/Z0=; b=f2Pg8Wx8nhaxAJNUMk5xFeI1LugEbKGS/JYpKlYGHzP1HzPmvK/RaSkw55/6CAT3Kw bS9JQ4CqgWskvsJiAP6XVBuDeCxowA2snDvrbJ5kn+7QZS2yV3iwhAZ/FBQDIXFDscAL qNp/mi2AB+M7pWYd1yhvHbBgfvEJyRsaX/ngYKNPZqSld8imUSiGE6804od+cz0M7zPI VudmDnGHd9NPs9KMScOgz9pNhmVEOcrzqsTs019YvV3KfG6kb28K24OozsxUB/Z/930t 3mpS4u4aNWkOfq0NYD1+4P0tLvWOBnM3car0I4zfcxenu55b4mqWFBO6IaEDHwMooAs3 zasg== X-Gm-Message-State: AOAM531sOsyPLva5Tzmu6sW1/BfvSQ6Alozag5SDADO7WSm1dJuCgvbg crxJ++yROZ/GAN2+AvziEvGlXYDJs3eOEsBb X-Google-Smtp-Source: ABdhPJwic6XJ0uhZIOJq68xXGNSgSlyUuhvRuKrmNgiAZprURzQWiLjanvt9hhAqOTkUZUa8T3nwcw== X-Received: by 2002:a37:b57:: with SMTP id 84mr5111235qkl.433.1599173219715; Thu, 03 Sep 2020 15:46:59 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a97c:6dae:76a4:c715]) by smtp.gmail.com with ESMTPSA id l38sm3233994qtl.58.2020.09.03.15.46.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Sep 2020 15:46:59 -0700 (PDT) Date: Thu, 3 Sep 2020 18:46:57 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v4 13/14] commit-graph: rename 'split_commit_graph_opts' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In the subsequent commit, additional options will be added to the commit-graph API which have nothing to do with splitting. Rename the 'split_commit_graph_opts' structure to the more-generic 'commit_graph_opts' to encompass both. Likewise, rename the 'flags' member to instead be 'split_flags' to clarify that it only has to do with the behavior implied by '--split'. Suggested-by: Derrick Stolee Signed-off-by: Taylor Blau Signed-off-by: Taylor Blau --- builtin/commit-graph.c | 20 ++++++++++---------- commit-graph.c | 40 ++++++++++++++++++++-------------------- commit-graph.h | 8 ++++---- make: *** [Makefile | 0 4 files changed, 34 insertions(+), 34 deletions(-) create mode 100644 make: *** [Makefile diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index ba5584463f..f3243bd982 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -119,7 +119,7 @@ static int graph_verify(int argc, const char **argv) } extern int read_replace_refs; -static struct split_commit_graph_opts split_opts; +static struct commit_graph_opts write_opts; static int write_option_parse_split(const struct option *opt, const char *arg, int unset) @@ -187,24 +187,24 @@ static int graph_write(int argc, const char **argv) OPT_BOOL(0, "changed-paths", &opts.enable_changed_paths, N_("enable computation for changed paths")), OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")), - OPT_CALLBACK_F(0, "split", &split_opts.flags, NULL, + OPT_CALLBACK_F(0, "split", &write_opts.split_flags, NULL, N_("allow writing an incremental commit-graph file"), PARSE_OPT_OPTARG | PARSE_OPT_NONEG, write_option_parse_split), - OPT_INTEGER(0, "max-commits", &split_opts.max_commits, + OPT_INTEGER(0, "max-commits", &write_opts.max_commits, N_("maximum number of commits in a non-base split commit-graph")), - OPT_INTEGER(0, "size-multiple", &split_opts.size_multiple, + OPT_INTEGER(0, "size-multiple", &write_opts.size_multiple, N_("maximum ratio between two levels of a split commit-graph")), - OPT_EXPIRY_DATE(0, "expire-time", &split_opts.expire_time, + OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time, N_("only expire files older than a given date-time")), OPT_END(), }; opts.progress = isatty(2); opts.enable_changed_paths = -1; - split_opts.size_multiple = 2; - split_opts.max_commits = 0; - split_opts.expire_time = 0; + write_opts.size_multiple = 2; + write_opts.max_commits = 0; + write_opts.expire_time = 0; trace2_cmd_mode("write"); @@ -232,7 +232,7 @@ static int graph_write(int argc, const char **argv) odb = find_odb(the_repository, opts.obj_dir); if (opts.reachable) { - if (write_commit_graph_reachable(odb, flags, &split_opts)) + if (write_commit_graph_reachable(odb, flags, &write_opts)) return 1; return 0; } @@ -261,7 +261,7 @@ static int graph_write(int argc, const char **argv) opts.stdin_packs ? &pack_indexes : NULL, opts.stdin_commits ? &commits : NULL, flags, - &split_opts)) + &write_opts)) result = 1; cleanup: diff --git a/commit-graph.c b/commit-graph.c index 68ffa6ec35..33fcf01a7a 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1012,7 +1012,7 @@ struct write_commit_graph_context { changed_paths:1, order_by_pack:1; - const struct split_commit_graph_opts *split_opts; + const struct commit_graph_opts *opts; size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; @@ -1353,8 +1353,8 @@ static void close_reachable(struct write_commit_graph_context *ctx) { int i; struct commit *commit; - enum commit_graph_split_flags flags = ctx->split_opts ? - ctx->split_opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; + enum commit_graph_split_flags flags = ctx->opts ? + ctx->opts->split_flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; if (ctx->report_progress) ctx->progress = start_delayed_progress( @@ -1554,7 +1554,7 @@ static int add_ref_to_set(const char *refname, int write_commit_graph_reachable(struct object_directory *odb, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts) + const struct commit_graph_opts *opts) { struct oidset commits = OIDSET_INIT; struct refs_cb_data data; @@ -1571,7 +1571,7 @@ int write_commit_graph_reachable(struct object_directory *odb, stop_progress(&data.progress); result = write_commit_graph(odb, NULL, &commits, - flags, split_opts); + flags, opts); oidset_clear(&commits); return result; @@ -1686,8 +1686,8 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx) static void copy_oids_to_commits(struct write_commit_graph_context *ctx) { uint32_t i; - enum commit_graph_split_flags flags = ctx->split_opts ? - ctx->split_opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; + enum commit_graph_split_flags flags = ctx->opts ? + ctx->opts->split_flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED; ctx->num_extra_edges = 0; if (ctx->report_progress) @@ -1973,13 +1973,13 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) int max_commits = 0; int size_mult = 2; - if (ctx->split_opts) { - max_commits = ctx->split_opts->max_commits; + if (ctx->opts) { + max_commits = ctx->opts->max_commits; - if (ctx->split_opts->size_multiple) - size_mult = ctx->split_opts->size_multiple; + if (ctx->opts->size_multiple) + size_mult = ctx->opts->size_multiple; - flags = ctx->split_opts->flags; + flags = ctx->opts->split_flags; } g = ctx->r->objects->commit_graph; @@ -2157,8 +2157,8 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx) size_t dirnamelen; timestamp_t expire_time = time(NULL); - if (ctx->split_opts && ctx->split_opts->expire_time) - expire_time = ctx->split_opts->expire_time; + if (ctx->opts && ctx->opts->expire_time) + expire_time = ctx->opts->expire_time; if (!ctx->split) { char *chain_file_name = get_chain_filename(ctx->odb); unlink(chain_file_name); @@ -2209,7 +2209,7 @@ int write_commit_graph(struct object_directory *odb, struct string_list *pack_indexes, struct oidset *commits, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts) + const struct commit_graph_opts *opts) { struct write_commit_graph_context *ctx; uint32_t i, count_distinct = 0; @@ -2226,7 +2226,7 @@ int write_commit_graph(struct object_directory *odb, ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0; ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0; ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; - ctx->split_opts = split_opts; + ctx->opts = opts; ctx->total_bloom_filter_data_size = 0; bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", @@ -2274,15 +2274,15 @@ int write_commit_graph(struct object_directory *odb, } } - if (ctx->split_opts) - replace = ctx->split_opts->flags & COMMIT_GRAPH_SPLIT_REPLACE; + if (ctx->opts) + replace = ctx->opts->split_flags & COMMIT_GRAPH_SPLIT_REPLACE; } ctx->approx_nr_objects = approximate_object_count(); ctx->oids.alloc = ctx->approx_nr_objects / 32; - if (ctx->split && split_opts && ctx->oids.alloc > split_opts->max_commits) - ctx->oids.alloc = split_opts->max_commits; + if (ctx->split && opts && ctx->oids.alloc > opts->max_commits) + ctx->oids.alloc = opts->max_commits; if (ctx->append) { prepare_commit_graph_one(ctx->r, ctx->odb); diff --git a/commit-graph.h b/commit-graph.h index 9afb1477d5..fe798a4047 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -115,11 +115,11 @@ enum commit_graph_split_flags { COMMIT_GRAPH_SPLIT_REPLACE = 2 }; -struct split_commit_graph_opts { +struct commit_graph_opts { int size_multiple; int max_commits; timestamp_t expire_time; - enum commit_graph_split_flags flags; + enum commit_graph_split_flags split_flags; }; /* @@ -130,12 +130,12 @@ struct split_commit_graph_opts { */ int write_commit_graph_reachable(struct object_directory *odb, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts); + const struct commit_graph_opts *opts); int write_commit_graph(struct object_directory *odb, struct string_list *pack_indexes, struct oidset *commits, enum commit_graph_write_flags flags, - const struct split_commit_graph_opts *split_opts); + const struct commit_graph_opts *opts); #define COMMIT_GRAPH_VERIFY_SHALLOW (1 << 0) diff --git a/make: *** [Makefile b/make: *** [Makefile new file mode 100644 index 0000000000..e69de29bb2 From patchwork Thu Sep 3 22:47:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11755513 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CE303746 for ; Thu, 3 Sep 2020 22:47:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A1DCF20786 for ; Thu, 3 Sep 2020 22:47:08 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="KamWqJFu" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729627AbgICWrH (ORCPT ); Thu, 3 Sep 2020 18:47:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56632 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729468AbgICWrG (ORCPT ); Thu, 3 Sep 2020 18:47:06 -0400 Received: from mail-qt1-x82a.google.com (mail-qt1-x82a.google.com [IPv6:2607:f8b0:4864:20::82a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 51D56C061244 for ; Thu, 3 Sep 2020 15:47:06 -0700 (PDT) Received: by mail-qt1-x82a.google.com with SMTP id v54so3262735qtj.7 for ; Thu, 03 Sep 2020 15:47:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=oYpSjtOpe7DlSv/ZCosDwLkRD+e5p4idW8ckKahgFaI=; b=KamWqJFuBBlIX7jALdGF+dci8xOJf3bGgjyobej34aPQa6MZJLnbEJXJbJBNF/O6W+ jHum/H/AsoDugK1JQ2c8FSO+gJUOZeHX/V1ib4Ax6X35cMhPuXnCJaf2+pXLSCZNPqjH YkP0LTQvKnHOrC1loe9nVPjjJU6MUzNnUc0KVhuITd4nxYOXW9TeHFPtJ9Z8HRDud1Io PaPnbgh8rYJ618hjEHkjnpKvUfAdH/l8SDxL1dy7Ks9Xoiv+L4NtjDoIXG+0XpA/JVfQ sG6fwXkyg2uBjHU3koh87sA+hP0DR7ZjaAVxZyPKgyvIytZP78r2k9wZ3XXkmHt2TQlE EfWQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=oYpSjtOpe7DlSv/ZCosDwLkRD+e5p4idW8ckKahgFaI=; b=b0rIlEBVbD48QiL2xKxM6MRWgJ+Kuow5LlYR5uEgIzbTjP9XIdGf4O6KjECFVXViyO jYKDfzs75ukoVlyuiKmKDgCESqGP3RMtqRbVbj2CfSNa+Q+RlvJvUc9t9FxEwOLLOhln TL9eycxgBA7tsPCWCiKXdHdYzlyXq751NmsC9vjMX1JZrXPDFvpJUPhXYsAB3D/WlDpG QKIm4kliY1UuO927rayx36j9wVZ5jOTMeKybkKvy+kEFD27JOdoX4GHHoy2PYqJGSYqD wRZB1KXBWOLVK3/SCgtvqVDSlhAAMsjtEsIul/TFFcDIYrr7mL+xkbJBqIn22n9LbtWu 9wpw== X-Gm-Message-State: AOAM533ecEhH5QyRIr4Uyph9z+J+Sl0PxYhPeWt+LJ6W8/KqnHbSK87s xGVKavCUbZc+3Z4LkKPMh7v7uJKgPl3vom/W X-Google-Smtp-Source: ABdhPJx+YxrEQ5FBnzF+YjALGZO9cYdJnRldYfZLIZasPHhqusDnEk2WjB3I4yXn5Cg7tvzmTPAgzQ== X-Received: by 2002:aed:38c9:: with SMTP id k67mr2353072qte.6.1599173224988; Thu, 03 Sep 2020 15:47:04 -0700 (PDT) Received: from localhost ([2605:9480:22e:ff10:a97c:6dae:76a4:c715]) by smtp.gmail.com with ESMTPSA id t20sm3283487qke.79.2020.09.03.15.47.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Sep 2020 15:47:04 -0700 (PDT) Date: Thu, 3 Sep 2020 18:47:01 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: [PATCH v4 14/14] builtin/commit-graph.c: introduce '--max-new-filters=' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Introduce a command-line flag and configuration variable to fill in the 'max_new_filters' variable introduced two patches ago. The command-line option '--max-new-filters' takes precedence over 'commitGraph.maxNewFilters', which is the default value. '--no-max-new-filters' can also be provided, which sets the value back to '-1', indicating that an unlimited number of new Bloom filters may be generated. (OPT_INTEGER only allows setting the '--no-' variant back to '0', hence a custom callback was used instead). Signed-off-by: Taylor Blau --- Documentation/config/commitgraph.txt | 4 +++ Documentation/git-commit-graph.txt | 6 ++++ bloom.c | 13 +++++--- builtin/commit-graph.c | 39 ++++++++++++++++++++++-- commit-graph.c | 27 ++++++++++++++--- commit-graph.h | 4 ++- t/t4216-log-bloom.sh | 44 ++++++++++++++++++++++++++++ 7 files changed, 125 insertions(+), 12 deletions(-) diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt index cff0797b54..4582c39fc4 100644 --- a/Documentation/config/commitgraph.txt +++ b/Documentation/config/commitgraph.txt @@ -1,3 +1,7 @@ +commitGraph.maxNewFilters:: + Specifies the default value for the `--max-new-filters` option of `git + commit-graph write` (c.f., linkgit:git-commit-graph[1]). + commitGraph.readChangedPaths:: If true, then git will use the changed-path Bloom filters in the commit-graph file (if it exists, and they are present). Defaults to diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index 17405c73a9..81a2e65903 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -67,6 +67,12 @@ this option is given, future commit-graph writes will automatically assume that this option was intended. Use `--no-changed-paths` to stop storing this data. + +With the `--max-new-filters=` option, generate at most `n` new Bloom +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is +enforced. Commits whose filters are not calculated are stored as a +length zero Bloom filter, and their bit is marked in the `BFXL` chunk. +Overrides the `commitGraph.maxNewFilters` configuration. ++ With the `--split[=]` option, write the commit-graph as a chain of multiple commit-graph files stored in `/info/commit-graphs`. Commit-graph layers are merged based on the diff --git a/bloom.c b/bloom.c index ed54e96e57..34503898ac 100644 --- a/bloom.c +++ b/bloom.c @@ -197,16 +197,21 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, if (!filter->data) { load_commit_graph_info(r, c); - if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH && - load_bloom_filter_from_graph(r->objects->commit_graph, filter, c)) - return filter; + if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH) + load_bloom_filter_from_graph(r->objects->commit_graph, filter, c); } - if (filter->data) + if (filter->data && filter->len) return filter; if (!compute_if_not_present) return NULL; + if (filter && !filter->len && + get_bloom_filter_large_in_graph(r->objects->commit_graph, c, + settings->max_changed_paths)) + return filter; + + repo_diff_setup(r, &diffopt); diffopt.flags.recursive = 1; diffopt.detect_rename = 0; diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index f3243bd982..e7a1539b08 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -13,7 +13,8 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph verify [--object-dir ] [--shallow] [--[no-]progress]"), N_("git commit-graph write [--object-dir ] [--append] " "[--split[=]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] "), + "[--changed-paths] [--[no-]max-new-filters ] [--[no-]progress] " + ""), NULL }; @@ -25,7 +26,8 @@ static const char * const builtin_commit_graph_verify_usage[] = { static const char * const builtin_commit_graph_write_usage[] = { N_("git commit-graph write [--object-dir ] [--append] " "[--split[=]] [--reachable|--stdin-packs|--stdin-commits] " - "[--changed-paths] [--[no-]progress] "), + "[--changed-paths] [--[no-]max-new-filters ] [--[no-]progress] " + ""), NULL }; @@ -162,6 +164,23 @@ static int read_one_commit(struct oidset *commits, struct progress *progress, return 0; } +static int write_option_max_new_filters(const struct option *opt, + const char *arg, + int unset) +{ + int *to = opt->value; + if (unset) + *to = -1; + else { + const char *s; + *to = strtol(arg, (char **)&s, 10); + if (*s) + return error(_("%s expects a numerical value"), + optname(opt, opt->flags)); + } + return 0; +} + static int graph_write(int argc, const char **argv) { struct string_list pack_indexes = STRING_LIST_INIT_NODUP; @@ -197,6 +216,9 @@ static int graph_write(int argc, const char **argv) N_("maximum ratio between two levels of a split commit-graph")), OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time, N_("only expire files older than a given date-time")), + OPT_CALLBACK_F(0, "max-new-filters", &write_opts.max_new_filters, + NULL, N_("maximum number of changed-path Bloom filters to compute"), + 0, write_option_max_new_filters), OPT_END(), }; @@ -205,6 +227,7 @@ static int graph_write(int argc, const char **argv) write_opts.size_multiple = 2; write_opts.max_commits = 0; write_opts.expire_time = 0; + write_opts.max_new_filters = -1; trace2_cmd_mode("write"); @@ -270,6 +293,16 @@ static int graph_write(int argc, const char **argv) return result; } +static int git_commit_graph_config(const char *var, const char *value, void *cb) +{ + if (!strcmp(var, "commitgraph.maxnewfilters")) { + write_opts.max_new_filters = git_config_int(var, value); + return 0; + } + + return git_default_config(var, value, cb); +} + int cmd_commit_graph(int argc, const char **argv, const char *prefix) { static struct option builtin_commit_graph_options[] = { @@ -283,7 +316,7 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix) usage_with_options(builtin_commit_graph_usage, builtin_commit_graph_options); - git_config(git_default_config, NULL); + git_config(git_commit_graph_config, &opts); argc = parse_options(argc, argv, prefix, builtin_commit_graph_options, builtin_commit_graph_usage, diff --git a/commit-graph.c b/commit-graph.c index 33fcf01a7a..243c7253ff 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -953,7 +953,8 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit } int get_bloom_filter_large_in_graph(struct commit_graph *g, - const struct commit *c) + const struct commit *c, + uint32_t max_changed_paths) { uint32_t graph_pos; if (!find_commit_in_graph(c, g, &graph_pos)) @@ -976,6 +977,17 @@ int get_bloom_filter_large_in_graph(struct commit_graph *g, if (!g->bloom_large) return 0; + if (g->bloom_filter_settings->max_changed_paths != max_changed_paths) { + /* + * Force all commits which are subject to a different + * 'max_changed_paths' limit to be recomputed from scratch. + * + * Note that this could likely be improved, but is ignored since + * all real-world graphs set the maximum number of changed paths + * at 512. + */ + return 0; + } return bitmap_get(g->bloom_large, graph_pos - g->num_commits_in_base); } @@ -1481,6 +1493,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) int i; struct progress *progress = NULL; int *sorted_commits; + int max_new_filters; init_bloom_filters(); ctx->bloom_large = bitmap_word_alloc(ctx->commits.nr / BITS_IN_EWORD + 1); @@ -1497,10 +1510,15 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->order_by_pack ? commit_pos_cmp : commit_gen_cmp, &ctx->commits); + max_new_filters = ctx->opts && ctx->opts->max_new_filters >= 0 ? + ctx->opts->max_new_filters : ctx->commits.nr; + for (i = 0; i < ctx->commits.nr; i++) { int pos = sorted_commits[i]; struct commit *c = ctx->commits.list[pos]; - if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph, c)) { + if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph, + c, + ctx->bloom_settings->max_changed_paths)) { bitmap_set(ctx->bloom_large, pos); ctx->count_bloom_filter_known_large++; } else { @@ -1508,7 +1526,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) struct bloom_filter *filter = get_or_compute_bloom_filter( ctx->r, c, - 1, + ctx->count_bloom_filter_computed < max_new_filters, ctx->bloom_settings, &computed); if (computed) { @@ -1518,7 +1536,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->count_bloom_filter_found_large++; } } - ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; + if (filter) + ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; } display_progress(progress, i + 1); } diff --git a/commit-graph.h b/commit-graph.h index fe798a4047..eac4efc7a6 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -52,7 +52,8 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit *c); int get_bloom_filter_large_in_graph(struct commit_graph *g, - const struct commit *c); + const struct commit *c, + uint32_t max_changed_paths); struct commit_graph { const unsigned char *data; @@ -120,6 +121,7 @@ struct commit_graph_opts { int max_commits; timestamp_t expire_time; enum commit_graph_split_flags split_flags; + int max_new_filters; }; /* diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index fed4929af3..571676cef2 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -291,4 +291,48 @@ test_expect_success 'Bloom generation does not recompute too-large filters' ' ) ' +test_expect_success 'Bloom generation is limited by --max-new-filters' ' + ( + cd limits && + test_commit c2 filter && + test_commit c3 filter && + test_commit c4 no-filter && + test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=2" \ + 2 0 2 + ) +' + +test_expect_success 'Bloom generation backfills previously-skipped filters' ' + ( + cd limits && + test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=1" \ + 2 0 1 + ) +' + +test_expect_success 'Bloom generation backfills empty commits' ' + git init empty && + test_when_finished "rm -fr empty" && + ( + cd empty && + for i in $(test_seq 1 6) + do + git commit --allow-empty -m "$i" + done && + + # Generate Bloom filters for empty commits 1-6, two at a time. + test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 0 2 2 && + test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 2 2 2 && + test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 4 2 2 && + + # Finally, make sure that once all commits have filters, that + # none are subsequently recomputed. + test_bloom_filters_computed "--reachable --changed-paths --max-new-filters=2" \ + 6 0 0 + ) +' + test_done