From patchwork Mon Aug 7 16:37:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13344561 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D59CC00528 for ; Mon, 7 Aug 2023 16:38:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231829AbjHGQiv (ORCPT ); Mon, 7 Aug 2023 12:38:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46640 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232009AbjHGQiP (ORCPT ); Mon, 7 Aug 2023 12:38:15 -0400 Received: from mail-yw1-x1129.google.com (mail-yw1-x1129.google.com [IPv6:2607:f8b0:4864:20::1129]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8A7AE1BD7 for ; Mon, 7 Aug 2023 09:37:47 -0700 (PDT) Received: by mail-yw1-x1129.google.com with SMTP id 00721157ae682-586b78aa26eso29990877b3.1 for ; Mon, 07 Aug 2023 09:37:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20221208.gappssmtp.com; s=20221208; t=1691426266; x=1692031066; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=u7BObONqmKh/FzXouG+OOan/qZulTKn7Z1vuUuvDeqs=; b=ULNKDVamnJjNOq/41VhKsXotgASN0FJyB7WmDa4qb30nA9UkoYD/MB8qbeWKgW4BsL hMD0i1BmSpcxcIbDxI4UgQeKzEp01ATxHJrfj9Gg20SiUpwY89q42YRWNcHKSFrtGmFK WmwBrAtskpFQvcyuA8qEfUksMXUmcI6U350w5oXBsfcRqNboHWfw5MuOYQoG9wRZliXW XeEQuPfCfiMVOEZjOginAd/lA4ZJv9V7GqopD6hCP8vMvd8vHaldqiwu6jAp/VAXMuuV 8cIjmHYO4BnKPc7t3lj1WPOaGNlmX/2CHSXwyULCASjmvXjSlhzNfxwl4UfOloO3i72h HD/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691426266; x=1692031066; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=u7BObONqmKh/FzXouG+OOan/qZulTKn7Z1vuUuvDeqs=; b=jJ5d21lkWJkPsbLwONh52J1eZaxuZt1ZEnsvNRBDl8ovbeG38cuBt4AIp7Rcfhuw7O 5WKId6z1WdYa2PXK7dqs3uHOrBZ+1hqHu5+xl78agUGLlfwJ9XLZDjQfbQ2gbTDlNb4K 4s6awL9wYkHi2ym3pla8tKyRjS6tDg2/3ZsB2VEhrdx1cnhNkwNoHhTcW4+1/glxTAB/ ZKAZ+6TT7w6mmJUc34bvuCmA9t/UHd6i63GSqt3OjHTO5m3JEapTR/07xBo5aEabWSKK 7OU1bmzC+Tce2CjaR/b1pj68jh55mk+ERlbokDueMfEUiyTayTz9XDBrq6pT6HcH/aho p7vw== X-Gm-Message-State: AOJu0YyVvKBQLNC5NjgTlMkZpcX4OGShXti6N97VOj5l+RHoMYYMFaMy bM6sgjwfYbjiVXCNaw4vpac+HoVqHJZh9gs+ciDF8g== X-Google-Smtp-Source: AGHT+IE5+Bb9Mwe5UWrfXnRLcyPSdFXD+UwNIyfxDPM0mz8LtcuWWLKymYxEWSnllaeAYHsJ0LYZKw== X-Received: by 2002:a0d:fd87:0:b0:577:1909:ee16 with SMTP id n129-20020a0dfd87000000b005771909ee16mr12394817ywf.30.1691426266158; Mon, 07 Aug 2023 09:37:46 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id m12-20020a819e0c000000b00545a08184cesm2806600ywj.94.2023.08.07.09.37.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 09:37:45 -0700 (PDT) Date: Mon, 7 Aug 2023 12:37:45 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Junio C Hamano Subject: [RFC PATCH 1/6] bloom: annotate filters with hash version Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In subsequent commits, we will want to load existing Bloom filters out of a commit-graph, even when the hash version they were computed with does not match the value of `commitGraph.changedPathVersion`. In order to differentiate between the two, add a "filter" field to each Bloom filter. Signed-off-by: Taylor Blau --- bloom.c | 11 ++++++++--- bloom.h | 1 + 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/bloom.c b/bloom.c index ebef5cfd2f..9b6a30f6f6 100644 --- a/bloom.c +++ b/bloom.c @@ -55,6 +55,7 @@ int load_bloom_filter_from_graph(struct commit_graph *g, filter->data = (unsigned char *)(g->chunk_bloom_data + sizeof(unsigned char) * start_index + BLOOMDATA_CHUNK_HEADER_SIZE); + filter->version = g->bloom_filter_settings->hash_version; return 1; } @@ -240,11 +241,13 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data UNUSED, return strcmp(e1->path, e2->path); } -static void init_truncated_large_filter(struct bloom_filter *filter) +static void init_truncated_large_filter(struct bloom_filter *filter, + int version) { filter->data = xmalloc(1); filter->data[0] = 0xFF; filter->len = 1; + filter->version = version; } struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, @@ -329,13 +332,15 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, } if (hashmap_get_size(&pathmap) > settings->max_changed_paths) { - init_truncated_large_filter(filter); + init_truncated_large_filter(filter, + settings->hash_version); if (computed) *computed |= BLOOM_TRUNC_LARGE; goto cleanup; } filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; + filter->version = settings->hash_version; if (!filter->len) { if (computed) *computed |= BLOOM_TRUNC_EMPTY; @@ -355,7 +360,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, } else { for (i = 0; i < diff_queued_diff.nr; i++) diff_free_filepair(diff_queued_diff.queue[i]); - init_truncated_large_filter(filter); + init_truncated_large_filter(filter, settings->hash_version); if (computed) *computed |= BLOOM_TRUNC_LARGE; diff --git a/bloom.h b/bloom.h index 138d57a86b..330a140520 100644 --- a/bloom.h +++ b/bloom.h @@ -55,6 +55,7 @@ struct bloom_filter_settings { struct bloom_filter { unsigned char *data; size_t len; + int version; }; /* From patchwork Mon Aug 7 16:37:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13344559 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F6CCC00528 for ; Mon, 7 Aug 2023 16:38:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231833AbjHGQih (ORCPT ); Mon, 7 Aug 2023 12:38:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46682 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232048AbjHGQiS (ORCPT ); Mon, 7 Aug 2023 12:38:18 -0400 Received: from mail-yw1-x1133.google.com (mail-yw1-x1133.google.com [IPv6:2607:f8b0:4864:20::1133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6AA321BF0 for ; Mon, 7 Aug 2023 09:37:50 -0700 (PDT) Received: by mail-yw1-x1133.google.com with SMTP id 00721157ae682-58969d4f1b6so3974957b3.2 for ; Mon, 07 Aug 2023 09:37:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20221208.gappssmtp.com; s=20221208; t=1691426269; x=1692031069; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=6hNgaqdkXVqYDmmToyvEtrm4AFGkngfyVKUn0kaylRY=; b=w7FCWjlgKsZNJrdTaaTLLmsksZbQ3T02oX/c1aXK5TWKA9gGGIjGICxjsoxMQp5Dyc SF1gWGM1QShHcbKJj3Zc+asWb8UBr6x37WUTiWnyBjkCBCmq9Qlo0+c3PwMfc1CL3OFi ndOZg7yTDIk6eCuSU1/ymBMgcIka9meRAERZBKF0JbRjBo8leqUwZxEJcvfbFBqo/6Dh JAv1SCWUNC6YT9PwPWafA+hLnfloa2Bg1h8PtMyW2yYKC6WFbWK2/iqUKH0YTb4sBEuO PqHVA39Ia6W53CrkLdbhqZtnzQLnnBfbJqYy2YNznDAPSjnXoa9fEcq7I+Z9bQ+WQh3s 2abA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691426269; x=1692031069; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=6hNgaqdkXVqYDmmToyvEtrm4AFGkngfyVKUn0kaylRY=; b=jxsp4YYi2KGHGz5MclMY9T91Nwpo0JgIN2DQ7Vjt0TEC4TyqemZV4QmRdEKgGRnvhy giFIA2Qxc8cuxtENyLHy2wZhqGMnskSLp60FrLbPk8uqnXn8m9dpbqOQyHJhjW0I8HcP gxdSFjKCU12x7lbAfMyQm/WaO844B1TyXcNvkQ7SR6a4BwfKL+AOmroyevuDZ3EOT6HL y4At0+3em0pnm2LbjVPk6o7lFhdo3NK7rVyNW4V8B0z+xfDUPcRSw2fM4CKIAPBAP9Bl WZyAAnVGUNMOH2Sz9XWrx4Td9rCF0HOVTkpHDuF5fZq48UxV7z67+BxHzi83rODzqPtk BfWA== X-Gm-Message-State: AOJu0YxY/P28gL7+1POmtLP8sn87DJB7j7cpt6FsV0784ZmVCWOckc0i suRNbIWSlLh0YKoTiE9Je12CEcUTAtAOKIR1qOif0w== X-Google-Smtp-Source: AGHT+IFZrhcXcGUV1wPYqc8twG0voE1LkCBJx5Tv6LeumP/dWdf7wZ5cgOfuZiwPNOLF3QrKSgxPvw== X-Received: by 2002:a81:8ac7:0:b0:576:a603:e733 with SMTP id a190-20020a818ac7000000b00576a603e733mr11897607ywg.22.1691426268807; Mon, 07 Aug 2023 09:37:48 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id v184-20020a0dd3c1000000b00586a3283e64sm2707007ywd.143.2023.08.07.09.37.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 09:37:48 -0700 (PDT) Date: Mon, 7 Aug 2023 12:37:47 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Junio C Hamano Subject: [RFC PATCH 2/6] bloom: prepare to discard incompatible Bloom filters Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Callers use the inline `get_bloom_filter()` implementation as a thin wrapper around `get_or_compute_bloom_filter()`. The former calls the latter with a value of "0" for `compute_if_not_present`, making `get_bloom_filter()` the default read-only path for fetching an existing Bloom filter. Callers expect the value returned from `get_bloom_filter()` is usable, that is that it's compatible with the configured value corresponding to `commitGraph.changedPathsVersion`. This is OK, since the commit-graph machinery only initializes its BDAT chunk (thereby enabling it to service Bloom filter queries) when the Bloom filter hash_version is compatible with our settings. So any value returned by `get_bloom_filter()` is trivially useable. However, subsequent commits will load the BDAT chunk even when the Bloom filters are built with incompatible hash versions. Prepare to handle this by teaching `get_bloom_filter()` to discard filters that are incompatible with the configured hash version. Callers who wish to read incompatible filters (e.g., for upgrading filters from v1 to v2) may use the lower level routine, `get_or_compute_bloom_filter()`. Signed-off-by: Taylor Blau --- bloom.c | 20 +++++++++++++++++++- bloom.h | 20 ++++++++++++++++++-- 2 files changed, 37 insertions(+), 3 deletions(-) diff --git a/bloom.c b/bloom.c index 9b6a30f6f6..739fa093ba 100644 --- a/bloom.c +++ b/bloom.c @@ -250,6 +250,23 @@ static void init_truncated_large_filter(struct bloom_filter *filter, filter->version = version; } +struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c) +{ + struct bloom_filter *filter; + int hash_version; + + filter = get_or_compute_bloom_filter(r, c, 0, NULL, NULL); + if (!filter) + return NULL; + + prepare_repo_settings(r); + hash_version = r->settings.commit_graph_changed_paths_version; + + if (!(hash_version == -1 || hash_version == filter->version)) + return NULL; /* unusable filter */ + return filter; +} + struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, struct commit *c, int compute_if_not_present, @@ -275,7 +292,8 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, filter, graph_pos); } - if (filter->data && filter->len) + if ((filter->data && filter->len) && + (!settings || settings->hash_version == filter->version)) return filter; if (!compute_if_not_present) return NULL; diff --git a/bloom.h b/bloom.h index 330a140520..2b1c124bb5 100644 --- a/bloom.h +++ b/bloom.h @@ -110,8 +110,24 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, const struct bloom_filter_settings *settings, enum bloom_filter_computed *computed); -#define get_bloom_filter(r, c) get_or_compute_bloom_filter( \ - (r), (c), 0, NULL, NULL) +/* + * Find the Bloom filter associated with the given commit "c". + * + * If any of the following are true + * + * - the repository does not have a commit-graph + * - it has a commit-graph, but reading the commit-graph is disabled + * - the given commit does not have a Bloom filter computed + * - there is a Bloom filter for commit "c", but it cannot be read because + * disabled + * + * , then `get_bloom_filter()` will return NULL. Otherwise, the corresponding + * Bloom filter will be returned. + * + * For callers who wish to inspect Bloom filters with incompatible hash + * versions, use get_or_compute_bloom_filter(). + */ +struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c); int bloom_filter_contains(const struct bloom_filter *filter, const struct bloom_key *key, From patchwork Mon Aug 7 16:37:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13344557 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F0C60C04E69 for ; Mon, 7 Aug 2023 16:38:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231819AbjHGQid (ORCPT ); Mon, 7 Aug 2023 12:38:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46720 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232082AbjHGQiV (ORCPT ); Mon, 7 Aug 2023 12:38:21 -0400 Received: from mail-yw1-x1135.google.com (mail-yw1-x1135.google.com [IPv6:2607:f8b0:4864:20::1135]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BD9961BFA for ; Mon, 7 Aug 2023 09:37:52 -0700 (PDT) Received: by mail-yw1-x1135.google.com with SMTP id 00721157ae682-579de633419so43813407b3.3 for ; Mon, 07 Aug 2023 09:37:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20221208.gappssmtp.com; s=20221208; t=1691426271; x=1692031071; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=nCfkA5W3RP6Pxf5Ly4nFtwDqk8cbxLrMziHPtlJh+o4=; b=WsXESjhqB82ddAdsCeDq54DCvwuV54RtCz/PyVQQGMuiA4jyB9odlSIRvVBqxahf36 9zOb0y/OpLcTtZvNvwiZ/zZeFCbLivcA2izOE5Y0lRRX0iBpucNLBZ3jkjfEEE+n7Jbm WAGbJ/pSee0ANknYvptRyPr1jRl+g3ITLw7ZKvvweMgu/ZqJ1WGq1FhRSVhtFdxqVb9M ZuHuL55prXsVcp6eWeqNSQwiEIRuDvcasZXTpLumEHE8Pw2ED4r+K/K/ZHoQNm04MGjb /bECmvfjCmx8hvdHvzU7TgdfCvWN/CAXA+vVWBmU3ohnqiMwdxCQf0A6uko91HMimE3q LeCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691426271; x=1692031071; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=nCfkA5W3RP6Pxf5Ly4nFtwDqk8cbxLrMziHPtlJh+o4=; b=DlrOtChzCUte5wSh7wMZIZk8Xa24FFxD+gDYaZD/2h1ASOA2E39yvHYeSl+ABmxSSF 6X+92AAfKHF4Q9qWzl1CEhUQomnaPzhx5eyzXWQ9NDBxrgd5VISsKFqL5M87HBC/79dO WxyisjEUI5Tb8YTnrHnZCsd+TMEi1C0Du1gUsmva6x/B+knd7HGO0zRlWUT9UPKYBceJ xB35g7BqnK1JdQkwa4UxOXdtXVVXROO5g/k5n4gxWGuUlU8g/TjSyuzsOOUK1yS2snd/ HhJRqk1AHQeOAwAgUoK7J1dyY3D6OnKIe+ZGBUt53uXEvPF+JTJ3eRCmQXAC4pSRReGf /9rA== X-Gm-Message-State: AOJu0YzsObNMv2Xijcn5PV8cFMcH+Sz4NLQdOVyzhA95FAteQJA9MxT6 K06cPGxCBZ5Bf7hkOYetYX+6MXaG9+jgCwM8bRqEFw== X-Google-Smtp-Source: AGHT+IFsL6iburxjlLhbjweasoZjf/MUOESd0apL6fjvaRC8iC3FDwzAPL+li9d2j0nYq8WP5H9S2g== X-Received: by 2002:a0d:cb01:0:b0:569:479f:6d7f with SMTP id n1-20020a0dcb01000000b00569479f6d7fmr7432963ywd.43.1691426271515; Mon, 07 Aug 2023 09:37:51 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id t20-20020a818314000000b00577139f85dfsm2750641ywf.22.2023.08.07.09.37.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 09:37:51 -0700 (PDT) Date: Mon, 7 Aug 2023 12:37:50 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Junio C Hamano Subject: [RFC PATCH 3/6] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Message-ID: <79552bc385a6246f38b6baa1276db13c7300332c.1691426160.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The existing implementation of test_bloom_filters_not_used() asserts that the Bloom filter sub-system has not been initialized at all, by checking for the absence of any data from it from trace2. In the following commit, it will become possible to load Bloom filters without using them (e.g., because `commitGraph.changedPathVersion` is incompatible with the hash version with which the commit-graph's Bloom filters were written). When this is the case, it's possible to initialize the Bloom filter sub-system, while still not using any Bloom filters. When this is the case, check that the data dump from the Bloom sub-system is all zeros, indicating that no filters were used. Signed-off-by: Taylor Blau --- t/t4216-log-bloom.sh | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 775e59d864..a77caca789 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -81,7 +81,19 @@ test_bloom_filters_used () { test_bloom_filters_not_used () { log_args=$1 setup "$log_args" && - ! grep -q "statistics:{\"filter_not_present\":" "$TRASH_DIRECTORY/trace.perf" && + + if grep -q "statistics:{\"filter_not_present\":" "$TRASH_DIRECTORY/trace.perf" + then + # if the Bloom filter system is initialized, ensure that no + # filters were used + data="statistics:{" + data="$data\"filter_not_present\":0," + data="$data\"maybe\":0," + data="$data\"definitely_not\":0," + data="$data\"false_positive\":0}" + + grep -q "$data" "$TRASH_DIRECTORY/trace.perf" + fi && test_cmp log_wo_bloom log_w_bloom } From patchwork Mon Aug 7 16:37:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13344558 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86994C04FE1 for ; Mon, 7 Aug 2023 16:38:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231824AbjHGQif (ORCPT ); Mon, 7 Aug 2023 12:38:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46790 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232108AbjHGQiX (ORCPT ); Mon, 7 Aug 2023 12:38:23 -0400 Received: from mail-yb1-xb33.google.com (mail-yb1-xb33.google.com [IPv6:2607:f8b0:4864:20::b33]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 76C331FCD for ; Mon, 7 Aug 2023 09:37:55 -0700 (PDT) Received: by mail-yb1-xb33.google.com with SMTP id 3f1490d57ef6-d479d128596so3291077276.1 for ; Mon, 07 Aug 2023 09:37:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20221208.gappssmtp.com; s=20221208; t=1691426274; x=1692031074; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=/PeSoczXdOvAxnaG+yi3KBXu1dZhE6+EhokHcg9k2Xk=; b=k2fwkBvmADLnKX6KWs5Lo5iUQsDBPtmLraRqhUhGGgmvxP+H+7tQduzJE4KQWYwuG+ wdP6YHZBx746Gp0NhTDv1G32nGj6EA8dfgwJUDEQBsis4OWrkKebkH4b7gEkkPs6rD29 M/YWZnADSOZrqkUzrFNfLLCF46ZsqtMrhl2rHGflcAW1iejm6AWW2Pfxbx7CiJm5/17O wlMiezdXbzU3p8/NPm4Z1H8s3mQnfUPpI3fS+DTr6up1X1E6UJuFxQnYxasRFKoEIT0e xXm1odkRwiog0wRDN4s6MQd9ta/IF0Gaa50ZWoDQGWlowxP5kI3M2UDiinEJNvgltyFD sTzg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691426274; x=1692031074; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=/PeSoczXdOvAxnaG+yi3KBXu1dZhE6+EhokHcg9k2Xk=; b=jWZ/xs1IXbefk+KtITlNbfbiLWEaiKwe9hbA9lzWXIi5hewjx8rgQHqoNEcVzAsY+1 COiMGLly3TV7G7zvTPZbkSlugDBPB7tstEjn0O54EVFYRYsyP1urmnHXiWAdFlQCgflj phd0hmau9qixB9/xFyYrdSkhXVWx4donUTVcER177cAbuFH/ehhy0j/1BDwc6iZOjtiT nM6sBDQxKSK+hKMHnLjgSfwHQn6uaMSOOHQh7/3LS2LpVr093hlnGXmqW5ibYw5SSg9J Fj0dORED8Y/OvLGOPIUS/XO2leOSrPA2uE671qHblzeD/l2HQg3hpgsNbdmYvgmBMIlC +MAA== X-Gm-Message-State: AOJu0YwhPzzPui1v9eWxpa3DnsHQWjxTWUTQ2MJb5ioqBFgXfe1bUPxj pgSnlTfbhVFbdkQp31hCqUmLsPp7AnY1LxU+j0QyNA== X-Google-Smtp-Source: AGHT+IGogonKi+BLcyoKL/yPlnHz+6D2jBL/sGiD7meawMdRGB5M4T5+mwQMQKMaIKBzFyMZ5YoFZA== X-Received: by 2002:a25:2d12:0:b0:cb7:e3e9:fe6f with SMTP id t18-20020a252d12000000b00cb7e3e9fe6fmr7094092ybt.30.1691426274139; Mon, 07 Aug 2023 09:37:54 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id 192-20020a2515c9000000b00c389676f3a2sm2382839ybv.40.2023.08.07.09.37.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 09:37:53 -0700 (PDT) Date: Mon, 7 Aug 2023 12:37:53 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Junio C Hamano Subject: [RFC PATCH 4/6] commit-graph.c: unconditionally load Bloom filters Message-ID: <6676afe56db74828ba2532f3460e9b73a3ff20fd.1691426160.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In 9e4df4da07 (commit-graph: new filter ver. that fixes murmur3, 2023-08-01), we began ignoring the Bloom data ("BDAT") chunk for commit-graphs whose Bloom filters were computed using a hash version incompatible with the value of `commitGraph.changedPathVersion`. Now that the Bloom API has been hardened to discard these incompatible filters (with the exception of low-level APIs), we can safely load these Bloom filters unconditionally. We no longer want to return early from `graph_read_bloom_data()`, and similarly do not want to set the bloom_settings' `hash_version` field as a side-effect. The latter is because we want to wait until we know which Bloom settings we're using (either the defaults, from the GIT_TEST variables, or from the previous commit-graph layer) before deciding what hash_version to use. If we detect an existing BDAT chunk, we'll infer the rest of the settings (e.g., number of hashes, bits per entry, and maximum number of changed paths) from the earlier graph layer. The hash_version will be inferred from the previous layer as well, unless one has already been specified via configuration. Once all of that is done, we normalize the value of the hash_version to either "1" or "2". Signed-off-by: Taylor Blau --- commit-graph.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 38edb12669..60e5f9ada7 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -317,12 +317,6 @@ static int graph_read_bloom_data(const unsigned char *chunk_start, uint32_t hash_version; hash_version = get_be32(chunk_start); - if (*c->commit_graph_changed_paths_version == -1) { - *c->commit_graph_changed_paths_version = hash_version; - } else if (hash_version != *c->commit_graph_changed_paths_version) { - return 0; - } - g->chunk_bloom_data = chunk_start; g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings)); g->bloom_filter_settings->hash_version = hash_version; @@ -2390,8 +2384,7 @@ int write_commit_graph(struct object_directory *odb, r->settings.commit_graph_changed_paths_version); return 0; } - bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version == 2 - ? 2 : 1; + bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version; bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", bloom_settings.bits_per_entry); bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", @@ -2423,10 +2416,18 @@ int write_commit_graph(struct object_directory *odb, /* We have changed-paths already. Keep them in the next graph */ if (g && g->bloom_filter_settings) { ctx->changed_paths = 1; - ctx->bloom_settings = g->bloom_filter_settings; + + /* don't propagate the hash_version unless unspecified */ + if (bloom_settings.hash_version == -1) + bloom_settings.hash_version = g->bloom_filter_settings->hash_version; + bloom_settings.bits_per_entry = g->bloom_filter_settings->bits_per_entry; + bloom_settings.num_hashes = g->bloom_filter_settings->num_hashes; + bloom_settings.max_changed_paths = g->bloom_filter_settings->max_changed_paths; } } + bloom_settings.hash_version = bloom_settings.hash_version == 2 ? 2 : 1; + if (ctx->split) { struct commit_graph *g = ctx->r->objects->commit_graph; From patchwork Mon Aug 7 16:37:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13344560 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 575DCC001DB for ; Mon, 7 Aug 2023 16:38:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231887AbjHGQii (ORCPT ); Mon, 7 Aug 2023 12:38:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46804 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232149AbjHGQi2 (ORCPT ); Mon, 7 Aug 2023 12:38:28 -0400 Received: from mail-yw1-x112e.google.com (mail-yw1-x112e.google.com [IPv6:2607:f8b0:4864:20::112e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D1C4C199D for ; Mon, 7 Aug 2023 09:37:58 -0700 (PDT) Received: by mail-yw1-x112e.google.com with SMTP id 00721157ae682-583d63ca1e9so54726477b3.1 for ; Mon, 07 Aug 2023 09:37:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20221208.gappssmtp.com; s=20221208; t=1691426276; x=1692031076; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=/ONi12aEgnk5YUALdXCqyRf8n0r+9sccZlBDkY8IOSo=; b=ID6tAZSdRNOqo+cnxIdUApWrR+4vDYr8FjCO7nOIdAJhnRTuDQq190rLuQSYA1gZxy vcNuMQ54N5IfI84Vf2EflZlUMVAQq5ZnM7n1jcrv6vp8dfNzsykUI4bwja6+jVZVBZGu wgQ84ndXjQKMzOEuqkkdsTeWNj5KE8St5E02cUl773rKEIduR/fdcnTxlqtPDvHb/8eI SaN0oV3u07mbdWvU2AGGGxSFtTPhoyYgsDP+0oGdIa1dBi41a4iQNaz+jj8PalNxAWVy qQl4Heeae+lWJc+VZJcu/5axJboxfvGQiCohKYkbbV/7D0gsJtmSLQQDl7TgXyV6ahVN aayA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691426276; x=1692031076; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=/ONi12aEgnk5YUALdXCqyRf8n0r+9sccZlBDkY8IOSo=; b=WstAvwVUeEZNG7A1blmUuTUkAeoRzy0587UrsogTdDa18mgihLLOw+RyDLD8Y0PQwn u+zTDOTWcxMaUKWQttcysGFEfwKkeV43ky2yqQBDIAVT4QJSdst40wWZP4zupBhjJv0O EiBMCl0N/B/GTU/xdOL2PzQJaV1JT3Zl6OPxtstMst2R7bEUCiwYILv3bDTTahixIzPK 5Gz3BgrEhwBiSPCe3Eh6dpMHGjAi6S0cswqVfGvciLIdUDpsZ0wVeXXd1vTpaf/C6EhM uluAomnF8L+KipEF5RssLdKvRbgO0FEcr+8Llgt7MxDaZXn9snZCcgNIqYTIVv3ruA4N eSew== X-Gm-Message-State: AOJu0YwWDUL7/0LT+PXHdWzFq0KhzONdJM42dATXZkV5iKtJvRTkvVbv tQGzA0d+su0FCPG0sxK9m3Axt7t+l94Xh62bjUHn4g== X-Google-Smtp-Source: AGHT+IEKs+0Sv94GoeBTd0cAazPvsrR6vBzImE8lQWBjT0Cqq73+ZefVi6rEQAqb4RcwtnmsRQt6ig== X-Received: by 2002:a0d:dbd5:0:b0:577:3bf2:80f0 with SMTP id d204-20020a0ddbd5000000b005773bf280f0mr10961243ywe.2.1691426276706; Mon, 07 Aug 2023 09:37:56 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id d203-20020a0ddbd4000000b005842447e843sm2814217ywe.10.2023.08.07.09.37.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 09:37:56 -0700 (PDT) Date: Mon, 7 Aug 2023 12:37:55 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Junio C Hamano Subject: [RFC PATCH 5/6] object.h: fix mis-aligned flag bits table Message-ID: <09d6dd93594870c8010c4927c5eb9489ae23e7a2.1691426160.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Bit position 23 is one column too far to the left. Signed-off-by: Taylor Blau --- object.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/object.h b/object.h index 114d45954d..db25714b4e 100644 --- a/object.h +++ b/object.h @@ -62,7 +62,7 @@ void object_array_init(struct object_array *array); /* * object flag allocation: - * revision.h: 0---------10 15 23------27 + * revision.h: 0---------10 15 23------27 * fetch-pack.c: 01 67 * negotiator/default.c: 2--5 * walker.c: 0-2 From patchwork Mon Aug 7 16:37:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13344562 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47C0EC001DB for ; Mon, 7 Aug 2023 16:38:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231781AbjHGQi4 (ORCPT ); Mon, 7 Aug 2023 12:38:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46908 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231854AbjHGQia (ORCPT ); Mon, 7 Aug 2023 12:38:30 -0400 Received: from mail-yw1-x1132.google.com (mail-yw1-x1132.google.com [IPv6:2607:f8b0:4864:20::1132]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C07C173B for ; Mon, 7 Aug 2023 09:38:00 -0700 (PDT) Received: by mail-yw1-x1132.google.com with SMTP id 00721157ae682-57026f4bccaso51533307b3.2 for ; Mon, 07 Aug 2023 09:38:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20221208.gappssmtp.com; s=20221208; t=1691426279; x=1692031079; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=8zLF5lZIiVv9KK5EOVgkqKIq1YHCFNk53ZmJ+CnB4kE=; b=SPyRMU7GoOk9N1TjbDuqds7ktFDAdXiQO6RjwhYfKE2ErKCZ/WMC6vxiNnTXHvDH8U 1SY7puwzUISgOHbbWtqqA7ZXmSTkgmQGscHmItbEo7CdCzKF/nU3iEvlKZv2yHnTdfe4 cTmmHY48BVdRYYkKfwdzzzRnd1yTCj3BblSm21nkDtW9z3OermSplHsJdKziVMXYcTpe 2Sf8XEL7jTaBJaFUGp7daUPSx6T9c/vzp82zAFa30eGvKEXl1uwML9OfwC6fOyxZ87Pb 0EBE3Ph4BN50vkiUUdk3mq70wF4X4KMKGjGg6ncA25dCqDuV/+m5tX3dUkWUKctiCIcC WoAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691426279; x=1692031079; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=8zLF5lZIiVv9KK5EOVgkqKIq1YHCFNk53ZmJ+CnB4kE=; b=K6ANpgnjmHeKKO47fUguHHCjeiYNDIZd7jBMHBhg04GmDpLJbNlt0SPZPlIEqEkPZK d+NgG70Zm9uoCHk+NduhqvW/ffr6LhTHmYaNknk7KPAvVOsh847DoeoWxsHx3CqxplKj xmoOqYj7y+8Y+EhBgFyylRrbZiwk6a63IRc8J3HX9l1u8DIknwc+29n1+NxfZeKnrfwR ixi4XrBP0abLHXbKkQXaUVBaPcZdFPjQCcorjj7ySDEEzYK5WWy4X3CzEXRRiz5S04Sh qZ/CZoGI1fU2kC63nxM5pzpk3vCk23CChUzYVAm71g+z7O9FCwylXCwGutRkYkFAol+a nNOQ== X-Gm-Message-State: AOJu0YxrYk6ieHBc9Ezj+NUYpIBwSPddP0ZDtz0NpI/Fz/7j1EpHhn68 tOZwUp/36iTprBpObMNayfDNKSBWqE/iXBm/ll+aPw== X-Google-Smtp-Source: AGHT+IEeYBwNxoxs8ZRaP44Mgeph+xEjCx121fDLu0zjOXv7MQN5PPZxs6FoGMooFkqRw/h94coTrA== X-Received: by 2002:a81:6586:0:b0:576:7dfc:e73e with SMTP id z128-20020a816586000000b005767dfce73emr11288371ywb.32.1691426279308; Mon, 07 Aug 2023 09:37:59 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id s9-20020a81bf49000000b0054bfc94a10dsm2752894ywk.47.2023.08.07.09.37.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 09:37:59 -0700 (PDT) Date: Mon, 7 Aug 2023 12:37:58 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Junio C Hamano Subject: [RFC PATCH 6/6] commit-graph: reuse existing Bloom filters where possible Message-ID: <93f830ca61d17bb20f63c6a4254fe95816ae1cbe.1691426160.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In 9e4df4da07 (commit-graph: new filter ver. that fixes murmur3, 2023-08-01), a bug was described where it's possible for Git to produce non-murmur3 hashes when the platform's "char" type is signed, and there are paths with characters whose highest bit is set (i.e. all characters >= 0x80). That patch allows the caller to control which version of Bloom filters are read and written. However, even on platforms with a signed "char" type, it is possible to reuse existing Bloom filters if and only if there are no changed paths in any commit's first parent tree-diff whose characters have their highest bit set. When this is the case, we can reuse the existing filter without having to compute a new one. This is done by marking trees which are known to have (or not have) any such paths. When a commit's root tree is verified to not have any such paths, we mark it as such and declare that the commit's Bloom filter is reusable. Note that this heuristic only goes in one direction. If neither a commit nor its first parent have any paths in their trees with non-ASCII characters, then we know for certain that a path with non-ASCII characters will not appear in a tree-diff against that commit's first parent. The reverse isn't necessarily true: just because the tree-diff doesn't contain any such paths does not imply that no such paths exist in either tree. So we end up recomputing some Bloom filters that we don't strictly have to (i.e. their bits are the same no matter which version of murmur3 we use). But culling these out is impossible, since we'd have to perform the full tree-diff, which is the same effort as computing the Bloom filter from scratch. But because we can cache our results in each tree's flag bits, we can often avoid recomputing many filters, thereby reducing the time it takes to run $ git commit-graph write --changed-paths --reachable when upgrading from v1 to v2 Bloom filters. To benchmark this, let's generate a commit-graph in linux.git with v1 changed-paths in generation order[^1]: $ git clone git@github.com:torvalds/linux.git $ cd linux $ git commit-graph write --reachable --changed-paths $ graph=".git/objects/info/commit-graph" $ mv $graph{,.bak} Then let's time how long it takes to go from v1 to v2 filters (with and without the upgrade path enabled), resetting the state of the commit-graph each time: $ git config commitGraph.changedPathsVersion 2 $ hyperfine -p 'cp -f $graph.bak $graph' -L v 0,1 \ 'GIT_TEST_UPGRADE_BLOOM_FILTERS={v} git.compile commit-graph write --reachable --changed-paths' On linux.git (where there aren't any non-ASCII paths), the timings indicate that this patch represents a speed-up over recomputing all Bloom filters from scratch: Benchmark 1: GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths Time (mean ± σ): 124.873 s ± 0.316 s [User: 124.081 s, System: 0.643 s] Range (min … max): 124.621 s … 125.227 s 3 runs Benchmark 2: GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths Time (mean ± σ): 79.271 s ± 0.163 s [User: 74.611 s, System: 4.521 s] Range (min … max): 79.112 s … 79.437 s 3 runs Summary 'GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths' ran 1.58 ± 0.01 times faster than 'GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths' On git.git, we do have some non-ASCII paths, giving us a more modest improvement from 4.163 seconds to 3.348 seconds, for a 1.24x speed-up. On my machine, the stats for git.git are: - 8,285 Bloom filters computed from scratch - 10 Bloom filters generated as empty - 4 Bloom filters generated as truncated due to too many changed paths - 65,114 Bloom filters were reused when transitioning from v1 to v2. [^1]: Note that this is is important, since `--stdin-packs` or `--stdin-commits` orders commits in the commit-graph by their pack position (with `--stdin-packs`) or in the raw input (with `--stdin-commits`). Since we compute Bloom filters in the same order that commits appear in the graph, we must see a commit's (first) parent before we process the commit itself. This is only guaranteed to happen when sorting commits by their generation number. Signed-off-by: Taylor Blau --- bloom.c | 90 ++++++++++++++++++++++++++++++++++++++++++-- bloom.h | 1 + commit-graph.c | 5 +++ object.h | 1 + t/t4216-log-bloom.sh | 35 ++++++++++++++++- 5 files changed, 127 insertions(+), 5 deletions(-) diff --git a/bloom.c b/bloom.c index 739fa093ba..24dd874e46 100644 --- a/bloom.c +++ b/bloom.c @@ -7,6 +7,9 @@ #include "commit-graph.h" #include "commit.h" #include "commit-slab.h" +#include "tree.h" +#include "tree-walk.h" +#include "config.h" define_commit_slab(bloom_filter_slab, struct bloom_filter); @@ -250,6 +253,73 @@ static void init_truncated_large_filter(struct bloom_filter *filter, filter->version = version; } +#define VISITED (1u<<21) +#define HIGH_BITS (1u<<22) + +static int has_entries_with_high_bit(struct repository *r, struct tree *t) +{ + if (parse_tree(t)) + return 1; + + if (!(t->object.flags & VISITED)) { + struct tree_desc desc; + struct name_entry entry; + + init_tree_desc(&desc, t->buffer, t->size); + while (tree_entry(&desc, &entry)) { + size_t i; + for (i = 0; i < entry.pathlen; i++) { + if (entry.path[i] & 0x80) { + t->object.flags |= HIGH_BITS; + goto done; + } + } + + if (S_ISDIR(entry.mode)) { + struct tree *sub = lookup_tree(r, &entry.oid); + if (sub && has_entries_with_high_bit(r, sub)) { + t->object.flags |= HIGH_BITS; + goto done; + } + } + + } + +done: + t->object.flags |= VISITED; + } + + return !!(t->object.flags & HIGH_BITS); +} + +static int commit_tree_has_high_bit_paths(struct repository *r, + struct commit *c) +{ + struct tree *t; + if (repo_parse_commit(r, c)) + return 1; + t = repo_get_commit_tree(r, c); + if (!t) + return 1; + return has_entries_with_high_bit(r, t); +} + +static struct bloom_filter *upgrade_filter(struct repository *r, struct commit *c, + struct bloom_filter *filter, + int hash_version) +{ + struct commit_list *p = c->parents; + if (commit_tree_has_high_bit_paths(r, c)) + return NULL; + + if (p && commit_tree_has_high_bit_paths(r, p->item)) + return NULL; + + filter->version = hash_version; + + return filter; +} + struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c) { struct bloom_filter *filter; @@ -292,9 +362,23 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, filter, graph_pos); } - if ((filter->data && filter->len) && - (!settings || settings->hash_version == filter->version)) - return filter; + if (filter->data && filter->len) { + struct bloom_filter *upgrade; + if (!settings || settings->hash_version == filter->version) + return filter; + + /* version mismatch, see if we can upgrade */ + if (compute_if_not_present && + git_env_bool("GIT_TEST_UPGRADE_BLOOM_FILTERS", 1)) { + upgrade = upgrade_filter(r, c, filter, + settings->hash_version); + if (upgrade) { + if (computed) + *computed |= BLOOM_UPGRADED; + return upgrade; + } + } + } if (!compute_if_not_present) return NULL; diff --git a/bloom.h b/bloom.h index 2b1c124bb5..4462fc3908 100644 --- a/bloom.h +++ b/bloom.h @@ -102,6 +102,7 @@ enum bloom_filter_computed { BLOOM_COMPUTED = (1 << 1), BLOOM_TRUNC_LARGE = (1 << 2), BLOOM_TRUNC_EMPTY = (1 << 3), + BLOOM_UPGRADED = (1 << 4), }; struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, diff --git a/commit-graph.c b/commit-graph.c index 60e5f9ada7..62e10c8f40 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1048,6 +1048,7 @@ struct write_commit_graph_context { int count_bloom_filter_not_computed; int count_bloom_filter_trunc_empty; int count_bloom_filter_trunc_large; + int count_bloom_filter_upgraded; }; static int write_graph_chunk_fanout(struct hashfile *f, @@ -1654,6 +1655,8 @@ static void trace2_bloom_filter_write_statistics(struct write_commit_graph_conte ctx->count_bloom_filter_trunc_empty); trace2_data_intmax("commit-graph", ctx->r, "filter-trunc-large", ctx->count_bloom_filter_trunc_large); + trace2_data_intmax("commit-graph", ctx->r, "filter-upgraded", + ctx->count_bloom_filter_upgraded); } static void compute_bloom_filters(struct write_commit_graph_context *ctx) @@ -1695,6 +1698,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->count_bloom_filter_trunc_empty++; if (computed & BLOOM_TRUNC_LARGE) ctx->count_bloom_filter_trunc_large++; + } else if (computed & BLOOM_UPGRADED) { + ctx->count_bloom_filter_upgraded++; } else if (computed & BLOOM_NOT_COMPUTED) ctx->count_bloom_filter_not_computed++; ctx->total_bloom_filter_data_size += filter diff --git a/object.h b/object.h index db25714b4e..2e5e08725f 100644 --- a/object.h +++ b/object.h @@ -75,6 +75,7 @@ void object_array_init(struct object_array *array); * commit-reach.c: 16-----19 * sha1-name.c: 20 * list-objects-filter.c: 21 + * bloom.c: 2122 * builtin/fsck.c: 0--3 * builtin/gc.c: 0 * builtin/index-pack.c: 2021 diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index a77caca789..48f8109a66 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -217,6 +217,10 @@ test_filter_trunc_large () { grep "\"key\":\"filter-trunc-large\",\"value\":\"$1\"" $2 } +test_filter_upgraded () { + grep "\"key\":\"filter-upgraded\",\"value\":\"$1\"" $2 +} + test_expect_success 'correctly report changes over limit' ' git init limits && ( @@ -543,10 +547,19 @@ test_expect_success 'when writing another commit graph, preserve existing versio test_expect_success 'when writing commit graph, do not reuse changed-path of another version' ' git init doublewrite && test_commit -C doublewrite c "$CENT" && + git -C doublewrite config --add commitgraph.changedPathsVersion 1 && - git -C doublewrite commit-graph write --reachable --changed-paths && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ + git -C doublewrite commit-graph write --reachable --changed-paths && + test_filter_computed 1 trace2.txt && + test_filter_upgraded 0 trace2.txt && + git -C doublewrite config --add commitgraph.changedPathsVersion 2 && - git -C doublewrite commit-graph write --reachable --changed-paths && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ + git -C doublewrite commit-graph write --reachable --changed-paths && + test_filter_computed 1 trace2.txt && + test_filter_upgraded 0 trace2.txt && + ( cd doublewrite && echo "c01f" >expect && @@ -555,4 +568,22 @@ test_expect_success 'when writing commit graph, do not reuse changed-path of ano ) ' +test_expect_success 'when writing commit graph, reuse changed-path of another version where possible' ' + git init upgrade && + + test_commit -C upgrade base no-high-bits && + + git -C upgrade config --add commitgraph.changedPathsVersion 1 && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ + git -C upgrade commit-graph write --reachable --changed-paths && + test_filter_computed 1 trace2.txt && + test_filter_upgraded 0 trace2.txt && + + git -C upgrade config --add commitgraph.changedPathsVersion 2 && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ + git -C upgrade commit-graph write --reachable --changed-paths && + test_filter_computed 0 trace2.txt && + test_filter_upgraded 1 trace2.txt +' + test_done