From patchwork Wed Aug 30 16:43:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 13370376 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43DF7C83F01 for ; Wed, 30 Aug 2023 18:29:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230237AbjH3S3S (ORCPT ); Wed, 30 Aug 2023 14:29:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39908 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343742AbjH3QoK (ORCPT ); Wed, 30 Aug 2023 12:44:10 -0400 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C5851A3 for ; Wed, 30 Aug 2023 09:44:07 -0700 (PDT) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-594e8207103so61968297b3.2 for ; Wed, 30 Aug 2023 09:44:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1693413841; x=1694018641; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=QoHkY0gwz3nTB1tdRU4Jvo65AWGB7v8s4sgW6WMdEOs=; b=1RbBJOt/QldDkliZgq+ZJNzgX1zJN2S80aQai7lUKu3E0GI3aGPN2Biyxwe9UtuCXd fWkM28BXJE7Yz25aBlHNeX/PIqcAt5vFsKuE5BTFItKagQIMrEpepZsCUsj1ldzG7xb/ AoIhnOEh3yfsZEs0mrPAqQmMG1XpYQFGpDGaiZkQTlB5YwsBBEW/DhAQLTFx8N1MOW/f bkpifJ4QVpbv/j/BjvOa7xb5EXF0dOhC4i19cuVtaYhdb2ZSE//pikS62NkuVNF8hw0/ mC+5QvMJPynR+HEKhzWfGi4vjWpeigwCQ6Wj8SKWzVloXpalq2SxOxWyyYhcN0fdjHm2 iJJw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693413841; x=1694018641; 
h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=QoHkY0gwz3nTB1tdRU4Jvo65AWGB7v8s4sgW6WMdEOs=; b=NH4aqT7l9MRtA7oPr5yj0HjDmfsQfssPKslhgdAEbs2VGAMgn9vBGohlymXEw+Dndz WSXH6ECTZcGp/geuzHvrNvEro59G7m4pVBNiXMZ1H32NxsuKvUL1lVVLzf0iv7HG3ozp CVJaGXQm7i08vi5+dWqh5FUFtyHTvv4IuG8PC40pBcgoic2s2zG0doqbSaIgQn6lVv// kwvmuN4UL8mkPs8dffscJd3dYDIcdZnUL1OnCHRtnNkUNKWltwRlHcAPncFjS7MACBNQ Vt6gaFTjymx+MfuOvEJ1Mgzexu3cN9P0PpxNdaf4XvrZcxEOAW3ntPyKkqWblwulqTre XwVA== X-Gm-Message-State: AOJu0Ywf+X8gE7rqyGQ0+xSCOGG3DEH1UHgbF50vPtrWYpx4+rMiq3Mo IHyNfs8GQhSn5j1zKsEepIsspzmoalKzwa52T4DFzOiPDc+651S0IjN6iGyrT0Ur8+8uEeitxp0 hix2vitDB+RNLxvkmsqXXz3TdCCliZWaugoRn63bfOHmcjj9wmtfw1fxJZmdUHg11QAx/7pWcbY Fa X-Google-Smtp-Source: AGHT+IEuhrYMLPQ9ipbsdb8whi98ootygiZcsYyHFLWWi2SJokMXRwzW0FjciqOHwXQD747Sjo8l1s1Qqi1lnrOGwuzZ X-Received: from jonathantanmy0.svl.corp.google.com ([2620:15c:2d3:204:2899:32d6:b7e3:8e6e]) (user=jonathantanmy job=sendgmr) by 2002:a25:ced0:0:b0:d77:d70c:b5cd with SMTP id x199-20020a25ced0000000b00d77d70cb5cdmr71148ybe.12.1693413841694; Wed, 30 Aug 2023 09:44:01 -0700 (PDT) Date: Wed, 30 Aug 2023 09:43:41 -0700 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog Message-ID: Subject: [PATCH v2 01/15] gitformat-commit-graph: describe version 2 of BDAT From: Jonathan Tan To: git@vger.kernel.org Cc: Jonathan Tan , Junio C Hamano , "SZEDER =?utf-8?b?R8OhYm9y?= " , Taylor Blau Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The code change to Git to support version 2 will be done in subsequent commits. 
Signed-off-by: Jonathan Tan Signed-off-by: Junio C Hamano Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- Documentation/gitformat-commit-graph.txt | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/Documentation/gitformat-commit-graph.txt b/Documentation/gitformat-commit-graph.txt index 31cad585e2..3e906e8030 100644 --- a/Documentation/gitformat-commit-graph.txt +++ b/Documentation/gitformat-commit-graph.txt @@ -142,13 +142,16 @@ All multi-byte numbers are in network byte order. ==== Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional] * It starts with header consisting of three unsigned 32-bit integers: - - Version of the hash algorithm being used. We currently only support - value 1 which corresponds to the 32-bit version of the murmur3 hash + - Version of the hash algorithm being used. We currently support + value 2 which corresponds to the 32-bit version of the murmur3 hash implemented exactly as described in https://en.wikipedia.org/wiki/MurmurHash#Algorithm and the double hashing technique using seed values 0x293ae76f and 0x7e646e2 as described in https://doi.org/10.1007/978-3-540-30494-4_26 "Bloom Filters - in Probabilistic Verification" + in Probabilistic Verification". Version 1 Bloom filters have a bug that appears + when char is signed and the repository has path names that have characters >= + 0x80; Git supports reading and writing them, but this ability will be removed + in a future version of Git. - The number of times a path is hashed and hence the number of bit positions that cumulatively determine whether a file is present in the commit. - The minimum number of bits 'b' per entry in the Bloom filter. 
If the filter From patchwork Wed Aug 30 16:43:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 13370382 Date: Wed, 30 Aug 2023 09:43:42 -0700 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog Message-ID: <0c56f2a9e970b0ba2063c87254cd8e5d24f506de.1693413637.git.jonathantanmy@google.com> Subject: [PATCH v2 02/15] t/helper/test-read-graph.c: extract `dump_graph_info()` From: Jonathan Tan To: git@vger.kernel.org Cc: Taylor Blau , Jonathan Tan , Junio C Hamano , "SZEDER =?utf-8?b?R8OhYm9y?= " Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Taylor Blau Prepare for the 'read-graph' test helper to perform other tasks besides dumping high-level information about the commit-graph by extracting its main routine into a separate function.
Signed-off-by: Taylor Blau Signed-off-by: Jonathan Tan Signed-off-by: Junio C Hamano Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- t/helper/test-read-graph.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c index 8c7a83f578..3375392f6c 100644 --- a/t/helper/test-read-graph.c +++ b/t/helper/test-read-graph.c @@ -5,20 +5,8 @@ #include "bloom.h" #include "setup.h" -int cmd__read_graph(int argc UNUSED, const char **argv UNUSED) +static void dump_graph_info(struct commit_graph *graph) { - struct commit_graph *graph = NULL; - struct object_directory *odb; - - setup_git_directory(); - odb = the_repository->objects->odb; - - prepare_repo_settings(the_repository); - - graph = read_commit_graph_one(the_repository, odb); - if (!graph) - return 1; - printf("header: %08x %d %d %d %d\n", ntohl(*(uint32_t*)graph->data), *(unsigned char*)(graph->data + 4), @@ -57,6 +45,23 @@ int cmd__read_graph(int argc UNUSED, const char **argv UNUSED) if (graph->topo_levels) printf(" topo_levels"); printf("\n"); +} + +int cmd__read_graph(int argc UNUSED, const char **argv UNUSED) +{ + struct commit_graph *graph = NULL; + struct object_directory *odb; + + setup_git_directory(); + odb = the_repository->objects->odb; + + prepare_repo_settings(the_repository); + + graph = read_commit_graph_one(the_repository, odb); + if (!graph) + return 1; + + dump_graph_info(graph); UNLEAK(graph); From patchwork Wed Aug 30 16:43:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 13370387 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1EE91C83F17 for ; Wed, 30 Aug 2023 18:29:56 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232732AbjH3S34 (ORCPT ); Wed, 30 Aug 2023 14:29:56 -0400 Date: Wed, 30 Aug 2023 09:43:43 -0700 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog Message-ID: <8405b845e5a8f56a03c98708185c3402e172d311.1693413637.git.jonathantanmy@google.com> Subject: [PATCH v2 03/15] bloom.h: make `load_bloom_filter_from_graph()` public From: Jonathan Tan To: git@vger.kernel.org Cc: Taylor Blau , Jonathan Tan , Junio C Hamano , "SZEDER =?utf-8?b?R8OhYm9y?= " Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Taylor Blau Prepare for a future commit to use the load_bloom_filter_from_graph() function directly to load specific Bloom filters out of the commit-graph for manual inspection (to be used during tests).
Signed-off-by: Taylor Blau Signed-off-by: Jonathan Tan Signed-off-by: Junio C Hamano Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- bloom.c | 6 +++--- bloom.h | 5 +++++ 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/bloom.c b/bloom.c index aef6b5fea2..3e78cfe79d 100644 --- a/bloom.c +++ b/bloom.c @@ -29,9 +29,9 @@ static inline unsigned char get_bitmask(uint32_t pos) return ((unsigned char)1) << (pos & (BITS_PER_WORD - 1)); } -static int load_bloom_filter_from_graph(struct commit_graph *g, - struct bloom_filter *filter, - uint32_t graph_pos) +int load_bloom_filter_from_graph(struct commit_graph *g, + struct bloom_filter *filter, + uint32_t graph_pos) { uint32_t lex_pos, start_index, end_index; diff --git a/bloom.h b/bloom.h index adde6dfe21..1e4f612d2c 100644 --- a/bloom.h +++ b/bloom.h @@ -3,6 +3,7 @@ struct commit; struct repository; +struct commit_graph; struct bloom_filter_settings { /* @@ -68,6 +69,10 @@ struct bloom_key { uint32_t *hashes; }; +int load_bloom_filter_from_graph(struct commit_graph *g, + struct bloom_filter *filter, + uint32_t graph_pos); + /* * Calculate the murmur3 32-bit hash value for the given data * using the given seed. 
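Reviewer's note on the series up to this point: patch 01 documents that version 1 filters misbehave "when char is signed and the repository has path names that have characters >= 0x80". That presumably comes down to how a single path byte is widened to 32 bits before it is mixed into the hash. The sketch below only illustrates that widening step. The helper names are hypothetical and do not exist in Git, and it assumes a two's-complement platform.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not part of Git): widen one byte of a path to
 * 32 bits, as a hash function would before mixing it in.
 * Assumption: two's-complement representation of signed char.
 */

/* Reads the byte as a signed char, so bytes >= 0x80 are sign-extended. */
static uint32_t widen_as_signed(const void *path, size_t i)
{
	const signed char *p = path;
	return (uint32_t)p[i];
}

/* Reads the byte as an unsigned char, so the value stays in 0..255. */
static uint32_t widen_as_unsigned(const void *path, size_t i)
{
	const unsigned char *p = path;
	return (uint32_t)p[i];
}
```

For a plain ASCII byte the two helpers agree, so filters for ASCII-only paths would not depend on the signedness of char. For the first byte of the cent sign used in t4216 (octal 302, hex 0xC2) the signed variant sets the upper 24 bits, which is the kind of difference a version bump has to account for.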
From patchwork Wed Aug 30 16:43:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 13370374 Date: Wed, 30 Aug 2023 09:43:44 -0700 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog Message-ID: <3a25f90c15b65df29c31b205346d303abd54ce96.1693413637.git.jonathantanmy@google.com> Subject: [PATCH v2 04/15] t/helper/test-read-graph: implement `bloom-filters` mode From: Jonathan Tan To: git@vger.kernel.org Cc: Taylor Blau , Jonathan Tan , Junio C Hamano , "SZEDER =?utf-8?b?R8OhYm9y?= " Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Taylor Blau Implement a mode of the "read-graph" test helper to dump out the hexadecimal contents of the Bloom filter(s) contained in a commit-graph.
Signed-off-by: Taylor Blau Signed-off-by: Jonathan Tan Signed-off-by: Junio C Hamano Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- t/helper/test-read-graph.c | 42 +++++++++++++++++++++++++++++++++----- 1 file changed, 37 insertions(+), 5 deletions(-) diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c index 3375392f6c..899b5f41cc 100644 --- a/t/helper/test-read-graph.c +++ b/t/helper/test-read-graph.c @@ -47,10 +47,32 @@ static void dump_graph_info(struct commit_graph *graph) printf("\n"); } -int cmd__read_graph(int argc UNUSED, const char **argv UNUSED) +static void dump_graph_bloom_filters(struct commit_graph *graph) +{ + uint32_t i; + + for (i = 0; i < graph->num_commits + graph->num_commits_in_base; i++) { + struct bloom_filter filter = { 0 }; + size_t j; + + if (load_bloom_filter_from_graph(graph, &filter, i) < 0) { + fprintf(stderr, "missing Bloom filter for graph " + "position %"PRIu32"\n", i); + continue; + } + + for (j = 0; j < filter.len; j++) + printf("%02x", filter.data[j]); + if (filter.len) + printf("\n"); + } +} + +int cmd__read_graph(int argc, const char **argv) { struct commit_graph *graph = NULL; struct object_directory *odb; + int ret = 0; setup_git_directory(); odb = the_repository->objects->odb; @@ -58,12 +80,22 @@ int cmd__read_graph(int argc UNUSED, const char **argv UNUSED) prepare_repo_settings(the_repository); graph = read_commit_graph_one(the_repository, odb); - if (!graph) - return 1; + if (!graph) { + ret = 1; + goto done; + } - dump_graph_info(graph); + if (argc <= 1) + dump_graph_info(graph); + else if (!strcmp(argv[1], "bloom-filters")) + dump_graph_bloom_filters(graph); + else { + fprintf(stderr, "unknown sub-command: '%s'\n", argv[1]); + ret = 1; + } +done: UNLEAK(graph); - return 0; + return ret; } From patchwork Wed Aug 30 16:43:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 13370384 
Date: Wed, 30 Aug 2023 09:43:45 -0700 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog Message-ID: <785801066563fdc9ec74635886aaea36347e45dd.1693413637.git.jonathantanmy@google.com> Subject: [PATCH v2 05/15] t4216: test changed path filters with high bit paths From: Jonathan Tan To: git@vger.kernel.org Cc: Jonathan Tan , Junio C Hamano , "SZEDER =?utf-8?b?R8OhYm9y?= " , Taylor Blau Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Subsequent commits will teach Git another version of changed path filter that has different behavior with paths that contain at least one character with its high bit set, so test the existing behavior as a baseline.
Signed-off-by: Jonathan Tan Signed-off-by: Junio C Hamano Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- t/t4216-log-bloom.sh | 52 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index fa9d32facf..0aa9719934 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -404,4 +404,56 @@ test_expect_success 'Bloom generation backfills empty commits' ' ) ' +get_first_changed_path_filter () { + test-tool read-graph bloom-filters >filters.dat && + head -n 1 filters.dat +} + +# chosen to be the same under all Unicode normalization forms +CENT=$(printf "\302\242") + +test_expect_success 'set up repo with high bit path, version 1 changed-path' ' + git init highbit1 && + test_commit -C highbit1 c1 "$CENT" && + git -C highbit1 commit-graph write --reachable --changed-paths +' + +test_expect_success 'setup check value of version 1 changed-path' ' + ( + cd highbit1 && + echo "52a9" >expect && + get_first_changed_path_filter >actual && + test_cmp expect actual + ) +' + +# expect will not match actual if char is unsigned by default. Write the test +# in this way, so that a user running this test script can still see if the two +# files match. (It will appear as an ordinary success if they match, and a skip +# if not.) +if test_cmp highbit1/expect highbit1/actual +then + test_set_prereq SIGNED_CHAR_BY_DEFAULT +fi +test_expect_success SIGNED_CHAR_BY_DEFAULT 'check value of version 1 changed-path' ' + # Only the prereq matters for this test. + true +' + +test_expect_success 'setup make another commit' ' + # "git log" does not use Bloom filters for root commits - see how, in + # revision.c, rev_compare_tree() (the only code path that eventually calls + # get_bloom_filter()) is only called by try_to_simplify_commit() when the commit + # has one parent. Therefore, make another commit so that we perform the tests on + # a non-root commit. 
+ test_commit -C highbit1 anotherc1 "another$CENT" +' + +test_expect_success 'version 1 changed-path used when version 1 requested' ' + ( + cd highbit1 && + test_bloom_filters_used "-- another$CENT" + ) +' + test_done From patchwork Wed Aug 30 16:43:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 13370373 Date: Wed, 30 Aug 2023 09:43:46 -0700 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog Message-ID: <94ad289dbb1cedfed2b8c66a3b805028e8038317.1693413637.git.jonathantanmy@google.com> Subject: [PATCH v2 06/15] repo-settings: introduce commitgraph.changedPathsVersion From: Jonathan Tan To: git@vger.kernel.org Cc: Jonathan Tan , Junio C Hamano , "SZEDER =?utf-8?b?R8OhYm9y?= " , Taylor Blau Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org A subsequent commit will introduce another version of the changed-path filter in the commit graph file.
In order to control which version to write (and read), a config variable is needed. Therefore, introduce this config variable. For forwards compatibility, teach Git to not read commit graphs when the config variable is set to an unsupported version. Because we teach Git this, commitgraph.readChangedPaths is now redundant, so deprecate it and define its behavior in terms of the config variable we introduce. This commit does not change the behavior of writing (Git writes changed path filters when explicitly instructed regardless of any config variable), but a subsequent commit will restrict Git such that it will only write when commitgraph.changedPathsVersion is a recognized value. Signed-off-by: Jonathan Tan Signed-off-by: Junio C Hamano Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- Documentation/config/commitgraph.txt | 23 ++++++++++++++++++++--- commit-graph.c | 2 +- oss-fuzz/fuzz-commit-graph.c | 2 +- repo-settings.c | 6 +++++- repository.h | 2 +- 5 files changed, 28 insertions(+), 7 deletions(-) diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt index 30604e4a4c..2dc9170622 100644 --- a/Documentation/config/commitgraph.txt +++ b/Documentation/config/commitgraph.txt @@ -9,6 +9,23 @@ commitGraph.maxNewFilters:: commit-graph write` (c.f., linkgit:git-commit-graph[1]). commitGraph.readChangedPaths:: - If true, then git will use the changed-path Bloom filters in the - commit-graph file (if it exists, and they are present). Defaults to - true. See linkgit:git-commit-graph[1] for more information. + Deprecated. Equivalent to commitGraph.changedPathsVersion=-1 if true, and + commitGraph.changedPathsVersion=0 if false. (If commitGraph.changedPathsVersion + is also set, commitGraph.changedPathsVersion takes precedence.) + +commitGraph.changedPathsVersion:: + Specifies the version of the changed-path Bloom filters that Git will read and + write. May be -1, 0 or 1. ++ +Defaults to -1.
++ +If -1, Git will use the version of the changed-path Bloom filters in the +repository, defaulting to 1 if there are none. ++ +If 0, Git will not read any Bloom filters, and will write version 1 Bloom +filters when instructed to write. ++ +If 1, Git will only read version 1 Bloom filters, and will write version 1 +Bloom filters. ++ +See linkgit:git-commit-graph[1] for more information. diff --git a/commit-graph.c b/commit-graph.c index 0aa1640d15..da99f15fdf 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -401,7 +401,7 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s, graph->read_generation_data = 1; } - if (s->commit_graph_read_changed_paths) { + if (s->commit_graph_changed_paths_version) { pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES, &graph->chunk_bloom_indexes); read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA, diff --git a/oss-fuzz/fuzz-commit-graph.c b/oss-fuzz/fuzz-commit-graph.c index 2992079dd9..325c0b991a 100644 --- a/oss-fuzz/fuzz-commit-graph.c +++ b/oss-fuzz/fuzz-commit-graph.c @@ -19,7 +19,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) * possible. 
*/ the_repository->settings.commit_graph_generation_version = 2; - the_repository->settings.commit_graph_read_changed_paths = 1; + the_repository->settings.commit_graph_changed_paths_version = 1; g = parse_commit_graph(&the_repository->settings, (void *)data, size); repo_clear(the_repository); free_commit_graph(g); diff --git a/repo-settings.c b/repo-settings.c index 525f69c0c7..db8fe817f3 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -24,6 +24,7 @@ void prepare_repo_settings(struct repository *r) int value; const char *strval; int manyfiles; + int read_changed_paths; if (!r->gitdir) BUG("Cannot add settings for uninitialized repository"); @@ -54,7 +55,10 @@ void prepare_repo_settings(struct repository *r) /* Commit graph config or default, does not cascade (simple) */ repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1); repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2); - repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1); + repo_cfg_bool(r, "commitgraph.readchangedpaths", &read_changed_paths, 1); + repo_cfg_int(r, "commitgraph.changedpathsversion", + &r->settings.commit_graph_changed_paths_version, + read_changed_paths ? 
-1 : 0); repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1); repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0); diff --git a/repository.h b/repository.h index 5f18486f64..f71154e12c 100644 --- a/repository.h +++ b/repository.h @@ -29,7 +29,7 @@ struct repo_settings { int core_commit_graph; int commit_graph_generation_version; - int commit_graph_read_changed_paths; + int commit_graph_changed_paths_version; int gc_write_commit_graph; int fetch_write_commit_graph; int command_requires_full_index; From patchwork Wed Aug 30 16:43:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 13370383 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02233C83F1C for ; Wed, 30 Aug 2023 18:29:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232003AbjH3S3g (ORCPT ); Wed, 30 Aug 2023 14:29:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39150 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343748AbjH3QoQ (ORCPT ); Wed, 30 Aug 2023 12:44:16 -0400 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6094619A for ; Wed, 30 Aug 2023 09:44:13 -0700 (PDT) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-592210fe8easo81776507b3.3 for ; Wed, 30 Aug 2023 09:44:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1693413852; x=1694018652; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; 
Date: Wed, 30 Aug 2023 09:43:47 -0700
X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog
Message-ID: <44d3163125026843d75ec609401d96acc3387f28.1693413637.git.jonathantanmy@google.com>
Subject: [PATCH v2 07/15] commit-graph: new filter ver.
 that fixes murmur3
From: Jonathan Tan
To: git@vger.kernel.org
Cc: Jonathan Tan, Junio C Hamano, SZEDER Gábor, Taylor Blau

The murmur3 implementation in bloom.c has a bug when converting series
of 4 bytes into network-order integers when char is signed (which is
controllable by a compiler option, and the default signedness of char
is platform-specific). When a string contains characters with the high
bit set, this bug causes results that, although internally consistent
within Git, do not agree with other implementations of murmur3 (thus,
the changed path filters wouldn't be readable by other off-the-shelf
implementations of murmur3) or even with Git binaries that were
compiled with a different signedness of char. This bug affects both how
Git writes changed path filters to disk and how Git interprets changed
path filters on disk.

Therefore, introduce a new version (2) of changed path filters that
corrects this problem. The existing version (1) is still supported and
is still the default, but users should migrate away from it as soon as
possible.

Because this bug only manifests with characters that have the high bit
set, it may be possible that some (or all) commits in a given repo
would have the same changed path filter both before and after this fix
is applied. However, in order to determine whether this is the case,
the changed paths would first have to be computed, at which point it is
not much more expensive to just compute a new changed path filter. So
this patch does not include any mechanism to "salvage" changed path
filters from repositories.

There is also no "mixed" mode - for each invocation of Git, reading and
writing changed path filters are done with the same version number;
this version number may be explicitly stated (typically if the user
knows which version they need) or automatically determined from the
version of the existing changed path filters in the repository.
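To make the signedness problem described above concrete, here is a
minimal standalone C sketch. It assumes that the version 1 code converts
each plain char directly to uint32_t, so that a byte with the high bit
set is sign-extended when char is signed, whereas the (unsigned char)
casts in murmur3_seeded_v2() always zero-extend. The helper names
(widen_signed(), widen_unsigned(), word_signed(), word_unsigned()) are
illustrative only and are not part of Git; sign extension is emulated
arithmetically so that the sketch behaves the same regardless of the
platform's signedness of char.

```c
#include <stdint.h>

/*
 * Widen one byte as a signed-char platform would: bytes >= 0x80
 * become 0xffffffXX (sign extension). Hypothetical helper.
 */
static uint32_t widen_signed(unsigned char b)
{
	return (b & 0x80) ? (0xffffff00u | b) : b;
}

/* Widen one byte by zero extension, as the version 2 casts do. */
static uint32_t widen_unsigned(unsigned char b)
{
	return b;
}

/*
 * Combine four bytes into one 32-bit word, first byte in the lowest
 * bits, using sign extension for every byte.
 */
static uint32_t word_signed(const unsigned char *p)
{
	return widen_signed(p[0]) |
	       (widen_signed(p[1]) << 8) |
	       (widen_signed(p[2]) << 16) |
	       (widen_signed(p[3]) << 24);
}

/* Same byte order as word_signed(), but with zero extension. */
static uint32_t word_unsigned(const unsigned char *p)
{
	return widen_unsigned(p[0]) |
	       (widen_unsigned(p[1]) << 8) |
	       (widen_unsigned(p[2]) << 16) |
	       (widen_unsigned(p[3]) << 24);
}
```

For the bytes 0x99 0xaa 0xbb 0xcc, word_unsigned() yields 0xccbbaa99
while word_signed() yields 0xffffff99, because the sign-extended first
byte already sets every higher bit. For ASCII-only input the two
functions agree, which is why only paths containing high-bit bytes are
affected.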
There is a change in write_commit_graph(). graph_read_bloom_data() makes
it possible for chunk_bloom_data to be non-NULL but
bloom_filter_settings to be NULL, which causes a segfault later on. I
produced such a segfault while developing this patch, but could not
find a way to reproduce it either with this complete patch applied or
without it. In any case, the change seemed like a good thing to
include, as it might help future patch authors.

The value in t0095 was obtained from another murmur3 implementation
using the following Go source code:

    package main

    import "fmt"
    import "github.com/spaolacci/murmur3"

    func main() {
        fmt.Printf("%x\n", murmur3.Sum32([]byte("Hello world!")))
        fmt.Printf("%x\n", murmur3.Sum32([]byte{0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff}))
    }

Signed-off-by: Jonathan Tan
Signed-off-by: Junio C Hamano
Signed-off-by: Taylor Blau
Signed-off-by: Junio C Hamano
---
 Documentation/config/commitgraph.txt |   5 +-
 bloom.c                              |  69 +++++++++++++++++-
 bloom.h                              |   8 +-
 commit-graph.c                       |  32 ++++++--
 t/helper/test-bloom.c                |   9 ++-
 t/t0095-bloom.sh                     |   8 ++
 t/t4216-log-bloom.sh                 | 105 +++++++++++++++++++++++++++
 7 files changed, 223 insertions(+), 13 deletions(-)

diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt
index 2dc9170622..acc74a2f27 100644
--- a/Documentation/config/commitgraph.txt
+++ b/Documentation/config/commitgraph.txt
@@ -15,7 +15,7 @@ commitGraph.readChangedPaths::
 
 commitGraph.changedPathsVersion::
 	Specifies the version of the changed-path Bloom filters that Git will read and
-	write. May be -1, 0 or 1.
+	write. May be -1, 0, 1, or 2.
 +
 Defaults to -1.
 +
@@ -28,4 +28,7 @@ filters when instructed to write.
 If 1, Git will only read version 1 Bloom filters, and will write version 1
 Bloom filters.
 +
+If 2, Git will only read version 2 Bloom filters, and will write version 2
+Bloom filters.
++
 See linkgit:git-commit-graph[1] for more information.
diff --git a/bloom.c b/bloom.c index 3e78cfe79d..ebef5cfd2f 100644 --- a/bloom.c +++ b/bloom.c @@ -66,7 +66,64 @@ int load_bloom_filter_from_graph(struct commit_graph *g, * Not considered to be cryptographically secure. * Implemented as described in https://en.wikipedia.org/wiki/MurmurHash#Algorithm */ -uint32_t murmur3_seeded(uint32_t seed, const char *data, size_t len) +uint32_t murmur3_seeded_v2(uint32_t seed, const char *data, size_t len) +{ + const uint32_t c1 = 0xcc9e2d51; + const uint32_t c2 = 0x1b873593; + const uint32_t r1 = 15; + const uint32_t r2 = 13; + const uint32_t m = 5; + const uint32_t n = 0xe6546b64; + int i; + uint32_t k1 = 0; + const char *tail; + + int len4 = len / sizeof(uint32_t); + + uint32_t k; + for (i = 0; i < len4; i++) { + uint32_t byte1 = (uint32_t)(unsigned char)data[4*i]; + uint32_t byte2 = ((uint32_t)(unsigned char)data[4*i + 1]) << 8; + uint32_t byte3 = ((uint32_t)(unsigned char)data[4*i + 2]) << 16; + uint32_t byte4 = ((uint32_t)(unsigned char)data[4*i + 3]) << 24; + k = byte1 | byte2 | byte3 | byte4; + k *= c1; + k = rotate_left(k, r1); + k *= c2; + + seed ^= k; + seed = rotate_left(seed, r2) * m + n; + } + + tail = (data + len4 * sizeof(uint32_t)); + + switch (len & (sizeof(uint32_t) - 1)) { + case 3: + k1 ^= ((uint32_t)(unsigned char)tail[2]) << 16; + /*-fallthrough*/ + case 2: + k1 ^= ((uint32_t)(unsigned char)tail[1]) << 8; + /*-fallthrough*/ + case 1: + k1 ^= ((uint32_t)(unsigned char)tail[0]) << 0; + k1 *= c1; + k1 = rotate_left(k1, r1); + k1 *= c2; + seed ^= k1; + break; + } + + seed ^= (uint32_t)len; + seed ^= (seed >> 16); + seed *= 0x85ebca6b; + seed ^= (seed >> 13); + seed *= 0xc2b2ae35; + seed ^= (seed >> 16); + + return seed; +} + +static uint32_t murmur3_seeded_v1(uint32_t seed, const char *data, size_t len) { const uint32_t c1 = 0xcc9e2d51; const uint32_t c2 = 0x1b873593; @@ -131,8 +188,14 @@ void fill_bloom_key(const char *data, int i; const uint32_t seed0 = 0x293ae76f; const uint32_t seed1 = 0x7e646e2c; - const 
uint32_t hash0 = murmur3_seeded(seed0, data, len); - const uint32_t hash1 = murmur3_seeded(seed1, data, len); + uint32_t hash0, hash1; + if (settings->hash_version == 2) { + hash0 = murmur3_seeded_v2(seed0, data, len); + hash1 = murmur3_seeded_v2(seed1, data, len); + } else { + hash0 = murmur3_seeded_v1(seed0, data, len); + hash1 = murmur3_seeded_v1(seed1, data, len); + } key->hashes = (uint32_t *)xcalloc(settings->num_hashes, sizeof(uint32_t)); for (i = 0; i < settings->num_hashes; i++) diff --git a/bloom.h b/bloom.h index 1e4f612d2c..138d57a86b 100644 --- a/bloom.h +++ b/bloom.h @@ -8,9 +8,11 @@ struct commit_graph; struct bloom_filter_settings { /* * The version of the hashing technique being used. - * We currently only support version = 1 which is + * The newest version is 2, which is * the seeded murmur3 hashing technique implemented - * in bloom.c. + * in bloom.c. Bloom filters of version 1 were created + * with prior versions of Git, which had a bug in the + * implementation of the hash function. */ uint32_t hash_version; @@ -80,7 +82,7 @@ int load_bloom_filter_from_graph(struct commit_graph *g, * Not considered to be cryptographically secure. 
* Implemented as described in https://en.wikipedia.org/wiki/MurmurHash#Algorithm */ -uint32_t murmur3_seeded(uint32_t seed, const char *data, size_t len); +uint32_t murmur3_seeded_v2(uint32_t seed, const char *data, size_t len); void fill_bloom_key(const char *data, size_t len, diff --git a/commit-graph.c b/commit-graph.c index da99f15fdf..f7322c4fff 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -304,17 +304,26 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start, return 0; } +struct graph_read_bloom_data_context { + struct commit_graph *g; + int *commit_graph_changed_paths_version; +}; + static int graph_read_bloom_data(const unsigned char *chunk_start, size_t chunk_size, void *data) { - struct commit_graph *g = data; + struct graph_read_bloom_data_context *c = data; + struct commit_graph *g = c->g; uint32_t hash_version; - g->chunk_bloom_data = chunk_start; hash_version = get_be32(chunk_start); - if (hash_version != 1) + if (*c->commit_graph_changed_paths_version == -1) { + *c->commit_graph_changed_paths_version = hash_version; + } else if (hash_version != *c->commit_graph_changed_paths_version) { return 0; + } + g->chunk_bloom_data = chunk_start; g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings)); g->bloom_filter_settings->hash_version = hash_version; g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4); @@ -402,10 +411,14 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s, } if (s->commit_graph_changed_paths_version) { + struct graph_read_bloom_data_context context = { + .g = graph, + .commit_graph_changed_paths_version = &s->commit_graph_changed_paths_version + }; pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES, &graph->chunk_bloom_indexes); read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA, - graph_read_bloom_data, graph); + graph_read_bloom_data, &context); } if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) { @@ -2376,6 +2389,13 @@ int write_commit_graph(struct object_directory *odb, } if 
(!commit_graph_compatible(r)) return 0; + if (r->settings.commit_graph_changed_paths_version < -1 + || r->settings.commit_graph_changed_paths_version > 2) { + warning(_("attempting to write a commit-graph, but " + "'commitgraph.changedPathsVersion' (%d) is not supported"), + r->settings.commit_graph_changed_paths_version); + return 0; + } CALLOC_ARRAY(ctx, 1); ctx->r = r; @@ -2388,6 +2408,8 @@ int write_commit_graph(struct object_directory *odb, ctx->write_generation_data = (get_configured_generation_version(r) == 2); ctx->num_generation_data_overflows = 0; + bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version == 2 + ? 2 : 1; bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", bloom_settings.bits_per_entry); bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", @@ -2417,7 +2439,7 @@ int write_commit_graph(struct object_directory *odb, g = ctx->r->objects->commit_graph; /* We have changed-paths already. 
Keep them in the next graph */ - if (g && g->chunk_bloom_data) { + if (g && g->bloom_filter_settings) { ctx->changed_paths = 1; ctx->bloom_settings = g->bloom_filter_settings; } diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c index aabe31d724..3cbc0a5b50 100644 --- a/t/helper/test-bloom.c +++ b/t/helper/test-bloom.c @@ -50,6 +50,7 @@ static void get_bloom_filter_for_commit(const struct object_id *commit_oid) static const char *bloom_usage = "\n" " test-tool bloom get_murmur3 \n" +" test-tool bloom get_murmur3_seven_highbit\n" " test-tool bloom generate_filter [...]\n" " test-tool bloom get_filter_for_commit \n"; @@ -64,7 +65,13 @@ int cmd__bloom(int argc, const char **argv) uint32_t hashed; if (argc < 3) usage(bloom_usage); - hashed = murmur3_seeded(0, argv[2], strlen(argv[2])); + hashed = murmur3_seeded_v2(0, argv[2], strlen(argv[2])); + printf("Murmur3 Hash with seed=0:0x%08x\n", hashed); + } + + if (!strcmp(argv[1], "get_murmur3_seven_highbit")) { + uint32_t hashed; + hashed = murmur3_seeded_v2(0, "\x99\xaa\xbb\xcc\xdd\xee\xff", 7); printf("Murmur3 Hash with seed=0:0x%08x\n", hashed); } diff --git a/t/t0095-bloom.sh b/t/t0095-bloom.sh index b567383eb8..c8d84ab606 100755 --- a/t/t0095-bloom.sh +++ b/t/t0095-bloom.sh @@ -29,6 +29,14 @@ test_expect_success 'compute unseeded murmur3 hash for test string 2' ' test_cmp expect actual ' +test_expect_success 'compute unseeded murmur3 hash for test string 3' ' + cat >expect <<-\EOF && + Murmur3 Hash with seed=0:0xa183ccfd + EOF + test-tool bloom get_murmur3_seven_highbit >actual && + test_cmp expect actual +' + test_expect_success 'compute bloom key for empty string' ' cat >expect <<-\EOF && Hashes:0x5615800c|0x5b966560|0x61174ab4|0x66983008|0x6c19155c|0x7199fab0|0x771ae004| diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 0aa9719934..1d0e11d7c1 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -456,4 +456,109 @@ test_expect_success 'version 1 changed-path used when version 1 
requested' ' ) ' +test_expect_success 'version 1 changed-path not used when version 2 requested' ' + ( + cd highbit1 && + git config --add commitgraph.changedPathsVersion 2 && + test_bloom_filters_not_used "-- another$CENT" + ) +' + +test_expect_success 'version 1 changed-path used when autodetect requested' ' + ( + cd highbit1 && + git config --add commitgraph.changedPathsVersion -1 && + test_bloom_filters_used "-- another$CENT" + ) +' + +test_expect_success 'when writing another commit graph, preserve existing version 1 of changed-path' ' + test_commit -C highbit1 c1double "$CENT$CENT" && + git -C highbit1 commit-graph write --reachable --changed-paths && + ( + cd highbit1 && + git config --add commitgraph.changedPathsVersion -1 && + echo "options: bloom(1,10,7) read_generation_data" >expect && + test-tool read-graph >full && + grep options full >actual && + test_cmp expect actual + ) +' + +test_expect_success 'set up repo with high bit path, version 2 changed-path' ' + git init highbit2 && + git -C highbit2 config --add commitgraph.changedPathsVersion 2 && + test_commit -C highbit2 c2 "$CENT" && + git -C highbit2 commit-graph write --reachable --changed-paths +' + +test_expect_success 'check value of version 2 changed-path' ' + ( + cd highbit2 && + echo "c01f" >expect && + get_first_changed_path_filter >actual && + test_cmp expect actual + ) +' + +test_expect_success 'setup make another commit' ' + # "git log" does not use Bloom filters for root commits - see how, in + # revision.c, rev_compare_tree() (the only code path that eventually calls + # get_bloom_filter()) is only called by try_to_simplify_commit() when the commit + # has one parent. Therefore, make another commit so that we perform the tests on + # a non-root commit. 
+ test_commit -C highbit2 anotherc2 "another$CENT" +' + +test_expect_success 'version 2 changed-path used when version 2 requested' ' + ( + cd highbit2 && + test_bloom_filters_used "-- another$CENT" + ) +' + +test_expect_success 'version 2 changed-path not used when version 1 requested' ' + ( + cd highbit2 && + git config --add commitgraph.changedPathsVersion 1 && + test_bloom_filters_not_used "-- another$CENT" + ) +' + +test_expect_success 'version 2 changed-path used when autodetect requested' ' + ( + cd highbit2 && + git config --add commitgraph.changedPathsVersion -1 && + test_bloom_filters_used "-- another$CENT" + ) +' + +test_expect_success 'when writing another commit graph, preserve existing version 2 of changed-path' ' + test_commit -C highbit2 c2double "$CENT$CENT" && + git -C highbit2 commit-graph write --reachable --changed-paths && + ( + cd highbit2 && + git config --add commitgraph.changedPathsVersion -1 && + echo "options: bloom(2,10,7) read_generation_data" >expect && + test-tool read-graph >full && + grep options full >actual && + test_cmp expect actual + ) +' + +test_expect_success 'when writing commit graph, do not reuse changed-path of another version' ' + git init doublewrite && + test_commit -C doublewrite c "$CENT" && + git -C doublewrite config --add commitgraph.changedPathsVersion 1 && + git -C doublewrite commit-graph write --reachable --changed-paths && + git -C doublewrite config --add commitgraph.changedPathsVersion 2 && + git -C doublewrite commit-graph write --reachable --changed-paths && + ( + cd doublewrite && + echo "c01f" >expect && + get_first_changed_path_filter >actual && + test_cmp expect actual + ) +' + test_done From patchwork Wed Aug 30 16:43:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 13370375 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org 
Date: Wed, 30 Aug 2023 09:43:48 -0700
X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog
Subject: [PATCH v2 08/15] bloom: annotate filters with hash version
From: Jonathan Tan
To: git@vger.kernel.org
Cc: Taylor Blau, Jonathan Tan, Junio C Hamano, SZEDER Gábor

From: Taylor Blau

In subsequent commits, we will want to load existing Bloom filters out
of a commit-graph, even when the hash version they were computed with
does not match the value of `commitGraph.changedPathsVersion`.

In order to differentiate between filters computed with different hash
versions, add a "version" field to each Bloom filter.
Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- bloom.c | 11 ++++++++--- bloom.h | 1 + 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/bloom.c b/bloom.c index ebef5cfd2f..9b6a30f6f6 100644 --- a/bloom.c +++ b/bloom.c @@ -55,6 +55,7 @@ int load_bloom_filter_from_graph(struct commit_graph *g, filter->data = (unsigned char *)(g->chunk_bloom_data + sizeof(unsigned char) * start_index + BLOOMDATA_CHUNK_HEADER_SIZE); + filter->version = g->bloom_filter_settings->hash_version; return 1; } @@ -240,11 +241,13 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data UNUSED, return strcmp(e1->path, e2->path); } -static void init_truncated_large_filter(struct bloom_filter *filter) +static void init_truncated_large_filter(struct bloom_filter *filter, + int version) { filter->data = xmalloc(1); filter->data[0] = 0xFF; filter->len = 1; + filter->version = version; } struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, @@ -329,13 +332,15 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, } if (hashmap_get_size(&pathmap) > settings->max_changed_paths) { - init_truncated_large_filter(filter); + init_truncated_large_filter(filter, + settings->hash_version); if (computed) *computed |= BLOOM_TRUNC_LARGE; goto cleanup; } filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; + filter->version = settings->hash_version; if (!filter->len) { if (computed) *computed |= BLOOM_TRUNC_EMPTY; @@ -355,7 +360,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, } else { for (i = 0; i < diff_queued_diff.nr; i++) diff_free_filepair(diff_queued_diff.queue[i]); - init_truncated_large_filter(filter); + init_truncated_large_filter(filter, settings->hash_version); if (computed) *computed |= BLOOM_TRUNC_LARGE; diff --git a/bloom.h b/bloom.h index 138d57a86b..330a140520 100644 --- a/bloom.h +++ b/bloom.h @@ -55,6 +55,7 @@ struct bloom_filter_settings { struct 
bloom_filter { unsigned char *data; size_t len; + int version; }; /*
From patchwork Wed Aug 30 16:43:49 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jonathan Tan
X-Patchwork-Id: 13370380
Date: Wed, 30 Aug 2023 09:43:49 -0700
X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog
Message-ID: <3de6cd8460de51c3d866b0df9c219dd985f7433f.1693413637.git.jonathantanmy@google.com>
Subject: [PATCH v2 09/15] bloom: prepare to discard incompatible Bloom filters
From: Jonathan Tan
To: git@vger.kernel.org
Cc: Taylor Blau, Jonathan Tan, Junio C Hamano, SZEDER Gábor

From: Taylor Blau

Callers use the inline `get_bloom_filter()` implementation as a thin
wrapper around `get_or_compute_bloom_filter()`. The former calls the
latter with a value of "0" for `compute_if_not_present`, making
`get_bloom_filter()` the default read-only path for fetching an
existing Bloom filter.
Callers expect the value returned from `get_bloom_filter()` to be
usable, that is, compatible with the configured value of
`commitGraph.changedPathsVersion`. This is OK, since the commit-graph
machinery only initializes its BDAT chunk (thereby enabling it to
service Bloom filter queries) when the Bloom filter hash_version is
compatible with our settings. So any value returned by
`get_bloom_filter()` is trivially usable.

However, subsequent commits will load the BDAT chunk even when the
Bloom filters are built with incompatible hash versions. Prepare to
handle this by teaching `get_bloom_filter()` to discard filters that
are incompatible with the configured hash version.

Callers who wish to read incompatible filters (e.g., for upgrading
filters from v1 to v2) may use the lower-level routine,
`get_or_compute_bloom_filter()`.

Signed-off-by: Taylor Blau
Signed-off-by: Junio C Hamano
---
 bloom.c | 20 +++++++++++++++++++-
 bloom.h | 20 ++++++++++++++++++--
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/bloom.c b/bloom.c
index 9b6a30f6f6..739fa093ba 100644
--- a/bloom.c
+++ b/bloom.c
@@ -250,6 +250,23 @@ static void init_truncated_large_filter(struct bloom_filter *filter,
 	filter->version = version;
 }
 
+struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c)
+{
+	struct bloom_filter *filter;
+	int hash_version;
+
+	filter = get_or_compute_bloom_filter(r, c, 0, NULL, NULL);
+	if (!filter)
+		return NULL;
+
+	prepare_repo_settings(r);
+	hash_version = r->settings.commit_graph_changed_paths_version;
+
+	if (!(hash_version == -1 || hash_version == filter->version))
+		return NULL; /* unusable filter */
+	return filter;
+}
+
 struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 						 struct commit *c,
 						 int compute_if_not_present,
@@ -275,7 +292,8 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 						     filter, graph_pos);
 	}
 
-	if (filter->data && filter->len)
+	if ((filter->data && filter->len) &&
+ (!settings || settings->hash_version == filter->version)) return filter; if (!compute_if_not_present) return NULL; diff --git a/bloom.h b/bloom.h index 330a140520..bfe389e29c 100644 --- a/bloom.h +++ b/bloom.h @@ -110,8 +110,24 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, const struct bloom_filter_settings *settings, enum bloom_filter_computed *computed); -#define get_bloom_filter(r, c) get_or_compute_bloom_filter( \ - (r), (c), 0, NULL, NULL) +/* + * Find the Bloom filter associated with the given commit "c". + * + * If any of the following are true + * + * - the repository does not have a commit-graph, or + * - the repository disables reading from the commit-graph, or + * - the given commit does not have a Bloom filter computed, or + * - there is a Bloom filter for commit "c", but it cannot be read + * because the filter uses an incompatible version of murmur3 + * + * , then `get_bloom_filter()` will return NULL. Otherwise, the corresponding + * Bloom filter will be returned. + * + * For callers who wish to inspect Bloom filters with incompatible hash + * versions, use get_or_compute_bloom_filter(). 
+ */ +struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c); int bloom_filter_contains(const struct bloom_filter *filter, const struct bloom_key *key, From patchwork Wed Aug 30 16:43:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 13370378 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7507AC83F1F for ; Wed, 30 Aug 2023 18:29:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231516AbjH3S32 (ORCPT ); Wed, 30 Aug 2023 14:29:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48516 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343751AbjH3QoW (ORCPT ); Wed, 30 Aug 2023 12:44:22 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DE3FA19A for ; Wed, 30 Aug 2023 09:44:19 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-d7493fcd829so6783995276.3 for ; Wed, 30 Aug 2023 09:44:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1693413859; x=1694018659; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=jgMmPprfbr6yxuzkSBvh6xzHrlrH6VPE5214ypOafUs=; b=52qzMA7xrFVkty2foWzNE8x5dL8fhIC+042rhcn/7AwLZ+jMBiJbXyGF6c3+VtZBDC 4OvlD++Yl1z/CWD6U7Ojr9tj7IhGeVJGw2riTDQcuOghmNN5Su/HyWI+nTaPYW9bULIn bZuk+/xPZBSnKMYg8MKkPMDX1jZPgApGgTmUfHKw5D4DH5djNLm09OETG3BYIy7Yrplb pepNQQ1z4V1jVEnaugXqnTn8FBojtjiy3oTC5CXOgyEuW5P8k8Pk9fyAy5kVDU/ZuZKi UKZLMrIKoazT6oMRT7y/iylnf0/7hNFY6EWUeJKAIvH7cb21JgZfHim2t/SgvVvyIfEW LiVQ== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693413859; x=1694018659; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=jgMmPprfbr6yxuzkSBvh6xzHrlrH6VPE5214ypOafUs=; b=f2ty82szlelqjRJKthjDC4/VeYWLejzX+MinWntxCmnI/W0hjZj1EXiSWibgTaQqNF jwtD0RhV0s5toVfi51wkCH0fxcMuX60hFqiSQ6rq+0bplv2Qbu6hzl1FbGCoH4SEJfOw w6sbmCukwo+bWarPcrnEYnjtRJyexzLEys+6OwnB2mRV8qhFi2HwKqYTLUZHPKruWAqe cx4kaC/d0ztHt9tHLNRllsdeoCKGnG4xXku3AwJw/Uo4SH+sIKykSXWbzCVpoBL/Ot9Q 4JZDzjmXpk0eiECnnZV1mV3HP6z2llT8i63Qkm3hvPZ0LiYuLBbrJNTm3ft+LsenREd4 BcQw== X-Gm-Message-State: AOJu0YymeiFsbKHaiRAMsy/Ur3jcCIuZF90LdPo9s775F55UGervZT7N KGpSnjcc6AqHHQkNnj7MUCZpxpA0ghCj+D5ljCwPvkL5D8T+6+5PPOhChzkMyjPpC8oRNVY/++B 70d5XW7gkHq+QKfOR2t7WTYJE0NyMksXUgd3updRhPLhdoqua0C7Mv0qGszpgAWlJqY2rbbZJTq ce X-Google-Smtp-Source: AGHT+IFqiOwfgp8zk8P4TQtj1BQrkFZiwbEq1bAj9Uzn+w8vkx7E4Lz3z37VoQAG3bHZUXTtB3i7cwW+b0+zjYHVDRm/ X-Received: from jonathantanmy0.svl.corp.google.com ([2620:15c:2d3:204:2899:32d6:b7e3:8e6e]) (user=jonathantanmy job=sendgmr) by 2002:a5b:acd:0:b0:d0f:a0a6:8e87 with SMTP id a13-20020a5b0acd000000b00d0fa0a68e87mr73739ybr.2.1693413859129; Wed, 30 Aug 2023 09:44:19 -0700 (PDT) Date: Wed, 30 Aug 2023 09:43:50 -0700 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog Message-ID: Subject: [PATCH v2 10/15] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` From: Jonathan Tan To: git@vger.kernel.org Cc: Taylor Blau , Jonathan Tan , Junio C Hamano , "SZEDER =?utf-8?b?R8OhYm9y?= " Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Taylor Blau The existing implementation of test_bloom_filters_not_used() asserts that the Bloom filter sub-system has not been initialized at all, by checking for the absence of any data from it from trace2. 
In the following commit, it will become possible to load Bloom filters
without using them (e.g., because `commitGraph.changedPathsVersion` is
incompatible with the hash version with which the commit-graph's Bloom
filters were written).

When this is the case, it's possible to initialize the Bloom filter
sub-system while still not using any Bloom filters. In that situation,
check that the data dump from the Bloom sub-system is all zeros,
indicating that no filters were used.

Signed-off-by: Taylor Blau
Signed-off-by: Junio C Hamano
---
 t/t4216-log-bloom.sh | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 1d0e11d7c1..940a71d8b8 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -81,7 +81,19 @@ test_bloom_filters_used () {
 test_bloom_filters_not_used () {
 	log_args=$1
 	setup "$log_args" &&
-	! grep -q "statistics:{\"filter_not_present\":" "$TRASH_DIRECTORY/trace.perf" &&
+
+	if grep -q "statistics:{\"filter_not_present\":" "$TRASH_DIRECTORY/trace.perf"
+	then
+		# if the Bloom filter system is initialized, ensure that no
+		# filters were used
+		data="statistics:{"
+		data="$data\"filter_not_present\":0,"
+		data="$data\"maybe\":0,"
+		data="$data\"definitely_not\":0,"
+		data="$data\"false_positive\":0}"
+
+		grep -q "$data" "$TRASH_DIRECTORY/trace.perf"
+	fi &&
 	test_cmp log_wo_bloom log_w_bloom
 }
 

From patchwork Wed Aug 30 16:43:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 13370377 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F08A4C83F1D for ; Wed, 30 Aug 2023 18:29:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231453AbjH3S30 (ORCPT ); Wed, 30 Aug
2023 14:29:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48532 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343752AbjH3QoY (ORCPT ); Wed, 30 Aug 2023 12:44:24 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 100F119A for ; Wed, 30 Aug 2023 09:44:22 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-c8f360a07a2so6641534276.2 for ; Wed, 30 Aug 2023 09:44:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1693413861; x=1694018661; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=Vtj/MIU+MhrfFYA/6mQ0ZFw5TPY8EoiuJeVt72jrv90=; b=2OyjEuaQOwnA3mBz2zVrCNJn0YQnGi+c16AHjL7BiUiGK+P2VJQUYHO8A51h0AIpPa WKWkcxFjjwBDmDpFQL/4b0AU1cNrJ5jYpnqPmWRLQXiPhsti9BeeKJjrt8zT8f/rTaYD vxDMYCEgeZYQRJd1Mq2Me3y8pALkOEzzHHW2fzGVL8D0tQZlfjUcv7KCHpNp7B+Zz1y8 oILSm6B+xnXI1joxR6LD0HiQzFmmcs9sNVeYo8fEmZpWaMnPUxheemGxiX0aAdltpWnl q5jTgJXFtIpSAuRdQayx1XWaty1f/PpZfgNW4iiEYsE+eIFz0CluzYz88tWpKLjU5N5b HriQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693413861; x=1694018661; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Vtj/MIU+MhrfFYA/6mQ0ZFw5TPY8EoiuJeVt72jrv90=; b=ZiV9q+mb7mT7aIjdTReAI6fx4lg7bZQjVcJnwEF1XXmBCeaa5+lx8DTuDJFEUpp0RL fOQyG1aXkcc0Ll2rrtwv3LzaFAucj9Ri3LDIMP1FQ35/rPTEUukoyVg1TXysYOf4hPEO TE1BHOUQrftLgoSAvAE2lauuzVa2f7Ao8t5mSPOd5XzRbpz0Lm6h2+cJlg257swQcLRB F3y4cuaaqOKVYrUTg6ZWbqK973uNnZbJRvyme8ET8NYr7JuXspvkwsrrY5MidkPgi0XF lR5XOi8Al6rs2vV+9nEbFcm0njYXe0FmHSebvEeyeQ1E2j5nroJAGL7yx+ZP7B0QGgUH EaSA== X-Gm-Message-State: AOJu0Yz2Ls+LtYvL7+X7tyym/4rytH+adsiR6Rq2bzaQ+O0e9k8gY+TA 
VlZfzvAHc0u9HCO08bg92qPhOIEaP9QFyomkgOHJ5XeuS0uG/wwxmku0SjK71vuyYOShzpE/blW NH5DDvbC/xvRprpsFvhoRyAF+iTsgQEMApvXWq6ULMaO4dnOfU7tBAvJ3ZoqI5N1M3Am/CDmrMd jT X-Google-Smtp-Source: AGHT+IHbOxy2NvWAySbVNNaNrV+oZ/luSEhGZ7hUkR5nfBvXrndHQbECOw7K6CT/JvpClIWOHYwqtNzWCTQEIPq9XZp/ X-Received: from jonathantanmy0.svl.corp.google.com ([2620:15c:2d3:204:2899:32d6:b7e3:8e6e]) (user=jonathantanmy job=sendgmr) by 2002:a25:bc87:0:b0:d73:bcb7:7282 with SMTP id e7-20020a25bc87000000b00d73bcb77282mr79407ybk.8.1693413861114; Wed, 30 Aug 2023 09:44:21 -0700 (PDT) Date: Wed, 30 Aug 2023 09:43:51 -0700 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog Message-ID: Subject: [PATCH v2 11/15] commit-graph.c: unconditionally load Bloom filters From: Jonathan Tan To: git@vger.kernel.org Cc: Taylor Blau , Jonathan Tan , Junio C Hamano , "SZEDER =?utf-8?b?R8OhYm9y?= " Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org

From: Taylor Blau

In 9e4df4da07 (commit-graph: new filter ver. that fixes murmur3,
2023-08-01), we began ignoring the Bloom data ("BDAT") chunk for
commit-graphs whose Bloom filters were computed using a hash version
incompatible with the value of `commitGraph.changedPathsVersion`.

Now that the Bloom API has been hardened to discard these incompatible
filters (with the exception of low-level APIs), we can safely load these
Bloom filters unconditionally.

We no longer want to return early from `graph_read_bloom_data()`, and
similarly do not want to set the bloom_settings' `hash_version` field as
a side-effect. The latter is because we want to wait until we know which
Bloom settings we're using (either the defaults, from the GIT_TEST
variables, or from the previous commit-graph layer) before deciding what
hash_version to use.

If we detect an existing BDAT chunk, we'll infer the rest of the
settings (e.g., number of hashes, bits per entry, and maximum number of
changed paths) from the earlier graph layer. The hash_version will be
inferred from the previous layer as well, unless one has already been
specified via configuration.

Once all of that is done, we normalize the value of the hash_version to
either "1" or "2".

Signed-off-by: Taylor Blau
Signed-off-by: Junio C Hamano
---
 commit-graph.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index f7322c4fff..665a3edf78 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -317,12 +317,6 @@ static int graph_read_bloom_data(const unsigned char *chunk_start,
 	uint32_t hash_version;
 	hash_version = get_be32(chunk_start);
 
-	if (*c->commit_graph_changed_paths_version == -1) {
-		*c->commit_graph_changed_paths_version = hash_version;
-	} else if (hash_version != *c->commit_graph_changed_paths_version) {
-		return 0;
-	}
-
 	g->chunk_bloom_data = chunk_start;
 	g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
 	g->bloom_filter_settings->hash_version = hash_version;
@@ -2408,8 +2402,7 @@ int write_commit_graph(struct object_directory *odb,
 	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
 	ctx->num_generation_data_overflows = 0;
 
-	bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version == 2
-		? 2 : 1;
+	bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version;
 	bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
 						      bloom_settings.bits_per_entry);
 	bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
@@ -2441,10 +2434,18 @@ int write_commit_graph(struct object_directory *odb,
 		/* We have changed-paths already. Keep them in the next graph */
 		if (g && g->bloom_filter_settings) {
 			ctx->changed_paths = 1;
-			ctx->bloom_settings = g->bloom_filter_settings;
+
+			/* don't propagate the hash_version unless unspecified */
+			if (bloom_settings.hash_version == -1)
+				bloom_settings.hash_version = g->bloom_filter_settings->hash_version;
+			bloom_settings.bits_per_entry = g->bloom_filter_settings->bits_per_entry;
+			bloom_settings.num_hashes = g->bloom_filter_settings->num_hashes;
+			bloom_settings.max_changed_paths = g->bloom_filter_settings->max_changed_paths;
 		}
 	}
 
+	bloom_settings.hash_version = bloom_settings.hash_version == 2 ? 2 : 1;
+
 	if (ctx->split) {
 		struct commit_graph *g = ctx->r->objects->commit_graph;
 

From patchwork Wed Aug 30 16:43:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 13370381 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0BF97C83F29 for ; Wed, 30 Aug 2023 18:29:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231936AbjH3S3e (ORCPT ); Wed, 30 Aug 2023 14:29:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343753AbjH3Qo0 (ORCPT ); Wed, 30 Aug 2023 12:44:26 -0400 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BE90C19A for ; Wed, 30 Aug 2023 09:44:23 -0700 (PDT) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-58fc448ee4fso78589367b3.2 for ; Wed, 30 Aug 2023 09:44:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1693413863; x=1694018663; darn=vger.kernel.org; 
h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=gHuRnI8rdlErjK9WrL45txbbQojHZ2GJn7LZ5ZUyjaE=; b=nZmUi9rmUE75u2KKZS8bLKISjhhRCsM13hvWZplX5zDbk1mYZOayfR9zloi/kjDlfh ziCPMm+GTXXwmoPZjAq5z/1ThdLcyy0bQqUwTTGI67TC9t9LVSti8sFpsWEd6RjbATNU WIMLG80rnDR/CnDtXtbftrvNxi1sgopJKzQtcR1zk5kSeZdBjJE+668Sa9Odk9uyqMEk Sm3+FpafOLdhmSAe4boUaXf/7C/mEPu5qfAjxXLeYDmsYr8t4mdtgTg9gRkdmgfMQNhQ Whh7B5VImAlrcZ6eumM3GaBNE8P8LsoRQP3xYR1359WmGnyveBFTaNkmw6TP/bQ68Eta AuaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693413863; x=1694018663; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=gHuRnI8rdlErjK9WrL45txbbQojHZ2GJn7LZ5ZUyjaE=; b=NwiIg+hX/3ci6AsgoOEtCx/ZnVWWFyKd0RdatAXd6DxH2zyDiPJnXcwywxjrRmq5qg frArXgmbDGqNdQnzGpMJGOIBZpEQdNUwGz7inHivPazzHGE2Zm0zh9sRSZ9hu1Rt7Evq 0gLWib5Ran8QRBb1uCaJbO6+aUJaFj3Yy4a9GonugGSkjNk5u+R1v7luwZr6oTIlAOv2 CTA1iUyzI3kZzYR8BXScIAowQ21yhm3tnPl1AtVKg/zTlTJjcyYwcccYIHy+axrpyL8p NQDTLRrUVL+mPD7oRQUtmjtxzaXGfcSBDpJKrLUxFF+JiYscPF9Awi4PkRlba0qUFcRa uAcA== X-Gm-Message-State: AOJu0Yx0bycMVoTUHP837oIljsfJXhYBUPDtEVT3cw5MSbcgtGTK4/GZ QERofFjsUaz4B+FeLUR5BlXUAGbjA7hqOAUEvYZOUuS0QXSX/IwSPlXebI511xgF21LSgWTCURv /xe445ta2ytSxHNrsZTdp0c8T4oOtY4ol3HaX23F50wc8HXthBDhg3haA1IQsGHAUaH02VaXNOO dO X-Google-Smtp-Source: AGHT+IHymCuw2hpm06AT5zfvYtrd/QrbcmYpvmaTc3jErYv8t/ddPzHkWMODwwAojcGWIk6TlAkS2XH+ARC3XerTrE8Q X-Received: from jonathantanmy0.svl.corp.google.com ([2620:15c:2d3:204:2899:32d6:b7e3:8e6e]) (user=jonathantanmy job=sendgmr) by 2002:a81:bd07:0:b0:586:50cf:e13f with SMTP id b7-20020a81bd07000000b0058650cfe13fmr79306ywi.1.1693413862954; Wed, 30 Aug 2023 09:44:22 -0700 (PDT) Date: Wed, 30 Aug 2023 09:43:52 -0700 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog Message-ID: 
<4d57f5185400fd44956b596fe8c32cbd1ba185de.1693413637.git.jonathantanmy@google.com> Subject: [PATCH v2 12/15] commit-graph: drop unnecessary `graph_read_bloom_data_context` From: Jonathan Tan To: git@vger.kernel.org Cc: Taylor Blau , Jonathan Tan , Junio C Hamano , "SZEDER =?utf-8?b?R8OhYm9y?= " Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org

From: Taylor Blau

The `graph_read_bloom_data_context` struct was introduced in an earlier
commit in order to pass pointers to the commit-graph and changed-path
Bloom filter version when reading the BDAT chunk.

The previous commit no longer writes through the changed_paths_version
pointer, making the surrounding context structure unnecessary. Drop it
and pass a pointer to the commit-graph directly when reading the BDAT
chunk.

Noticed-by: Jonathan Tan
Signed-off-by: Taylor Blau
Signed-off-by: Junio C Hamano
---
 commit-graph.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 665a3edf78..a8e33c0739 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -304,16 +304,10 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start,
 	return 0;
 }
 
-struct graph_read_bloom_data_context {
-	struct commit_graph *g;
-	int *commit_graph_changed_paths_version;
-};
-
 static int graph_read_bloom_data(const unsigned char *chunk_start,
 				  size_t chunk_size, void *data)
 {
-	struct graph_read_bloom_data_context *c = data;
-	struct commit_graph *g = c->g;
+	struct commit_graph *g = data;
 	uint32_t hash_version;
 	hash_version = get_be32(chunk_start);
 
@@ -405,14 +399,10 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
 	}
 
 	if (s->commit_graph_changed_paths_version) {
-		struct graph_read_bloom_data_context context = {
-			.g = graph,
-			.commit_graph_changed_paths_version = &s->commit_graph_changed_paths_version
-		};
 		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
 			   &graph->chunk_bloom_indexes);
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
-			   graph_read_bloom_data, &context);
+			   graph_read_bloom_data, graph);
 	}
 
 	if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) {

From patchwork Wed Aug 30 16:43:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 13370372 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49D16C83F01 for ; Wed, 30 Aug 2023 18:29:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230084AbjH3S3M (ORCPT ); Wed, 30 Aug 2023 14:29:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48550 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343754AbjH3Qo2 (ORCPT ); Wed, 30 Aug 2023 12:44:28 -0400 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 827DB19A for ; Wed, 30 Aug 2023 09:44:25 -0700 (PDT) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-58d799aa369so80815517b3.0 for ; Wed, 30 Aug 2023 09:44:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1693413865; x=1694018665; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=9ZhwDdbApNwGHpX7yhwEnVwUwJLYUxC1CUiIoORbjzE=; b=ej8rxz41keBDUcPaIccV6nxOmjT5ixp6Fbg6ipWa268q1l/hrs09wzCsu9zzIF5Oq2 qrLZGcuYhd2L21UU/jHfZQn0yOfJXCszmUu6fKSG7IwOe3/YWMGxfPFaYgFjePj56Ueg CNkJLT1r5b3GHP/QZFJdYfv7C+bRqQtj+2nOqCo5eFtbn14k2nWVk7N/8xO3KXTmcapz P5dhAlbqgl2re9XpNGVaSAyknRa3OHGoW7wTRum0C9wm6xyWdSLD03YwPgQ84Jm7CDfT UlyxRJEi8jxHSfl9acMu9PYn7kA7pRL2wZRX5RC0+OlXJw4zW5jHcKFDX0Sk0pS+L5EF c+Fw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; 
s=20221208; t=1693413865; x=1694018665; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=9ZhwDdbApNwGHpX7yhwEnVwUwJLYUxC1CUiIoORbjzE=; b=aSHorlXspp1VLPxEeNo30vdxqfYomW4xm5PiP1YaeylIELZ3gZesn7Vqk86mkfEC8+ jIMZK8u1DFoa7qmlcD69pu5vwP0TcIPQBbY48Q+bV8SoUbdD59wVW+Q85obk7ikxB/Tz j4cHqVvRNjsWZIgP9vgdyl2ksulrxfiLEe6avIaQR0H9qlZNGJOkn5KVp8lwvxiiGJ+b 2uu3W/i4ymfNs+PsIqS92pl4pQvttwHxh2slVNggOp7YXMmzJWUPsYCsbg/QqT7rU8iK o7JjZYCfiu6qgr+XjuiZOoiBv2X90TQWURny6KDHRTqEUCP5vzw57KR0MzrT/StjtoBw 2vlA== X-Gm-Message-State: AOJu0YwfJIEI9mXbM3T05e4VOn4+UqhyqxSh2N8qJvl09UBUZwn4Nur3 UGueXEGCq4Or8NTpDpjVR+U8NkB3a1BbrsqXLquUUUTVfhpq89ES7WPHGj9fAUfQI0DS7bFvbfF YzeqDPLv6skekLu2DyxkXnlB2ekohzjFKwMYt4K7ciPZ6c6NvJMmZO97ANYxsLSINyN2WZ37WhC ii X-Google-Smtp-Source: AGHT+IGZPxMUSaqRofR2vzyTueLkGLm1rO+VfNKOet+PsyXknczyQsfk0qXEScnVbYAduiaYK4FlfXE2uP1E1w9rt1t7 X-Received: from jonathantanmy0.svl.corp.google.com ([2620:15c:2d3:204:2899:32d6:b7e3:8e6e]) (user=jonathantanmy job=sendgmr) by 2002:a25:c542:0:b0:d7b:9830:c172 with SMTP id v63-20020a25c542000000b00d7b9830c172mr79342ybe.0.1693413864781; Wed, 30 Aug 2023 09:44:24 -0700 (PDT) Date: Wed, 30 Aug 2023 09:43:53 -0700 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog Message-ID: Subject: [PATCH v2 13/15] object.h: fix mis-aligned flag bits table From: Jonathan Tan To: git@vger.kernel.org Cc: Taylor Blau , Jonathan Tan , Junio C Hamano , "SZEDER =?utf-8?b?R8OhYm9y?= " Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Taylor Blau Bit position 23 is one column too far to the left. 
Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- object.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/object.h b/object.h index 114d45954d..db25714b4e 100644 --- a/object.h +++ b/object.h @@ -62,7 +62,7 @@ void object_array_init(struct object_array *array); /* * object flag allocation: - * revision.h: 0---------10 15 23------27 + * revision.h: 0---------10 15 23------27 * fetch-pack.c: 01 67 * negotiator/default.c: 2--5 * walker.c: 0-2 From patchwork Wed Aug 30 16:43:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 13370386 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 745F8C6FA8F for ; Wed, 30 Aug 2023 18:29:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232654AbjH3S3y (ORCPT ); Wed, 30 Aug 2023 14:29:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46482 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343755AbjH3Qoc (ORCPT ); Wed, 30 Aug 2023 12:44:32 -0400 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D453119A for ; Wed, 30 Aug 2023 09:44:28 -0700 (PDT) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-59263889eacso80178567b3.3 for ; Wed, 30 Aug 2023 09:44:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1693413868; x=1694018668; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=MclijXimSCpR5Nlc7plEF634UxdBry5i6rqo/swZo/A=; 
b=j/UXK2XB77Wrcz/0EAalIFdddNTbyKqOCFF68Y3wy674UCnjCP01vCyT50o7GMf3de Ot3mck3cX7gTghrFbnOsSETA0nBROl7arxOLjLYhJTKqMJxdPIAwF2gzKCPnlxs9bYha JMmggvIEr/iziNqDl36v44x2WxJwwloK83ZIt7gzHB4VcLWgLajt0p8mxlTDlSNQ++Jv pUA+RtTiwAkmjq6yJfdrQsFBmUFOSEQNvi9ULiHRuXMB71Zy4yMI5+iXJ1JRh/vgaXaL 81BFOe48E2c8c7PpT4Nxpq58FEg2HVYvH9ohjh3yaTk0CY/mtBLM0mQe+pMsVRRiW2G9 8suQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693413868; x=1694018668; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=MclijXimSCpR5Nlc7plEF634UxdBry5i6rqo/swZo/A=; b=ZJwdjaaewp2VGEtrylxgB19UJ1eoPQGTNs25lwTCe6FHhVL9HfzvdHrEDQss4LlxSQ B8rGCrBLdwdyU8SBtixIs7aMiDAZLDQA/MfezZg18AoUj2rkbeGU+6uGiSwhc/wAT0K8 n3sAWknzXO9CmklIxBCl3d1rto6tyYKjb5H5LnXLWiZomjDM8a8+1OCf3eTTMYKh0BsI bK1bMxYFFGqMcv69VHbIuI1ZDzGLG2Pfwy/NKqNNd3xhg2SFftouDEqEIb12BGj11pe4 0Bzab0CPmDf8Uvr4Mwtga/BSF0SfMmi0/wuQLAAOHDcVo86nebCm0XVaVd2+YDBs2tTf xH5Q== X-Gm-Message-State: AOJu0YxiKNymJgitz8Gw+2s4Evb7G3PJWR/DHjwJxM94lXr1cb8oGpWL UNLxGWFT0T0pMsAZ+IPXfBK+9nbXcjdc8n7LkrN7fXpLSGLygNQhkG8uzpapJ/oBaOnHFrZjMC2 CZxI12wBEiJRvk3bRFq/r+ulCp7ueMR6qA//s8HfmSByOLNuOIqp22ebvSTvzxaVHkBCRFHPLoS 6z X-Google-Smtp-Source: AGHT+IEKs0gZl2Qeu0y9GTZbi1YLMnGfHoWXh/LZOTZAMfQ2/kU2mV69q2BneiVXypdguzkrIgq+xOlISmoSMDGbKuK4 X-Received: from jonathantanmy0.svl.corp.google.com ([2620:15c:2d3:204:2899:32d6:b7e3:8e6e]) (user=jonathantanmy job=sendgmr) by 2002:a25:258f:0:b0:d7a:bfcf:2d7 with SMTP id l137-20020a25258f000000b00d7abfcf02d7mr78766ybl.6.1693413868104; Wed, 30 Aug 2023 09:44:28 -0700 (PDT) Date: Wed, 30 Aug 2023 09:43:54 -0700 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog Message-ID: <05357f9533d6ca6dd51cce3cd399fbbcedfcf93a.1693413637.git.jonathantanmy@google.com> Subject: [PATCH v2 14/15] commit-graph: reuse existing Bloom filters where possible From: Jonathan Tan To: 
git@vger.kernel.org Cc: Taylor Blau , Jonathan Tan , Junio C Hamano , "SZEDER =?utf-8?b?R8OhYm9y?= " Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org

From: Taylor Blau

In 9e4df4da07 (commit-graph: new filter ver. that fixes murmur3,
2023-08-01), a bug was described where it's possible for Git to produce
non-murmur3 hashes when the platform's "char" type is signed, and there
are paths with characters whose highest bit is set (i.e. all characters
>= 0x80).

That patch allows the caller to control which version of Bloom filters
is read and written. However, even on platforms with a signed "char"
type, it is possible to reuse existing Bloom filters if and only if
there are no changed paths in any commit's first parent tree-diff whose
characters have their highest bit set.

When this is the case, we can reuse the existing filter without having
to compute a new one. This is done by marking trees which are known to
have (or not have) any such paths. When a commit's root tree is verified
to not have any such paths, we mark it as such and declare that the
commit's Bloom filter is reusable.

Note that this heuristic only goes in one direction. If neither a commit
nor its first parent has any paths in its tree with non-ASCII
characters, then we know for certain that a path with non-ASCII
characters will not appear in a tree-diff against that commit's first
parent. The reverse isn't necessarily true: just because the tree-diff
doesn't contain any such paths does not imply that no such paths exist
in either tree.

So we end up recomputing some Bloom filters that we don't strictly have
to (i.e. their bits are the same no matter which version of murmur3 we
use). But culling these out is impossible, since we'd have to perform
the full tree-diff, which is the same effort as computing the Bloom
filter from scratch.
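To illustrate why the check described above is conservative, here is a
minimal, self-contained sketch in C. It is not the code from this patch:
`path_has_high_bit()`, `any_path_has_high_bit()`, `filter_is_reusable()`
and the two `widen_*()` helpers are hypothetical names, and trees are
modelled as flat arrays of path strings rather than Git's `struct tree`.
The `widen_*()` pair models how a byte might be promoted before being
mixed into a hash on signed-char and unsigned-char platforms; treating
that promotion as the point where the two hash versions diverge is an
assumption based on the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 if any byte of the NUL-terminated path is >= 0x80. */
static int path_has_high_bit(const char *path)
{
	assert(path != NULL);
	for (; *path; path++)
		if ((unsigned char)*path & 0x80)
			return 1;
	return 0;
}

/* Hypothetical helper: 1 if any of the n paths contains a high-bit byte. */
static int any_path_has_high_bit(const char *const *paths, size_t n)
{
	size_t i;
	for (i = 0; i < n; i++)
		if (path_has_high_bit(paths[i]))
			return 1;
	return 0;
}

/*
 * Conservative reuse check (illustrative only): a filter counts as
 * reusable only when neither the commit's tree nor its first parent's
 * tree contains a path with a high-bit byte. A "no" answer does not
 * prove that the tree-diff itself contains such a path.
 */
static int filter_is_reusable(const char *const *tree, size_t n_tree,
			      const char *const *parent, size_t n_parent)
{
	return !any_path_has_high_bit(tree, n_tree) &&
	       !any_path_has_high_bit(parent, n_parent);
}

/* Assumed model of byte promotion where "char" is signed. */
static uint32_t widen_signed(char c)
{
	return (uint32_t)(signed char)c;
}

/* Assumed model of byte promotion where bytes are treated as unsigned. */
static uint32_t widen_unsigned(char c)
{
	return (uint32_t)(unsigned char)c;
}
```

In this model the two promotions agree for every byte below 0x80 and
disagree for every byte at or above it, which is why paths made up only
of ASCII bytes cannot be affected by the choice of hash version.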
But because we can cache our results in each tree's flag bits, we can
often avoid recomputing many filters, thereby reducing the time it takes
to run

    $ git commit-graph write --changed-paths --reachable

when upgrading from v1 to v2 Bloom filters.

To benchmark this, let's generate a commit-graph in linux.git with v1
changed-paths in generation order[^1]:

    $ git clone git@github.com:torvalds/linux.git
    $ cd linux
    $ git commit-graph write --reachable --changed-paths
    $ graph=".git/objects/info/commit-graph"
    $ mv $graph{,.bak}

Then let's time how long it takes to go from v1 to v2 filters (with and
without the upgrade path enabled), resetting the state of the
commit-graph each time:

    $ git config commitGraph.changedPathsVersion 2
    $ hyperfine -p 'cp -f $graph.bak $graph' -L v 0,1 \
        'GIT_TEST_UPGRADE_BLOOM_FILTERS={v} git.compile commit-graph write --reachable --changed-paths'

On linux.git (where there aren't any non-ASCII paths), the timings
indicate that this patch represents a speed-up over recomputing all
Bloom filters from scratch:

    Benchmark 1: GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths
      Time (mean ± σ):     124.873 s ±  0.316 s    [User: 124.081 s, System: 0.643 s]
      Range (min … max):   124.621 s … 125.227 s    3 runs

    Benchmark 2: GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths
      Time (mean ± σ):     79.271 s ±  0.163 s    [User: 74.611 s, System: 4.521 s]
      Range (min … max):   79.112 s … 79.437 s    3 runs

    Summary
      'GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths' ran
        1.58 ± 0.01 times faster than 'GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths'

On git.git, we do have some non-ASCII paths, giving us a more modest
improvement from 4.163 seconds to 3.348 seconds, for a 1.24x speed-up.
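The quoted ratios follow directly from the mean timings. A minimal
sketch of that arithmetic in C, where `speedup()` and `close_to()` are
names made up for illustration and are not part of Git or hyperfine:

```c
#include <assert.h>

/* Hypothetical helper: ratio of the old wall-clock time to the new one. */
static double speedup(double before_s, double after_s)
{
	assert(after_s > 0.0);
	return before_s / after_s;
}

/* Hypothetical helper: 1 if |a - b| < tol, without needing <math.h>. */
static int close_to(double a, double b, double tol)
{
	double d = a - b;
	if (d < 0)
		d = -d;
	return d < tol;
}
```

With the numbers above, `speedup(124.873, 79.271)` lands within 0.01 of
the reported 1.58 for linux.git, and `speedup(4.163, 3.348)` within 0.01
of the reported 1.24 for git.git.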
On my machine, the stats for git.git are:

  - 8,285 Bloom filters computed from scratch
  - 10 Bloom filters generated as empty
  - 4 Bloom filters generated as truncated due to too many changed paths
  - 65,114 Bloom filters were reused when transitioning from v1 to v2.

[^1]: Note that this is important, since `--stdin-packs` or
  `--stdin-commits` orders commits in the commit-graph by their pack
  position (with `--stdin-packs`) or in the raw input (with
  `--stdin-commits`).

  Since we compute Bloom filters in the same order that commits appear
  in the graph, we must see a commit's (first) parent before we process
  the commit itself. This is only guaranteed to happen when sorting
  commits by their generation number.

Signed-off-by: Taylor Blau
Signed-off-by: Junio C Hamano
---
 bloom.c              | 90 ++++++++++++++++++++++++++++++++++++++++++--
 bloom.h              |  1 +
 commit-graph.c       |  5 +++
 object.h             |  1 +
 t/t4216-log-bloom.sh | 35 ++++++++++++++++-
 5 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/bloom.c b/bloom.c
index 739fa093ba..24dd874e46 100644
--- a/bloom.c
+++ b/bloom.c
@@ -7,6 +7,9 @@
 #include "commit-graph.h"
 #include "commit.h"
 #include "commit-slab.h"
+#include "tree.h"
+#include "tree-walk.h"
+#include "config.h"
 
 define_commit_slab(bloom_filter_slab, struct bloom_filter);
 
@@ -250,6 +253,73 @@ static void init_truncated_large_filter(struct bloom_filter *filter,
 	filter->version = version;
 }
 
+#define VISITED   (1u<<21)
+#define HIGH_BITS (1u<<22)
+
+static int has_entries_with_high_bit(struct repository *r, struct tree *t)
+{
+	if (parse_tree(t))
+		return 1;
+
+	if (!(t->object.flags & VISITED)) {
+		struct tree_desc desc;
+		struct name_entry entry;
+
+		init_tree_desc(&desc, t->buffer, t->size);
+		while (tree_entry(&desc, &entry)) {
+			size_t i;
+			for (i = 0; i < entry.pathlen; i++) {
+				if (entry.path[i] & 0x80) {
+					t->object.flags |= HIGH_BITS;
+					goto done;
+				}
+			}
+
+			if (S_ISDIR(entry.mode)) {
+				struct tree *sub = lookup_tree(r, &entry.oid);
+				if (sub && has_entries_with_high_bit(r, sub)) {
+					t->object.flags |= HIGH_BITS;
+					goto done;
+				}
+			}
+
+		}
+
+done:
+		t->object.flags |= VISITED;
+	}
+
+	return !!(t->object.flags & HIGH_BITS);
+}
+
+static int commit_tree_has_high_bit_paths(struct repository *r,
+					  struct commit *c)
+{
+	struct tree *t;
+	if (repo_parse_commit(r, c))
+		return 1;
+	t = repo_get_commit_tree(r, c);
+	if (!t)
+		return 1;
+	return has_entries_with_high_bit(r, t);
+}
+
+static struct bloom_filter *upgrade_filter(struct repository *r, struct commit *c,
+					   struct bloom_filter *filter,
+					   int hash_version)
+{
+	struct commit_list *p = c->parents;
+	if (commit_tree_has_high_bit_paths(r, c))
+		return NULL;
+
+	if (p && commit_tree_has_high_bit_paths(r, p->item))
+		return NULL;
+
+	filter->version = hash_version;
+
+	return filter;
+}
+
 struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c)
 {
 	struct bloom_filter *filter;
@@ -292,9 +362,23 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 						     filter, graph_pos);
 	}
 
-	if ((filter->data && filter->len) &&
-	    (!settings || settings->hash_version == filter->version))
-		return filter;
+	if (filter->data && filter->len) {
+		struct bloom_filter *upgrade;
+		if (!settings || settings->hash_version == filter->version)
+			return filter;
+
+		/* version mismatch, see if we can upgrade */
+		if (compute_if_not_present &&
+		    git_env_bool("GIT_TEST_UPGRADE_BLOOM_FILTERS", 1)) {
+			upgrade = upgrade_filter(r, c, filter,
+						 settings->hash_version);
+			if (upgrade) {
+				if (computed)
+					*computed |= BLOOM_UPGRADED;
+				return upgrade;
+			}
+		}
+	}
 	if (!compute_if_not_present)
 		return NULL;
 
diff --git a/bloom.h b/bloom.h
index bfe389e29c..e3a9b68905 100644
--- a/bloom.h
+++ b/bloom.h
@@ -102,6 +102,7 @@ enum bloom_filter_computed {
 	BLOOM_COMPUTED = (1 << 1),
 	BLOOM_TRUNC_LARGE = (1 << 2),
 	BLOOM_TRUNC_EMPTY = (1 << 3),
+	BLOOM_UPGRADED = (1 << 4),
 };
 
 struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
diff --git a/commit-graph.c b/commit-graph.c
index a8e33c0739..a3473df515 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1045,6 +1045,7 @@ struct write_commit_graph_context {
 	int count_bloom_filter_not_computed;
 	int count_bloom_filter_trunc_empty;
 	int count_bloom_filter_trunc_large;
+	int count_bloom_filter_upgraded;
 };
 
 static int write_graph_chunk_fanout(struct hashfile *f,
@@ -1651,6 +1652,8 @@ static void trace2_bloom_filter_write_statistics(struct write_commit_graph_conte
 			   ctx->count_bloom_filter_trunc_empty);
 	trace2_data_intmax("commit-graph", ctx->r, "filter-trunc-large",
 			   ctx->count_bloom_filter_trunc_large);
+	trace2_data_intmax("commit-graph", ctx->r, "filter-upgraded",
+			   ctx->count_bloom_filter_upgraded);
 }
 
 static void compute_bloom_filters(struct write_commit_graph_context *ctx)
@@ -1692,6 +1695,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 				ctx->count_bloom_filter_trunc_empty++;
 			if (computed & BLOOM_TRUNC_LARGE)
 				ctx->count_bloom_filter_trunc_large++;
+		} else if (computed & BLOOM_UPGRADED) {
+			ctx->count_bloom_filter_upgraded++;
 		} else if (computed & BLOOM_NOT_COMPUTED)
 			ctx->count_bloom_filter_not_computed++;
 		ctx->total_bloom_filter_data_size += filter
diff --git a/object.h b/object.h
index db25714b4e..2e5e08725f 100644
--- a/object.h
+++ b/object.h
@@ -75,6 +75,7 @@ void object_array_init(struct object_array *array);
  * commit-reach.c: 16-----19
  * sha1-name.c: 20
  * list-objects-filter.c: 21
+ * bloom.c: 2122
  * builtin/fsck.c: 0--3
  * builtin/gc.c: 0
  * builtin/index-pack.c: 2021
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 940a71d8b8..502115abc3 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -217,6 +217,10 @@ test_filter_trunc_large () {
 	grep "\"key\":\"filter-trunc-large\",\"value\":\"$1\"" $2
 }
 
+test_filter_upgraded () {
+	grep "\"key\":\"filter-upgraded\",\"value\":\"$1\"" $2
+}
+
 test_expect_success 'correctly report changes over limit' '
 	git init limits &&
 	(
@@ -561,10 +565,19 @@ test_expect_success 'when writing another commit graph, preserve existing versio
 test_expect_success 'when writing commit graph, do not reuse changed-path of another version' '
 	git init doublewrite &&
 	test_commit -C doublewrite c "$CENT" &&
+
 	git -C doublewrite config --add commitgraph.changedPathsVersion 1 &&
-	git -C doublewrite commit-graph write --reachable --changed-paths &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -C doublewrite commit-graph write --reachable --changed-paths &&
+	test_filter_computed 1 trace2.txt &&
+	test_filter_upgraded 0 trace2.txt &&
+
 	git -C doublewrite config --add commitgraph.changedPathsVersion 2 &&
-	git -C doublewrite commit-graph write --reachable --changed-paths &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -C doublewrite commit-graph write --reachable --changed-paths &&
+	test_filter_computed 1 trace2.txt &&
+	test_filter_upgraded 0 trace2.txt &&
+
 	(
 		cd doublewrite &&
 		echo "c01f" >expect &&
@@ -573,4 +586,22 @@ test_expect_success 'when writing commit graph, do not reuse changed-path of ano
 	)
 '
 
+test_expect_success 'when writing commit graph, reuse changed-path of another version where possible' '
+	git init upgrade &&
+
+	test_commit -C upgrade base no-high-bits &&
+
+	git -C upgrade config --add commitgraph.changedPathsVersion 1 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -C upgrade commit-graph write --reachable --changed-paths &&
+	test_filter_computed 1 trace2.txt &&
+	test_filter_upgraded 0 trace2.txt &&
+
+	git -C upgrade config --add commitgraph.changedPathsVersion 2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -C upgrade commit-graph write --reachable --changed-paths &&
+	test_filter_computed 0 trace2.txt &&
+	test_filter_upgraded 1 trace2.txt
+'
+
 test_done

From patchwork Wed Aug 30 16:43:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 13370388 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4102C83F1E for ; Wed, 30 Aug 2023 18:29:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232795AbjH3S36 (ORCPT ); Wed, 30 Aug 2023 14:29:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46486 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343756AbjH3Qod (ORCPT ); Wed, 30 Aug 2023 12:44:33 -0400 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CC1D719A for ; Wed, 30 Aug 2023 09:44:30 -0700 (PDT) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-59593051dd8so11262477b3.0 for ; Wed, 30 Aug 2023 09:44:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1693413870; x=1694018670; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=b6hmBCSf+jsVLbO+quz67hLokPl4eNQfHa0cceH9kAM=; b=QJrUzFMASxALQq2uO2aEvX3L1T5aJ/AHtfoZdTbAQBZ8eV4KDY5wMjn2q5i34iT+5s tJkCRxIkOzX7EAZVfbgiCtZDFsyKBK/kbv5eVmMO9OhrqiGEmzj9C4uz4F8NZqatOQ4E Faxe80nOf2FRhZqKMY+/xfUzlWFEiJV6fRt3UzNJM902SElLSLT35Yq8CeKQmnozL6rO Yxq+mxKX/Ew6+e8tI1X3kqc2m/DX9wrNAJcC7jbxR/9YqCesGWJkDgjZCuMkyj4kfy0Y eCYVKNhjN35I9FWvUzIgs1FgYy61s3wCMD+cSidlklwBhUQhw7dCgicyvuzYB9c4hYyF cXQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693413870; x=1694018670; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=b6hmBCSf+jsVLbO+quz67hLokPl4eNQfHa0cceH9kAM=; b=cJERQAeg9cLOf0UR2ukxgsmM8cGRoZIdHryPwtIxvEXQMg3w8CxOExO5y+iNzT7Q5+ 7h4KdUd/KKGn1YbwFJmjI3Tj4UOQM2jVx2ZgiA6+n8Lakh4uxOAmOwYRJx/KzVMNI6yg 
H0Y6TcCdmHBziF+9sCP+wSHXiOJd0kWarr1TTuCjbPp0dvMYZXYhYtDRqkfeAJVOkSUK mwr+7o7s93dq2DIBR2ohpeDkwNVVMJOb9ZhUp2yF/v0LiTjdon6IzP0t/HMc7xrSGo2O AkB+E3YygA30lwT5IGouXdLiokZWejKzowl3wO8PVp+XREdDseVlHjvjc0RBNlBHubYk OPOQ== X-Gm-Message-State: AOJu0YycKzvr7tMiQM8fj/5f6/eS2m7T0MjTCNguNZGZE11Lcf9Uiu4T LIOzOSTvSZUUvWNPfLv/cFsOi66uV+mpcxfHfkzDj+/EtHmIYgsN3lmDP0DYr8zYaikXhC+cG4C 6qUwVHu4ZZyBFNznwW3LmhgscDQJAh4L9N32Lf0CZdG90cgPPGxjYsTcj3p+bzSVVFsRWGjvzkl Lg X-Google-Smtp-Source: AGHT+IEAJUrrMMflk/sbqVvVWKUO1EcVvsMMGnD5C3Civf8LYGcdTAxVpDrOiJjXki2ZqoD9Xe92+xZHOZCAjqU35U49 X-Received: from jonathantanmy0.svl.corp.google.com ([2620:15c:2d3:204:2899:32d6:b7e3:8e6e]) (user=jonathantanmy job=sendgmr) by 2002:a81:af1b:0:b0:586:a58d:2e24 with SMTP id n27-20020a81af1b000000b00586a58d2e24mr79173ywh.5.1693413870005; Wed, 30 Aug 2023 09:44:30 -0700 (PDT) Date: Wed, 30 Aug 2023 09:43:55 -0700 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog Message-ID: <58a1d90e6d83506e151bba51fdddcd5cf17b9f29.1693413637.git.jonathantanmy@google.com> Subject: [PATCH v2 15/15] bloom: introduce `deinit_bloom_filters()` From: Jonathan Tan To: git@vger.kernel.org Cc: Taylor Blau , Jonathan Tan , Junio C Hamano , "SZEDER =?utf-8?b?R8OhYm9y?= " Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Taylor Blau After we are done using Bloom filters, we do not currently clean up any memory allocated by the commit slab used to store those filters in the first place. Besides the bloom_filter structures themselves, there is mostly nothing to free() in the first place, since in the read-only path all Bloom filter's `data` members point to a memory mapped region in the commit-graph file itself. But when generating Bloom filters from scratch (or initializing truncated filters) we allocate additional memory to store the filter's data. Keep track of when we need to free() this additional chunk of memory by using an extra pointer `to_free`. 
Most of the time this will be NULL (indicating that we are representing an existing Bloom filter stored in a memory mapped region). When it is non-NULL, free it before discarding the Bloom filters slab. Suggested-by: Jonathan Tan Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- bloom.c | 16 +++++++++++++++- bloom.h | 3 +++ commit-graph.c | 4 ++++ 3 files changed, 22 insertions(+), 1 deletion(-) diff --git a/bloom.c b/bloom.c index 24dd874e46..ff131893cd 100644 --- a/bloom.c +++ b/bloom.c @@ -59,6 +59,7 @@ int load_bloom_filter_from_graph(struct commit_graph *g, sizeof(unsigned char) * start_index + BLOOMDATA_CHUNK_HEADER_SIZE); filter->version = g->bloom_filter_settings->hash_version; + filter->to_free = NULL; return 1; } @@ -231,6 +232,18 @@ void init_bloom_filters(void) init_bloom_filter_slab(&bloom_filters); } +static void free_one_bloom_filter(struct bloom_filter *filter) +{ + if (!filter) + return; + free(filter->to_free); +} + +void deinit_bloom_filters(void) +{ + deep_clear_bloom_filter_slab(&bloom_filters, free_one_bloom_filter); +} + static int pathmap_cmp(const void *hashmap_cmp_fn_data UNUSED, const struct hashmap_entry *eptr, const struct hashmap_entry *entry_or_key, @@ -247,7 +260,7 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data UNUSED, static void init_truncated_large_filter(struct bloom_filter *filter, int version) { - filter->data = xmalloc(1); + filter->data = filter->to_free = xmalloc(1); filter->data[0] = 0xFF; filter->len = 1; filter->version = version; @@ -449,6 +462,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, filter->len = 1; } CALLOC_ARRAY(filter->data, filter->len); + filter->to_free = filter->data; hashmap_for_each_entry(&pathmap, &iter, e, entry) { struct bloom_key key; diff --git a/bloom.h b/bloom.h index e3a9b68905..d20e64bfbb 100644 --- a/bloom.h +++ b/bloom.h @@ -56,6 +56,8 @@ struct bloom_filter { unsigned char *data; size_t len; int version; + + void *to_free; }; /* @@ -96,6 
+98,7 @@ void add_key_to_filter(const struct bloom_key *key, const struct bloom_filter_settings *settings); void init_bloom_filters(void); +void deinit_bloom_filters(void); enum bloom_filter_computed { BLOOM_NOT_COMPUTED = (1 << 0), diff --git a/commit-graph.c b/commit-graph.c index a3473df515..585539da2f 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -723,6 +723,7 @@ static void close_commit_graph_one(struct commit_graph *g) void close_commit_graph(struct raw_object_store *o) { close_commit_graph_one(o->commit_graph); + deinit_bloom_filters(); o->commit_graph = NULL; } @@ -2523,6 +2524,9 @@ int write_commit_graph(struct object_directory *odb, res = write_commit_graph_file(ctx); + if (ctx->changed_paths) + deinit_bloom_filters(); + if (ctx->split) mark_commit_graphs(ctx);
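
For readers who want to see the `to_free` ownership rule from the commit message in isolation, here is a minimal standalone sketch. It is not Git code: `struct demo_filter` and the `demo_filter_*()` helpers are hypothetical names made up for illustration, and plain `calloc()` stands in for Git's allocation wrappers. The idea it tries to show is that `data` may point at either borrowed memory (such as a memory mapped region) or heap memory, and that `to_free` is non-NULL only in the second case, so cleanup can call free() unconditionally.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Bloom filter record; not Git's struct. */
struct demo_filter {
	unsigned char *data;	/* bytes to read: borrowed or owned */
	size_t len;
	void *to_free;		/* non-NULL only if data was heap-allocated */
};

/* Point the filter at memory owned by someone else (e.g. an mmap'd file). */
static void demo_filter_borrow(struct demo_filter *f,
			       unsigned char *region, size_t len)
{
	assert(f != NULL);
	f->data = region;
	f->len = len;
	f->to_free = NULL;	/* nothing for us to free later */
}

/* Give the filter its own zeroed buffer. Returns 0 on success, -1 on failure. */
static int demo_filter_alloc(struct demo_filter *f, size_t len)
{
	assert(f != NULL);
	f->data = calloc(len ? len : 1, 1);
	if (!f->data)
		return -1;
	f->len = len;
	f->to_free = f->data;	/* remember that we own this buffer */
	return 0;
}

/*
 * Release a filter. free(NULL) is a no-op, so borrowed filters need no
 * special case and the borrowed region is left untouched.
 */
static void demo_filter_release(struct demo_filter *f)
{
	if (!f)
		return;
	free(f->to_free);
	f->data = NULL;
	f->to_free = NULL;
	f->len = 0;
}
```

A caller would use `demo_filter_borrow()` for filters read from an existing file and `demo_filter_alloc()` for freshly computed ones, then call `demo_filter_release()` on every filter without having to track which kind it was. Keeping a separate `to_free` pointer, rather than a boolean flag, means the release path is a single free() call.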