From patchwork Mon Feb 28 13:53:40 2022
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12763333
Message-Id: <2f89275314b4a2a89a18d14e41602bbe2e1988dc.1646056423.git.gitgitgadget@gmail.com>
Date: Mon, 28 Feb 2022 13:53:40 +0000
Subject: [PATCH v2 1/4] test-read-graph: include extra post-parse info
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, abhishekkumar8222@gmail.com, avarab@gmail.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

It can be helpful to verify that the 'struct commit_graph' that results
from parsing a commit-graph is correctly structured. The existence of
different chunks is not enough to verify that all of the optional
features are correctly enabled.

Update 'test-tool read-graph' to output an "options:" line that includes
information for different parts of the struct commit_graph.

In particular, this change demonstrates that the read_generation_data
option is never being enabled, which will be fixed in a later change.

Signed-off-by: Derrick Stolee
---
 t/helper/test-read-graph.c    | 13 +++++++++++++
 t/t4216-log-bloom.sh          |  1 +
 t/t5318-commit-graph.sh       |  1 +
 t/t5324-split-commit-graph.sh |  5 +++++
 4 files changed, 20 insertions(+)

diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c
index 75927b2c81d..c3b6b8d1734 100644
--- a/t/helper/test-read-graph.c
+++ b/t/helper/test-read-graph.c
@@ -3,6 +3,7 @@
 #include "commit-graph.h"
 #include "repository.h"
 #include "object-store.h"
+#include "bloom.h"
 
 int cmd__read_graph(int argc, const char **argv)
 {
@@ -45,6 +46,18 @@ int cmd__read_graph(int argc, const char **argv)
 		printf(" bloom_data");
 	printf("\n");
 
+	printf("options:");
+	if (graph->bloom_filter_settings)
+		printf(" bloom(%d,%d,%d)",
+		       graph->bloom_filter_settings->hash_version,
+		       graph->bloom_filter_settings->bits_per_entry,
+		       graph->bloom_filter_settings->num_hashes);
+	if (graph->read_generation_data)
+		printf(" read_generation_data");
+	if (graph->topo_levels)
+		printf(" topo_levels");
+	printf("\n");
+
 	UNLEAK(graph);
 
 	return 0;
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index cc3cebf6722..5ed6d2a21c1 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -48,6 +48,7 @@ graph_read_expect () {
 	header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0
 	num_commits: $1
 	chunks: oid_fanout oid_lookup commit_metadata generation_data bloom_indexes bloom_data
+	options: bloom(1,10,7)
 	EOF
 	test-tool read-graph >actual &&
 	test_cmp expect actual
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index edb728f77c3..2b05026cf6d 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -104,6 +104,7 @@ graph_read_expect() {
 	header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0
 	num_commits: $1
 	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
+	options:
 	EOF
 	test-tool read-graph >output &&
 	test_cmp expect output
diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
index 847b8097109..778fa418de2 100755
--- a/t/t5324-split-commit-graph.sh
+++ b/t/t5324-split-commit-graph.sh
@@ -34,6 +34,7 @@ graph_read_expect() {
 	header: 43475048 1 $(test_oid oid_version) 4 $NUM_BASE
 	num_commits: $1
 	chunks: oid_fanout oid_lookup commit_metadata generation_data
+	options:
 	EOF
 	test-tool read-graph >output &&
 	test_cmp expect output
@@ -508,6 +509,7 @@ test_expect_success 'setup repo for mixed generation commit-graph-chain' '
 		header: 43475048 1 $(test_oid oid_version) 4 1
 		num_commits: $NUM_SECOND_LAYER_COMMITS
 		chunks: oid_fanout oid_lookup commit_metadata
+		options:
 		EOF
 		test_cmp expect output &&
 		git commit-graph verify &&
@@ -540,6 +542,7 @@ test_expect_success 'do not write generation data chunk if not present on existi
 		header: 43475048 1 $(test_oid oid_version) 4 2
 		num_commits: $NUM_THIRD_LAYER_COMMITS
 		chunks: oid_fanout oid_lookup commit_metadata
+		options:
 		EOF
 		test_cmp expect output &&
 		git commit-graph verify
@@ -581,6 +584,7 @@ test_expect_success 'do not write generation data chunk if the topmost remaining
 		header: 43475048 1 $(test_oid oid_version) 4 2
 		num_commits: $(($NUM_THIRD_LAYER_COMMITS + $NUM_FOURTH_LAYER_COMMITS))
 		chunks: oid_fanout oid_lookup commit_metadata
+		options:
 		EOF
 		test_cmp expect output &&
 		git commit-graph verify
@@ -620,6 +624,7 @@ test_expect_success 'write generation data chunk if topmost remaining layer has
 		header: 43475048 1 $(test_oid oid_version) 5 1
 		num_commits: $(($NUM_SECOND_LAYER_COMMITS + $NUM_THIRD_LAYER_COMMITS + $NUM_FOURTH_LAYER_COMMITS + $NUM_FIFTH_LAYER_COMMITS))
 		chunks: oid_fanout oid_lookup commit_metadata generation_data
+		options:
 		EOF
 		test_cmp expect output
 	)

From patchwork Mon Feb 28 13:53:41 2022
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12763334
Date: Mon, 28 Feb 2022 13:53:41 +0000
Subject: [PATCH v2 2/4] commit-graph: fix ordering bug in generation numbers
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, abhishekkumar8222@gmail.com, avarab@gmail.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

When computing the generation numbers for a commit-graph, we compute
the corrected commit dates and then check if their offsets from the
actual dates are too large to fit in the 32-bit Generation Data chunk.

However, there is a problem with this approach: if we have parsed the
generation data from the previous commit-graph, then we continue the
loop because the corrected commit date is already computed. This causes
an under-count in the number of overflow values.

It is incorrect to add an increment to num_generation_data_overflows
next to this 'continue' statement, because we might start
double-counting commits that are computed because of the depth-first
search walk from a commit with an earlier OID.

Instead, iterate over the full commit list at the end, checking the
offsets to see how many grow beyond the maximum value.

Update a test in t5318 to use a larger time value, which will help
demonstrate this bug in more cases. It still won't hit all potential
cases until the next change, which reenables reading generation
numbers.

Signed-off-by: Derrick Stolee
---
 commit-graph.c          | 10 +++++++---
 t/t5318-commit-graph.sh |  4 ++--
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 265c010122e..a19bd96c2ee 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1556,12 +1556,16 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 				if (current->date && current->date > max_corrected_commit_date)
 					max_corrected_commit_date = current->date - 1;
 				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
-
-				if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX)
-					ctx->num_generation_data_overflows++;
 			}
 		}
 	}
+
+	for (i = 0; i < ctx->commits.nr; i++) {
+		struct commit *c = ctx->commits.list[i];
+		timestamp_t offset = commit_graph_data_at(c)->generation - c->date;
+		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX)
+			ctx->num_generation_data_overflows++;
+	}
 	stop_progress(&ctx->progress);
 }
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2b05026cf6d..f9bffe38013 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -467,10 +467,10 @@ test_expect_success 'warn on improper hash version' '
 	)
 '
 
-test_expect_success 'lower layers have overflow chunk' '
+test_expect_success TIME_IS_64BIT,TIME_T_IS_64BIT 'lower layers have overflow chunk' '
 	cd "$TRASH_DIRECTORY/full" &&
 	UNIX_EPOCH_ZERO="@0 +0000" &&
-	FUTURE_DATE="@2147483646 +0000" &&
+	FUTURE_DATE="@4147483646 +0000" &&
 	rm -f .git/objects/info/commit-graph &&
 	test_commit --date "$FUTURE_DATE" future-1 &&
 	test_commit --date "$UNIX_EPOCH_ZERO" old-1 &&

From patchwork Mon Feb 28 13:53:42 2022
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12763335
Message-Id: <5bc6a7660d897ca6c52eabba8fb9ecfb6304dabb.1646056423.git.gitgitgadget@gmail.com>
Date: Mon, 28 Feb 2022 13:53:42 +0000
Subject: [PATCH v2 3/4] commit-graph: start parsing generation v2 (again)
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, abhishekkumar8222@gmail.com, avarab@gmail.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

The 'read_generation_data' member of 'struct commit_graph' was
introduced by 1fdc383c5 (commit-graph: use generation v2 only if entire
chain does, 2021-01-16). The intention was to avoid using corrected
commit dates if not all layers of a commit-graph had that data stored.
The logic in validate_mixed_generation_chain() at that point incorrectly
initialized read_generation_data to 1 if and only if the tip
commit-graph contained the Corrected Commit Date chunk.

This was "fixed" in 448a39e65 (commit-graph: validate layers for
generation data, 2021-02-02) to validate that read_generation_data was
either non-zero for all layers, or it would set read_generation_data to
zero for all layers.

The problem here is that read_generation_data is not initialized to be
non-zero anywhere!

This change initializes read_generation_data immediately after the chunk
is parsed, so each layer will have its value present as soon as
possible.

The read_generation_data member is used in fill_commit_graph_info() to
determine if we should use the corrected commit date or the topological
levels stored in the Commit Data chunk. Due to this bug, all previous
versions of Git were defaulting to topological levels in all cases!

This can be measured with some performance tests. Using the Linux kernel
as a testbed, I generated a complete commit-graph containing corrected
commit dates and tested the 'new' version against the previous, 'old'
version.

First, rev-list with --topo-order demonstrates a 26% improvement using
corrected commit dates:

hyperfine \
	-n "old" "$OLD_GIT rev-list --topo-order -1000 v3.6" \
	-n "new" "$NEW_GIT rev-list --topo-order -1000 v3.6" \
	--warmup=10

Benchmark 1: old
  Time (mean ± σ):      57.1 ms ±   3.1 ms
  Range (min … max):    52.9 ms …  62.0 ms    55 runs

Benchmark 2: new
  Time (mean ± σ):      45.5 ms ±   3.3 ms
  Range (min … max):    39.9 ms …  51.7 ms    59 runs

Summary
  'new' ran
    1.26 ± 0.11 times faster than 'old'

These performance improvements are due to the algorithmic improvements
given by walking fewer commits due to the higher cutoffs from corrected
commit dates.

However, this comes at a cost. The additional I/O cost of parsing the
corrected commit dates is visible in case of merge-base commands that do
not reduce the overall number of walked commits.

hyperfine \
	-n "old" "$OLD_GIT merge-base v4.8 v4.9" \
	-n "new" "$NEW_GIT merge-base v4.8 v4.9" \
	--warmup=10

Benchmark 1: old
  Time (mean ± σ):     110.4 ms ±   6.4 ms
  Range (min … max):    96.0 ms … 118.3 ms    25 runs

Benchmark 2: new
  Time (mean ± σ):     150.7 ms ±   1.1 ms
  Range (min … max):   149.3 ms … 153.4 ms    19 runs

Summary
  'old' ran
    1.36 ± 0.08 times faster than 'new'

Performance issues like this are what motivated 702110aac (commit-graph:
use config to specify generation type, 2021-02-25).

In the future, we could fix this performance problem by inserting the
corrected commit date offsets into the Commit Data chunk instead of
having that data in an extra chunk.

Signed-off-by: Derrick Stolee
---
 commit-graph.c                |  3 +++
 t/t4216-log-bloom.sh          |  2 +-
 t/t5318-commit-graph.sh       | 14 ++++++++++++--
 t/t5324-split-commit-graph.sh |  9 +++++++--
 4 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index a19bd96c2ee..8e52bb09552 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -407,6 +407,9 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 			&graph->chunk_generation_data);
 		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
 			&graph->chunk_generation_data_overflow);
+
+		if (graph->chunk_generation_data)
+			graph->read_generation_data = 1;
 	}
 
 	if (r->settings.commit_graph_read_changed_paths) {
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 5ed6d2a21c1..fa9d32facfb 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -48,7 +48,7 @@ graph_read_expect () {
 	header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0
 	num_commits: $1
 	chunks: oid_fanout oid_lookup commit_metadata generation_data bloom_indexes bloom_data
-	options: bloom(1,10,7)
+	options: bloom(1,10,7) read_generation_data
 	EOF
 	test-tool read-graph >actual &&
 	test_cmp expect actual
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index f9bffe38013..1afee1c2705 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -100,11 +100,21 @@ graph_read_expect() {
 		OPTIONAL=" $2"
 		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
 	fi
+	GENERATION_VERSION=2
+	if test ! -z "$3"
+	then
+		GENERATION_VERSION=$3
+	fi
+	OPTIONS=
+	if test $GENERATION_VERSION -gt 1
+	then
+		OPTIONS=" read_generation_data"
+	fi
 	cat >expect <<- EOF
 	header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0
 	num_commits: $1
 	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
-	options:
+	options:$OPTIONS
 	EOF
 	test-tool read-graph >output &&
 	test_cmp expect output
@@ -498,7 +508,7 @@ test_expect_success 'git commit-graph verify' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git rev-parse commits/8 | git -c commitGraph.generationVersion=1 commit-graph write --stdin-commits &&
 	git commit-graph verify >output &&
-	graph_read_expect 9 extra_edges
+	graph_read_expect 9 extra_edges 1
 '
 
 NUM_COMMITS=9
diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
index 778fa418de2..669ddc645fa 100755
--- a/t/t5324-split-commit-graph.sh
+++ b/t/t5324-split-commit-graph.sh
@@ -30,11 +30,16 @@ graph_read_expect() {
 	then
 		NUM_BASE=$2
 	fi
+	OPTIONS=
+	if test -z "$3"
+	then
+		OPTIONS=" read_generation_data"
+	fi
 	cat >expect <<- EOF
 	header: 43475048 1 $(test_oid oid_version) 4 $NUM_BASE
 	num_commits: $1
 	chunks: oid_fanout oid_lookup commit_metadata generation_data
-	options:
+	options:$OPTIONS
 	EOF
 	test-tool read-graph >output &&
 	test_cmp expect output
@@ -624,7 +629,7 @@ test_expect_success 'write generation data chunk if topmost remaining layer has
 		header: 43475048 1 $(test_oid oid_version) 5 1
 		num_commits: $(($NUM_SECOND_LAYER_COMMITS + $NUM_THIRD_LAYER_COMMITS + $NUM_FOURTH_LAYER_COMMITS + $NUM_FIFTH_LAYER_COMMITS))
 		chunks: oid_fanout oid_lookup commit_metadata generation_data
-		options:
+		options: read_generation_data
 		EOF
 		test_cmp expect output
 	)

From patchwork Mon Feb 28 13:53:43 2022
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 12763336
Message-Id: <193217c71e0aaf3f56a02d9abec6753bd19aba71.1646056423.git.gitgitgadget@gmail.com>
Date: Mon, 28 Feb 2022 13:53:43 +0000
Subject: [PATCH v2 4/4] commit-graph: fix generation number v2 overflow values
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, abhishekkumar8222@gmail.com, avarab@gmail.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

The Generation Data Chunk was implemented and tested in e8b63005c
(commit-graph: implement generation data chunk, 2021-01-16), but the
test was carefully constructed to work on systems with 32-bit dates.
Since the corrected commit date offsets still required more than 31
bits, this triggered writing the generation_data_overflow chunk.

However, upon closer look, the
write_graph_chunk_generation_data_overflow() method writes the offsets
to the chunk (as dictated by the format) but fill_commit_graph_info()
treats the value in the chunk as if it is the full corrected commit date
(not an offset).

For some reason, this does not cause an issue when using the FUTURE_DATE
specified in t5318-commit-graph.sh, but it does show up as a failure in
'git commit-graph verify' if we increase that FUTURE_DATE to be above
four billion.

Fix this error and update the test to require 64-bit dates so we can
safely use this large value in our test.

Signed-off-by: Derrick Stolee
---
 commit-graph.c          |  2 +-
 t/t5318-commit-graph.sh | 17 +++++++++++++++--
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 8e52bb09552..b86a6a634fe 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -806,7 +806,7 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
 				die(_("commit-graph requires overflow generation data but has none"));
 
 			offset_pos = offset ^ CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW;
-			graph_data->generation = get_be64(g->chunk_generation_data_overflow + 8 * offset_pos);
+			graph_data->generation = item->date + get_be64(g->chunk_generation_data_overflow + 8 * offset_pos);
 		} else
 			graph_data->generation = item->date + offset;
 	} else
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 1afee1c2705..f4ffaad661d 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -815,6 +815,15 @@ test_expect_success 'corrupt commit-graph write (missing tree)' '
 	)
 '
 
+# The remaining tests check timestamps that flow over
+# 32-bits. The graph_git_behavior checks can't take a
+# prereq, so just stop here if we are on a 32-bit machine.
+
+if ! test_have_prereq TIME_IS_64BIT || ! test_have_prereq TIME_T_IS_64BIT
+then
+	test_done
+fi
+
 # We test the overflow-related code with the following repo history:
 #
 #               4:F - 5:N - 6:U
@@ -832,10 +841,10 @@ test_expect_success 'corrupt commit-graph write (missing tree)' '
 # The largest offset observed is 2 ^ 31, just large enough to overflow.
 #
 
-test_expect_success 'set up and verify repo with generation data overflow chunk' '
+test_expect_success TIME_IS_64BIT,TIME_T_IS_64BIT 'set up and verify repo with generation data overflow chunk' '
 	objdir=".git/objects" &&
 	UNIX_EPOCH_ZERO="@0 +0000" &&
-	FUTURE_DATE="@2147483646 +0000" &&
+	FUTURE_DATE="@4000000000 +0000" &&
 	test_oid_cache <<-EOF &&
 	oid_version sha1:1
 	oid_version sha256:2
@@ -867,4 +876,8 @@ test_expect_success 'set up and verify repo with generation data overflow chunk'
 
 graph_git_behavior 'generation data overflow chunk repo' repo left right
 
+# Do not add tests at the end of this file, unless they require 64-bit
+# timestamps, since this portion of the script is only executed when
+# time data types have 64 bits.
+
 test_done