From patchwork Mon Feb 1 06:58:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12058165 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2DE7CC433DB for ; Mon, 1 Feb 2021 06:59:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BF62964E2B for ; Mon, 1 Feb 2021 06:59:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229630AbhBAG7c (ORCPT ); Mon, 1 Feb 2021 01:59:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44468 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229527AbhBAG7b (ORCPT ); Mon, 1 Feb 2021 01:59:31 -0500 Received: from mail-wr1-x429.google.com (mail-wr1-x429.google.com [IPv6:2a00:1450:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B5501C061573 for ; Sun, 31 Jan 2021 22:58:50 -0800 (PST) Received: by mail-wr1-x429.google.com with SMTP id d16so15285408wro.11 for ; Sun, 31 Jan 2021 22:58:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=TEn2MjmptZkcNZR3QfnWESfHn4uIr+R6RuBklnAznUY=; b=AdvfLXfiRf/+0+Mx6yykltxh9AM3iy19XCAL6ukd25KNhyKGeMeS7p+PFCgx/L/LlB +OeDrpTSV1sYYMhld1zipgqnVO2EOF29lRVMVRIRUEJ1nZT2goWuXYX0xkmLpEu3hdUl wyh7PL85rOwfQqCciA2pmP74fBgMRi3h5u7ACVJA33yb2n7KXYQKT4bQdqXYRaT+C+/w qmdwPgVq664VFcB3LNP4ljzXoP7r3W/RW+0FOS45SlSZsfumgT6pe680j/1lYWiyWiff zdUydruzUDcDypuCvS5mQxjKb1NG0985C555BqAGEDCAUvVhE3NPWalc2cTjMSjDUV7j 0DpQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=TEn2MjmptZkcNZR3QfnWESfHn4uIr+R6RuBklnAznUY=; b=Hf5OUBeGdWOR+L+tv5btUWVMzfn45yrSmEyXb0pQKMAEN0jEBuiDeBNoarvebz9jGi IS2yO7y5fWgOcSzhrhpkJlz0iOk58VykloC85f7TYxFrOfkEeFMsmpBdy50QsJfN0/KK fr0evxPRXk6sA8JohJoQ40hMUabFz0mnOPCppjsDWIiHzyni2wMetfVGOJMCBgNJVQq7 8+/dfXaW0nnJIiFVfZsZF8WJUbvn63HDLlcCyQCm4UnLZehmaLUP/hyidu0Oy4zC6Aeb dvFXb2kEh/DG5m5/Cml6Y+/c4fwTxqwP/IUwAybTmFe4qd27vFnFBDDkkjbTWvUkLGCV tZJA== X-Gm-Message-State: AOAM533VnXEXNn9d8S749S+Kaymk8Xb37j17cFrU0+Dzm9UXNjVjZWwh +31/iSjYjbWU9rVtqtq85NagudP9kIk= X-Google-Smtp-Source: ABdhPJzvxTekn/GYwIfJtSysU6/KCE6OyRo6X4VyRhGXQLWbMvBE0DOyTj0e3edHozXqKezWoITmZQ== X-Received: by 2002:a05:6000:2:: with SMTP id h2mr16173055wrx.91.1612162729353; Sun, 31 Jan 2021 22:58:49 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id u10sm19148668wmj.40.2021.01.31.22.58.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 31 Jan 2021 22:58:48 -0800 (PST) Message-Id: <9ac331b63ee609f5380649d3b395f420e57e56f8.1612162726.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 01 Feb 2021 06:58:35 +0000 Subject: [PATCH v7 01/11] commit-graph: fix regression when computing Bloom filters Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar Before computing Bloom filters, the commit-graph machinery uses commit_gen_cmp to sort commits by generation order for improved diff performance. 3d11275505 (commit-graph: examine commits by generation number, 2020-03-30) claims that this sort can reduce the time spent to compute Bloom filters by nearly half. But since c49c82aa4c (commit: move members graph_pos, generation to a slab, 2020-06-17), this optimization is broken, since asking for a 'commit_graph_generation()' directly returns GENERATION_NUMBER_INFINITY while writing. Not all hope is lost, though: 'commit_gen_cmp()' falls back to comparing commits by their date when they have equal generation number, and so since c49c82aa4c is purely a date comparison function. This heuristic is good enough that we don't seem to loose appreciable performance while computing Bloom filters. Applying this patch (compared with v2.30.0) speeds up computing Bloom filters by factors ranging from 0.40% to 5.19% on various repositories [1]. So, avoid the useless 'commit_graph_generation()' while writing by instead accessing the slab directly. This returns the newly-computed generation numbers, and allows us to avoid the heuristic by directly comparing generation numbers. [1]: https://lore.kernel.org/git/20210105094535.GN8396@szeder.dev/ Signed-off-by: Abhishek Kumar --- commit-graph.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index f3486ec18f1..78de312ccec 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -139,13 +139,17 @@ static struct commit_graph_data *commit_graph_data_at(const struct commit *c) return data; } +/* + * Should be used only while writing commit-graph as it compares + * generation value of commits by directly accessing commit-slab. + */ static int commit_gen_cmp(const void *va, const void *vb) { const struct commit *a = *(const struct commit **)va; const struct commit *b = *(const struct commit **)vb; - uint32_t generation_a = commit_graph_generation(a); - uint32_t generation_b = commit_graph_generation(b); + uint32_t generation_a = commit_graph_data_at(a)->generation; + uint32_t generation_b = commit_graph_data_at(b)->generation; /* lower generation commits first */ if (generation_a < generation_b) return -1; From patchwork Mon Feb 1 06:58:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12058167 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 69399C433E0 for ; Mon, 1 Feb 2021 06:59:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1AD6D64E30 for ; Mon, 1 Feb 2021 06:59:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230098AbhBAG7h (ORCPT ); Mon, 1 Feb 2021 01:59:37 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44478 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229629AbhBAG7c (ORCPT ); Mon, 1 Feb 2021 01:59:32 -0500 Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D5ECCC06174A for ; Sun, 31 Jan 2021 22:58:51 -0800 (PST) Received: by mail-wr1-x42f.google.com with SMTP id p15so15280175wrq.8 for ; Sun, 31 Jan 2021 22:58:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=bbJNvBd8SqTjp1bLQAT5B9ts20SlrwOtN5mJW+5EG1Y=; b=E7syTpSwIbDS1Oggv0pYzVaAcrwG71Zt2XPnV6nH/PO/gBfzsQyyQj8dN+7tWOjNpe wkELkRlUeXwOtB6W/0ggPHPbkPWpjMnXU5LjXLAtCub4LXw1b6kGEx/u9eKy3Pd/9vXd OJB8ODOQVK7QZ3cMfFsU7cYytmeReLZ6tjMGqcM6Kej2efDN0BpPAihWPjyvPPkW9qmq 79mxZH+UDMMH1YcbyiHAAjXevX4cjsusGjC4W5dy23yX6yB6gqSb8thXLaqDqURYS+vd meSVFfl3fWpvTK3J64ARyG1fV5b4siKisr0V3lPYF/mOUsQbVJHTumtfHqkLOUQjTTnN vhew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=bbJNvBd8SqTjp1bLQAT5B9ts20SlrwOtN5mJW+5EG1Y=; b=qGG90U6QCDrmAhKAZSysRPA4bA7BFBl0dH3gZ8mqE/707lkPnFV2RMHAT6XszAIX5P Hq66DrG8g0TNyhbBKUyRMoTatIsrG16WMwRE3dIDrgXBATMC3Y4LDLdbDsgom5hiRhca r6jPZVRfLz5kUWQnWC7x4jaSh3Dk8PNnLi3lCMhxEc5SbPhbh56odH0mKn/fyXKuxMDo 2CF2xsdmRol+zD0JocxoDJqclak6+gcJXmnSwbGpwmTZq9iZiGCWfvkfp5Mf7hxvvGDO KoXpdkjWhqzr87Cb6HQ4B7xQDdURE5azSVIyHX35/T4b6Ns+yi+Trrhh0ebagb4iRlV+ g1EA== X-Gm-Message-State: AOAM532XbRCbObZo+07EYV9h11UqyDGLM2wBZwEq80Qn0LUVPCqzhAyv oyiQA71thaOuXJdX8khfm7XbGWQDswg= X-Google-Smtp-Source: ABdhPJxMaMOs0MB+y8BLnNxorYDv0WirVgUqQyqp2EpDdi+AOFnoQ9yq5AWEeF5Nte2I7Q72WjgrhQ== X-Received: by 2002:a05:6000:234:: with SMTP id l20mr16363440wrz.212.1612162730488; Sun, 31 Jan 2021 22:58:50 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id z4sm26051978wrw.38.2021.01.31.22.58.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 31 Jan 2021 22:58:49 -0800 (PST) Message-Id: <90ca0a1fd697b91180a01d82eeaed54511c9719f.1612162726.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 01 Feb 2021 06:58:36 +0000 Subject: [PATCH v7 02/11] revision: parse parent in indegree_walk_step() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar In indegree_walk_step(), we add unvisited parents to the indegree queue. However, parents are not guaranteed to be parsed. As the indegree queue sorts by generation number, let's parse parents before inserting them to ensure the correct priority order. Signed-off-by: Abhishek Kumar --- revision.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/revision.c b/revision.c index 0b5c7231401..5474001331a 100644 --- a/revision.c +++ b/revision.c @@ -3399,6 +3399,9 @@ static void indegree_walk_step(struct rev_info *revs) struct commit *parent = p->item; int *pi = indegree_slab_at(&info->indegree, parent); + if (repo_parse_commit_gently(revs->repo, parent, 1) < 0) + return; + if (*pi) (*pi)++; else From patchwork Mon Feb 1 06:58:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12058171 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 056EAC433E6 for ; Mon, 1 Feb 2021 06:59:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B26A964E2B for ; Mon, 1 Feb 2021 06:59:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230136AbhBAG7j (ORCPT ); Mon, 1 Feb 2021 01:59:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44480 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229527AbhBAG7d (ORCPT ); Mon, 1 Feb 2021 01:59:33 -0500 Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BE307C061756 for ; Sun, 31 Jan 2021 22:58:52 -0800 (PST) Received: by mail-wr1-x42f.google.com with SMTP id c12so15327630wrc.7 for ; Sun, 31 Jan 2021 22:58:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=DwtwXjs+wLi+6q2F8F5zy7dwAI6S0gMGo+qQ9ijVI6A=; b=tEohX5MYCytxhllLhuZj4TZB8nqbNPmymZErczbb9lSpMKpa7ksaDK2hxXx7Gc9EEC PeI9lOinYWto3Gq8kO/R5r2ubwFu+JEZVkg9Yl9LzT7tw/9Q/CFC3xdRqe+GpjZlzP5t TU0Jmb98CT+tfHxBgCTIscLEFxlKDuUWUgomu10kukW7SqkvHhPFfmfNqaTlD68Zz9HH bhniSNmlqnH3K4Kw0Nq3+NITeYsgmNZG4Xzngze0tXiZFknekad8NbPCaeLv3b7Dtiwj D/GoCTlr0NO1Ru9lrDvx4VgFBcQBzEq4eViioJp4DjmSaDA/e2umGblWR/vBbqcFiYm7 h7Fg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=DwtwXjs+wLi+6q2F8F5zy7dwAI6S0gMGo+qQ9ijVI6A=; b=V4kmQonzYrYUPSnlDmgZQwRh5zxd3YXU++lV6eufM4To7zvaPg2NdhNfffDzp4IGIw cEumh2tPwLX8T5MfwK8iUqTbpoXqN/ld3ZMpOYBX2px4fBde1pbBx4lvKadKxdD9jEDQ DWjorlJVQEtreJQAm6sxg4bNv1zIAJwL6i2E8TttHMXoeVoSm3QzehjPOO5W+TKHIMXW Ou1k0wtEFsW2MUTEz/UVOmO/zjyXnerLqv20PabcEDN7Umum7dYeWn2Fq07J+8Hw3GJD dwZzWruoZ6RaqvcN2OOR3dk5zZ6Ckqx18dnUSpzjnObXXcJDI69srSfz8b674CUBnEvf x9/Q== X-Gm-Message-State: AOAM530sFn1JddiVOtLkqvFOOluVcpF6ODa6YA1mDTU4vJ/DOaMKtIMV y2t6FS23K1jsVUgOf9R0lYt/yh5EZ0c= X-Google-Smtp-Source: ABdhPJxcmWMtZMoHdGPdkwPL0hnz0FJmjwDbEeXBnjI3Vw6tLO+vHOHBM1YpT7QdmmL7/SVa3YAT+w== X-Received: by 2002:adf:fb91:: with SMTP id a17mr16411519wrr.93.1612162731321; Sun, 31 Jan 2021 22:58:51 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id e11sm25327973wrx.14.2021.01.31.22.58.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 31 Jan 2021 22:58:50 -0800 (PST) Message-Id: In-Reply-To: References: Date: Mon, 01 Feb 2021 06:58:37 +0000 Subject: [PATCH v7 03/11] commit-graph: consolidate fill_commit_graph_info Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar Both fill_commit_graph_info() and fill_commit_in_graph() parse information present in commit data chunk. Let's simplify the implementation by calling fill_commit_graph_info() within fill_commit_in_graph(). fill_commit_graph_info() used to not load committer data from commit data chunk. However, with the upcoming switch to using corrected committer date as generation number v2, we will have to load committer date to compute generation number value anyway. e51217e15 (t5000: test tar files that overflow ustar headers, 30-06-2016) introduced a test 'generate tar with future mtime' that creates a commit with committer date of (2^36 + 1) seconds since EPOCH. The CDAT chunk provides 34-bits for storing committer date, thus committer time overflows into generation number (within CDAT chunk) and has undefined behavior. The test used to pass as fill_commit_graph_info() would not set struct member `date` of struct commit and load committer date from the object database, generating a tar file with the expected mtime. However, with corrected commit date, we will load the committer date from CDAT chunk (truncated to lower 34-bits to populate the generation number. Thus, Git sets date and generates tar file with the truncated mtime. The ustar format (the header format used by most modern tar programs) only has room for 11 (or 12, depending on some implementations) octal digits for the size and mtime of each file. As the CDAT chunk is overflow by 12-octal digits but not 11-octal digits, we split the existing tests to test both implementations separately and add a new explicit test for 11-digit implementation. To test the 11-octal digit implementation, we create a future commit with committer date of 2^34 - 1, which overflows 11-octal digits without overflowing 34-bits of the Commit Date chunks. To test the 12-octal digit implementation, the smallest committer date possible is 2^36 + 1, which overflows the CDAT chunk and thus commit-graph must be disabled for the test. Signed-off-by: Abhishek Kumar --- commit-graph.c | 27 ++++++++++----------------- t/t5000-tar-tree.sh | 24 +++++++++++++++++++++--- 2 files changed, 31 insertions(+), 20 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 78de312ccec..955418bd6e5 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -753,15 +753,24 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, const unsigned char *commit_data; struct commit_graph_data *graph_data; uint32_t lex_index; + uint64_t date_high, date_low; while (pos < g->num_commits_in_base) g = g->base_graph; + if (pos >= g->num_commits + g->num_commits_in_base) + die(_("invalid commit position. commit-graph is likely corrupt")); + lex_index = pos - g->num_commits_in_base; commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; graph_data = commit_graph_data_at(item); graph_data->graph_pos = pos; + + date_high = get_be32(commit_data + g->hash_len + 8) & 0x3; + date_low = get_be32(commit_data + g->hash_len + 12); + item->date = (timestamp_t)((date_high << 32) | date_low); + graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; } @@ -776,38 +785,22 @@ static int fill_commit_in_graph(struct repository *r, { uint32_t edge_value; uint32_t *parent_data_ptr; - uint64_t date_low, date_high; struct commit_list **pptr; - struct commit_graph_data *graph_data; const unsigned char *commit_data; uint32_t lex_index; while (pos < g->num_commits_in_base) g = g->base_graph; - if (pos >= g->num_commits + g->num_commits_in_base) - die(_("invalid commit position. commit-graph is likely corrupt")); + fill_commit_graph_info(item, g, pos); - /* - * Store the "full" position, but then use the - * "local" position for the rest of the calculation. - */ - graph_data = commit_graph_data_at(item); - graph_data->graph_pos = pos; lex_index = pos - g->num_commits_in_base; - commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index; item->object.parsed = 1; set_commit_tree(item, NULL); - date_high = get_be32(commit_data + g->hash_len + 8) & 0x3; - date_low = get_be32(commit_data + g->hash_len + 12); - item->date = (timestamp_t)((date_high << 32) | date_low); - - graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; - pptr = &item->parents; edge_value = get_be32(commit_data + g->hash_len); diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 3ebb0d3b652..7204799a0b5 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -431,15 +431,33 @@ test_expect_success TAR_HUGE,LONG_IS_64BIT 'system tar can read our huge size' ' test_cmp expect actual ' -test_expect_success TIME_IS_64BIT 'set up repository with far-future commit' ' +test_expect_success TIME_IS_64BIT 'set up repository with far-future (2^34 - 1) commit' ' + rm -f .git/index && + echo foo >file && + git add file && + GIT_COMMITTER_DATE="@17179869183 +0000" \ + git commit -m "tempori parendum" +' + +test_expect_success TIME_IS_64BIT 'generate tar with far-future mtime' ' + git archive HEAD >future.tar +' + +test_expect_success TAR_HUGE,TIME_IS_64BIT,TIME_T_IS_64BIT 'system tar can read our future mtime' ' + echo 2514 >expect && + tar_info future.tar | cut -d" " -f2 >actual && + test_cmp expect actual +' + +test_expect_success TIME_IS_64BIT 'set up repository with far-far-future (2^36 + 1) commit' ' rm -f .git/index && echo content >file && git add file && - GIT_COMMITTER_DATE="@68719476737 +0000" \ + GIT_TEST_COMMIT_GRAPH=0 GIT_COMMITTER_DATE="@68719476737 +0000" \ git commit -m "tempori parendum" ' -test_expect_success TIME_IS_64BIT 'generate tar with future mtime' ' +test_expect_success TIME_IS_64BIT 'generate tar with far-far-future mtime' ' git archive HEAD >future.tar ' From patchwork Mon Feb 1 06:58:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12058173 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1D352C433DB for ; Mon, 1 Feb 2021 06:59:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CD8E164E30 for ; Mon, 1 Feb 2021 06:59:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230145AbhBAG7q (ORCPT ); Mon, 1 Feb 2021 01:59:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44490 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229703AbhBAG7e (ORCPT ); Mon, 1 Feb 2021 01:59:34 -0500 Received: from mail-wr1-x429.google.com (mail-wr1-x429.google.com [IPv6:2a00:1450:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D01AFC0613D6 for ; Sun, 31 Jan 2021 22:58:53 -0800 (PST) Received: by mail-wr1-x429.google.com with SMTP id l12so15340196wry.2 for ; Sun, 31 Jan 2021 22:58:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=cEe95guw9s9mzUDrRNnTP/221TBTHwdatVA+Owv7Nr0=; b=hBl6uBiW/aKHt0+bz3CnFE3dDdP9AMeWkEgCVczuhd1NLGgOtdJxKYxfTxcF67zECh 0tryZswbk6HVKqbip6ULx/a9Q4zmywj+ahVxRidlEhahBIj+E8bgdQR4lrwRC1Ge96yO q2a0rAXwNPQkze7eWLrzgx5KEIPJpamDl1ib5vzbBkrWYoMDjhxEcqmVzSEnP237RpXG WLIzTs3vPFN4+rHhFRH/6zAyWcYC1UJDre11kDPpZrliz9ucROBoIC7xSOgvxaJTRZxD hgVr6VLOmV4GBKa0KXeUaS4fG3AKsd8g/vtVGO5F2Pmz/vjVMRCflZWBlnT8BquVSuDj XSbQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=cEe95guw9s9mzUDrRNnTP/221TBTHwdatVA+Owv7Nr0=; b=V68vyo9N+hWLGZXMRzPaIRuv7xQ1lA1t/ZvxDtawKHsO8pvB5g5XJUVlDVua99TirF Bx8ojzZ3D58EMQXRIokajJ3WPwVfN22wy9lSCb+ats4awFYPfwlJSQC/Ajek0WvLzGql a5/cP14HGLJ2pJTgVK+t6Vnpljf1YidNiaGMvtA9IuueS+fE4kxayIu3f1j6CByHdZ5L FcUBT6ZLxYpOzNMdQVK1m1mSJoY4MlxjjdNdnej2SCAzcY4yUCAWW8kaxtP7H4FjiqR0 Yj/OH/54utAgpssV1vaj+hJiSQOlDogzWpu853AXs0uOMpW0gNtf6hs0Ri7WjZHjCo5h FJgQ== X-Gm-Message-State: AOAM531HW+3gW5Q/z5j9ltJcqxlvHo4q8bjStx6AyHNMmmsu1eCmy0y8 xytLGgh/szpTlnF/In+RWSdWjJlZE2w= X-Google-Smtp-Source: ABdhPJzn/dk5F6vKsgcjMGz2g3gG+SfFo1lKjIq/IjER5B4wgepri8HX93KXidYuxmpEMu1gEG4HsQ== X-Received: by 2002:a5d:4443:: with SMTP id x3mr17033288wrr.409.1612162732367; Sun, 31 Jan 2021 22:58:52 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id p15sm25487707wrt.15.2021.01.31.22.58.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 31 Jan 2021 22:58:51 -0800 (PST) Message-Id: <085085a433072076ffa45d149cdf4e0b6b55d918.1612162726.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 01 Feb 2021 06:58:38 +0000 Subject: [PATCH v7 04/11] t6600-test-reach: generalize *_three_modes Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar In a preparatory step to implement generation number v2, we add tests to ensure Git can read and parse commit-graph files without Generation Data chunk. These files represent commit-graph files written by Old Git and are neccesary for backward compatability. We extend run_three_modes() and test_three_modes() to *_all_modes() with the fourth mode being "commit-graph without generation data chunk". Signed-off-by: Abhishek Kumar --- t/t6600-test-reach.sh | 62 +++++++++++++++++++++---------------------- 1 file changed, 31 insertions(+), 31 deletions(-) diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index f807276337d..af10f0dc090 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -58,7 +58,7 @@ test_expect_success 'setup' ' git config core.commitGraph true ' -run_three_modes () { +run_all_modes () { test_when_finished rm -rf .git/objects/info/commit-graph && "$@" actual && test_cmp expect actual && @@ -70,8 +70,8 @@ run_three_modes () { test_cmp expect actual } -test_three_modes () { - run_three_modes test-tool reach "$@" +test_all_modes () { + run_all_modes test-tool reach "$@" } test_expect_success 'ref_newer:miss' ' @@ -80,7 +80,7 @@ test_expect_success 'ref_newer:miss' ' B:commit-4-9 EOF echo "ref_newer(A,B):0" >expect && - test_three_modes ref_newer + test_all_modes ref_newer ' test_expect_success 'ref_newer:hit' ' @@ -89,7 +89,7 @@ test_expect_success 'ref_newer:hit' ' B:commit-2-3 EOF echo "ref_newer(A,B):1" >expect && - test_three_modes ref_newer + test_all_modes ref_newer ' test_expect_success 'in_merge_bases:hit' ' @@ -98,7 +98,7 @@ test_expect_success 'in_merge_bases:hit' ' B:commit-8-8 EOF echo "in_merge_bases(A,B):1" >expect && - test_three_modes in_merge_bases + test_all_modes in_merge_bases ' test_expect_success 'in_merge_bases:miss' ' @@ -107,7 +107,7 @@ test_expect_success 'in_merge_bases:miss' ' B:commit-5-9 EOF echo "in_merge_bases(A,B):0" >expect && - test_three_modes in_merge_bases + test_all_modes in_merge_bases ' test_expect_success 'in_merge_bases_many:hit' ' @@ -117,7 +117,7 @@ test_expect_success 'in_merge_bases_many:hit' ' X:commit-5-7 EOF echo "in_merge_bases_many(A,X):1" >expect && - test_three_modes in_merge_bases_many + test_all_modes in_merge_bases_many ' test_expect_success 'in_merge_bases_many:miss' ' @@ -127,7 +127,7 @@ test_expect_success 'in_merge_bases_many:miss' ' X:commit-8-6 EOF echo "in_merge_bases_many(A,X):0" >expect && - test_three_modes in_merge_bases_many + test_all_modes in_merge_bases_many ' test_expect_success 'in_merge_bases_many:miss-heuristic' ' @@ -137,7 +137,7 @@ test_expect_success 'in_merge_bases_many:miss-heuristic' ' X:commit-6-6 EOF echo "in_merge_bases_many(A,X):0" >expect && - test_three_modes in_merge_bases_many + test_all_modes in_merge_bases_many ' test_expect_success 'is_descendant_of:hit' ' @@ -148,7 +148,7 @@ test_expect_success 'is_descendant_of:hit' ' X:commit-1-1 EOF echo "is_descendant_of(A,X):1" >expect && - test_three_modes is_descendant_of + test_all_modes is_descendant_of ' test_expect_success 'is_descendant_of:miss' ' @@ -159,7 +159,7 @@ test_expect_success 'is_descendant_of:miss' ' X:commit-7-6 EOF echo "is_descendant_of(A,X):0" >expect && - test_three_modes is_descendant_of + test_all_modes is_descendant_of ' test_expect_success 'get_merge_bases_many' ' @@ -174,7 +174,7 @@ test_expect_success 'get_merge_bases_many' ' git rev-parse commit-5-6 \ commit-4-7 | sort } >expect && - test_three_modes get_merge_bases_many + test_all_modes get_merge_bases_many ' test_expect_success 'reduce_heads' ' @@ -196,7 +196,7 @@ test_expect_success 'reduce_heads' ' commit-2-8 \ commit-1-10 | sort } >expect && - test_three_modes reduce_heads + test_all_modes reduce_heads ' test_expect_success 'can_all_from_reach:hit' ' @@ -219,7 +219,7 @@ test_expect_success 'can_all_from_reach:hit' ' Y:commit-8-1 EOF echo "can_all_from_reach(X,Y):1" >expect && - test_three_modes can_all_from_reach + test_all_modes can_all_from_reach ' test_expect_success 'can_all_from_reach:miss' ' @@ -241,7 +241,7 @@ test_expect_success 'can_all_from_reach:miss' ' Y:commit-8-5 EOF echo "can_all_from_reach(X,Y):0" >expect && - test_three_modes can_all_from_reach + test_all_modes can_all_from_reach ' test_expect_success 'can_all_from_reach_with_flag: tags case' ' @@ -264,7 +264,7 @@ test_expect_success 'can_all_from_reach_with_flag: tags case' ' Y:commit-8-1 EOF echo "can_all_from_reach_with_flag(X,_,_,0,0):1" >expect && - test_three_modes can_all_from_reach_with_flag + test_all_modes can_all_from_reach_with_flag ' test_expect_success 'commit_contains:hit' ' @@ -280,8 +280,8 @@ test_expect_success 'commit_contains:hit' ' X:commit-9-3 EOF echo "commit_contains(_,A,X,_):1" >expect && - test_three_modes commit_contains && - test_three_modes commit_contains --tag + test_all_modes commit_contains && + test_all_modes commit_contains --tag ' test_expect_success 'commit_contains:miss' ' @@ -297,8 +297,8 @@ test_expect_success 'commit_contains:miss' ' X:commit-9-3 EOF echo "commit_contains(_,A,X,_):0" >expect && - test_three_modes commit_contains && - test_three_modes commit_contains --tag + test_all_modes commit_contains && + test_all_modes commit_contains --tag ' test_expect_success 'rev-list: basic topo-order' ' @@ -310,7 +310,7 @@ test_expect_success 'rev-list: basic topo-order' ' commit-6-2 commit-5-2 commit-4-2 commit-3-2 commit-2-2 commit-1-2 \ commit-6-1 commit-5-1 commit-4-1 commit-3-1 commit-2-1 commit-1-1 \ >expect && - run_three_modes git rev-list --topo-order commit-6-6 + run_all_modes git rev-list --topo-order commit-6-6 ' test_expect_success 'rev-list: first-parent topo-order' ' @@ -322,7 +322,7 @@ test_expect_success 'rev-list: first-parent topo-order' ' commit-6-2 \ commit-6-1 commit-5-1 commit-4-1 commit-3-1 commit-2-1 commit-1-1 \ >expect && - run_three_modes git rev-list --first-parent --topo-order commit-6-6 + run_all_modes git rev-list --first-parent --topo-order commit-6-6 ' test_expect_success 'rev-list: range topo-order' ' @@ -334,7 +334,7 @@ test_expect_success 'rev-list: range topo-order' ' commit-6-2 commit-5-2 commit-4-2 \ commit-6-1 commit-5-1 commit-4-1 \ >expect && - run_three_modes git rev-list --topo-order commit-3-3..commit-6-6 + run_all_modes git rev-list --topo-order commit-3-3..commit-6-6 ' test_expect_success 'rev-list: range topo-order' ' @@ -346,7 +346,7 @@ test_expect_success 'rev-list: range topo-order' ' commit-6-2 commit-5-2 commit-4-2 \ commit-6-1 commit-5-1 commit-4-1 \ >expect && - run_three_modes git rev-list --topo-order commit-3-8..commit-6-6 + run_all_modes git rev-list --topo-order commit-3-8..commit-6-6 ' test_expect_success 'rev-list: first-parent range topo-order' ' @@ -358,7 +358,7 @@ test_expect_success 'rev-list: first-parent range topo-order' ' commit-6-2 \ commit-6-1 commit-5-1 commit-4-1 \ >expect && - run_three_modes git rev-list --first-parent --topo-order commit-3-8..commit-6-6 + run_all_modes git rev-list --first-parent --topo-order commit-3-8..commit-6-6 ' test_expect_success 'rev-list: ancestry-path topo-order' ' @@ -368,7 +368,7 @@ test_expect_success 'rev-list: ancestry-path topo-order' ' commit-6-4 commit-5-4 commit-4-4 commit-3-4 \ commit-6-3 commit-5-3 commit-4-3 \ >expect && - run_three_modes git rev-list --topo-order --ancestry-path commit-3-3..commit-6-6 + run_all_modes git rev-list --topo-order --ancestry-path commit-3-3..commit-6-6 ' test_expect_success 'rev-list: symmetric difference topo-order' ' @@ -382,7 +382,7 @@ test_expect_success 'rev-list: symmetric difference topo-order' ' commit-3-8 commit-2-8 commit-1-8 \ commit-3-7 commit-2-7 commit-1-7 \ >expect && - run_three_modes git rev-list --topo-order commit-3-8...commit-6-6 + run_all_modes git rev-list --topo-order commit-3-8...commit-6-6 ' test_expect_success 'get_reachable_subset:all' ' @@ -402,7 +402,7 @@ test_expect_success 'get_reachable_subset:all' ' commit-1-7 \ commit-5-6 | sort ) >expect && - test_three_modes get_reachable_subset + test_all_modes get_reachable_subset ' test_expect_success 'get_reachable_subset:some' ' @@ -420,7 +420,7 @@ test_expect_success 'get_reachable_subset:some' ' git rev-parse commit-3-3 \ commit-1-7 | sort ) >expect && - test_three_modes get_reachable_subset + test_all_modes get_reachable_subset ' test_expect_success 'get_reachable_subset:none' ' @@ -434,7 +434,7 @@ test_expect_success 'get_reachable_subset:none' ' Y:commit-2-8 EOF echo "get_reachable_subset(X,Y)" >expect && - test_three_modes get_reachable_subset + test_all_modes get_reachable_subset ' test_done From patchwork Mon Feb 1 06:58:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12058175 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 35F50C433DB for ; Mon, 1 Feb 2021 07:00:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CB97F64E31 for ; Mon, 1 Feb 2021 07:00:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230189AbhBAHAO (ORCPT ); Mon, 1 Feb 2021 02:00:14 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44618 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229703AbhBAHAM (ORCPT ); Mon, 1 Feb 2021 02:00:12 -0500 Received: from mail-wm1-x332.google.com (mail-wm1-x332.google.com [IPv6:2a00:1450:4864:20::332]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2AB11C0613ED for ; Sun, 31 Jan 2021 22:58:55 -0800 (PST) Received: by mail-wm1-x332.google.com with SMTP id y187so12204050wmd.3 for ; Sun, 31 Jan 2021 22:58:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=Qm98VKlkwBFr4t215FdMpwLOpqWNk9wC7rfbJCXWIUU=; b=DZjcD6/Js/kK2HevoUYoQlXVvSLIXfbzYnBq6n2ar7TLUOmnexo7XOYkSSQxxX04rv FovBNB2tyMpffefqDrqwFzPdO5GbFNgf5qY0szTHLZAt9q1IWi709U4i3dlJkWniPcDc efKpKbfQgl4dxef/TpnNzQS/ThIsbrOoGfqnVxwG2V102QokbGBMc+OYFFIloy05Vgkx zlfaPvvhaq9Bn61Dh7BkY0vzoxGt6y0ZCytQSqO2sYkn7iweAtYvyihxJyrlpSR1NsB8 wzeuPpxH2Qn3r4RcQsYTI1bVn5x0n/xORCVpmamRbG+UlSQ/DSeWKFG0JSaHafbWHYxx SNtg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=Qm98VKlkwBFr4t215FdMpwLOpqWNk9wC7rfbJCXWIUU=; b=fHomFmZxEp5zfGIwHqMs9r4Hx0YSqS5sPCdJu6qDUnbzWZpY5eRkBRTz9+Qo3cTQh3 kZ2OQ7ZsKrQorYDxMR3zonntLc49ieRZr13XNvt1Iu8JMGQCPTDHBCfo6B1UDa7mcAOE tI+Dtcr3xbc2kBDxfQVE24MEbK00OoITcpTqPB8ZPy1cW1z9ckbWdiYm3hGZUdUc2Uqs uMVhMNjBKCCxjFWdXps40fBRZxacQAJbUSIN8vIbBT7xeow2zkztNhbv/WDAQjvFm5/R qXjfKiREGm7R8lXYyW3fcPeauSjR4RKeclGEy8Ym4b8Jk95CQ38c3O653miGLBFkCRla qXig== X-Gm-Message-State: AOAM5325JJWsgZDXbDsvkf7Rjw4LH7qTVibFRuuAnnKiBOszlVs+70Yi FcR5mmtsIU5XdXC1lMX3Z2mBS+T7PPg= X-Google-Smtp-Source: ABdhPJx2LbAw45Sp54nE4TWdnQbywKw+aU5w/iv1ffvPHq2LHgFIyEtvAMtXFTV2r9RTYFqbeQE5fQ== X-Received: by 2002:a05:600c:618:: with SMTP id o24mr13740887wmm.82.1612162733593; Sun, 31 Jan 2021 22:58:53 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id t18sm21827945wrr.56.2021.01.31.22.58.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 31 Jan 2021 22:58:52 -0800 (PST) Message-Id: <3b1aae4106a409cf4652e07295683f92d7bf1fb2.1612162726.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 01 Feb 2021 06:58:39 +0000 Subject: [PATCH v7 05/11] commit-graph: add a slab to store topological levels Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar In a later commit we will introduce corrected commit date as the generation number v2. Corrected commit dates will be stored in the new seperate Generation Data chunk. However, to ensure backwards compatibility with "Old" Git we need to continue to write generation number v1 (topological levels) to the commit data chunk. Thus, we need to compute and store both versions of generation numbers to write the commit-graph file. Therefore, let's introduce a commit-slab `topo_level_slab` to store topological levels; corrected commit date will be stored in the member `generation` of struct commit_graph_data. The macros `GENERATION_NUMBER_INFINITY` and `GENERATION_NUMBER_ZERO` mark commits not in the commit-graph file and commits written by a version of Git that did not compute generation numbers respectively. Generation numbers are computed identically for both kinds of commits. A "slab-miss" should return `GENERATION_NUMBER_INFINITY` as the commit is not in the commit-graph file. However, since the slab is zero-initialized, it returns 0 (or rather `GENERATION_NUMBER_ZERO`). Thus, we no longer need to check if the topological level of a commit is `GENERATION_NUMBER_INFINITY`. We will add a pointer to the slab in `struct write_commit_graph_context` and `struct commit_graph` to populate the slab in `fill_commit_graph_info` if the commit has a pre-computed topological level as in case of split commit-graphs. Signed-off-by: Abhishek Kumar --- commit-graph.c | 45 ++++++++++++++++++++++++++++++--------------- commit-graph.h | 1 + 2 files changed, 31 insertions(+), 15 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 955418bd6e5..2f344cce151 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -64,6 +64,8 @@ void git_test_write_commit_graph_or_die(void) /* Remember to update object flag allocation in object.h */ #define REACHABLE (1u<<15) +define_commit_slab(topo_level_slab, uint32_t); + /* Keep track of the order in which commits are added to our list. */ define_commit_slab(commit_pos, int); static struct commit_pos commit_pos = COMMIT_SLAB_INIT(1, commit_pos); @@ -772,6 +774,9 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, item->date = (timestamp_t)((date_high << 32) | date_low); graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + + if (g->topo_levels) + *topo_level_slab_at(g->topo_levels, item) = get_be32(commit_data + g->hash_len + 8) >> 2; } static inline void set_commit_tree(struct commit *c, struct tree *t) @@ -960,6 +965,7 @@ struct write_commit_graph_context { changed_paths:1, order_by_pack:1; + struct topo_level_slab *topo_levels; const struct commit_graph_opts *opts; size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; @@ -1106,7 +1112,7 @@ static int write_graph_chunk_data(struct hashfile *f, else packedDate[0] = 0; - packedDate[0] |= htonl(commit_graph_data_at(*list)->generation << 2); + packedDate[0] |= htonl(*topo_level_slab_at(ctx->topo_levels, *list) << 2); packedDate[1] = htonl((*list)->date); hashwrite(f, packedDate, 8); @@ -1336,11 +1342,10 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) _("Computing commit graph generation numbers"), ctx->commits.nr); for (i = 0; i < ctx->commits.nr; i++) { - uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation; + uint32_t level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]); display_progress(ctx->progress, i + 1); - if (generation != GENERATION_NUMBER_INFINITY && - generation != GENERATION_NUMBER_ZERO) + if (level != GENERATION_NUMBER_ZERO) continue; commit_list_insert(ctx->commits.list[i], &list); @@ -1348,29 +1353,26 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) struct commit *current = list->item; struct commit_list *parent; int all_parents_computed = 1; - uint32_t max_generation = 0; + uint32_t max_level = 0; for (parent = current->parents; parent; parent = parent->next) { - generation = commit_graph_data_at(parent->item)->generation; + level = *topo_level_slab_at(ctx->topo_levels, parent->item); - if (generation == GENERATION_NUMBER_INFINITY || - generation == GENERATION_NUMBER_ZERO) { + if (level == GENERATION_NUMBER_ZERO) { all_parents_computed = 0; commit_list_insert(parent->item, &list); break; - } else if (generation > max_generation) { - max_generation = generation; + } else if (level > max_level) { + max_level = level; } } if (all_parents_computed) { - struct commit_graph_data *data = commit_graph_data_at(current); - - data->generation = max_generation + 1; pop_commit(&list); - if (data->generation > GENERATION_NUMBER_MAX) - data->generation = GENERATION_NUMBER_MAX; + if (max_level > GENERATION_NUMBER_MAX - 1) + max_level = GENERATION_NUMBER_MAX - 1; + *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1; } } } @@ -2106,6 +2108,7 @@ int write_commit_graph(struct object_directory *odb, int res = 0; int replace = 0; struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; + struct topo_level_slab topo_levels; prepare_repo_settings(the_repository); if (!the_repository->settings.core_commit_graph) { @@ -2132,6 +2135,18 @@ int write_commit_graph(struct object_directory *odb, bloom_settings.max_changed_paths); ctx->bloom_settings = &bloom_settings; + init_topo_level_slab(&topo_levels); + ctx->topo_levels = &topo_levels; + + if (ctx->r->objects->commit_graph) { + struct commit_graph *g = ctx->r->objects->commit_graph; + + while (g) { + g->topo_levels = &topo_levels; + g = g->base_graph; + } + } + if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS) ctx->changed_paths = 1; if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) { diff --git a/commit-graph.h b/commit-graph.h index f8e92500c6e..00f00745b79 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -73,6 +73,7 @@ struct commit_graph { const unsigned char *chunk_bloom_indexes; const unsigned char *chunk_bloom_data; + struct topo_level_slab *topo_levels; struct bloom_filter_settings *bloom_filter_settings; }; From patchwork Mon Feb 1 06:58:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12058177 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1C99DC433DB for ; Mon, 1 Feb 2021 07:00:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C1DD564E31 for ; Mon, 1 Feb 2021 07:00:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231376AbhBAHA0 (ORCPT ); Mon, 1 Feb 2021 02:00:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44620 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229897AbhBAHAM (ORCPT ); Mon, 1 Feb 2021 02:00:12 -0500 Received: from mail-wr1-x435.google.com (mail-wr1-x435.google.com [IPv6:2a00:1450:4864:20::435]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DEF77C061786 for ; Sun, 31 Jan 2021 22:58:56 -0800 (PST) Received: by mail-wr1-x435.google.com with SMTP id p15so15280385wrq.8 for ; Sun, 31 Jan 2021 22:58:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=R4qr6SbmnBm43Y/YrJpB+Jon+KbO/NPJ3MEdVUb+AD4=; b=ehBp+sEzydggjD6Ryg8m9z064KOAVVL8aG5Gopva7BhtSbRo3rsSXTzGj6uE+kayxo 16IoxmEmWZ9gc/V2Sjx3rVImrzvuYBL0Gp+0+Jyu3en0m53kh/siFHedHcvdaiBnR9Tt x58Ho4/prlRty4nVbw4CEYeGxl4FkZLVzQBwpnyQ0ajTWsah94czuNGCyFYGmwLRTDHu qZsj2vRmZOH0o7PpvmVYiUDz0KanBUNR7mMt+2zUmBKEFA4tqXeipuM3wInO+YKmoZD0 ksxiUGJs8rbCw1+zlCHbN2IO1xBONhpWJTn6O/e/jZHnde5K6lhamEEm3701+UUIuCFI 97yw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=R4qr6SbmnBm43Y/YrJpB+Jon+KbO/NPJ3MEdVUb+AD4=; b=hZ4ZM6TYidNjQvKE/62ab0mjCrvbcTJSPUYjOUt1/y5+vu01Abc8UOTpOssEN5oMGB akRajEAT072ylbtnePRMM97ab6PCOQQXqtEFkfvbyfFIFCTYMUYRGlihJ7nOQZ9bnB9L JsV6sZQW5jn/LeuD03ENRp2bk/wJhqy7NDcu9ZnQ/2a2mWh9zCRb/OjpPW5Zn+0meerN SDpu78BZq05/zYyiCMBu5T/Pv6FUw9dgwXElidJIhbWy5DPHKnphTDABxgCiCcZYwOwm uvvKV6aATMXilmVxEaD4JgCOwvFiyHAjAnCr3ZCvG5DN8RkMyrGXjQdlGwWgxkOKNuA2 tPng== X-Gm-Message-State: AOAM530GEcqzoxEufxVd/VMu6i7nyjSVuNkw9VXdithfs1JgUMxtfMh7 vw10YkbeTUZJwS3gZzsFFXBwiSG/uFQ= X-Google-Smtp-Source: ABdhPJyFph9aFXAv3lW406/reOT/7fW04UO8tb0DrH9uMzecBnYiq2/4BcigVPwZphdeQ6/euPBiNg== X-Received: by 2002:adf:cd10:: with SMTP id w16mr16395910wrm.90.1612162735383; Sun, 31 Jan 2021 22:58:55 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id x25sm20010092wmk.20.2021.01.31.22.58.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 31 Jan 2021 22:58:54 -0800 (PST) Message-Id: In-Reply-To: References: Date: Mon, 01 Feb 2021 06:58:40 +0000 Subject: [PATCH v7 06/11] commit-graph: return 64-bit generation number Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar In a preparatory step for introducing corrected commit dates, let's return timestamp_t values from commit_graph_generation(), use timestamp_t for local variables and define GENERATION_NUMBER_INFINITY as (2 ^ 63 - 1) instead. We rename GENERATION_NUMBER_MAX to GENERATION_NUMBER_V1_MAX to represent the largest topological level we can store in the commit data chunk. With corrected commit dates implemented, we will have two such *_MAX variables to denote the largest offset and largest topological level that can be stored. Signed-off-by: Abhishek Kumar --- commit-graph.c | 22 +++++++++++----------- commit-graph.h | 4 ++-- commit-reach.c | 36 ++++++++++++++++++------------------ commit-reach.h | 2 +- commit.c | 4 ++-- commit.h | 4 ++-- revision.c | 10 +++++----- upload-pack.c | 2 +- 8 files changed, 42 insertions(+), 42 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 2f344cce151..8f17815021d 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -101,7 +101,7 @@ uint32_t commit_graph_position(const struct commit *c) return data ? data->graph_pos : COMMIT_NOT_FROM_GRAPH; } -uint32_t commit_graph_generation(const struct commit *c) +timestamp_t commit_graph_generation(const struct commit *c) { struct commit_graph_data *data = commit_graph_data_slab_peek(&commit_graph_data_slab, c); @@ -150,8 +150,8 @@ static int commit_gen_cmp(const void *va, const void *vb) const struct commit *a = *(const struct commit **)va; const struct commit *b = *(const struct commit **)vb; - uint32_t generation_a = commit_graph_data_at(a)->generation; - uint32_t generation_b = commit_graph_data_at(b)->generation; + const timestamp_t generation_a = commit_graph_data_at(a)->generation; + const timestamp_t generation_b = commit_graph_data_at(b)->generation; /* lower generation commits first */ if (generation_a < generation_b) return -1; @@ -1370,8 +1370,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) if (all_parents_computed) { pop_commit(&list); - if (max_level > GENERATION_NUMBER_MAX - 1) - max_level = GENERATION_NUMBER_MAX - 1; + if (max_level > GENERATION_NUMBER_V1_MAX - 1) + max_level = GENERATION_NUMBER_V1_MAX - 1; *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1; } } @@ -2367,8 +2367,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) for (i = 0; i < g->num_commits; i++) { struct commit *graph_commit, *odb_commit; struct commit_list *graph_parents, *odb_parents; - uint32_t max_generation = 0; - uint32_t generation; + timestamp_t max_generation = 0; + timestamp_t generation; display_progress(progress, i + 1); hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); @@ -2432,16 +2432,16 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) continue; /* - * If one of our parents has generation GENERATION_NUMBER_MAX, then - * our generation is also GENERATION_NUMBER_MAX. Decrement to avoid + * If one of our parents has generation GENERATION_NUMBER_V1_MAX, then + * our generation is also GENERATION_NUMBER_V1_MAX. Decrement to avoid * extra logic in the following condition. */ - if (max_generation == GENERATION_NUMBER_MAX) + if (max_generation == GENERATION_NUMBER_V1_MAX) max_generation--; generation = commit_graph_generation(graph_commit); if (generation != max_generation + 1) - graph_report(_("commit-graph generation for commit %s is %u != %u"), + graph_report(_("commit-graph generation for commit %s is %"PRItime" != %"PRItime), oid_to_hex(&cur_oid), generation, max_generation + 1); diff --git a/commit-graph.h b/commit-graph.h index 00f00745b79..2e9aa7824ee 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -145,12 +145,12 @@ void disable_commit_graph(struct repository *r); struct commit_graph_data { uint32_t graph_pos; - uint32_t generation; + timestamp_t generation; }; /* * Commits should be parsed before accessing generation, graph positions. */ -uint32_t commit_graph_generation(const struct commit *); +timestamp_t commit_graph_generation(const struct commit *); uint32_t commit_graph_position(const struct commit *); #endif diff --git a/commit-reach.c b/commit-reach.c index 50175b159e7..9b24b0378d5 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -32,12 +32,12 @@ static int queue_has_nonstale(struct prio_queue *queue) static struct commit_list *paint_down_to_common(struct repository *r, struct commit *one, int n, struct commit **twos, - int min_generation) + timestamp_t min_generation) { struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; struct commit_list *result = NULL; int i; - uint32_t last_gen = GENERATION_NUMBER_INFINITY; + timestamp_t last_gen = GENERATION_NUMBER_INFINITY; if (!min_generation) queue.compare = compare_commits_by_commit_date; @@ -58,10 +58,10 @@ static struct commit_list *paint_down_to_common(struct repository *r, struct commit *commit = prio_queue_get(&queue); struct commit_list *parents; int flags; - uint32_t generation = commit_graph_generation(commit); + timestamp_t generation = commit_graph_generation(commit); if (min_generation && generation > last_gen) - BUG("bad generation skip %8x > %8x at %s", + BUG("bad generation skip %"PRItime" > %"PRItime" at %s", generation, last_gen, oid_to_hex(&commit->object.oid)); last_gen = generation; @@ -177,12 +177,12 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt repo_parse_commit(r, array[i]); for (i = 0; i < cnt; i++) { struct commit_list *common; - uint32_t min_generation = commit_graph_generation(array[i]); + timestamp_t min_generation = commit_graph_generation(array[i]); if (redundant[i]) continue; for (j = filled = 0; j < cnt; j++) { - uint32_t curr_generation; + timestamp_t curr_generation; if (i == j || redundant[j]) continue; filled_index[filled] = j; @@ -321,7 +321,7 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit, { struct commit_list *bases; int ret = 0, i; - uint32_t generation, max_generation = GENERATION_NUMBER_ZERO; + timestamp_t generation, max_generation = GENERATION_NUMBER_ZERO; if (repo_parse_commit(r, commit)) return ret; @@ -470,7 +470,7 @@ static int in_commit_list(const struct commit_list *want, struct commit *c) static enum contains_result contains_test(struct commit *candidate, const struct commit_list *want, struct contains_cache *cache, - uint32_t cutoff) + timestamp_t cutoff) { enum contains_result *cached = contains_cache_at(cache, candidate); @@ -506,11 +506,11 @@ static enum contains_result contains_tag_algo(struct commit *candidate, { struct contains_stack contains_stack = { 0, 0, NULL }; enum contains_result result; - uint32_t cutoff = GENERATION_NUMBER_INFINITY; + timestamp_t cutoff = GENERATION_NUMBER_INFINITY; const struct commit_list *p; for (p = want; p; p = p->next) { - uint32_t generation; + timestamp_t generation; struct commit *c = p->item; load_commit_graph_info(the_repository, c); generation = commit_graph_generation(c); @@ -566,8 +566,8 @@ static int compare_commits_by_gen(const void *_a, const void *_b) const struct commit *a = *(const struct commit * const *)_a; const struct commit *b = *(const struct commit * const *)_b; - uint32_t generation_a = commit_graph_generation(a); - uint32_t generation_b = commit_graph_generation(b); + timestamp_t generation_a = commit_graph_generation(a); + timestamp_t generation_b = commit_graph_generation(b); if (generation_a < generation_b) return -1; @@ -580,7 +580,7 @@ int can_all_from_reach_with_flag(struct object_array *from, unsigned int with_flag, unsigned int assign_flag, time_t min_commit_date, - uint32_t min_generation) + timestamp_t min_generation) { struct commit **list = NULL; int i; @@ -681,13 +681,13 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, time_t min_commit_date = cutoff_by_min_date ? from->item->date : 0; struct commit_list *from_iter = from, *to_iter = to; int result; - uint32_t min_generation = GENERATION_NUMBER_INFINITY; + timestamp_t min_generation = GENERATION_NUMBER_INFINITY; while (from_iter) { add_object_array(&from_iter->item->object, NULL, &from_objs); if (!parse_commit(from_iter->item)) { - uint32_t generation; + timestamp_t generation; if (from_iter->item->date < min_commit_date) min_commit_date = from_iter->item->date; @@ -701,7 +701,7 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, while (to_iter) { if (!parse_commit(to_iter->item)) { - uint32_t generation; + timestamp_t generation; if (to_iter->item->date < min_commit_date) min_commit_date = to_iter->item->date; @@ -741,13 +741,13 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, struct commit_list *found_commits = NULL; struct commit **to_last = to + nr_to; struct commit **from_last = from + nr_from; - uint32_t min_generation = GENERATION_NUMBER_INFINITY; + timestamp_t min_generation = GENERATION_NUMBER_INFINITY; int num_to_find = 0; struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; for (item = to; item < to_last; item++) { - uint32_t generation; + timestamp_t generation; struct commit *c = *item; parse_commit(c); diff --git a/commit-reach.h b/commit-reach.h index b49ad71a317..148b56fea50 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -87,7 +87,7 @@ int can_all_from_reach_with_flag(struct object_array *from, unsigned int with_flag, unsigned int assign_flag, time_t min_commit_date, - uint32_t min_generation); + timestamp_t min_generation); int can_all_from_reach(struct commit_list *from, struct commit_list *to, int commit_date_cutoff); diff --git a/commit.c b/commit.c index bab8d5ab07c..4c717329ee0 100644 --- a/commit.c +++ b/commit.c @@ -753,8 +753,8 @@ int compare_commits_by_author_date(const void *a_, const void *b_, int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void *unused) { const struct commit *a = a_, *b = b_; - const uint32_t generation_a = commit_graph_generation(a), - generation_b = commit_graph_generation(b); + const timestamp_t generation_a = commit_graph_generation(a), + generation_b = commit_graph_generation(b); /* newer commits first */ if (generation_a < generation_b) diff --git a/commit.h b/commit.h index f4e7b0158e2..742d96c41e8 100644 --- a/commit.h +++ b/commit.h @@ -11,8 +11,8 @@ #include "commit-slab.h" #define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF -#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF -#define GENERATION_NUMBER_MAX 0x3FFFFFFF +#define GENERATION_NUMBER_INFINITY ((1ULL << 63) - 1) +#define GENERATION_NUMBER_V1_MAX 0x3FFFFFFF #define GENERATION_NUMBER_ZERO 0 struct commit_list { diff --git a/revision.c b/revision.c index 5474001331a..a54d2bd28df 100644 --- a/revision.c +++ b/revision.c @@ -3302,7 +3302,7 @@ define_commit_slab(indegree_slab, int); define_commit_slab(author_date_slab, timestamp_t); struct topo_walk_info { - uint32_t min_generation; + timestamp_t min_generation; struct prio_queue explore_queue; struct prio_queue indegree_queue; struct prio_queue topo_queue; @@ -3370,7 +3370,7 @@ static void explore_walk_step(struct rev_info *revs) } static void explore_to_depth(struct rev_info *revs, - uint32_t gen_cutoff) + timestamp_t gen_cutoff) { struct topo_walk_info *info = revs->topo_walk_info; struct commit *c; @@ -3415,7 +3415,7 @@ static void indegree_walk_step(struct rev_info *revs) } static void compute_indegrees_to_depth(struct rev_info *revs, - uint32_t gen_cutoff) + timestamp_t gen_cutoff) { struct topo_walk_info *info = revs->topo_walk_info; struct commit *c; @@ -3473,7 +3473,7 @@ static void init_topo_walk(struct rev_info *revs) info->min_generation = GENERATION_NUMBER_INFINITY; for (list = revs->commits; list; list = list->next) { struct commit *c = list->item; - uint32_t generation; + timestamp_t generation; if (repo_parse_commit_gently(revs->repo, c, 1)) continue; @@ -3541,7 +3541,7 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit) for (p = commit->parents; p; p = p->next) { struct commit *parent = p->item; int *pi; - uint32_t generation; + timestamp_t generation; if (parent->object.flags & UNINTERESTING) continue; diff --git a/upload-pack.c b/upload-pack.c index 3b66bf92ba8..b87607e0dd4 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -500,7 +500,7 @@ static int got_oid(struct upload_pack_data *data, static int ok_to_give_up(struct upload_pack_data *data) { - uint32_t min_generation = GENERATION_NUMBER_ZERO; + timestamp_t min_generation = GENERATION_NUMBER_ZERO; if (!data->have_obj.nr) return 0; From patchwork Mon Feb 1 06:58:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12058185 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 64A57C433DB for ; Mon, 1 Feb 2021 07:00:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1DA2564E31 for ; Mon, 1 Feb 2021 07:00:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231572AbhBAHAq (ORCPT ); Mon, 1 Feb 2021 02:00:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44636 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230215AbhBAHAQ (ORCPT ); Mon, 1 Feb 2021 02:00:16 -0500 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 213BDC061788 for ; Sun, 31 Jan 2021 22:58:58 -0800 (PST) Received: by mail-wr1-x42b.google.com with SMTP id c4so12609049wru.9 for ; Sun, 31 Jan 2021 22:58:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=mXhJuCaPYYFmBKnJdgRD818ZWMUQjVLF9ZfvmsfqxHk=; b=C7ydMNitUr87dHK4UFM3xXnQ0paDwbU7PsaSnGTcxDu/sIIGAgL5pPfHIzI3szw91I NJ5R7ZduV4gZdD+hhDl5raUxwGqVwMMKxmqz7cHZ5t2/15KxzmP4Lv8ztXqwGjUvsGqz JbxalvrSiITAAp5HahnV9dl+KyZxbgR3av6Q4tFNx+RGySFTDfw3W3+Wiz7dcCMAkd18 5BrhmZryhdk9GglBK5w5TXMDYTJEBdjnnWRhPKOqERzn0BSAZYgDAf8oUc1jeI2G7xtu 4Rf2MERUYm9fkBj6nhHwwZQ9RCqw7L4I9jK4Y7H8SPYeNP8WSXnuSR9E4LTFRDK2qGNs i2jw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=mXhJuCaPYYFmBKnJdgRD818ZWMUQjVLF9ZfvmsfqxHk=; b=fQtV/QZgzGWFd6TLMUGl6fvt5sarn2lYZR/3fcWF7gJoXkzXj+omn7EBfaTs0xKO0u lASZ5inG+axakyB+tGrbn4aKrkA2xiHIStEIHfy0EJ209TZS5T5JrN5P/G0Iwo6tJlen ryZ1VmTXe1uA2qnv/gRPuP1XVkTrQ0UewnM+HZXVna9Aj4gLYPP/wSGmGGTCUxfsQYp6 Z7THKQkeGnpjb2eBWkEQ3pV0SFSpqf2E9UomgnrjkGDxTx8yH90PmDTNpusUEMVk24F1 2JCkpDfEJDwb8+2W9xvxAn89cw34SHKqB6BmLoFXiEqkpulcbzzYhB57Q85MgeB3jXBO CJmQ== X-Gm-Message-State: AOAM533NTR36VCE0kaZY8x7wYF7oUajxERuL6MS1goyOLVKWylpSrKTR P7Dm58fO5U2f/OACSi/iKbjDJx8YOjI= X-Google-Smtp-Source: ABdhPJzge4IMOBO6oJN9H7lkiE0M1kemJ0MYJDoCEmbXFIHLz2nk1TB0DYBUylxMQnWjXFZo2YmYRQ== X-Received: by 2002:a5d:5910:: with SMTP id v16mr17032750wrd.29.1612162736390; Sun, 31 Jan 2021 22:58:56 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id v1sm19678257wmj.31.2021.01.31.22.58.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 31 Jan 2021 22:58:55 -0800 (PST) Message-Id: <8647b5d2e38d7c00b4b189b5cba27cc4f61b778c.1612162726.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 01 Feb 2021 06:58:41 +0000 Subject: [PATCH v7 07/11] commit-graph: document generation number v2 Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar Git uses topological levels in the commit-graph file for commit-graph traversal operations like 'git log --graph'. Unfortunately, topological levels can perform worse than committer date when parents of a commit differ greatly in generation numbers [1]. For example, 'git merge-base v4.8 v4.9' on the Linux repository walks 635,579 commits using topological levels and walks 167,468 using committer date. Since 091f4cf3 (commit: don't use generation numbers if not needed, 2018-08-30), 'git merge-base' uses committer date heuristic unless there is a cutoff because of the performance hit. [1] https://lore.kernel.org/git/efa3720fb40638e5d61c6130b55e3348d8e4339e.1535633886.git.gitgitgadget@gmail.com/ Thus, the need for generation number v2 was born. As Git used to die when graph version understood by it and in the commit-graph file are different [2], we needed a way to distinguish between the old and new generation number without incrementing the graph version. [2] https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/ The following candidates were proposed (https://github.com/derrickstolee/gen-test, https://github.com/abhishekkumar2718/git/pull/1): - (Epoch, Date) Pairs. - Maximum Generation Numbers. - Corrected Commit Date. - FELINE Index. - Corrected Commit Date with Monotonically Increasing Offsets. Based on performance, local computability, and immutability (along with the introduction of an additional commit-graph chunk which relieved the requirement of backwards-compatibility) Corrected Commit Date was chosen as generation number v2 and is defined as follows: For a commit C, let its corrected commit date be the maximum of the commit date of C and the corrected commit dates of its parents plus 1. Then corrected commit date offset is the difference between corrected commit date of C and commit date of C. As a special case, a root commit with the timestamp zero has corrected commit date of 1 to distinguish it from GENERATION_NUMBER_ZERO (that is, an uncomputed generation number). While it was proposed initially to store corrected commit date offsets within Commit Data Chunk, storing the offsets in a new chunk did not affect the performance measurably. The new chunk is "Generation DATa (GDAT) chunk" and it stores corrected commit date offsets while CDAT chunk stores topological level. The old versions of Git would ignore GDAT chunk, using topological levels from CDAT chunk. In contrast, new versions of Git would use corrected commit dates, falling back to topological level if the generation data chunk is absent in the commit-graph file. While storing corrected commit date offsets saves us 4 bytes per commit (as compared with storing corrected commit dates directly), it's however possible for the offset to overflow the space allocated. To handle such cases, we introduce a new chunk, _Generation Data Overflow_ (GDOV) that stores the corrected commit date. For overflowing offsets, we set MSB and store the position into the GDOV chunk, in a mechanism similar to the Extra Edges list chunk. For mixed generation number environment (for example new Git on the command line, old Git used by GUI client), we can encounter a mixed-chain commit-graph (a commit-graph chain where some of split commit-graph files have GDAT chunk and others do not). As backward compatibility is one of the goals, we can define the following behavior: While reading a mixed-chain commit-graph version, we fall back on topological levels as corrected commit dates and topological levels cannot be compared directly. When adding new layer to the split commit-graph file, and when merging some or all layers (replacing them in the latter case), the new layer will have GDAT chunk if and only if in the final result there would be no layer without GDAT chunk just below it. Signed-off-by: Abhishek Kumar --- .../technical/commit-graph-format.txt | 28 +++++-- Documentation/technical/commit-graph.txt | 77 +++++++++++++++---- 2 files changed, 86 insertions(+), 19 deletions(-) diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index b3b58880b92..b6658eff188 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -4,11 +4,7 @@ Git commit graph format The Git commit graph stores a list of commit OIDs and some associated metadata, including: -- The generation number of the commit. Commits with no parents have - generation number 1; commits with parents have generation number - one more than the maximum generation number of its parents. We - reserve zero as special, and can be used to mark a generation - number invalid or as "not computed". +- The generation number of the commit. - The root tree OID. @@ -86,13 +82,33 @@ CHUNK DATA: position. If there are more than two parents, the second value has its most-significant bit on and the other bits store an array position into the Extra Edge List chunk. - * The next 8 bytes store the generation number of the commit and + * The next 8 bytes store the topological level (generation number v1) + of the commit and the commit time in seconds since EPOCH. The generation number uses the higher 30 bits of the first 4 bytes, while the commit time uses the 32 bits of the second 4 bytes, along with the lowest 2 bits of the lowest byte, storing the 33rd and 34th bit of the commit time. + Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) [Optional] + * This list of 4-byte values store corrected commit date offsets for the + commits, arranged in the same order as commit data chunk. + * If the corrected commit date offset cannot be stored within 31 bits, + the value has its most-significant bit on and the other bits store + the position of corrected commit date into the Generation Data Overflow + chunk. + * Generation Data chunk is present only when commit-graph file is written + by compatible versions of Git and in case of split commit-graph chains, + the topmost layer also has Generation Data chunk. + + Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional] + * This list of 8-byte values stores the corrected commit date offsets + for commits with corrected commit date offsets that cannot be + stored within 31 bits. + * Generation Data Overflow chunk is present only when Generation Data + chunk is present and atleast one corrected commit date offset cannot + be stored within 31 bits. + Extra Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional] This list of 4-byte values store the second through nth parents for all octopus merges. The second parent value in the commit data stores diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index f14a7659aa8..f05e7bda1a9 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -38,14 +38,31 @@ A consumer may load the following info for a commit from the graph: Values 1-4 satisfy the requirements of parse_commit_gently(). -Define the "generation number" of a commit recursively as follows: +There are two definitions of generation number: +1. Corrected committer dates (generation number v2) +2. Topological levels (generation nummber v1) - * A commit with no parents (a root commit) has generation number one. +Define "corrected committer date" of a commit recursively as follows: - * A commit with at least one parent has generation number one more than - the largest generation number among its parents. + * A commit with no parents (a root commit) has corrected committer date + equal to its committer date. -Equivalently, the generation number of a commit A is one more than the + * A commit with at least one parent has corrected committer date equal to + the maximum of its commiter date and one more than the largest corrected + committer date among its parents. + + * As a special case, a root commit with timestamp zero has corrected commit + date of 1, to be able to distinguish it from GENERATION_NUMBER_ZERO + (that is, an uncomputed corrected commit date). + +Define the "topological level" of a commit recursively as follows: + + * A commit with no parents (a root commit) has topological level of one. + + * A commit with at least one parent has topological level one more than + the largest topological level among its parents. + +Equivalently, the topological level of a commit A is one more than the length of a longest path from A to a root commit. The recursive definition is easier to use for computation and observing the following property: @@ -60,6 +77,9 @@ is easier to use for computation and observing the following property: generation numbers, then we always expand the boundary commit with highest generation number and can easily detect the stopping condition. +The property applies to both versions of generation number, that is both +corrected committer dates and topological levels. + This property can be used to significantly reduce the time it takes to walk commits and determine topological relationships. Without generation numbers, the general heuristic is the following: @@ -67,7 +87,9 @@ numbers, the general heuristic is the following: If A and B are commits with commit time X and Y, respectively, and X < Y, then A _probably_ cannot reach B. -This heuristic is currently used whenever the computation is allowed to +In absence of corrected commit dates (for example, old versions of Git or +mixed generation graph chains), +this heuristic is currently used whenever the computation is allowed to violate topological relationships due to clock skew (such as "git log" with default order), but is not used when the topological order is required (such as merge base calculations, "git log --graph"). @@ -77,7 +99,7 @@ in the commit graph. We can treat these commits as having "infinite" generation number and walk until reaching commits with known generation number. -We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not +We use the macro GENERATION_NUMBER_INFINITY to mark commits not in the commit-graph file. If a commit-graph file was written by a version of Git that did not compute generation numbers, then those commits will have generation number represented by the macro GENERATION_NUMBER_ZERO = 0. @@ -93,12 +115,12 @@ fully-computed generation numbers. Using strict inequality may result in walking a few extra commits, but the simplicity in dealing with commits with generation number *_INFINITY or *_ZERO is valuable. -We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF to for commits whose -generation numbers are computed to be at least this value. We limit at -this value since it is the largest value that can be stored in the -commit-graph file using the 30 bits available to generation numbers. This -presents another case where a commit can have generation number equal to -that of a parent. +We use the macro GENERATION_NUMBER_V1_MAX = 0x3FFFFFFF for commits whose +topological levels (generation number v1) are computed to be at least +this value. We limit at this value since it is the largest value that +can be stored in the commit-graph file using the 30 bits available +to topological levels. This presents another case where a commit can +have generation number equal to that of a parent. Design Details -------------- @@ -267,6 +289,35 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum number of commits) could be extracted into config settings for full flexibility. +## Handling Mixed Generation Number Chains + +With the introduction of generation number v2 and generation data chunk, the +following scenario is possible: + +1. "New" Git writes a commit-graph with the corrected commit dates. +2. "Old" Git writes a split commit-graph on top without corrected commit dates. + +A naive approach of using the newest available generation number from +each layer would lead to violated expectations: the lower layer would +use corrected commit dates which are much larger than the topological +levels of the higher layer. For this reason, Git inspects the topmost +layer to see if the layer is missing corrected commit dates. In such a case +Git only uses topological level for generation numbers. + +When writing a new layer in split commit-graph, we write corrected commit +dates if the topmost layer has corrected commit dates written. This +guarantees that if a layer has corrected commit dates, all lower layers +must have corrected commit dates as well. + +When merging layers, we do not consider whether the merged layers had corrected +commit dates. Instead, the new layer will have corrected commit dates if the +layer below the new layer has corrected commit dates. + +While writing or merging layers, if the new layer is the only layer, it will +have corrected commit dates when written by compatible versions of Git. Thus, +rewriting split commit-graph as a single file (`--split=replace`) creates a +single layer with corrected commit dates. + ## Deleting graph-{hash} files After a new tip file is written, some `graph-{hash}` files may no longer From patchwork Mon Feb 1 06:58:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12058179 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7211C433DB for ; Mon, 1 Feb 2021 07:00:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8307164E35 for ; Mon, 1 Feb 2021 07:00:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231516AbhBAHAj (ORCPT ); Mon, 1 Feb 2021 02:00:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44638 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230224AbhBAHAQ (ORCPT ); Mon, 1 Feb 2021 02:00:16 -0500 Received: from mail-wm1-x331.google.com (mail-wm1-x331.google.com [IPv6:2a00:1450:4864:20::331]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DE4E4C06178A for ; Sun, 31 Jan 2021 22:58:58 -0800 (PST) Received: by mail-wm1-x331.google.com with SMTP id f16so11619624wmq.5 for ; Sun, 31 Jan 2021 22:58:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=MLHCfKST6sbi5X1s5XvAFO8baFBf4VFen9s1WclrfWw=; b=HTANDkoa9YQiCkhWMmuwGbMuvNkazRXGrhvDCJf/kBMquO9JRnQzXsfgCo+a9ZVDiv ZTDUUsGFpv8eTMKURbXDLpNE6TqRwKmjriKs/dlhUYmWc9F2zCmP7icHyJYYcM9eNBTe AAgly/HgFCyT8m27G1VXNoHAdW3cDCIOhXAKm9cvCYxPo24OPu7Eltz/mb3VJTqiqr70 sV3FXiSCKzrEqYgrpj4psnyK46xECwfVn84+fu2/driMrdRZBMA2BOFT4up6LKNY5/EA NoRxspWDEnF3evG3k78Lglmq8pyCqAvuxTsTAzaZC/rtMJzcQZ+mGuadUmTlUd52HnmM N6Zw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=MLHCfKST6sbi5X1s5XvAFO8baFBf4VFen9s1WclrfWw=; b=qF0Xzn3gh359cKZ+ROvXMOopoogLWbWzFpulvPMD8N4TePxNMlTU94AbHh0MLwe8l4 i1uryuzpEn4LKPAPHGMBYChyxh09Hum7GVdQCgS5jNVfew6MhPkJIXqJ9BWxs+Y4bSfM NusVoH697z32UQpNgQUtJVe2t60p2h84mLm/n5l45N78nK51InlDpYRwtOlT1ohmmhkG RstJSJ7JEv4VQBT/Kpt8a909o4G2ngeULCJAH9/AdaALFZ6YcSg5kvKMmZJb08akI8nw 0GnKazndro1hOKXQwtjpy9PW08KTSv3Hkwed2ySp9EqwAw0EL+ZMEW/gVWzLROsVwlcD eWUw== X-Gm-Message-State: AOAM533k+ATBV7I7JvjE62fq43CfvH8B1LE60mr/MsYeS8dsbZmqHTz6 1b0p9g0nSPyM4mf26dMzQN1OAgXqHd4= X-Google-Smtp-Source: ABdhPJxC8dSQp6AZe3IQiMyJtbtikECE09EJIl8q97floD+12qK1pGEO0SEF1LhDVAyuwuYQxQZ3/A== X-Received: by 2002:a1c:7d41:: with SMTP id y62mr6956990wmc.139.1612162737389; Sun, 31 Jan 2021 22:58:57 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id t18sm21828230wrr.56.2021.01.31.22.58.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 31 Jan 2021 22:58:56 -0800 (PST) Message-Id: In-Reply-To: References: Date: Mon, 01 Feb 2021 06:58:42 +0000 Subject: [PATCH v7 08/11] commit-graph: implement corrected commit date Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar With most of preparations done, let's implement corrected commit date. The corrected commit date for a commit is defined as: * A commit with no parents (a root commit) has corrected commit date equal to its committer date. * A commit with at least one parent has corrected commit date equal to the maximum of its commit date and one more than the largest corrected commit date among its parents. As a special case, a root commit with timestamp of zero (01.01.1970 00:00:00Z) has corrected commit date of one, to be able to distinguish from GENERATION_NUMBER_ZERO (that is, an uncomputed corrected commit date). To minimize the space required to store corrected commit date, Git stores corrected commit date offsets into the commit-graph file. The corrected commit date offset for a commit is defined as the difference between its corrected commit date and actual commit date. Storing corrected commit date requires sizeof(timestamp_t) bytes, which in most cases is 64 bits (uintmax_t). However, corrected commit date offsets can be safely stored using only 32-bits. This halves the size of GDAT chunk, which is a reduction of around 6% in the size of commit-graph file. However, using offsets be problematic if a commit is malformed but valid and has committer date of 0 Unix time, as the offset would be the same as corrected commit date and thus require 64-bits to be stored properly. While Git does not write out offsets at this stage, Git stores the corrected commit dates in member generation of struct commit_graph_data. It will begin writing commit date offsets with the introduction of generation data chunk. Signed-off-by: Abhishek Kumar --- commit-graph.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 8f17815021d..d1e6ced8647 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1343,9 +1343,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) ctx->commits.nr); for (i = 0; i < ctx->commits.nr; i++) { uint32_t level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]); + timestamp_t corrected_commit_date = commit_graph_data_at(ctx->commits.list[i])->generation; display_progress(ctx->progress, i + 1); - if (level != GENERATION_NUMBER_ZERO) + if (level != GENERATION_NUMBER_ZERO && + corrected_commit_date != GENERATION_NUMBER_ZERO) continue; commit_list_insert(ctx->commits.list[i], &list); @@ -1354,17 +1356,24 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) struct commit_list *parent; int all_parents_computed = 1; uint32_t max_level = 0; + timestamp_t max_corrected_commit_date = 0; for (parent = current->parents; parent; parent = parent->next) { level = *topo_level_slab_at(ctx->topo_levels, parent->item); + corrected_commit_date = commit_graph_data_at(parent->item)->generation; - if (level == GENERATION_NUMBER_ZERO) { + if (level == GENERATION_NUMBER_ZERO || + corrected_commit_date == GENERATION_NUMBER_ZERO) { all_parents_computed = 0; commit_list_insert(parent->item, &list); break; - } else if (level > max_level) { - max_level = level; } + + if (level > max_level) + max_level = level; + + if (corrected_commit_date > max_corrected_commit_date) + max_corrected_commit_date = corrected_commit_date; } if (all_parents_computed) { @@ -1373,6 +1382,10 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) if (max_level > GENERATION_NUMBER_V1_MAX - 1) max_level = GENERATION_NUMBER_V1_MAX - 1; *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1; + + if (current->date && current->date > max_corrected_commit_date) + max_corrected_commit_date = current->date - 1; + commit_graph_data_at(current)->generation = max_corrected_commit_date + 1; } } } From patchwork Mon Feb 1 06:58:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12058183 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 77130C433E6 for ; Mon, 1 Feb 2021 07:00:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 28CAA64E32 for ; Mon, 1 Feb 2021 07:00:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231558AbhBAHAn (ORCPT ); Mon, 1 Feb 2021 02:00:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44640 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230264AbhBAHAQ (ORCPT ); Mon, 1 Feb 2021 02:00:16 -0500 Received: from mail-wr1-x42c.google.com (mail-wr1-x42c.google.com [IPv6:2a00:1450:4864:20::42c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 15998C06178B for ; Sun, 31 Jan 2021 22:59:00 -0800 (PST) Received: by mail-wr1-x42c.google.com with SMTP id c12so15327919wrc.7 for ; Sun, 31 Jan 2021 22:59:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=WYlyT4mU2M5mmQC+QPQ38feGxWm47NgW+rBMC8aRoAU=; b=cRZ+p48Cp36ieyDujctcRLDpGIVNpUCJaVFiVJAqzyXFFhKFv1mTDY8n8DoNl9NSGs NI2p7ImEe6J0FTUhj4KvhsPJ62Zhkj+ylYz0GHQc+XMJfdUAHjtYMyt9FjJH+8/hx+SN t4k35UBA0XjqoQ85KZt01SSOpb9icmHDsOc5MH02JXHMKn/PGUR58Qs/ckJPET/qBN/Q hhx+SBenwJDdHA8JjpUwccgMg/gJhQSdsZoa2ocXI7zoVBCTDqIWmko7QZ6nOd+3J5+B a6xN2GjFkQXC53XTieRPJouQrHhCxTfHzANiE26QbNOC6AUH5zghq3JkJi8zexKbVJRc 3OxQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=WYlyT4mU2M5mmQC+QPQ38feGxWm47NgW+rBMC8aRoAU=; b=NE/HQ3aNTNMN6KHmKqBF04vkVEuP/e+vmSNr8bF+UBizSyirPzRb+eGJ1Q3/uQiO9o P6EJ+tJHvxDjL7pZh3cp3N30Afb3Xbc+oXjWibOWMrSq+z0GvpH+uyK2cluMgKTIOPo5 /UK4XwqyI309B/ZM3xWGeXgTrA4cW/fbPUKA7fFOKeruLwAazuA/XlFmNo4Ei4LknJ9k GVXee12okwpwuVrDNhmm3niVZXal0m8DaU00j4aZtgxXdLLWNBl7PJp+r6dNRP6TWuon RkgVEX/Jy5WGymvDiz8u0/S2H7PgwNgBLgiP7JIBBvIvTp3wjcPw5mfhBdc0wTiBmmIc tFLA== X-Gm-Message-State: AOAM532u/PPktufJFlFVCqeGENQ2OcJQkra6b1u8wo7Qw/qrGtmjQLme 2fep3w3qZ7Mc6V1lLMDfgOuWYgqHg2o= X-Google-Smtp-Source: ABdhPJyNWewZlmzu5ptqwZBYg3/HuVqjqcKv7C2C0OD5DcXZ8YBW49/J96jIft7FR+X8s47juarqCQ== X-Received: by 2002:a5d:5908:: with SMTP id v8mr16969723wrd.113.1612162738439; Sun, 31 Jan 2021 22:58:58 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id i18sm17536632wrn.29.2021.01.31.22.58.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 31 Jan 2021 22:58:57 -0800 (PST) Message-Id: <71d815188571c3ea421244918b73f762cd7963e3.1612162726.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 01 Feb 2021 06:58:43 +0000 Subject: [PATCH v7 09/11] commit-graph: implement generation data chunk MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar As discovered by Ævar, we cannot increment graph version to distinguish between generation numbers v1 and v2 [1]. Thus, one of pre-requistes before implementing generation number v2 was to distinguish between graph versions in a backwards compatible manner. We are going to introduce a new chunk called Generation DATa chunk (or GDAT). GDAT will store corrected committer date offsets whereas CDAT will still store topological level. Old Git does not understand GDAT chunk and would ignore it, reading topological levels from CDAT. New Git can parse GDAT and take advantage of newer generation numbers, falling back to topological levels when GDAT chunk is missing (as it would happen with a commit-graph written by old Git). We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT' which forces commit-graph file to be written without generation data chunk to emulate a commit-graph file written by old Git. To minimize the space required to store corrrected commit date, Git stores corrected commit date offsets into the commit-graph file, instea of corrected commit dates. This saves us 4 bytes per commit, decreasing the GDAT chunk size by half, but it's possible for the offset to overflow the 4-bytes allocated for storage. As such overflows are and should be exceedingly rare, we use the following overflow management scheme: We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV') to store corrected commit dates for commits with offsets greater than GENERATION_NUMBER_V2_OFFSET_MAX. If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set the MSB of the offset and the other bits store the position of corrected commit date in GDOV chunk, similar to how Extra Edge List is maintained. We test the overflow-related code with the following repo history: F - N - U / \ U - N - U N \ / N - F - N Where the commits denoted by U have committer date of zero seconds since Unix epoch, the commits denoted by N have committer date of 1112354055 (default committer date for the test suite) seconds since Unix epoch and the commits denoted by F have committer date of (2 ^ 31 - 2) seconds since Unix epoch. The largest offset observed is 2 ^ 31, just large enough to overflow. [1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/ Signed-off-by: Abhishek Kumar --- commit-graph.c | 114 ++++++++++++++++++++++++++++++---- commit-graph.h | 3 + commit.h | 1 + t/README | 3 + t/helper/test-read-graph.c | 4 ++ t/t4216-log-bloom.sh | 4 +- t/t5318-commit-graph.sh | 79 +++++++++++++++++++---- t/t5324-split-commit-graph.sh | 12 ++-- t/t6600-test-reach.sh | 6 ++ t/test-lib-functions.sh | 6 ++ 10 files changed, 200 insertions(+), 32 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index d1e6ced8647..d2afcc83283 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -38,11 +38,13 @@ void git_test_write_commit_graph_or_die(void) #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ #define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */ +#define GRAPH_CHUNKID_GENERATION_DATA 0x47444154 /* "GDAT" */ +#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f56 /* "GDOV" */ #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */ #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */ #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */ #define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */ -#define MAX_NUM_CHUNKS 7 +#define MAX_NUM_CHUNKS 9 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16) @@ -61,6 +63,8 @@ void git_test_write_commit_graph_or_die(void) #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \ + GRAPH_FANOUT_SIZE + the_hash_algo->rawsz) +#define CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW (1ULL << 31) + /* Remember to update object flag allocation in object.h */ #define REACHABLE (1u<<15) @@ -394,6 +398,20 @@ struct commit_graph *parse_commit_graph(struct repository *r, graph->chunk_commit_data = data + chunk_offset; break; + case GRAPH_CHUNKID_GENERATION_DATA: + if (graph->chunk_generation_data) + chunk_repeated = 1; + else + graph->chunk_generation_data = data + chunk_offset; + break; + + case GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW: + if (graph->chunk_generation_data_overflow) + chunk_repeated = 1; + else + graph->chunk_generation_data_overflow = data + chunk_offset; + break; + case GRAPH_CHUNKID_EXTRAEDGES: if (graph->chunk_extra_edges) chunk_repeated = 1; @@ -754,8 +772,8 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, { const unsigned char *commit_data; struct commit_graph_data *graph_data; - uint32_t lex_index; - uint64_t date_high, date_low; + uint32_t lex_index, offset_pos; + uint64_t date_high, date_low, offset; while (pos < g->num_commits_in_base) g = g->base_graph; @@ -773,7 +791,19 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, date_low = get_be32(commit_data + g->hash_len + 12); item->date = (timestamp_t)((date_high << 32) | date_low); - graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + if (g->chunk_generation_data) { + offset = (timestamp_t)get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index); + + if (offset & CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW) { + if (!g->chunk_generation_data_overflow) + die(_("commit-graph requires overflow generation data but has none")); + + offset_pos = offset ^ CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW; + graph_data->generation = get_be64(g->chunk_generation_data_overflow + 8 * offset_pos); + } else + graph_data->generation = item->date + offset; + } else + graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; if (g->topo_levels) *topo_level_slab_at(g->topo_levels, item) = get_be32(commit_data + g->hash_len + 8) >> 2; @@ -945,6 +975,7 @@ struct write_commit_graph_context { struct oid_array oids; struct packed_commit_list commits; int num_extra_edges; + int num_generation_data_overflows; unsigned long approx_nr_objects; struct progress *progress; int progress_done; @@ -963,7 +994,8 @@ struct write_commit_graph_context { report_progress:1, split:1, changed_paths:1, - order_by_pack:1; + order_by_pack:1, + write_generation_data:1; struct topo_level_slab *topo_levels; const struct commit_graph_opts *opts; @@ -1123,6 +1155,45 @@ static int write_graph_chunk_data(struct hashfile *f, return 0; } +static int write_graph_chunk_generation_data(struct hashfile *f, + struct write_commit_graph_context *ctx) +{ + int i, num_generation_data_overflows = 0; + + for (i = 0; i < ctx->commits.nr; i++) { + struct commit *c = ctx->commits.list[i]; + timestamp_t offset = commit_graph_data_at(c)->generation - c->date; + display_progress(ctx->progress, ++ctx->progress_cnt); + + if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) { + offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows; + num_generation_data_overflows++; + } + + hashwrite_be32(f, offset); + } + + return 0; +} + +static int write_graph_chunk_generation_data_overflow(struct hashfile *f, + struct write_commit_graph_context *ctx) +{ + int i; + for (i = 0; i < ctx->commits.nr; i++) { + struct commit *c = ctx->commits.list[i]; + timestamp_t offset = commit_graph_data_at(c)->generation - c->date; + display_progress(ctx->progress, ++ctx->progress_cnt); + + if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) { + hashwrite_be32(f, offset >> 32); + hashwrite_be32(f, (uint32_t) offset); + } + } + + return 0; +} + static int write_graph_chunk_extra_edges(struct hashfile *f, struct write_commit_graph_context *ctx) { @@ -1386,6 +1457,9 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) if (current->date && current->date > max_corrected_commit_date) max_corrected_commit_date = current->date - 1; commit_graph_data_at(current)->generation = max_corrected_commit_date + 1; + + if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX) + ctx->num_generation_data_overflows++; } } } @@ -1719,6 +1793,21 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) chunks[2].id = GRAPH_CHUNKID_DATA; chunks[2].size = (hashsz + 16) * ctx->commits.nr; chunks[2].write_fn = write_graph_chunk_data; + + if (git_env_bool(GIT_TEST_COMMIT_GRAPH_NO_GDAT, 0)) + ctx->write_generation_data = 0; + if (ctx->write_generation_data) { + chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA; + chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr; + chunks[num_chunks].write_fn = write_graph_chunk_generation_data; + num_chunks++; + } + if (ctx->num_generation_data_overflows) { + chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW; + chunks[num_chunks].size = sizeof(timestamp_t) * ctx->num_generation_data_overflows; + chunks[num_chunks].write_fn = write_graph_chunk_generation_data_overflow; + num_chunks++; + } if (ctx->num_extra_edges) { chunks[num_chunks].id = GRAPH_CHUNKID_EXTRAEDGES; chunks[num_chunks].size = 4 * ctx->num_extra_edges; @@ -2139,6 +2228,8 @@ int write_commit_graph(struct object_directory *odb, ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; ctx->opts = opts; ctx->total_bloom_filter_data_size = 0; + ctx->write_generation_data = 1; + ctx->num_generation_data_overflows = 0; bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", bloom_settings.bits_per_entry); @@ -2445,16 +2536,17 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) continue; /* - * If one of our parents has generation GENERATION_NUMBER_V1_MAX, then - * our generation is also GENERATION_NUMBER_V1_MAX. Decrement to avoid - * extra logic in the following condition. + * If we are using topological level and one of our parents has + * generation GENERATION_NUMBER_V1_MAX, then our generation is + * also GENERATION_NUMBER_V1_MAX. Decrement to avoid extra logic + * in the following condition. */ - if (max_generation == GENERATION_NUMBER_V1_MAX) + if (!g->chunk_generation_data && max_generation == GENERATION_NUMBER_V1_MAX) max_generation--; generation = commit_graph_generation(graph_commit); - if (generation != max_generation + 1) - graph_report(_("commit-graph generation for commit %s is %"PRItime" != %"PRItime), + if (generation < max_generation + 1) + graph_report(_("commit-graph generation for commit %s is %"PRItime" < %"PRItime), oid_to_hex(&cur_oid), generation, max_generation + 1); diff --git a/commit-graph.h b/commit-graph.h index 2e9aa7824ee..19a02001fde 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -6,6 +6,7 @@ #include "oidset.h" #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH" +#define GIT_TEST_COMMIT_GRAPH_NO_GDAT "GIT_TEST_COMMIT_GRAPH_NO_GDAT" #define GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE "GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE" #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS" @@ -68,6 +69,8 @@ struct commit_graph { const uint32_t *chunk_oid_fanout; const unsigned char *chunk_oid_lookup; const unsigned char *chunk_commit_data; + const unsigned char *chunk_generation_data; + const unsigned char *chunk_generation_data_overflow; const unsigned char *chunk_extra_edges; const unsigned char *chunk_base_graphs; const unsigned char *chunk_bloom_indexes; diff --git a/commit.h b/commit.h index 742d96c41e8..eff94f3f7c2 100644 --- a/commit.h +++ b/commit.h @@ -14,6 +14,7 @@ #define GENERATION_NUMBER_INFINITY ((1ULL << 63) - 1) #define GENERATION_NUMBER_V1_MAX 0x3FFFFFFF #define GENERATION_NUMBER_ZERO 0 +#define GENERATION_NUMBER_V2_OFFSET_MAX ((1ULL << 31) - 1) struct commit_list { struct commit *item; diff --git a/t/README b/t/README index c730a707705..8a121487279 100644 --- a/t/README +++ b/t/README @@ -393,6 +393,9 @@ GIT_TEST_COMMIT_GRAPH=, when true, forces the commit-graph to be written after every 'git commit' command, and overrides the 'core.commitGraph' setting to true. +GIT_TEST_COMMIT_GRAPH_NO_GDAT=, when true, forces the +commit-graph to be written without generation data chunk. + GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=, when true, forces commit-graph write to compute and write changed path Bloom filters for every 'git commit-graph write', as if the `--changed-paths` option was diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c index 5f585a17256..75927b2c81d 100644 --- a/t/helper/test-read-graph.c +++ b/t/helper/test-read-graph.c @@ -33,6 +33,10 @@ int cmd__read_graph(int argc, const char **argv) printf(" oid_lookup"); if (graph->chunk_commit_data) printf(" commit_metadata"); + if (graph->chunk_generation_data) + printf(" generation_data"); + if (graph->chunk_generation_data_overflow) + printf(" generation_data_overflow"); if (graph->chunk_extra_edges) printf(" extra_edges"); if (graph->chunk_bloom_indexes) diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 0f16c4b9d52..50f206db550 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -43,11 +43,11 @@ test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' ' graph_read_expect () { - NUM_CHUNKS=5 + NUM_CHUNKS=6 cat >expect <<- EOF header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0 num_commits: $1 - chunks: oid_fanout oid_lookup commit_metadata bloom_indexes bloom_data + chunks: oid_fanout oid_lookup commit_metadata generation_data bloom_indexes bloom_data EOF test-tool read-graph >actual && test_cmp expect actual diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 2ed0c1544da..fa27df579a5 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -76,7 +76,7 @@ graph_git_behavior 'no graph' full commits/3 commits/1 graph_read_expect() { OPTIONAL="" NUM_CHUNKS=3 - if test ! -z $2 + if test ! -z "$2" then OPTIONAL=" $2" NUM_CHUNKS=$((3 + $(echo "$2" | wc -w))) @@ -103,14 +103,14 @@ test_expect_success 'exit with correct error on bad input to --stdin-commits' ' # valid commit and tree OID git rev-parse HEAD HEAD^{tree} >in && git commit-graph write --stdin-commits >commits-in && cat commits-in | git commit-graph write --stdin-commits && test_path_is_file $objdir/info/commit-graph && - graph_read_expect "6" + graph_read_expect "6" "generation_data" ' graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1 @@ -297,7 +297,7 @@ test_expect_success 'build graph from commits with append' ' cd "$TRASH_DIRECTORY/full" && git rev-parse merge/3 | git commit-graph write --stdin-commits --append && test_path_is_file $objdir/info/commit-graph && - graph_read_expect "10" "extra_edges" + graph_read_expect "10" "generation_data extra_edges" ' graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1 @@ -307,7 +307,7 @@ test_expect_success 'build graph using --reachable' ' cd "$TRASH_DIRECTORY/full" && git commit-graph write --reachable && test_path_is_file $objdir/info/commit-graph && - graph_read_expect "11" "extra_edges" + graph_read_expect "11" "generation_data extra_edges" ' graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1 @@ -328,7 +328,7 @@ test_expect_success 'write graph in bare repo' ' cd "$TRASH_DIRECTORY/bare" && git commit-graph write && test_path_is_file $baredir/info/commit-graph && - graph_read_expect "11" "extra_edges" + graph_read_expect "11" "generation_data extra_edges" ' graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 merge/1 @@ -454,8 +454,9 @@ test_expect_success 'warn on improper hash version' ' test_expect_success 'git commit-graph verify' ' cd "$TRASH_DIRECTORY/full" && - git rev-parse commits/8 | git commit-graph write --stdin-commits && - git commit-graph verify >output + git rev-parse commits/8 | GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --stdin-commits && + git commit-graph verify >output && + graph_read_expect 9 extra_edges ' NUM_COMMITS=9 @@ -741,4 +742,56 @@ test_expect_success 'corrupt commit-graph write (missing tree)' ' ) ' +# We test the overflow-related code with the following repo history: +# +# 4:F - 5:N - 6:U +# / \ +# 1:U - 2:N - 3:U M:N +# \ / +# 7:N - 8:F - 9:N +# +# Here the commits denoted by U have committer date of zero seconds +# since Unix epoch, the commits denoted by N have committer date +# starting from 1112354055 seconds since Unix epoch (default committer +# date for the test suite), and the commits denoted by F have committer +# date of (2 ^ 31 - 2) seconds since Unix epoch. +# +# The largest offset observed is 2 ^ 31, just large enough to overflow. +# + +test_expect_success 'set up and verify repo with generation data overflow chunk' ' + objdir=".git/objects" && + UNIX_EPOCH_ZERO="@0 +0000" && + FUTURE_DATE="@2147483646 +0000" && + test_oid_cache <<-EOF && + oid_version sha1:1 + oid_version sha256:2 + EOF + cd "$TRASH_DIRECTORY" && + mkdir repo && + cd repo && + git init && + test_commit --date "$UNIX_EPOCH_ZERO" 1 && + test_commit 2 && + test_commit --date "$UNIX_EPOCH_ZERO" 3 && + git commit-graph write --reachable && + graph_read_expect 3 generation_data && + test_commit --date "$FUTURE_DATE" 4 && + test_commit 5 && + test_commit --date "$UNIX_EPOCH_ZERO" 6 && + git branch left && + git reset --hard 3 && + test_commit 7 && + test_commit --date "$FUTURE_DATE" 8 && + test_commit 9 && + git branch right && + git reset --hard 3 && + test_merge M left right && + git commit-graph write --reachable && + graph_read_expect 10 "generation_data generation_data_overflow" && + git commit-graph verify +' + +graph_git_behavior 'generation data overflow chunk repo' repo left right + test_done diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 4d3842b83b9..587757b62d9 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -13,11 +13,11 @@ test_expect_success 'setup repo' ' infodir=".git/objects/info" && graphdir="$infodir/commit-graphs" && test_oid_cache <<-EOM - shallow sha1:1760 - shallow sha256:2064 + shallow sha1:2132 + shallow sha256:2436 - base sha1:1376 - base sha256:1496 + base sha1:1408 + base sha256:1528 oid_version sha1:1 oid_version sha256:2 @@ -31,9 +31,9 @@ graph_read_expect() { NUM_BASE=$2 fi cat >expect <<- EOF - header: 43475048 1 $(test_oid oid_version) 3 $NUM_BASE + header: 43475048 1 $(test_oid oid_version) 4 $NUM_BASE num_commits: $1 - chunks: oid_fanout oid_lookup commit_metadata + chunks: oid_fanout oid_lookup commit_metadata generation_data EOF test-tool read-graph >output && test_cmp expect output diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index af10f0dc090..e2d33a8a4c4 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -55,6 +55,9 @@ test_expect_success 'setup' ' git show-ref -s commit-5-5 | git commit-graph write --stdin-commits && mv .git/objects/info/commit-graph commit-graph-half && chmod u+w commit-graph-half && + GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable && + mv .git/objects/info/commit-graph commit-graph-no-gdat && + chmod u+w commit-graph-no-gdat && git config core.commitGraph true ' @@ -67,6 +70,9 @@ run_all_modes () { test_cmp expect actual && cp commit-graph-half .git/objects/info/commit-graph && "$@" actual && + test_cmp expect actual && + cp commit-graph-no-gdat .git/objects/info/commit-graph && + "$@" actual && test_cmp expect actual } diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 6bca0023168..df5bba07295 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -218,6 +218,12 @@ test_commit () { --signoff) signoff="$1" ;; + --date) + notick=yes + GIT_COMMITTER_DATE="$2" + GIT_AUTHOR_DATE="$2" + shift + ;; -C) indir="$2" shift From patchwork Mon Feb 1 06:58:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12058181 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DD450C433E0 for ; Mon, 1 Feb 2021 07:00:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 854D764E31 for ; Mon, 1 Feb 2021 07:00:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231530AbhBAHAl (ORCPT ); Mon, 1 Feb 2021 02:00:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44642 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230273AbhBAHAQ (ORCPT ); Mon, 1 Feb 2021 02:00:16 -0500 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 29D0FC06178C for ; Sun, 31 Jan 2021 22:59:01 -0800 (PST) Received: by mail-wr1-x42a.google.com with SMTP id l12so15340471wry.2 for ; Sun, 31 Jan 2021 22:59:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=UYJgOKWJ/XWhk8Iv45XU12uhxalFQCYx0vedEBag57A=; b=dsQC4ypQV9wgBhGcLh8evjUTWCEaPBg5FzFCvDSV+UJILtpiiKqzcNHCPfG+vXWlLy r1shfW89T981muqFNO+mQr/PJrmuV5qKbxcP4zzDX13ltSWBYL9zMZF0IC9TXthsGRZH HdWPwXVnEzafS3fKXj4KhIu9B6Hfpa59WA9V3hN1Mw4/Cs0XKHCCpscyn8TGaVDGFOYv aI4z4tho6uALEk7GjLhPyoq5tC05X5t/iFcxZVdaNiSFSu2IPQmwbBrZNQ/CFos2Tpr8 vdtl81D9C3iRvuILcuk3D7vgBztmfjaUvk6BjgNcRcYWDMyCadA2gBhKV9LZSeloR0gm Y3aA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=UYJgOKWJ/XWhk8Iv45XU12uhxalFQCYx0vedEBag57A=; b=d7SfOsLepUkRxD9LvaP/JmqSIS8DZgFndRUBsIg82cRMjQB3k1pgeULKcxAT6okfbA SAHZShL4quep3VReLCYdwZojnmAuGEhhKlwWPxqEpahcMFFQz4F0atKeC0FxJD9uPpyx xyjRloIPFS7x8eu2egaTW28JRlFago3eSc88uZZbKXnnAOVjknApJQJPjBaNcUaJJZ5t GTwwkD0XZD5n80kTvm1VLVAsCZpyPXLxLcf4Ma15OAGCKkdT43jMcQfQ2VumeUfrICKL ZDZgxTInfUTbCN4FN7tay907cPM5YH3j6RzfFU7jOEV/bAUGUkaLVPtzjCdkwvdOq//Q S2DQ== X-Gm-Message-State: AOAM531DeqI2Uc27fCK3ABVcnEpwfc/XdUsYEWUjRREuXbJuB7FD7ISd yWslSoaLv3JzgYYq3lyaprtmH1zsZeg= X-Google-Smtp-Source: ABdhPJzRdyKgrnS8zsBCbdXdGBw5htyZ3uaMBvVZSIx1dsawGc7oL89/DA0WV3YcGumJq63akuH66Q== X-Received: by 2002:a5d:470f:: with SMTP id y15mr16247362wrq.187.1612162739583; Sun, 31 Jan 2021 22:58:59 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id v25sm20517098wmh.4.2021.01.31.22.58.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 31 Jan 2021 22:58:58 -0800 (PST) Message-Id: <07a88f1aae6f7f7812ab7a5937eac73131c2139e.1612162726.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 01 Feb 2021 06:58:44 +0000 Subject: [PATCH v7 10/11] commit-graph: use generation v2 only if entire chain does Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar Since there are released versions of Git that understand generation numbers in the commit-graph's CDAT chunk but do not understand the GDAT chunk, the following scenario is possible: 1. "New" Git writes a commit-graph with the GDAT chunk. 2. "Old" Git writes a split commit-graph on top without a GDAT chunk. If each layer of split commit-graph is treated independently, as it was the case before this commit, with Git inspecting only the current layer for chunk_generation_data pointer, commits in the lower layer (one with GDAT) whould have corrected commit date as their generation number, while commits in the upper layer would have topological levels as their generation. Corrected commit dates usually have much larger values than topological levels. This means that if we take two commits, one from the upper layer, and one reachable from it in the lower layer, then the expectation that the generation of a parent is smaller than the generation of a child would be violated. It is difficult to expose this issue in a test. Since we _start_ with artificially low generation numbers, any commit walk that prioritizes generation numbers will walk all of the commits with high generation number before walking the commits with low generation number. In all the cases I tried, the commit-graph layers themselves "protect" any incorrect behavior since none of the commits in the lower layer can reach the commits in the upper layer. This issue would manifest itself as a performance problem in this case, especially with something like "git log --graph" since the low generation numbers would cause the in-degree queue to walk all of the commits in the lower layer before allowing the topo-order queue to write anything to output (depending on the size of the upper layer). Therefore, When writing the new layer in split commit-graph, we write a GDAT chunk only if the topmost layer has a GDAT chunk. This guarantees that if a layer has GDAT chunk, all lower layers must have a GDAT chunk as well. Rewriting layers follows similar approach: if the topmost layer below the set of layers being rewritten (in the split commit-graph chain) exists, and it does not contain GDAT chunk, then the result of rewrite does not have GDAT chunks either. Signed-off-by: Derrick Stolee Signed-off-by: Abhishek Kumar --- commit-graph.c | 30 +++++- commit-graph.h | 1 + t/t5324-split-commit-graph.sh | 181 ++++++++++++++++++++++++++++++++++ 3 files changed, 210 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index d2afcc83283..77fef5a240e 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -614,6 +614,21 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, return graph_chain; } +static void validate_mixed_generation_chain(struct commit_graph *g) +{ + int read_generation_data; + + if (!g) + return; + + read_generation_data = !!g->chunk_generation_data; + + while (g) { + g->read_generation_data = read_generation_data; + g = g->base_graph; + } +} + struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb) { @@ -622,6 +637,8 @@ struct commit_graph *read_commit_graph_one(struct repository *r, if (!g) g = load_commit_graph_chain(r, odb); + validate_mixed_generation_chain(g); + return g; } @@ -791,7 +808,7 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, date_low = get_be32(commit_data + g->hash_len + 12); item->date = (timestamp_t)((date_high << 32) | date_low); - if (g->chunk_generation_data) { + if (g->read_generation_data) { offset = (timestamp_t)get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index); if (offset & CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW) { @@ -2019,6 +2036,13 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) if (i < ctx->num_commit_graphs_after) ctx->commit_graph_hash_after[i] = xstrdup(oid_to_hex(&g->oid)); + /* + * If the topmost remaining layer has generation data chunk, the + * resultant layer also has generation data chunk. + */ + if (i == ctx->num_commit_graphs_after - 2) + ctx->write_generation_data = !!g->chunk_generation_data; + i--; g = g->base_graph; } @@ -2343,6 +2367,8 @@ int write_commit_graph(struct object_directory *odb, } else ctx->num_commit_graphs_after = 1; + validate_mixed_generation_chain(ctx->r->objects->commit_graph); + compute_generation_numbers(ctx); if (ctx->changed_paths) @@ -2541,7 +2567,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) * also GENERATION_NUMBER_V1_MAX. Decrement to avoid extra logic * in the following condition. */ - if (!g->chunk_generation_data && max_generation == GENERATION_NUMBER_V1_MAX) + if (!g->read_generation_data && max_generation == GENERATION_NUMBER_V1_MAX) max_generation--; generation = commit_graph_generation(graph_commit); diff --git a/commit-graph.h b/commit-graph.h index 19a02001fde..ad52130883b 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -64,6 +64,7 @@ struct commit_graph { struct object_directory *odb; uint32_t num_commits_in_base; + unsigned int read_generation_data; struct commit_graph *base_graph; const uint32_t *chunk_oid_fanout; diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 587757b62d9..8e90f3423b8 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -453,4 +453,185 @@ test_expect_success 'prevent regression for duplicate commits across layers' ' git -C dup commit-graph verify ' +NUM_FIRST_LAYER_COMMITS=64 +NUM_SECOND_LAYER_COMMITS=16 +NUM_THIRD_LAYER_COMMITS=7 +NUM_FOURTH_LAYER_COMMITS=8 +NUM_FIFTH_LAYER_COMMITS=16 +SECOND_LAYER_SEQUENCE_START=$(($NUM_FIRST_LAYER_COMMITS + 1)) +SECOND_LAYER_SEQUENCE_END=$(($SECOND_LAYER_SEQUENCE_START + $NUM_SECOND_LAYER_COMMITS - 1)) +THIRD_LAYER_SEQUENCE_START=$(($SECOND_LAYER_SEQUENCE_END + 1)) +THIRD_LAYER_SEQUENCE_END=$(($THIRD_LAYER_SEQUENCE_START + $NUM_THIRD_LAYER_COMMITS - 1)) +FOURTH_LAYER_SEQUENCE_START=$(($THIRD_LAYER_SEQUENCE_END + 1)) +FOURTH_LAYER_SEQUENCE_END=$(($FOURTH_LAYER_SEQUENCE_START + $NUM_FOURTH_LAYER_COMMITS - 1)) +FIFTH_LAYER_SEQUENCE_START=$(($FOURTH_LAYER_SEQUENCE_END + 1)) +FIFTH_LAYER_SEQUENCE_END=$(($FIFTH_LAYER_SEQUENCE_START + $NUM_FIFTH_LAYER_COMMITS - 1)) + +# Current split graph chain: +# +# 16 commits (No GDAT) +# ------------------------ +# 64 commits (GDAT) +# +test_expect_success 'setup repo for mixed generation commit-graph-chain' ' + graphdir=".git/objects/info/commit-graphs" && + test_oid_cache <<-EOF && + oid_version sha1:1 + oid_version sha256:2 + EOF + git init mixed && + ( + cd mixed && + git config core.commitGraph true && + git config gc.writeCommitGraph false && + for i in $(test_seq $NUM_FIRST_LAYER_COMMITS) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git commit-graph write --reachable --split && + graph_read_expect $NUM_FIRST_LAYER_COMMITS && + test_line_count = 1 $graphdir/commit-graph-chain && + for i in $(test_seq $SECOND_LAYER_SEQUENCE_START $SECOND_LAYER_SEQUENCE_END) + do + test_commit $i && + git branch commits/$i || return 1 + done && + GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable --split=no-merge && + test_line_count = 2 $graphdir/commit-graph-chain && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 $(test_oid oid_version) 4 1 + num_commits: $NUM_SECOND_LAYER_COMMITS + chunks: oid_fanout oid_lookup commit_metadata + EOF + test_cmp expect output && + git commit-graph verify && + cat $graphdir/commit-graph-chain + ) +' + +# The new layer will be added without generation data chunk as it was not +# present on the layer underneath it. +# +# 7 commits (No GDAT) +# ------------------------ +# 16 commits (No GDAT) +# ------------------------ +# 64 commits (GDAT) +# +test_expect_success 'do not write generation data chunk if not present on existing tip' ' + git clone mixed mixed-no-gdat && + ( + cd mixed-no-gdat && + for i in $(test_seq $THIRD_LAYER_SEQUENCE_START $THIRD_LAYER_SEQUENCE_END) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git commit-graph write --reachable --split=no-merge && + test_line_count = 3 $graphdir/commit-graph-chain && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 $(test_oid oid_version) 4 2 + num_commits: $NUM_THIRD_LAYER_COMMITS + chunks: oid_fanout oid_lookup commit_metadata + EOF + test_cmp expect output && + git commit-graph verify + ) +' + +# Number of commits in each layer of the split-commit graph before merge: +# +# 8 commits (No GDAT) +# ------------------------ +# 7 commits (No GDAT) +# ------------------------ +# 16 commits (No GDAT) +# ------------------------ +# 64 commits (GDAT) +# +# The top two layers are merged and do not have generation data chunk as layer below them does +# not have generation data chunk. +# +# 15 commits (No GDAT) +# ------------------------ +# 16 commits (No GDAT) +# ------------------------ +# 64 commits (GDAT) +# +test_expect_success 'do not write generation data chunk if the topmost remaining layer does not have generation data chunk' ' + git clone mixed-no-gdat mixed-merge-no-gdat && + ( + cd mixed-merge-no-gdat && + for i in $(test_seq $FOURTH_LAYER_SEQUENCE_START $FOURTH_LAYER_SEQUENCE_END) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git commit-graph write --reachable --split --size-multiple 1 && + test_line_count = 3 $graphdir/commit-graph-chain && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 $(test_oid oid_version) 4 2 + num_commits: $(($NUM_THIRD_LAYER_COMMITS + $NUM_FOURTH_LAYER_COMMITS)) + chunks: oid_fanout oid_lookup commit_metadata + EOF + test_cmp expect output && + git commit-graph verify + ) +' + +# Number of commits in each layer of the split-commit graph before merge: +# +# 16 commits (No GDAT) +# ------------------------ +# 15 commits (No GDAT) +# ------------------------ +# 16 commits (No GDAT) +# ------------------------ +# 64 commits (GDAT) +# +# The top three layers are merged and has generation data chunk as the topmost remaining layer +# has generation data chunk. +# +# 47 commits (GDAT) +# ------------------------ +# 64 commits (GDAT) +# +test_expect_success 'write generation data chunk if topmost remaining layer has generation data chunk' ' + git clone mixed-merge-no-gdat mixed-merge-gdat && + ( + cd mixed-merge-gdat && + for i in $(test_seq $FIFTH_LAYER_SEQUENCE_START $FIFTH_LAYER_SEQUENCE_END) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git commit-graph write --reachable --split --size-multiple 1 && + test_line_count = 2 $graphdir/commit-graph-chain && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 $(test_oid oid_version) 5 1 + num_commits: $(($NUM_SECOND_LAYER_COMMITS + $NUM_THIRD_LAYER_COMMITS + $NUM_FOURTH_LAYER_COMMITS + $NUM_FIFTH_LAYER_COMMITS)) + chunks: oid_fanout oid_lookup commit_metadata generation_data + EOF + test_cmp expect output + ) +' + +test_expect_success 'write generation data chunk when commit-graph chain is replaced' ' + git clone mixed mixed-replace && + ( + cd mixed-replace && + git commit-graph write --reachable --split=replace && + test_path_is_file $graphdir/commit-graph-chain && + test_line_count = 1 $graphdir/commit-graph-chain && + verify_chain_files_exist $graphdir && + graph_read_expect $(($NUM_FIRST_LAYER_COMMITS + $NUM_SECOND_LAYER_COMMITS)) && + git commit-graph verify + ) +' + test_done From patchwork Mon Feb 1 06:58:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12058187 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5F0F8C433E9 for ; Mon, 1 Feb 2021 07:00:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1F5EC64E31 for ; Mon, 1 Feb 2021 07:00:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231576AbhBAHAs (ORCPT ); Mon, 1 Feb 2021 02:00:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44652 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230389AbhBAHAZ (ORCPT ); Mon, 1 Feb 2021 02:00:25 -0500 Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 05EECC061793 for ; Sun, 31 Jan 2021 22:59:02 -0800 (PST) Received: by mail-wr1-x42f.google.com with SMTP id c12so15327985wrc.7 for ; Sun, 31 Jan 2021 22:59:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=lO2RGMGn6+dE/uMeW4xdGVD2u0Be20h8cfPlJwPnzmA=; b=YveFPG1N5S6lBNp92br9LnzVByp+A+zUEADU8HYGRBPR1UVbhuSYKIhsu8LjlVRH8v MYcSZy07t3i18uem5TtMLigS0JrdReRQJoQcTLRUfyTnAZlzectTUZGBCUcwgo1VCjsV /kt6OlgmP+vI6gFvro1u3SBDHNSxJtvt+stE2dgVSzjvW77Kp2SPuH1KUFnQBhN5MYvb WSnCj5tTh3jGKSGPaLD0SEJktlTy9NLqX53joefJcb09nHPNYb0JTSb8CfBdsiun/LMk wq6nKkkQT3ICDhSzrEr8SvXZw1jyi/LMxHp18dREthCFqy0RDl3iDIJGbDZB/YExYQBu mmRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=lO2RGMGn6+dE/uMeW4xdGVD2u0Be20h8cfPlJwPnzmA=; b=gLQYys8+/Rj5wLc/ZYcK5DnAOqqUJTyzJ1ef0UP3oa6t9zDv5/XR9c97cl+wx2B9BC EndUphGQVqdh0LiHLA8BJzTmWSXbtyb3tsz2WSDQ03BCejdrIsunpFt/Y0xOlH4cs6e7 OP68+SOKducxZ+L85vut32yPSFckOn9fjS6on3fqeyAosG4FMoHEED5YjJ32TthXVwyy O/yYMSuFLhOVocWI0yonvM43S1fyikxxbvQlbR8fsaKc0NPRi9WyKof4hyYzvLnimj0g VNhak6qxn+6CpmgaAmsx+bOPdhatJ//2avtnpp4RcAv/pv1Ae7EBcjSdjOO+pUbqHJ/N cgZg== X-Gm-Message-State: AOAM533uF4Ysjjfnu0NhkCBY+0FdllNb6zHVWDsdwyk3HeKPo/59u7d2 OGbNIFYHtkewllNoU75Kjg6r/rjegfA= X-Google-Smtp-Source: ABdhPJwm0551lM4qCqDRVztypvD7JIhyjVN6tMc6T0n0gsr8OdtqFEaJdSobSU1DztJb8/12F41c8g== X-Received: by 2002:a5d:55c6:: with SMTP id i6mr17050910wrw.145.1612162740534; Sun, 31 Jan 2021 22:59:00 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id d17sm15473445wma.2.2021.01.31.22.58.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 31 Jan 2021 22:58:59 -0800 (PST) Message-Id: <523e2d4a902b22fb7e0234bf9c207f7b071b7187.1612162726.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 01 Feb 2021 06:58:45 +0000 Subject: [PATCH v7 11/11] commit-reach: use corrected commit dates in paint_down_to_common() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Taylor Blau , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar 091f4cf (commit: don't use generation numbers if not needed, 2018-08-30) changed paint_down_to_common() to use commit dates instead of generation numbers v1 (topological levels) as the performance regressed on certain topologies. With generation number v2 (corrected commit dates) implemented, we no longer have to rely on commit dates and can use generation numbers. For example, the command `git merge-base v4.8 v4.9` on the Linux repository walks 167468 commits, taking 0.135s for committer date and 167496 commits, taking 0.157s for corrected committer date respectively. While using corrected commit dates, Git walks nearly the same number of commits as commit date, the process is slower as for each comparision we have to access a commit-slab (for corrected committer date) instead of accessing struct member (for committer date). This change incidentally broke the fragile t6404-recursive-merge test. t6404-recursive-merge sets up a unique repository where all commits have the same committer date without a well-defined merge-base. While running tests with GIT_TEST_COMMIT_GRAPH unset, we use committer date as a heuristic in paint_down_to_common(). 6404.1 'combined merge conflicts' merges commits in the order: - Merge C with B to form an intermediate commit. - Merge the intermediate commit with A. With GIT_TEST_COMMIT_GRAPH=1, we write a commit-graph and subsequently use the corrected committer date, which changes the order in which commits are merged: - Merge A with B to form an intermediate commit. - Merge the intermediate commit with C. While resulting repositories are equivalent, 6404.4 'virtual trees were processed' fails with GIT_TEST_COMMIT_GRAPH=1 as we are selecting different merge-bases and thus have different object ids for the intermediate commits. As this has already causes problems (as noted in 859fdc0 (commit-graph: define GIT_TEST_COMMIT_GRAPH, 2018-08-29)), we disable commit graph within t6404-recursive-merge. Signed-off-by: Abhishek Kumar --- commit-graph.c | 14 ++++++++++++++ commit-graph.h | 6 ++++++ commit-reach.c | 2 +- t/t6404-recursive-merge.sh | 5 ++++- 4 files changed, 25 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 77fef5a240e..bf735fac4ea 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -714,6 +714,20 @@ int generation_numbers_enabled(struct repository *r) return !!first_generation; } +int corrected_commit_dates_enabled(struct repository *r) +{ + struct commit_graph *g; + if (!prepare_commit_graph(r)) + return 0; + + g = r->objects->commit_graph; + + if (!g->num_commits) + return 0; + + return g->read_generation_data; +} + struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r) { struct commit_graph *g = r->objects->commit_graph; diff --git a/commit-graph.h b/commit-graph.h index ad52130883b..97f3497c279 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -95,6 +95,12 @@ struct commit_graph *parse_commit_graph(struct repository *r, */ int generation_numbers_enabled(struct repository *r); +/* + * Return 1 if and only if the repository has a commit-graph + * file and generation data chunk has been written for the file. + */ +int corrected_commit_dates_enabled(struct repository *r); + struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r); enum commit_graph_write_flags { diff --git a/commit-reach.c b/commit-reach.c index 9b24b0378d5..e38771ca5a1 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -39,7 +39,7 @@ static struct commit_list *paint_down_to_common(struct repository *r, int i; timestamp_t last_gen = GENERATION_NUMBER_INFINITY; - if (!min_generation) + if (!min_generation && !corrected_commit_dates_enabled(r)) queue.compare = compare_commits_by_commit_date; one->object.flags |= PARENT1; diff --git a/t/t6404-recursive-merge.sh b/t/t6404-recursive-merge.sh index c7ab7048f58..eaf48e941e2 100755 --- a/t/t6404-recursive-merge.sh +++ b/t/t6404-recursive-merge.sh @@ -18,6 +18,8 @@ GIT_COMMITTER_DATE="2006-12-12 23:28:00 +0100" export GIT_COMMITTER_DATE test_expect_success 'setup tests' ' + GIT_TEST_COMMIT_GRAPH=0 && + export GIT_TEST_COMMIT_GRAPH && echo 1 >a1 && git add a1 && GIT_AUTHOR_DATE="2006-12-12 23:00:00" git commit -m 1 a1 && @@ -69,7 +71,7 @@ test_expect_success 'setup tests' ' ' test_expect_success 'combined merge conflicts' ' - test_must_fail env GIT_TEST_COMMIT_GRAPH=0 git merge -m final G + test_must_fail git merge -m final G ' test_expect_success 'result contains a conflict' ' @@ -85,6 +87,7 @@ test_expect_success 'result contains a conflict' ' ' test_expect_success 'virtual trees were processed' ' + # TODO: fragile test, relies on ambigious merge-base resolution git ls-files --stage >out && cat >expect <<-EOF &&