From patchwork Sat Jan 16 18:11:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12024971 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94B99C433E6 for ; Sat, 16 Jan 2021 18:12:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 56F912255F for ; Sat, 16 Jan 2021 18:12:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727923AbhAPSMX (ORCPT ); Sat, 16 Jan 2021 13:12:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52320 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727883AbhAPSME (ORCPT ); Sat, 16 Jan 2021 13:12:04 -0500 Received: from mail-wm1-x331.google.com (mail-wm1-x331.google.com [IPv6:2a00:1450:4864:20::331]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 953FAC061574 for ; Sat, 16 Jan 2021 10:11:23 -0800 (PST) Received: by mail-wm1-x331.google.com with SMTP id m187so3819098wme.2 for ; Sat, 16 Jan 2021 10:11:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=LGssCpLC5Gno2NgmTlbRNzcGw8aNrcxnTttkKi2m33A=; b=dUBqVQs/B+9LXWFQ1/Z6H98Oh00gl5qphWw8KkDjYfprX27JkxLKL6gQRXAAPWzmMU cxXFxBwCMD2mBttQM6TJwxajh+Ed+Ed6k/Q/r9NSQ5lupi8F+NpisahXaxl/6HHZanZJ OqLin2i9kn13HBX/t8tdRQ65ugDa8SKSPAxfGyCD9SN8txygQLwL2dnk2m9bCQv1afqN m3bsheo13ite8MVxS7DOWb9nIKAyzMsb+p8fc0P/Efbt10SQvYz1w8CsQBb9X1LMgzoA FAG+EYJnUsF91dZMim0Vg7a25aZ67NahA7ZzjWAvQ6nIW2lVBUIBShfYZt251n9AnaKN fo8A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=LGssCpLC5Gno2NgmTlbRNzcGw8aNrcxnTttkKi2m33A=; b=iS+m0Cw0OgLf9Q0rYRpwPuAYpBncupomr22rkB7nHIpzpz0sQQCy1VIf+MJQzp2Jie iFT38n3zqHTF3Rmh8Yb1kvQBRl8pGgEW/uGvRAQUVUEG6p1XzN84z6OyN2c3I5y6Ni7T W+iLGZHu0+x7aSTW4DJbp2ZEwKpbUfimxLQRfk+gjWqMV94doGUXc63oRrGUcnTCkFpX 1nGkcbt55PRDkdVeTwNyqn01FNNwxK1RYnSbqckcrsDESZb1LhUsupIB34m7f2KIjrZd JA6dXFKq7ev/xPH1jT7Xyok2sdH92Uf7AfnnO4r6QOHM4DNBc5jpLi+WBopS1whEWKyY oSVw== X-Gm-Message-State: AOAM530SQ4jUPu5jHkFV6YLHluoCVYp4cqTjQwHkyVpmRlKu9z5X7vc4 xPq+/iWA7lwobWspWNGGEF4WhBUGCHQ= X-Google-Smtp-Source: ABdhPJzUbijJtejkKX/GCNzknuVG+ywBb+mW05nr4ZsizlRqHnbAzLwFdAGZyIa2BGLZu5i33aogjw== X-Received: by 2002:a1c:8153:: with SMTP id c80mr11892196wmd.74.1610820682051; Sat, 16 Jan 2021 10:11:22 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id z8sm2197705wmi.44.2021.01.16.10.11.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 16 Jan 2021 10:11:21 -0800 (PST) Message-Id: <4d8eb415578e67ab91ebbeefa158425da226ebbd.1610820679.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sat, 16 Jan 2021 18:11:08 +0000 Subject: [PATCH v6 01/11] commit-graph: fix regression when computing Bloom filters Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar Before computing Bloom filters, the commit-graph machinery uses commit_gen_cmp to sort commits by generation order for improved diff performance. 3d11275505 (commit-graph: examine commits by generation number, 2020-03-30) claims that this sort can reduce the time spent to compute Bloom filters by nearly half. But since c49c82aa4c (commit: move members graph_pos, generation to a slab, 2020-06-17), this optimization is broken, since asking for a 'commit_graph_generation()' directly returns GENERATION_NUMBER_INFINITY while writing. Not all hope is lost, though: 'commit_gen_cmp()' falls back to comparing commits by their date when they have equal generation number, and so since c49c82aa4c is purely a date comparison function. This heuristic is good enough that we don't seem to loose appreciable performance while computing Bloom filters. Applying this patch (compared with v2.30.0) speeds up computing Bloom filters by factors ranging from 0.40% to 5.19% on various repositories [1]. So, avoid the useless 'commit_graph_generation()' while writing by instead accessing the slab directly. This returns the newly-computed generation numbers, and allows us to avoid the heuristic by directly comparing generation numbers. [1]: https://lore.kernel.org/git/20210105094535.GN8396@szeder.dev/ Signed-off-by: Abhishek Kumar --- commit-graph.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index e9124d4a412..0267886e76c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -139,13 +139,17 @@ static struct commit_graph_data *commit_graph_data_at(const struct commit *c) return data; } +/* + * Should be used only while writing commit-graph as it compares + * generation value of commits by directly accessing commit-slab. + */ static int commit_gen_cmp(const void *va, const void *vb) { const struct commit *a = *(const struct commit **)va; const struct commit *b = *(const struct commit **)vb; - uint32_t generation_a = commit_graph_generation(a); - uint32_t generation_b = commit_graph_generation(b); + uint32_t generation_a = commit_graph_data_at(a)->generation; + uint32_t generation_b = commit_graph_data_at(b)->generation; /* lower generation commits first */ if (generation_a < generation_b) return -1; From patchwork Sat Jan 16 18:11:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12024969 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B8FA3C43381 for ; Sat, 16 Jan 2021 18:12:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9AF132255F for ; Sat, 16 Jan 2021 18:12:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727901AbhAPSMX (ORCPT ); Sat, 16 Jan 2021 13:12:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52328 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727886AbhAPSMF (ORCPT ); Sat, 16 Jan 2021 13:12:05 -0500 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 78BB4C061575 for ; Sat, 16 Jan 2021 10:11:24 -0800 (PST) Received: by mail-wr1-x42a.google.com with SMTP id l12so7259719wry.2 for ; Sat, 16 Jan 2021 10:11:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=xoRicWyhiUAE/fzJwdKaDkEj93Xq55eXglhkH5BfEFY=; b=SFx6sRTbvRmw8JWxL8UtWgHq1tECqIS/F096lkm0SDGldWoTTzRoCl0zuytWm6WYr2 Lw+934hUMlkOXG2jh1CWNp6mB72vROTY34UsPKy85SzusvRRce6Y7JKezaqsurfpix8t qFQNIUIgdmPD+4N22fWB8z73FA9l4wHwE39H/KPDVc9Kn3xpzkL7e3H2sSBHF2q1Y9C7 tDa0Cd/yC/ZabdPGzpE4uFt+2RHCsxonVo+lNnpQKc7Wp6QNUB67X+OV8Itce4RLPC3e h6x905K3t9MTJ9whdmEHvblAHUT+a+a6Wqmxu3uv2xDSigtN3LuIvN2T/1ngNwB6CzfK BE0Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=xoRicWyhiUAE/fzJwdKaDkEj93Xq55eXglhkH5BfEFY=; b=nN0U/z5IVSELwWcwq7KSrHgaS8h+tixLvVlnBYZ6nGLzO/lTnyMfzZ9Tw1nkzjgu49 VFbufru4DS7wbRB/2BkSdeBZwsL6y5ctr1Rp7m/VHNyjYByoail2cVBXkGZQ7r6D8Ipo THY/2V9AlfUrRlk6STOZXJjyS6YulZUfSByqq9N+ajGFF7iEk+gSGeKut8pzbUpC5TJW i6LnDXqgdcAFsetlmtdR+Dm2lxpG0AIgNyy6LSenbyRygdn3i9F1pJwihMHTi3tfIA0X iHQfFeMmIgltuqGYBcvwmEHn5BDEe5nmNGcjApQLbLan9SgVlW+d6dNyhNNRQg1FXa0D ogPg== X-Gm-Message-State: AOAM530VSPigx6Ld4SNimeaaeMVKA0zTssB41/ejz3/ClO6YudG+itHn mJQonGYy1wMKak2+OMWj/4qPo0EYtvk= X-Google-Smtp-Source: ABdhPJyFP/Uu8z/GNcH1JPs3C8l5DY63AO8Rsa1GzgplmouYZMwPCqQG73wn/NVI0MlTGKY2WNOFwA== X-Received: by 2002:adf:e710:: with SMTP id c16mr18978449wrm.295.1610820683006; Sat, 16 Jan 2021 10:11:23 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id v6sm6768461wrx.32.2021.01.16.10.11.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 16 Jan 2021 10:11:22 -0800 (PST) Message-Id: <05dcb8628186d8a71b84cc3cd2ce4877abee039a.1610820679.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sat, 16 Jan 2021 18:11:09 +0000 Subject: [PATCH v6 02/11] revision: parse parent in indegree_walk_step() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar In indegree_walk_step(), we add unvisited parents to the indegree queue. However, parents are not guaranteed to be parsed. As the indegree queue sorts by generation number, let's parse parents before inserting them to ensure the correct priority order. Signed-off-by: Abhishek Kumar --- revision.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/revision.c b/revision.c index 1bb590ece78..be2d828a4cc 100644 --- a/revision.c +++ b/revision.c @@ -3397,6 +3397,9 @@ static void indegree_walk_step(struct rev_info *revs) struct commit *parent = p->item; int *pi = indegree_slab_at(&info->indegree, parent); + if (repo_parse_commit_gently(revs->repo, parent, 1) < 0) + return; + if (*pi) (*pi)++; else From patchwork Sat Jan 16 18:11:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12024977 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E580DC4332B for ; Sat, 16 Jan 2021 18:12:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B7078229C4 for ; Sat, 16 Jan 2021 18:12:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727896AbhAPSMW (ORCPT ); Sat, 16 Jan 2021 13:12:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52330 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727888AbhAPSMG (ORCPT ); Sat, 16 Jan 2021 13:12:06 -0500 Received: from mail-wr1-x435.google.com (mail-wr1-x435.google.com [IPv6:2a00:1450:4864:20::435]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 70AD0C061757 for ; Sat, 16 Jan 2021 10:11:25 -0800 (PST) Received: by mail-wr1-x435.google.com with SMTP id 6so5190801wri.3 for ; Sat, 16 Jan 2021 10:11:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=ac+MwXeGUKbNK/PNg6KPk7bDf6C2PPLDQznxPd6T63g=; b=cBYSkV33rVdlgujmx6Jj4C4dKMduxtrnZzj2LGHzP60Yb9e8SsDbJlGSYiwGDmVfP3 H9+dknthytKgerYHqHwYYJeEYx3nADKdXlcgkL3GsQJTQG6vPdrqmAh4manQzFkKV0F4 7Mxvb2VRZGDQo7iLaM7R3qfqeNaw2awkc6oTL5Fb0LWgmYd3YGs32d5bKDlD6Bakhg1A 3uAlW/4tmWRodnvQplJOuG3xnMBLnL5GNRVsbGfdHvOhaFg1314Iq3xbNRisoOeRIz7e iwIBBjHgFg760spp1+tcPUs1r5/Nn7mL2WRwWV0fK2Itn4QMjpYc3i+hAetW2WgxLvZS mFXg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=ac+MwXeGUKbNK/PNg6KPk7bDf6C2PPLDQznxPd6T63g=; b=tLAI+P0Bx2KG2+DaMHzJS09+mjJ2tVbzSSs69GuLVj8es+oPlns2Ar4i0+9r7OrKAU 4AzcIGD9hVQ9bOy5QG2e+sn0o+a6IY9iGOfOZZx2cgpRjBj+I1Bg7aPx1w7v9Fmobxet pJY3KJaK83GQL5eKWkVBhsQWKKnn+Bz49iEH/Jt2Eh0q3Xwvj/merT9V7kR4IyWSKr3S /Yg12BGKk3NJZUaNRbFB2MnwpSA0wdcoQtgpCmNwNLtrp+mCzLs8kWG2tOFXrLfcanK4 aKNEcWuA4JMI7XKFyDJUncZ4K9sFbPrBB4YWxSMhbcXKgH3uWP57jDfHy+bR9fX2V7Z0 xa9w== X-Gm-Message-State: AOAM531pO8BUjJVSNZb+++bi2AxPW2NKdfh6K7e+VTXX3hAc8qt8DXOC zN0rRPd0SHgSRSUeVX+CM0abWsS9bnA= X-Google-Smtp-Source: ABdhPJypg/TsRHIsKiNgPo1/XNxbolQm88XQv1vSN2+xFvvDQ1/eLSiTKgAtRtP3C0m7m/shGy0qfg== X-Received: by 2002:a05:6000:cc:: with SMTP id q12mr19221182wrx.335.1610820683926; Sat, 16 Jan 2021 10:11:23 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id d7sm8729805wmb.47.2021.01.16.10.11.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 16 Jan 2021 10:11:23 -0800 (PST) Message-Id: In-Reply-To: References: Date: Sat, 16 Jan 2021 18:11:10 +0000 Subject: [PATCH v6 03/11] commit-graph: consolidate fill_commit_graph_info Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar Both fill_commit_graph_info() and fill_commit_in_graph() parse information present in commit data chunk. Let's simplify the implementation by calling fill_commit_graph_info() within fill_commit_in_graph(). fill_commit_graph_info() used to not load committer data from commit data chunk. However, with the upcoming switch to using corrected committer date as generation number v2, we will have to load committer date to compute generation number value anyway. e51217e15 (t5000: test tar files that overflow ustar headers, 30-06-2016) introduced a test 'generate tar with future mtime' that creates a commit with committer date of (2^36 + 1) seconds since EPOCH. The CDAT chunk provides 34-bits for storing committer date, thus committer time overflows into generation number (within CDAT chunk) and has undefined behavior. The test used to pass as fill_commit_graph_info() would not set struct member `date` of struct commit and load committer date from the object database, generating a tar file with the expected mtime. However, with corrected commit date, we will load the committer date from CDAT chunk (truncated to lower 34-bits to populate the generation number. Thus, Git sets date and generates tar file with the truncated mtime. The ustar format (the header format used by most modern tar programs) only has room for 11 (or 12, depending on some implementations) octal digits for the size and mtime of each file. As the CDAT chunk is overflow by 12-octal digits but not 11-octal digits, we split the existing tests to test both implementations separately and add a new explicit test for 11-digit implementation. To test the 11-octal digit implementation, we create a future commit with committer date of 2^34 - 1, which overflows 11-octal digits without overflowing 34-bits of the Commit Date chunks. To test the 12-octal digit implementation, the smallest committer date possible is 2^36 + 1, which overflows the CDAT chunk and thus commit-graph must be disabled for the test. Signed-off-by: Abhishek Kumar --- commit-graph.c | 27 ++++++++++----------------- t/t5000-tar-tree.sh | 24 +++++++++++++++++++++--- 2 files changed, 31 insertions(+), 20 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 0267886e76c..3d59b8b905d 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -753,15 +753,24 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, const unsigned char *commit_data; struct commit_graph_data *graph_data; uint32_t lex_index; + uint64_t date_high, date_low; while (pos < g->num_commits_in_base) g = g->base_graph; + if (pos >= g->num_commits + g->num_commits_in_base) + die(_("invalid commit position. commit-graph is likely corrupt")); + lex_index = pos - g->num_commits_in_base; commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; graph_data = commit_graph_data_at(item); graph_data->graph_pos = pos; + + date_high = get_be32(commit_data + g->hash_len + 8) & 0x3; + date_low = get_be32(commit_data + g->hash_len + 12); + item->date = (timestamp_t)((date_high << 32) | date_low); + graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; } @@ -776,38 +785,22 @@ static int fill_commit_in_graph(struct repository *r, { uint32_t edge_value; uint32_t *parent_data_ptr; - uint64_t date_low, date_high; struct commit_list **pptr; - struct commit_graph_data *graph_data; const unsigned char *commit_data; uint32_t lex_index; while (pos < g->num_commits_in_base) g = g->base_graph; - if (pos >= g->num_commits + g->num_commits_in_base) - die(_("invalid commit position. commit-graph is likely corrupt")); + fill_commit_graph_info(item, g, pos); - /* - * Store the "full" position, but then use the - * "local" position for the rest of the calculation. - */ - graph_data = commit_graph_data_at(item); - graph_data->graph_pos = pos; lex_index = pos - g->num_commits_in_base; - commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index; item->object.parsed = 1; set_commit_tree(item, NULL); - date_high = get_be32(commit_data + g->hash_len + 8) & 0x3; - date_low = get_be32(commit_data + g->hash_len + 12); - item->date = (timestamp_t)((date_high << 32) | date_low); - - graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; - pptr = &item->parents; edge_value = get_be32(commit_data + g->hash_len); diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 3ebb0d3b652..7204799a0b5 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -431,15 +431,33 @@ test_expect_success TAR_HUGE,LONG_IS_64BIT 'system tar can read our huge size' ' test_cmp expect actual ' -test_expect_success TIME_IS_64BIT 'set up repository with far-future commit' ' +test_expect_success TIME_IS_64BIT 'set up repository with far-future (2^34 - 1) commit' ' + rm -f .git/index && + echo foo >file && + git add file && + GIT_COMMITTER_DATE="@17179869183 +0000" \ + git commit -m "tempori parendum" +' + +test_expect_success TIME_IS_64BIT 'generate tar with far-future mtime' ' + git archive HEAD >future.tar +' + +test_expect_success TAR_HUGE,TIME_IS_64BIT,TIME_T_IS_64BIT 'system tar can read our future mtime' ' + echo 2514 >expect && + tar_info future.tar | cut -d" " -f2 >actual && + test_cmp expect actual +' + +test_expect_success TIME_IS_64BIT 'set up repository with far-far-future (2^36 + 1) commit' ' rm -f .git/index && echo content >file && git add file && - GIT_COMMITTER_DATE="@68719476737 +0000" \ + GIT_TEST_COMMIT_GRAPH=0 GIT_COMMITTER_DATE="@68719476737 +0000" \ git commit -m "tempori parendum" ' -test_expect_success TIME_IS_64BIT 'generate tar with future mtime' ' +test_expect_success TIME_IS_64BIT 'generate tar with far-far-future mtime' ' git archive HEAD >future.tar ' From patchwork Sat Jan 16 18:11:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12024975 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1A8C5C4332E for ; Sat, 16 Jan 2021 18:12:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CE03A22D50 for ; Sat, 16 Jan 2021 18:12:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727146AbhAPSMW (ORCPT ); Sat, 16 Jan 2021 13:12:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52338 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727896AbhAPSMH (ORCPT ); Sat, 16 Jan 2021 13:12:07 -0500 Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0406DC0613C1 for ; Sat, 16 Jan 2021 10:11:27 -0800 (PST) Received: by mail-wr1-x434.google.com with SMTP id d13so12453668wrc.13 for ; Sat, 16 Jan 2021 10:11:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=cEe95guw9s9mzUDrRNnTP/221TBTHwdatVA+Owv7Nr0=; b=Rg2JtbAwPSaOukfj62OMw2SoQPSeXQZHePcvc7gFS+0XTKfO0sZiyKgWZ95uqL6hAX vFVjeGfcubnm/iPk94pWusqMavScLhls/XNvLgNVc0Pj5ek3X1Y45V3URtA+eF6/5Uoz 1iYw4RjCgALe7BijbjeMpvUcuWszJaON7uZ7y1GmhY3PbifoDcswF1muqiIkGVx5fsIn 0CYxN75ZdxhFx7JzK4Ep9OdsneQO0+9LfmZfD2lnAKdlpHSA+FlMWaDQVf5XPJ/uIfML /UHj4JdRQ0ocO9rAgKBtaXNJPbHoDZaBPUxKL2mXc92s2hO6xLTs7IfnoVd9Us55hH+3 OHrw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=cEe95guw9s9mzUDrRNnTP/221TBTHwdatVA+Owv7Nr0=; b=f7pcGWAKwfrmWpIBvqtKNktUciYnCw2hqP3WipVmsbumbH0t4/0IFyKsNzU5j+dvtj +E9a/IXBg3tRJ5vII3bRzyDlmZk1E1bnrknqgbP2wBpab5LnQPVw9luKfHZTEQeOprge 7Z5nhiMKFYaeByb0w+D+eDaVw1AC18OZP1pSVlO1BL3ozA1qmie+Z/sIC6FM0+VKKqC9 EHxgTegdhTjuKkZLT8ouhiEOitfHogrw49LKak5R6+pU4ljXGXlCH96dy+aMyFTjRjVn rZvNTKyyZF1+IMstHNV8jnVD7tKZUSvc3ZO5iVDyA4Jl0aaNJvEshiuLSvZ/sb1Vu3SI EKxw== X-Gm-Message-State: AOAM532tOpOCYd9CT1jZ7QH3uGc0A5tkGf3S6NN2s1aQbwjPh7YPoLWl XgimOIXrP9s23nm3w6lB24S/hB/gLGc= X-Google-Smtp-Source: ABdhPJxhrCEeGoKUfsMcno0KCZJYB0o3rOMhYLNh87CCe/fX1z8n/f89JY1oUy5WY6LsCS8B76Cgcw== X-Received: by 2002:a5d:4dc6:: with SMTP id f6mr19089576wru.336.1610820685431; Sat, 16 Jan 2021 10:11:25 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id r20sm22283099wrg.66.2021.01.16.10.11.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 16 Jan 2021 10:11:24 -0800 (PST) Message-Id: <4fbdee7ac9070345ad20d226a260b55767e375d1.1610820679.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sat, 16 Jan 2021 18:11:11 +0000 Subject: [PATCH v6 04/11] t6600-test-reach: generalize *_three_modes Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar In a preparatory step to implement generation number v2, we add tests to ensure Git can read and parse commit-graph files without Generation Data chunk. These files represent commit-graph files written by Old Git and are neccesary for backward compatability. We extend run_three_modes() and test_three_modes() to *_all_modes() with the fourth mode being "commit-graph without generation data chunk". Signed-off-by: Abhishek Kumar --- t/t6600-test-reach.sh | 62 +++++++++++++++++++++---------------------- 1 file changed, 31 insertions(+), 31 deletions(-) diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index f807276337d..af10f0dc090 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -58,7 +58,7 @@ test_expect_success 'setup' ' git config core.commitGraph true ' -run_three_modes () { +run_all_modes () { test_when_finished rm -rf .git/objects/info/commit-graph && "$@" actual && test_cmp expect actual && @@ -70,8 +70,8 @@ run_three_modes () { test_cmp expect actual } -test_three_modes () { - run_three_modes test-tool reach "$@" +test_all_modes () { + run_all_modes test-tool reach "$@" } test_expect_success 'ref_newer:miss' ' @@ -80,7 +80,7 @@ test_expect_success 'ref_newer:miss' ' B:commit-4-9 EOF echo "ref_newer(A,B):0" >expect && - test_three_modes ref_newer + test_all_modes ref_newer ' test_expect_success 'ref_newer:hit' ' @@ -89,7 +89,7 @@ test_expect_success 'ref_newer:hit' ' B:commit-2-3 EOF echo "ref_newer(A,B):1" >expect && - test_three_modes ref_newer + test_all_modes ref_newer ' test_expect_success 'in_merge_bases:hit' ' @@ -98,7 +98,7 @@ test_expect_success 'in_merge_bases:hit' ' B:commit-8-8 EOF echo "in_merge_bases(A,B):1" >expect && - test_three_modes in_merge_bases + test_all_modes in_merge_bases ' test_expect_success 'in_merge_bases:miss' ' @@ -107,7 +107,7 @@ test_expect_success 'in_merge_bases:miss' ' B:commit-5-9 EOF echo "in_merge_bases(A,B):0" >expect && - test_three_modes in_merge_bases + test_all_modes in_merge_bases ' test_expect_success 'in_merge_bases_many:hit' ' @@ -117,7 +117,7 @@ test_expect_success 'in_merge_bases_many:hit' ' X:commit-5-7 EOF echo "in_merge_bases_many(A,X):1" >expect && - test_three_modes in_merge_bases_many + test_all_modes in_merge_bases_many ' test_expect_success 'in_merge_bases_many:miss' ' @@ -127,7 +127,7 @@ test_expect_success 'in_merge_bases_many:miss' ' X:commit-8-6 EOF echo "in_merge_bases_many(A,X):0" >expect && - test_three_modes in_merge_bases_many + test_all_modes in_merge_bases_many ' test_expect_success 'in_merge_bases_many:miss-heuristic' ' @@ -137,7 +137,7 @@ test_expect_success 'in_merge_bases_many:miss-heuristic' ' X:commit-6-6 EOF echo "in_merge_bases_many(A,X):0" >expect && - test_three_modes in_merge_bases_many + test_all_modes in_merge_bases_many ' test_expect_success 'is_descendant_of:hit' ' @@ -148,7 +148,7 @@ test_expect_success 'is_descendant_of:hit' ' X:commit-1-1 EOF echo "is_descendant_of(A,X):1" >expect && - test_three_modes is_descendant_of + test_all_modes is_descendant_of ' test_expect_success 'is_descendant_of:miss' ' @@ -159,7 +159,7 @@ test_expect_success 'is_descendant_of:miss' ' X:commit-7-6 EOF echo "is_descendant_of(A,X):0" >expect && - test_three_modes is_descendant_of + test_all_modes is_descendant_of ' test_expect_success 'get_merge_bases_many' ' @@ -174,7 +174,7 @@ test_expect_success 'get_merge_bases_many' ' git rev-parse commit-5-6 \ commit-4-7 | sort } >expect && - test_three_modes get_merge_bases_many + test_all_modes get_merge_bases_many ' test_expect_success 'reduce_heads' ' @@ -196,7 +196,7 @@ test_expect_success 'reduce_heads' ' commit-2-8 \ commit-1-10 | sort } >expect && - test_three_modes reduce_heads + test_all_modes reduce_heads ' test_expect_success 'can_all_from_reach:hit' ' @@ -219,7 +219,7 @@ test_expect_success 'can_all_from_reach:hit' ' Y:commit-8-1 EOF echo "can_all_from_reach(X,Y):1" >expect && - test_three_modes can_all_from_reach + test_all_modes can_all_from_reach ' test_expect_success 'can_all_from_reach:miss' ' @@ -241,7 +241,7 @@ test_expect_success 'can_all_from_reach:miss' ' Y:commit-8-5 EOF echo "can_all_from_reach(X,Y):0" >expect && - test_three_modes can_all_from_reach + test_all_modes can_all_from_reach ' test_expect_success 'can_all_from_reach_with_flag: tags case' ' @@ -264,7 +264,7 @@ test_expect_success 'can_all_from_reach_with_flag: tags case' ' Y:commit-8-1 EOF echo "can_all_from_reach_with_flag(X,_,_,0,0):1" >expect && - test_three_modes can_all_from_reach_with_flag + test_all_modes can_all_from_reach_with_flag ' test_expect_success 'commit_contains:hit' ' @@ -280,8 +280,8 @@ test_expect_success 'commit_contains:hit' ' X:commit-9-3 EOF echo "commit_contains(_,A,X,_):1" >expect && - test_three_modes commit_contains && - test_three_modes commit_contains --tag + test_all_modes commit_contains && + test_all_modes commit_contains --tag ' test_expect_success 'commit_contains:miss' ' @@ -297,8 +297,8 @@ test_expect_success 'commit_contains:miss' ' X:commit-9-3 EOF echo "commit_contains(_,A,X,_):0" >expect && - test_three_modes commit_contains && - test_three_modes commit_contains --tag + test_all_modes commit_contains && + test_all_modes commit_contains --tag ' test_expect_success 'rev-list: basic topo-order' ' @@ -310,7 +310,7 @@ test_expect_success 'rev-list: basic topo-order' ' commit-6-2 commit-5-2 commit-4-2 commit-3-2 commit-2-2 commit-1-2 \ commit-6-1 commit-5-1 commit-4-1 commit-3-1 commit-2-1 commit-1-1 \ >expect && - run_three_modes git rev-list --topo-order commit-6-6 + run_all_modes git rev-list --topo-order commit-6-6 ' test_expect_success 'rev-list: first-parent topo-order' ' @@ -322,7 +322,7 @@ test_expect_success 'rev-list: first-parent topo-order' ' commit-6-2 \ commit-6-1 commit-5-1 commit-4-1 commit-3-1 commit-2-1 commit-1-1 \ >expect && - run_three_modes git rev-list --first-parent --topo-order commit-6-6 + run_all_modes git rev-list --first-parent --topo-order commit-6-6 ' test_expect_success 'rev-list: range topo-order' ' @@ -334,7 +334,7 @@ test_expect_success 'rev-list: range topo-order' ' commit-6-2 commit-5-2 commit-4-2 \ commit-6-1 commit-5-1 commit-4-1 \ >expect && - run_three_modes git rev-list --topo-order commit-3-3..commit-6-6 + run_all_modes git rev-list --topo-order commit-3-3..commit-6-6 ' test_expect_success 'rev-list: range topo-order' ' @@ -346,7 +346,7 @@ test_expect_success 'rev-list: range topo-order' ' commit-6-2 commit-5-2 commit-4-2 \ commit-6-1 commit-5-1 commit-4-1 \ >expect && - run_three_modes git rev-list --topo-order commit-3-8..commit-6-6 + run_all_modes git rev-list --topo-order commit-3-8..commit-6-6 ' test_expect_success 'rev-list: first-parent range topo-order' ' @@ -358,7 +358,7 @@ test_expect_success 'rev-list: first-parent range topo-order' ' commit-6-2 \ commit-6-1 commit-5-1 commit-4-1 \ >expect && - run_three_modes git rev-list --first-parent --topo-order commit-3-8..commit-6-6 + run_all_modes git rev-list --first-parent --topo-order commit-3-8..commit-6-6 ' test_expect_success 'rev-list: ancestry-path topo-order' ' @@ -368,7 +368,7 @@ test_expect_success 'rev-list: ancestry-path topo-order' ' commit-6-4 commit-5-4 commit-4-4 commit-3-4 \ commit-6-3 commit-5-3 commit-4-3 \ >expect && - run_three_modes git rev-list --topo-order --ancestry-path commit-3-3..commit-6-6 + run_all_modes git rev-list --topo-order --ancestry-path commit-3-3..commit-6-6 ' test_expect_success 'rev-list: symmetric difference topo-order' ' @@ -382,7 +382,7 @@ test_expect_success 'rev-list: symmetric difference topo-order' ' commit-3-8 commit-2-8 commit-1-8 \ commit-3-7 commit-2-7 commit-1-7 \ >expect && - run_three_modes git rev-list --topo-order commit-3-8...commit-6-6 + run_all_modes git rev-list --topo-order commit-3-8...commit-6-6 ' test_expect_success 'get_reachable_subset:all' ' @@ -402,7 +402,7 @@ test_expect_success 'get_reachable_subset:all' ' commit-1-7 \ commit-5-6 | sort ) >expect && - test_three_modes get_reachable_subset + test_all_modes get_reachable_subset ' test_expect_success 'get_reachable_subset:some' ' @@ -420,7 +420,7 @@ test_expect_success 'get_reachable_subset:some' ' git rev-parse commit-3-3 \ commit-1-7 | sort ) >expect && - test_three_modes get_reachable_subset + test_all_modes get_reachable_subset ' test_expect_success 'get_reachable_subset:none' ' @@ -434,7 +434,7 @@ test_expect_success 'get_reachable_subset:none' ' Y:commit-2-8 EOF echo "get_reachable_subset(X,Y)" >expect && - test_three_modes get_reachable_subset + test_all_modes get_reachable_subset ' test_done From patchwork Sat Jan 16 18:11:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12024985 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A24C5C4332D for ; Sat, 16 Jan 2021 18:13:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 882AF2255F for ; Sat, 16 Jan 2021 18:13:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727960AbhAPSMp (ORCPT ); Sat, 16 Jan 2021 13:12:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52468 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727926AbhAPSMo (ORCPT ); Sat, 16 Jan 2021 13:12:44 -0500 Received: from mail-wm1-x32e.google.com (mail-wm1-x32e.google.com [IPv6:2a00:1450:4864:20::32e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 12343C0613CF for ; Sat, 16 Jan 2021 10:11:28 -0800 (PST) Received: by mail-wm1-x32e.google.com with SMTP id k10so10074891wmi.3 for ; Sat, 16 Jan 2021 10:11:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=1c1aCEdyHy2ZwQKD9MQBGvB9fCTDPABdf9O5BY4o+yg=; b=HXgCZKoh6F7OuEXFMs5znjpqy4x6kNcziFPdYdfMrywB0jrtDYruYk1Uxv8ThDPCAM NKz8PTvrG1vqMqnaPFgRNx2fyqiBaqgyJSEZ1agxHHliMYoFblrMo7XJ0Ib3X6jBNsmy ZGQwyYwk+K2DnRk0qKPHftkQHUxJjG11Dta5TWcd9G4lShInEF2SmFaFHPW20SuPKpuz UJh72Xq38CkU9opYmmPdd3w7eR9Ja+N39An/Jj/IqnMKTffX/ggi9sKa1zaVx39Uv4LJ Ui23qwoifwEQHx+4Sdz1XS99CYYR8B1duUdbqt3n5nygGhfGCkuZOtv635BLhEFzxdCM 1WbA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=1c1aCEdyHy2ZwQKD9MQBGvB9fCTDPABdf9O5BY4o+yg=; b=pL7XhkyW5w4Kq37OAMtAyaObrGmLBSLDQUJBUdGxs61H3tgDK2rOn+A+pnUaz2eB/v 3z2/MfKggHgcUhv0FAzT+9p4eYYLXcZCExZr8rovjfJA/qG+sN0o78QuenU2USkLWp4L kj//lJn7UnJtA3sHXLtn76D3GJWi2Nk5Fz0QoOFAwijPE7jYRx/gCnHRDUb1JuP0Z/c8 YT83NWjY/PNCYDlR4GZc7T5gPVdTjX+2Cvz/+XGgkwIOiCCX9vwUp5xSDhaAWqxWkGsz zo49GTh+ZkF/KUkVaGHyV8L1GJ3xHgBqPwEgB1fPhhs5e1qe5bkRO2uAU9TBzVrvwP8g qh5w== X-Gm-Message-State: AOAM533TkmL3S6mvtGIOOdIQjXZ5VAeB0TdCjExWfSD7p71ulIpfARNY 6a6INwz67An2uJRCii3dvGrkbHPHoto= X-Google-Smtp-Source: ABdhPJzm2k8ggmFgFVu9e8eMAxWqlzT4TPDqhZ9lGnEtC2bhcmgB53syNghCjvSY6XxvQ2Tpj1xqZw== X-Received: by 2002:a1c:24c4:: with SMTP id k187mr14324452wmk.14.1610820686480; Sat, 16 Jan 2021 10:11:26 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id u205sm16543503wme.42.2021.01.16.10.11.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 16 Jan 2021 10:11:25 -0800 (PST) Message-Id: In-Reply-To: References: Date: Sat, 16 Jan 2021 18:11:12 +0000 Subject: [PATCH v6 05/11] commit-graph: add a slab to store topological levels Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar In a later commit we will introduce corrected commit date as the generation number v2. Corrected commit dates will be stored in the new seperate Generation Data chunk. However, to ensure backwards compatibility with "Old" Git we need to continue to write generation number v1 (topological levels) to the commit data chunk. Thus, we need to compute and store both versions of generation numbers to write the commit-graph file. Therefore, let's introduce a commit-slab `topo_level_slab` to store topological levels; corrected commit date will be stored in the member `generation` of struct commit_graph_data. The macros `GENERATION_NUMBER_INFINITY` and `GENERATION_NUMBER_ZERO` mark commits not in the commit-graph file and commits written by a version of Git that did not compute generation numbers respectively. Generation numbers are computed identically for both kinds of commits. A "slab-miss" should return `GENERATION_NUMBER_INFINITY` as the commit is not in the commit-graph file. However, since the slab is zero-initialized, it returns 0 (or rather `GENERATION_NUMBER_ZERO`). Thus, we no longer need to check if the topological level of a commit is `GENERATION_NUMBER_INFINITY`. We will add a pointer to the slab in `struct write_commit_graph_context` and `struct commit_graph` to populate the slab in `fill_commit_graph_info` if the commit has a pre-computed topological level as in case of split commit-graphs. Signed-off-by: Abhishek Kumar --- commit-graph.c | 45 ++++++++++++++++++++++++++++++--------------- commit-graph.h | 1 + 2 files changed, 31 insertions(+), 15 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 3d59b8b905d..3b69c3cc329 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -64,6 +64,8 @@ void git_test_write_commit_graph_or_die(void) /* Remember to update object flag allocation in object.h */ #define REACHABLE (1u<<15) +define_commit_slab(topo_level_slab, uint32_t); + /* Keep track of the order in which commits are added to our list. */ define_commit_slab(commit_pos, int); static struct commit_pos commit_pos = COMMIT_SLAB_INIT(1, commit_pos); @@ -772,6 +774,9 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, item->date = (timestamp_t)((date_high << 32) | date_low); graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + + if (g->topo_levels) + *topo_level_slab_at(g->topo_levels, item) = get_be32(commit_data + g->hash_len + 8) >> 2; } static inline void set_commit_tree(struct commit *c, struct tree *t) @@ -960,6 +965,7 @@ struct write_commit_graph_context { changed_paths:1, order_by_pack:1; + struct topo_level_slab *topo_levels; const struct commit_graph_opts *opts; size_t total_bloom_filter_data_size; const struct bloom_filter_settings *bloom_settings; @@ -1106,7 +1112,7 @@ static int write_graph_chunk_data(struct hashfile *f, else packedDate[0] = 0; - packedDate[0] |= htonl(commit_graph_data_at(*list)->generation << 2); + packedDate[0] |= htonl(*topo_level_slab_at(ctx->topo_levels, *list) << 2); packedDate[1] = htonl((*list)->date); hashwrite(f, packedDate, 8); @@ -1336,11 +1342,10 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) _("Computing commit graph generation numbers"), ctx->commits.nr); for (i = 0; i < ctx->commits.nr; i++) { - uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation; + uint32_t level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]); display_progress(ctx->progress, i + 1); - if (generation != GENERATION_NUMBER_INFINITY && - generation != GENERATION_NUMBER_ZERO) + if (level != GENERATION_NUMBER_ZERO) continue; commit_list_insert(ctx->commits.list[i], &list); @@ -1348,29 +1353,26 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) struct commit *current = list->item; struct commit_list *parent; int all_parents_computed = 1; - uint32_t max_generation = 0; + uint32_t max_level = 0; for (parent = current->parents; parent; parent = parent->next) { - generation = commit_graph_data_at(parent->item)->generation; + level = *topo_level_slab_at(ctx->topo_levels, parent->item); - if (generation == GENERATION_NUMBER_INFINITY || - generation == GENERATION_NUMBER_ZERO) { + if (level == GENERATION_NUMBER_ZERO) { all_parents_computed = 0; commit_list_insert(parent->item, &list); break; - } else if (generation > max_generation) { - max_generation = generation; + } else if (level > max_level) { + max_level = level; } } if (all_parents_computed) { - struct commit_graph_data *data = commit_graph_data_at(current); - - data->generation = max_generation + 1; pop_commit(&list); - if (data->generation > GENERATION_NUMBER_MAX) - data->generation = GENERATION_NUMBER_MAX; + if (max_level > GENERATION_NUMBER_MAX - 1) + max_level = GENERATION_NUMBER_MAX - 1; + *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1; } } } @@ -2106,6 +2108,7 @@ int write_commit_graph(struct object_directory *odb, int res = 0; int replace = 0; struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; + struct topo_level_slab topo_levels; prepare_repo_settings(the_repository); if (!the_repository->settings.core_commit_graph) { @@ -2132,6 +2135,18 @@ int write_commit_graph(struct object_directory *odb, bloom_settings.max_changed_paths); ctx->bloom_settings = &bloom_settings; + init_topo_level_slab(&topo_levels); + ctx->topo_levels = &topo_levels; + + if (ctx->r->objects->commit_graph) { + struct commit_graph *g = ctx->r->objects->commit_graph; + + while (g) { + g->topo_levels = &topo_levels; + g = g->base_graph; + } + } + if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS) ctx->changed_paths = 1; if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) { diff --git a/commit-graph.h b/commit-graph.h index f8e92500c6e..00f00745b79 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -73,6 +73,7 @@ struct commit_graph { const unsigned char *chunk_bloom_indexes; const unsigned char *chunk_bloom_data; + struct topo_level_slab *topo_levels; struct bloom_filter_settings *bloom_filter_settings; }; From patchwork Sat Jan 16 18:11:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12024981 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 899A8C43381 for ; Sat, 16 Jan 2021 18:13:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 70824229C4 for ; Sat, 16 Jan 2021 18:13:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727991AbhAPSMp (ORCPT ); Sat, 16 Jan 2021 13:12:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52470 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727925AbhAPSMo (ORCPT ); Sat, 16 Jan 2021 13:12:44 -0500 Received: from mail-wr1-x429.google.com (mail-wr1-x429.google.com [IPv6:2a00:1450:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5CA34C0613D3 for ; Sat, 16 Jan 2021 10:11:29 -0800 (PST) Received: by mail-wr1-x429.google.com with SMTP id 91so12495343wrj.7 for ; Sat, 16 Jan 2021 10:11:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=r03jRWjPyzS3gIVYPcMiRxJZCphuIcWbP4MOTnE+ocI=; b=ZnqA85tBrD3/kbdArxHUa98T9BvjOB8tBB1iRDZniBAExS1ddHu6We9r/fZI5OGMYj /GyALPDEHjeivWuh/h/Z+KDYHu0ukVPRKYy6AtX5OKflqZ3vqd1/bfJyWzuLKZfd2pkj 8hxxJ109/omhMxC8PlBjBFU3iAmBj6l2gNm3buBD11NgaZgkcNhS8N8WhEe6g9WP5AUz NLmKYbXq1YBXlvSoo/RiVlMfOCESta7dV/GLz3TYbSwIqil/PQ1AUilgo7GJn2qofwgG eAf/u7cqKfITeCuforTCjG7CUazqqYSKrT9KoajimsVARUSVvkQHy3TE2Hsm0vd2Nflu +2qA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=r03jRWjPyzS3gIVYPcMiRxJZCphuIcWbP4MOTnE+ocI=; b=gqqDlNCElkVB+KXUh4sm9SO+gJN3WAIhqVwHA7sLvZ2Fu41M1+Lw49OwBOhEqgBujv AHpdNv0RPRlGtIv/5cS3xMXS4QuHKTGkZ20ua01N9RdynMxSl6RwXRJZX2L9Epuf2vkk shId8LtRT/R4gRYKH80ZOCSTzpIodmdl3f0+fXJDwIJDViOhMpPhudKwlkPW7lCi0mnv /NJOdlx13ujS1g8m6oVg6GXBg0+kCsArVWLHwhInOmXYpdRwFdUa7D4dEjrplW9bIByQ O73UoJRLk+FXde61L+mpyzHRerTQzvU+HAbeM/Ei2qkS4VZy2Fy4ZuyJBlIUmvWaS25J VwyA== X-Gm-Message-State: AOAM530Iho/F/Blq7HwCH3B56hTGrpwArE4l74uWslk9gK3Nuz1QKI07 Koe0fdchWyoZQh+66mR3est+jMQX4dA= X-Google-Smtp-Source: ABdhPJyQXitw6gINUX3pKWGs5YR83HGW+RqPuvsl7SsuzD9puNwH5AcFP9/P4SgEKIwpo3ZFvHPnQg== X-Received: by 2002:adf:b1db:: with SMTP id r27mr9027154wra.125.1610820687801; Sat, 16 Jan 2021 10:11:27 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id h184sm16961410wmh.23.2021.01.16.10.11.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 16 Jan 2021 10:11:27 -0800 (PST) Message-Id: <855ff662a445154d71cdecde5b4ee3fec510c700.1610820679.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sat, 16 Jan 2021 18:11:13 +0000 Subject: [PATCH v6 06/11] commit-graph: return 64-bit generation number Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar In a preparatory step for introducing corrected commit dates, let's return timestamp_t values from commit_graph_generation(), use timestamp_t for local variables and define GENERATION_NUMBER_INFINITY as (2 ^ 63 - 1) instead. We rename GENERATION_NUMBER_MAX to GENERATION_NUMBER_V1_MAX to represent the largest topological level we can store in the commit data chunk. With corrected commit dates implemented, we will have two such *_MAX variables to denote the largest offset and largest topological level that can be stored. Signed-off-by: Abhishek Kumar --- commit-graph.c | 22 +++++++++++----------- commit-graph.h | 4 ++-- commit-reach.c | 36 ++++++++++++++++++------------------ commit-reach.h | 2 +- commit.c | 4 ++-- commit.h | 4 ++-- revision.c | 10 +++++----- upload-pack.c | 2 +- 8 files changed, 42 insertions(+), 42 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 3b69c3cc329..6d42e30cd9a 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -101,7 +101,7 @@ uint32_t commit_graph_position(const struct commit *c) return data ? data->graph_pos : COMMIT_NOT_FROM_GRAPH; } -uint32_t commit_graph_generation(const struct commit *c) +timestamp_t commit_graph_generation(const struct commit *c) { struct commit_graph_data *data = commit_graph_data_slab_peek(&commit_graph_data_slab, c); @@ -150,8 +150,8 @@ static int commit_gen_cmp(const void *va, const void *vb) const struct commit *a = *(const struct commit **)va; const struct commit *b = *(const struct commit **)vb; - uint32_t generation_a = commit_graph_data_at(a)->generation; - uint32_t generation_b = commit_graph_data_at(b)->generation; + const timestamp_t generation_a = commit_graph_data_at(a)->generation; + const timestamp_t generation_b = commit_graph_data_at(b)->generation; /* lower generation commits first */ if (generation_a < generation_b) return -1; @@ -1370,8 +1370,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) if (all_parents_computed) { pop_commit(&list); - if (max_level > GENERATION_NUMBER_MAX - 1) - max_level = GENERATION_NUMBER_MAX - 1; + if (max_level > GENERATION_NUMBER_V1_MAX - 1) + max_level = GENERATION_NUMBER_V1_MAX - 1; *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1; } } @@ -2367,8 +2367,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) for (i = 0; i < g->num_commits; i++) { struct commit *graph_commit, *odb_commit; struct commit_list *graph_parents, *odb_parents; - uint32_t max_generation = 0; - uint32_t generation; + timestamp_t max_generation = 0; + timestamp_t generation; display_progress(progress, i + 1); hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); @@ -2432,16 +2432,16 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) continue; /* - * If one of our parents has generation GENERATION_NUMBER_MAX, then - * our generation is also GENERATION_NUMBER_MAX. Decrement to avoid + * If one of our parents has generation GENERATION_NUMBER_V1_MAX, then + * our generation is also GENERATION_NUMBER_V1_MAX. Decrement to avoid * extra logic in the following condition. */ - if (max_generation == GENERATION_NUMBER_MAX) + if (max_generation == GENERATION_NUMBER_V1_MAX) max_generation--; generation = commit_graph_generation(graph_commit); if (generation != max_generation + 1) - graph_report(_("commit-graph generation for commit %s is %u != %u"), + graph_report(_("commit-graph generation for commit %s is %"PRItime" != %"PRItime), oid_to_hex(&cur_oid), generation, max_generation + 1); diff --git a/commit-graph.h b/commit-graph.h index 00f00745b79..2e9aa7824ee 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -145,12 +145,12 @@ void disable_commit_graph(struct repository *r); struct commit_graph_data { uint32_t graph_pos; - uint32_t generation; + timestamp_t generation; }; /* * Commits should be parsed before accessing generation, graph positions. */ -uint32_t commit_graph_generation(const struct commit *); +timestamp_t commit_graph_generation(const struct commit *); uint32_t commit_graph_position(const struct commit *); #endif diff --git a/commit-reach.c b/commit-reach.c index 50175b159e7..9b24b0378d5 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -32,12 +32,12 @@ static int queue_has_nonstale(struct prio_queue *queue) static struct commit_list *paint_down_to_common(struct repository *r, struct commit *one, int n, struct commit **twos, - int min_generation) + timestamp_t min_generation) { struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; struct commit_list *result = NULL; int i; - uint32_t last_gen = GENERATION_NUMBER_INFINITY; + timestamp_t last_gen = GENERATION_NUMBER_INFINITY; if (!min_generation) queue.compare = compare_commits_by_commit_date; @@ -58,10 +58,10 @@ static struct commit_list *paint_down_to_common(struct repository *r, struct commit *commit = prio_queue_get(&queue); struct commit_list *parents; int flags; - uint32_t generation = commit_graph_generation(commit); + timestamp_t generation = commit_graph_generation(commit); if (min_generation && generation > last_gen) - BUG("bad generation skip %8x > %8x at %s", + BUG("bad generation skip %"PRItime" > %"PRItime" at %s", generation, last_gen, oid_to_hex(&commit->object.oid)); last_gen = generation; @@ -177,12 +177,12 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt repo_parse_commit(r, array[i]); for (i = 0; i < cnt; i++) { struct commit_list *common; - uint32_t min_generation = commit_graph_generation(array[i]); + timestamp_t min_generation = commit_graph_generation(array[i]); if (redundant[i]) continue; for (j = filled = 0; j < cnt; j++) { - uint32_t curr_generation; + timestamp_t curr_generation; if (i == j || redundant[j]) continue; filled_index[filled] = j; @@ -321,7 +321,7 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit, { struct commit_list *bases; int ret = 0, i; - uint32_t generation, max_generation = GENERATION_NUMBER_ZERO; + timestamp_t generation, max_generation = GENERATION_NUMBER_ZERO; if (repo_parse_commit(r, commit)) return ret; @@ -470,7 +470,7 @@ static int in_commit_list(const struct commit_list *want, struct commit *c) static enum contains_result contains_test(struct commit *candidate, const struct commit_list *want, struct contains_cache *cache, - uint32_t cutoff) + timestamp_t cutoff) { enum contains_result *cached = contains_cache_at(cache, candidate); @@ -506,11 +506,11 @@ static enum contains_result contains_tag_algo(struct commit *candidate, { struct contains_stack contains_stack = { 0, 0, NULL }; enum contains_result result; - uint32_t cutoff = GENERATION_NUMBER_INFINITY; + timestamp_t cutoff = GENERATION_NUMBER_INFINITY; const struct commit_list *p; for (p = want; p; p = p->next) { - uint32_t generation; + timestamp_t generation; struct commit *c = p->item; load_commit_graph_info(the_repository, c); generation = commit_graph_generation(c); @@ -566,8 +566,8 @@ static int compare_commits_by_gen(const void *_a, const void *_b) const struct commit *a = *(const struct commit * const *)_a; const struct commit *b = *(const struct commit * const *)_b; - uint32_t generation_a = commit_graph_generation(a); - uint32_t generation_b = commit_graph_generation(b); + timestamp_t generation_a = commit_graph_generation(a); + timestamp_t generation_b = commit_graph_generation(b); if (generation_a < generation_b) return -1; @@ -580,7 +580,7 @@ int can_all_from_reach_with_flag(struct object_array *from, unsigned int with_flag, unsigned int assign_flag, time_t min_commit_date, - uint32_t min_generation) + timestamp_t min_generation) { struct commit **list = NULL; int i; @@ -681,13 +681,13 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, time_t min_commit_date = cutoff_by_min_date ? from->item->date : 0; struct commit_list *from_iter = from, *to_iter = to; int result; - uint32_t min_generation = GENERATION_NUMBER_INFINITY; + timestamp_t min_generation = GENERATION_NUMBER_INFINITY; while (from_iter) { add_object_array(&from_iter->item->object, NULL, &from_objs); if (!parse_commit(from_iter->item)) { - uint32_t generation; + timestamp_t generation; if (from_iter->item->date < min_commit_date) min_commit_date = from_iter->item->date; @@ -701,7 +701,7 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, while (to_iter) { if (!parse_commit(to_iter->item)) { - uint32_t generation; + timestamp_t generation; if (to_iter->item->date < min_commit_date) min_commit_date = to_iter->item->date; @@ -741,13 +741,13 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, struct commit_list *found_commits = NULL; struct commit **to_last = to + nr_to; struct commit **from_last = from + nr_from; - uint32_t min_generation = GENERATION_NUMBER_INFINITY; + timestamp_t min_generation = GENERATION_NUMBER_INFINITY; int num_to_find = 0; struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; for (item = to; item < to_last; item++) { - uint32_t generation; + timestamp_t generation; struct commit *c = *item; parse_commit(c); diff --git a/commit-reach.h b/commit-reach.h index b49ad71a317..148b56fea50 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -87,7 +87,7 @@ int can_all_from_reach_with_flag(struct object_array *from, unsigned int with_flag, unsigned int assign_flag, time_t min_commit_date, - uint32_t min_generation); + timestamp_t min_generation); int can_all_from_reach(struct commit_list *from, struct commit_list *to, int commit_date_cutoff); diff --git a/commit.c b/commit.c index bab8d5ab07c..4c717329ee0 100644 --- a/commit.c +++ b/commit.c @@ -753,8 +753,8 @@ int compare_commits_by_author_date(const void *a_, const void *b_, int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void *unused) { const struct commit *a = a_, *b = b_; - const uint32_t generation_a = commit_graph_generation(a), - generation_b = commit_graph_generation(b); + const timestamp_t generation_a = commit_graph_generation(a), + generation_b = commit_graph_generation(b); /* newer commits first */ if (generation_a < generation_b) diff --git a/commit.h b/commit.h index f4e7b0158e2..742d96c41e8 100644 --- a/commit.h +++ b/commit.h @@ -11,8 +11,8 @@ #include "commit-slab.h" #define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF -#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF -#define GENERATION_NUMBER_MAX 0x3FFFFFFF +#define GENERATION_NUMBER_INFINITY ((1ULL << 63) - 1) +#define GENERATION_NUMBER_V1_MAX 0x3FFFFFFF #define GENERATION_NUMBER_ZERO 0 struct commit_list { diff --git a/revision.c b/revision.c index be2d828a4cc..31fd3219e65 100644 --- a/revision.c +++ b/revision.c @@ -3300,7 +3300,7 @@ define_commit_slab(indegree_slab, int); define_commit_slab(author_date_slab, timestamp_t); struct topo_walk_info { - uint32_t min_generation; + timestamp_t min_generation; struct prio_queue explore_queue; struct prio_queue indegree_queue; struct prio_queue topo_queue; @@ -3368,7 +3368,7 @@ static void explore_walk_step(struct rev_info *revs) } static void explore_to_depth(struct rev_info *revs, - uint32_t gen_cutoff) + timestamp_t gen_cutoff) { struct topo_walk_info *info = revs->topo_walk_info; struct commit *c; @@ -3413,7 +3413,7 @@ static void indegree_walk_step(struct rev_info *revs) } static void compute_indegrees_to_depth(struct rev_info *revs, - uint32_t gen_cutoff) + timestamp_t gen_cutoff) { struct topo_walk_info *info = revs->topo_walk_info; struct commit *c; @@ -3471,7 +3471,7 @@ static void init_topo_walk(struct rev_info *revs) info->min_generation = GENERATION_NUMBER_INFINITY; for (list = revs->commits; list; list = list->next) { struct commit *c = list->item; - uint32_t generation; + timestamp_t generation; if (repo_parse_commit_gently(revs->repo, c, 1)) continue; @@ -3539,7 +3539,7 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit) for (p = commit->parents; p; p = p->next) { struct commit *parent = p->item; int *pi; - uint32_t generation; + timestamp_t generation; if (parent->object.flags & UNINTERESTING) continue; diff --git a/upload-pack.c b/upload-pack.c index 3b66bf92ba8..b87607e0dd4 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -500,7 +500,7 @@ static int got_oid(struct upload_pack_data *data, static int ok_to_give_up(struct upload_pack_data *data) { - uint32_t min_generation = GENERATION_NUMBER_ZERO; + timestamp_t min_generation = GENERATION_NUMBER_ZERO; if (!data->have_obj.nr) return 0; From patchwork Sat Jan 16 18:11:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12024979 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5241FC433E0 for ; Sat, 16 Jan 2021 18:13:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 22FB6229C4 for ; Sat, 16 Jan 2021 18:13:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728010AbhAPSMr (ORCPT ); Sat, 16 Jan 2021 13:12:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52478 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727967AbhAPSMp (ORCPT ); Sat, 16 Jan 2021 13:12:45 -0500 Received: from mail-wm1-x335.google.com (mail-wm1-x335.google.com [IPv6:2a00:1450:4864:20::335]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A6ACFC0613D6 for ; Sat, 16 Jan 2021 10:11:30 -0800 (PST) Received: by mail-wm1-x335.google.com with SMTP id v184so6139511wma.1 for ; Sat, 16 Jan 2021 10:11:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=1q6JEHI6MMjayciIxC6QRwexPgqQg5dji/4US0XAiSY=; b=oFH0rz3LvkpEFW6Jvx9zqEcpX0K6wALWo7BoXSDixFfY1M++1Wc0U2Y+hFj+w0WgzB BiunsJXNIucvBKNRLHlB6ZmikQkoAuBFleT+lPC+uafLrAlcYWfPacDAgu1nSjT/j8N8 UJRMEljQXs+T6Quf7BG7UlsjQ/y3gwaJR328q+3KBWQ5PqXfvtO/c+YYSC+s5LsjIyF0 QDkD1a/VTfQfpLG4kOiY21fDkAJU40PVfFxGnMKfAvmVf0fsDOGtou0jleQHU61F1DXP BMrEBu4+J7BqpZ0NXHImHxw9mjiSWUkl80N1JHF18tL+TrTvG3pzlw5I3uB/9eJU7R4S uNIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=1q6JEHI6MMjayciIxC6QRwexPgqQg5dji/4US0XAiSY=; b=bEpkuaeIRHd4WzMlj/ZEqJG2jNl6rVZA3TrLZwVo74imqjQhvxI8QY6cn3Q67EC8kS UD4oXOlA84I1xk161rOwzCdw0qUpfiTUwnJ0QP1clfAGuKKo8tDi0WqRgvHGDqYomWcg FZpHW6+wg8RIgNUvM15P40qvFjIGkcC2SbXxBrRhpOVU2Of2HjiYcScNHuascWkZDZTx qbCydeFx199JoWVCktQf2cMzDmPi62t+M7GnmjuUIehYrFuPfZHfhkb+3uitW/++blBO TBU9DSnxEgSMjY1uW2GotNQEbdLZzYSm4EeySrPwPiKvjB4OSsRXW+M3OSD0YtUUiNpt uQdQ== X-Gm-Message-State: AOAM531md7gZ/JKGrYIv9AE2Svk3E5yG6bzPLO4gQOTRgtsnhWD/Y0SR gamZ1oLmv/EATMz6SrnTyMk2nrr8UtU= X-Google-Smtp-Source: ABdhPJwObuljrYTC+xJvBMn9jrlrZKzFV78ZyvrSGS054Uc/+hd2g9kas0Poy0QUGs1LX57nnka+rw== X-Received: by 2002:a1c:68d5:: with SMTP id d204mr13931978wmc.178.1610820689118; Sat, 16 Jan 2021 10:11:29 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id j2sm19936467wrh.78.2021.01.16.10.11.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 16 Jan 2021 10:11:28 -0800 (PST) Message-Id: <8fbe74864059ddde1c4107c0986b99518183de40.1610820679.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sat, 16 Jan 2021 18:11:14 +0000 Subject: [PATCH v6 07/11] commit-graph: implement corrected commit date Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar With most of preparations done, let's implement corrected commit date. The corrected commit date for a commit is defined as: * A commit with no parents (a root commit) has corrected commit date equal to its committer date. * A commit with at least one parent has corrected commit date equal to the maximum of its commit date and one more than the largest corrected commit date among its parents. As a special case, a root commit with timestamp of zero (01.01.1970 00:00:00Z) has corrected commit date of one, to be able to distinguish from GENERATION_NUMBER_ZERO (that is, an uncomputed corrected commit date). To minimize the space required to store corrected commit date, Git stores corrected commit date offsets into the commit-graph file. The corrected commit date offset for a commit is defined as the difference between its corrected commit date and actual commit date. Storing corrected commit date requires sizeof(timestamp_t) bytes, which in most cases is 64 bits (uintmax_t). However, corrected commit date offsets can be safely stored using only 32-bits. This halves the size of GDAT chunk, which is a reduction of around 6% in the size of commit-graph file. However, using offsets be problematic if a commit is malformed but valid and has committer date of 0 Unix time, as the offset would be the same as corrected commit date and thus require 64-bits to be stored properly. While Git does not write out offsets at this stage, Git stores the corrected commit dates in member generation of struct commit_graph_data. It will begin writing commit date offsets with the introduction of generation data chunk. Signed-off-by: Abhishek Kumar --- commit-graph.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 6d42e30cd9a..a899f429093 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1343,9 +1343,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) ctx->commits.nr); for (i = 0; i < ctx->commits.nr; i++) { uint32_t level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]); + timestamp_t corrected_commit_date = commit_graph_data_at(ctx->commits.list[i])->generation; display_progress(ctx->progress, i + 1); - if (level != GENERATION_NUMBER_ZERO) + if (level != GENERATION_NUMBER_ZERO && + corrected_commit_date != GENERATION_NUMBER_ZERO) continue; commit_list_insert(ctx->commits.list[i], &list); @@ -1354,17 +1356,24 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) struct commit_list *parent; int all_parents_computed = 1; uint32_t max_level = 0; + timestamp_t max_corrected_commit_date = 0; for (parent = current->parents; parent; parent = parent->next) { level = *topo_level_slab_at(ctx->topo_levels, parent->item); + corrected_commit_date = commit_graph_data_at(parent->item)->generation; - if (level == GENERATION_NUMBER_ZERO) { + if (level == GENERATION_NUMBER_ZERO || + corrected_commit_date == GENERATION_NUMBER_ZERO) { all_parents_computed = 0; commit_list_insert(parent->item, &list); break; - } else if (level > max_level) { - max_level = level; } + + if (level > max_level) + max_level = level; + + if (corrected_commit_date > max_corrected_commit_date) + max_corrected_commit_date = corrected_commit_date; } if (all_parents_computed) { @@ -1373,6 +1382,10 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) if (max_level > GENERATION_NUMBER_V1_MAX - 1) max_level = GENERATION_NUMBER_V1_MAX - 1; *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1; + + if (current->date && current->date > max_corrected_commit_date) + max_corrected_commit_date = current->date - 1; + commit_graph_data_at(current)->generation = max_corrected_commit_date + 1; } } } From patchwork Sat Jan 16 18:11:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12024987 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 720A4C433E6 for ; Sat, 16 Jan 2021 18:13:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 38C1822B3F for ; Sat, 16 Jan 2021 18:13:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728021AbhAPSMr (ORCPT ); Sat, 16 Jan 2021 13:12:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52480 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727926AbhAPSMp (ORCPT ); Sat, 16 Jan 2021 13:12:45 -0500 Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C4C7EC0613ED for ; Sat, 16 Jan 2021 10:11:31 -0800 (PST) Received: by mail-wr1-x434.google.com with SMTP id a12so12461475wrv.8 for ; Sat, 16 Jan 2021 10:11:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=SQLZNeMVDHliWTVxVPYiD2PROFUG/sTGekxSMPMoM70=; b=hZ3v7C4HHHxfMR6Lmuv/2PmQqIyy91lx0027CHzhn+u+beat64XQQ5iR4A/Wi/sB2w FwMqFc3DvXPPwbMvEDnQNge/wUe65qo8G9XlB+VEC3Lbk3qYQ1nIDa4mXKkzIjU43G8z aADR5ZBhEE+gdE81W2tmHlutxeV/BzZ1Wa/f43h/+cGIQi1K2gQkxUafpXANdyVuNHEj ESqYvD3yWkDjockJOEjF5ptJn8lu9+LrT7mkGyMgqKf8dRCEbWNRGoLyXVB+Yon244On w/fYbXZtkUSpwYQOpppvITGKFnNiBRZNqwBYL0Z3Omh+DkFbZDb+4EBLZekOQB2AplOE xncg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=SQLZNeMVDHliWTVxVPYiD2PROFUG/sTGekxSMPMoM70=; b=FtYpaxuXKVu4ID6OiqBssIrZJ0a2ffrM2Lkuobjv6qbrCk4SU0ZA78E5nL26Z1f99J Z+M3VPdp9Pd0niYgEqxYCRCKlY/3Bn1FRQS78Rs6TxL3C6yX9gdhxH0juRLT9Ua5vYhN 5KbWEp/wxc/zxFJpQwCD+axsvWtJf4sMtPapIz/XZHSst1pn5QzCqCdTEqAaIcDW+HYS hOCPSkFYXfVrdk1ULrJz0k9P1KAwH/TBz5VH6Me3zAVQDqB1NJ/mKwIoyKY+p3Mr3AyJ w3DlDMcw+22Ma0FLE8PS1N7p9SWve8JK4YDq2FCN5VcSrvrW5JDhZqoMa3NBxFQjqN/2 kkpQ== X-Gm-Message-State: AOAM530pOuvZ9OM65HFcc/rChRDbMTQyVmkw+Pw80X0VaVmnZIxVsjHA SNslS0KFO493AERZ37Q03BQwEqN11Sg= X-Google-Smtp-Source: ABdhPJy15zcOf6gIREysmzeyf6h5gvPW2kR6L9RUchf1yFIbZevSo7Irk6ZhVSCOs9b151THKlU9Tw== X-Received: by 2002:adf:e552:: with SMTP id z18mr19612629wrm.29.1610820690109; Sat, 16 Jan 2021 10:11:30 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id l5sm19636904wrv.44.2021.01.16.10.11.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 16 Jan 2021 10:11:29 -0800 (PST) Message-Id: <6d0696ae216156e820a8d19ebbae00039a0f3509.1610820679.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sat, 16 Jan 2021 18:11:15 +0000 Subject: [PATCH v6 08/11] commit-graph: implement generation data chunk MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar As discovered by Ævar, we cannot increment graph version to distinguish between generation numbers v1 and v2 [1]. Thus, one of pre-requistes before implementing generation number v2 was to distinguish between graph versions in a backwards compatible manner. We are going to introduce a new chunk called Generation DATa chunk (or GDAT). GDAT will store corrected committer date offsets whereas CDAT will still store topological level. Old Git does not understand GDAT chunk and would ignore it, reading topological levels from CDAT. New Git can parse GDAT and take advantage of newer generation numbers, falling back to topological levels when GDAT chunk is missing (as it would happen with a commit-graph written by old Git). We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT' which forces commit-graph file to be written without generation data chunk to emulate a commit-graph file written by old Git. To minimize the space required to store corrrected commit date, Git stores corrected commit date offsets into the commit-graph file, instea of corrected commit dates. This saves us 4 bytes per commit, decreasing the GDAT chunk size by half, but it's possible for the offset to overflow the 4-bytes allocated for storage. As such overflows are and should be exceedingly rare, we use the following overflow management scheme: We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV') to store corrected commit dates for commits with offsets greater than GENERATION_NUMBER_V2_OFFSET_MAX. If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set the MSB of the offset and the other bits store the position of corrected commit date in GDOV chunk, similar to how Extra Edge List is maintained. We test the overflow-related code with the following repo history: F - N - U / \ U - N - U N \ / N - F - N Where the commits denoted by U have committer date of zero seconds since Unix epoch, the commits denoted by N have committer date of 1112354055 (default committer date for the test suite) seconds since Unix epoch and the commits denoted by F have committer date of (2 ^ 31 - 2) seconds since Unix epoch. The largest offset observed is 2 ^ 31, just large enough to overflow. [1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/ Signed-off-by: Abhishek Kumar --- commit-graph.c | 114 ++++++++++++++++++++++++++++++---- commit-graph.h | 3 + commit.h | 1 + t/README | 3 + t/helper/test-read-graph.c | 4 ++ t/t4216-log-bloom.sh | 4 +- t/t5318-commit-graph.sh | 79 +++++++++++++++++++---- t/t5324-split-commit-graph.sh | 12 ++-- t/t6600-test-reach.sh | 6 ++ t/test-lib-functions.sh | 6 ++ 10 files changed, 200 insertions(+), 32 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index a899f429093..7365958d9d3 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -38,11 +38,13 @@ void git_test_write_commit_graph_or_die(void) #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ #define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */ +#define GRAPH_CHUNKID_GENERATION_DATA 0x47444154 /* "GDAT" */ +#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f56 /* "GDOV" */ #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */ #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */ #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */ #define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */ -#define MAX_NUM_CHUNKS 7 +#define MAX_NUM_CHUNKS 9 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16) @@ -61,6 +63,8 @@ void git_test_write_commit_graph_or_die(void) #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \ + GRAPH_FANOUT_SIZE + the_hash_algo->rawsz) +#define CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW (1ULL << 31) + /* Remember to update object flag allocation in object.h */ #define REACHABLE (1u<<15) @@ -394,6 +398,20 @@ struct commit_graph *parse_commit_graph(struct repository *r, graph->chunk_commit_data = data + chunk_offset; break; + case GRAPH_CHUNKID_GENERATION_DATA: + if (graph->chunk_generation_data) + chunk_repeated = 1; + else + graph->chunk_generation_data = data + chunk_offset; + break; + + case GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW: + if (graph->chunk_generation_data_overflow) + chunk_repeated = 1; + else + graph->chunk_generation_data_overflow = data + chunk_offset; + break; + case GRAPH_CHUNKID_EXTRAEDGES: if (graph->chunk_extra_edges) chunk_repeated = 1; @@ -754,8 +772,8 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, { const unsigned char *commit_data; struct commit_graph_data *graph_data; - uint32_t lex_index; - uint64_t date_high, date_low; + uint32_t lex_index, offset_pos; + uint64_t date_high, date_low, offset; while (pos < g->num_commits_in_base) g = g->base_graph; @@ -773,7 +791,19 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, date_low = get_be32(commit_data + g->hash_len + 12); item->date = (timestamp_t)((date_high << 32) | date_low); - graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; + if (g->chunk_generation_data) { + offset = (timestamp_t)get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index); + + if (offset & CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW) { + if (!g->chunk_generation_data_overflow) + die(_("commit-graph requires overflow generation data but has none")); + + offset_pos = offset ^ CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW; + graph_data->generation = get_be64(g->chunk_generation_data_overflow + 8 * offset_pos); + } else + graph_data->generation = item->date + offset; + } else + graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; if (g->topo_levels) *topo_level_slab_at(g->topo_levels, item) = get_be32(commit_data + g->hash_len + 8) >> 2; @@ -945,6 +975,7 @@ struct write_commit_graph_context { struct oid_array oids; struct packed_commit_list commits; int num_extra_edges; + int num_generation_data_overflows; unsigned long approx_nr_objects; struct progress *progress; int progress_done; @@ -963,7 +994,8 @@ struct write_commit_graph_context { report_progress:1, split:1, changed_paths:1, - order_by_pack:1; + order_by_pack:1, + write_generation_data:1; struct topo_level_slab *topo_levels; const struct commit_graph_opts *opts; @@ -1123,6 +1155,45 @@ static int write_graph_chunk_data(struct hashfile *f, return 0; } +static int write_graph_chunk_generation_data(struct hashfile *f, + struct write_commit_graph_context *ctx) +{ + int i, num_generation_data_overflows = 0; + + for (i = 0; i < ctx->commits.nr; i++) { + struct commit *c = ctx->commits.list[i]; + timestamp_t offset = commit_graph_data_at(c)->generation - c->date; + display_progress(ctx->progress, ++ctx->progress_cnt); + + if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) { + offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows; + num_generation_data_overflows++; + } + + hashwrite_be32(f, offset); + } + + return 0; +} + +static int write_graph_chunk_generation_data_overflow(struct hashfile *f, + struct write_commit_graph_context *ctx) +{ + int i; + for (i = 0; i < ctx->commits.nr; i++) { + struct commit *c = ctx->commits.list[i]; + timestamp_t offset = commit_graph_data_at(c)->generation - c->date; + display_progress(ctx->progress, ++ctx->progress_cnt); + + if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) { + hashwrite_be32(f, offset >> 32); + hashwrite_be32(f, (uint32_t) offset); + } + } + + return 0; +} + static int write_graph_chunk_extra_edges(struct hashfile *f, struct write_commit_graph_context *ctx) { @@ -1386,6 +1457,9 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) if (current->date && current->date > max_corrected_commit_date) max_corrected_commit_date = current->date - 1; commit_graph_data_at(current)->generation = max_corrected_commit_date + 1; + + if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX) + ctx->num_generation_data_overflows++; } } } @@ -1719,6 +1793,21 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) chunks[2].id = GRAPH_CHUNKID_DATA; chunks[2].size = (hashsz + 16) * ctx->commits.nr; chunks[2].write_fn = write_graph_chunk_data; + + if (git_env_bool(GIT_TEST_COMMIT_GRAPH_NO_GDAT, 0)) + ctx->write_generation_data = 0; + if (ctx->write_generation_data) { + chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA; + chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr; + chunks[num_chunks].write_fn = write_graph_chunk_generation_data; + num_chunks++; + } + if (ctx->num_generation_data_overflows) { + chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW; + chunks[num_chunks].size = sizeof(timestamp_t) * ctx->num_generation_data_overflows; + chunks[num_chunks].write_fn = write_graph_chunk_generation_data_overflow; + num_chunks++; + } if (ctx->num_extra_edges) { chunks[num_chunks].id = GRAPH_CHUNKID_EXTRAEDGES; chunks[num_chunks].size = 4 * ctx->num_extra_edges; @@ -2139,6 +2228,8 @@ int write_commit_graph(struct object_directory *odb, ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; ctx->opts = opts; ctx->total_bloom_filter_data_size = 0; + ctx->write_generation_data = 1; + ctx->num_generation_data_overflows = 0; bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", bloom_settings.bits_per_entry); @@ -2445,16 +2536,17 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) continue; /* - * If one of our parents has generation GENERATION_NUMBER_V1_MAX, then - * our generation is also GENERATION_NUMBER_V1_MAX. Decrement to avoid - * extra logic in the following condition. + * If we are using topological level and one of our parents has + * generation GENERATION_NUMBER_V1_MAX, then our generation is + * also GENERATION_NUMBER_V1_MAX. Decrement to avoid extra logic + * in the following condition. */ - if (max_generation == GENERATION_NUMBER_V1_MAX) + if (!g->chunk_generation_data && max_generation == GENERATION_NUMBER_V1_MAX) max_generation--; generation = commit_graph_generation(graph_commit); - if (generation != max_generation + 1) - graph_report(_("commit-graph generation for commit %s is %"PRItime" != %"PRItime), + if (generation < max_generation + 1) + graph_report(_("commit-graph generation for commit %s is %"PRItime" < %"PRItime), oid_to_hex(&cur_oid), generation, max_generation + 1); diff --git a/commit-graph.h b/commit-graph.h index 2e9aa7824ee..19a02001fde 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -6,6 +6,7 @@ #include "oidset.h" #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH" +#define GIT_TEST_COMMIT_GRAPH_NO_GDAT "GIT_TEST_COMMIT_GRAPH_NO_GDAT" #define GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE "GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE" #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS" @@ -68,6 +69,8 @@ struct commit_graph { const uint32_t *chunk_oid_fanout; const unsigned char *chunk_oid_lookup; const unsigned char *chunk_commit_data; + const unsigned char *chunk_generation_data; + const unsigned char *chunk_generation_data_overflow; const unsigned char *chunk_extra_edges; const unsigned char *chunk_base_graphs; const unsigned char *chunk_bloom_indexes; diff --git a/commit.h b/commit.h index 742d96c41e8..eff94f3f7c2 100644 --- a/commit.h +++ b/commit.h @@ -14,6 +14,7 @@ #define GENERATION_NUMBER_INFINITY ((1ULL << 63) - 1) #define GENERATION_NUMBER_V1_MAX 0x3FFFFFFF #define GENERATION_NUMBER_ZERO 0 +#define GENERATION_NUMBER_V2_OFFSET_MAX ((1ULL << 31) - 1) struct commit_list { struct commit *item; diff --git a/t/README b/t/README index c730a707705..8a121487279 100644 --- a/t/README +++ b/t/README @@ -393,6 +393,9 @@ GIT_TEST_COMMIT_GRAPH=, when true, forces the commit-graph to be written after every 'git commit' command, and overrides the 'core.commitGraph' setting to true. +GIT_TEST_COMMIT_GRAPH_NO_GDAT=, when true, forces the +commit-graph to be written without generation data chunk. + GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=, when true, forces commit-graph write to compute and write changed path Bloom filters for every 'git commit-graph write', as if the `--changed-paths` option was diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c index 5f585a17256..75927b2c81d 100644 --- a/t/helper/test-read-graph.c +++ b/t/helper/test-read-graph.c @@ -33,6 +33,10 @@ int cmd__read_graph(int argc, const char **argv) printf(" oid_lookup"); if (graph->chunk_commit_data) printf(" commit_metadata"); + if (graph->chunk_generation_data) + printf(" generation_data"); + if (graph->chunk_generation_data_overflow) + printf(" generation_data_overflow"); if (graph->chunk_extra_edges) printf(" extra_edges"); if (graph->chunk_bloom_indexes) diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index d11040ce41c..dbde0161882 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -40,11 +40,11 @@ test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' ' graph_read_expect () { - NUM_CHUNKS=5 + NUM_CHUNKS=6 cat >expect <<- EOF header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0 num_commits: $1 - chunks: oid_fanout oid_lookup commit_metadata bloom_indexes bloom_data + chunks: oid_fanout oid_lookup commit_metadata generation_data bloom_indexes bloom_data EOF test-tool read-graph >actual && test_cmp expect actual diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 2ed0c1544da..fa27df579a5 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -76,7 +76,7 @@ graph_git_behavior 'no graph' full commits/3 commits/1 graph_read_expect() { OPTIONAL="" NUM_CHUNKS=3 - if test ! -z $2 + if test ! -z "$2" then OPTIONAL=" $2" NUM_CHUNKS=$((3 + $(echo "$2" | wc -w))) @@ -103,14 +103,14 @@ test_expect_success 'exit with correct error on bad input to --stdin-commits' ' # valid commit and tree OID git rev-parse HEAD HEAD^{tree} >in && git commit-graph write --stdin-commits >commits-in && cat commits-in | git commit-graph write --stdin-commits && test_path_is_file $objdir/info/commit-graph && - graph_read_expect "6" + graph_read_expect "6" "generation_data" ' graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1 @@ -297,7 +297,7 @@ test_expect_success 'build graph from commits with append' ' cd "$TRASH_DIRECTORY/full" && git rev-parse merge/3 | git commit-graph write --stdin-commits --append && test_path_is_file $objdir/info/commit-graph && - graph_read_expect "10" "extra_edges" + graph_read_expect "10" "generation_data extra_edges" ' graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1 @@ -307,7 +307,7 @@ test_expect_success 'build graph using --reachable' ' cd "$TRASH_DIRECTORY/full" && git commit-graph write --reachable && test_path_is_file $objdir/info/commit-graph && - graph_read_expect "11" "extra_edges" + graph_read_expect "11" "generation_data extra_edges" ' graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1 @@ -328,7 +328,7 @@ test_expect_success 'write graph in bare repo' ' cd "$TRASH_DIRECTORY/bare" && git commit-graph write && test_path_is_file $baredir/info/commit-graph && - graph_read_expect "11" "extra_edges" + graph_read_expect "11" "generation_data extra_edges" ' graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 merge/1 @@ -454,8 +454,9 @@ test_expect_success 'warn on improper hash version' ' test_expect_success 'git commit-graph verify' ' cd "$TRASH_DIRECTORY/full" && - git rev-parse commits/8 | git commit-graph write --stdin-commits && - git commit-graph verify >output + git rev-parse commits/8 | GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --stdin-commits && + git commit-graph verify >output && + graph_read_expect 9 extra_edges ' NUM_COMMITS=9 @@ -741,4 +742,56 @@ test_expect_success 'corrupt commit-graph write (missing tree)' ' ) ' +# We test the overflow-related code with the following repo history: +# +# 4:F - 5:N - 6:U +# / \ +# 1:U - 2:N - 3:U M:N +# \ / +# 7:N - 8:F - 9:N +# +# Here the commits denoted by U have committer date of zero seconds +# since Unix epoch, the commits denoted by N have committer date +# starting from 1112354055 seconds since Unix epoch (default committer +# date for the test suite), and the commits denoted by F have committer +# date of (2 ^ 31 - 2) seconds since Unix epoch. +# +# The largest offset observed is 2 ^ 31, just large enough to overflow. +# + +test_expect_success 'set up and verify repo with generation data overflow chunk' ' + objdir=".git/objects" && + UNIX_EPOCH_ZERO="@0 +0000" && + FUTURE_DATE="@2147483646 +0000" && + test_oid_cache <<-EOF && + oid_version sha1:1 + oid_version sha256:2 + EOF + cd "$TRASH_DIRECTORY" && + mkdir repo && + cd repo && + git init && + test_commit --date "$UNIX_EPOCH_ZERO" 1 && + test_commit 2 && + test_commit --date "$UNIX_EPOCH_ZERO" 3 && + git commit-graph write --reachable && + graph_read_expect 3 generation_data && + test_commit --date "$FUTURE_DATE" 4 && + test_commit 5 && + test_commit --date "$UNIX_EPOCH_ZERO" 6 && + git branch left && + git reset --hard 3 && + test_commit 7 && + test_commit --date "$FUTURE_DATE" 8 && + test_commit 9 && + git branch right && + git reset --hard 3 && + test_merge M left right && + git commit-graph write --reachable && + graph_read_expect 10 "generation_data generation_data_overflow" && + git commit-graph verify +' + +graph_git_behavior 'generation data overflow chunk repo' repo left right + test_done diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 4d3842b83b9..587757b62d9 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -13,11 +13,11 @@ test_expect_success 'setup repo' ' infodir=".git/objects/info" && graphdir="$infodir/commit-graphs" && test_oid_cache <<-EOM - shallow sha1:1760 - shallow sha256:2064 + shallow sha1:2132 + shallow sha256:2436 - base sha1:1376 - base sha256:1496 + base sha1:1408 + base sha256:1528 oid_version sha1:1 oid_version sha256:2 @@ -31,9 +31,9 @@ graph_read_expect() { NUM_BASE=$2 fi cat >expect <<- EOF - header: 43475048 1 $(test_oid oid_version) 3 $NUM_BASE + header: 43475048 1 $(test_oid oid_version) 4 $NUM_BASE num_commits: $1 - chunks: oid_fanout oid_lookup commit_metadata + chunks: oid_fanout oid_lookup commit_metadata generation_data EOF test-tool read-graph >output && test_cmp expect output diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index af10f0dc090..e2d33a8a4c4 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -55,6 +55,9 @@ test_expect_success 'setup' ' git show-ref -s commit-5-5 | git commit-graph write --stdin-commits && mv .git/objects/info/commit-graph commit-graph-half && chmod u+w commit-graph-half && + GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable && + mv .git/objects/info/commit-graph commit-graph-no-gdat && + chmod u+w commit-graph-no-gdat && git config core.commitGraph true ' @@ -67,6 +70,9 @@ run_all_modes () { test_cmp expect actual && cp commit-graph-half .git/objects/info/commit-graph && "$@" actual && + test_cmp expect actual && + cp commit-graph-no-gdat .git/objects/info/commit-graph && + "$@" actual && test_cmp expect actual } diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 999982fe4a9..3ad712c3acc 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -202,6 +202,12 @@ test_commit () { --signoff) signoff="$1" ;; + --date) + notick=yes + GIT_COMMITTER_DATE="$2" + GIT_AUTHOR_DATE="$2" + shift + ;; -C) indir="$2" shift From patchwork Sat Jan 16 18:11:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12024991 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3B84C43331 for ; Sat, 16 Jan 2021 18:13:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B402B22C7D for ; Sat, 16 Jan 2021 18:13:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728059AbhAPSMx (ORCPT ); Sat, 16 Jan 2021 13:12:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52482 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727925AbhAPSMp (ORCPT ); Sat, 16 Jan 2021 13:12:45 -0500 Received: from mail-wm1-x32e.google.com (mail-wm1-x32e.google.com [IPv6:2a00:1450:4864:20::32e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B45EAC061786 for ; Sat, 16 Jan 2021 10:11:32 -0800 (PST) Received: by mail-wm1-x32e.google.com with SMTP id o10so3765771wmc.1 for ; Sat, 16 Jan 2021 10:11:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=Xm0Voi53c0mnE8hDYhxruCcV27lZV8ePud0DYZZitXM=; b=qqDf8YXy8wc3B44x00tLioI3Q621+GZC3pAOLsxRAxu6P9ZR3Bd69tGl/hh+Y5nmUf +/Zn+X9WjhkuM77B5wlY9g0HzaR+VMg1lPg1L6cxICCDk1YuDonT1lSZ4lBgobh254Xn 68GGp8aerIU3E9q/o02iaHvALCyUdI014Fdh83VLW3bAfcIH1CZq7OMF5XwyZK2l1/Kh ARoC0L+2D1ZG3xu0YL+76fAMDqVto4bC71WUmljQJPshPgvmqoiBpN4EA04JS7YX4IuJ DZfyMsZKaW1r4FAJl9VgZXhY/wzRk04Y+d7h29qTBlY4D+Q0HDVre990rui7dxSa1rhj fhIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=Xm0Voi53c0mnE8hDYhxruCcV27lZV8ePud0DYZZitXM=; b=uONdCDjbAmwLnNpxNFxMgSUlHeZNrnkHSVa79NsR5GzbY1kQwFYiqgGkOVEgXaZnDb yzy4ROwZYex2F57Wru2hJK/VlJCLah0rsKJucGSq7Odl5ZBIR+E3tLQ2cHl/rrQO1JP/ UaL5rmWwOI0kU1HMahCoP8503Nhe4j1ElLc58DlA6/AD+wd/IXrgOoCYMWAgWSzBcPr/ WLjN4bDADPMBEzWXTE+P8X/iTDnYXllPgqRgBhmzkNKuiQSoObHZ3IHnlw/bj9gEknGQ S3CRZwKTjldyu/+RXqcGn3MFUDLEUS/aELQu5Mh+pHIw3u1a++pyb8DqNY3HfEdKvxNp cQeQ== X-Gm-Message-State: AOAM531ftiwTrSrSld0Zab+0t+L0hkdrCqITQNlLzO5mJ5tDOJSmz+fL ULAGJpVHmQa+Jtc7HcNduYlBWsEleq4= X-Google-Smtp-Source: ABdhPJy2wqWugzBMKbg8Bwtb+MtpJVUlCt/0fY2+d8KnIdmcebhAQ7jo+k7zD1G2m9jcOuBakKExYw== X-Received: by 2002:a1c:1bcc:: with SMTP id b195mr14084603wmb.131.1610820691179; Sat, 16 Jan 2021 10:11:31 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id b10sm2054595wmj.2.2021.01.16.10.11.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 16 Jan 2021 10:11:30 -0800 (PST) Message-Id: In-Reply-To: References: Date: Sat, 16 Jan 2021 18:11:16 +0000 Subject: [PATCH v6 09/11] commit-graph: use generation v2 only if entire chain does Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar Since there are released versions of Git that understand generation numbers in the commit-graph's CDAT chunk but do not understand the GDAT chunk, the following scenario is possible: 1. "New" Git writes a commit-graph with the GDAT chunk. 2. "Old" Git writes a split commit-graph on top without a GDAT chunk. If each layer of split commit-graph is treated independently, as it was the case before this commit, with Git inspecting only the current layer for chunk_generation_data pointer, commits in the lower layer (one with GDAT) whould have corrected commit date as their generation number, while commits in the upper layer would have topological levels as their generation. Corrected commit dates usually have much larger values than topological levels. This means that if we take two commits, one from the upper layer, and one reachable from it in the lower layer, then the expectation that the generation of a parent is smaller than the generation of a child would be violated. It is difficult to expose this issue in a test. Since we _start_ with artificially low generation numbers, any commit walk that prioritizes generation numbers will walk all of the commits with high generation number before walking the commits with low generation number. In all the cases I tried, the commit-graph layers themselves "protect" any incorrect behavior since none of the commits in the lower layer can reach the commits in the upper layer. This issue would manifest itself as a performance problem in this case, especially with something like "git log --graph" since the low generation numbers would cause the in-degree queue to walk all of the commits in the lower layer before allowing the topo-order queue to write anything to output (depending on the size of the upper layer). Therefore, When writing the new layer in split commit-graph, we write a GDAT chunk only if the topmost layer has a GDAT chunk. This guarantees that if a layer has GDAT chunk, all lower layers must have a GDAT chunk as well. Rewriting layers follows similar approach: if the topmost layer below the set of layers being rewritten (in the split commit-graph chain) exists, and it does not contain GDAT chunk, then the result of rewrite does not have GDAT chunks either. Signed-off-by: Derrick Stolee Signed-off-by: Abhishek Kumar --- commit-graph.c | 30 +++++- commit-graph.h | 1 + t/t5324-split-commit-graph.sh | 181 ++++++++++++++++++++++++++++++++++ 3 files changed, 210 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 7365958d9d3..d32492f3724 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -614,6 +614,21 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, return graph_chain; } +static void validate_mixed_generation_chain(struct commit_graph *g) +{ + int read_generation_data; + + if (!g) + return; + + read_generation_data = !!g->chunk_generation_data; + + while (g) { + g->read_generation_data = read_generation_data; + g = g->base_graph; + } +} + struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb) { @@ -622,6 +637,8 @@ struct commit_graph *read_commit_graph_one(struct repository *r, if (!g) g = load_commit_graph_chain(r, odb); + validate_mixed_generation_chain(g); + return g; } @@ -791,7 +808,7 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, date_low = get_be32(commit_data + g->hash_len + 12); item->date = (timestamp_t)((date_high << 32) | date_low); - if (g->chunk_generation_data) { + if (g->read_generation_data) { offset = (timestamp_t)get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index); if (offset & CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW) { @@ -2019,6 +2036,13 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) if (i < ctx->num_commit_graphs_after) ctx->commit_graph_hash_after[i] = xstrdup(oid_to_hex(&g->oid)); + /* + * If the topmost remaining layer has generation data chunk, the + * resultant layer also has generation data chunk. + */ + if (i == ctx->num_commit_graphs_after - 2) + ctx->write_generation_data = !!g->chunk_generation_data; + i--; g = g->base_graph; } @@ -2343,6 +2367,8 @@ int write_commit_graph(struct object_directory *odb, } else ctx->num_commit_graphs_after = 1; + validate_mixed_generation_chain(ctx->r->objects->commit_graph); + compute_generation_numbers(ctx); if (ctx->changed_paths) @@ -2541,7 +2567,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags) * also GENERATION_NUMBER_V1_MAX. Decrement to avoid extra logic * in the following condition. */ - if (!g->chunk_generation_data && max_generation == GENERATION_NUMBER_V1_MAX) + if (!g->read_generation_data && max_generation == GENERATION_NUMBER_V1_MAX) max_generation--; generation = commit_graph_generation(graph_commit); diff --git a/commit-graph.h b/commit-graph.h index 19a02001fde..ad52130883b 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -64,6 +64,7 @@ struct commit_graph { struct object_directory *odb; uint32_t num_commits_in_base; + unsigned int read_generation_data; struct commit_graph *base_graph; const uint32_t *chunk_oid_fanout; diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 587757b62d9..8e90f3423b8 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -453,4 +453,185 @@ test_expect_success 'prevent regression for duplicate commits across layers' ' git -C dup commit-graph verify ' +NUM_FIRST_LAYER_COMMITS=64 +NUM_SECOND_LAYER_COMMITS=16 +NUM_THIRD_LAYER_COMMITS=7 +NUM_FOURTH_LAYER_COMMITS=8 +NUM_FIFTH_LAYER_COMMITS=16 +SECOND_LAYER_SEQUENCE_START=$(($NUM_FIRST_LAYER_COMMITS + 1)) +SECOND_LAYER_SEQUENCE_END=$(($SECOND_LAYER_SEQUENCE_START + $NUM_SECOND_LAYER_COMMITS - 1)) +THIRD_LAYER_SEQUENCE_START=$(($SECOND_LAYER_SEQUENCE_END + 1)) +THIRD_LAYER_SEQUENCE_END=$(($THIRD_LAYER_SEQUENCE_START + $NUM_THIRD_LAYER_COMMITS - 1)) +FOURTH_LAYER_SEQUENCE_START=$(($THIRD_LAYER_SEQUENCE_END + 1)) +FOURTH_LAYER_SEQUENCE_END=$(($FOURTH_LAYER_SEQUENCE_START + $NUM_FOURTH_LAYER_COMMITS - 1)) +FIFTH_LAYER_SEQUENCE_START=$(($FOURTH_LAYER_SEQUENCE_END + 1)) +FIFTH_LAYER_SEQUENCE_END=$(($FIFTH_LAYER_SEQUENCE_START + $NUM_FIFTH_LAYER_COMMITS - 1)) + +# Current split graph chain: +# +# 16 commits (No GDAT) +# ------------------------ +# 64 commits (GDAT) +# +test_expect_success 'setup repo for mixed generation commit-graph-chain' ' + graphdir=".git/objects/info/commit-graphs" && + test_oid_cache <<-EOF && + oid_version sha1:1 + oid_version sha256:2 + EOF + git init mixed && + ( + cd mixed && + git config core.commitGraph true && + git config gc.writeCommitGraph false && + for i in $(test_seq $NUM_FIRST_LAYER_COMMITS) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git commit-graph write --reachable --split && + graph_read_expect $NUM_FIRST_LAYER_COMMITS && + test_line_count = 1 $graphdir/commit-graph-chain && + for i in $(test_seq $SECOND_LAYER_SEQUENCE_START $SECOND_LAYER_SEQUENCE_END) + do + test_commit $i && + git branch commits/$i || return 1 + done && + GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable --split=no-merge && + test_line_count = 2 $graphdir/commit-graph-chain && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 $(test_oid oid_version) 4 1 + num_commits: $NUM_SECOND_LAYER_COMMITS + chunks: oid_fanout oid_lookup commit_metadata + EOF + test_cmp expect output && + git commit-graph verify && + cat $graphdir/commit-graph-chain + ) +' + +# The new layer will be added without generation data chunk as it was not +# present on the layer underneath it. +# +# 7 commits (No GDAT) +# ------------------------ +# 16 commits (No GDAT) +# ------------------------ +# 64 commits (GDAT) +# +test_expect_success 'do not write generation data chunk if not present on existing tip' ' + git clone mixed mixed-no-gdat && + ( + cd mixed-no-gdat && + for i in $(test_seq $THIRD_LAYER_SEQUENCE_START $THIRD_LAYER_SEQUENCE_END) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git commit-graph write --reachable --split=no-merge && + test_line_count = 3 $graphdir/commit-graph-chain && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 $(test_oid oid_version) 4 2 + num_commits: $NUM_THIRD_LAYER_COMMITS + chunks: oid_fanout oid_lookup commit_metadata + EOF + test_cmp expect output && + git commit-graph verify + ) +' + +# Number of commits in each layer of the split-commit graph before merge: +# +# 8 commits (No GDAT) +# ------------------------ +# 7 commits (No GDAT) +# ------------------------ +# 16 commits (No GDAT) +# ------------------------ +# 64 commits (GDAT) +# +# The top two layers are merged and do not have generation data chunk as layer below them does +# not have generation data chunk. +# +# 15 commits (No GDAT) +# ------------------------ +# 16 commits (No GDAT) +# ------------------------ +# 64 commits (GDAT) +# +test_expect_success 'do not write generation data chunk if the topmost remaining layer does not have generation data chunk' ' + git clone mixed-no-gdat mixed-merge-no-gdat && + ( + cd mixed-merge-no-gdat && + for i in $(test_seq $FOURTH_LAYER_SEQUENCE_START $FOURTH_LAYER_SEQUENCE_END) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git commit-graph write --reachable --split --size-multiple 1 && + test_line_count = 3 $graphdir/commit-graph-chain && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 $(test_oid oid_version) 4 2 + num_commits: $(($NUM_THIRD_LAYER_COMMITS + $NUM_FOURTH_LAYER_COMMITS)) + chunks: oid_fanout oid_lookup commit_metadata + EOF + test_cmp expect output && + git commit-graph verify + ) +' + +# Number of commits in each layer of the split-commit graph before merge: +# +# 16 commits (No GDAT) +# ------------------------ +# 15 commits (No GDAT) +# ------------------------ +# 16 commits (No GDAT) +# ------------------------ +# 64 commits (GDAT) +# +# The top three layers are merged and has generation data chunk as the topmost remaining layer +# has generation data chunk. +# +# 47 commits (GDAT) +# ------------------------ +# 64 commits (GDAT) +# +test_expect_success 'write generation data chunk if topmost remaining layer has generation data chunk' ' + git clone mixed-merge-no-gdat mixed-merge-gdat && + ( + cd mixed-merge-gdat && + for i in $(test_seq $FIFTH_LAYER_SEQUENCE_START $FIFTH_LAYER_SEQUENCE_END) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git commit-graph write --reachable --split --size-multiple 1 && + test_line_count = 2 $graphdir/commit-graph-chain && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 $(test_oid oid_version) 5 1 + num_commits: $(($NUM_SECOND_LAYER_COMMITS + $NUM_THIRD_LAYER_COMMITS + $NUM_FOURTH_LAYER_COMMITS + $NUM_FIFTH_LAYER_COMMITS)) + chunks: oid_fanout oid_lookup commit_metadata generation_data + EOF + test_cmp expect output + ) +' + +test_expect_success 'write generation data chunk when commit-graph chain is replaced' ' + git clone mixed mixed-replace && + ( + cd mixed-replace && + git commit-graph write --reachable --split=replace && + test_path_is_file $graphdir/commit-graph-chain && + test_line_count = 1 $graphdir/commit-graph-chain && + verify_chain_files_exist $graphdir && + graph_read_expect $(($NUM_FIRST_LAYER_COMMITS + $NUM_SECOND_LAYER_COMMITS)) && + git commit-graph verify + ) +' + test_done From patchwork Sat Jan 16 18:11:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12024989 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3F07C4332E for ; Sat, 16 Jan 2021 18:13:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9C54C22B3F for ; Sat, 16 Jan 2021 18:13:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728049AbhAPSMx (ORCPT ); Sat, 16 Jan 2021 13:12:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52484 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727995AbhAPSMp (ORCPT ); Sat, 16 Jan 2021 13:12:45 -0500 Received: from mail-wm1-x32a.google.com (mail-wm1-x32a.google.com [IPv6:2a00:1450:4864:20::32a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B35D2C061793 for ; Sat, 16 Jan 2021 10:11:33 -0800 (PST) Received: by mail-wm1-x32a.google.com with SMTP id h17so9886444wmq.1 for ; Sat, 16 Jan 2021 10:11:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=iWQ5ofDa0Bh/EPPPtQem8eshwGjEKPF2HmR9tObQh6A=; b=BjXymHSmMDO9dRyYAw9YFTYI8foWlAv9yN7zbgiIADrFMI7htWRl6kUtzH9kmOrNbZ zZ3YTd1EYbSYo/sCfW9sFTVqAOyNnmmz9+CdjRjqnM1yGrnhsj8h7Ct1VXq5fuqq+MbN bV8jlYekk2i5WTMMq3u2STtlQ+VkGNh7Xkd6+3+t7UuoGaPbbXWsBpch8APJnSirALRi GqgOkSQy9mqljXhybdrza/sx7+zQ6Hx9CkfAj5bPlE5bO12wgIqjwxVs3TS6qCFECmrb yA7qPte+1HRCwu3/1IwZacZbLHZMz3gPWX2NEMNiAa3LkmxblgJt7ZEC+WPBwZ0sx+TR bUcg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=iWQ5ofDa0Bh/EPPPtQem8eshwGjEKPF2HmR9tObQh6A=; b=n49ncc4jt3zbiekRyTL7YbGooSJj+mFF/9k4iaD13HxiQZM9vCdtOHpxJZrULtWgW7 87WiOC1KiuIgwuS1ddF46rK2XpbmpxSl6o057moKqQqklC4Nn5vDp56FsGKYOMTjp92u u9AOjaXnZCI/sLwWczwi5/dQ5S6TJ5sVHZoIb4GtMe+Ww9sQXr/thZ41XYk+UqqB3Kxz KEaLCijv7E3vGdFFNnegpIjrduTg42HxHsM92yQjPXY5xs26vUiyeMx5jmS81KFmDoFR N3SyMD1eNuTg/1Pj8aRZeJ8g5PBSqxVQLT7HEP+0bDdcDXRVCgRYkUiq0FcSef5tE7Ag /IBg== X-Gm-Message-State: AOAM533zKjzzrQdZSwOYSNpGGfwu/B7CpYrvhFfHkPYVZFlxw2ocNdjZ cfs1oIhpF1a7Auxja4KkvJyzH2zstog= X-Google-Smtp-Source: ABdhPJzY6HmLhFGr2lkMHmGSFRXknDdGJR6R4/w6/j49LQaXqaXmDO+Xk2u8BRZfXvzz+g/1HTywHg== X-Received: by 2002:a1c:3b44:: with SMTP id i65mr13730139wma.53.1610820692143; Sat, 16 Jan 2021 10:11:32 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id z15sm20003345wrv.67.2021.01.16.10.11.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 16 Jan 2021 10:11:31 -0800 (PST) Message-Id: In-Reply-To: References: Date: Sat, 16 Jan 2021 18:11:17 +0000 Subject: [PATCH v6 10/11] commit-reach: use corrected commit dates in paint_down_to_common() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar 091f4cf (commit: don't use generation numbers if not needed, 2018-08-30) changed paint_down_to_common() to use commit dates instead of generation numbers v1 (topological levels) as the performance regressed on certain topologies. With generation number v2 (corrected commit dates) implemented, we no longer have to rely on commit dates and can use generation numbers. For example, the command `git merge-base v4.8 v4.9` on the Linux repository walks 167468 commits, taking 0.135s for committer date and 167496 commits, taking 0.157s for corrected committer date respectively. While using corrected commit dates, Git walks nearly the same number of commits as commit date, the process is slower as for each comparision we have to access a commit-slab (for corrected committer date) instead of accessing struct member (for committer date). This change incidentally broke the fragile t6404-recursive-merge test. t6404-recursive-merge sets up a unique repository where all commits have the same committer date without a well-defined merge-base. While running tests with GIT_TEST_COMMIT_GRAPH unset, we use committer date as a heuristic in paint_down_to_common(). 6404.1 'combined merge conflicts' merges commits in the order: - Merge C with B to form an intermediate commit. - Merge the intermediate commit with A. With GIT_TEST_COMMIT_GRAPH=1, we write a commit-graph and subsequently use the corrected committer date, which changes the order in which commits are merged: - Merge A with B to form an intermediate commit. - Merge the intermediate commit with C. While resulting repositories are equivalent, 6404.4 'virtual trees were processed' fails with GIT_TEST_COMMIT_GRAPH=1 as we are selecting different merge-bases and thus have different object ids for the intermediate commits. As this has already causes problems (as noted in 859fdc0 (commit-graph: define GIT_TEST_COMMIT_GRAPH, 2018-08-29)), we disable commit graph within t6404-recursive-merge. Signed-off-by: Abhishek Kumar --- commit-graph.c | 14 ++++++++++++++ commit-graph.h | 6 ++++++ commit-reach.c | 2 +- t/t6404-recursive-merge.sh | 5 ++++- 4 files changed, 25 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index d32492f3724..d3d14601d4d 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -714,6 +714,20 @@ int generation_numbers_enabled(struct repository *r) return !!first_generation; } +int corrected_commit_dates_enabled(struct repository *r) +{ + struct commit_graph *g; + if (!prepare_commit_graph(r)) + return 0; + + g = r->objects->commit_graph; + + if (!g->num_commits) + return 0; + + return g->read_generation_data; +} + struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r) { struct commit_graph *g = r->objects->commit_graph; diff --git a/commit-graph.h b/commit-graph.h index ad52130883b..97f3497c279 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -95,6 +95,12 @@ struct commit_graph *parse_commit_graph(struct repository *r, */ int generation_numbers_enabled(struct repository *r); +/* + * Return 1 if and only if the repository has a commit-graph + * file and generation data chunk has been written for the file. + */ +int corrected_commit_dates_enabled(struct repository *r); + struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r); enum commit_graph_write_flags { diff --git a/commit-reach.c b/commit-reach.c index 9b24b0378d5..e38771ca5a1 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -39,7 +39,7 @@ static struct commit_list *paint_down_to_common(struct repository *r, int i; timestamp_t last_gen = GENERATION_NUMBER_INFINITY; - if (!min_generation) + if (!min_generation && !corrected_commit_dates_enabled(r)) queue.compare = compare_commits_by_commit_date; one->object.flags |= PARENT1; diff --git a/t/t6404-recursive-merge.sh b/t/t6404-recursive-merge.sh index b1c3d4dda49..86f74ae5847 100755 --- a/t/t6404-recursive-merge.sh +++ b/t/t6404-recursive-merge.sh @@ -15,6 +15,8 @@ GIT_COMMITTER_DATE="2006-12-12 23:28:00 +0100" export GIT_COMMITTER_DATE test_expect_success 'setup tests' ' + GIT_TEST_COMMIT_GRAPH=0 && + export GIT_TEST_COMMIT_GRAPH && echo 1 >a1 && git add a1 && GIT_AUTHOR_DATE="2006-12-12 23:00:00" git commit -m 1 a1 && @@ -66,7 +68,7 @@ test_expect_success 'setup tests' ' ' test_expect_success 'combined merge conflicts' ' - test_must_fail env GIT_TEST_COMMIT_GRAPH=0 git merge -m final G + test_must_fail git merge -m final G ' test_expect_success 'result contains a conflict' ' @@ -82,6 +84,7 @@ test_expect_success 'result contains a conflict' ' ' test_expect_success 'virtual trees were processed' ' + # TODO: fragile test, relies on ambigious merge-base resolution git ls-files --stage >out && cat >expect <<-EOF && From patchwork Sat Jan 16 18:11:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Kumar X-Patchwork-Id: 12024983 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7CCD1C433E9 for ; Sat, 16 Jan 2021 18:13:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 532022255F for ; Sat, 16 Jan 2021 18:13:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728044AbhAPSMs (ORCPT ); Sat, 16 Jan 2021 13:12:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52492 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727883AbhAPSMr (ORCPT ); Sat, 16 Jan 2021 13:12:47 -0500 Received: from mail-wr1-x431.google.com (mail-wr1-x431.google.com [IPv6:2a00:1450:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F1C54C061794 for ; Sat, 16 Jan 2021 10:11:34 -0800 (PST) Received: by mail-wr1-x431.google.com with SMTP id y17so12476419wrr.10 for ; Sat, 16 Jan 2021 10:11:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=YjaFqlWBLS/Ghmuj7hyNZ0dlOJahQzrJDl0+ZysUGw0=; b=gujqMcdvoXexteI5YA6PqLw7NIRw9qi8j7pDPndxLUC+q4cc/p4yo8aovLR0qtwUh9 WU5N6f4I1jH+62ELSNnxmaA3yqZPpq3ePL94Kx4YUYKAEPqEN586UkjKyuf6vWTBMUFz L5p9WsBwlSygPD98k5eBfIbl5+q55Vp3+BRggaORiOsjx+Dy86y0p1NEZCoUWG1V7Avz 0JrE4BzESIXug1ArhXQtxGEn3cfpS2CoRxaubdLCabasq+7Qx7TOHLzcMZEtwHL9VSv8 QhDIAIT31e+hH22HJQkIkOhi6BGsk9LGBsE9QBGG7cihBRsvrKP2IQzAZ/D8WuWnJ2ve 42HQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=YjaFqlWBLS/Ghmuj7hyNZ0dlOJahQzrJDl0+ZysUGw0=; b=ih7tbJbnIxl29I7GrqwygUkc884TGa4mgOE6ZySJK3/+0CP7XIUrUiN8tvjrYBTF3e qEOk23ebQYv/Zmx+WYyZ3auP/ymrEeoNGVHGByWdhWB5vFm8gg5phKtvAvXSzzTlpS5z 6YXSdppa76/jlbBH47YLf7kdil07+gtwUv7bzgpui0cMLR6V3bT2w6zJTSdjf5M5z6uM gpsJNNxhXVchq/6m6EVaS/MKNgBMTwIFWVY5a+C1zVDXNoQtj6KNrVHcusJBmtVArSSG ccHaA0UxBjc1R5ujAaA3js+BhQpQsAVcqe7Ejk1A+wEbLB9/oWuoo4Z415qmqLa8zGvd 7Swg== X-Gm-Message-State: AOAM533BsUcD2YWh0ZUAUdHSMC9EfOyEbF0gkMXi8qyOT6cbZJeNA93l tP6l8/Yk0/ne0nznM8rqZ1KLBZ11RYw= X-Google-Smtp-Source: ABdhPJz4Jx/zSS2GRE5EL5OiLJSP2OfURXU1zHZSMyWC6OndnxYz2iPdeFTaXLSDo2b9Ao7TiD6TKQ== X-Received: by 2002:a5d:45cc:: with SMTP id b12mr11218278wrs.227.1610820693357; Sat, 16 Jan 2021 10:11:33 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id c190sm16318034wme.19.2021.01.16.10.11.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 16 Jan 2021 10:11:32 -0800 (PST) Message-Id: In-Reply-To: References: Date: Sat, 16 Jan 2021 18:11:18 +0000 Subject: [PATCH v6 11/11] doc: add corrected commit date info Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jakub =?utf-8?b?TmFyxJlic2tp?= , Taylor Blau , Abhishek Kumar , SZEDER =?utf-8?b?R8OhYm9y?= , Abhishek Kumar , Abhishek Kumar Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Abhishek Kumar From: Abhishek Kumar With generation data chunk and corrected commit dates implemented, let's update the technical documentation for commit-graph. Signed-off-by: Abhishek Kumar --- .../technical/commit-graph-format.txt | 28 +++++-- Documentation/technical/commit-graph.txt | 77 +++++++++++++++---- 2 files changed, 86 insertions(+), 19 deletions(-) diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index b3b58880b92..b6658eff188 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -4,11 +4,7 @@ Git commit graph format The Git commit graph stores a list of commit OIDs and some associated metadata, including: -- The generation number of the commit. Commits with no parents have - generation number 1; commits with parents have generation number - one more than the maximum generation number of its parents. We - reserve zero as special, and can be used to mark a generation - number invalid or as "not computed". +- The generation number of the commit. - The root tree OID. @@ -86,13 +82,33 @@ CHUNK DATA: position. If there are more than two parents, the second value has its most-significant bit on and the other bits store an array position into the Extra Edge List chunk. - * The next 8 bytes store the generation number of the commit and + * The next 8 bytes store the topological level (generation number v1) + of the commit and the commit time in seconds since EPOCH. The generation number uses the higher 30 bits of the first 4 bytes, while the commit time uses the 32 bits of the second 4 bytes, along with the lowest 2 bits of the lowest byte, storing the 33rd and 34th bit of the commit time. + Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) [Optional] + * This list of 4-byte values store corrected commit date offsets for the + commits, arranged in the same order as commit data chunk. + * If the corrected commit date offset cannot be stored within 31 bits, + the value has its most-significant bit on and the other bits store + the position of corrected commit date into the Generation Data Overflow + chunk. + * Generation Data chunk is present only when commit-graph file is written + by compatible versions of Git and in case of split commit-graph chains, + the topmost layer also has Generation Data chunk. + + Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional] + * This list of 8-byte values stores the corrected commit date offsets + for commits with corrected commit date offsets that cannot be + stored within 31 bits. + * Generation Data Overflow chunk is present only when Generation Data + chunk is present and atleast one corrected commit date offset cannot + be stored within 31 bits. + Extra Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional] This list of 4-byte values store the second through nth parents for all octopus merges. The second parent value in the commit data stores diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index f14a7659aa8..f05e7bda1a9 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -38,14 +38,31 @@ A consumer may load the following info for a commit from the graph: Values 1-4 satisfy the requirements of parse_commit_gently(). -Define the "generation number" of a commit recursively as follows: +There are two definitions of generation number: +1. Corrected committer dates (generation number v2) +2. Topological levels (generation nummber v1) - * A commit with no parents (a root commit) has generation number one. +Define "corrected committer date" of a commit recursively as follows: - * A commit with at least one parent has generation number one more than - the largest generation number among its parents. + * A commit with no parents (a root commit) has corrected committer date + equal to its committer date. -Equivalently, the generation number of a commit A is one more than the + * A commit with at least one parent has corrected committer date equal to + the maximum of its commiter date and one more than the largest corrected + committer date among its parents. + + * As a special case, a root commit with timestamp zero has corrected commit + date of 1, to be able to distinguish it from GENERATION_NUMBER_ZERO + (that is, an uncomputed corrected commit date). + +Define the "topological level" of a commit recursively as follows: + + * A commit with no parents (a root commit) has topological level of one. + + * A commit with at least one parent has topological level one more than + the largest topological level among its parents. + +Equivalently, the topological level of a commit A is one more than the length of a longest path from A to a root commit. The recursive definition is easier to use for computation and observing the following property: @@ -60,6 +77,9 @@ is easier to use for computation and observing the following property: generation numbers, then we always expand the boundary commit with highest generation number and can easily detect the stopping condition. +The property applies to both versions of generation number, that is both +corrected committer dates and topological levels. + This property can be used to significantly reduce the time it takes to walk commits and determine topological relationships. Without generation numbers, the general heuristic is the following: @@ -67,7 +87,9 @@ numbers, the general heuristic is the following: If A and B are commits with commit time X and Y, respectively, and X < Y, then A _probably_ cannot reach B. -This heuristic is currently used whenever the computation is allowed to +In absence of corrected commit dates (for example, old versions of Git or +mixed generation graph chains), +this heuristic is currently used whenever the computation is allowed to violate topological relationships due to clock skew (such as "git log" with default order), but is not used when the topological order is required (such as merge base calculations, "git log --graph"). @@ -77,7 +99,7 @@ in the commit graph. We can treat these commits as having "infinite" generation number and walk until reaching commits with known generation number. -We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not +We use the macro GENERATION_NUMBER_INFINITY to mark commits not in the commit-graph file. If a commit-graph file was written by a version of Git that did not compute generation numbers, then those commits will have generation number represented by the macro GENERATION_NUMBER_ZERO = 0. @@ -93,12 +115,12 @@ fully-computed generation numbers. Using strict inequality may result in walking a few extra commits, but the simplicity in dealing with commits with generation number *_INFINITY or *_ZERO is valuable. -We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF to for commits whose -generation numbers are computed to be at least this value. We limit at -this value since it is the largest value that can be stored in the -commit-graph file using the 30 bits available to generation numbers. This -presents another case where a commit can have generation number equal to -that of a parent. +We use the macro GENERATION_NUMBER_V1_MAX = 0x3FFFFFFF for commits whose +topological levels (generation number v1) are computed to be at least +this value. We limit at this value since it is the largest value that +can be stored in the commit-graph file using the 30 bits available +to topological levels. This presents another case where a commit can +have generation number equal to that of a parent. Design Details -------------- @@ -267,6 +289,35 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum number of commits) could be extracted into config settings for full flexibility. +## Handling Mixed Generation Number Chains + +With the introduction of generation number v2 and generation data chunk, the +following scenario is possible: + +1. "New" Git writes a commit-graph with the corrected commit dates. +2. "Old" Git writes a split commit-graph on top without corrected commit dates. + +A naive approach of using the newest available generation number from +each layer would lead to violated expectations: the lower layer would +use corrected commit dates which are much larger than the topological +levels of the higher layer. For this reason, Git inspects the topmost +layer to see if the layer is missing corrected commit dates. In such a case +Git only uses topological level for generation numbers. + +When writing a new layer in split commit-graph, we write corrected commit +dates if the topmost layer has corrected commit dates written. This +guarantees that if a layer has corrected commit dates, all lower layers +must have corrected commit dates as well. + +When merging layers, we do not consider whether the merged layers had corrected +commit dates. Instead, the new layer will have corrected commit dates if the +layer below the new layer has corrected commit dates. + +While writing or merging layers, if the new layer is the only layer, it will +have corrected commit dates when written by compatible versions of Git. Thus, +rewriting split commit-graph as a single file (`--split=replace`) creates a +single layer with corrected commit dates. + ## Deleting graph-{hash} files After a new tip file is written, some `graph-{hash}` files may no longer