From patchwork Mon Mar 20 11:26:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13181047 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44E2AC7618D for ; Mon, 20 Mar 2023 11:27:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230174AbjCTL1K (ORCPT ); Mon, 20 Mar 2023 07:27:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58926 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229652AbjCTL1I (ORCPT ); Mon, 20 Mar 2023 07:27:08 -0400 Received: from mail-wm1-x335.google.com (mail-wm1-x335.google.com [IPv6:2a00:1450:4864:20::335]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3035E524C for ; Mon, 20 Mar 2023 04:27:01 -0700 (PDT) Received: by mail-wm1-x335.google.com with SMTP id u11-20020a05600c19cb00b003edcc414997so2396039wmq.3 for ; Mon, 20 Mar 2023 04:27:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679311618; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=CzWG7yCW5ur5d7QC+XEvdDFRu02hEqMdMKyfKCZeia0=; b=bIQM3/lFeegWiwD94eUiV+gdE3oirdAi1MFZ1WdLsnzFKzuy/yY6rMDzaxttUgcF6+ hY50fa4e8J///1rsymbrmHPwqUhFCcPW85RNT+gMvOE17NFOh4uznFrZiSLZYDIYN0Vn agvtsJBLutU4Ma8V1l0NuQKwLi+ojr9FOEcvFIRTTF+kSEK9H9gER/AK9mX0gC63aLob qP48y07Q+VRNgYJxTvD1puxBkyf2T5zImOKCb93FFzy0LfOnfUcgzL1AKp2aut3VWlrQ qREQWP2JLrHb6urop5/uiyPXnHeTNlLtMSLjWER9X3anyGCwrb3dvztw1+n4FiIyWnhp NWLQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679311618; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CzWG7yCW5ur5d7QC+XEvdDFRu02hEqMdMKyfKCZeia0=; b=JS9JC3fYx4aXS0grdrtn8tMDlt9Ws/pmIYFmVDiNmAXongpKK4ASoNYB68tvi1DOcy s7Zg7m3jy3hsoVLQeA1dHz4qscS129i7WXLTyocRNu7LWHjaOR4TwvHcngDWVPxLVwEU 9ZxismknsH6G8A16Vdr5RysJJyHsFynXr/vSRSscQVrXvuJ1loLvsEEDnzNI1mux3Hzr WJdr9imrxMrGrxJEgrQGzYhF5nvE7tGzWv+6f6c7WsroPqWSkDnS4AkBNChp9lPMItfo B8BBcSjwx3wilhc7/9yfGHUskv3y4AS5szS4T+jlQa3Ab0ZUEQ+lnfl8U9zlbpuonjk9 yDjg== X-Gm-Message-State: AO0yUKVDe83olf3c6Xsw45c+p3OrTUlIh9tGujNbE8bmZVe1c9NaCJFi utS50SVh9PWUmXyXWOjPd38xafeDNtA= X-Google-Smtp-Source: AK7set9tXduoAsvREjRE79Qksj5lm3OFASlCGryFo4oRRhn4VSZ28X4/RhDqy61o8RRuKADWKEXXww== X-Received: by 2002:a05:600c:4f01:b0:3ed:31cf:fe6e with SMTP id l1-20020a05600c4f0100b003ed31cffe6emr15753408wmq.41.1679311618484; Mon, 20 Mar 2023 04:26:58 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id 16-20020a05600c025000b003ed1ff06faasm10195386wmj.19.2023.03.20.04.26.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Mar 2023 04:26:58 -0700 (PDT) Message-Id: <27d94077aa98c7ea76787ac8c2a8fd15915c9eab.1679311616.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 20 Mar 2023 11:26:47 +0000 Subject: [PATCH v4 1/9] for-each-ref: add --stdin option Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, Jeff King , Phillip Wood , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Jonathan Tan , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee When a user wishes to input a large list of patterns to 'git for-each-ref' (likely a long list of exact refs) there are frequently system limits on the number of command-line arguments. Add a new --stdin option to instead read the patterns from standard input. Add tests that check that any unrecognized arguments are considered an error when --stdin is provided. Also, an empty pattern list is interpreted as the complete ref set. When reading from stdin, we populate the filter.name_patterns array dynamically as opposed to pointing to the 'argv' array directly. This is simple when using a strvec, as it is NULL-terminated in the same way. We then free the memory directly from the strvec. Helped-by: Phillip Wood Signed-off-by: Derrick Stolee --- Documentation/git-for-each-ref.txt | 7 +++++- builtin/for-each-ref.c | 23 ++++++++++++++++++- t/t6300-for-each-ref.sh | 37 ++++++++++++++++++++++++++++++ 3 files changed, 65 insertions(+), 2 deletions(-) diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt index 6da899c6296..ccdc2911bb9 100644 --- a/Documentation/git-for-each-ref.txt +++ b/Documentation/git-for-each-ref.txt @@ -9,7 +9,8 @@ SYNOPSIS -------- [verse] 'git for-each-ref' [--count=] [--shell|--perl|--python|--tcl] - [(--sort=)...] [--format=] [...] + [(--sort=)...] [--format=] + [ --stdin | ... ] [--points-at=] [--merged[=]] [--no-merged[=]] [--contains[=]] [--no-contains[=]] @@ -32,6 +33,10 @@ OPTIONS literally, in the latter case matching completely or from the beginning up to a slash. +--stdin:: + If `--stdin` is supplied, then the list of patterns is read from + standard input instead of from the argument list. + --count=:: By default the command shows all refs that match ``. This option makes it stop after showing diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c index 6f62f40d126..9df16cfb854 100644 --- a/builtin/for-each-ref.c +++ b/builtin/for-each-ref.c @@ -5,6 +5,7 @@ #include "object.h" #include "parse-options.h" #include "ref-filter.h" +#include "strvec.h" static char const * const for_each_ref_usage[] = { N_("git for-each-ref [] []"), @@ -25,6 +26,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix) struct ref_format format = REF_FORMAT_INIT; struct strbuf output = STRBUF_INIT; struct strbuf err = STRBUF_INIT; + int from_stdin = 0; + struct strvec vec = STRVEC_INIT; struct option opts[] = { OPT_BIT('s', "shell", &format.quote_style, @@ -49,6 +52,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix) OPT_CONTAINS(&filter.with_commit, N_("print only refs which contain the commit")), OPT_NO_CONTAINS(&filter.no_commit, N_("print only refs which don't contain the commit")), OPT_BOOL(0, "ignore-case", &icase, N_("sorting and filtering are case insensitive")), + OPT_BOOL(0, "stdin", &from_stdin, N_("read reference patterns from stdin")), OPT_END(), }; @@ -75,7 +79,23 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix) ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase); filter.ignore_case = icase; - filter.name_patterns = argv; + if (from_stdin) { + struct strbuf line = STRBUF_INIT; + + if (argv[0]) + die(_("unknown arguments supplied with --stdin")); + + while (strbuf_getline(&line, stdin) != EOF) + strvec_push(&vec, line.buf); + + strbuf_release(&line); + + /* vec.v is NULL-terminated, just like 'argv'. */ + filter.name_patterns = vec.v; + } else { + filter.name_patterns = argv; + } + filter.match_as_path = 1; filter_refs(&array, &filter, FILTER_REFS_ALL); ref_array_sort(sorting, &array); @@ -97,5 +117,6 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix) free_commit_list(filter.with_commit); free_commit_list(filter.no_commit); ref_sorting_release(sorting); + strvec_clear(&vec); return 0; } diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh index c466fd989f1..a58053a54c5 100755 --- a/t/t6300-for-each-ref.sh +++ b/t/t6300-for-each-ref.sh @@ -1464,4 +1464,41 @@ sig_crlf="$(printf "%s" "$sig" | append_cr; echo dummy)" sig_crlf=${sig_crlf%dummy} test_atom refs/tags/fake-sig-crlf contents:signature "$sig_crlf" +test_expect_success 'git for-each-ref --stdin: empty' ' + >in && + git for-each-ref --format="%(refname)" --stdin actual && + git for-each-ref --format="%(refname)" >expect && + test_cmp expect actual +' + +test_expect_success 'git for-each-ref --stdin: fails if extra args' ' + >in && + test_must_fail git for-each-ref --format="%(refname)" \ + --stdin refs/heads/extra err && + grep "unknown arguments supplied with --stdin" err +' + +test_expect_success 'git for-each-ref --stdin: matches' ' + cat >in <<-EOF && + refs/tags/multi* + refs/heads/amb* + EOF + + cat >expect <<-EOF && + refs/heads/ambiguous + refs/tags/multi-ref1-100000-user1 + refs/tags/multi-ref1-100000-user2 + refs/tags/multi-ref1-200000-user1 + refs/tags/multi-ref1-200000-user2 + refs/tags/multi-ref2-100000-user1 + refs/tags/multi-ref2-100000-user2 + refs/tags/multi-ref2-200000-user1 + refs/tags/multi-ref2-200000-user2 + refs/tags/multiline + EOF + + git for-each-ref --format="%(refname)" --stdin actual && + test_cmp expect actual +' + test_done From patchwork Mon Mar 20 11:26:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13181048 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5CDDFC7618A for ; Mon, 20 Mar 2023 11:27:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230218AbjCTL1M (ORCPT ); Mon, 20 Mar 2023 07:27:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58940 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230012AbjCTL1J (ORCPT ); Mon, 20 Mar 2023 07:27:09 -0400 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 969D959CA for ; Mon, 20 Mar 2023 04:27:02 -0700 (PDT) Received: by mail-wr1-x432.google.com with SMTP id h17so9969073wrt.8 for ; Mon, 20 Mar 2023 04:27:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679311619; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=rKehZwzMxp2IDxyfm57q9YaxorHBAAhc4ddH2zrnLQA=; b=ACSjYbaoyXHCfSwxQyu4ByB84DZyzlp6KpIJqZgh/2FiNtFMIKvM/4YHMBDV7GyF5j f9ziGc+SS57D1jQVeNIvUrp1iuWf8ep+sCto9sRXJC9fUOK+OtF7+Xn7zdH9guQWHXyz UBiFhEFktw7DSlVU2Bvw+o9iPuEQJQhM8rgPaobMy9MnbxguUV/5/abZTlB16W/mO/9t FhaH78qWLH+DmGt+ogFGZEhcKPAGvAxffvh+avLuJTxrbRIBJm7h1shQdzh10Du5Tet2 o0s3lV6Gh4DKqQRgj3GAuN5mnO9W2QjI8STfaR/Tc3c7NSaYTGJPuzTD+3h2LMZUELUt LBaQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679311619; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rKehZwzMxp2IDxyfm57q9YaxorHBAAhc4ddH2zrnLQA=; b=TKn7X3+2DVoAp8+qNDyE+sIoPobdBq0GuP/s0TEeF9P3JILYn1uXa7qP3TRFWy8//d yeQ+r+vuHeBJtEKEcBIH/cVX4yOmxgsoz6/ST4sHDeAdUkNjjNKex4EQ7Gs/rhYH81UN a0YOM/NQ8JiR0dGaWXep2kJ3RX2QpfVAoyMY06VxeppbkQN4tDX0yBvcRcsh2BSVglfa lrySN6FjtxcZZzBFDUmKSGg4QytTBNuS4xlElCDc7FMNOd2lHJoCExpF38TPGP45DRnu pouZxKA4pefbgUA30GIJBdd9sPCvrNmidJZHmSfUtI3o7fkh2gXCkR37euccc3vh+ka1 L39g== X-Gm-Message-State: AO0yUKXWm6y0OOll2uezUkd13glFf+FKn0fQ64VCuOH0zoSwpRQtYO5r v772bqsuWyFkMSlACzqnIdls8pMhFbQ= X-Google-Smtp-Source: AK7set+A2DKshESbG9nshUOk96x3Yt+Gfts97ptGvS3Bnih8wFGe1q89qnPFlBViFg8t+peaIOypgQ== X-Received: by 2002:a5d:694d:0:b0:2ce:adbf:cb14 with SMTP id r13-20020a5d694d000000b002ceadbfcb14mr13144111wrw.28.1679311619056; Mon, 20 Mar 2023 04:26:59 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a8-20020a056000100800b002d8566128e5sm227741wrx.25.2023.03.20.04.26.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Mar 2023 04:26:58 -0700 (PDT) Message-Id: <1e3d499431aee563f6104f5ef1192658ece4ae4b.1679311616.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 20 Mar 2023 11:26:48 +0000 Subject: [PATCH v4 2/9] for-each-ref: explicitly test no matches Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, Jeff King , Phillip Wood , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Jonathan Tan , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The for-each-ref builtin can take a list of ref patterns, but if none match, it still succeeds (but with no output). Add an explicit test that demonstrates that behavior. Signed-off-by: Derrick Stolee --- t/t6300-for-each-ref.sh | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh index a58053a54c5..6614469d2d6 100755 --- a/t/t6300-for-each-ref.sh +++ b/t/t6300-for-each-ref.sh @@ -1501,4 +1501,17 @@ test_expect_success 'git for-each-ref --stdin: matches' ' test_cmp expect actual ' +test_expect_success 'git for-each-ref with non-existing refs' ' + cat >in <<-EOF && + refs/heads/this-ref-does-not-exist + refs/tags/bogus + EOF + + git for-each-ref --format="%(refname)" --stdin actual && + test_must_be_empty actual && + + xargs git for-each-ref --format="%(refname)" actual && + test_must_be_empty actual +' + test_done From patchwork Mon Mar 20 11:26:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13181050 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7427AC7618A for ; Mon, 20 Mar 2023 11:27:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230354AbjCTL12 (ORCPT ); Mon, 20 Mar 2023 07:27:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58972 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230173AbjCTL1K (ORCPT ); Mon, 20 Mar 2023 07:27:10 -0400 Received: from mail-wm1-x330.google.com (mail-wm1-x330.google.com [IPv6:2a00:1450:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EDE276185 for ; Mon, 20 Mar 2023 04:27:02 -0700 (PDT) Received: by mail-wm1-x330.google.com with SMTP id r19-20020a05600c459300b003eb3e2a5e7bso7276777wmo.0 for ; Mon, 20 Mar 2023 04:27:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679311620; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=ktm0kB86VsEKCsVJG3XyhHWxo96yIiCHmCV7DDi5gzI=; b=TMvc/C3Dp3clffCP6ZXCgE5nW8V0TRUaJ0kOzXRP9RxeAzqXDY4q87gY4oqQGrQx/e tiGaGRKK6v/hNqp2FJUsI5gAEwwvl/eoizhKFw72VJEUtWDPEDhmdlYevOxONPFXmxaE Jea2HVinC5oYZ6eQnWLkJo2clgilojxdAQdOJTxUFAsJILC7n3/3K6DHf/Og9JOn5PMG hjUq+t1eSZWfFLzfR8fxw56qoBhTEp+69TCPXdm30/GRC0h2xhQv7WQlBOnIUwCw+tSg rS2LEVyPjKau11tfDRf/zfTyH9yE9dEcuonI5i5e3QO9Ap9I/h91RV34t3682Eje/LEG H9jA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679311620; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ktm0kB86VsEKCsVJG3XyhHWxo96yIiCHmCV7DDi5gzI=; b=V0wOOINuS7vUcEjHUKzI6QbY2yZGbSpRcIT81Ar6fOctFrTU3AwhFRFUrzcFHB9MUH nOdqSMxgWDc4qxWiC/3MQySDVwiPSivgho4x7037rNMF62l0cUz1YFk3cCrYPR9FgFi6 93OC9ZxIoGbTgEjM04ZAvk9jP97hGpL9awS365Rs1f+AscIxdsQzvFNRqOKSXwlRwsaH rdeoiwMZAGOnBMya2aqZ3C6VO3u2fctMGJS70B/3gYH9b94B4qd6qP414CIrn8cY8t+T oR8WvYv/rwlXoaBeVexvtp5lhztQBC2V6bjR2bK0VEYZ824agrdYEEKxKSgF3myy56vz 6JvA== X-Gm-Message-State: AO0yUKVK6A8uJlLzF0Xp1le03kL8VvKScaEDelj/kCsVQ8hfHgLQ9Jk/ koFXUczod7mog0IvbLO/Ycm/VG5PVxo= X-Google-Smtp-Source: AK7set+HRVVHN8sjiqBsjjVhkgOpsjn5MPCy4A9gqETOu+SJuVOT1dWyCD3PGAJSXtw5hhuQO/LERg== X-Received: by 2002:a1c:f601:0:b0:3ed:8bef:6a04 with SMTP id w1-20020a1cf601000000b003ed8bef6a04mr9880512wmc.27.1679311619750; Mon, 20 Mar 2023 04:26:59 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id k22-20020a05600c0b5600b003ed2987690dsm10223179wmr.26.2023.03.20.04.26.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Mar 2023 04:26:59 -0700 (PDT) Message-Id: <79a57f30a850177ce6400b9e898f24981635541a.1679311616.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 20 Mar 2023 11:26:49 +0000 Subject: [PATCH v4 3/9] commit-graph: refactor compute_topological_levels() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, Jeff King , Phillip Wood , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Jonathan Tan , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee This patch extracts the common code used to compute topological levels and corrected committer dates into a common routine, compute_reachable_generation_numbers(). For ease of reading, it only modifies compute_topological_levels() to use this new routine, leaving compute_generation_numbers() to be modified in the next change. This new routine dispatches to call the necessary functions to get and set the generation number for a given commit through a vtable (the compute_generation_info struct). Computing the generation number itself is done in compute_generation_from_max(), which dispatches its implementation based on the generation version requested, or issuing a BUG() for unrecognized generation versions. This does not use a vtable because the logic depends only on the generation number version, not where the data is being loaded from or being stored to. This is a subtle point that will make more sense in a future change that modifies the in-memory generation values instead of just preparing values for writing to a commit-graph file. This change looks like it adds a lot of new code. However, two upcoming changes will be quite small due to the work being done in this change. Co-authored-by: Taylor Blau Signed-off-by: Taylor Blau Signed-off-by: Derrick Stolee --- commit-graph.c | 106 ++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 83 insertions(+), 23 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index c11b59f28b3..4356c8c1f4b 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1446,24 +1446,52 @@ static void close_reachable(struct write_commit_graph_context *ctx) stop_progress(&ctx->progress); } -static void compute_topological_levels(struct write_commit_graph_context *ctx) +struct compute_generation_info { + struct repository *r; + struct packed_commit_list *commits; + struct progress *progress; + int progress_cnt; + + timestamp_t (*get_generation)(struct commit *c, void *data); + void (*set_generation)(struct commit *c, timestamp_t gen, void *data); + void *data; +}; + +static timestamp_t compute_generation_from_max(struct commit *c, + timestamp_t max_gen, + int generation_version) +{ + switch (generation_version) { + case 1: /* topological levels */ + if (max_gen > GENERATION_NUMBER_V1_MAX - 1) + max_gen = GENERATION_NUMBER_V1_MAX - 1; + return max_gen + 1; + + case 2: /* corrected commit date */ + if (c->date && c->date > max_gen) + max_gen = c->date - 1; + return max_gen + 1; + + default: + BUG("attempting unimplemented version"); + } +} + +static void compute_reachable_generation_numbers( + struct compute_generation_info *info, + int generation_version) { int i; struct commit_list *list = NULL; - if (ctx->report_progress) - ctx->progress = start_delayed_progress( - _("Computing commit graph topological levels"), - ctx->commits.nr); - for (i = 0; i < ctx->commits.nr; i++) { - struct commit *c = ctx->commits.list[i]; - uint32_t level; - - repo_parse_commit(ctx->r, c); - level = *topo_level_slab_at(ctx->topo_levels, c); + for (i = 0; i < info->commits->nr; i++) { + struct commit *c = info->commits->list[i]; + timestamp_t gen; + repo_parse_commit(info->r, c); + gen = info->get_generation(c, info->data); + display_progress(info->progress, info->progress_cnt + 1); - display_progress(ctx->progress, i + 1); - if (level != GENERATION_NUMBER_ZERO) + if (gen != GENERATION_NUMBER_ZERO && gen != GENERATION_NUMBER_INFINITY) continue; commit_list_insert(c, &list); @@ -1471,31 +1499,63 @@ static void compute_topological_levels(struct write_commit_graph_context *ctx) struct commit *current = list->item; struct commit_list *parent; int all_parents_computed = 1; - uint32_t max_level = 0; + uint32_t max_gen = 0; for (parent = current->parents; parent; parent = parent->next) { - repo_parse_commit(ctx->r, parent->item); - level = *topo_level_slab_at(ctx->topo_levels, parent->item); + repo_parse_commit(info->r, parent->item); + gen = info->get_generation(parent->item, info->data); - if (level == GENERATION_NUMBER_ZERO) { + if (gen == GENERATION_NUMBER_ZERO) { all_parents_computed = 0; commit_list_insert(parent->item, &list); break; } - if (level > max_level) - max_level = level; + if (gen > max_gen) + max_gen = gen; } if (all_parents_computed) { pop_commit(&list); - - if (max_level > GENERATION_NUMBER_V1_MAX - 1) - max_level = GENERATION_NUMBER_V1_MAX - 1; - *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1; + gen = compute_generation_from_max( + current, max_gen, + generation_version); + info->set_generation(current, gen, info->data); } } } +} + +static timestamp_t get_topo_level(struct commit *c, void *data) +{ + struct write_commit_graph_context *ctx = data; + return *topo_level_slab_at(ctx->topo_levels, c); +} + +static void set_topo_level(struct commit *c, timestamp_t t, void *data) +{ + struct write_commit_graph_context *ctx = data; + *topo_level_slab_at(ctx->topo_levels, c) = (uint32_t)t; +} + +static void compute_topological_levels(struct write_commit_graph_context *ctx) +{ + struct compute_generation_info info = { + .r = ctx->r, + .commits = &ctx->commits, + .get_generation = get_topo_level, + .set_generation = set_topo_level, + .data = ctx, + }; + + if (ctx->report_progress) + info.progress = ctx->progress + = start_delayed_progress( + _("Computing commit graph topological levels"), + ctx->commits.nr); + + compute_reachable_generation_numbers(&info, 1); + stop_progress(&ctx->progress); } From patchwork Mon Mar 20 11:26:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13181049 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E0A33C6FD1D for ; Mon, 20 Mar 2023 11:27:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230214AbjCTL1R (ORCPT ); Mon, 20 Mar 2023 07:27:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58954 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230060AbjCTL1J (ORCPT ); Mon, 20 Mar 2023 07:27:09 -0400 Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8729E659B for ; Mon, 20 Mar 2023 04:27:04 -0700 (PDT) Received: by mail-wr1-x42e.google.com with SMTP id j2so9967650wrh.9 for ; Mon, 20 Mar 2023 04:27:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679311620; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=lxNPHIbuE6ximt61ltxINwr0w2o4l2tgeAo3kZNTTR0=; b=BtygJc8g9ts46UXbqILoyFgKgweILsfi/P/qaTKDc+aF/wV6FWfTpf55DoMF1DYkhQ Q+GaZmAlwx6xWrp/5lWJq/MGjcn+WwAPeQuMLyMUUzDKWIaa1Ut/71R+TKVz/WP2mjUC IhAkF5TxNth6Rqt/8UBDmA/aFuFl0jTPn8xGtFXJunrUEoF04VvsJLZJJ0NX2aQL1cgs pN5T5As5zPQKwiMDKqoM5+i3x7FrM7T0g7aSMMneWKhHXsj9hSkK4E8akj97CEL1oiS4 fcEfM4rEFpvNpTLQxsM9+rjqCvu+toOFZajdTyVDTvZtLmNeFysmBpvp2RIKvO+vGIE3 xKLw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679311620; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=lxNPHIbuE6ximt61ltxINwr0w2o4l2tgeAo3kZNTTR0=; b=3Hnm7DABE6mSrl/FDecpGKdmmsr0BzZA4I5xTSpXkIXS7dCh+fJ9f1shCUrl46sy5l xolIrGzBE0zfsbZhZfw993sGC3xc4gpgp3AUIu0N5/nl+frOPdw54vaU0GXuz8Sdxku0 lhQWh1q9JXvpkKKSPRdCCa/JfllJC1JRu/cFiWbE9tFgd9gX0upHZ/GMxIefAka3964W Azqy9GMiNqH+5gD1J91OINE0HNfpDowh/ep/syq+MEN4VZo3+6DTVdM5qRvRYgNWf4ab Zkm2jIVOBZrP5/jRWxGvPZN78yUT/R5P8tknT6NjLbSVWjdHp8SB2HFXFnjUY2AcCyM1 9Elg== X-Gm-Message-State: AO0yUKW4QrcKPLFd3MlCJpVtamnrKLXgcQpNONjroYr7EfAY7A+MqELE I7Ry1lDB6CsubLA8WX4zoigkYlUnlt0= X-Google-Smtp-Source: AK7set+heD96jspSaF6ZdZ5UvTqLEHNKh9SGTGQZFowyjRAVAVr1CTnGxPo/kDi1HcmWszEukj/auQ== X-Received: by 2002:adf:f1ce:0:b0:2d7:3cd3:85b2 with SMTP id z14-20020adff1ce000000b002d73cd385b2mr2046598wro.23.1679311620416; Mon, 20 Mar 2023 04:27:00 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id d12-20020adffbcc000000b002c54e26bca5sm8700053wrs.49.2023.03.20.04.26.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Mar 2023 04:27:00 -0700 (PDT) Message-Id: <3fd6c75812924e90075dba4f5eadba44f1eb03cb.1679311616.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 20 Mar 2023 11:26:50 +0000 Subject: [PATCH v4 4/9] commit-graph: simplify compute_generation_numbers() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, Jeff King , Phillip Wood , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Jonathan Tan , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The previous change introduced the generic algorithm compute_reachable_generation_numbers() and used it as the core functionality of compute_topological_levels(). Now, use it as the core functionality of compute_generation_numbers(). The main difference here is that we use generation version 2, which is used in to toggle the logic in compute_generation_from_max() for computing the corrected commit date based on the corrected commit dates of the parent commits (and the commit date of the current commit). It also uses different methods for (get|set)_generation in the vtable in order to store and access the value in the correct places. Co-authored-by: Taylor Blau Signed-off-by: Taylor Blau Signed-off-by: Derrick Stolee --- commit-graph.c | 64 +++++++++++++++++--------------------------------- 1 file changed, 21 insertions(+), 43 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 4356c8c1f4b..d1c98681e88 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1559,13 +1559,31 @@ static void compute_topological_levels(struct write_commit_graph_context *ctx) stop_progress(&ctx->progress); } +static timestamp_t get_generation_from_graph_data(struct commit *c, void *data) +{ + return commit_graph_data_at(c)->generation; +} + +static void set_generation_v2(struct commit *c, timestamp_t t, void *data) +{ + struct commit_graph_data *g = commit_graph_data_at(c); + g->generation = (uint32_t)t; +} + static void compute_generation_numbers(struct write_commit_graph_context *ctx) { int i; - struct commit_list *list = NULL; + struct compute_generation_info info = { + .r = ctx->r, + .commits = &ctx->commits, + .get_generation = get_generation_from_graph_data, + .set_generation = set_generation_v2, + .data = ctx, + }; if (ctx->report_progress) - ctx->progress = start_delayed_progress( + info.progress = ctx->progress + = start_delayed_progress( _("Computing commit graph generation numbers"), ctx->commits.nr); @@ -1577,47 +1595,7 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) } } - for (i = 0; i < ctx->commits.nr; i++) { - struct commit *c = ctx->commits.list[i]; - timestamp_t corrected_commit_date; - - repo_parse_commit(ctx->r, c); - corrected_commit_date = commit_graph_data_at(c)->generation; - - display_progress(ctx->progress, i + 1); - if (corrected_commit_date != GENERATION_NUMBER_ZERO) - continue; - - commit_list_insert(c, &list); - while (list) { - struct commit *current = list->item; - struct commit_list *parent; - int all_parents_computed = 1; - timestamp_t max_corrected_commit_date = 0; - - for (parent = current->parents; parent; parent = parent->next) { - repo_parse_commit(ctx->r, parent->item); - corrected_commit_date = commit_graph_data_at(parent->item)->generation; - - if (corrected_commit_date == GENERATION_NUMBER_ZERO) { - all_parents_computed = 0; - commit_list_insert(parent->item, &list); - break; - } - - if (corrected_commit_date > max_corrected_commit_date) - max_corrected_commit_date = corrected_commit_date; - } - - if (all_parents_computed) { - pop_commit(&list); - - if (current->date && current->date > max_corrected_commit_date) - max_corrected_commit_date = current->date - 1; - commit_graph_data_at(current)->generation = max_corrected_commit_date + 1; - } - } - } + compute_reachable_generation_numbers(&info, 2); for (i = 0; i < ctx->commits.nr; i++) { struct commit *c = ctx->commits.list[i]; From patchwork Mon Mar 20 11:26:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13181051 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CF210C6FD1D for ; Mon, 20 Mar 2023 11:27:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231150AbjCTL13 (ORCPT ); Mon, 20 Mar 2023 07:27:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58974 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229925AbjCTL1K (ORCPT ); Mon, 20 Mar 2023 07:27:10 -0400 Received: from mail-wr1-x429.google.com (mail-wr1-x429.google.com [IPv6:2a00:1450:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7179565B9 for ; Mon, 20 Mar 2023 04:27:03 -0700 (PDT) Received: by mail-wr1-x429.google.com with SMTP id j24so1069995wrd.0 for ; Mon, 20 Mar 2023 04:27:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679311621; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=phli6c0KvFlEOLWT6ac0nc3wr3Jb+y2NaXDRHce8eIs=; b=h8T2o9p11jGrp6a8hVVjWs50Mo11Sbe05+Wm/mL2zNK+9Jo1oWSKdz3gIZRKYTajGh ZyZzrpj1JxiEcf+r22mSmLo1iuLORT5RFElKHM/66kkCMKbNIHnpw0Ex9pa4jMlzRYH3 cBSkBwHlmfhBXoXtIGUb8OVcsGx4xrhhWnW8YFnyGW7bertG+kJZ1YDuIgtQGkcxKAet ZIBoLxRg6iGe/CytGt1ALm8QBl0bFf8KhfGgK+IYhz37MtO2eG1qkmstMXx945Vq1/yi ktNTrexxcPzv2vaiiqeM5f9yxsnP/bcLjZ59pxm32xfNdGgsoyZuw2w21ffNRF9DPSOj DnHA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679311621; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=phli6c0KvFlEOLWT6ac0nc3wr3Jb+y2NaXDRHce8eIs=; b=V9uEP4w+KcfhUk5eDzWlFm8fP6CN9+O3ShnXf1Ov/er0IO5+9YYXqbcCu2+a7Vywn/ Tb4uRpLcE+6igqFqlyk1H0NRYrxXV9pyiPhxsJZqZD4nhsHwoVvzMLvoYrmM2Y8ZUC5G pX2sc1UzLOxkoKo0XLfIUuOQa5Y3LgTwERSIfrBnqZrG3/PjgeX8uj2tYVoXoSVfELpA Ghf7DN6VKDWOCgL/CEtDgH8HWKLBD56qg6UTFAxfdJiFGcNxcu4uFLjdo279iUNTvvkv jADLbED/qMrp4tbE9dOOhkOAuKLw0bBuxLzKYYm7dU59uxusD7jlYsBWnD00Lg8IVtRV oATw== X-Gm-Message-State: AO0yUKWJBGQY+elbybTwmWsJ6KKUU0m0kRjMx4CHvYvjgNloKI+YTUjG KrZd73mhCreCR2lF3+g/EaDvSYWzvqY= X-Google-Smtp-Source: AK7set+XFQnC+1loUEQRHwjZ7wyGRXVqAdrTFT4z2mI1NChBjJTtXxgXnCmCqB42Kk5DgpidFBKVcQ== X-Received: by 2002:a5d:6949:0:b0:2d1:7ade:aac with SMTP id r9-20020a5d6949000000b002d17ade0aacmr11824028wrw.0.1679311621071; Mon, 20 Mar 2023 04:27:01 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id f18-20020adff452000000b002c5d3f0f737sm8701264wrp.30.2023.03.20.04.27.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Mar 2023 04:27:00 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Mon, 20 Mar 2023 11:26:51 +0000 Subject: [PATCH v4 5/9] commit-graph: return generation from memory Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, Jeff King , Phillip Wood , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Jonathan Tan , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The commit_graph_generation() method used to report a value of GENERATION_NUMBER_INFINITY if the commit_graph_data_slab had an instance for the given commit but the graph_pos indicated the commit was not in the commit-graph file. However, an upcoming change will introduce the ability to set generation values in-memory without writing the commit-graph file. Thus, we can no longer trust 'graph_pos' to indicate whether or not the generation member can be trusted. Instead, trust the 'generation' member if the commit has a value in the slab _and_ the 'generation' member is non-zero. Otherwise, treat it as GENERATION_NUMBER_INFINITY. This only makes a difference for a very old case for the commit-graph: the very first Git release to write commit-graph files wrote zeroes in the topological level positions. If we are parsing a commit-graph with all zeroes, those commits will now appear to have GENERATION_NUMBER_INFINITY (as if they were not parsed from the commit-graph). I attempted several variations to work around the need for providing an uninitialized 'generation' member, but this was the best one I found. It does require a change to a verification test in t5318 because it reports a different error than the one about non-zero generation numbers. Signed-off-by: Derrick Stolee --- commit-graph.c | 8 +++----- t/t5318-commit-graph.sh | 2 +- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index d1c98681e88..63a56483cf6 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -116,12 +116,10 @@ timestamp_t commit_graph_generation(const struct commit *c) struct commit_graph_data *data = commit_graph_data_slab_peek(&commit_graph_data_slab, c); - if (!data) - return GENERATION_NUMBER_INFINITY; - else if (data->graph_pos == COMMIT_NOT_FROM_GRAPH) - return GENERATION_NUMBER_INFINITY; + if (data && data->generation) + return data->generation; - return data->generation; + return GENERATION_NUMBER_INFINITY; } static struct commit_graph_data *commit_graph_data_at(const struct commit *c) diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 049c5fc8ead..b6e12115786 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -630,7 +630,7 @@ test_expect_success 'detect incorrect generation number' ' test_expect_success 'detect incorrect generation number' ' corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\01" \ - "non-zero generation number" + "commit-graph generation for commit" ' test_expect_success 'detect incorrect commit date' ' From patchwork Mon Mar 20 11:26:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13181052 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9F5C3C7618D for ; Mon, 20 Mar 2023 11:27:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230162AbjCTL1a (ORCPT ); Mon, 20 Mar 2023 07:27:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59406 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230303AbjCTL12 (ORCPT ); Mon, 20 Mar 2023 07:27:28 -0400 Received: from mail-wm1-x32f.google.com (mail-wm1-x32f.google.com [IPv6:2a00:1450:4864:20::32f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2E2354215 for ; Mon, 20 Mar 2023 04:27:05 -0700 (PDT) Received: by mail-wm1-x32f.google.com with SMTP id i5-20020a05600c354500b003edd24054e0so2172972wmq.4 for ; Mon, 20 Mar 2023 04:27:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679311621; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=43PItVhDxJ3mQhLHon2EEHxJOGOuB59by0/UPK28OWc=; b=SzclkQSm9mv0OHnIQTxz7xW54Rfg/3WCOCRNB4reznMT5oLSv5+cGqyNxLJklJRVyH XOmMluht8mi9CwJrHTK5hnSurXeLWeV/pokW1s4LrNfDWmrjy6/Lh9qPNEhdym/VeVar TsQ5f9oZC5x45adhMWIfqajon+MwSPoi6JBs2FVAd3nOctNr0OA2jJu6/emFCc5OVIQR aKc0VF8qZhLDQ+NpqzDoQCnfwgXh8A2+uWW3gFVS7iZ2nVu++nQyACbXiasZSKkhV1AP up5zX3GLZUNpZX+UzyOJIO/1dqGZZvpJeuJvqSDS+fqpTWGU4+0clHFxKSaG+y3H1ioy bong== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679311621; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=43PItVhDxJ3mQhLHon2EEHxJOGOuB59by0/UPK28OWc=; b=A7CbaO/zSBQFrKQe5lMb0rZHfDyFN52+uPceGTNYj4qNimdJqAQ2ezkyvyOJOVmLRo mmCLm1781D5poCYGPIrsBqMtbUfvv9+xEt32IXIEbDaUICkL1Fd0D9DARFli6ec/si/U QvBkg4xM268RYiZvmuC1u5+pM9mG45SljhS3/+kb/S/9/eFzug/fssIgI6Fd6J+xrayu D9otWTtHQJTxMoQ/VCQSomLwdg4ZVlk+c2M5pkLteNPxaIOvk9YldBdOvSxq+2r8s3wb z7y5brMKzBjbdrSif/N+7uu5fq5YNo1ZX4KPMHnadbPHKWbJ/g4IYWDzO8P/5T3s+oCc 1+8w== X-Gm-Message-State: AO0yUKUjYtXYLzSEsTxtvXGAVmtMbJvXJf4I9WqrwtllHf9HlsncSSff DYlW7N9z9lBbuCNOGUFweAevmmjRaxE= X-Google-Smtp-Source: AK7set9sl9L91TPfj00KV5bPP/+dRbAKXGcKwVV6JWMvydppG/UZEywipxZF0uiPB+4aing9wVnFQw== X-Received: by 2002:a7b:c5c8:0:b0:3ed:8079:27d7 with SMTP id n8-20020a7bc5c8000000b003ed807927d7mr9148922wmk.40.1679311621742; Mon, 20 Mar 2023 04:27:01 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id p13-20020a7bcc8d000000b003ee1acdb036sm75568wma.17.2023.03.20.04.27.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Mar 2023 04:27:01 -0700 (PDT) Message-Id: <17a1fc9b15e27dab71c123876fb6ced535d7d4d8.1679311616.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 20 Mar 2023 11:26:52 +0000 Subject: [PATCH v4 6/9] commit-graph: introduce `ensure_generations_valid()` Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, Jeff King , Phillip Wood , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Jonathan Tan , Derrick Stolee , Taylor Blau Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Taylor Blau From: Taylor Blau Use the just-introduced compute_reachable_generation_numbers_1() to implement a function which dynamically computes topological levels (or corrected commit dates) for out-of-graph commits. This will be useful for the ahead-behind algorithm we are about to introduce, which needs accurate topological levels on _all_ commits reachable from the tips in order to avoid over-counting. Co-authored-by: Derrick Stolee Signed-off-by: Taylor Blau Signed-off-by: Derrick Stolee --- commit-graph.c | 29 +++++++++++++++++++++++++++++ commit-graph.h | 8 ++++++++ 2 files changed, 37 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index 63a56483cf6..172e679db19 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1604,6 +1604,35 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx) stop_progress(&ctx->progress); } +static void set_generation_in_graph_data(struct commit *c, timestamp_t t, + void *data) +{ + commit_graph_data_at(c)->generation = t; +} + +/* + * After this method, all commits reachable from those in the given + * list will have non-zero, non-infinite generation numbers. + */ +void ensure_generations_valid(struct repository *r, + struct commit **commits, size_t nr) +{ + int generation_version = get_configured_generation_version(r); + struct packed_commit_list list = { + .list = commits, + .alloc = nr, + .nr = nr, + }; + struct compute_generation_info info = { + .r = r, + .commits = &list, + .get_generation = get_generation_from_graph_data, + .set_generation = set_generation_in_graph_data, + }; + + compute_reachable_generation_numbers(&info, generation_version); +} + static void trace2_bloom_filter_write_statistics(struct write_commit_graph_context *ctx) { trace2_data_intmax("commit-graph", ctx->r, "filter-computed", diff --git a/commit-graph.h b/commit-graph.h index 37faee6b66d..73e182ab2d0 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -190,4 +190,12 @@ struct commit_graph_data { */ timestamp_t commit_graph_generation(const struct commit *); uint32_t commit_graph_position(const struct commit *); + +/* + * After this method, all commits reachable from those in the given + * list will have non-zero, non-infinite generation numbers. + */ +void ensure_generations_valid(struct repository *r, + struct commit **commits, size_t nr); + #endif From patchwork Mon Mar 20 11:26:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13181053 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2FB26C6FD1D for ; Mon, 20 Mar 2023 11:27:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230251AbjCTL1c (ORCPT ); Mon, 20 Mar 2023 07:27:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59378 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230170AbjCTL11 (ORCPT ); Mon, 20 Mar 2023 07:27:27 -0400 Received: from mail-wm1-x331.google.com (mail-wm1-x331.google.com [IPv6:2a00:1450:4864:20::331]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CC1E66A4C for ; Mon, 20 Mar 2023 04:27:05 -0700 (PDT) Received: by mail-wm1-x331.google.com with SMTP id t17-20020a05600c451100b003edc906aeeaso1382684wmo.1 for ; Mon, 20 Mar 2023 04:27:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679311622; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=voQ7YeXH9eF4Imoj+AIJFG6p42Mxn016kSvitTBIW3s=; b=A8TADtL0eQVPVy9zTv87Yr0jNAJUYs4Hf21fpIvvPJbJMAiJlz39WfdSfBbyFxn2nI 704OAF5G3liq9qFtVbmjsJ3fi2RPS2Rb5Dla3jJgSa+3mc/m9YUhSlbgiKUXzZMhpf8P UTf1Bm15VGz6zPQ/H2Oolr9nQAf+q15A5XZgU/GYGCmR1yghfr38UyDQjcE4QLXplpDO r3r3c0WbYg1ADMLrAymopNICOSqf27l+f5xUtQbcbPLCKpjzvQCtTEEC05wHBE1Id4mV WN8bpTNTsSn2ZO5bxq0aqv0af33MUQ6JY5DJaDMImJVaFgpr2IqIUs3SoLWQF1ZzopmI aoog== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679311622; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=voQ7YeXH9eF4Imoj+AIJFG6p42Mxn016kSvitTBIW3s=; b=spRLoeYpIWx3ldLj8KRTrXYpzrJxz0t0UL3rxAR0MRMHHi8hE+qDYf2ssAahbeqxC2 9mUyCFhymCuXpb3Yfg4qseGkDNOcDHdQRrcQAI4EhM4Lfi/jt1sEqUM24SxU4wdfJPo7 2FUXB5sjB2AHddxY0w6pKsayVc3kioWHi4ez4DCiNtW0kcWAV9F/aGsOLMR4UGeXaKO/ 4NKBSFk+r2VGKxnSN3Wvo0YWEUWBJVPpEjFzCUbW+G+IdFWOqwdxaK4iZZo/GK6tbUSA 78UthQBjqpc7Zu3kEfz9DioNL4AM5JQvpZuREWfDfD2ae0Q68+hpGtB2kXYjaVZU13FQ n/Ew== X-Gm-Message-State: AO0yUKWNBm2BV2W9IeT8XbJ1srUjhHD3i2HyTh7HOwlpD5B7MSyiGnBf ZJYvPNYx7PRHo4eR+iahrK0xN8BjCW0= X-Google-Smtp-Source: AK7set9yVm+AzJ4qa6rNAQf9Iw+Y03DZE57QN3twq8lwWKEs8TN//PgFE1KJFPRo/h4nQF0b/VIERA== X-Received: by 2002:a05:600c:19c9:b0:3ed:31fa:f563 with SMTP id u9-20020a05600c19c900b003ed31faf563mr16141712wmq.20.1679311622474; Mon, 20 Mar 2023 04:27:02 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id p9-20020a05600c468900b003ed5909aab2sm12917774wmo.25.2023.03.20.04.27.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Mar 2023 04:27:02 -0700 (PDT) Message-Id: <5d937184a0eba9176d97423fb450850fc482e4de.1679311616.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 20 Mar 2023 11:26:53 +0000 Subject: [PATCH v4 7/9] commit-reach: implement ahead_behind() logic Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, Jeff King , Phillip Wood , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Jonathan Tan , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Fully implement the commit-counting logic required to determine ahead/behind counts for a batch of commit pairs. This is a new library method within commit-reach.h. This method will be linked to the for-each-ref builtin in the next change. The interface for ahead_behind() uses two arrays. The first array of commits contains the list of all starting points for the walk. This includes all tip commits _and_ base commits. The second array specifies base/tip pairs by pointing to commits within the first array, by index. The second array also stores the resulting ahead/behind counts for each of these pairs. This implementation of ahead_behind() allows multiple bases, if desired. Even with multiple bases, there is only one commit walk used for counting the ahead/behind values, saving time when the base/tip ranges overlap significantly. This interface for ahead_behind() also makes it very easy to call ensure_generations_valid() on the entire array of bases and tips. This call is necessary because it is critical that the walk that counts ahead/behind values never walks a commit more than once. Without generation numbers on every commit, there is a possibility that a commit date skew could cause the walk to revisit a commit and then double-count it. For this reason, it is strongly recommended that 'git ahead-behind' is only run in a repository with a commit-graph file that covers most of the reachable commits, storing precomputed generation numbers. If no commit-graph exists, this walk will be much slower as it must walk all reachable commits in ensure_generations_valid() before performing the counting logic. It is possible to detect if generation numbers are available at run time and redirect the implementation to another algorithm that does not require this property. However, that implementation requires a commit walk per base/tip pair _and_ can be slower due to the commit date heuristics required. Such an implementation could be considered in the future if there is a reason to include it, but most Git hosts should already be generating a commit-graph file as part of repository maintenance. Most Git clients should also be generating commit-graph files as part of background maintenance or automatic GCs. Now, let's discuss the ahead/behind counting algorithm. The first array of commits are considered the starting commits. The index within that array will play a critical role. We create a new commit slab that maps commits to a bitmap. For a given commit (anywhere in the history), its bitmap stores information relative to which of the input commits can reach that commit. The ith bit will be on if the ith commit from the starting list can reach that commit. It is important to notice that these bitmaps are not the typical "reachability bitmaps" that are stored in .bitmap files. Instead of signalling which objects are reachable from the current commit, they instead signal "which starting commits can reach me?" It is also important to know that the bitmap is not necessarily "complete" until we walk that commit. We will perform a commit walk by generation number in such a way that we can guarantee the bitmap is correct when we visit that commit. At the beginning of the ahead_behind() method, we initialize the bitmaps for each of the starting commits. By enabling the ith bit for the ith starting commit, we signal "the ith commit can reach itself." We walk commits by popping the commit with maximum generation number out of the queue, guaranteeing that we will never walk a child of that commit in any future steps. As we walk, we load the bitmap for the current commit and perform two main steps. The _second_ step examines each parent of the current commit and adds the current commit's bitmap bits to each parent's bitmap. (We create a new bitmap for the parent if this is our first time seeing that parent.) After adding the bits to the parent's bitmap, the parent is added to the walk queue. Due to this passing of bits to parents, the current commit has a guarantee that the ith bit is enabled on its bitmap if and only if the ith commit can reach the current commit. The first step of the walk is to examine the bitmask on the current commit and decide which ranges the commit is in or not. Due to the "bit pushing" in the second step, we have a guarantee that the ith bit of the current commit's bitmap is on if and only if the ith starting commit can reach it. For each ahead_behind_count struct, check the base_index and tip_index to see if those bits are enabled on the current bitmap. If exactly one bit is enabled, then increment the corresponding 'ahead' or 'behind' count. This increment is the reason we _absolutely need_ to walk commits at most once. The only subtle thing to do with this walk is to check to see if a parent has all bits on in its bitmap, in which case it becomes "stale" and is marked with the STALE bit. This allows queue_has_nonstale() to be the terminating condition of the walk, which greatly reduces the number of commits walked if all of the commits are nearby in history. It avoids walking a large number of common commits when there is a deep history. We also use the helper method insert_no_dup() to add commits to the priority queue without adding them multiple times. This uses the PARENT2 flag. Thus, we must clear both the STALE and PARENT2 bits of all commits, in case ahead_behind() is called multiple times in the same process. Co-authored-by: Taylor Blau Signed-off-by: Taylor Blau Signed-off-by: Derrick Stolee --- commit-reach.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++++ commit-reach.h | 31 +++++++++++++++ 2 files changed, 134 insertions(+) diff --git a/commit-reach.c b/commit-reach.c index 2e33c599a82..cd990dce16a 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -8,6 +8,7 @@ #include "revision.h" #include "tag.h" #include "commit-reach.h" +#include "ewah/ewok.h" /* Remember to update object flag allocation in object.h */ #define PARENT1 (1u<<16) @@ -941,3 +942,105 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, return found_commits; } + +define_commit_slab(bit_arrays, struct bitmap *); +static struct bit_arrays bit_arrays; + +static void insert_no_dup(struct prio_queue *queue, struct commit *c) +{ + if (c->object.flags & PARENT2) + return; + prio_queue_put(queue, c); + c->object.flags |= PARENT2; +} + +static struct bitmap *get_bit_array(struct commit *c, int width) +{ + struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c); + if (!*bitmap) + *bitmap = bitmap_word_alloc(width); + return *bitmap; +} + +static void free_bit_array(struct commit *c) +{ + struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c); + if (!*bitmap) + return; + bitmap_free(*bitmap); + *bitmap = NULL; +} + +void ahead_behind(struct repository *r, + struct commit **commits, size_t commits_nr, + struct ahead_behind_count *counts, size_t counts_nr) +{ + struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date }; + size_t width = DIV_ROUND_UP(commits_nr, BITS_IN_EWORD); + + if (!commits_nr || !counts_nr) + return; + + for (size_t i = 0; i < counts_nr; i++) { + counts[i].ahead = 0; + counts[i].behind = 0; + } + + ensure_generations_valid(r, commits, commits_nr); + + init_bit_arrays(&bit_arrays); + + for (size_t i = 0; i < commits_nr; i++) { + struct commit *c = commits[i]; + struct bitmap *bitmap = get_bit_array(c, width); + + bitmap_set(bitmap, i); + insert_no_dup(&queue, c); + } + + while (queue_has_nonstale(&queue)) { + struct commit *c = prio_queue_get(&queue); + struct commit_list *p; + struct bitmap *bitmap_c = get_bit_array(c, width); + + for (size_t i = 0; i < counts_nr; i++) { + int reach_from_tip = !!bitmap_get(bitmap_c, counts[i].tip_index); + int reach_from_base = !!bitmap_get(bitmap_c, counts[i].base_index); + + if (reach_from_tip ^ reach_from_base) { + if (reach_from_base) + counts[i].behind++; + else + counts[i].ahead++; + } + } + + for (p = c->parents; p; p = p->next) { + struct bitmap *bitmap_p; + + repo_parse_commit(r, p->item); + + bitmap_p = get_bit_array(p->item, width); + bitmap_or(bitmap_p, bitmap_c); + + /* + * If this parent is reachable from every starting + * commit, then none of its ancestors can contribute + * to the ahead/behind count. Mark it as STALE, so + * we can stop the walk when every commit in the + * queue is STALE. + */ + if (bitmap_popcount(bitmap_p) == commits_nr) + p->item->object.flags |= STALE; + + insert_no_dup(&queue, p->item); + } + + free_bit_array(c); + } + + /* STALE is used here, PARENT2 is used by insert_no_dup(). */ + repo_clear_commit_marks(r, PARENT2 | STALE); + clear_bit_arrays(&bit_arrays); + clear_prio_queue(&queue); +} diff --git a/commit-reach.h b/commit-reach.h index 148b56fea50..f708c46e523 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -104,4 +104,35 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, struct commit **to, int nr_to, unsigned int reachable_flag); +struct ahead_behind_count { + /** + * As input, the *_index members indicate which positions in + * the 'tips' array correspond to the tip and base of this + * comparison. + */ + size_t tip_index; + size_t base_index; + + /** + * These values store the computed counts for each side of the + * symmetric difference: + * + * 'ahead' stores the number of commits reachable from the tip + * and not reachable from the base. + * + * 'behind' stores the number of commits reachable from the base + * and not reachable from the tip. + */ + unsigned int ahead; + unsigned int behind; +}; + +/* + * Given an array of commits and an array of ahead_behind_count pairs, + * compute the ahead/behind counts for each pair. + */ +void ahead_behind(struct repository *r, + struct commit **commits, size_t commits_nr, + struct ahead_behind_count *counts, size_t counts_nr); + #endif From patchwork Mon Mar 20 11:26:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13181055 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DFE35C7618A for ; Mon, 20 Mar 2023 11:27:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230527AbjCTL1i (ORCPT ); Mon, 20 Mar 2023 07:27:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59292 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230060AbjCTL12 (ORCPT ); Mon, 20 Mar 2023 07:27:28 -0400 Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com [IPv6:2a00:1450:4864:20::334]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DA94A3A8C for ; Mon, 20 Mar 2023 04:27:06 -0700 (PDT) Received: by mail-wm1-x334.google.com with SMTP id i10-20020a05600c354a00b003ee0da1132eso665167wmq.4 for ; Mon, 20 Mar 2023 04:27:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679311623; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=/24v3yocMGxJmoI88bw0WqppXDMMlpa5w8iATzLQke0=; b=kFNBbFHpq1nHdRlVSkLW64YcLxEBxgoOJfd79GON6vjF3L3w95BHOWA/H2HLcXGicU C2ujNYTQx7d3s76iGoYz6TD4pwPwnvMZRe/z/3+QnS/OW9fvkm4nYsu5uSwOYOkzUInN /D15GKD2frG1RDFCYOsioekxwjSceiNaXaIqkKP1gLzwXLJLkQcffmskbkR20Z78mCjJ EZc0CI7mPvsf0cBibpuaBxWZTeVDFyGf7cSTWoWc3YhreGYULsyvXd9Rmm07McVwt+hK oi3FyhCUI8rb3LaeBi+JhjWyMrbF5wJ5ocPuWINTYQU8h+bZ4joyMWwACx5codISc0jJ geEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679311623; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/24v3yocMGxJmoI88bw0WqppXDMMlpa5w8iATzLQke0=; b=EgYUEmTbaukGtGjm07AOgM/cdVN0/bVDMYpl9ccXbWihrEMfcH5/S6kybmlqsSyTmW herO1YsN8rbqh1GMtHuMWnpxuxUPa3MjTm/B1PM7RDCJgXv0OKJmQrirJr3MD6N9kOEE EOcROd/axkEnMPaKYGstN1yRXnaT2DRfBZH6+O38kWlW/kXDb/WgfZWQ3Kqh0i+erme4 bnKMvEPxcdjH4jhcdGyL2Xual6uhPoMIr9sDjyD76tdhvK9DKGd1/Av59rNJWrrSSjVb XH4aUUoL05ERAKp8vKVUEGVbKIYLDwg3swOjIbrIvlrVqKbeCpEKD4N7FDYdGBtm0vc+ 2Nbg== X-Gm-Message-State: AO0yUKXzu4aVqmvtY7LLEAe3RkqPUhEXUySNpQaUUbmyOyklZBpYiCrl 83NamaH2dTy7bl0EIc3CRbHdfvGtPYU= X-Google-Smtp-Source: AK7set+RtlLYW7azOdMXhU3LqAl61OYgL3NE8ON40+64fbYIN7HKHLwfCe0cnK+TS3RVc4gbN5d2Dg== X-Received: by 2002:a05:600c:4f01:b0:3ed:31cf:fe6e with SMTP id l1-20020a05600c4f0100b003ed31cffe6emr15753588wmq.41.1679311623174; Mon, 20 Mar 2023 04:27:03 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id q14-20020a05600000ce00b002be505ab59asm8631810wrx.97.2023.03.20.04.27.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Mar 2023 04:27:02 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Mon, 20 Mar 2023 11:26:54 +0000 Subject: [PATCH v4 8/9] for-each-ref: add ahead-behind format atom Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, Jeff King , Phillip Wood , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Jonathan Tan , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The previous change implemented the ahead_behind() method, including an algorithm to compute the ahead/behind values for a number of commit tips relative to a number of commit bases. Now, integrate that algorithm as part of 'git for-each-ref' hidden behind a new format atom, ahead-behind. This naturally extends to 'git branch' and 'git tag' builtins, as well. This format allows specifying multiple bases, if so desired, and all matching references are compared against all of those bases. For this reason, failing to read a reference provided from these atoms results in an error. In order to translate the ahead_behind() method information to the format output code in ref-filter.c, we must populate arrays of ahead_behind_count structs. In struct ref_array, we store the full array that will be passed to ahead_behind(). In struct ref_array_item, we store an array of pointers that point to the relvant items within the full array. In this way, we can pull all relevant ahead/behind values directly when formatting output for a specific item. It also ensures the lifetime of the ahead_behind_count structs matches the time that the array is being used. Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as it has an interesting repository shape. In particular, its merging strategy and its use of different commit-graphs would demonstrate over- counting if the ahead_behind() method did not already account for that possibility. Also add tests for the specific for-each-ref, branch, and tag builtins. In the case of 'git tag', there are intersting cases that happen when some of the selected tips are not commits. This requires careful logic around commits_nr in the second loop of filter_ahead_behind(). Also, the test in t7004 is carefully located to avoid being dependent on the GPG prereq. It also avoids using the test_commit helper, as that will add ticks to the time and disrupt the expected timestamps in later tag tests. Also add performance tests in a new p1300-graph-walks.sh script. This will be useful for more uses in the future, but for now compare the ahead-behind counting algorithm in 'git for-each-ref' to the naive implementation by running 'git rev-list --count' processes for each input. For the Git source code repository, the improvement is already obvious: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.07(0.07+0.00) 1500.3: ahead-behind counts: git branch 0.07(0.06+0.00) 1500.4: ahead-behind counts: git tag 0.07(0.06+0.00) 1500.5: ahead-behind counts: git rev-list 1.32(1.04+0.27) But the standard performance benchmark is the Linux kernel repository, which demosntrates a significant improvement: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.27(0.24+0.02) 1500.3: ahead-behind counts: git branch 0.27(0.24+0.03) 1500.4: ahead-behind counts: git tag 0.28(0.27+0.01) 1500.5: ahead-behind counts: git rev-list 4.57(4.03+0.54) The 'git rev-list' test exists in this change as a demonstration, but it will be removed in the next change to avoid wasting time on this comparison. Signed-off-by: Derrick Stolee --- Documentation/git-for-each-ref.txt | 5 ++ builtin/branch.c | 1 + builtin/for-each-ref.c | 3 ++ builtin/tag.c | 1 + ref-filter.c | 73 +++++++++++++++++++++++++ ref-filter.h | 26 ++++++++- t/perf/p1500-graph-walks.sh | 45 ++++++++++++++++ t/t3203-branch-output.sh | 14 +++++ t/t6301-for-each-ref-errors.sh | 14 +++++ t/t6600-test-reach.sh | 86 ++++++++++++++++++++++++++++++ t/t7004-tag.sh | 28 ++++++++++ 11 files changed, 295 insertions(+), 1 deletion(-) create mode 100755 t/perf/p1500-graph-walks.sh diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt index ccdc2911bb9..0713e49b499 100644 --- a/Documentation/git-for-each-ref.txt +++ b/Documentation/git-for-each-ref.txt @@ -222,6 +222,11 @@ worktreepath:: out, if it is checked out in any linked worktree. Empty string otherwise. +ahead-behind::: + Two integers, separated by a space, demonstrating the number of + commits ahead and behind, respectively, when comparing the output + ref to the `` specified in the format. + In addition to the above, for commit and tag objects, the header field names (`tree`, `parent`, `object`, `type`, and `tag`) can be used to specify the value in the header field. diff --git a/builtin/branch.c b/builtin/branch.c index f63fd45edb9..0554d7cebb3 100644 --- a/builtin/branch.c +++ b/builtin/branch.c @@ -448,6 +448,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin if (verify_ref_format(format)) die(_("unable to parse format string")); + filter_ahead_behind(the_repository, format, &array); ref_array_sort(sorting, &array); for (i = 0; i < array.nr; i++) { diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c index 9df16cfb854..6b3d07ef409 100644 --- a/builtin/for-each-ref.c +++ b/builtin/for-each-ref.c @@ -6,6 +6,7 @@ #include "parse-options.h" #include "ref-filter.h" #include "strvec.h" +#include "commit-reach.h" static char const * const for_each_ref_usage[] = { N_("git for-each-ref [] []"), @@ -98,6 +99,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix) filter.match_as_path = 1; filter_refs(&array, &filter, FILTER_REFS_ALL); + filter_ahead_behind(the_repository, &format, &array); + ref_array_sort(sorting, &array); if (!maxcount || array.nr < maxcount) diff --git a/builtin/tag.c b/builtin/tag.c index d428c45dc8d..1b3f49d7b4c 100644 --- a/builtin/tag.c +++ b/builtin/tag.c @@ -66,6 +66,7 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting, die(_("unable to parse format string")); filter->with_commit_tag_algo = 1; filter_refs(&array, filter, FILTER_REFS_TAGS); + filter_ahead_behind(the_repository, format, &array); ref_array_sort(sorting, &array); for (i = 0; i < array.nr; i++) { diff --git a/ref-filter.c b/ref-filter.c index f8203c6b052..62135f649ec 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -158,6 +158,7 @@ enum atom_type { ATOM_THEN, ATOM_ELSE, ATOM_REST, + ATOM_AHEADBEHIND, }; /* @@ -586,6 +587,22 @@ static int rest_atom_parser(struct ref_format *format, struct used_atom *atom, return 0; } +static int ahead_behind_atom_parser(struct ref_format *format, struct used_atom *atom, + const char *arg, struct strbuf *err) +{ + struct string_list_item *item; + + if (!arg) + return strbuf_addf_ret(err, -1, _("expected format: %%(ahead-behind:)")); + + item = string_list_append(&format->bases, arg); + item->util = lookup_commit_reference_by_name(arg); + if (!item->util) + die("failed to find '%s'", arg); + + return 0; +} + static int head_atom_parser(struct ref_format *format, struct used_atom *atom, const char *arg, struct strbuf *err) { @@ -645,6 +662,7 @@ static struct { [ATOM_THEN] = { "then", SOURCE_NONE }, [ATOM_ELSE] = { "else", SOURCE_NONE }, [ATOM_REST] = { "rest", SOURCE_NONE, FIELD_STR, rest_atom_parser }, + [ATOM_AHEADBEHIND] = { "ahead-behind", SOURCE_OTHER, FIELD_STR, ahead_behind_atom_parser }, /* * Please update $__git_ref_fieldlist in git-completion.bash * when you add new atoms @@ -1848,6 +1866,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err) struct object *obj; int i; struct object_info empty = OBJECT_INFO_INIT; + int ahead_behind_atoms = 0; CALLOC_ARRAY(ref->value, used_atom_cnt); @@ -1978,6 +1997,16 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err) else v->s = xstrdup(""); continue; + } else if (atom_type == ATOM_AHEADBEHIND) { + if (ref->counts) { + const struct ahead_behind_count *count; + count = ref->counts[ahead_behind_atoms++]; + v->s = xstrfmt("%d %d", count->ahead, count->behind); + } else { + /* Not a commit. */ + v->s = xstrdup(""); + } + continue; } else continue; @@ -2328,6 +2357,7 @@ static void free_array_item(struct ref_array_item *item) free((char *)item->value[i].s); free(item->value); } + free(item->counts); free(item); } @@ -2356,6 +2386,8 @@ void ref_array_clear(struct ref_array *array) free_worktrees(ref_to_worktree_map.worktrees); ref_to_worktree_map.worktrees = NULL; } + + FREE_AND_NULL(array->counts); } #define EXCLUDE_REACHED 0 @@ -2418,6 +2450,47 @@ static void reach_filter(struct ref_array *array, free(to_clear); } +void filter_ahead_behind(struct repository *r, + struct ref_format *format, + struct ref_array *array) +{ + struct commit **commits; + size_t commits_nr = format->bases.nr + array->nr; + + if (!format->bases.nr || !array->nr) + return; + + ALLOC_ARRAY(commits, commits_nr); + for (size_t i = 0; i < format->bases.nr; i++) + commits[i] = format->bases.items[i].util; + + ALLOC_ARRAY(array->counts, st_mult(format->bases.nr, array->nr)); + + commits_nr = format->bases.nr; + array->counts_nr = 0; + for (size_t i = 0; i < array->nr; i++) { + const char *name = array->items[i]->refname; + commits[commits_nr] = lookup_commit_reference_by_name(name); + + if (!commits[commits_nr]) + continue; + + CALLOC_ARRAY(array->items[i]->counts, format->bases.nr); + for (size_t j = 0; j < format->bases.nr; j++) { + struct ahead_behind_count *count; + count = &array->counts[array->counts_nr++]; + count->tip_index = commits_nr; + count->base_index = j; + + array->items[i]->counts[j] = count; + } + commits_nr++; + } + + ahead_behind(r, commits, commits_nr, array->counts, array->counts_nr); + free(commits); +} + /* * API for filtering a set of refs. Based on the type of refs the user * has requested, we iterate through those refs and apply filters diff --git a/ref-filter.h b/ref-filter.h index aa0eea4ecf5..c9a11495177 100644 --- a/ref-filter.h +++ b/ref-filter.h @@ -5,6 +5,7 @@ #include "refs.h" #include "commit.h" #include "parse-options.h" +#include "string-list.h" /* Quoting styles */ #define QUOTE_NONE 0 @@ -24,6 +25,7 @@ struct atom_value; struct ref_sorting; +struct ahead_behind_count; enum ref_sorting_order { REF_SORTING_REVERSE = 1<<0, @@ -40,6 +42,8 @@ struct ref_array_item { const char *symref; struct commit *commit; struct atom_value *value; + struct ahead_behind_count **counts; + char refname[FLEX_ARRAY]; }; @@ -47,6 +51,9 @@ struct ref_array { int nr, alloc; struct ref_array_item **items; struct rev_info *revs; + + struct ahead_behind_count *counts; + size_t counts_nr; }; struct ref_filter { @@ -80,9 +87,15 @@ struct ref_format { /* Internal state to ref-filter */ int need_color_reset_at_eol; + + /* List of bases for ahead-behind counts. */ + struct string_list bases; }; -#define REF_FORMAT_INIT { .use_color = -1 } +#define REF_FORMAT_INIT { \ + .use_color = -1, \ + .bases = STRING_LIST_INIT_DUP, \ +} /* Macros for checking --merged and --no-merged options */ #define _OPT_MERGED_NO_MERGED(option, filter, h) \ @@ -143,4 +156,15 @@ struct ref_array_item *ref_array_push(struct ref_array *array, const char *refname, const struct object_id *oid); +/* + * If the provided format includes ahead-behind atoms, then compute the + * ahead-behind values for the array of filtered references. Must be + * called after filter_refs() but before outputting the formatted refs. + * + * If this is not called, then any ahead-behind atoms will be blank. + */ +void filter_ahead_behind(struct repository *r, + struct ref_format *format, + struct ref_array *array); + #endif /* REF_FILTER_H */ diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh new file mode 100755 index 00000000000..439a448c2e6 --- /dev/null +++ b/t/perf/p1500-graph-walks.sh @@ -0,0 +1,45 @@ +#!/bin/sh + +test_description='Commit walk performance tests' +. ./perf-lib.sh + +test_perf_large_repo + +test_expect_success 'setup' ' + git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs && + sort -r allrefs | head -n 50 >refs && + for ref in $(cat refs) + do + git branch -f ref-$ref $ref && + echo ref-$ref || + return 1 + done >branches && + for ref in $(cat refs) + do + git tag -f tag-$ref $ref && + echo tag-$ref || + return 1 + done >tags && + git commit-graph write --reachable +' + +test_perf 'ahead-behind counts: git for-each-ref' ' + git for-each-ref --format="%(ahead-behind:HEAD)" --stdin expect <<-\EOF && + (HEAD detached from fromtag) 0 0 + refs/heads/ambiguous 0 0 + refs/heads/branch-one 1 0 + refs/heads/branch-two 0 0 + refs/heads/main 1 0 + refs/heads/ref-to-branch 1 0 + refs/heads/ref-to-remote 1 0 + EOF + git branch --format="%(refname) %(ahead-behind:HEAD)" >actual && + test_cmp expect actual +' + test_expect_success 'git branch with --format=%(rest) must fail' ' test_must_fail git branch --format="%(rest)" >actual ' diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh index bfda1f46ad2..2667dd13fe3 100755 --- a/t/t6301-for-each-ref-errors.sh +++ b/t/t6301-for-each-ref-errors.sh @@ -54,4 +54,18 @@ test_expect_success 'Missing objects are reported correctly' ' test_must_be_empty brief-err ' +test_expect_success 'ahead-behind requires an argument' ' + test_must_fail git for-each-ref \ + --format="%(ahead-behind)" 2>err && + echo "fatal: expected format: %(ahead-behind:)" >expect && + test_cmp expect err +' + +test_expect_success 'missing ahead-behind base' ' + test_must_fail git for-each-ref \ + --format="%(ahead-behind:refs/heads/missing)" 2>err && + echo "fatal: failed to find '\''refs/heads/missing'\''" >expect && + test_cmp expect err +' + test_done diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index 338a9c46a24..0cb50797ef7 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -443,4 +443,90 @@ test_expect_success 'get_reachable_subset:none' ' test_all_modes get_reachable_subset ' +test_expect_success 'for-each-ref ahead-behind:linear' ' + cat >input <<-\EOF && + refs/heads/commit-1-1 + refs/heads/commit-1-3 + refs/heads/commit-1-5 + refs/heads/commit-1-8 + EOF + cat >expect <<-\EOF && + refs/heads/commit-1-1 0 8 + refs/heads/commit-1-3 0 6 + refs/heads/commit-1-5 0 4 + refs/heads/commit-1-8 0 1 + EOF + run_all_modes git for-each-ref \ + --format="%(refname) %(ahead-behind:commit-1-9)" --stdin +' + +test_expect_success 'for-each-ref ahead-behind:all' ' + cat >input <<-\EOF && + refs/heads/commit-1-1 + refs/heads/commit-2-4 + refs/heads/commit-4-2 + refs/heads/commit-4-4 + EOF + cat >expect <<-\EOF && + refs/heads/commit-1-1 0 24 + refs/heads/commit-2-4 0 17 + refs/heads/commit-4-2 0 17 + refs/heads/commit-4-4 0 9 + EOF + run_all_modes git for-each-ref \ + --format="%(refname) %(ahead-behind:commit-5-5)" --stdin +' + +test_expect_success 'for-each-ref ahead-behind:some' ' + cat >input <<-\EOF && + refs/heads/commit-1-1 + refs/heads/commit-5-3 + refs/heads/commit-4-8 + refs/heads/commit-9-9 + EOF + cat >expect <<-\EOF && + refs/heads/commit-1-1 0 53 + refs/heads/commit-4-8 8 30 + refs/heads/commit-5-3 0 39 + refs/heads/commit-9-9 27 0 + EOF + run_all_modes git for-each-ref \ + --format="%(refname) %(ahead-behind:commit-9-6)" --stdin +' + +test_expect_success 'for-each-ref ahead-behind:some, multibase' ' + cat >input <<-\EOF && + refs/heads/commit-1-1 + refs/heads/commit-5-3 + refs/heads/commit-7-8 + refs/heads/commit-4-8 + refs/heads/commit-9-9 + EOF + cat >expect <<-\EOF && + refs/heads/commit-1-1 0 53 0 53 + refs/heads/commit-4-8 8 30 0 22 + refs/heads/commit-5-3 0 39 0 39 + refs/heads/commit-7-8 14 12 8 6 + refs/heads/commit-9-9 27 0 27 0 + EOF + run_all_modes git for-each-ref \ + --format="%(refname) %(ahead-behind:commit-9-6) %(ahead-behind:commit-6-9)" \ + --stdin +' + +test_expect_success 'for-each-ref ahead-behind:none' ' + cat >input <<-\EOF && + refs/heads/commit-7-5 + refs/heads/commit-4-8 + refs/heads/commit-9-9 + EOF + cat >expect <<-\EOF && + refs/heads/commit-4-8 16 16 + refs/heads/commit-7-5 7 4 + refs/heads/commit-9-9 49 0 + EOF + run_all_modes git for-each-ref \ + --format="%(refname) %(ahead-behind:commit-8-4)" --stdin +' + test_done diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh index 9aa1660651b..04a4b44183d 100755 --- a/t/t7004-tag.sh +++ b/t/t7004-tag.sh @@ -792,6 +792,34 @@ test_expect_success 'annotations for blobs are empty' ' test_cmp expect actual ' +# Run this before doing any signing, so the test has the same results +# regardless of the GPG prereq. +test_expect_success 'git tag --format with ahead-behind' ' + test_when_finished git reset --hard tag-one-line && + git commit --allow-empty -m "left" && + git tag -a -m left tag-left && + git reset --hard HEAD~1 && + git commit --allow-empty -m "right" && + git tag -a -m left tag-right && + + # Use " !" at the end to demonstrate whitespace + # around empty ahead-behind token for tag-blob. + cat >expect <<-EOF && + refs/tags/tag-blob ! + refs/tags/tag-left 1 1 ! + refs/tags/tag-lines 0 1 ! + refs/tags/tag-one-line 0 1 ! + refs/tags/tag-right 0 0 ! + refs/tags/tag-zero-lines 0 1 ! + EOF + git tag -l --format="%(refname) %(ahead-behind:HEAD) !" >actual 2>err && + grep "refs/tags/tag" actual >actual.focus && + test_cmp expect actual.focus && + + # Error reported for tags that point to non-commits. + grep "error: object [0-9a-f]* is a blob, not a commit" err +' + # trying to verify annotated non-signed tags: test_expect_success GPG \ From patchwork Mon Mar 20 11:26:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13181054 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ED617C7618D for ; Mon, 20 Mar 2023 11:27:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231243AbjCTL1g (ORCPT ); Mon, 20 Mar 2023 07:27:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59414 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230348AbjCTL12 (ORCPT ); Mon, 20 Mar 2023 07:27:28 -0400 Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2F03183CF for ; Mon, 20 Mar 2023 04:27:06 -0700 (PDT) Received: by mail-wr1-x42e.google.com with SMTP id o7so9984431wrg.5 for ; Mon, 20 Mar 2023 04:27:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679311624; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=BpJj3cxaboiEVcMTdxd1AhVtKxptQqASnjV5ErAmi28=; b=QqgiXlaIgw8OMzAqM7+r7Q6WYmCKYOYt2cRl5dmWNsCGYpdsBrK9G1TJzDW5U+d2BU StojrUhOgBWi7oLc1ZrwGKal7SjUvSMnvzIvBIEgzPzUd8L/dwmFuWfXQscAe/quyq1b BdiVWL9Lh5waT+hsf7P8T1sfFCfGh6PLdxS8Bw4edGBT8r9+jcvM4XFQqGvr/+lL56Ys qZprFRRuRUnCYHiTqBX58Rzkz3gs2e6Tfjnp1h5UhdKWsQGpd1v5N5ZN1z5AMaNDxXUg yIgYtS7ZS4F1TtjE9uJzqCqwOTJq2xyPpU72XDBR8psR6K1bihgBTTY73pXdES9rAS3p chOg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679311624; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=BpJj3cxaboiEVcMTdxd1AhVtKxptQqASnjV5ErAmi28=; b=MPP4/CwhBKfBM5WpfsC6cMz8EDK9/eHfTFqps2ixY4X/YmcISjdoMSGI08oFPHFntB JK4SKNoqOuhz5FZOFm1Z7X46xp1kkGF4xQzj6yM9yal/rEh/7TFaLgM7ukmXktHtIB3F WCp3AxwlLtTIPm+pv8Fv1lLyjiBGUsOfLNa6L489O4+O6bCHbDgd48fGa3Sdkp8Tjbrb 1liPxsz7K1iPC5U3OByB4JfwVPWeEBfuyRm7poDS6w5jXD0ZGWP5tcSNHc80gEXQ4SUT 6mOD9WgHlHSzgF/M1W3ep1Intoy/fSPUnBhHGAjDx7XUqOWMAtZYnv8wfkVrlCDAhSxU 7cWQ== X-Gm-Message-State: AO0yUKU9ggOT229w1gU2jHeEIU3b3rGZQrp4wA50V7zWp/PMpSrGWFgS sT5ckryY6QSzvTMzfZNcvXHRssNPgvQ= X-Google-Smtp-Source: AK7set9n8+cc9NN9w7A9sye5ZnWl+cGz0i8n7e9fmOXxm3WsVQ2rxzuPslvAlYkG7N+vv7+b8GZb3g== X-Received: by 2002:a5d:5405:0:b0:2c5:5687:5ed5 with SMTP id g5-20020a5d5405000000b002c556875ed5mr13279119wrv.18.1679311623830; Mon, 20 Mar 2023 04:27:03 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id m6-20020a056000008600b002cde25fba30sm8708560wrx.1.2023.03.20.04.27.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Mar 2023 04:27:03 -0700 (PDT) Message-Id: <87fe9676aec5224d00b786515c822975b2fce9b6.1679311616.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 20 Mar 2023 11:26:55 +0000 Subject: [PATCH v4 9/9] commit-reach: add tips_reachable_from_bases() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, Jeff King , Phillip Wood , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Jonathan Tan , Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Both 'git for-each-ref --merged=' and 'git branch --merged=' use the ref-filter machinery to select references or branches (respectively) that are reachable from a set of commits presented by one or more --merged arguments. This happens within reach_filter(), which uses the revision-walk machinery to walk history in a standard way. However, the commit-reach.c file is full of custom searches that are more efficient, especially for reachability queries that can terminate early when reachability is discovered. Add a new tips_reachable_from_bases() method to commit-reach.c and call it from within reach_filter() in ref-filter.c. This affects both 'git branch' and 'git for-each-ref' as tested in p1500-graph-walks.sh. For the Linux kernel repository, we take an already-fast algorithm and make it even faster: Test HEAD~1 HEAD ------------------------------------------------------------------- 1500.5: contains: git for-each-ref --merged 0.13 0.02 -84.6% 1500.6: contains: git branch --merged 0.14 0.02 -85.7% 1500.7: contains: git tag --merged 0.15 0.03 -80.0% (Note that we remove the iterative 'git rev-list' test from p1500 because it no longer makes sense as a comparison to 'git for-each-ref' and would just waste time running it for these comparisons.) The algorithm is implemented in commit-reach.c in the method tips_reachable_from_base(). This method takes a string_list of tips and assigns the 'util' for each item with the value 1 if the base commit can reach those tips. Like other reachability queries in commit-reach.c, the fastest way to search for "can A reach B?" is to do a depth-first search up to the generation number of B, preferring to explore first parents before later parents. While we must walk all reachable commits up to that generation number when the answer is "no", the depth-first search can answer "yes" much faster than other approaches in most cases. This search becomes trickier when there are multiple targets for the depth-first search. The commits with lower generation number are more likely to be within the history of the start commit, but we don't want to waste time searching commits of low generation number if the commit target with lowest generation number has already been found. The trick here is to take the input commits and sort them by generation number in ascending order. Track the index within this order as min_generation_index. When we find a commit, if its index in the list is equal to min_generation_index, then we can increase the generation number boundary of our search to the next-lowest value in the list. With this mechanism, the number of commits to search is minimized with respect to the depth-first search heuristic. We will walk all commits up to the minimum generation number of a commit that is _not_ reachable from the start, but we will walk only the necessary portion of the depth-first search for the reachable commits of lower generation. Add extra tests for this behavior in t6600-test-reach.sh as the interesting data shape of that repository can sometimes demonstrate corner case bugs. Signed-off-by: Derrick Stolee --- commit-reach.c | 113 ++++++++++++++++++++++++++++++++++++ commit-reach.h | 9 +++ ref-filter.c | 20 ++----- t/perf/p1500-graph-walks.sh | 15 +++-- t/t6600-test-reach.sh | 83 ++++++++++++++++++++++++++ 5 files changed, 219 insertions(+), 21 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index cd990dce16a..c1edeb46106 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -1044,3 +1044,116 @@ void ahead_behind(struct repository *r, clear_bit_arrays(&bit_arrays); clear_prio_queue(&queue); } + +struct commit_and_index { + struct commit *commit; + unsigned int index; + timestamp_t generation; +}; + +static int compare_commit_and_index_by_generation(const void *va, const void *vb) +{ + const struct commit_and_index *a = (const struct commit_and_index *)va; + const struct commit_and_index *b = (const struct commit_and_index *)vb; + + if (a->generation > b->generation) + return 1; + if (a->generation < b->generation) + return -1; + return 0; +} + +void tips_reachable_from_bases(struct repository *r, + struct commit_list *bases, + struct commit **tips, size_t tips_nr, + int mark) +{ + struct commit_and_index *commits; + size_t min_generation_index = 0; + timestamp_t min_generation; + struct commit_list *stack = NULL; + + if (!bases || !tips || !tips_nr) + return; + + /* + * Do a depth-first search starting at 'bases' to search for the + * tips. Stop at the lowest (un-found) generation number. When + * finding the lowest commit, increase the minimum generation + * number to the next lowest (un-found) generation number. + */ + + CALLOC_ARRAY(commits, tips_nr); + + for (size_t i = 0; i < tips_nr; i++) { + commits[i].commit = tips[i]; + commits[i].index = i; + commits[i].generation = commit_graph_generation(tips[i]); + } + + /* Sort with generation number ascending. */ + QSORT(commits, tips_nr, compare_commit_and_index_by_generation); + min_generation = commits[0].generation; + + while (bases) { + repo_parse_commit(r, bases->item); + commit_list_insert(bases->item, &stack); + bases = bases->next; + } + + while (stack) { + int explored_all_parents = 1; + struct commit_list *p; + struct commit *c = stack->item; + timestamp_t c_gen = commit_graph_generation(c); + + /* Does it match any of our tips? */ + for (size_t j = min_generation_index; j < tips_nr; j++) { + if (c_gen < commits[j].generation) + break; + + if (commits[j].commit == c) { + tips[commits[j].index]->object.flags |= mark; + + if (j == min_generation_index) { + unsigned int k = j + 1; + while (k < tips_nr && + (tips[commits[k].index]->object.flags & mark)) + k++; + + /* Terminate early if all found. */ + if (k >= tips_nr) + goto done; + + min_generation_index = k; + min_generation = commits[k].generation; + } + } + } + + for (p = c->parents; p; p = p->next) { + repo_parse_commit(r, p->item); + + /* Have we already explored this parent? */ + if (p->item->object.flags & SEEN) + continue; + + /* Is it below the current minimum generation? */ + if (commit_graph_generation(p->item) < min_generation) + continue; + + /* Ok, we will explore from here on. */ + p->item->object.flags |= SEEN; + explored_all_parents = 0; + commit_list_insert(p->item, &stack); + break; + } + + if (explored_all_parents) + pop_commit(&stack); + } + +done: + free(commits); + repo_clear_commit_marks(r, SEEN); +} diff --git a/commit-reach.h b/commit-reach.h index f708c46e523..d6321ae700e 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -135,4 +135,13 @@ void ahead_behind(struct repository *r, struct commit **commits, size_t commits_nr, struct ahead_behind_count *counts, size_t counts_nr); +/* + * For all tip commits, add 'mark' to their flags if and only if they + * are reachable from one of the commits in 'bases'. + */ +void tips_reachable_from_bases(struct repository *r, + struct commit_list *bases, + struct commit **tips, size_t tips_nr, + int mark); + #endif diff --git a/ref-filter.c b/ref-filter.c index 62135f649ec..c724ff94113 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -2396,33 +2396,22 @@ static void reach_filter(struct ref_array *array, struct commit_list *check_reachable, int include_reached) { - struct rev_info revs; int i, old_nr; struct commit **to_clear; - struct commit_list *cr; if (!check_reachable) return; CALLOC_ARRAY(to_clear, array->nr); - - repo_init_revisions(the_repository, &revs, NULL); - for (i = 0; i < array->nr; i++) { struct ref_array_item *item = array->items[i]; - add_pending_object(&revs, &item->commit->object, item->refname); to_clear[i] = item->commit; } - for (cr = check_reachable; cr; cr = cr->next) { - struct commit *merge_commit = cr->item; - merge_commit->object.flags |= UNINTERESTING; - add_pending_object(&revs, &merge_commit->object, ""); - } - - revs.limited = 1; - if (prepare_revision_walk(&revs)) - die(_("revision walk setup failed")); + tips_reachable_from_bases(the_repository, + check_reachable, + to_clear, array->nr, + UNINTERESTING); old_nr = array->nr; array->nr = 0; @@ -2446,7 +2435,6 @@ static void reach_filter(struct ref_array *array, clear_commit_marks(merge_commit, ALL_REV_FLAGS); } - release_revisions(&revs); free(to_clear); } diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh index 439a448c2e6..e14e7620cce 100755 --- a/t/perf/p1500-graph-walks.sh +++ b/t/perf/p1500-graph-walks.sh @@ -35,11 +35,16 @@ test_perf 'ahead-behind counts: git tag' ' xargs git tag -l --format="%(ahead-behind:HEAD)" input <<-\EOF && + refs/heads/commit-1-1 + refs/heads/commit-1-3 + refs/heads/commit-1-5 + refs/heads/commit-1-8 + refs/heads/commit-2-1 + refs/heads/commit-5-1 + refs/heads/commit-9-1 + EOF + cat >expect <<-\EOF && + refs/heads/commit-1-1 + refs/heads/commit-1-3 + refs/heads/commit-1-5 + refs/heads/commit-1-8 + EOF + run_all_modes git for-each-ref --merged=commit-1-9 \ + --format="%(refname)" --stdin +' + +test_expect_success 'for-each-ref merged:all' ' + cat >input <<-\EOF && + refs/heads/commit-1-1 + refs/heads/commit-2-4 + refs/heads/commit-4-2 + refs/heads/commit-4-4 + EOF + cat >expect <<-\EOF && + refs/heads/commit-1-1 + refs/heads/commit-2-4 + refs/heads/commit-4-2 + refs/heads/commit-4-4 + EOF + run_all_modes git for-each-ref --merged=commit-5-5 \ + --format="%(refname)" --stdin +' + +test_expect_success 'for-each-ref ahead-behind:some' ' + cat >input <<-\EOF && + refs/heads/commit-1-1 + refs/heads/commit-5-3 + refs/heads/commit-4-8 + refs/heads/commit-9-9 + EOF + cat >expect <<-\EOF && + refs/heads/commit-1-1 + refs/heads/commit-5-3 + EOF + run_all_modes git for-each-ref --merged=commit-9-6 \ + --format="%(refname)" --stdin +' + +test_expect_success 'for-each-ref merged:some, multibase' ' + cat >input <<-\EOF && + refs/heads/commit-1-1 + refs/heads/commit-5-3 + refs/heads/commit-7-8 + refs/heads/commit-4-8 + refs/heads/commit-9-9 + EOF + cat >expect <<-\EOF && + refs/heads/commit-1-1 + refs/heads/commit-4-8 + refs/heads/commit-5-3 + EOF + run_all_modes git for-each-ref \ + --merged=commit-5-8 \ + --merged=commit-8-5 \ + --format="%(refname)" \ + --stdin +' + +test_expect_success 'for-each-ref merged:none' ' + cat >input <<-\EOF && + refs/heads/commit-7-5 + refs/heads/commit-4-8 + refs/heads/commit-9-9 + EOF + >expect && + run_all_modes git for-each-ref --merged=commit-8-4 \ + --format="%(refname)" --stdin +' + test_done