From patchwork Thu Aug 1 22:10:24 2024
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13750924
Message-Id: <580026f910daaae6dba599fcd2408721b4f86c59.1722550226.git.gitgitgadget@gmail.com>
Date: Thu, 01 Aug 2024 22:10:24 +0000
Subject: [PATCH 1/3] commit-reach: add get_branch_base_for_tip
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

Add a new reachability algorithm that intends to discover (from a
heuristic) which branch was used as the starting point for a given
commit. Add focused tests using the 'test-tool reach' command.

In repositories that use pull requests (or merge requests) to advance one
or more "protected" branches, the history of that reference can be
recovered by following the first-parent history in most cases. Most pull
requests are completed using no-fast-forward merges, though squash merges
are quite common. Less common is rebase-and-merge, which still satisfies
this assumption. Finally, the case that breaks this assumption is the
fast-forward update (with potential rebasing). Even in this case, the
previous commit commonly appears in the first-parent history of the
branch.

Similar assumptions can be made for a topic branch created by a single
user with the intention to merge back into another branch. Using 'git
commit', 'git merge', and 'git cherry-pick' from HEAD will default to
having the first-parent commit be the previous commit at HEAD. This
history changes only with commands such as 'git reset' or 'git rebase',
where the command names also imply that the branch is starting from a new
location.
With this movement of branches in mind, the following heuristic is
proposed as a way to determine the base branch for a given source branch:

  Among a list of candidate base branches, select the candidate that
  minimizes the number of commits in the first-parent history of the
  source that are not in the first-parent history of the candidate.

Prior third-party solutions to this problem have used this optimization
criterion, but have relied upon extracting the first-parent history and
comparing those lists as tables instead of using commit-graph walks.

Given current command-line interface options, this optimization
criterion is not easy to compute directly. Even the command

  git rev-list --count --first-parent ..

does not measure this count, as it uses full reachability from to
determine which commits to remove from the range '..'.

One might then ask whether we should instead use the full reachability
of the candidate and only the first-parent history of the source. This,
unfortunately, does not work for repositories that use long-lived
branches and automation to merge across those branches.

In extremely large repositories, merging into a single trunk may not be
feasible. This is usually due to the desired frequency of updates
(thousands of engineers doing daily work) combined with the time required
to perform a validation build. These factors combine to create
significant risk of semantic merge conflicts, leading to build breaks on
the trunk. In response, repository maintainers can create a single Level
Zero (L0) trunk and multiple Level One (L1) branches. By partitioning the
engineers by organization, these engineers may see lower risk of semantic
merge conflicts as well as be protected against build breaks in other L1
branches. The key to making this system work is a semi-automated process
of merging L1 branches into the L0 trunk and vice-versa.
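To make the selection criterion concrete, here is a brute-force sketch of it outside of Git. It assumes a hypothetical toy model in which commits are the integers 0..nr-1 and first_parent[c] holds the first parent of commit c (-1 for a root); first_parent_length(), count_tip_only() and best_base_brute_force() are illustrative names, not Git functions. This is not how the patch computes the result (it uses a single generation-ordered walk, described later in this message); it only restates the quantity being minimized, with ties going to the lowest candidate index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical toy model, not Git's data structures: commits are the
 * integers 0..nr-1 and first_parent[c] is the first parent of commit c,
 * or -1 if c is a root commit.
 */

/* Number of commits in the first-parent chain starting at 'c'. */
static size_t first_parent_length(const int *first_parent, int c)
{
	size_t len = 0;

	for (; c >= 0; c = first_parent[c])
		len++;
	return len;
}

/*
 * Count the commits in the first-parent history of 'tip' that are not in
 * the first-parent history of 'base'.
 */
static size_t count_tip_only(const int *first_parent, size_t nr,
			     int tip, int base)
{
	bool *in_base = calloc(nr, sizeof(*in_base));
	size_t count = 0;

	assert(in_base);
	for (int c = base; c >= 0; c = first_parent[c])
		in_base[c] = true;
	for (int c = tip; c >= 0; c = first_parent[c])
		if (!in_base[c])
			count++;
	free(in_base);
	return count;
}

/*
 * Brute-force form of the heuristic: return the index i minimizing
 * count_tip_only(tip, bases[i]), preferring the lowest index on ties,
 * or -1 if no candidate's first-parent history meets that of 'tip'.
 */
static int best_base_brute_force(const int *first_parent, size_t nr,
				 int tip, const int *bases, size_t bases_nr)
{
	/* A count equal to the tip's chain length means "never meets". */
	size_t best_count = first_parent_length(first_parent, tip);
	int best = -1;

	for (size_t i = 0; i < bases_nr; i++) {
		size_t count = count_tip_only(first_parent, nr, tip, bases[i]);

		/* Strict comparison keeps the earlier candidate on ties. */
		if (count < best_count) {
			best_count = count;
			best = (int)i;
		}
	}
	return best;
}
```

For example, with first_parent = { -1, 0, 1, 2, 1, 4, 2, 6, -1, 8 }, commits 0-3 form a trunk, 4-5 a release branch forked at 1, 6-7 a topic forked at 2, and 8-9 an unrelated root. For tip 7, trunk tip 3 leaves two topic-only commits (7 and 6) while release tip 5 leaves three, so bases { 5, 3 } yield index 1, and bases { 9 } yield -1 because the histories never meet.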
In a large enough organization, these L1 branches may further split into
L2 or L3 branches, but the same principles apply for merging across
deeper levels.

If these automated merges use a typical merge with the second parent
bringing in the "new" content, then each L0 and L1 branch can track its
previous positions by following first-parent history, and these
histories appear as parallel paths (until reaching the first place where
the branches diverged). If we also walk to second parents, then the
histories overlap significantly and cannot be distinguished except for
very recent changes. For this reason, the first-parent condition should
be symmetrical across the base and source branches.

Another common case for desiring the result of this optimization method
is the use of release branches. When releasing a version of a repository,
a branch can be used to track that release. Any fixes worth including in
that release can be merged to the release branch, so the release ships
with only the necessary fixes and none of the new features introduced in
the trunk branch. The 'maint-2.' branches represent this pattern in the
Git project. The microsoft/git fork uses 'vfs-2..' branches to track the
changes that are custom to that fork on top of each upstream Git release
2... This application doesn't need the symmetrical first-parent
condition, but the use of first-parent histories does not change the
results for these branches.

To determine the base branch from a list of candidates, create a new
method in commit-reach.c that performs a single* commit-graph walk. The
core concept is to walk first parents starting at the candidate bases
and the source, tracking the "best" base to reach a given commit. Use
generation numbers to ensure that a commit is walked at most once and
that all of its children have been explored before visiting it. When
reaching a commit that is reachable from both a base and the source, we
are then guaranteed that this is the closest intersection of
first-parent histories. Track the best base to reach that commit and
return its index as the result. In rare cases involving multiple root
commits, the first-parent history of the source may never intersect that
of any candidate, and the method returns -1.

* There are up to two walks, since we require all commits to have a
  computed generation number in order to avoid incorrect results. This is
  similar to the need for computed generation numbers in ahead_behind()
  as implemented in fd67d149bde (commit-reach: implement ahead_behind()
  logic, 2023-03-20).

In order to track the "best" base, use a new commit slab that stores an
integer. This value defaults to zero upon initialization, so use -1 to
track that the source commit can reach this commit and use 'i + 1' to
track that the ith base can reach this commit. When multiple bases can
reach a commit, minimize the index to break ties. This allows the caller
to specify an order to the bases that determines some amount of
preference when the heuristic does not produce a unique result.

The trickiest part of the integer slab is what happens when reaching a
collision among the histories of the bases and the history of the
source. This is noticed when viewing the first parent and seeing that
its slab value differs in sign (negative or positive). In this case, the
collision commit is stored in the method variable 'branch_point' and its
slab value is set to -1. The index of the best base (so far) is stored in
the method variable 'best_index'. Multiple commits may have
'branch_point' as their first parent, leading to multiple updates of
'best_index'. The result is determined when 'branch_point' is visited in
the commit walk, which guarantees that all commits that could reach
'branch_point' were visited.

Several interesting cases of collisions and different results are tested
in the t6600-test-reach.sh script.
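The walk described above can be sketched on a toy history as well. This is a simplified, hypothetical re-creation, not the patch's code: toy_branch_base_for_tip() and struct toy_queue are illustrative names, generation numbers are passed in as an array instead of coming from a commit-graph, and a linear scan for the highest generation stands in for Git's prio_queue (which also compares commit dates). It keeps the slab encoding described above (0 for unseen, -1 for reachable from the tip, i + 1 for bases[i]) and stops once 'branch_point' is dequeued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical toy model, not Git's API: commits are 0..nr-1,
 * first_parent[c] is -1 for a root, and gen[c] > gen[first_parent[c]]
 * plays the role of a generation number from the commit-graph.
 */
struct toy_queue {
	int *items;
	size_t nr;
};

/* Remove and return the queued commit with the highest generation. */
static int toy_queue_get(struct toy_queue *q, const int *gen)
{
	size_t best = 0;
	int c;

	for (size_t i = 1; i < q->nr; i++)
		if (gen[q->items[i]] > gen[q->items[best]])
			best = i;
	c = q->items[best];
	q->items[best] = q->items[--q->nr];
	return c;
}

static int toy_branch_base_for_tip(const int *first_parent, const int *gen,
				   size_t nr, int tip,
				   const int *bases, size_t bases_nr)
{
	int *best; /* 0 = unseen, -1 = from tip, i + 1 = from bases[i] */
	struct toy_queue queue = { NULL, 0 };
	int best_index = -1, branch_point = -1;

	if (!bases_nr)
		return -1;

	best = calloc(nr, sizeof(*best));
	/* Each commit is queued at most once, so 'nr' slots suffice. */
	queue.items = calloc(nr, sizeof(*queue.items));
	assert(best && queue.items);

	best[tip] = -1;
	queue.items[queue.nr++] = tip;
	for (size_t i = 0; i < bases_nr; i++) {
		if (best[bases[i]])
			continue; /* already claimed by the tip or an earlier base */
		best[bases[i]] = (int)i + 1;
		queue.items[queue.nr++] = bases[i];
	}

	while (queue.nr) {
		int c = toy_queue_get(&queue, gen);
		int p = first_parent[c];
		int best_for_c = best[c], best_for_p, positive;

		if (c == branch_point)
			break; /* every commit that can reach it was visited */
		if (p < 0)
			continue; /* root commit: this first-parent chain ends */

		best_for_p = best[p];
		if (!best_for_p) {
			/* Unseen parent: inherit c's marker and keep walking. */
			best[p] = best_for_c;
			queue.items[queue.nr++] = p;
			continue;
		}
		if (best_for_p > 0 && best_for_c > 0) {
			/* Two base histories meet: keep the lower index. */
			if (best_for_c < best_for_p)
				best[p] = best_for_c;
			continue;
		}

		/* Tip and base histories meet at 'p'; one marker is positive. */
		positive = best_for_c < 0 ? best_for_p : best_for_c;
		assert(positive > 0);
		if (best_index < 0 || positive < best_index)
			best_index = positive;
		best[p] = -1;
		branch_point = p;
	}

	free(best);
	free(queue.items);
	return best_index > 0 ? best_index - 1 : -1;
}
```

With first_parent = { -1, 0, 1, 2, 1, 4, 2, 6, -1, 8 } and gen = { 1, 2, 3, 4, 3, 4, 4, 5, 1, 2 }, tip 7 with bases { 5, 3 } returns 1, since the tip's first-parent history first meets a candidate's at commit 2, which lies on the first-parent history of bases[1]. Bases { 9 } return -1 only after both first-parent chains are walked to their roots, which is the parallel-histories case. Because a parent always has a strictly lower generation than its child, a commit is dequeued only after all of its queued children.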
Recall that t6600-test-reach.sh also tests the algorithm in three
possible states involving the commit-graph file and how many commits are
written in the file. This provides some coverage of the need (and lack of
need) for the ensure_generations_valid() method.

Signed-off-by: Derrick Stolee
---
 commit-reach.c        | 118 ++++++++++++++++++++++++++++++++++++++++++
 commit-reach.h        |  17 ++++++
 t/helper/test-reach.c |   2 +
 t/t6600-test-reach.sh |  47 +++++++++++++++++
 4 files changed, 184 insertions(+)

diff --git a/commit-reach.c b/commit-reach.c
index 8f9b008f876..1b56fb081a6 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1222,3 +1222,121 @@ done:
 	free(commits);
 	repo_clear_commit_marks(r, SEEN);
 }
+
+/*
+ * This slab initializes integers to zero, so use "-1" for "tip is best" and
+ * "i + 1" for "bases[i] is best".
+ */
+define_commit_slab(best_branch_base, int);
+static struct best_branch_base best_branch_base;
+#define get_best(c) (*best_branch_base_at(&best_branch_base, c))
+#define set_best(c,v) (*best_branch_base_at(&best_branch_base, c) = v)
+
+int get_branch_base_for_tip(struct repository *r,
+			    struct commit *tip,
+			    struct commit **bases,
+			    size_t bases_nr)
+{
+	int best_index = -1;
+	struct commit *branch_point = NULL;
+	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+	int found_missing_gen = 0;
+
+	if (!bases_nr)
+		return -1;
+
+	repo_parse_commit(r, tip);
+	if (commit_graph_generation(tip) == GENERATION_NUMBER_INFINITY)
+		found_missing_gen = 1;
+
+	/* Check for missing generation numbers. */
+	for (size_t i = 0; i < bases_nr; i++) {
+		struct commit *c = bases[i];
+		repo_parse_commit(r, c);
+		if (commit_graph_generation(c) == GENERATION_NUMBER_INFINITY)
+			found_missing_gen = 1;
+	}
+
+	if (found_missing_gen) {
+		struct commit **commits;
+		size_t commits_nr = bases_nr + 1;
+
+		CALLOC_ARRAY(commits, commits_nr);
+		COPY_ARRAY(commits, bases, bases_nr);
+		commits[bases_nr] = tip;
+		ensure_generations_valid(r, commits, commits_nr);
+		free(commits);
+	}
+
+	/* Initialize queue and slab now that generations are guaranteed. */
+	init_best_branch_base(&best_branch_base);
+	set_best(tip, -1);
+	prio_queue_put(&queue, tip);
+
+	for (size_t i = 0; i < bases_nr; i++) {
+		struct commit *c = bases[i];
+
+		/* Has this already been marked as best by another commit? */
+		if (get_best(c))
+			continue;
+
+		set_best(c, i + 1);
+		prio_queue_put(&queue, c);
+	}
+
+	while (queue.nr) {
+		struct commit *c = prio_queue_get(&queue);
+		int best_for_c = get_best(c);
+		int best_for_p, positive;
+		struct commit *parent;
+
+		/* Have we reached a known branch point? It's optimal. */
+		if (c == branch_point)
+			break;
+
+		repo_parse_commit(r, c);
+		if (!c->parents)
+			continue;
+
+		parent = c->parents->item;
+		repo_parse_commit(r, parent);
+		best_for_p = get_best(parent);
+
+		if (!best_for_p) {
+			/* 'parent' is new, so pass along best_for_c. */
+			set_best(parent, best_for_c);
+			prio_queue_put(&queue, parent);
+			continue;
+		}
+
+		if (best_for_p > 0 && best_for_c > 0) {
+			/* Collision among bases. Minimize. */
+			if (best_for_c < best_for_p)
+				set_best(parent, best_for_c);
+			continue;
+		}
+
+		/*
+		 * At this point, we have reached a commit that is reachable
+		 * from the tip, either from 'c' or from an earlier commit to
+		 * have 'parent' as its first parent.
+		 *
+		 * Update 'best_index' to match the minimum of all base indices
+		 * to reach 'parent'.
+		 */
+
+		/* Exactly one is positive due to initial conditions. */
+		positive = (best_for_c < 0) ? best_for_p : best_for_c;
+
+		if (best_index < 0 || positive < best_index)
+			best_index = positive;
+
+		/* No matter what, track that the parent is reachable from tip. */
+		set_best(parent, -1);
+		branch_point = parent;
+	}
+
+	clear_best_branch_base(&best_branch_base);
+	clear_prio_queue(&queue);
+	return best_index > 0 ? best_index - 1 : -1;
+}
diff --git a/commit-reach.h b/commit-reach.h
index bf63cc468fd..9a745b7e176 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -139,4 +139,21 @@ void tips_reachable_from_bases(struct repository *r,
 			       struct commit **tips, size_t tips_nr,
 			       int mark);
 
+/*
+ * Given a 'tip' commit and a list of potential 'bases', return the index 'i'
+ * that minimizes the number of commits in the first-parent history of 'tip'
+ * and not in the first-parent history of 'bases[i]'.
+ *
+ * Among a list of long-lived branches that are updated only by merges (with the
+ * first parent being the previous position of the branch), this would inform
+ * which branch was used to create the tip reference.
+ *
+ * Returns -1 if no common point is found in first-parent histories, which is
+ * rare, but possible with multiple root commits.
+ */
+int get_branch_base_for_tip(struct repository *r,
+			    struct commit *tip,
+			    struct commit **bases,
+			    size_t bases_nr);
+
 #endif
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 1e3b431e3e7..8579b607aa5 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -114,6 +114,8 @@ int cmd__reach(int ac, const char **av)
 		       repo_in_merge_bases_many(the_repository, A, X_nr, X_array, 0));
 	else if (!strcmp(av[1], "is_descendant_of"))
 		printf("%s(A,X):%d\n", av[1], repo_is_descendant_of(r, A, X));
+	else if (!strcmp(av[1], "get_branch_base_for_tip"))
+		printf("%s(A,X):%d\n", av[1], get_branch_base_for_tip(r, A, X_array, X_nr));
 	else if (!strcmp(av[1], "get_merge_bases_many")) {
 		struct commit_list *list = NULL;
 		if (repo_get_merge_bases_many(the_repository,
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index b330945f497..3069efc8601 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -612,4 +612,51 @@ test_expect_success 'for-each-ref merged:none' '
 		--format="%(refname)" --stdin
 '
 
+# For get_branch_base_for_tip, we only care about
+# first-parent history. Here is the test graph with
+# second parents removed:
+#
+#          (10,10)
+#          /
+#      (10,9)    (9,10)
+#      /         /
+#  (10,8)    (9,9)    (8,10)
+#  /         /        /
+# ( continued...)
+#  \     /     /        /
+#  (3,1)    (2,2)    (1,3)
+#     \      /       /
+#      (2,1)     (1,2)
+#         \      /
+#          (1,1)
+#
+# In short, for a commit (i,j), the first-parent history
+# walks all commits (i, k) with k from j to 1, then the
+# commits (l, 1) with l from i to 1.
+
+test_expect_success 'get_branch_base_for_tip: none reach' '
+	# (2,3) branched from the first tip (i,4) in X with i > 2
+	cat >input <<-\EOF &&
+	A:commit-2-3
+	X:commit-1-2
+	X:commit-1-4
+	X:commit-4-4
+	X:commit-8-4
+	X:commit-10-4
+	EOF
+	echo "get_branch_base_for_tip(A,X):2" >expect &&
+	test_all_modes get_branch_base_for_tip
+'
+
+test_expect_success 'get_branch_base_for_tip: all reach tip' '
+	# Every tip in X has (4,1) in its first-parent history; ties go to index 0
+	cat >input <<-\EOF &&
+	A:commit-4-1
+	X:commit-4-2
+	X:commit-5-1
+	EOF
+	echo "get_branch_base_for_tip(A,X):0" >expect &&
+	test_all_modes get_branch_base_for_tip
+'
+
 test_done

From patchwork Thu Aug 1 22:10:25 2024
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13750923
Date: Thu, 01 Aug 2024 22:10:25 +0000
Subject: [PATCH 2/3] for-each-ref: add 'is-base' token
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

The previous change introduced the get_branch_base_for_tip() method in
commit-reach.c. That change was motivated by using a heuristic to
determine the base branch for a source commit from a list of candidate
commit tips. This change makes that algorithm visible to users via a new
atom in the 'git for-each-ref' format. This change is very similar to the
change in 49abcd21da6 (for-each-ref: add ahead-behind format atom,
2023-03-20).

Introduce the 'is-base:' atom, which requests that the algorithm be
computed and reports its result using an indicator of the form '()'. For
example, using '%(is-base:HEAD)' would result in one line having the
token '(HEAD)'.

Use the sorted order of refs included in the ref filter to break ties in
the algorithm's heuristic. In the previous change, the motivating
examples include using an L0 trunk, long-lived L1 branches, and
temporary release branches. A caller could communicate the ordered
preference among these categories using the input refspecs, avoiding the
need for a different sort mechanism. This sorting behavior is tested in
the test scripts.

It is important to include this atom as a special case in
can_do_iterative_format() to match the expectations created in
bd98f9774e1 (ref-filter.c: filter & format refs in the same callback,
2023-11-14). The ahead-behind atom was one of the special cases, and this
atom similarly requires running an algorithm across all input refs before
formatting any single ref.

In the test script, the format tokens use colons or lack whitespace to
avoid Git complaining about trailing whitespace errors.

Signed-off-by: Derrick Stolee
---
 ref-filter.c          | 78 ++++++++++++++++++++++++++++++++++++++++++-
 ref-filter.h          | 15 +++++++++
 t/t6600-test-reach.sh | 47 ++++++++++++++++++++++++++
 3 files changed, 139 insertions(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index 59ad6f54ddb..59689672da1 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -167,6 +167,7 @@ enum atom_type {
 	ATOM_ELSE,
 	ATOM_REST,
 	ATOM_AHEADBEHIND,
+	ATOM_ISBASE,
 };
 
 /*
@@ -889,6 +890,23 @@ static int ahead_behind_atom_parser(struct ref_format *format,
 	return 0;
 }
 
+static int is_base_atom_parser(struct ref_format *format,
+			       struct used_atom *atom UNUSED,
+			       const char *arg, struct strbuf *err)
+{
+	struct string_list_item *item;
+
+	if (!arg)
+		return strbuf_addf_ret(err, -1, _("expected format: %%(is-base:)"));
+
+	item = string_list_append(&format->is_base_tips, arg);
+	item->util = lookup_commit_reference_by_name(arg);
+	if (!item->util)
+		die("failed to find '%s'", arg);
+
+	return 0;
+}
+
 static int head_atom_parser(struct ref_format *format UNUSED,
 			    struct used_atom *atom,
 			    const char *arg, struct strbuf *err)
@@ -952,6 +970,7 @@ static struct {
 	[ATOM_ELSE] = { "else", SOURCE_NONE },
 	[ATOM_REST] = { "rest", SOURCE_NONE, FIELD_STR, rest_atom_parser },
 	[ATOM_AHEADBEHIND] = { "ahead-behind", SOURCE_OTHER, FIELD_STR, ahead_behind_atom_parser },
+	[ATOM_ISBASE] = { "is-base", SOURCE_OTHER, FIELD_STR, is_base_atom_parser },
 	/*
 	 * Please update $__git_ref_fieldlist in git-completion.bash
 	 * when you add new atoms
@@ -2334,6 +2353,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 	int i;
 	struct object_info empty = OBJECT_INFO_INIT;
 	int ahead_behind_atoms = 0;
+	int is_base_atoms = 0;
 
 	CALLOC_ARRAY(ref->value, used_atom_cnt);
 
@@ -2475,6 +2495,16 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 				v->s = xstrdup("");
 			}
 			continue;
+		} else if (atom_type == ATOM_ISBASE) {
+			if (ref->is_base && ref->is_base[is_base_atoms]) {
+				v->s = xstrfmt("(%s)", ref->is_base[is_base_atoms]);
+				free(ref->is_base[is_base_atoms]);
+			} else {
+				/* Not a commit. */
+				v->s = xstrdup("");
+			}
+			is_base_atoms++;
+			continue;
 		} else
 			continue;
 
@@ -2876,6 +2906,7 @@ static void free_array_item(struct ref_array_item *item)
 		free(item->value);
 	}
 	free(item->counts);
+	free(item->is_base);
 	free(item);
 }
 
@@ -3040,6 +3071,49 @@ void filter_ahead_behind(struct repository *r,
 	free(commits);
 }
 
+void filter_is_base(struct repository *r,
+		    struct ref_format *format,
+		    struct ref_array *array)
+{
+	struct commit **bases;
+	size_t bases_nr = 0;
+	struct ref_array_item **back_index;
+
+	if (!format->is_base_tips.nr || !array->nr)
+		return;
+
+	CALLOC_ARRAY(back_index, array->nr);
+	CALLOC_ARRAY(bases, array->nr);
+
+	for (size_t i = 0; i < array->nr; i++) {
+		const char *name = array->items[i]->refname;
+		struct commit *c = lookup_commit_reference_by_name(name);
+
+		CALLOC_ARRAY(array->items[i]->is_base, format->is_base_tips.nr);
+
+		if (!c)
+			continue;
+
+		back_index[bases_nr] = array->items[i];
+		bases[bases_nr] = c;
+		bases_nr++;
+	}
+
+	for (size_t i = 0; i < format->is_base_tips.nr; i++) {
+		struct commit *tip = format->is_base_tips.items[i].util;
+		int base_index = get_branch_base_for_tip(r, tip, bases, bases_nr);
+
+		if (base_index < 0)
+			continue;
+
+		/* Store the string for use in output later. */
+		back_index[base_index]->is_base[i] = xstrdup(format->is_base_tips.items[i].string);
+	}
+
+	free(back_index);
+	free(bases);
+}
+
 static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
 {
 	int ret = 0;
@@ -3126,7 +3200,8 @@ static inline int can_do_iterative_format(struct ref_filter *filter,
 	return !(filter->reachable_from ||
 		 filter->unreachable_from ||
 		 sorting ||
-		 format->bases.nr);
+		 format->bases.nr ||
+		 format->is_base_tips.nr);
 }
 
 void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
@@ -3150,6 +3225,7 @@ void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
 		struct ref_array array = { 0 };
 		filter_refs(&array, filter, type);
 		filter_ahead_behind(the_repository, format, &array);
+		filter_is_base(the_repository, format, &array);
 		ref_array_sort(sorting, &array);
 		print_formatted_ref_array(&array, format);
 		ref_array_clear(&array);
diff --git a/ref-filter.h b/ref-filter.h
index 0ca28d2bba6..20419a56218 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -48,6 +48,7 @@ struct ref_array_item {
 	struct commit *commit;
 	struct atom_value *value;
 	struct ahead_behind_count **counts;
+	char **is_base;
 
 	char refname[FLEX_ARRAY];
 };
@@ -101,6 +102,9 @@ struct ref_format {
 	/* List of bases for ahead-behind counts. */
 	struct string_list bases;
 
+	/* List of tips for is-base indicators. */
+	struct string_list is_base_tips;
+
 	struct {
 		int max_count;
 		int omit_empty;
@@ -114,6 +118,7 @@ struct ref_format {
 #define REF_FORMAT_INIT { \
 	.use_color = -1, \
 	.bases = STRING_LIST_INIT_DUP, \
+	.is_base_tips = STRING_LIST_INIT_DUP, \
 }
 
 /* Macros for checking --merged and --no-merged options */
@@ -203,6 +208,16 @@ void filter_ahead_behind(struct repository *r,
 			 struct ref_format *format,
 			 struct ref_array *array);
 
+/*
+ * If the provided format includes is-base atoms, then compute the base checks
+ * for those tips against all refs.
+ *
+ * If this is not called, then any is-base atoms will be blank.
+ */
+void filter_is_base(struct repository *r,
+		    struct ref_format *format,
+		    struct ref_array *array);
+
 void ref_filter_init(struct ref_filter *filter);
 void ref_filter_clear(struct ref_filter *filter);
 
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 3069efc8601..6c7f92bcb38 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -659,4 +659,51 @@ test_expect_success 'get_branch_base_for_tip: all reach tip' '
 	test_all_modes get_branch_base_for_tip
 '
 
+test_expect_success 'for-each-ref is-base: none reach' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-4-2
+	refs/heads/commit-4-4
+	refs/heads/commit-8-4
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1:
+	refs/heads/commit-4-2:(commit-2-3)
+	refs/heads/commit-4-4:
+	refs/heads/commit-8-4:
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname):%(is-base:commit-2-3)" --stdin
+'
+
+test_expect_success 'for-each-ref is-base: all reach' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-4-2
+	refs/heads/commit-5-1
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-4-2:(commit-4-1)
+	refs/heads/commit-5-1:
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname):%(is-base:commit-4-1)" --stdin
+'
+
+test_expect_success 'for-each-ref is-base:multiple' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-4-2
+	refs/heads/commit-4-4
+	refs/heads/commit-8-4
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1[-]
+	refs/heads/commit-4-2[(commit-2-3)-]
+	refs/heads/commit-4-4[-]
+	refs/heads/commit-8-4[-(commit-6-5)]
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname)[%(is-base:commit-2-3)-%(is-base:commit-6-5)]" --stdin
+'
+
 test_done

From patchwork Thu Aug 1 22:10:26 2024
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13750922
:references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=56hSNiOlt1LylH+n3beA4JrC5B+sMVlxoMQsGPznpV4=; b=aYTpTX05GQuNu/+GeO2Y5UFgdUwSVab1GncHRGO2Wk5uvy5u+FdWJh/kOu5j6cmRp3 hipDRqr6CSQK1eK+sL0Yz62CsBSFelK2YUrxINyxsnD5zJFW6rfsBv4dEBBL7eXHI43o CWn0pCzfdfyyAQUYHEaqUhQnYfwhhxSHt0zDaMsUxZmLUY+sQfZ2B806y6H6DLwEUZDV zyFwMcJlZetLrFu47xQGIzYkjNA+PdMNxtDg+0DCqfe27VKbMWbe/Us2mRsNPX0nKFyD UnC9E0b5KDyLkmU6t1LGoySCoQ6FgBm12cH+Rptwb9eYIxFl2/ztW8VYKwR4VUpBxKBG iibQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722550231; x=1723155031; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=56hSNiOlt1LylH+n3beA4JrC5B+sMVlxoMQsGPznpV4=; b=Ityl50P6EOv4gW/AsV/e9GYiNhro18NSHsae/W1SCRneTy80HEopJyhVgtwOtGipjq OjWb0lsJultUiH99o2wmCzBW20FiabwZCe1ZUOlJ51P5PO5Ygwzh5FHGINxkd1Gtkci6 35WisAr+DklBPXbEE7dSoa3ZakGzj4LFRDpYgF5tM8uJW13gZ7NKtEYunbbvVq1WIhar hgL4mYKXlNB/byaizG8zNGMJMvutCCzL55DqocOC0bxYlAv9xAhS///sH4zJnFWx2gDw wFbKv9CgO3Ed8zE/O2of4gZdXAviYWkayAG9eNj8SDCW86QXCYm9ImTe1ZWT9RYi13pZ EVoA== X-Gm-Message-State: AOJu0YzDWBP6g1sezVvXP0AXSNx4N596N9vN4NDY1XuMMGilIgl71C5f Ef9xfTn+I7ezb0QVfjwiuJDYxapwTJQnOLd8m2qopjbgLQvlDfwgfeneQQ== X-Google-Smtp-Source: AGHT+IGY5TGGKVbCVuoEWk9uPPobRgEDFBMij2FHZ/PSBEud5Q1EBeHFBzFw8lX8X/TeNLUYWce/OQ== X-Received: by 2002:a05:6000:2c6:b0:368:4da3:a3ac with SMTP id ffacd0b85a97d-36bbc16313bmr771078f8f.40.1722550230542; Thu, 01 Aug 2024 15:10:30 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-36bbd02a451sm500302f8f.63.2024.08.01.15.10.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 01 Aug 2024 15:10:29 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Thu, 01 Aug 2024 22:10:26 +0000 Subject: [PATCH 3/3] p1500: add is-base performance tests Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org 
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

The previous two changes introduced a commit-walking heuristic for finding
the most likely base branch for a given source. This algorithm walks
first-parent histories until they collide.

This walk _should_ be very fast. The exceptions are the case where no
commit-graph file exists, which forces a full walk of all reachable
commits to compute generation numbers, and the case where the
first-parent histories never collide, which forces a walk of the entire
first-parent history down to the root commits.

The p1500 test script guarantees a complete commit-graph file during its
setup, so we do not test the missing commit-graph scenario. We do,
however, create a new root commit in order to test the scenario of
parallel first-parent histories that never collide.

Even with the extra root commit, these tests take no longer than 0.02
seconds on my machine when run against the Git repository.
However, the results are slightly more interesting in a copy of the Linux
kernel repository:

Test                                                  Time (s)
--------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref             0.12
1500.3: ahead-behind counts: git branch                   0.12
1500.4: ahead-behind counts: git tag                      0.12
1500.5: contains: git for-each-ref --merged               0.04
1500.6: contains: git branch --merged                     0.04
1500.7: contains: git tag --merged                        0.04
1500.8: is-base check: test-tool reach (refs)             0.03
1500.9: is-base check: test-tool reach (tags)             0.03
1500.10: is-base check: git for-each-ref                  0.03
1500.11: is-base check: git for-each-ref (disjoint-base)  0.07

Signed-off-by: Derrick Stolee
---
 t/perf/p1500-graph-walks.sh | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
index e14e7620cce..5b23ce5db93 100755
--- a/t/perf/p1500-graph-walks.sh
+++ b/t/perf/p1500-graph-walks.sh
@@ -20,6 +20,21 @@ test_expect_success 'setup' '
 		echo tag-$ref ||
 		return 1
 	done >tags &&
+
+	echo "A:HEAD" >test-tool-refs &&
+	for line in $(cat refs)
+	do
+		echo "X:$line" >>test-tool-refs || return 1
+	done &&
+	echo "A:HEAD" >test-tool-tags &&
+	for line in $(cat tags)
+	do
+		echo "X:$line" >>test-tool-tags || return 1
+	done &&
+
+	commit=$(git commit-tree $(git rev-parse HEAD^{tree})) &&
+	git update-ref refs/heads/disjoint-base $commit &&
+
 	git commit-graph write --reachable
 '
 
@@ -47,4 +62,20 @@ test_perf 'contains: git tag --merged' '
 	xargs git tag --merged=HEAD