[v2,0/3] git for-each-ref: is-base atom and base branches

Message ID pull.1768.v2.git.1723397687.gitgitgadget@gmail.com

John Cai via GitGitGadget Aug. 11, 2024, 5:34 p.m. UTC
This series introduces a new 'git for-each-ref' atom, 'is-base', modeled
closely on the 'ahead-behind' atom. As detailed in the first patch, it is
motivated by the need to detect a "base branch" in a repository with
multiple long-lived branches.

This series is also motivated by a third-party tool created to make this
detection with the same optimization mechanism, but using a much slower
technique because the Git CLI does not present this information. The
existing algorithm runs git rev-list --first-parent -<N> in batches for the
collection of considered references, compares those lists, and increases
<N> as needed until it finds a collision. This new use of 'git for-each-ref'
will allow making this determination within a single process while walking
a minimal number of commits.
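
As a rough illustration of the batched approach described above, here is a
minimal Python sketch. It does not call Git: a small dictionary stands in
for the repository, and first_parent_prefix() emulates what
git rev-list --first-parent -<N> would list. The commit names, the function
names, and the doubling of <N> are assumptions made for the example; the
third-party tool's actual logic is not shown in this thread. The sketch
stops at the first collision it sees, as the text describes, and makes no
claim that this collision is the best possible one.

```python
from typing import Dict, List, Optional, Tuple

# Toy history: each commit maps to its first parent (None for a root).
# Hypothetical data standing in for a real repository.
FIRST_PARENT: Dict[str, Optional[str]] = {
    "r": None, "m1": "r", "m2": "m1", "m3": "m2",  # main
    "t1": "m1", "t2": "t1",                        # topic, forked at m1
    "o1": "r",                                     # other, forked at r
}


def first_parent_prefix(tip: str, n: int) -> List[str]:
    """Emulate `git rev-list --first-parent -<n> <tip>` on the toy history."""
    out: List[str] = []
    cur: Optional[str] = tip
    while cur is not None and len(out) < n:
        out.append(cur)
        cur = FIRST_PARENT[cur]
    return out


def find_base_batched(
    source: str, refs: Dict[str, str], batch: int = 2
) -> Optional[Tuple[str, str]]:
    """Grow <N> until the source's list collides with some ref's list.

    Returns (colliding commit, ref name), or None if no list ever collides.
    """
    n = batch
    while True:
        src = first_parent_prefix(source, n)
        lists = {ref: first_parent_prefix(tip, n) for ref, tip in refs.items()}
        # Scan the source's commits nearest-first and report the first hit.
        for commit in src:
            hits = sorted(ref for ref, commits in lists.items() if commit in commits)
            if hits:
                return commit, hits[0]
        # Every list came back shorter than n: all histories are exhausted.
        if len(src) < n and all(len(c) < n for c in lists.values()):
            return None
        n *= 2  # assumption: double the batch size on each retry


print(find_base_batched("t2", {"main": "m3", "other": "o1"}))  # ('m1', 'main')
```

Each retry re-lists every considered reference from its tip, which is the
repeated work a single in-process walk avoids.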

There are benefits to users on both the client side and the server side. In
an internal monorepo, this base branch detection algorithm is used to
determine a long-lived branch based on the HEAD commit. That branch maps to
a group within the organizational structure of the repository, which
determines a set of projects that the user will likely need to build. This
leads to automatically selecting an initial sparse-checkout definition based
on the build dependencies required. An upcoming feature in Azure Repos will
use this algorithm to automatically create a pull request against the
correct target branch, sparing users from having to select a different
branch after a large commit diff is rendered against the default branch.
This atom unlocks that ability for Git hosting services that use Git in
their backend.

Thanks, -Stolee


Updates in v2
=============

 * I had forgotten to include a documentation change in v1. My attempt to
   create a succinct doc change in a follow-up hunk was still confusing, so
   this version includes an expanded documentation blurb for the is-base
   token.

Derrick Stolee (3):
  commit-reach: add get_branch_base_for_tip
  for-each-ref: add 'is-base' token
  p1500: add is-base performance tests

 Documentation/git-for-each-ref.txt |  42 ++++++++++
 commit-reach.c                     | 118 +++++++++++++++++++++++++++++
 commit-reach.h                     |  17 +++++
 ref-filter.c                       |  78 ++++++++++++++++++-
 ref-filter.h                       |  15 ++++
 t/helper/test-reach.c              |   2 +
 t/perf/p1500-graph-walks.sh        |  31 ++++++++
 t/t6600-test-reach.sh              |  94 +++++++++++++++++++++++
 8 files changed, 396 insertions(+), 1 deletion(-)


base-commit: bea9ecd24b0c3bf06cab4a851694fe09e7e51408
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1768%2Fderrickstolee%2Ftarget-ref-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1768/derrickstolee/target-ref-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1768

Range-diff vs v1:

 1:  580026f910d = 1:  580026f910d commit-reach: add get_branch_base_for_tip
 2:  a1fbdca374f ! 2:  13341e7e512 for-each-ref: add 'is-base' token
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <stolee@gmail.com>
      
     + ## Documentation/git-for-each-ref.txt ##
     +@@ Documentation/git-for-each-ref.txt: ahead-behind:<committish>::
     + 	commits ahead and behind, respectively, when comparing the output
     + 	ref to the `<committish>` specified in the format.
     + 
     ++is-base:<committish>::
     ++	In at most one row, `(<committish>)` will appear to indicate the ref
     ++	that is most likely the ref used as a starting point for the branch
     ++	that produced `<committish>`. This choice is made using a heuristic:
     ++	choose the ref that minimizes the number of commits in the
     ++	first-parent history of `<committish>` and not in the first-parent
     ++	history of the ref.
     +++
     ++For example, consider the following figure of first-parent histories of
     ++several refs:
     +++
     ++----
     ++*--*--*--*--*--* refs/heads/A
     ++\
     ++ \
     ++  *--*--*--* refs/heads/B
     ++   \     \
     ++    \     \
     ++     *     * refs/heads/C
     ++      \
     ++       \
     ++        *--* refs/heads/D
     ++----
     +++
     ++Here, if `A`, `B`, and `C` are the filtered references, and the format
     ++string is `%(refname):%(is-base:D)`, then the output would be
     +++
     ++----
     ++refs/heads/A:
     ++refs/heads/B:(D)
     ++refs/heads/C:
     ++----
     +++
     ++This is because the first-parent history of `D` has its earliest
     ++intersection with the first-parent histories of the filtered refs at a
     ++common first-parent ancestor of `B` and `C` and ties are broken by the
     ++earliest ref in the sorted order.
     +++
     ++Note that this token will not appear if the first-parent history of
     ++`<committish>` does not intersect the first-parent histories of the
     ++filtered refs.
     ++
     + describe[:options]::
     + 	A human-readable name, like linkgit:git-describe[1];
     + 	empty string for undescribable commits. The `describe` string may
     +
       ## ref-filter.c ##
      @@ ref-filter.c: enum atom_type {
       	ATOM_ELSE,
 3:  db87434e146 = 3:  757c20090db p1500: add is-base performance tests
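
To make the heuristic in the is-base documentation hunk above concrete, the
following Python sketch reproduces the A/B/C/D example by brute force. It is
not the get_branch_base_for_tip() implementation from this series, which is
not shown in this message; the commit names (a1, b1, ...) and the helper
names are hypothetical, and the toy first-parent map is my reading of the
figure. For each filtered ref it counts the commits in the first-parent
history of D that are not in the first-parent history of the ref, picks the
minimum, and breaks ties by sorted ref order.

```python
from typing import Dict, List, Optional

# First-parent edges read off the figure (hypothetical commit names).
FIRST_PARENT: Dict[str, Optional[str]] = {
    "a1": None, "a2": "a1", "a3": "a2", "a4": "a3", "a5": "a4", "a6": "a5",
    "b1": "a1", "b2": "b1", "b3": "b2", "b4": "b3",  # B forks from a1
    "c1": "b3",                                      # C forks from b3
    "d1": "b1", "d2": "d1", "d3": "d2",              # D forks from b1
}
REFS = {"refs/heads/A": "a6", "refs/heads/B": "b4", "refs/heads/C": "c1"}


def first_parent_history(tip: Optional[str]) -> List[str]:
    """Return the full first-parent chain starting at tip."""
    out: List[str] = []
    while tip is not None:
        out.append(tip)
        tip = FIRST_PARENT[tip]
    return out


def pick_base(source_tip: str, refs: Dict[str, str]) -> Optional[str]:
    """Pick the ref minimizing commits in the source's first-parent history
    that are missing from the ref's first-parent history."""
    source = first_parent_history(source_tip)
    best_ref: Optional[str] = None
    best_cost: Optional[int] = None
    for ref in sorted(refs):  # sorted order makes the earliest ref win ties
        history = set(first_parent_history(refs[ref]))
        if not history.intersection(source):
            continue  # no intersection: this ref cannot be the base
        cost = sum(1 for commit in source if commit not in history)
        if best_cost is None or cost < best_cost:
            best_ref, best_cost = ref, cost
    return best_ref


def format_rows(source_name: str, source_tip: str, refs: Dict[str, str]) -> List[str]:
    """Mimic the `%(refname):%(is-base:<committish>)` rows from the example."""
    base = pick_base(source_tip, refs)
    return [
        f"{ref}:" + (f"({source_name})" if ref == base else "")
        for ref in sorted(refs)
    ]


print("\n".join(format_rows("D", "d3", REFS)))
```

With this toy history, A costs four commits while B and C each cost three,
so the tie goes to refs/heads/B and the rows match the documented output.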