mbox series

[0/3] git for-each-ref: is-base atom and base branches

Message ID pull.1768.git.1722550226.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series git for-each-ref: is-base atom and base branches | expand

Message

John Cai via GitGitGadget Aug. 1, 2024, 10:10 p.m. UTC
This change introduces a new 'git for-each-ref' atom, 'is-base', in a very
similar way to the 'ahead-behind' atom. As detailed carefully in the first
change, this is motivated by the need to detect the concept of a "base
branch" in a repository with multiple long-lived branches.

This change is motivated by a third-party tool created to make this
detection with the same optimization mechanism, but using a much slower
technique due to the limitations of the Git CLI not presenting this
information. The existing algorithm involves using git rev-list
--first-parent -<N> in batches for the collection of considered references,
comparing those lists, and increasing <N> as needed until finding a
collision. This new use of 'git for-each-ref' will allow determining this
mechanism within a single process and walking a minimal number of commits.

There are benefits to users both on client-side and server-side. In an
internal monorepo, this base branch detection algorithm is used to determine
a long-lived branch based on the HEAD commit, mapping to a group within the
organizational structure of the repository, which determines a set of
projects that the user will likely need to build; this leads to
automatically selecting an initial sparse-checkout definition based on the
build dependencies required. An upcoming feature in Azure Repos will use
this algorithm to automatically create a pull request against the correct
target branch, reducing user pain from needing to select a different branch
after a large commit diff is rendered against the default branch. This atom
unlocks that ability for Git hosting services that use Git in their backend.

Thanks, -Stolee

Derrick Stolee (3):
  commit-reach: add get_branch_base_for_tip
  for-each-ref: add 'is-base' token
  p1500: add is-base performance tests

 commit-reach.c              | 118 ++++++++++++++++++++++++++++++++++++
 commit-reach.h              |  17 ++++++
 ref-filter.c                |  78 +++++++++++++++++++++++-
 ref-filter.h                |  15 +++++
 t/helper/test-reach.c       |   2 +
 t/perf/p1500-graph-walks.sh |  31 ++++++++++
 t/t6600-test-reach.sh       |  94 ++++++++++++++++++++++++++++
 7 files changed, 354 insertions(+), 1 deletion(-)


base-commit: bea9ecd24b0c3bf06cab4a851694fe09e7e51408
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1768%2Fderrickstolee%2Ftarget-ref-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1768/derrickstolee/target-ref-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1768

Comments

Junio C Hamano Aug. 1, 2024, 11:06 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Derrick Stolee (3):
>   commit-reach: add get_branch_base_for_tip
>   for-each-ref: add 'is-base' token
>   p1500: add is-base performance tests
>
>  commit-reach.c              | 118 ++++++++++++++++++++++++++++++++++++
>  commit-reach.h              |  17 ++++++
>  ref-filter.c                |  78 +++++++++++++++++++++++-
>  ref-filter.h                |  15 +++++
>  t/helper/test-reach.c       |   2 +
>  t/perf/p1500-graph-walks.sh |  31 ++++++++++
>  t/t6600-test-reach.sh       |  94 ++++++++++++++++++++++++++++
>  7 files changed, 354 insertions(+), 1 deletion(-)

I was expecting to see an documentation update to for-each-ref (and
probably branch and tag) so that what this new atom means.  Is it
that %(is-base:<commit>) interpolates to <commit> for a ref that is an
ancestor of <commit>, and interpolates to an empty string for a ref
that is not, or something?

Thanks.
Derrick Stolee Aug. 2, 2024, 2:32 p.m. UTC | #2
On 8/1/24 7:06 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> Derrick Stolee (3):
>>    commit-reach: add get_branch_base_for_tip
>>    for-each-ref: add 'is-base' token
>>    p1500: add is-base performance tests
>>
>>   commit-reach.c              | 118 ++++++++++++++++++++++++++++++++++++
>>   commit-reach.h              |  17 ++++++
>>   ref-filter.c                |  78 +++++++++++++++++++++++-
>>   ref-filter.h                |  15 +++++
>>   t/helper/test-reach.c       |   2 +
>>   t/perf/p1500-graph-walks.sh |  31 ++++++++++
>>   t/t6600-test-reach.sh       |  94 ++++++++++++++++++++++++++++
>>   7 files changed, 354 insertions(+), 1 deletion(-)
> 
> I was expecting to see an documentation update to for-each-ref (and
> probably branch and tag) so that what this new atom means.  Is it
> that %(is-base:<commit>) interpolates to <commit> for a ref that is an
> ancestor of <commit>, and interpolates to an empty string for a ref
> that is not, or something?

You are absolutely right that I missed this crucial detail. I will
eventually send a v2 with this oversight corrected. For now, please
consider this documentation diff, and I look forward to other review
comments that I can use to improve this series before sending v2.

diff --git a/Documentation/git-for-each-ref.txt 
b/Documentation/git-for-each-ref.txt
index c1dd12b93cf..5154ba3e2a7 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -264,6 +264,16 @@ ahead-behind:<committish>::
  	commits ahead and behind, respectively, when comparing the output
  	ref to the `<committish>` specified in the format.

+is-base:<committish>::
+	In at most one row, `(<committish>)` will appear to indicate the ref
+	that minimizes the number of commits in the first-parent history of
+	`<committish>` and not in the first-parent history of the ref. Ties
+	are broken by favoring the earliest ref in the list. Note that this
+	token will not appear if the first-parent history of `<committish>`
+	does not intersect the first-parent histories of the filtered refs.
+	This can be used as a heuristic to guess which of the filtered refs
+	was used as the base of the branch that produced `<committish>`.
+
  describe[:options]::
  	A human-readable name, like linkgit:git-describe[1];
  	empty string for undescribable commits. The `describe` string may
Junio C Hamano Aug. 2, 2024, 4:55 p.m. UTC | #3
Derrick Stolee <stolee@gmail.com> writes:

> consider this documentation diff, and I look forward to other review
> comments that I can use to improve this series before sending v2.
>
> diff --git a/Documentation/git-for-each-ref.txt
> b/Documentation/git-for-each-ref.txt
> index c1dd12b93cf..5154ba3e2a7 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -264,6 +264,16 @@ ahead-behind:<committish>::
>  	commits ahead and behind, respectively, when comparing the output
>  	ref to the `<committish>` specified in the format.
>
> +is-base:<committish>::
> +	In at most one row, `(<committish>)` will appear to indicate the ref
> +	that minimizes the number of commits in the first-parent history of
> +	`<committish>` and not in the first-parent history of the ref. Ties
> +	are broken by favoring the earliest ref in the list. Note that this
> +	token will not appear if the first-parent history of `<committish>`
> +	does not intersect the first-parent histories of the filtered refs.
> +	This can be used as a heuristic to guess which of the filtered refs
> +	was used as the base of the branch that produced `<committish>`.
> +

OK.  Knowing what definition you used is crucial when reading the
implementation, as we cannot tell what you wanted to implement
without it ;-)

Thanks.
Junio C Hamano Aug. 2, 2024, 5:30 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

>> +is-base:<committish>::
>> +	In at most one row, `(<committish>)` will appear to indicate the ref
>> +	that minimizes the number of commits in the first-parent history of
>> +	`<committish>` and not in the first-parent history of the ref.

This was a bit too dense for me to grok.  So if I have a <commit>
that is at the tip of a branch B forked from 'master', and then
'master' advanced by a lot since the branch forked, the number this
is minimizing for 'master' is the commits on the branch B, but when
showing 'maint', then even though the branch B may have the tip of
'maint' as an ancestor, the number for 'maint' would be a lot more
than the number for 'master'.  If there were another branch C that
was forked from 'master' and shared some (or all) commits that are
near the tip of branch B, e.g.

    ---o---o---o---o---o---o---o---o---o 'master'
            \
             o---o---o---o 'C'
                      \    
                       o---o---o---o 'B'

then the number may be even smaller for branch 'C' than 'master'.

And for at most one ref, %(is-base:<commit>) becomes "(<commit>)";
for all other refs, it becomes an empty string.

OK.

> OK.  Knowing what definition you used is crucial when reading the
> implementation, as we cannot tell what you wanted to implement
> without it ;-)
>
> Thanks.