diff mbox series

[RFC] hooks--pre-push.sample: identify branch point

Message ID 20230309220405.219212-1-anarcat@debian.org (mailing list archive)
State New, archived
Headers show
Series [RFC] hooks--pre-push.sample: identify branch point | expand

Commit Message

Antoine Beaupré March 9, 2023, 10:04 p.m. UTC
The pre-push hook introduced in 87c86dd14a (Add sample pre-push hook
script, 2013-01-13) has a pretty naive implementation that inspects
the entirety of that branch history, regardless of previous merges.

In other words, if you create a topic branch from a current history,
the entire history will be inspected by the pre-push hook. In my case,
there was an old "WIP" commit log that broke the hook, even though
that commit wasn't specific to the branch in question, nor was it
introduced by the push.

This patch aims at fixing that problem by restricting the revisions inspected when a new branch is pushed to something that is more specific to that branch.

This implementation will first attempt to find an ancestor that the
current branch is related to (`--merged=`). This is where this
implementation is the most questionable; normally you would put
`master` or `main` as a base branch, but who knows what people
actually use for this nowadays. And besides, it's fair to assume you
could be pushing something based on a branch that already exists
upstream that is *not* master or main... But still, that's a tricky
bit I'm not sure of.

Then we find the "branch point" which is the latest commit on the
ancestor branch that's shared with the inspected ref. This,
interestingly, seems to be a really tricky problem as well. I base my
implementation off this answer on Stack Overflow (I know! at least
it's not ChatGPT!):

https://stackoverflow.com/a/71193866/1174784

There are currently a whopping twenty-five answers to that question in
that thread, and I'm hoping the community here will have a more
definitive answer to this question. I have picked the answer that uses
the least possible external commands, but it still uses a `tail -1`
which I'm somewhat unhappy about. I have thought of using
`--max-count` for this instead, but I understand that probably does
the equivalent of a `head -n` *and* it's applied before `--reverse`,
so there's not other way to do this.

The final question to answer here is whether this is a good idea in
the first place, and whether this is the right place to answer this
kind of question. I happen to really like using pre-push (instead of
pre-commit) for inspecting my work before submitting it upstream, so
it was a natural fit for me, but this might be everyone's taste.

As the subject indicates, I would very much welcome comments on
this. I would be happy to submit a more elaborate version of
this (e.g. with unit tests) if it's interesting for the community, or
receive guidance on where best this could be implemented or improved.

Signed-off-by: Antoine Beaupré <anarcat@debian.org>
---
 templates/hooks--pre-push.sample | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

Comments

Felipe Contreras March 9, 2023, 11:22 p.m. UTC | #1
Hi Antoine,

On Thu, Mar 9, 2023 at 4:34 PM Antoine Beaupré <anarcat@debian.org> wrote:

> https://stackoverflow.com/a/71193866/1174784
>
> There are currently a whopping twenty-five answers to that question in
> that thread, and I'm hoping the community here will have a more
> definitive answer to this question. I have picked the answer that uses
> the least possible external commands, but it still uses a `tail -1`
> which I'm somewhat unhappy about. I have thought of using
> `--max-count` for this instead, but I understand that probably does
> the equivalent of a `head -n` *and* it's applied before `--reverse`,
> so there's not other way to do this.

I spent an inordinate amount of time trying to answer that question a
decade ago, and my conclusion after trying every possible combination
is that it's simply not possible. Every solution at the end of the day
will be a hack that can be broken with a corner case. It has already
been discussed in this mailing list [1], and nobody found a solution.

That's why I wrote a patch to implement a branch@{tail} helper to show
an auxiliary ref to the beginning of the branch. I don't think I ever
sent it to the mailing list, as my patches are rarely merged, but I'm
sure I have it somewhere.

The other solution I thought of was adding an update-branch hook that
could be run every time a branch is updated, and then the hook would
update the branch tail reference [2]. As expected, that patch wasn't
merged either.

It's interesting how we keep coming back to the same problems; right
now there's a discussion in the git-users mailing list precisely about
the same topic: how to find the branch point, in particular so `git
name-rev` shows the correct branch a commit belongs to (which is
otherwise just a bad guess).

FWIW my motivation at the time was to prove Mercurial users wrong
regarding features that they have and Git doesn't, I contended that
Mercurial named branches (aka commit labels) were not necessary, and
everything they achieved could be achieved in Git through different
means. That turned out to be untrue, as there is one thing Mercurial
can do that Git can't: show the precise point a branch started from.

I abandoned my efforts back then, but the topic seems inescapable, as
that is also needed by new tools like `git range-diff`, so in my own
tool to keep track of patch series (`git send-series`[2])I end up
creating a ton of refs just to properly keep track of the branch
points of my different patch series.

If only Git developers acknowledged the current very real limitation,
something could be done about it.

Cheers.

[1] https://lore.kernel.org/git/CAMP44s0f7AJPQSTDgvy0U7vx8nxzq2a3vMhSr2Tcc61fetFkJA@mail.gmail.com/
[2] https://lore.kernel.org/git/1398047016-21643-1-git-send-email-felipe.contreras@gmail.com/
[3] https://github.com/felipec/git-send-series
Antoine Beaupré March 10, 2023, 4:28 p.m. UTC | #2
On 2023-03-09 17:22:55, Felipe Contreras wrote:
> Hi Antoine,
>
> On Thu, Mar 9, 2023 at 4:34 PM Antoine Beaupré <anarcat@debian.org> wrote:
>
>> https://stackoverflow.com/a/71193866/1174784
>>
>> There are currently a whopping twenty-five answers to that question in
>> that thread, and I'm hoping the community here will have a more
>> definitive answer to this question. I have picked the answer that uses
>> the least possible external commands, but it still uses a `tail -1`
>> which I'm somewhat unhappy about. I have thought of using
>> `--max-count` for this instead, but I understand that probably does
>> the equivalent of a `head -n` *and* it's applied before `--reverse`,
>> so there's not other way to do this.
>
> I spent an inordinate amount of time trying to answer that question a
> decade ago, and my conclusion after trying every possible combination
> is that it's simply not possible. Every solution at the end of the day
> will be a hack that can be broken with a corner case. It has already
> been discussed in this mailing list [1], and nobody found a solution.

That's what I have gathered from reading through that Stack Overflow
thread as well.

> That's why I wrote a patch to implement a branch@{tail} helper to show
> an auxiliary ref to the beginning of the branch. I don't think I ever
> sent it to the mailing list, as my patches are rarely merged, but I'm
> sure I have it somewhere.

That would be interesting for the world to see, I bet, if only as a
future reference to avoid other people trying to bang their head on the
problem the same way. :)

> The other solution I thought of was adding an update-branch hook that
> could be run every time a branch is updated, and then the hook would
> update the branch tail reference [2]. As expected, that patch wasn't
> merged either.

That seems like a major change in workflow though, adding basically a
new ref for each branch, right?

> It's interesting how we keep coming back to the same problems; right
> now there's a discussion in the git-users mailing list precisely about
> the same topic: how to find the branch point, in particular so `git
> name-rev` shows the correct branch a commit belongs to (which is
> otherwise just a bad guess).

Well, it's a need people certainly seem to have. :)

I feel we are letting perfection be the enemy of good here. No, there
are no solutions that work for the general case, you always find a
corner case that breaks it. But what if we could have a simple solution
that works for *most* cases and then *fails* gracefully for the corner
cases?

In the case of the pre-push hook, specifically, we don't absolutely need
something completely rock solid; if your branches are a mess of merges
and cherry-picks and cross merges, yes, it might get confused but it's
not like it's going to lose a commit or something. The worse case is
that it's going to miss *checking* a commit and for this case it's not
satisfying, but it's also not data loss.

> FWIW my motivation at the time was to prove Mercurial users wrong
> regarding features that they have and Git doesn't, I contended that
> Mercurial named branches (aka commit labels) were not necessary, and
> everything they achieved could be achieved in Git through different
> means. That turned out to be untrue, as there is one thing Mercurial
> can do that Git can't: show the precise point a branch started from.

I have given up on Mercurial a long, long time ago. It's extremely rare
that I find myself in such a situation and typically, there various
hacks that answer the need without going into too much complexity.

The only reason I raise the issue here is because I wasn't satisfied
hardcoding "master" (or main, for that matter) in the hook because I
wanted a more generic answer. I suspect many people could be perfectly
fine with hardcoding that in their hook or, failing that, with the
incomplete heuristic I am proposing.

Or they could even have a per-branch .git/config entry to map the branch
to an upstream branch, and *that* could even "default" to "main" or
whatever that setting is called now. :)

A.
Felipe Contreras March 10, 2023, 10:09 p.m. UTC | #3
On Fri, Mar 10, 2023 at 10:28 AM Antoine Beaupré <anarcat@debian.org> wrote:
>
> On 2023-03-09 17:22:55, Felipe Contreras wrote:
> > Hi Antoine,
> >
> > On Thu, Mar 9, 2023 at 4:34 PM Antoine Beaupré <anarcat@debian.org> wrote:
> >
> >> https://stackoverflow.com/a/71193866/1174784
> >>
> >> There are currently a whopping twenty-five answers to that question in
> >> that thread, and I'm hoping the community here will have a more
> >> definitive answer to this question. I have picked the answer that uses
> >> the least possible external commands, but it still uses a `tail -1`
> >> which I'm somewhat unhappy about. I have thought of using
> >> `--max-count` for this instead, but I understand that probably does
> >> the equivalent of a `head -n` *and* it's applied before `--reverse`,
> >> so there's not other way to do this.
> >
> > I spent an inordinate amount of time trying to answer that question a
> > decade ago, and my conclusion after trying every possible combination
> > is that it's simply not possible. Every solution at the end of the day
> > will be a hack that can be broken with a corner case. It has already
> > been discussed in this mailing list [1], and nobody found a solution.
>
> That's what I have gathered from reading through that Stack Overflow
> thread as well.
>
> > That's why I wrote a patch to implement a branch@{tail} helper to show
> > an auxiliary ref to the beginning of the branch. I don't think I ever
> > sent it to the mailing list, as my patches are rarely merged, but I'm
> > sure I have it somewhere.
>
> That would be interesting for the world to see, I bet, if only as a
> future reference to avoid other people trying to bang their head on the
> problem the same way. :)

OK, I've rebased my patches on top of the current master (which wasn't
easy) and sent them to the mailing list [1]. They are pretty hacky,
but they show what an actual solution could look like.

> > The other solution I thought of was adding an update-branch hook that
> > could be run every time a branch is updated, and then the hook would
> > update the branch tail reference [2]. As expected, that patch wasn't
> > merged either.
>
> That seems like a major change in workflow though, adding basically a
> new ref for each branch, right?

There's no change in the workflow, the user keeps typing the same
commands, there's just extra information.

But yeah, every branch now has two refs.

> > It's interesting how we keep coming back to the same problems; right
> > now there's a discussion in the git-users mailing list precisely about
> > the same topic: how to find the branch point, in particular so `git
> > name-rev` shows the correct branch a commit belongs to (which is
> > otherwise just a bad guess).
>
> Well, it's a need people certainly seem to have. :)
>
> I feel we are letting perfection be the enemy of good here. No, there
> are no solutions that work for the general case, you always find a
> corner case that breaks it. But what if we could have a simple solution
> that works for *most* cases and then *fails* gracefully for the corner
> cases?

I did propose such a solution, I wrote extensive tests to make sure it
worked properly, but it was largely ignored [2].

The solution with --exclude-first-parent-only fails my tests in a very
complex case:

   X (master)
    \
     A (topic)

Sure, it's probably easy to fix, but the point is that a reliable and
robust solution everyone agrees with doesn't exist.

> In the case of the pre-push hook, specifically, we don't absolutely need
> something completely rock solid; if your branches are a mess of merges
> and cherry-picks and cross merges, yes, it might get confused but it's
> not like it's going to lose a commit or something. The worse case is
> that it's going to miss *checking* a commit and for this case it's not
> satisfying, but it's also not data loss.

But I bet you want the simple case of a sequence of commits with no
merges to properly detect something.

> > FWIW my motivation at the time was to prove Mercurial users wrong
> > regarding features that they have and Git doesn't, I contended that
> > Mercurial named branches (aka commit labels) were not necessary, and
> > everything they achieved could be achieved in Git through different
> > means. That turned out to be untrue, as there is one thing Mercurial
> > can do that Git can't: show the precise point a branch started from.
>
> I have given up on Mercurial a long, long time ago. It's extremely rare
> that I find myself in such a situation and typically, there various
> hacks that answer the need without going into too much complexity.
>
> The only reason I raise the issue here is because I wasn't satisfied
> hardcoding "master" (or main, for that matter) in the hook because I
> wanted a more generic answer. I suspect many people could be perfectly
> fine with hardcoding that in their hook or, failing that, with the
> incomplete heuristic I am proposing.
>
> Or they could even have a per-branch .git/config entry to map the branch
> to an upstream branch, and *that* could even "default" to "main" or
> whatever that setting is called now. :)

Sounds like you are talking about the upstream tracking branch [3].
Are you familiar with that?

Cheers.

[1] https://lore.kernel.org/git/20230310214515.39154-1-felipe.contreras@gmail.com/
[2] https://stackoverflow.com/a/10821591/10474
[3] https://felipec.wordpress.com/2013/09/01/advanced-git-concepts-the-upstream-tracking-branch/
Antoine Beaupré March 12, 2023, 6:14 p.m. UTC | #4
On 2023-03-10 16:09:43, Felipe Contreras wrote:
> On Fri, Mar 10, 2023 at 10:28 AM Antoine Beaupré <anarcat@debian.org> wrote:
>>
>> On 2023-03-09 17:22:55, Felipe Contreras wrote:
>> > Hi Antoine,
>> >
>> > On Thu, Mar 9, 2023 at 4:34 PM Antoine Beaupré <anarcat@debian.org> wrote:
>> >

[...]

>> > It's interesting how we keep coming back to the same problems; right
>> > now there's a discussion in the git-users mailing list precisely about
>> > the same topic: how to find the branch point, in particular so `git
>> > name-rev` shows the correct branch a commit belongs to (which is
>> > otherwise just a bad guess).
>>
>> Well, it's a need people certainly seem to have. :)
>>
>> I feel we are letting perfection be the enemy of good here. No, there
>> are no solutions that work for the general case, you always find a
>> corner case that breaks it. But what if we could have a simple solution
>> that works for *most* cases and then *fails* gracefully for the corner
>> cases?
>
> I did propose such a solution, I wrote extensive tests to make sure it
> worked properly, but it was largely ignored [2].
>
> The solution with --exclude-first-parent-only fails my tests in a very
> complex case:
>
>    X (master)
>     \
>      A (topic)
>
> Sure, it's probably easy to fix, but the point is that a reliable and
> robust solution everyone agrees with doesn't exist.

Hm... that's odd, I'm surprised that doesn't work. But that's certainly
a "special" (!) case that should be handled properly.

[...]

>> Or they could even have a per-branch .git/config entry to map the branch
>> to an upstream branch, and *that* could even "default" to "main" or
>> whatever that setting is called now. :)
>
> Sounds like you are talking about the upstream tracking branch [3].
> Are you familiar with that?

No, I'm not refering to branch.NAME.upstream here, sorry if my use of
"upstream" here was confusing. I mean "the branch this branch has been
forked from" not "the upstream equivalent to this local branch".

a.
Felipe Contreras March 16, 2023, 5:32 p.m. UTC | #5
On Sun, Mar 12, 2023 at 12:14 PM Antoine Beaupré <anarcat@debian.org> wrote:
>
> On 2023-03-10 16:09:43, Felipe Contreras wrote:
> > On Fri, Mar 10, 2023 at 10:28 AM Antoine Beaupré <anarcat@debian.org> wrote:
> >>
> >> On 2023-03-09 17:22:55, Felipe Contreras wrote:
> >> > Hi Antoine,
> >> >
> >> > On Thu, Mar 9, 2023 at 4:34 PM Antoine Beaupré <anarcat@debian.org> wrote:
> >> >
>
> [...]
>
> >> > It's interesting how we keep coming back to the same problems; right
> >> > now there's a discussion in the git-users mailing list precisely about
> >> > the same topic: how to find the branch point, in particular so `git
> >> > name-rev` shows the correct branch a commit belongs to (which is
> >> > otherwise just a bad guess).
> >>
> >> Well, it's a need people certainly seem to have. :)
> >>
> >> I feel we are letting perfection be the enemy of good here. No, there
> >> are no solutions that work for the general case, you always find a
> >> corner case that breaks it. But what if we could have a simple solution
> >> that works for *most* cases and then *fails* gracefully for the corner
> >> cases?
> >
> > I did propose such a solution, I wrote extensive tests to make sure it
> > worked properly, but it was largely ignored [2].
> >
> > The solution with --exclude-first-parent-only fails my tests in a very
> > complex case:
> >
> >    X (master)
> >     \
> >      A (topic)
> >
> > Sure, it's probably easy to fix, but the point is that a reliable and
> > robust solution everyone agrees with doesn't exist.
>
> Hm... that's odd, I'm surprised that doesn't work. But that's certainly
> a "special" (!) case that should be handled properly.

That's because the command wasn't meant to be called from a script,
but by a human who knows what he is doing.

To make it into a command that "just works" regardless of the
situation some work would be needed to make sure it works in all the
cases people have already debated.

My command just works, I would be willing to do the work of
investigating if  --exclude-first-parent-only could be used instead,
but it's not very tempting to do that work again if it's going to be
ignored again.

> >> Or they could even have a per-branch .git/config entry to map the branch
> >> to an upstream branch, and *that* could even "default" to "main" or
> >> whatever that setting is called now. :)
> >
> > Sounds like you are talking about the upstream tracking branch [3].
> > Are you familiar with that?
>
> No, I'm not refering to branch.NAME.upstream here, sorry if my use of
> "upstream" here was confusing. I mean "the branch this branch has been
> forked from" not "the upstream equivalent to this local branch".

Unfortunately Git conflates two different concepts into @{upstream}:
the branch we want to rebase to, and the branch we want to be merged
to. By "upstream" I meant the upstream tracking branch when it's
configured to the branch we want to rebase to. For example:

  git switch --create topic --track master

In this case topic@{u} is the branch that we forked from.

In my fork of git I de-conflate these two concepts, which are clearly
different: @{upstream} versus @{publish}.

In my personal workflow @{upstream} is *always* the branch I forked
from, and I want to rebase to, and when it's not configured "master"
is a safe default.

Because it's tedious to do this check every time, I have a script to
basically do:

  local u="${branch}@{u}"
  git rev-parse --verify --quiet "$u" || u=master
  echo "${u}..$branch"

It would be nice if git supported @{upstream|default} or even better: @{base}.

Cheers.
diff mbox series

Patch

diff --git a/templates/hooks--pre-push.sample b/templates/hooks--pre-push.sample
index 4ce688d32b..f871b65195 100755
--- a/templates/hooks--pre-push.sample
+++ b/templates/hooks--pre-push.sample
@@ -33,8 +33,24 @@  do
 	else
 		if test "$remote_oid" = "$zero"
 		then
-			# New branch, examine all commits
-			range="$local_oid"
+			# new branch
+			#
+			# search for a base branch that's part of this branch, latest modified
+			#
+			# it's a better heuristic than hardcoding "master" or "main"
+			base_branch=$(git for-each-ref \
+					  --merged="$local_ref" \
+					  --no-contains="$local_ref" \
+					  --format="%(refname:strip=-1)" \
+					  --sort='-*authordate' \
+					  refs/heads )
+			# find the place where we branched off the base branch
+			branch_point=$(git rev-parse \
+					   $(git rev-list --exclude-first-parent-only \
+						 ^"$base_branch" "$local_ref"| tail -1)^ \
+                                    )
+			# examine all commits up to the branch point
+		        range="$branch_point..$local_oid"
 		else
 			# Update to existing branch, examine new commits
 			range="$remote_oid..$local_oid"