Message ID | 20230309220405.219212-1-anarcat@debian.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] hooks--pre-push.sample: identify branch point | expand |
Hi Antoine, On Thu, Mar 9, 2023 at 4:34 PM Antoine Beaupré <anarcat@debian.org> wrote: > https://stackoverflow.com/a/71193866/1174784 > > There are currently a whopping twenty-five answers to that question in > that thread, and I'm hoping the community here will have a more > definitive answer to this question. I have picked the answer that uses > the least possible external commands, but it still uses a `tail -1` > which I'm somewhat unhappy about. I have thought of using > `--max-count` for this instead, but I understand that probably does > the equivalent of a `head -n` *and* it's applied before `--reverse`, > so there's not other way to do this. I spent an inordinate amount of time trying to answer that question a decade ago, and my conclusion after trying every possible combination is that it's simply not possible. Every solution at the end of the day will be a hack that can be broken with a corner case. It has already been discussed in this mailing list [1], and nobody found a solution. That's why I wrote a patch to implement a branch@{tail} helper to show an auxiliary ref to the beginning of the branch. I don't think I ever sent it to the mailing list, as my patches are rarely merged, but I'm sure I have it somewhere. The other solution I thought of was adding an update-branch hook that could be run every time a branch is updated, and then the hook would update the branch tail reference [2]. As expected, that patch wasn't merged either. It's interesting how we keep coming back to the same problems; right now there's a discussion in the git-users mailing list precisely about the same topic: how to find the branch point, in particular so `git name-rev` shows the correct branch a commit belongs to (which is otherwise just a bad guess). FWIW my motivation at the time was to prove Mercurial users wrong regarding features that they have and Git doesn't, I contended that Mercurial named branches (aka commit labels) were not necessary, and everything they achieved could be achieved in Git through different means. That turned out to be untrue, as there is one thing Mercurial can do that Git can't: show the precise point a branch started from. I abandoned my efforts back then, but the topic seems inescapable, as that is also needed by new tools like `git range-diff`, so in my own tool to keep track of patch series (`git send-series`[2])I end up creating a ton of refs just to properly keep track of the branch points of my different patch series. If only Git developers acknowledged the current very real limitation, something could be done about it. Cheers. [1] https://lore.kernel.org/git/CAMP44s0f7AJPQSTDgvy0U7vx8nxzq2a3vMhSr2Tcc61fetFkJA@mail.gmail.com/ [2] https://lore.kernel.org/git/1398047016-21643-1-git-send-email-felipe.contreras@gmail.com/ [3] https://github.com/felipec/git-send-series
On 2023-03-09 17:22:55, Felipe Contreras wrote: > Hi Antoine, > > On Thu, Mar 9, 2023 at 4:34 PM Antoine Beaupré <anarcat@debian.org> wrote: > >> https://stackoverflow.com/a/71193866/1174784 >> >> There are currently a whopping twenty-five answers to that question in >> that thread, and I'm hoping the community here will have a more >> definitive answer to this question. I have picked the answer that uses >> the least possible external commands, but it still uses a `tail -1` >> which I'm somewhat unhappy about. I have thought of using >> `--max-count` for this instead, but I understand that probably does >> the equivalent of a `head -n` *and* it's applied before `--reverse`, >> so there's not other way to do this. > > I spent an inordinate amount of time trying to answer that question a > decade ago, and my conclusion after trying every possible combination > is that it's simply not possible. Every solution at the end of the day > will be a hack that can be broken with a corner case. It has already > been discussed in this mailing list [1], and nobody found a solution. That's what I have gathered from reading through that Stack Overflow thread as well. > That's why I wrote a patch to implement a branch@{tail} helper to show > an auxiliary ref to the beginning of the branch. I don't think I ever > sent it to the mailing list, as my patches are rarely merged, but I'm > sure I have it somewhere. That would be interesting for the world to see, I bet, if only as a future reference to avoid other people trying to bang their head on the problem the same way. :) > The other solution I thought of was adding an update-branch hook that > could be run every time a branch is updated, and then the hook would > update the branch tail reference [2]. As expected, that patch wasn't > merged either. That seems like a major change in workflow though, adding basically a new ref for each branch, right? > It's interesting how we keep coming back to the same problems; right > now there's a discussion in the git-users mailing list precisely about > the same topic: how to find the branch point, in particular so `git > name-rev` shows the correct branch a commit belongs to (which is > otherwise just a bad guess). Well, it's a need people certainly seem to have. :) I feel we are letting perfection be the enemy of good here. No, there are no solutions that work for the general case, you always find a corner case that breaks it. But what if we could have a simple solution that works for *most* cases and then *fails* gracefully for the corner cases? In the case of the pre-push hook, specifically, we don't absolutely need something completely rock solid; if your branches are a mess of merges and cherry-picks and cross merges, yes, it might get confused but it's not like it's going to lose a commit or something. The worse case is that it's going to miss *checking* a commit and for this case it's not satisfying, but it's also not data loss. > FWIW my motivation at the time was to prove Mercurial users wrong > regarding features that they have and Git doesn't, I contended that > Mercurial named branches (aka commit labels) were not necessary, and > everything they achieved could be achieved in Git through different > means. That turned out to be untrue, as there is one thing Mercurial > can do that Git can't: show the precise point a branch started from. I have given up on Mercurial a long, long time ago. It's extremely rare that I find myself in such a situation and typically, there various hacks that answer the need without going into too much complexity. The only reason I raise the issue here is because I wasn't satisfied hardcoding "master" (or main, for that matter) in the hook because I wanted a more generic answer. I suspect many people could be perfectly fine with hardcoding that in their hook or, failing that, with the incomplete heuristic I am proposing. Or they could even have a per-branch .git/config entry to map the branch to an upstream branch, and *that* could even "default" to "main" or whatever that setting is called now. :) A.
On Fri, Mar 10, 2023 at 10:28 AM Antoine Beaupré <anarcat@debian.org> wrote: > > On 2023-03-09 17:22:55, Felipe Contreras wrote: > > Hi Antoine, > > > > On Thu, Mar 9, 2023 at 4:34 PM Antoine Beaupré <anarcat@debian.org> wrote: > > > >> https://stackoverflow.com/a/71193866/1174784 > >> > >> There are currently a whopping twenty-five answers to that question in > >> that thread, and I'm hoping the community here will have a more > >> definitive answer to this question. I have picked the answer that uses > >> the least possible external commands, but it still uses a `tail -1` > >> which I'm somewhat unhappy about. I have thought of using > >> `--max-count` for this instead, but I understand that probably does > >> the equivalent of a `head -n` *and* it's applied before `--reverse`, > >> so there's not other way to do this. > > > > I spent an inordinate amount of time trying to answer that question a > > decade ago, and my conclusion after trying every possible combination > > is that it's simply not possible. Every solution at the end of the day > > will be a hack that can be broken with a corner case. It has already > > been discussed in this mailing list [1], and nobody found a solution. > > That's what I have gathered from reading through that Stack Overflow > thread as well. > > > That's why I wrote a patch to implement a branch@{tail} helper to show > > an auxiliary ref to the beginning of the branch. I don't think I ever > > sent it to the mailing list, as my patches are rarely merged, but I'm > > sure I have it somewhere. > > That would be interesting for the world to see, I bet, if only as a > future reference to avoid other people trying to bang their head on the > problem the same way. :) OK, I've rebased my patches on top of the current master (which wasn't easy) and sent them to the mailing list [1]. They are pretty hacky, but they show what an actual solution could look like. > > The other solution I thought of was adding an update-branch hook that > > could be run every time a branch is updated, and then the hook would > > update the branch tail reference [2]. As expected, that patch wasn't > > merged either. > > That seems like a major change in workflow though, adding basically a > new ref for each branch, right? There's no change in the workflow, the user keeps typing the same commands, there's just extra information. But yeah, every branch now has two refs. > > It's interesting how we keep coming back to the same problems; right > > now there's a discussion in the git-users mailing list precisely about > > the same topic: how to find the branch point, in particular so `git > > name-rev` shows the correct branch a commit belongs to (which is > > otherwise just a bad guess). > > Well, it's a need people certainly seem to have. :) > > I feel we are letting perfection be the enemy of good here. No, there > are no solutions that work for the general case, you always find a > corner case that breaks it. But what if we could have a simple solution > that works for *most* cases and then *fails* gracefully for the corner > cases? I did propose such a solution, I wrote extensive tests to make sure it worked properly, but it was largely ignored [2]. The solution with --exclude-first-parent-only fails my tests in a very complex case: X (master) \ A (topic) Sure, it's probably easy to fix, but the point is that a reliable and robust solution everyone agrees with doesn't exist. > In the case of the pre-push hook, specifically, we don't absolutely need > something completely rock solid; if your branches are a mess of merges > and cherry-picks and cross merges, yes, it might get confused but it's > not like it's going to lose a commit or something. The worse case is > that it's going to miss *checking* a commit and for this case it's not > satisfying, but it's also not data loss. But I bet you want the simple case of a sequence of commits with no merges to properly detect something. > > FWIW my motivation at the time was to prove Mercurial users wrong > > regarding features that they have and Git doesn't, I contended that > > Mercurial named branches (aka commit labels) were not necessary, and > > everything they achieved could be achieved in Git through different > > means. That turned out to be untrue, as there is one thing Mercurial > > can do that Git can't: show the precise point a branch started from. > > I have given up on Mercurial a long, long time ago. It's extremely rare > that I find myself in such a situation and typically, there various > hacks that answer the need without going into too much complexity. > > The only reason I raise the issue here is because I wasn't satisfied > hardcoding "master" (or main, for that matter) in the hook because I > wanted a more generic answer. I suspect many people could be perfectly > fine with hardcoding that in their hook or, failing that, with the > incomplete heuristic I am proposing. > > Or they could even have a per-branch .git/config entry to map the branch > to an upstream branch, and *that* could even "default" to "main" or > whatever that setting is called now. :) Sounds like you are talking about the upstream tracking branch [3]. Are you familiar with that? Cheers. [1] https://lore.kernel.org/git/20230310214515.39154-1-felipe.contreras@gmail.com/ [2] https://stackoverflow.com/a/10821591/10474 [3] https://felipec.wordpress.com/2013/09/01/advanced-git-concepts-the-upstream-tracking-branch/
On 2023-03-10 16:09:43, Felipe Contreras wrote: > On Fri, Mar 10, 2023 at 10:28 AM Antoine Beaupré <anarcat@debian.org> wrote: >> >> On 2023-03-09 17:22:55, Felipe Contreras wrote: >> > Hi Antoine, >> > >> > On Thu, Mar 9, 2023 at 4:34 PM Antoine Beaupré <anarcat@debian.org> wrote: >> > [...] >> > It's interesting how we keep coming back to the same problems; right >> > now there's a discussion in the git-users mailing list precisely about >> > the same topic: how to find the branch point, in particular so `git >> > name-rev` shows the correct branch a commit belongs to (which is >> > otherwise just a bad guess). >> >> Well, it's a need people certainly seem to have. :) >> >> I feel we are letting perfection be the enemy of good here. No, there >> are no solutions that work for the general case, you always find a >> corner case that breaks it. But what if we could have a simple solution >> that works for *most* cases and then *fails* gracefully for the corner >> cases? > > I did propose such a solution, I wrote extensive tests to make sure it > worked properly, but it was largely ignored [2]. > > The solution with --exclude-first-parent-only fails my tests in a very > complex case: > > X (master) > \ > A (topic) > > Sure, it's probably easy to fix, but the point is that a reliable and > robust solution everyone agrees with doesn't exist. Hm... that's odd, I'm surprised that doesn't work. But that's certainly a "special" (!) case that should be handled properly. [...] >> Or they could even have a per-branch .git/config entry to map the branch >> to an upstream branch, and *that* could even "default" to "main" or >> whatever that setting is called now. :) > > Sounds like you are talking about the upstream tracking branch [3]. > Are you familiar with that? No, I'm not refering to branch.NAME.upstream here, sorry if my use of "upstream" here was confusing. I mean "the branch this branch has been forked from" not "the upstream equivalent to this local branch". a.
On Sun, Mar 12, 2023 at 12:14 PM Antoine Beaupré <anarcat@debian.org> wrote: > > On 2023-03-10 16:09:43, Felipe Contreras wrote: > > On Fri, Mar 10, 2023 at 10:28 AM Antoine Beaupré <anarcat@debian.org> wrote: > >> > >> On 2023-03-09 17:22:55, Felipe Contreras wrote: > >> > Hi Antoine, > >> > > >> > On Thu, Mar 9, 2023 at 4:34 PM Antoine Beaupré <anarcat@debian.org> wrote: > >> > > > [...] > > >> > It's interesting how we keep coming back to the same problems; right > >> > now there's a discussion in the git-users mailing list precisely about > >> > the same topic: how to find the branch point, in particular so `git > >> > name-rev` shows the correct branch a commit belongs to (which is > >> > otherwise just a bad guess). > >> > >> Well, it's a need people certainly seem to have. :) > >> > >> I feel we are letting perfection be the enemy of good here. No, there > >> are no solutions that work for the general case, you always find a > >> corner case that breaks it. But what if we could have a simple solution > >> that works for *most* cases and then *fails* gracefully for the corner > >> cases? > > > > I did propose such a solution, I wrote extensive tests to make sure it > > worked properly, but it was largely ignored [2]. > > > > The solution with --exclude-first-parent-only fails my tests in a very > > complex case: > > > > X (master) > > \ > > A (topic) > > > > Sure, it's probably easy to fix, but the point is that a reliable and > > robust solution everyone agrees with doesn't exist. > > Hm... that's odd, I'm surprised that doesn't work. But that's certainly > a "special" (!) case that should be handled properly. That's because the command wasn't meant to be called from a script, but by a human who knows what he is doing. To make it into a command that "just works" regardless of the situation some work would be needed to make sure it works in all the cases people have already debated. My command just works, I would be willing to do the work of investigating if --exclude-first-parent-only could be used instead, but it's not very tempting to do that work again if it's going to be ignored again. > >> Or they could even have a per-branch .git/config entry to map the branch > >> to an upstream branch, and *that* could even "default" to "main" or > >> whatever that setting is called now. :) > > > > Sounds like you are talking about the upstream tracking branch [3]. > > Are you familiar with that? > > No, I'm not refering to branch.NAME.upstream here, sorry if my use of > "upstream" here was confusing. I mean "the branch this branch has been > forked from" not "the upstream equivalent to this local branch". Unfortunately Git conflates two different concepts into @{upstream}: the branch we want to rebase to, and the branch we want to be merged to. By "upstream" I meant the upstream tracking branch when it's configured to the branch we want to rebase to. For example: git switch --create topic --track master In this case topic@{u} is the branch that we forked from. In my fork of git I de-conflate these two concepts, which are clearly different: @{upstream} versus @{publish}. In my personal workflow @{upstream} is *always* the branch I forked from, and I want to rebase to, and when it's not configured "master" is a safe default. Because it's tedious to do this check every time, I have a script to basically do: local u="${branch}@{u}" git rev-parse --verify --quiet "$u" || u=master echo "${u}..$branch" It would be nice if git supported @{upstream|default} or even better: @{base}. Cheers.
diff --git a/templates/hooks--pre-push.sample b/templates/hooks--pre-push.sample index 4ce688d32b..f871b65195 100755 --- a/templates/hooks--pre-push.sample +++ b/templates/hooks--pre-push.sample @@ -33,8 +33,24 @@ do else if test "$remote_oid" = "$zero" then - # New branch, examine all commits - range="$local_oid" + # new branch + # + # search for a base branch that's part of this branch, latest modified + # + # it's a better heuristic than hardcoding "master" or "main" + base_branch=$(git for-each-ref \ + --merged="$local_ref" \ + --no-contains="$local_ref" \ + --format="%(refname:strip=-1)" \ + --sort='-*authordate' \ + refs/heads ) + # find the place where we branched off the base branch + branch_point=$(git rev-parse \ + $(git rev-list --exclude-first-parent-only \ + ^"$base_branch" "$local_ref"| tail -1)^ \ + ) + # examine all commits up to the branch point + range="$branch_point..$local_oid" else # Update to existing branch, examine new commits range="$remote_oid..$local_oid"
The pre-push hook introduced in 87c86dd14a (Add sample pre-push hook script, 2013-01-13) has a pretty naive implementation that inspects the entirety of that branch history, regardless of previous merges. In other words, if you create a topic branch from a current history, the entire history will be inspected by the pre-push hook. In my case, there was an old "WIP" commit log that broke the hook, even though that commit wasn't specific to the branch in question, nor was it introduced by the push. This patch aims at fixing that problem by restricting the revisions inspected when a new branch is pushed to something that is more specific to that branch. This implementation will first attempt to find an ancestor that the current branch is related to (`--merged=`). This is where this implementation is the most questionable; normally you would put `master` or `main` as a base branch, but who knows what people actually use for this nowadays. And besides, it's fair to assume you could be pushing something based on a branch that already exists upstream that is *not* master or main... But still, that's a tricky bit I'm not sure of. Then we find the "branch point" which is the latest commit on the ancestor branch that's shared with the inspected ref. This, interestingly, seems to be a really tricky problem as well. I base my implementation off this answer on Stack Overflow (I know! at least it's not ChatGPT!): https://stackoverflow.com/a/71193866/1174784 There are currently a whopping twenty-five answers to that question in that thread, and I'm hoping the community here will have a more definitive answer to this question. I have picked the answer that uses the least possible external commands, but it still uses a `tail -1` which I'm somewhat unhappy about. I have thought of using `--max-count` for this instead, but I understand that probably does the equivalent of a `head -n` *and* it's applied before `--reverse`, so there's not other way to do this. The final question to answer here is whether this is a good idea in the first place, and whether this is the right place to answer this kind of question. I happen to really like using pre-push (instead of pre-commit) for inspecting my work before submitting it upstream, so it was a natural fit for me, but this might be everyone's taste. As the subject indicates, I would very much welcome comments on this. I would be happy to submit a more elaborate version of this (e.g. with unit tests) if it's interesting for the community, or receive guidance on where best this could be implemented or improved. Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- templates/hooks--pre-push.sample | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)