[RFC,0/3] Support for tail (branch point) experiment

Message ID: 20230310214515.39154-1-felipe.contreras@gmail.com

Message

Felipe Contreras March 10, 2023, 9:45 p.m. UTC
This is *not* meant as a serious proposal; it's just an exploration of an
idea.

The topic of finding the actual point a branch started to fork has been
discussed for well over a decade [1], and yet no clear solution is in
sight. That's why this idea I had in 2013 keeps coming back.

The idea is simple: add the concept of a branch tail (e.g.
`master@{tail}`).

The motivation is that Git's main competitor, Mercurial, does have the
ability to tell with 100% accuracy where a branch started, while Git
does not.

Many hacks have been proposed, such as parsing the commit messages for
"Merge branch", using the reflog, or adding options like
`--exclude-first-parent-only`. All these clever solutions fail in one
way or another.

If we stopped trying to be clever we could go for the easy solution:
simply add a tail marker.

This has many advantages:

 * `git rebase` can simply use that
 * `git send-email` can use that
 * `git range-diff` can use that
 * `git name-rev` will now be accurate

I know most of my tools (`git send-series`, `git related`, and `git
smart-list`) would greatly benefit from this information.
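The marker itself can be approximated with stock Git today. What follows is a minimal sketch, assuming a hypothetical `refs/tails/<branch>` ref namespace chosen for illustration (stock Git defines no such ref, and this is not how the RFC patches are claimed to work): the fork point is recorded when the branch is created, and `tail..branch` then selects exactly the branch's own commits. It also shows why `master..branch` is no substitute when a branch is forked from another topic branch.

```shell
#!/bin/sh
# Sketch: emulate a branch tail with an ordinary ref.
# refs/tails/<branch> is a hypothetical namespace, not a Git feature.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Helper: make one commit that adds a file named after the message.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file base
git checkout -q -B master

# topic-a forks from master; topic-b forks from topic-a.
git checkout -q -b topic-a master
git update-ref refs/tails/topic-a master
commit_file a-1

git checkout -q -b topic-b topic-a
git update-ref refs/tails/topic-b topic-a
commit_file b-1
commit_file b-2

# master..topic-b also picks up topic-a's commit; the tail does not.
via_master=$(git rev-list --count master..topic-b)
via_tail=$(git rev-list --count refs/tails/topic-b..topic-b)
echo "master..topic-b: $via_master commits, tail..topic-b: $via_tail commits"
```

The hard part, as the replies below point out, is keeping such a ref correct as the branch is rewritten; the sketch only shows what the information buys once it is there.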

Moreover, for one of the most important commands of Git, it makes much
more sense semantically:

    git rebase --onto branch@{upstream} branch@{tail} branch

than the current:

    git rebase --onto something branch@{upstream} branch
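The first form can be tried out with a plain ref standing in for `branch@{tail}`. The sketch below assumes the same hypothetical `refs/tails/<branch>` namespace: it replays exactly `tail..branch` onto the updated upstream with the existing `git rebase --onto <newbase> <upstream> <branch>` syntax, then moves the tail by hand. That manual step is roughly what the third patch's subject ("rebase: update branch tail after rebasing") suggests automating; the ref name and the explicit `update-ref` are assumptions of this sketch.

```shell
#!/bin/sh
# Sketch: "git rebase --onto <new base> <tail> <branch>" with a tail ref.
# refs/tails/<branch> is a hypothetical namespace, not a Git feature.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Helper: make one commit that adds a file named after the message.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file base
git checkout -q -B master

git checkout -q -b topic
git update-ref refs/tails/topic master
commit_file topic-1
commit_file topic-2

# Upstream moves on.
git checkout -q master
commit_file upstream-1

# Replay exactly tail..topic on top of the new master ...
git rebase -q --onto master refs/tails/topic topic
# ... then move the tail along with the branch (done by hand here).
git update-ref refs/tails/topic master

moved=$(git rev-list --count refs/tails/topic..topic)
git --no-pager log --oneline refs/tails/topic..topic
```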

I'm not expecting this to be seriously considered (given the track
record of my proposals), but now the idea is on the record, so it can be
referenced in future discussions (which are likely not going to just
stop).

Cheers.

[1] https://stackoverflow.com/questions/1527234/finding-a-branch-point-with-git/71193866#71193866

Felipe Contreras (3):
  branch: add new 'tail' concept
  sha1-name: add @{tail} helpers
  rebase: update branch tail after rebasing

 branch.c                  | 12 ++++++++++++
 branch.h                  |  1 +
 builtin/clone.c           |  1 +
 builtin/rebase.c          | 15 +++++++++++++++
 object-name.c             | 30 +++++++++++++++++++++++++++++-
 t/t1514-rev-parse-tail.sh | 39 +++++++++++++++++++++++++++++++++++++++
 6 files changed, 97 insertions(+), 1 deletion(-)
 create mode 100755 t/t1514-rev-parse-tail.sh

Comments

Junio C Hamano March 11, 2023, 12:04 a.m. UTC | #1
Felipe Contreras <felipe.contreras@gmail.com> writes:

> This is *not* meant as a serious proposal; it's just an exploration of an
> idea.

It is easy to explain and understand the benefit of keeping a
separate pointer to the bottom [*] of the branch on top of which the
history leading to the commit at the tip of the branch has been
built, but the devil is in the details of how such a bottom pointer
will be maintained.

    side note: below, I use "bottom" because for me it is the most
    natural term to refer to the starting end of the range of
    commits.  In the context of this topic, readers can replace any
    "bottom" they see with "tail", if they prefer.

In a sense, this is very similar to the idea of "notes".  It is easy
to explain and understand that a bag of objects, in which additional
data can be associated with an object name, can be used to keep
track of extra data on commits (and other objects) after they are
created without invalidating their object name, as long as they are
copied/moved when a commit is used to create another copy of it.
The "notes" are automatically copied across "rebasing", which is one
of the many details that makes the "notes" usable, but cherry-pick
that does not honor notes.rewriteRef sometimes leads to frustration.

Creation of a new branch with "git branch" would be an obvious point
to add such a bottom pointer, and "git rebase" is a good point to
update such a bottom pointer.  But there are many other ways that
people update their branches, depending on the workflow, and
guessing when to update the bottom pointer and trying to be complete
with the heuristics will lead to the same "no, we do not know all
users' workflows" that made approaches based on reflog parsing
etc. fail to solve the "where did the branch start?" puzzle.

And I think what is sketched in these RFC patches can be a good
starting point for a solution that strikes a good balance.  "git
rebase", which is the most common way to mangle branches, is taught
to update the bottom pointer automatically.

Giving users an explicit way to set the bottom when manipulating
branches would help those who mangle their branches with something
other than "git rebase" in the most trivial form.  I suspect that is
still missing in this RFC?  Of course other things on the consuming
side may be missing, like send-email or format-patch, but they are a
lot more trivial to add and will be useful.  As long as the bottom
pointer is properly maintained, that is.

A few of the things that I often do to mangle my branches are
listed.  Some of them are not application of "git rebase" in the
trivial form:

 * I have a patch series (single strand of pearls).  I update on
   top of the updated upstream:

    $ git rebase -i --onto master @{bottom}
    $ git range-diff @{bottom}@{1}..@{1} @{bottom}..HEAD

   No, this is not what "I often do" yet, but I hope to see it become
   doable.  Rebase the current branch from its bottom on top of the
   master, and then take the range diff between the old branch
   (i.e. @{bottom} refers to the bottom pointer, but because it is
   implemented as a ref, its reflog knows what the previous value of
   it was---@{bottom}@{1}..@{1} would be the range of commits on the
   branch before I did the above rebase) and the new one.

 * I have a 7-patch series (single strand of pearls).  I only need to
   touch the top 3.

    $ git rebase -i HEAD~3
    $ git range-diff @{1}...

   In this case, I am not updating the bottom to HEAD~3 and reducing
   the branch into a 3-patch series.  I am keeping the bottom of the
   branch, and the commits that happen to be updated are only the
   topmost 3.

 * In the same situation, but the top 3 in the original are so bad
   that I am better off redoing them from scratch, taking advantage
   of new features in 'master'.

    $ git checkout --detach master
    ... work on detached HEAD ...
    ... first pick the bottom commits ...
    $ git cherry-pick master..@{-1}~3
    ... still working on detached HEAD ...
    ... redo the topmost commits from scratch ...
    $ git range-diff master..@{-1} master..
    $ git checkout -B @{-1}

   I do not mind "checkout -B" *not* learning any trick to
   automatically update the bottom pointer for the branch to
   'master' in this case, but I should be able to manually update
   the bottom of the branch easily.  Something like "git checkout -B
   @{-1} --set-bottom=master" might be acceptable here.

 * I have an existing series, and want to replace it.  To keep the
   reflog of these branches useful, I apply patches, fix author's
   mistakes, etc., on detached HEAD and update the original branch
   after everything is done.

    $ git checkout --detach master...
    # This could be "git checkout --detach @{bottom}"
    $ git am -s mbox
    $ git range-diff @{-1}...
    $ git checkout -B @{-1}

   In this case, the bottom of the branch should stay the same.

 * I tried to do the above, but failed at the "git am" step, because
   the new iteration needs to be applied on top of the updated master.

    $ git checkout --detach master...
    # This could be "git checkout --detach @{bottom}"
    $ git am -s mbox
    $ git am --abort
    $ git reset --hard master
    $ git am -s mbox
    $ git range-diff master..@{-1} master..
    $ git checkout -B @{-1}
    # or "git branch -f @{-1}"

   In this case, I should be able to manually update the bottom of
   the branch, and giving me an easy way to do so (e.g. "git checkout
   -B @{-1} --set-bottom=master") is simpler and more robust than
   teaching "checkout -B" to guess my intention.
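The first workflow above depends on the bottom pointer having a reflog of its own. That part can already be emulated. A minimal sketch, assuming a hypothetical `refs/tails/topic` ref stands in for `@{bottom}`: `git update-ref --create-reflog` gives the ref a reflog (by default only refs such as `refs/heads/`, `refs/remotes/`, `refs/notes/` and `HEAD` are logged), so `refs/tails/topic@{1}..topic@{1}` names the series as it was before the rebase. A non-interactive rebase replaces `rebase -i` so the script runs unattended.

```shell
#!/bin/sh
# Sketch: range-diff old vs. new series using a reflogged bottom ref.
# refs/tails/<branch> is a hypothetical stand-in for @{bottom}.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Helper: make one commit that adds a file named after the message.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file base
git checkout -q -B master

git checkout -q -b topic
# refs/tails/* is not logged by default; the @{1} lookups below need it.
git update-ref --create-reflog refs/tails/topic master
commit_file topic-1
commit_file topic-2

git checkout -q master
commit_file upstream-1
git checkout -q topic

# "git rebase -i --onto master @{bottom}", minus the interactive part.
git rebase -q --onto master refs/tails/topic
git update-ref refs/tails/topic master

# Old series (bottom@{1}..branch@{1}) against the new one.
rd=$(git range-diff --no-color "refs/tails/topic@{1}..topic@{1}" refs/tails/topic..topic)
printf '%s\n' "$rd"
```

Both patches apply unchanged here, so each is reported as identical (`=`) between the two ranges.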

IOW, I do not mind if maintenance of the bottom of the branch is not
always automatic (and prone to heuristics making an incorrect guess).
But I think we should make sure it is easy for the user to assist
the tool to maintain it correctly [*].

    Side note: and that is what I find "frustrating" in the "notes"
    world.  "notes" can be copied after cherry-pick manually, but
    that is a very tedious process, and at some point, being "merely
    possible" stops having much value, unless it is "easily
    doable".

There are of course other things people do to their branches, and I
do not think we need to teach all the tools used in these workflows
to update the bottom pointer automatically (even though the more we
can do automatically would make it easier for users, as long as the
automation never makes any mistakes).  Again, I think the key to the
success for this "we record the fork point of a branch" idea is to
make it easy and simple for users to help the tools to maintain it
correctly.

Thanks.

Felipe Contreras March 11, 2023, 3:26 a.m. UTC | #2
On Fri, Mar 10, 2023 at 6:04 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
> > This is *not* meant as a serious proposal; it's just an exploration of an
> > idea.
>
> It is easy to explain and understand the benefit of keeping a
> separate pointer to the bottom [*] of the branch on top of which the
> history leading to the commit at the tip of the branch has been
> built, but the devil is in the details of how such a bottom pointer
> will be maintained.
>
>     side note: below, I use "bottom" because for me it is the most
>     natural term to refer to the starting end of the range of
>     commits.  In the context of this topic, readers can replace any
>     "bottom" they see with "tail", if they prefer.

Perhaps @{base} would be better (I think that was my original name).
Mercurial has an experimental feature called "topics", and that's the
name they use for the starting point of a topic.

> In a sense, this is very similar to the idea of "notes".  It is easy
> to explain and understand that a bag of objects, in which additional
> data can be associated with an object name, can be used to keep
> track of extra data on commits (and other objects) after they are
> created without invalidating their object name, as long as they are
> copied/moved when a commit is used to create another copy of it.
> The "notes" are automatically copied across "rebasing", which is one
> of the many details that makes the "notes" usable, but cherry-pick
> that does not honor notes.rewriteRef sometimes leads to frustration.

I implemented that (making cherry-pick copy notes) in 2014 [1].

There's no real reason it couldn't work in 2023 if we wanted it.

But this is an argument in favor of @{base} (or whatever): even if
notes are not perfect, they can still be useful in certain situations,
and they are certainly better than not having that information.
Similarly, @{base} doesn't have to be perfect in the first iteration;
the natural points at which it's updated can be implemented later. Just
by existing, it would provide some potentially useful information to
the user, which is better than nothing.

> Creation of a new branch with "git branch" would be an obvious point
> to add such a bottom pointer, and "git rebase" is a good point to
> update such a bottom pointer.  But there are many other ways that
> people update their branches, depending on the workflow, and
> guessing when to update the bottom pointer and trying to be complete
> with the heuristics will lead to the same "no, we do not know all
> users' workflows" that made approaches based on reflog parsing
> etc. fail to solve the "where did the branch start?" puzzle.
>
> And I think what is sketched in these RFC patches can be a good
> starting point for a solution that strikes a good balance.  "git
> rebase", which is the most common way to mangle branches, is taught
> to update the bottom pointer automatically.
>
> Giving users an explicit way to set the bottom when manipulating
> branches would help those who mangle their branches with something
> other than "git rebase" in the most trivial form.  I suspect that is
> still missing in this RFC?

Yes, we would want a way to update the base manually, just like with
@{upstream}.

> Of course other things on the consuming side may be missing, like
> send-email or format-patch, but they are a lot more trivial to add and
> will be useful.  As long as the bottom pointer is properly maintained,
> that is.

Yes, but that can be done later. If @{base} is useful and updated in a
good enough manner, users are obviously going to want it used in tools
like `git send-email`, but even before that, just being able to do
`@{base}..` is useful (even if manually).

> A few of the things that I often do to mangle my branches are
> listed.  Some of them are not application of "git rebase" in the
> trivial form:
>
>  * I have a patch series (single strand of pearls).  I update on
>    top of the updated upstream:
>
>     $ git rebase -i --onto master @{bottom}
>     $ git range-diff @{bottom}@{1}..@{1} @{bottom}..HEAD
>
>    No, this is not what "I often do" yet, but I hope to see it become
>    doable.  Rebase the current branch from its bottom on top of the
>    master, and then take the range diff between the old branch
>    (i.e. @{bottom} refers to the bottom pointer, but because it is
>    implemented as a ref, its reflog knows what the previous value of
>    it was---@{bottom}@{1}..@{1} would be the range of commits on the
>    branch before I did the above rebase) and the new one.

That would work only if the last update was a rebase. To make it work
reliably we would need some sort of branchlog.

Personally I have a similar use case, but I want to use range-diff
mainly before sending a patch series. My tool `git send-series`, for
example, stores `refs/sent/test-aggregate/v2` and
`refs/sent/test-aggregate/v2-tail`. Conceptually, this is v2 of the
patch series.
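That two-ref layout is easy to mimic by hand. A minimal sketch, assuming the `refs/sent/<series>/<version>` and `<version>-tail` naming described above, written directly with `git update-ref` by a hypothetical `record_sent` helper (this is not `git send-series` itself, whose interface is not shown here): because each version carries its own tail, a range-diff between versions needs no guessing, even after the series moved to a newer base.

```shell
#!/bin/sh
# Sketch: store each sent version as <version> and <version>-tail refs.
# record_sent is a hypothetical helper, not part of any real tool.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Helper: make one commit that adds a file named after the message.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

# Usage: record_sent <series> <version> <tip> <tail>
record_sent() {
    git update-ref "refs/sent/$1/$2" "$3"
    git update-ref "refs/sent/$1/$2-tail" "$4"
}

commit_file base
git checkout -q -B master
git checkout -q -b topic
commit_file topic-1
commit_file topic-2
record_sent topic v1 topic master

# Upstream moves on; the series is rebased and its last patch reworked.
git checkout -q master
commit_file upstream-1
git checkout -q topic
git rebase -q master
git reset -q --hard HEAD~1
echo "topic-2, reworked" > topic-2.txt
git add topic-2.txt
git commit -q -m topic-2
record_sent topic v2 topic master

s=refs/sent/topic
rd=$(git range-diff --no-color "$s/v1-tail..$s/v1" "$s/v2-tail..$s/v2")
printf '%s\n' "$rd"
```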

>  * I have a 7-patch series (single strand of pearls).  I only need to
>    touch the top 3.
>
>     $ git rebase -i HEAD~3
>     $ git range-diff @{1}...
>
>    In this case, I am not updating the bottom to HEAD~3 and reducing
>    the branch into a 3-patch series.  I am keeping the bottom of the
>    branch, and the commits that happen to be updated are only the
>    topmost 3.

Right, maybe the base should be updated only when --onto is supplied,
or perhaps even a new --base option so it's clear the user wants the
new behavior.

>  * In the same situation, but the top 3 in the original are so bad
>    that I am better off redoing them from scratch, taking advantage
>    of new features in 'master'.
>
>     $ git checkout --detach master
>     ... work on detached HEAD ...
>     ... first pick the bottom commits ...
>     $ git cherry-pick master..@{-1}~3
>     ... still working on detached HEAD ...
>     ... redo the topmost commits from scratch ...
>     $ git range-diff master..@{-1} master..
>     $ git checkout -B @{-1}
>
>    I do not mind "checkout -B" *not* learning any trick to
>    automatically update the bottom pointer for the branch to
>    'master' in this case, but I should be able to manually update
>    the bottom of the branch easily.  Something like "git checkout -B
>    @{-1} --set-bottom=master" might be acceptable here.

Yes, something like that would be needed.

One obvious use case for me is "show me the current branch", as in
`git log @{base}..@`. Because `git log` is very efficient, that's
usually not necessary, but I often launch `gitk`, and it's annoying
that it tries to load *all* reachable commits, wasting resources and
polluting the view. That is why I started developing a tool that
essentially does `gitk $1@{u}..$1`, but that quickly becomes complex if
upstream isn't configured. With my tool I can do `git vs` (show the
current branch visually), or `git ls` (show the current branch on the
command line).
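A rough approximation of such a "show me the current branch" command can be sketched as a shell function. It prefers `<branch>@{upstream}..<branch>` and falls back to a tail ref when no upstream is configured, which is exactly the case where the tail helps. `branch_range` and the `refs/tails/` namespace are inventions of this sketch, not part of Git or of the tools mentioned above; the printed range can be handed to `git log` or `gitk`.

```shell
#!/bin/sh
# Sketch: compute "the current branch" as a revision range.
# branch_range and refs/tails/<branch> are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Helper: make one commit that adds a file named after the message.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

# Print a range covering <branch>'s own commits (default: current branch).
# Prefers the configured upstream; falls back to a hypothetical tail ref.
branch_range() {
    _b=${1:-$(git symbolic-ref --short HEAD)}
    if git rev-parse --verify -q "$_b@{upstream}" >/dev/null 2>&1; then
        printf '%s@{upstream}..%s\n' "$_b" "$_b"
    elif git rev-parse --verify -q "refs/tails/$_b" >/dev/null 2>&1; then
        printf 'refs/tails/%s..%s\n' "$_b" "$_b"
    else
        echo "no upstream or tail for $_b" >&2
        return 1
    fi
}

commit_file base
git checkout -q -B master

git checkout -q -b topic-a master
git branch -q --set-upstream-to=master topic-a
commit_file a-1

# topic-b has no upstream, only a tail.
git checkout -q --no-track -b topic-b topic-a
git update-ref refs/tails/topic-b topic-a
commit_file b-1
commit_file b-2

# Like "git ls": the current branch (topic-b) on the command line.
git --no-pager log --oneline "$(branch_range)"
# Like "git vs" would be: gitk "$(branch_range)"
```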

Weirdly enough, Mercurial's new topic extension has a command that
does precisely that: `hg stack` shows only the commits on the current
topic (starting from a base).

And this reminds me of the previous discussion: What actually is a branch? [2]

If we can agree that `branch@{base}..branch` semantically is
*something* (whatever you want to call it), then it might make sense
to have a way to refer to it, for example `branch^b` or `branch+`.

Then interesting combinations immediately become obvious, for example your:

    git range-diff @{bottom}@{1}..@{1} @@{bottom}..@

Becomes:

    git range-diff @{1}+ @+

Then if we expand that we can see that @{base} should be an operation
on @{1} (@{1}@{base}), not the other way around.

> IOW, I do not mind if maintenance of the bottom of the branch is not
> always automatic (and prone to heuristics making an incorrect guess).
> But I think we should make sure it is easy for the user to assist
> the tool to maintain it correctly [*].
>
>     Side note: and that is what I find "frustrating" in the "notes"
>     world.  "notes" can be copied after cherry-pick manually, but
>     that is a very tedious process, and at some point, being "merely
>     possible" stops having much value, unless it is "easily
>     doable".

Agreed. Similarly, I did not start to use @{upstream} until it was easy to use.

But again: @{upstream} was not easy to use at the start, and @{base}
doesn't have to be either.

I think the important thing to not forget is that this is useful
information, and many would argue git is missing it.

Cheers.

[1] https://lore.kernel.org/git/1398307491-21314-13-git-send-email-felipe.contreras@gmail.com/
[2] https://lore.kernel.org/git/60e61bbd7a37d_3030aa2081a@natae.notmuch/