Message ID | 20230310214515.39154-1-felipe.contreras@gmail.com (mailing list archive) |
---|---|
Series | Support for tail (branch point) experiment |
Felipe Contreras <felipe.contreras@gmail.com> writes:

> This is *not* meant as a serious proposal, it's just an exploration of an
> idea.

It is easy to explain and understand the benefit of keeping a
separate pointer to the bottom [*] of the branch on top of which the
history leading to the commit at the tip of the branch has been
built, but the devil is in the details of how such a bottom pointer
will be maintained.

    Side note: below, I use "bottom" because for me it is the most
    natural term to refer to the starting end of the range of
    commits. In the context of this topic, readers can replace any
    "bottom" they see with "tail", if they prefer.

In a sense, this is very similar to the idea of "notes". It is easy
to explain and understand that a bag of objects, in which additional
data can be associated with an object name, can be used to keep
track of extra data on commits (and other objects) after they are
created without invalidating their object name, as long as the notes
are copied/moved when a commit is used to create another copy of it.
The "notes" are automatically copied across "rebasing", which is one
of the many details that make the "notes" usable, but cherry-pick,
which does not honor notes.rewriteRef, sometimes leads to frustration.

Creation of a new branch with "git branch" would be an obvious point
to add such a bottom pointer, and "git rebase" is a good point to
update such a bottom pointer. But there are many other ways that
people update their branches, depending on the workflow, and
guessing when to update the bottom pointer and trying to be complete
with the heuristics will lead to the same "no, we do not know all
users' workflows" that made approaches based on reflog parsing
etc. fail to solve the "where did the branch start?" puzzle.

And I think what is sketched in these RFC patches can be a good
starting point for a solution that strikes a good balance. "git
rebase", which is the most common way to mangle branches, is taught
to update the bottom pointer automatically.

Giving users an explicit way to set the bottom when manipulating
branches would help those who mangle their branches with something
other than "git rebase" in the most trivial form. I suspect that is
still missing in this RFC?

Of course other things on the consuming side may be missing, like
send-email or format-patch, but they are a lot more trivial to add and
will be useful. As long as the bottom pointer is properly maintained,
that is.

A few of the things that I often do to mangle my branches are
listed below. Some of them are not applications of "git rebase" in
the trivial form:

 * I have a patch series (single strand of pearls). I update it on
   top of the updated upstream:

   $ git rebase -i --onto master @{bottom}
   $ git range-diff @{bottom}@{1}..@{1} @{bottom}..HEAD

   No, this is not what "I often do" yet, but I hope to see it become
   doable. Rebase the current branch from its bottom on top of the
   master, and then take the range diff between the old branch
   (i.e. @{bottom} refers to the bottom pointer, but because it is
   implemented as a ref, its reflog knows what the previous value of
   it was---@{bottom}@{1}..@{1} would be the range of commits on the
   branch before I did the above rebase) and the new one.

 * I have a 7-patch series (single strand of pearls). I only need to
   touch the top 3.

   $ git rebase -i HEAD~3
   $ git range-diff @{1}...

   In this case, I am not updating the bottom to HEAD~3 and reducing
   the branch into a 3-patch series. I am keeping the bottom of the
   branch, and the commits that happen to be updated are only the
   topmost 3.

 * In the same situation, but the top 3 in the original are so bad
   that I am better off redoing them from scratch, taking advantage
   of new features in 'master'.

   $ git checkout --detach master
   ... work on detached HEAD ...
   ... first pick the bottom commits ...
   $ git cherry-pick master..@{-1}~3
   ... still working on detached HEAD ...
   ... redo the topmost commits from scratch ...
   $ git range-diff master..@{-1} master..
   $ git checkout -B @{-1}

   I do not mind "checkout -B" *not* learning any trick to
   automatically update the bottom pointer for the branch to
   'master' in this case, but I should be able to manually update
   the bottom of the branch easily. Something like "git checkout -B
   @{-1} --set-bottom=master" might be acceptable here.

 * I have an existing series, and want to replace it. To keep the
   reflog of these branches useful, I apply patches, fix the author's
   mistakes, etc., on detached HEAD and update the original branch
   after everything is done.

   $ git checkout --detach master...
     # This could be "git checkout --detach @{bottom}"
   $ git am -s mbox
   $ git range-diff @{-1}...
   $ git checkout -B @{-1}

   In this case, the bottom of the branch should stay the same.

 * I tried to do the above, but failed at the "git am" step, because
   the new iteration needs to be applied on top of the updated master.

   $ git checkout --detach master...
     # This could be "git checkout --detach @{bottom}"
   $ git am -s mbox
   $ git am --abort
   $ git reset --hard master
   $ git am -s mbox
   $ git range-diff master..@{-1} master..
   $ git checkout -B @{-1} # or "git branch -f @{-1}"

   In this case, I should be able to manually update the bottom of
   the branch, and making it easy (e.g. "git checkout -B @{-1}
   --set-bottom=master") is much easier and more robust than
   teaching "checkout -B" to guess my intention.

IOW, I do not mind if maintenance of the bottom of the branch is not
always automatic (and prone to a heuristic making an incorrect guess).
But I think we should make sure it is easy for the user to assist
the tool to maintain it correctly [*].

    Side note: and that is what I find "frustrating" in the "notes"
    world. "notes" can be copied after cherry-pick manually, but
    that is a very tedious process, and at some point, being "merely
    possible" stops having much value, unless it is "easily
    doable".

There are of course other things people do to their branches, and I
do not think we need to teach all the tools used in these workflows
to update the bottom pointer automatically (even though the more we
can do automatically, the easier it would be for users, as long as
the automation never makes any mistakes).

Again, I think the key to the success of this "we record the fork
point of a branch" idea is to make it easy and simple for users to
help the tools to maintain it correctly.

Thanks.
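The first workflow in the list above (rebase from the bottom, then range-diff old against new) can be approximated today with an ordinary ref that is maintained by hand. What follows is only a sketch under stated assumptions: `refs/bottoms/<branch>` is a hypothetical namespace chosen for illustration, not what the RFC patches implement, and the `@{bottom}` syntax does not exist in git, so the full ref name is spelled out. The script builds a throwaway repository so it can be run as-is.

```shell
#!/bin/sh
# Sketch: emulate a per-branch "bottom" pointer with an ordinary ref.
# refs/bottoms/<branch> is a hypothetical namespace; git itself gives it no meaning.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# commit <file> <message>: create one small commit.
commit() {
    echo "$2" >"$1"
    git add "$1"
    git commit -q -m "$2"
}

commit base.txt "base"

# Start a topic and record where it starts.
# --create-reflog makes the pointer keep its own history.
git checkout -q -b topic
git update-ref --create-reflog -m "branch: created" refs/bottoms/topic master
commit a.txt "topic: a"
commit b.txt "topic: b"

# Upstream moves on.
git checkout -q master
commit upstream.txt "upstream"

# Rebase from the recorded bottom onto the new master,
# then move the pointer by hand (the step the RFC would automate).
git checkout -q topic
git rebase -q --onto master refs/bottoms/topic
git update-ref -m "rebase: onto master" refs/bottoms/topic master

# Old range (before the rebase) versus the new range.
git range-diff 'refs/bottoms/topic@{1}..topic@{1}' refs/bottoms/topic..topic
```

Because the pointer is a ref with a reflog, `refs/bottoms/topic@{1}` names the previous bottom, which is what makes the "old range" expressible. This only lines up when the last update of both the branch and the pointer was the same rebase.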
On Fri, Mar 10, 2023 at 6:04 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
> > This is *not* meant as a serious proposal, it's just an exploration of an
> > idea.
>
> It is easy to explain and understand the benefit of keeping a
> separate pointer to the bottom [*] of the branch on top of which the
> history leading to the commit at the tip of the branch has been
> built, but the devil is in the details of how such a bottom pointer
> will be maintained.
>
>     Side note: below, I use "bottom" because for me it is the most
>     natural term to refer to the starting end of the range of
>     commits. In the context of this topic, readers can replace any
>     "bottom" they see with "tail", if they prefer.

Perhaps @{base} would be better (I think that was my original name).

Mercurial has an experimental feature called "topics", and that's the
name they use for the starting point of a topic.

> In a sense, this is very similar to the idea of "notes". It is easy
> to explain and understand that a bag of objects, in which additional
> data can be associated with an object name, can be used to keep
> track of extra data on commits (and other objects) after they are
> created without invalidating their object name, as long as the notes
> are copied/moved when a commit is used to create another copy of it.
> The "notes" are automatically copied across "rebasing", which is one
> of the many details that make the "notes" usable, but cherry-pick,
> which does not honor notes.rewriteRef, sometimes leads to frustration.

I implemented that in 2014 [1]. There's no actual reason for that to
not work in 2023 if we wanted.

But this is an argument in favor of @{base} (or whatever): even if
notes are not perfect, they can still be useful in certain situations,
and having them is certainly better than not having that information.

Similarly, @{base} doesn't have to be perfect in the first iteration.
The natural points at which it's updated can be implemented later;
just by existing, it would provide some potentially useful information
to the user, which is better than nothing.

> Creation of a new branch with "git branch" would be an obvious point
> to add such a bottom pointer, and "git rebase" is a good point to
> update such a bottom pointer. But there are many other ways that
> people update their branches, depending on the workflow, and
> guessing when to update the bottom pointer and trying to be complete
> with the heuristics will lead to the same "no, we do not know all
> users' workflows" that made approaches based on reflog parsing
> etc. fail to solve the "where did the branch start?" puzzle.
>
> And I think what is sketched in these RFC patches can be a good
> starting point for a solution that strikes a good balance. "git
> rebase", which is the most common way to mangle branches, is taught
> to update the bottom pointer automatically.
>
> Giving users an explicit way to set the bottom when manipulating
> branches would help those who mangle their branches with something
> other than "git rebase" in the most trivial form. I suspect that is
> still missing in this RFC?

Yes, we would want a way to update the base manually, just like with
@{upstream}.

> Of course other things on the consuming side may be missing, like
> send-email or format-patch, but they are a lot more trivial to add and
> will be useful. As long as the bottom pointer is properly maintained,
> that is.

Yes, but that can be done later. If @{base} is useful and updated in a
good enough manner, users are obviously going to want it used in tools
like `git send-email`, but even before that, just being able to do
`@{base}..` is useful (even if the base is maintained manually).

> A few of the things that I often do to mangle my branches are
> listed below. Some of them are not applications of "git rebase" in
> the trivial form:
>
>  * I have a patch series (single strand of pearls). I update it on
>    top of the updated upstream:
>
>    $ git rebase -i --onto master @{bottom}
>    $ git range-diff @{bottom}@{1}..@{1} @{bottom}..HEAD
>
>    No, this is not what "I often do" yet, but I hope to see it become
>    doable. Rebase the current branch from its bottom on top of the
>    master, and then take the range diff between the old branch
>    (i.e. @{bottom} refers to the bottom pointer, but because it is
>    implemented as a ref, its reflog knows what the previous value of
>    it was---@{bottom}@{1}..@{1} would be the range of commits on the
>    branch before I did the above rebase) and the new one.

That would work only if the last update was a rebase. To make it work
reliably we would need some sort of branchlog.

Personally I have a similar use case, but I want to use range-diff
mainly before sending a patch series. What my tool `git send-series`
does is store, for example, `refs/sent/test-aggregate/v2` and
`refs/sent/test-aggregate/v2-tail`. Conceptually this is v2 of the
patch series.

>  * I have a 7-patch series (single strand of pearls). I only need to
>    touch the top 3.
>
>    $ git rebase -i HEAD~3
>    $ git range-diff @{1}...
>
>    In this case, I am not updating the bottom to HEAD~3 and reducing
>    the branch into a 3-patch series. I am keeping the bottom of the
>    branch, and the commits that happen to be updated are only the
>    topmost 3.

Right, maybe the base should be updated only when --onto is supplied,
or perhaps there should even be a new --base option so it's clear the
user wants the new behavior.

>  * In the same situation, but the top 3 in the original are so bad
>    that I am better off redoing them from scratch, taking advantage
>    of new features in 'master'.
>
>    $ git checkout --detach master
>    ... work on detached HEAD ...
>    ... first pick the bottom commits ...
>    $ git cherry-pick master..@{-1}~3
>    ... still working on detached HEAD ...
>    ... redo the topmost commits from scratch ...
>    $ git range-diff master..@{-1} master..
>    $ git checkout -B @{-1}
>
>    I do not mind "checkout -B" *not* learning any trick to
>    automatically update the bottom pointer for the branch to
>    'master' in this case, but I should be able to manually update
>    the bottom of the branch easily. Something like "git checkout -B
>    @{-1} --set-bottom=master" might be acceptable here.

Yes, something like that would be needed.

One obvious use case for me is "show me the current branch", as in
`git log @{base}..@`. Because `git log` is very efficient, that's
usually not necessary, but I often launch `gitk`, and it's annoying
that it tries to load *all* the reachable commits, wasting resources
and polluting the view. That is why I started developing a tool that
essentially did `gitk $1@{u}..$1`, but that quickly becomes complex if
upstream isn't configured.

With my tool I can do `git vs` (show the current branch visually), or
`git ls` (show the current branch on the command line).

Weirdly enough, Mercurial's new topic extension has a command that
shows precisely that: `hg stack` shows only the commits on the current
topic (starting from a base).

And this reminds me of the previous discussion: What actually is a
branch? [2]

If we can agree that `branch@{base}..branch` semantically is
*something* (whatever you want to call it), then it might make sense
to have a way to refer to it, for example `branch^b` or `branch+`.

Then interesting combinations immediately become obvious, for example
your:

  git range-diff @{bottom}@{1}..@{1} @@{bottom}..@

Becomes:

  git range-diff @{1}+ @+

Then if we expand that we can see that @{base} should be an operation
on @{1} (@{1}@{base}), not the other way around.

> IOW, I do not mind if maintenance of the bottom of the branch is not
> always automatic (and prone to a heuristic making an incorrect guess).
> But I think we should make sure it is easy for the user to assist
> the tool to maintain it correctly [*].
>
>     Side note: and that is what I find "frustrating" in the "notes"
>     world. "notes" can be copied after cherry-pick manually, but
>     that is a very tedious process, and at some point, being "merely
>     possible" stops having much value, unless it is "easily
>     doable".

Agreed. Similarly, I did not start to use @{upstream} until it was
easy to use.

But again: @{upstream} was not easy to use at the start, and @{base}
doesn't have to be either.

I think the important thing not to forget is that this is useful
information, and many would argue git is missing it.

Cheers.

[1] https://lore.kernel.org/git/1398307491-21314-13-git-send-email-felipe.contreras@gmail.com/
[2] https://lore.kernel.org/git/60e61bbd7a37d_3030aa2081a@natae.notmuch/
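The "show me the current branch" use case from the reply, and the remark that it gets complicated when no upstream is configured, can be illustrated with a small helper. This is a sketch, not the author's `git ls` or `git vs` tool: `branch_range` is a hypothetical function name, and `refs/bottoms/<branch>` is an assumed, hand-maintained ref standing in for the proposed `@{base}`. It prefers the recorded bottom and falls back to `@{upstream}` when one exists.

```shell
#!/bin/sh
# branch_range <branch>: print "<start>..<branch>" for use with git log or gitk.
# refs/bottoms/<branch> is a hypothetical, hand-maintained ref; it is an
# assumption of this sketch and not something git provides.
branch_range() {
    branch=$1
    if git rev-parse -q --verify "refs/bottoms/$branch" >/dev/null; then
        # A recorded bottom exists: use it as the start of the range.
        echo "refs/bottoms/$branch..$branch"
    elif git rev-parse -q --verify "$branch@{upstream}" >/dev/null 2>&1; then
        # No recorded bottom: fall back to the configured upstream.
        echo "$branch@{upstream}..$branch"
    else
        echo "no recorded bottom or upstream for $branch" >&2
        return 1
    fi
}
```

Inside a repository, something like `git log --oneline "$(branch_range topic)"` would then list only the commits of `topic`, and the same range can be handed to `gitk`.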