technical doc: add a design doc for the evolve command

Message ID	20181115005546.212538-1-sxenos@google.com (mailing list archive)
State	New, archived
Headers	Return-Path: <git-owner@kernel.org> Date: Wed, 14 Nov 2018 16:55:46 -0800 Message-Id: <20181115005546.212538-1-sxenos@google.com> Mime-Version: 1.0 Subject: [PATCH] technical doc: add a design doc for the evolve command From: sxenos@google.com To: git@vger.kernel.org Cc: sbeller@google.com, jrn@google.com, jch@google.com, jonathantanmy@google.com, stolee@gmail.com, carl@ecbaldwin.net, dborowitz@google.com, Stefan Xenos <sxenos@google.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: git-owner@vger.kernel.org Precedence: bulk
Series	technical doc: add a design doc for the evolve command

Stefan Xenos Nov. 15, 2018, 12:55 a.m. UTC

From: Stefan Xenos <sxenos@google.com>

This document describes what an obsolescence graph for
git would look like, the behavior of the evolve command,
and the changes planned for other commands.

Signed-off-by: Stefan Xenos <sxenos@google.com>
---
 Documentation/technical/evolve.txt | 885 +++++++++++++++++++++++++++++
 1 file changed, 885 insertions(+)
 create mode 100644 Documentation/technical/evolve.txt

Johannes Schindelin Nov. 15, 2018, 12:52 p.m. UTC | #1

Hi Stefan,

On Wed, 14 Nov 2018, sxenos@google.com wrote:

> From: Stefan Xenos <sxenos@google.com>
> 
> This document describes what an obsolescence graph for
> git would look like, the behavior of the evolve command,
> and the changes planned for other commands.

Thanks, this is a good discussion starter.

> +Objective
> +---------
> +Track the edits to a commit over time in an obsolescence graph.

I am not sure that we necessarily need this to be a graph. I think part of
the problem with not being able to GC *any* of this is caused by this
requirement to have it stored in a graph, rather than having mappings from
which you could reconstruct any non-GC'ed parts of that graph, if you
really want.

> +Background
> +----------
> +Imagine you have three dependent changes up for review and you receive feedback
> +that requires editing all three changes. While you're editing one, more feedback
> +arrives on one of the others. What do you do?
> +
> +The evolve command is a convenient way to work with chains of commits that are
> +under review. Whenever you rebase or amend a commit, the repository remembers
> +that the old commit is obsolete and has been replaced by the new one. Then, at
> +some point in the future, you can run "git evolve" and the correct sequence of
> +rebases will occur in the correct order such that no commit has an obsolete
> +parent.
> +
> +Part of making the "evolve" command work involves tracking the edits to a commit
> +over time, which is why we need an obsolescence graph. However, the obsolescence
> +graph will also bring other benefits:
> +
> +- Users can view the history of a commit directly (the sequence of amends and
> +  rebases it has undergone, orthogonal to the history of the branch it is on).
> +- It will be possible to quickly locate and list all the changes the user
> +  currently has in progress.
> +- It can be used as part of other high-level commands that combine or split
> +  changes.
> +- It can be used to decorate commits (in git log, gitk, etc) that are either
> +  obsolete or are the tip of a work in progress.
> +- By pushing and pulling the obsolescence graph, users can collaborate more
> +  easily on changes-in-progress. This is better than pushing and pulling the
> +  changes themselves since the obsolescence graph can be used to locate a more
> +  specific merge base, allowing for better merges between different versions of
> +  the same change.
> +- It could be used to correctly rebase local changes and other local branches
> +  after running git-filter-branch.
> +- It can replace the change-id footer used by gerrit.

Okay.

> +Similar technologies
> +--------------------
> +There are some other technologies that address the same end-user problem.
> +
> +Rebase -i can be used to solve the same problem, but users can't easily switch
> +tasks midway through an interactive rebase or have more than one interactive
> +rebase going on at the same time. It can't handle the case where you have
> +multiple changes sharing the same parent when that parent needs to be rebased
> +and won't let you collaborate with others on resolving a complicated interactive
> +rebase. You can think of rebase -i as a top-down approach and the evolve command
> +as the bottom-up approach to the same problem.
> +
> +Several patch queue managers have been built on top of git (such as topgit,
> +stgit, and quilt). They address the same user need. However, they also rely on
> +state managed outside git that needs to be kept in sync. Such state can be
> +easily damaged when running a git native command that is unaware of the patch
> +queue. They also typically require an explicit initialization step to be done by
> +the user which creates workflow problems.
> +
> +Replacements (refs/replace) are superficially similar to obsolescences in that
> +they describe that one commit should be replaced by another. However, they
> +differ in both how they are created and how they are intended to be used.
> +Obsolescences are created automatically by the commands a user runs, and they
> +describe the user’s intent to perform a future rebase. Obsolete commits still
> +appear in branches, logs, etc like normal commits (possibly with an extra
> +decoration that marks them as obsolete). Replacements are typically created
> +explicitly by the user, they are meant to be kept around for a long time, and
> +they describe a replacement to be applied at read-time rather than as the input
> +to a future operation. When a replaced commit is queried, it is typically hidden
> +and swapped out with its replacement as though the replacement has already
> +occurred.

Why is this missing most notably `hg evolve`? Also, there should be *at
least* a brief introduction how `hg evolve` works. They do have the
benefit of real-world testing, and probably encountered problems and came
up with solutions, and we would be remiss if we did not learn from them.

Also, please do not forget `git imerge`.

Further, I see that this document tries to suggest a proliferation of new
commands (`git change`, `git evolve`, `git obslog` and whatever I glanced
over). This smells a little bit like it wants to be condensed into a
single-purpose command, maybe `evolve`, maybe something better if you can
think of anything.

I guess I will have to stop now and read up on how `hg evolve` works. It
is a bit of a pity that that was not described in this document, first
thing, as it forces everybody who is interested in this patch to duplicate
my effort and also go hunt for information about Mercurial.

Ciao,
Johannes

> +Goals
> +-----
> +Legend: Goals marked with P0 are required. Goals marked with Pn should be
> +attempted unless they interfere with goals marked with Pn-1.
> +
> +P0. All commands that modify commits (such as the normal commit --amend or
> +    rebase command) should mark the old commit as being obsolete and replaced by
> +    the new one. No additional commands should be required to keep the
> +    obsolescence graph up-to-date.
> +P0. Any commit that may be involved in a future evolve command should not be
> +    garbage collected. Specifically:
> +    - Commits that obsolete another should not be garbage collected until
> +      user-specified conditions have occurred and the change has expired from
> +      the reflog. User specified conditions for removing changes include:
> +      - The user explicitly deleted the change.
> +      - The change was merged into a specific branch.
> +    - Commits that have been obsoleted by another should not be garbage
> +      collected if any of their replacements are still being retained.
> +P0. A commit can be obsoleted by more than one replacement (called divergence).
> +P0. Must be able to resolve divergence (convergence).
> +P1. Users should be able to share chains of obsolete changes in order to
> +    collaborate on WIP changes.
> +P2. Such sharing should be at the user’s option. That is, it should be possible
> +    to directly share a change without also sharing the file states or commit
> +    comments from the obsolete changes that led up to it, and the choice not to
> +    share those commits should not require changing any commit hashes.
> +P2. It should be possible to discard part or all of the obsolescence graph
> +    without discarding the commits themselves that are already present in
> +    branches and the reflog.
> +
> +
> +Overview
> +========
> +We introduce the notion of “meta-commits” which describe how one commit was
> +created from other commits. A branch of meta-commits is known as a change.
> +Changes are created and updated automatically whenever a user runs a command
> +that creates a commit. They are used for locating obsolete commits, providing a
> +list of a user’s unsubmitted work in progress, and providing a stable name for
> +each unsubmitted change.
> +
> +Users can exchange edit histories by pushing and fetching changes.
> +
> +New commands will be introduced for manipulating changes and resolving
> +divergence between them. Existing commands that create commits will be updated
> +to modify the meta-commit graph and create changes where necessary.
> +
> +Example usage
> +-------------
> +# First create three dependent changes
> +$ echo foo>bar.txt && git add .
> +$ git commit -m "This is a test"
> +created change metas/this_is_a_test
> +$ echo foo2>bar2.txt && git add .
> +$ git commit -m "This is also a test"
> +created change metas/this_is_also_a_test
> +$ echo foo3>bar3.txt && git add .
> +$ git commit -m "More testing"
> +created change metas/more_testing
> +
> +# List all our changes in progress
> +$ git change -l
> +metas/this_is_a_test
> +metas/this_is_also_a_test
> +* metas/more_testing
> +metas/some_change_already_merged_upstream
> +
> +# Now modify the earliest change, using its stable name
> +$ git reset --hard metas/this_is_a_test
> +$ echo morefoo>>bar.txt && git add . && git commit --amend --no-edit
> +
> +# Use git-evolve to fix up any dependent changes
> +$ git evolve
> +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> +rebasing metas/more_testing onto metas/this_is_also_a_test
> +Done
> +
> +# Use git-obslog to view the history of the this_is_a_test change
> +$ git obslog
> +93f110 metas/this_is_a_test@{0} commit (amend): This is a test
> +930219 metas/this_is_a_test@{1} commit: This is a test
> +
> +# Now create an unrelated change
> +$ git reset --hard origin/master
> +$ echo newchange>unrelated.txt && git add .
> +$ git commit -m "Unrelated change"
> +created change metas/unrelated_change
> +
> +# Fetch the latest code from origin/master and use git-evolve
> +# to rebase all dependent changes.
> +$ git fetch origin master
> +$ git evolve origin/master
> +deleting metas/some_change_already_merged_upstream
> +rebasing metas/this_is_a_test onto origin/master
> +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> +rebasing metas/more_testing onto metas/this_is_also_a_test
> +rebasing metas/unrelated_change onto origin/master
> +Conflict detected! Resolve it and then use git evolve --continue to resume.
> +
> +# Sort out the conflict
> +$ git mergetool
> +$ git evolve --continue
> +Done
> +
> +# Share the full history of edits for the this_is_a_test change
> +# with a review server
> +$ git push origin metas/this_is_a_test:refs/for/master
> +# Share the latest commit for “Unrelated change”, without history
> +$ git push origin HEAD:refs/for/master
> +
> +Detailed design
> +===============
> +Obsolescence information is stored as a graph of meta-commits. A meta-commit is
> +a specially-formatted merge commit that describes how one commit was created
> +from others.
> +
> +Meta-commits look like this:
> +
> +$ git cat-file -p <example_meta_commit>
> +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> +parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> +author Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> +committer Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> +parent-type content
> +parent-type obsolete
> +parent-type origin
> +
> +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by
> +cherry-picking commit 7e1bbcd3”.
> +
> +The tree for meta-commits is always the empty tree whose hash matches
> +4b825dc642cb6eb9a060e54bf8d69288fbee4904 exactly, but future versions of git may
> +attach other trees here. For forward-compatibility fsck should ignore such trees
> +if found on future repository versions. Similarly, current versions of git
> +should always fill in an empty commit comment and tools like fsck should ignore
> +the content of the commit comment if present in a future repository version.
> +This will allow future versions of git to add metadata to the meta-commit
> +comments or tree without breaking forwards compatibility.
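
The empty-tree hash quoted above can be checked independently of any repository. A minimal sketch, assuming the usual SHA-1 object format (the object id is the hash of a "<type> <size>" header, a NUL byte, and the payload); the helper name git_object_id is made up for illustration and is not part of the patch:

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Return the SHA-1 object id for a payload: sha1('<type> <size>\\0' + payload)."""
    header = f"{obj_type} {len(payload)}".encode() + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


# A tree with no entries has an empty payload.
EMPTY_TREE = git_object_id("tree", b"")
print(EMPTY_TREE)
```

The printed value should match the tree line in the example meta-commit.
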
> +
> +Parent-type
> +-----------
> +The “parent-type” field in the commit header identifies a commit as a
> +meta-commit and indicates the meaning for each of its parents. It is never
> +present for normal commits. It is a list of enum values whose order matches the
> +order of the parents. Possible parent types are:
> +
> +- content: the content parent identifies the commit that this meta-commit is
> +  describing.
> +- obsolete: indicates that this parent is made obsolete by the content parent.
> +- origin: indicates that this parent was generated from the given commit.
> +
> +There must be exactly one content parent for each meta-commit and it is always
> +the first parent. The content commit will always be a normal commit and not a
> +meta-commit. However, future versions of git may create meta-commits for other
> +meta-commits and the fsck tool must be aware of this for forwards compatibility.
> +
> +A meta-commit can have zero or more obsolete parents. An amend operation creates
> +a single obsolete parent. A merge used to resolve divergence (see divergence,
> +below) will create multiple obsolete parents. A meta-commit may have zero
> +obsolete parents if it describes a cherry-pick or squash merge that copies one
> +or more commits but does not replace them.
> +
> +A meta-commit can have zero or more origin parents. A cherry-pick creates a
> +single origin parent. Certain types of squash merge will create multiple origin
> +parents.
> +
> +An obsolete parent or origin parent may be either a normal commit (indicating
> +the oldest-known version of a change) or another meta-commit (for a change that
> +has already been modified one or more times).
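
To make the parent-type rules concrete, here is a rough Python model of a meta-commit as an ordered list of (parent, parent-type) pairs. This is only a sketch of the constraints stated above; the class and method names are hypothetical, and nothing here reflects how git would actually parse the header:

```python
from dataclasses import dataclass
from typing import List, Tuple

PARENT_TYPES = ("content", "obsolete", "origin")


@dataclass(frozen=True)
class MetaCommit:
    """Illustrative model: parents paired with their parent-type, in header order."""
    parents: Tuple[Tuple[str, str], ...]

    def validate(self) -> None:
        types = [kind for _, kind in self.parents]
        for kind in types:
            if kind not in PARENT_TYPES:
                raise ValueError(f"unknown parent-type: {kind}")
        # Exactly one content parent, and it must come first.
        if types.count("content") != 1:
            raise ValueError("exactly one content parent is required")
        if types[0] != "content":
            raise ValueError("the content parent must be the first parent")

    def parents_of(self, kind: str) -> List[str]:
        """Zero or more parents may carry the 'obsolete' or 'origin' type."""
        return [commit for commit, k in self.parents if k == kind]


# The example meta-commit from the "Detailed design" section.
example = MetaCommit((
    ("aa7ce55545bf2c14bef48db91af1a74e2347539a", "content"),
    ("d64309ee51d0af12723b6cb027fc9f195b15a5e9", "obsolete"),
    ("7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136", "origin"),
))
example.validate()
```
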
> +
> +Changes
> +-------
> +A branch of meta-commits describes how a commit was produced and what previous
> +commits it is based on. It is also an identifier for a thing the user is
> +currently working on. We refer to such a meta-branch as a change.
> +
> +Local changes are stored in the new refs/metas namespace. Remote changes are
> +stored in the refs/remotemetas/<remotename> namespace.
> +
> +The list of changes in refs/metas is more than just a mechanism for the evolve
> +command to locate obsolete commits. It is also a convenient list of all of a
> +user’s work in progress and their current state - a list of things they’re
> +likely to want to come back to.
> +
> +Strictly speaking, it is the presence of the branch in the refs/metas namespace
> +that marks a branch as being a change, not the fact that it points to a
> +metacommit. Metacommits are only created when a commit is amended or rebased, so
> +in the case where a change points to a commit that has never been modified, the
> +change points to that initial commit rather than a metacommit.
> +
> +Obsolescence
> +------------
> +A commit is considered obsolete if it is reachable via “obsolete” edges from
> +anywhere in the history of a change and it isn’t the head of that change.
> +Commits may be the content for 0 or more meta-commits. If the same commit
> +appears in multiple changes, it is not obsolete if it is the head of any of
> +those changes.
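
One possible reading of this definition, sketched in Python. The dict-based representation (change name to head, meta-commit id to its parents) and the function names are assumptions made for illustration, and the sketch follows only obsolete edges, not origin edges:

```python
from typing import Dict, List, Set

Meta = Dict[str, object]  # {"content": str, "obsolete": [str], "origin": [str]}


def content_of(node: str, metas: Dict[str, Meta]) -> str:
    """A change head is either a meta-commit or a plain, never-modified commit."""
    return metas[node]["content"] if node in metas else node


def obsolete_commits(changes: Dict[str, str], metas: Dict[str, Meta]) -> Set[str]:
    heads = {content_of(head, metas) for head in changes.values()}
    obsolete: Set[str] = set()
    seen: Set[str] = set()
    stack: List[str] = list(changes.values())
    while stack:
        node = stack.pop()
        if node in seen or node not in metas:
            continue
        seen.add(node)
        for parent in metas[node]["obsolete"]:
            obsolete.add(content_of(parent, metas))
            stack.append(parent)
    # A commit that is the head of any change is never obsolete.
    return obsolete - heads


# State after the "commit --amend" walk-through in the Commit section.
metas = {"Dmeta": {"content": "D", "obsolete": ["B"], "origin": []}}
changes = {"foo": "A", "bar": "Dmeta", "baz": "C"}
print(obsolete_commits(changes, metas))
```
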
> +
> +Divergence
> +----------
> +From the user’s perspective, two changes are divergent if they both ask for
> +different replacements to the same commit. More precisely, a target commit is
> +considered divergent if there is more than one commit at the head of a change in
> +refs/metas that leads to the target commit via an unbroken chain of “obsolete”
> +edges.
> +
> +Much like a merge conflict, divergence is a situation that requires user
> +intervention to resolve. The evolve command will stop when it encounters
> +divergence and prompt the user to resolve the problem. Users can solve the
> +problem in several ways:
> +
> +- Discard one of the changes (by deleting its change branch).
> +- Merge the two changes (producing a single change branch).
> +- Copy one of the changes (keep both commits, but one of them gets a new
> +  metacommit appended to its history that is connected to its predecessor via an
> +  origin edge rather than an obsolete edge. That new change no longer obsoletes
> +  the original.)
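
The more precise definition of divergence given above can be sketched as a graph walk. As before, the dict model and the function name are hypothetical; the sketch simply records, for every commit reached through obsolete edges, which change heads lead to it, and reports commits reached from more than one distinct head commit:

```python
from collections import defaultdict
from typing import Dict, Set


def divergent_commits(changes: Dict[str, str], metas: Dict[str, dict]) -> Dict[str, Set[str]]:
    reached_by: Dict[str, Set[str]] = defaultdict(set)
    for head in set(changes.values()):
        if head not in metas:
            continue  # a never-modified commit obsoletes nothing
        head_commit = metas[head]["content"]
        stack = list(metas[head]["obsolete"])
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in metas:
                reached_by[metas[node]["content"]].add(head_commit)
                stack.extend(metas[node]["obsolete"])
            else:
                reached_by[node].add(head_commit)
    return {target: heads for target, heads in reached_by.items() if len(heads) > 1}


# The divergent state used later in the Merge section.
metas = {
    "Cmeta": {"content": "C", "obsolete": ["B"], "origin": []},
    "Dmeta": {"content": "D", "obsolete": ["B"], "origin": []},
}
changes = {"changeA": "A", "changeB": "Cmeta", "changeB2": "Dmeta"}
print(divergent_commits(changes, metas))
```

In this model, two change names that point at the same head (as after an amend merge) count as one head, so they do not register as divergence.
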
> +
> +Obsolescence across cherry-picks
> +--------------------------------
> +By default the evolve command will treat cherry-picks and squash merges as being
> +completely separate from the original. Further amendments to the original commit
> +will have no effect on the cherry-picked copy. However, this behavior may not be
> +desirable in all circumstances.
> +
> +The evolve command may at some point support an option to look for cases where
> +the source of a cherry-pick or squash merge has itself been amended, and
> +automatically apply that same change to the cherry-picked copy. In such cases,
> +it would traverse origin edges rather than ignoring them, and would treat a
> +commit with origin edges as being obsolete if any of its origins were obsolete.
> +
> +Garbage collection
> +------------------
> +For GC purposes, meta-commits are normal commits. Just as a commit causes its
> +parents and tree to be retained, a meta-commit also causes its parents to be
> +retained.
> +
> +Change creation
> +---------------
> +Changes are created automatically whenever the user runs a command like “commit”
> +that has the semantics of creating a new change. They also move forward
> +automatically even if they’re not checked out. For example, whenever the user
> +runs a command like “commit --amend” that modifies a commit, all branches in
> +refs/metas that pointed to the old commit move forward to point to its
> +replacement instead. This also happens when the user is working from a detached
> +head.
> +
> +This does not mean that every commit has a corresponding change. By default,
> +changes only exist for recent locally-created commits. Users may explicitly pull
> +changes from other users or keep their changes around for a long time, but
> +either behavior requires a user to opt-in. Code review systems like gerrit may
> +also choose to keep changes around forever.
> +
> +Note that the changes in refs/metas serve a dual function as both a way to
> +identify obsolete changes and as a way for the user to keep track of their work
> +in progress. If we were only concerned with identifying obsolete changes, it
> +would be sufficient to create the change branch lazily the first time a commit
> +is obsoleted. Addressing the second use - of refs/metas as a mechanism for
> +keeping track of work in progress - is the reason for eagerly creating the
> +change on first commit.
> +
> +Change naming
> +-------------
> +When a change is first created, the only requirement for its name is that it
> +must be unique. Good names would also serve as useful mnemonics and be easy to
> +type. For example, a short word from the commit message containing no numbers or
> +special characters and that shows up with low frequency in other commit messages
> +would make a good choice.
> +
> +Different users may prefer different heuristics for their change names. For this
> +reason a new hook will be introduced to compute change names. Git will invoke
> +the hook for all newly-created changes and will append a numeric suffix if the
> +name isn’t unique. The default heuristics are not specified by this proposal and
> +may change during implementation.
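
Since the default heuristic is left open, the following is only a guess at what a naming hook plus the uniqueness step might look like. Both function names are made up; the slug rule is chosen to reproduce the names in the example usage, and the suffix format (a bare number appended to the name) is an assumption:

```python
import re
from typing import Set


def slug_from_subject(subject: str) -> str:
    """Hypothetical hook heuristic: lowercase words of the subject joined by '_'."""
    words = re.findall(r"[a-z0-9]+", subject.lower())
    return "_".join(words) or "change"


def unique_change_name(candidate: str, existing: Set[str]) -> str:
    """Append a numeric suffix when the hook's suggestion is already taken."""
    if candidate not in existing:
        return candidate
    n = 2
    while f"{candidate}{n}" in existing:
        n += 1
    return f"{candidate}{n}"


existing = {"this_is_a_test", "more_testing"}
print(unique_change_name(slug_from_subject("This is also a test"), existing))
print(unique_change_name(slug_from_subject("More testing"), existing))
```
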
> +
> +Change deletion
> +---------------
> +Changes are normally only interesting to a user while a commit is still in
> +development and under review. Once the commit has been submitted wherever it is
> +going, its change can be discarded.
> +
> +The normal way of deleting changes makes this easy to do - changes are deleted
> +by the evolve command when it detects that the change is present in an upstream
> +branch. It does this in two ways: if the latest commit in a change either shows
> +up in the branch history or the change becomes empty after a rebase, it is
> +considered merged and the change is discarded. In this context, an “upstream
> +branch” is any branch passed in as the upstream argument of the evolve command.
> +
> +In case this sometimes deletes a useful change, such automatic deletions are
> +recorded in the reflog allowing them to be easily recovered.
> +
> +Sharing changes
> +---------------
> +Change histories are shared by pushing or fetching meta-commits and change
> +branches. This provides users with a lot of control of what to share and
> +repository implementations with control over what to retain.
> +
> +Users that only want to share the content of a commit can do so by pushing the
> +commit itself as they currently would. Users that want to share an edit history
> +for the commit can push its change, which would point to a meta-commit rather
> +than the commit itself if there is any history to share. Note that multiple
> +changes can refer to the same commits, so it’s possible to construct and push a
> +different history for the same commit in order to remove sensitive or irrelevant
> +intermediate states.
> +
> +Imagine the user is working on a change “mychange” that is currently the latest
> +commit on master. They have two ways to share it:
> +
> +# User shares just a commit without its history
> +$ git push origin master
> +
> +# User shares the full history of the commit to a review system
> +$ git push origin change/mychange:refs/for/master
> +
> +# User fetches a collaborator’s modifications to their change
> +$ git fetch remotename change/mychange
> +# Which updates the ref remotechange/remotename/mychange
> +
> +This will cause more intermediate states to be shared with the server than would
> +have been shared previously. A review system like gerrit would need to keep
> +track of which states had been explicitly pushed versus other intermediate
> +states in order to de-emphasize (or hide) the extra intermediate states from the
> +user interface.
> +
> +Merge-base
> +----------
> +Merge-base will be changed to search the meta-commit graph for common ancestors
> +as well as the commit graph, and will generally prefer results from the
> +meta-commit graph over the commit graph. Merge-base will consider meta-commits
> +from all changes, and will traverse both origin and obsolete edges.
> +
> +The reason for this is that - when merging two versions of the same commit
> +together - an earlier version of that same commit will usually be much more
> +similar than their common parent. This should make the workflow of collaborating
> +on unsubmitted patches as convenient as the workflow for collaborating in a
> +topic branch by eliminating repeated merges.
> +
> +User interface
> +--------------
> +All git porcelain commands that create commits are classified as having one of
> +four behaviors: modify, create, copy, or import. These behaviors are discussed
> +in more detail below.
> +
> +Modify commands
> +---------------
> +Modification commands (commit --amend, rebase) will mark the old commit as
> +obsolete by creating a new meta-commit that references the old one as an
> +obsolete parent. In the event that multiple changes point to the same commit,
> +this is done independently for every such change.
> +
> +More specifically, modifications work like this:
> +
> +1. Locate all existing changes for which the old commit is the content for the
> +   head of the change branch. If no such branch exists, create one that points
> +   to the old commit. Changes that include this commit in their history but not
> +   at their head are explicitly not included.
> +2. For every such change, create a new meta-commit that references the new
> +   commit as its content and references the old head of the change as an
> +   obsolete parent.
> +3. Move the change branch forward to point to the new meta-commit.
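
The three steps above can be sketched against a toy in-memory model. Everything here is illustrative: the dict layout, the function name, the fallback_name parameter (standing in for whatever name the naming hook would pick), and the string meta-commit ids are all assumptions, not part of the proposal:

```python
from typing import Dict, List


def content_of(node: str, metas: Dict[str, dict]) -> str:
    return metas[node]["content"] if node in metas else node


def record_modification(changes: Dict[str, str], metas: Dict[str, dict],
                        old_commit: str, new_commit: str,
                        fallback_name: str) -> List[str]:
    # Step 1: changes whose head has old_commit as content; create one if none exist.
    affected = sorted(name for name, head in changes.items()
                      if content_of(head, metas) == old_commit)
    if not affected:
        changes[fallback_name] = old_commit
        affected = [fallback_name]
    for name in affected:
        # Step 2: new meta-commit; content = new commit, obsolete parent = old head.
        meta_id = f"meta({new_commit},{name})"  # stand-in for a real object id
        metas[meta_id] = {"content": new_commit,
                          "obsolete": [changes[name]],
                          "origin": []}
        # Step 3: move the change branch forward.
        changes[name] = meta_id
    return affected


changes = {"foo": "A", "bar": "B", "baz": "C"}
metas: Dict[str, dict] = {}
record_modification(changes, metas, "B", "D", "unused")
print(changes["bar"], metas[changes["bar"]])
```

Note that a second amend records the previous meta-commit, not the previous content commit, as the obsolete parent, which is what keeps the chain of edits connected.
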
> +
> +Copy commands
> +-------------
> +Copy commands (cherry-pick, merge --squash) create a new meta-commit that
> +references the old commits as origin parents. Besides the fact that the new
> +parents are tagged differently, copy commands work the same way as modify
> +commands.
> +
> +Create commands
> +---------------
> +Creation commands (commit, merge) create a new commit and a new change that
> +points to that commit. They do not create any meta-commits.
> +
> +Import commands
> +---------------
> +Import commands (fetch, pull) do not create any new meta-commits or changes
> +unless that is specifically what they are importing. For example, the fetch
> +command would update remotechange/origin/change35 and fetch all referenced
> +meta-commits if asked to do so directly, but it wouldn’t create any changes or
> +meta-commits for commits discovered on the master branch when running “git fetch
> +origin master”.
> +
> +Other commands
> +--------------
> +Some commands don’t fit cleanly into one of the above categories.
> +
> +Semantically, filter-branch should be treated as a modify command, but doing so
> +is likely to create a lot of irrelevant clutter in the changes namespace and the
> +large number of extra change refs may introduce performance problems. We
> +recommend treating filter-branch as an import command initially, but making it
> +behave more like a modify command in future follow-up work. One possible
> +solution may be to treat commits that are part of existing changes as being
> +modified but to avoid creating changes for other rewritten changes.
> +
> +Once the evolve command can handle obsolescence across cherry-picks, such
> +cherry-picks will result in a hybrid move-and-copy operation. It will create
> +cherry-picks that replace other cherry-picks, which will have both origin edges
> +(pointing to the new source commit being picked) and obsolete edges (pointing to
> +the previous cherry-pick being replaced).
> +
> +Evolve
> +------
> +The evolve command performs the correct sequence of rebases such that no change
> +has an obsolete parent. The syntax looks like this:
> +
> +git evolve [--abort][--continue][--quit] [upstream…]
> +
> +It takes an optional list of upstream branches. All changes whose parent shows
> +up in the history of one of the upstream branches will be rebased onto the
> +upstream branch before resolving obsolete parents.
> +
> +Any change whose latest state is found in an upstream branch (or that ends up
> +empty after rebase) will be deleted. This is the normal mechanism for deleting
> +changes. Changes are created automatically on the first commit, and are deleted
> +automatically when evolve determines that they’ve been merged upstream.
> +
> +Orphan commits are commits with obsolete parents. The evolve command then
> +repeatedly rebases orphan commits with non-orphan parents until there are either
> +no orphan commits left, a merge conflict is discovered, or a divergent parent is
> +discovered.
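
A rough sketch of the ordering this loop produces, under heavy simplifying assumptions: every commit has a single parent, each obsolete commit has exactly one successor (no divergence), rebases never conflict, and upstream handling is ignored. The function names and the convention of naming a rebased commit by appending "'" are made up for illustration. The sketch interprets "orphan commits with non-orphan parents" as orphans whose parent's latest replacement is not itself an orphan:

```python
from typing import Dict, List, Set, Tuple


def latest(commit: str, successor: Dict[str, str]) -> str:
    """Follow obsolete -> replacement links to the newest version of a commit."""
    while commit in successor:
        commit = successor[commit]
    return commit


def plan_evolve(parent: Dict[str, str], successor: Dict[str, str],
                heads: Set[str]) -> List[Tuple[str, str]]:
    parent, successor, live = dict(parent), dict(successor), set(heads)
    plan: List[Tuple[str, str]] = []
    while True:
        # Orphans: live commits whose parent has been obsoleted.
        orphans = {c for c in live if parent.get(c) in successor}
        ready = sorted(c for c in orphans
                       if latest(parent[c], successor) not in orphans)
        if not ready:
            return plan
        commit = ready[0]
        target = latest(parent[commit], successor)
        rebased = commit + "'"
        parent[rebased] = target
        successor[commit] = rebased  # the rebase itself obsoletes the old commit
        live.remove(commit)
        live.add(rebased)
        plan.append((commit, target))


# A was amended to A2 while B and C were stacked on top of the old A.
parent = {"A": "base", "A2": "base", "B": "A", "C": "B"}
successor = {"A": "A2"}
print(plan_evolve(parent, successor, {"A2", "B", "C"}))
```

C only becomes an orphan after B has been rebased, which is why the loop has to recompute the orphan set on every pass rather than once up front.
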
> +
> +The --abort option returns all changes to the state they were in prior to
> +invoking evolve, and the --quit option terminates the current evolution without
> +changing the current state.
> +
> +Checkout
> +--------
> +Running checkout on a change by name has the same effect as checking out a
> +detached head pointing to the latest commit on that change-branch. There is no
> +need to ever have HEAD point to a change since changes always move forward when
> +necessary, no matter what branch the user has checked out.
> +
> +Meta-commits themselves cannot be checked out by their hash.
> +
> +Reset
> +-----
> +Resetting a branch to a change by name is the same as resetting to the commit at
> +that change’s head.
> +
> +Commit
> +------
> +Commit --amend gets modify semantics and will move existing changes forward. The
> +normal form of commit gets create semantics and will create a new change.
> +
> +$ touch foo && git add . && git commit -m "foo" && git tag A
> +$ touch bar && git add . && git commit -m "bar" && git tag B
> +$ touch baz && git add . && git commit -m "baz" && git tag C
> +
> +This produces the following commits:
> +A(tree=[foo])
> +B(tree=[foo, bar], parent=A)
> +C(tree=[foo, bar, baz], parent=B)
> +
> +...along with three changes:
> +change/foo = A
> +change/bar = B
> +change/baz = C
> +
> +Running commit --amend does the following:
> +$ git checkout B
> +$ touch zoom && git add . && git commit --amend -m "bar and zoom"
> +$ git tag D
> +
> +Commits:
> +A(tree=[foo])
> +B(tree=[foo, bar], parent=A)
> +C(tree=[foo, bar, baz], parent=B)
> +D(tree=[foo, bar, zoom], parent=A)
> +Dmeta(content=D, obsolete=B)
> +
> +Changes:
> +change/foo = A
> +change/bar = Dmeta
> +change/baz = C
> +
> +Merge
> +-----
> +Merge gets create, modify, or copy semantics based on what is being merged and
> +the options being used.
> +
> +The --squash version of merge gets copy semantics (it produces a new change that
> +is marked as a copy of all the original changes that were squashed into it).
> +
> +The “modify” version of merge replaces both of the original commits with the
> +resulting merge commit. This is one of the standard mechanisms for resolving
> +divergence. The parents of the merge commit are the parents of the two commits
> +being merged. The resulting commit will not be a merge commit if both of the
> +original commits had the same parent or if one was the parent of the other.
> +
> +The “create” version of merge creates a new change pointing to a merge commit
> +that has both original commits as parents. The result is what merge produces now
> +- a new merge commit. However, this version of merge doesn’t directly resolve
> +divergence.
> +
> +To select between these two behaviors, merge gets new “--amend” and “--noamend”
> +options which select between the “modify” and “create” behaviors respectively,
> +with noamend being the default.
> +
> +For example, imagine we created two divergent changes like this:
> +
> +$ touch foo && git add . && git commit -m "foo" && git tag A
> +$ touch bar && git add . && git commit -m "bar" && git tag B
> +$ touch baz && git add . && git commit --amend -m "bar and baz"
> +$ git tag C
> +$ git checkout B
> +$ touch bam && git add . && git commit --amend -m "bar and bam"
> +$ git tag D
> +
> +At this point the commit graph looks like this:
> +
> +A(tree=[foo])
> +B(tree=[bar], parent=A)
> +C(tree=[bar, baz], parent=A)
> +D(tree=[bar, bam], parent=A)
> +Cmeta(content=C, obsoletes=B)
> +Dmeta(content=D, obsoletes=B)
> +
> +There would be three active changes with heads pointing as follows:
> +
> +change/changeA=A
> +change/changeB=Cmeta
> +change/changeB2=Dmeta
> +
> +ChangeB and changeB2 are divergent at this point. Let’s consider what happens if
> +we perform each type of merge between changeB and changeB2.
> +
> +Merge example: Amend merge
> +One way to resolve divergent changes is to use an amend merge. Recall that HEAD
> +is currently pointing to D at this point.
> +
> +$ git merge --amend change/changeB
> +
> +Here we’ve asked for an amend merge since we’re trying to resolve divergence
> +between two versions of the same change. There are no conflicts so we end up
> +with this:
> +
> +E(tree=[bar, baz, bam], parent=A)
> +Emeta(content=E, obsoletes=[Cmeta, Dmeta])
> +
> +With the following branches:
> +
> +change/changeA=A
> +change/changeB=Emeta
> +change/changeB2=Emeta
> +
> +Notice that the result of the “amend merge” is a replacement for C and D rather
> +than a new commit with C and D as parents (as a normal merge would have
> +produced). The parents of the amend merge are the parents of C and D which - in
> +this case - is just A, so the result is not a merge commit. Also notice that
> +changeB and changeB2 are now aliases for the same change.
> +
> +Merge example: Noamend merge
> +Consider what would have happened if we’d used a noamend merge instead. Recall
> +that HEAD was at D and our branches looked like this:
> +
> +change/changeA=A
> +change/changeB=Cmeta
> +change/changeB2=Dmeta
> +
> +$ git merge --noamend change/changeB
> +
> +That would produce the sort of merge we’d normally expect today:
> +
> +F(tree=[bar, baz, bam], parent=[C, D])
> +
> +And our changes would look like this:
> +change/changeA=A
> +change/changeB=Cmeta
> +change/changeB2=Dmeta
> +change/changeF=F
> +
> +In this case, changeB and changeB2 are still divergent and we’ve created a new
> +change for our merge commit. However, this is just a temporary state. The next
> +time we run the “evolve” command, it will discover the divergence but also
> +discover the merge commit F that resolves it. Evolve will suggest converting F
> +into an amend merge in order to resolve the divergence and will display the
> +command for doing so.
> +
> +Change
> +------
> +The “change” command can be used to list, rename, reset or delete changes. It
> +takes arguments similar to the “branch” command.
> +
> +The -l argument lists all local changes that aren’t present in the given branch.
> +If the branch name is omitted, all local changes are listed.
> +
> +The -r argument lists all remote changes.
> +
> +The -m argument renames a change, given its old and new name.
> +
> +The -d argument deletes a change. This is one way to resolve divergence.
> +
> +The -n argument renames the current change, or creates a change of the given
> +name for the current commit if no such change exists yet. If given an optional
> +commit hash, the change is created for that commit rather than head. If there
> +are multiple local changes for the same commit and they are all aliases for the
> +same metacommit hash, they are all deleted except the newly-created name. If
> +given the name of a metacommit, the new change points to that metacommit.
> +
> +The --purge argument deletes all obsolete changes and all changes that are
> +present in the given branch. Note that such changes can be recovered from the
> +reflog.
> +
> +Combined with the GC protection that is offered, this is intended to facilitate
> +a workflow that relies on changes instead of branches. Users could choose to
> +work with no local branches and use changes instead - both for mailing list and
> +gerrit workflows.
> +
> +Log
> +---
> +When a commit is shown in git log that is part of a change, it is decorated with
> +extra change information. If it is the head of a change, the name of the change
> +is shown next to the list of branches. If it is obsolete, it is decorated with
> +the word “obsolete”.
> +
> +Obslog
> +------
> +The obslog command lists the change history for the current commit.
> +
> +Rebase
> +------
> +In general the rebase command is treated as a modify command. When a change is
> +rebased, the new commit replaces the original.
> +
> +Rebase --abort is special. Its intent is to restore git to the state it had
> +prior to running rebase. It should move back any changes to point to the refs
> +they had prior to running rebase and delete any new changes that were created as
> +part of the rebase. To achieve this, rebase will save the state of all changes
> +in refs/metas prior to running rebase and will restore the entire namespace
> +after rebase completes (deleting any newly-created changes). Newly-created
> +metacommits are left in place, but will have no effect until garbage collected
> +since metacommits are only used if they are reachable from refs/metas.
> +
> +Other options considered
> +========================
> +We considered several other options for storing the obsolescence graph. This
> +section describes the other options and why they were rejected.
> +
> +Commit header
> +-------------
> +Add an “obsoletes” field to the commit header that points backwards from a
> +commit to the previous commits it obsoletes.
> +
> +Pros:
> +- Very simple
> +- Easy to traverse from a commit to the previous commits it obsoletes.
> +Cons:
> +- Adds a cost to the storage format, even for commits where the change history
> +  is uninteresting.
> +- Unconditionally prevents the change history from being garbage collected.
> +- Always causes the change history to be shared when pushing or pulling changes.
> +
> +Git notes
> +---------
> +Instead of storing obsolescence information in metacommits, the metacommit
> +content could go in a new notes namespace - say refs/notes/metacommit. Each note
> +would contain the list of obsolete and origin parents, and an automerger could
> +be supplied to make it easy to merge the metacommit notes from different remotes.
> +
> +Pros:
> +- Easy to locate all commits obsoleted by a given commit (since there would only
> +  be one metacommit for any given commit).
> +Cons:
> +- Wrong GC behavior (obsolete commits wouldn’t automatically be retained by GC)
> +  unless we introduced a special case for these kinds of notes.
> +- No way to selectively share or pull the metacommits for one specific change.
> +  It would be all-or-nothing, which would be expensive. This could be addressed
> +  by changes to the protocol, but this would be invasive.
> +- Requires custom auto-merging behavior on fetch.
> +
> +Tags
> +----
> +Put the content of the metacommit in a message attached to a tag on the
> +replacement commit. This is very similar to the git notes approach and has the
> +same pros and cons.
> +
> +Simple forward references
> +-------------------------
> +Record an edge from an obsolete commit to its replacement in this form:
> +
> +refs/obsoletes/<A>
> +
> +pointing to commit <B> as an indication that B is the replacement for the
> +obsolete commit A.
> +
> +Pros:
> +- Protects <B> from being garbage collected.
> +- Fast lookup for the evolve operation, without additional search structures
> +  (“what is the replacement for <A>?” is very fast).
> +
> +Cons:
> +- Can’t represent divergence (which is a P0 requirement).
> +- Creates lots of refs (which can be inefficient)
> +- Doesn’t provide a way to fetch only refs for a specific change.
> +- The obslog command requires a search of all refs.
> +
> +Complex forward references
> +--------------------------
> +Record an edge from an obsolete commit to its replacement in this form:
> +
> +refs/obsoletes/<change_id>/obs<A>_<B>
> +
> +Pointing to commit <B> as an indication that B is the replacement for obsolete
> +commit A.
> +
> +Pros:
> +- Permits sharing and fetching refs for only a specific change.
> +- Supports divergence
> +- Protects <B> from being garbage collected.
> +
> +Cons:
> +- Creates lots of refs, which is inefficient.
> +- Doesn’t provide a good lookup structure for lookups in either direction.
> +
> +Backward references
> +-------------------
> +Record an edge from a replacement commit to the obsolete one in this form:
> +
> +refs/obsolescences/<B>
> +
> +Cons:
> +- Doesn’t provide a way to resolve divergence (which is a P0 requirement).
> +- Doesn’t protect <B> from being garbage collected (which could be fixed by
> +  combining this with a refs/metas namespace, as in the metacommit variant).
> +
> +Obsolescences file
> +------------------
> +Create a custom file (or files) in .git recording obsolescences.
> +
> +Pros:
> +- Can store exactly the information we want with exactly the performance we want
> +  for all operations. For example, there could be a disk-based hashtable
> +  permitting constant time lookups in either direction.
> +
> +Cons:
> +- Handling GC, pushing, and pulling would all require custom solutions. GC
> +  issues could be addressed with a repository format extension.
> +
> +Squash points
> +-------------
> +We create and update change branches in refs/metas at the same time we
> +would in the metacommit proposal. However, rather than pointing to a metacommit
> +branch they point to normal commits and are treated as “squash points” - markers
> +for sequences of commits intended to be squashed together on submission.
> +
> +Amends and rebases work differently than they do now. Rather than actually
> +containing the desired state of a commit, they contain a delta from the previous
> +version along with a squash point indicating that the preceding changes are
> +intended to be squashed on submission. Specifically, amends would become new
> +changes and rebases would become merge commits with the old commit and new
> +parent as parents.
> +
> +When the changes are finally submitted, the squashes are executed, producing the
> +final version of the commit.
> +
> +In addition to the squash points, git would maintain a set of “nosquash” tags
> +for commits that were used as ancestors of a change that are not meant to be
> +included in the squash.
> +
> +For example, if we have this commit graph:
> +
> +A(...)
> +B(parent=A)
> +C(parent=B)
> +
> +...and we amend B to produce D, we’d get:
> +
> +A(...)
> +B(parent=A)
> +C(parent=B)
> +D(parent=B)
> +
> +...along with a new change branch indicating D should be squashed with its
> +parents when submitted:
> +
> +change/changeB = D
> +change/changeC = C
> +
> +We’d also create a nosquash tag for A indicating that A shouldn’t be included
> +when changeB is squashed.
> +
> +If a user amends the change again, they’d get:
> +
> +A(...)
> +B(parent=A)
> +C(parent=B)
> +D(parent=B)
> +E(parent=D)
> +
> +change/changeB = E
> +change/changeC = C
> +
> +Pros:
> +- Good GC behavior.
> +- Provides a natural way to share changes (they’re just normal branches).
> +- Merge-base works automatically without special cases.
> +- Rewriting the obslog would be easy using existing git commands.
> +- No new data types needed.
> +Cons:
> +- No way to connect the squashed version of a change to the original, so no way
> +  to automatically clean up old changes. This also means users lose all benefits
> +  of the evolve command if they prematurely squash their commits. This may occur
> +  if a user thinks a change is ready for submission, squashes it, and then later
> +  discovers an additional change to make.
> +- Histories would look very cluttered (users would see all previous edits to
> +  their commit in the commit log, and all previous rebases would show up as
> +  merges). Could be quite hard for users to tell what is going on. (Possible
> +  fix: also implement a new smart log feature that displays the log as though
> +  the squashes had occurred).
> +- Need to change the current behavior of current commands (like amend and
> +  rebase) in ways that will be unexpected to many users.
> -- 
> 2.19.1.930.g4563a0d9d0-goog
> 
>

Ævar Arnfjörð Bjarmason Nov. 15, 2018, 3:36 p.m. UTC | #2

On Thu, Nov 15 2018, sxenos@google.com wrote:

> +Detailed design
> +===============
> +Obsolescence information is stored as a graph of meta-commits. A meta-commit is
> +a specially-formatted merge commit that describes how one commit was created
> +from others.
> +
> +Meta-commits look like this:
> +
> +$ git cat-file -p <example_meta_commit>
> +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> +parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> +author Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> +committer Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> +parent-type content
> +parent-type obsolete
> +parent-type origin
> +
> +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by
> +cherry-picking commit 7e1bbcd3”.
> +
> +The tree for meta-commits is always the empty tree whose hash matches
> +4b825dc642cb6eb9a060e54bf8d69288fbee4904 exactly, but future versions of git may
> +attach other trees here. For forward-compatibility fsck should ignore such trees
> +if found on future repository versions. Similarly, current versions of git
> +should always fill in an empty commit comment and tools like fsck should ignore
> +the content of the commit comment if present in a future repository version.
> +This will allow future versions of git to add metadata to the meta-commit
> +comments or tree without breaking forwards compatibility.
> +
> +Parent-type
> +-----------
> +The “parent-type” field in the commit header identifies a commit as a
> +meta-commit and indicates the meaning for each of its parents. It is never
> +present for normal commits. It is a list of enum values whose order matches the
> +order of the parents. Possible parent types are:
> +
> +- content: the content parent identifies the commit that this meta-commit is
> +  describing.
> +- obsolete: indicates that this parent is made obsolete by the content parent.
> +- origin: indicates that this parent was generated from the given commit.
> +
> +There must be exactly one content parent for each meta-commit and it is always
> +the first parent. The content commit will always be a normal commit and not a
> +meta-commit. However, future versions of git may create meta-commits for other
> +meta-commits and the fsck tool must be aware of this for forwards compatibility.
> +
> +A meta-commit can have zero or more obsolete parents. An amend operation creates
> +a single obsolete parent. A merge used to resolve divergence (see divergence,
> +below) will create multiple obsolete parents. A meta-commit may have zero
> +obsolete parents if it describes a cherry-pick or squash merge that copies one
> +or more commits but does not replace them.
> +
> +A meta-commit can have zero or more origin parents. A cherry-pick creates a
> +single origin parent. Certain types of squash merge will create multiple origin
> +parents.
> +
> +An obsolete parent or origin parent may be either a normal commit (indicating
> +the oldest-known version of a change) or another meta-commit (for a change that
> +has already been modified one or more times).

I think it's worth pointing out, for those who are rusty on commit
object details (I checked), that the reason for it not being:

    tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
    parent aa7ce55545bf2c14bef48db91af1a74e2347539a
    parent-type content
    parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
    parent-type obsolete
    parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
    parent-type origin
    author Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
    committer Stefan Xenos <sxenos@gmail.com> 1540841596 -0700

which would be easier to read, is that we're very sensitive to the order
of the first few fields (tree -> parent -> author -> committer) and fsck
will error out if we interject a new field among them.
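
To illustrate the ordering point, here is a minimal sketch, assuming a
scratch repository and the "parent-type" header proposed in the design
doc (current git does not know that header). It hand-writes a commit
object with the extra fields placed after "committer" and then lets fsck
look at it. The identity, file names, and commit messages are made up
for the example.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# Two ordinary commits: "old" is the obsolete version, "new" replaces it.
echo one > file.txt && git add file.txt && git commit -q -m "first version"
old=$(git rev-parse HEAD)
echo two > file.txt && git add file.txt && git commit -q --amend -m "second version"
new=$(git rev-parse HEAD)

# The empty tree that the proposal uses for meta-commits.
empty_tree=$(git mktree </dev/null)

# Hand-written commit object. "parent-type" is the proposed header, not
# something current git understands; it is placed after "committer" so
# the tree/parent/author/committer order stays intact.
meta=$(printf '%s\n' \
    "tree $empty_tree" \
    "parent $new" \
    "parent $old" \
    "author Example <example@example.com> 1540841596 -0700" \
    "committer Example <example@example.com> 1540841596 -0700" \
    "parent-type content" \
    "parent-type obsolete" \
    "" |
    git hash-object -t commit -w --stdin)

# Show the object and check the repository with the new object as a head.
git cat-file -p "$meta"
git fsck --no-dangling "$meta"
```

If fsck stays quiet here, that would be consistent with the observation
above that only the leading fields are order-sensitive.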

Derrick Stolee Nov. 16, 2018, 9:36 p.m. UTC | #3

On 11/14/2018 7:55 PM, sxenos@google.com wrote:
> From: Stefan Xenos <sxenos@google.com>
>
> This document describes what an obsolescence graph for
> git would look like, the behavior of the evolve command,
> and the changes planned for other commands.

Thanks for putting this together!

> diff --git a/Documentation/technical/evolve.txt b/Documentation/technical/evolve.txt
...
> +Git Obsolescence Graph
> +======================
> +
> +Objective
> +---------
> +Track the edits to a commit over time in an obsolescence graph.

The file name and the title don't match.

I'd prefer if the title was "Git Evolve Design Document" and this
opening paragraph was about the reasons we want a 'git evolve' command.
Here is my attempt:

   The proposed 'git evolve' command will help users craft a high-quality
   commit history in their topic branches. By working to improve commits
   one at a time, then running 'git evolve', users can rewrite recent
   history with more options than interactive rebase. The core benefit is
   that users can pause their progress and move to other branches before
   returning to where they left off. Users can also share progress with
   others using standard 'push', 'fetch', and 'format-patch' commands.

> +Background
> +----------

Perhaps you can call this "Example"?

> +Imagine you have three dependent changes up for review and you receive feedback
> +that requires editing all three changes. While you're editing one, more feedback
> +arrives on one of the others. What do you do?

"three dependent changes" sounds vague enough to me to possibly confuse
readers. Perhaps "three sequential patches"?

> +- Users can view the history of a commit directly (the sequence of amends and
> +  rebases it has undergone, orthogonal to the history of the branch it is on).

"the history of a commit" doesn't semantically work, as a commit is an
immutable Git object.

Instead, I would try to use the term "patch" to describe a change to the
codebase, and that takes the form of a list of commits that are
improving on each other (but don't actually have each other in their
commit history). This means that the lifetime of a patch is described by
the commits that are amended or rebased.

> +- By pushing and pulling the obsolescence graph, users can collaborate more
> +  easily on changes-in-progress. This is better than pushing and pulling the
> +  changes themselves since the obsolescence graph can be used to locate a more
> +  specific merge base, allowing for better merges between different versions of
> +  the same change.

(Making a note so I come back to this. I hope to learn what you mean by
this "more specific merge base".)

> +
> +Similar technologies
> +--------------------
> ... It can't handle the case where you have
> +multiple changes sharing the same parent when that parent needs to be rebased

Perhaps this could be made more concrete by describing commit history
and a specific workflow change using 'git evolve'.

Suppose we have two topic branches, topic1 and topic2, that point to
commits A and B, respectively. Suppose further that A and B have a
common parent C with parent D. If we rebase topic1 relative to D, then
we create new commits C' and A' that are newer versions of commits C
and A. It would be nice to easily update topic2 to be on a new commit B'
with parent C'. Currently, a user needs to know that C updated to C',
and use 'git rebase --onto C' C topic2'. Instead, if we have a marker
showing that C' is an updated version of C, 'git log topic2' would show
that topic2 can be updated, and the 'git evolve' command would perform
the correct action to make B' with parent C'.

(This paragraph above is an example of "what can happen now is
complicated and demands that the user keep some information in their
memory" and "the new workflow is simpler and helps users make the right
decision". I think we could use more of these at the start to sell the
idea.)
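
For reference, the "what can happen now" half of the topic1/topic2
example can be sketched with today's commands. This is only an
illustration in a scratch repository: the file names, the identity, and
the way C' is produced (a plain amend on a detached HEAD) are
assumptions made for the example.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# History: D <- C, with topic1 (A) and topic2 (B) both built on C.
echo d > d.txt && git add d.txt && git commit -q -m "D"
echo c > c.txt && git add c.txt && git commit -q -m "C"
old_c=$(git rev-parse HEAD)
git branch topic1
git branch topic2

git checkout -q topic1
echo a > a.txt && git add a.txt && git commit -q -m "A"
git checkout -q topic2
echo b > b.txt && git add b.txt && git commit -q -m "B"

# Rewrite C into C' and move topic1 onto it, producing A'.
git checkout -q --detach "$old_c"
echo c2 >> c.txt && git add c.txt && git commit -q --amend -m "C'"
new_c=$(git rev-parse HEAD)
git rebase -q --onto "$new_c" "$old_c" topic1

# The manual step: the user has to remember both C and C' to fix topic2.
git rebase -q --onto "$new_c" "$old_c" topic2

git log --oneline --graph topic1 topic2
```

The last rebase is the step that a marker from C to C' would let
'git evolve' work out on its own.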

> +and won't let you collaborate with others on resolving a complicated interactive
> +rebase.

In the same sentence, we have an even more complicated workflow
mentioned as an aside. This could be fleshed out more concretely. It
could help to describe that the current model is for users to share
"fixup!" commits and then one performs an interactive rebase to apply
those fixups in the correct order. If a user instead shares an amended
commit, then we are in a difficult state to apply those changes. The new
workflow would be to share amended commits, and 'git evolve' inserts the
correct amended commits in the right order.

I'm a big proponent of the teaching philosophy of "examples first". It's
easier to talk abstractly after going through some concrete examples.

>   You can think of rebase -i as a top-down approach and the evolve command
> +as the bottom-up approach to the same problem.

This comparison is important. Perhaps it is more specific to say
"interactive rebase splits a plan to rewrite history into independent
units of work, while evolve collects independent units of work into a
plan to rewrite history."

> +
> +Several patch queue managers have been built on top of git...
> +
> +Replacements (refs/replace) are superficially...

These two paragraphs could be moved lower, under a "Semi-Related Work"
section, because they describe things that are a bit similar, but are
unable to help us solve the problem at hand.

> +
> +Goals
> +-----
> +Legend: Goals marked with P0 are required. Goals marked with Pn should be
> +attempted unless they interfere with goals marked with Pn-1.

I like the prioritization here.

> +P0. Any commit that may be involved in a future evolve command should not be
> +    garbage collected.

I wonder about the priority here. If we GC'd commit A but still have the
newer A', I can think that either

1. We will no longer need to run 'git evolve', or
2. We run 'git evolve' on something that can reach A', but A' already
   contains all the information we need to produce a "final" commit A''.

I apologize that I'm not able to read the whole thing right now, and I
will pick up reading from here again soon. Hopefully the feedback above
is constructive in the meantime.

Thanks,
-Stolee

Duy Nguyen Nov. 17, 2018, 6:06 a.m. UTC | #4

On Thu, Nov 15, 2018 at 2:00 AM <sxenos@google.com> wrote:
> +Goals
> +-----
> +Legend: Goals marked with P0 are required. Goals marked with Pn should be
> +attempted unless they interfere with goals marked with Pn-1.
> +
> +P0. All commands that modify commits (such as the normal commit --amend or
> +    rebase command) should mark the old commit as being obsolete and replaced by
> +    the new one. No additional commands should be required to keep the
> +    obsolescence graph up-to-date.

I sometimes "modify" a commit by "git reset @^", pick up the changes
then "git commit -c @{1}". I don't think this counts as a typical
modification and is probably hard to detect automatically. But I hope
there's some way for me to tell git "yes this is a modified commit of
that one, record that!".
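
The reset-and-recommit workflow above can be sketched as follows, in a
scratch repository with made-up file names and identity. GIT_EDITOR=true
is only there so the sketch runs without opening an editor; as far as I
know, nothing in current git records a link between the old and new
commit here other than the reflog.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "base"
echo v1 > feature.txt && git add feature.txt && git commit -q -m "add feature"
old=$(git rev-parse HEAD)

# Drop the commit but keep its changes in the working tree.
git reset -q @^

# Pick up the changes again (here: all of them, plus a small fix).
echo v2 > feature.txt
git add feature.txt

# Reuse the old message and authorship from the previous branch tip.
GIT_EDITOR=true git commit -q -c "@{1}"
new=$(git rev-parse HEAD)

echo "old: $old"
echo "new: $new"
```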

> +Example usage
> +-------------
> +# First create three dependent changes
> +$ echo foo>bar.txt && git add .
> +$ git commit -m "This is a test"
> +created change metas/this_is_a_test

I guess as an example, how the name metas/this_is_a_test is
constructed does not matter much. But it's probably better to stick
with some sort of id because subject line will change over time and
the original one may become irrelevant. Perhaps we could use the
original commit id as name.

> +$ echo foo2>bar2.txt && git add .
> +$ git commit -m "This is also a test"
> +created change metas/this_is_also_a_test
> +$ echo foo3>bar3.txt && git add .
> +$ git commit -m "More testing"
> +created change metas/more_testing
> +
> +# List all our changes in progress
> +$ git change -l
> +metas/this_is_a_test
> +metas/this_is_also_a_test
> +* metas/more_testing
> +metas/some_change_already_merged_upstream
> +
> +# Now modify the earliest change, using its stable name
> +$ git reset --hard metas/this_is_a_test
> +$ echo morefoo>>bar.txt && git add . && git commit --amend --no-edit
> +
> +# Use git-evolve to fix up any dependent changes
> +$ git evolve
> +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> +rebasing metas/more_testing onto metas/this_is_also_a_test
> +Done
> +
> +# Use git-obslog to view the history of the this_is_a_test change
> +$ git obslog
> +93f110 metas/this_is_a_test@{0} commit (amend): This is a test
> +930219 metas/this_is_a_test@{1} commit: This is a test
> +
> +# Now create an unrelated change
> +$ git reset --hard origin/master
> +$ echo newchange>unrelated.txt && git add .
> +$ git commit -m "Unrelated change"
> +created change metas/unrelated_change
> +
> +# Fetch the latest code from origin/master and use git-evolve
> +# to rebase all dependent changes.
> +$ git fetch origin master
> +$ git evolve origin/master
> +deleting metas/some_change_already_merged_upstream
> +rebasing metas/this_is_a_test onto origin/master
> +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> +rebasing metas/more_testing onto metas/this_is_also_a_test
> +rebasing metas/unrelated_change onto origin/master
> +Conflict detected! Resolve it and then use git evolve --continue to resume.
> +
> +# Sort out the conflict
> +$ git mergetool
> +$ git evolve --continue
> +Done
> +
> +# Share the full history of edits for the this_is_a_test change
> +# with a review server
> +$ git push origin metas/this_is_a_test:refs/for/master
> +# Share the latest commit for “Unrelated change”, without history
> +$ git push origin HEAD:refs/for/master

How do we group changes of a topic together? I think branch-diff could
take advantage of that.

> +Detailed design
> +===============
> +Obsolescence information is stored as a graph of meta-commits. A meta-commit is
> +a specially-formatted merge commit that describes how one commit was created
> +from others.
> +
> +Meta-commits look like this:
> +
> +$ git cat-file -p <example_meta_commit>
> +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> +parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> +author Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> +committer Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> +parent-type content
> +parent-type obsolete
> +parent-type origin
> +
> +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by
> +cherry-picking commit 7e1bbcd3”.

This feels a bit forced. Could we just organize it like a normal
history? Something like

*
|\
| * last version of the commit
*
|\
| * second last version of the commit
*
|\

Basically all commits will be linked in a new merge history. Real
commits are on the second parent, first parent is to link changes
together. This makes it possible to just use "git log --first-parent
--patch" (or "git log --oneline --graph") to examine the change. More
details (e.g. parent-type) could be stored as normal trailers in the
commit message of these merges.

Junio C Hamano Nov. 17, 2018, 7:36 a.m. UTC | #5

sxenos@google.com writes:

> +Detailed design
> +===============
> +Obsolescence information is stored as a graph of meta-commits. A meta-commit is
> +a specially-formatted merge commit that describes how one commit was created
> +from others.
> +
> +Meta-commits look like this:
> +
> +$ git cat-file -p <example_meta_commit>
> +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> +parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> +author Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> +committer Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> +parent-type content
> +parent-type obsolete
> +parent-type origin
> +
> +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by
> +cherry-picking commit 7e1bbcd3”.
> +
> +The tree for meta-commits is always the empty tree whose hash matches
> +4b825dc642cb6eb9a060e54bf8d69288fbee4904 exactly, but future versions of git may
> +attach other trees here. For forward-compatibility fsck should ignore such trees
> +if found on future repository versions. Similarly, current versions of git
> +should always fill in an empty commit comment and tools like fsck should ignore
> +the content of the commit comment if present in a future repository version.
> +This will allow future versions of git to add metadata to the meta-commit
> +comments or tree without breaking forwards compatibility.

Am I correct to understand that the reason why a commit object is
(ab|re)used to represent a meta-commit is because by doing so we
would get connectivity (i.e. fetching & pushing would transfer all
the associated objects along) for free, and by not representing it
as a new and different object type, existing implementations can
just pass them along without understanding what they are, and as
long as these are not mixed as parts of the main history of the
project (e.g. when enumerating commits that have aa7ce5 as their
parent, because somebody else obsoleted aa7ce5 and you want to
evolve anything that built on it, you do not want to mistake the
above "meta" commit as a commit that is part of the ordinary history
and rebuild on top of the new version of aa7ce5, which would lead to
a disaster), everything would work just fine?

Perhaps you'd use something like "presence of parent-type header
marks that a commit is a meta-commit and not part of the main
history".

How are these meta commits anchored so that they won't be reclaimed by
repack?  I do not see any "parent" field used to chain them
together, but I do not think we can afford to spend one ref per meta
commit, as refs are not designed to point into each and every object
in the repository.

I have a moderately strong opposition to the "origin" thing.  If
aa7ce555 replaces d64309ee, in order for the tool to be able to
"evolve" other histories that build on top of d64309ee, it only
needs the history between aa7ce555 and d64309ee and it would not
matter how aa7ce555 was built relative to its parent.  The user may
have typed/developed it from scratch, the user may have borrowed 70%
of its change from 7e1bbcd while the remaining 30% was done from
scratch, or it was a concatenation of the change made in 7e1bbcd and
another commit.

One half of my point being that we can do _without_ it, and in all
cases, if recording the fact that aa7ce555 was derived from 7e1bbcd
is so important, its log message can mention how it relates to the
"origin" thing.

And the other half is that while I consider the "origin" thing
unnecessary for the above reasons, having it means we need to not
just transfer the history leading to aa7ce555 and d64309ee (which
are necessary anyway while we have histories to transplant from
d64309ee to aa7ce555) but also have to pull in the history leading
to 7e1bbcd and we cannot discard it.

Stefan Xenos Nov. 17, 2018, 8:30 p.m. UTC | #6

> I am not sure that we necessarily need this to be a graph. I think part
> of the problems with not being able to GC *any* of this is by this
> requirement to have it stored in a graph, rather than having mappings from
> which you could reconstruct any non-GC'ed parts of that graph, if you
> really want.

Sorry, I'm not sure what GC problem you're alluding to here. As far as
I'm aware, this proposal should permit us to GC or retain any subset
of commits that we want. We create a chain of metacommits pointing to
the commits we want to retain, and put a ref in the metas namespace to
cause the chain itself to be retained. If we want to GC a different
subset, we can build a different chain of metacommits and move the ref
(or delete the ref entirely to permit the whole chain to be gc'd).
Could you be more specific about which use-case is problematic?
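
To make the retention argument concrete, here is a rough sketch using
existing plumbing. It is a stand-in, not the proposed implementation:
the ref name refs/metas/example is hypothetical, and the parent-type
header is left out because commit-tree cannot write it.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# v1 is amended into v2, so v1 is no longer on any branch.
echo one > file.txt && git add file.txt && git commit -q -m "version 1"
v1=$(git rev-parse HEAD)
echo two > file.txt && git add file.txt && git commit -q --amend -m "version 2"
v2=$(git rev-parse HEAD)

empty_tree=$(git mktree </dev/null)

# Stand-in for a metacommit: the first parent is the content commit, the
# second parent is the version it obsoletes.
meta=$(git commit-tree "$empty_tree" -p "$v2" -p "$v1" </dev/null)

# A single ref retains the whole chain.
git update-ref refs/metas/example "$meta"

# Both versions are reachable from that one ref.
git rev-list refs/metas/example

# Deleting the ref is what would let the chain be collected again:
# git update-ref -d refs/metas/example
```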

> Why is this missing most notably `hg evolve`?

Good point. I'll add a brief description and comparison to the doc.

> Also, please do not forget `git imerge`.

Thanks for directing me to this. It looks fantastic! I'm not sure it's
really an alternative to this work, but I could see adding an argument
to "git evolve" that allows you to use imerge for resolving merge
conflicts at any given step.

> Further, I see that this document tries to suggest a proliferation of new commands

It does. Let me explain a bit about the reasoning behind this
breakdown of commands. My main priority was to keep the commands as
consistent with existing git commands as possible. Secondary goals
were:
- Mapping a single intent to a single command where possible makes it
easier to explain what that command does.
- Having lots of simpler commands as opposed to a few complex commands
makes them easier to type.
- Command names are more descriptive than lettered arguments.

Git already has a "log" and "reflog" command for displaying two
different types of log, so putting "obslog" on its own command makes
it consistent with the existing logs, easier to type, and keeps the
command simple.

The "evolve" command updates changes to give them up-to-date parents.
This is a new type of user intent that didn't exist previously in git,
so putting it on its own command keeps things simpler for users. The
relationship between the evolve and change commands is a lot like the
relationship between the rebase command and the branch commands.
They could technically be combined into one command but I'm not sure
this would help with usability.

The "change" command combines many user intents (create a change,
rename a change, delete a change, etc.). If I were to design it from
scratch, I'd prefer to have all of these things on separate commands.
However, since changes are very similar to branches and users are
presumably already familiar with the branch command, I intentionally
made the change command as close as possible to the branch command -
using the same arguments for the same purpose. In this case, I
sacrificed the single-intent and simple commands goals in order to
retain consistency.

Anyway, that was my reasoning behind the selection of commands. Of
course, I'd welcome feedback - a good UX is the one that was built by
listening to feedback from its intended users. Personally, I don't
consider a proliferation of new commands to be inherently bad (or
inherently good, really). Is there a reason new commands should be
avoided?

Some other alternatives to consider:

- We could turn "obslog" into an extra option on the "log" command,
but that would be inconsistent with reflog and would complicate the
already-complex log command.
- If we were to combine "evolve" with another command, "git rebase
--evolve" would probably be the best candidate. However, this is
longer to type and I tend to prefer lots of simple commands over a few
complex ones. Also, the evolve command will get additional options in
the future (to enable stuff like amend-over-cherry-pick, various
automatic resolution strategies for divergence, etc.)... and putting
it on rebase would mean we'd end up with a lot of extra arguments
whose doc says "this argument is only used if you're also using
--evolve".
- We could break the "change" command into a bunch of simpler ones:
"lschange", "mkchange", "rmchange", "mvchange", etc. I actually like
this a lot, but this would make it diverge from the "branch" command
so I'm not sure we should do it unless enough of us feel the same way.
- We could combine the "change" command with the "branch" command. The
branch command could look for the "metas" prefix to determine whether
its argument is a branch or a change -- or it could just search one
namespace followed by the other. This would make for fewer commands,
but I'm concerned it may create confusion by making changes resemble
branches too closely. If you're not already familiar with the
distinction, you may see unexpected behavior when the "branch" you
think you're manipulating turns out to be a change.

  - Stefan

On Thu, Nov 15, 2018 at 4:52 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Stefan,
>
> On Wed, 14 Nov 2018, sxenos@google.com wrote:
>
> > From: Stefan Xenos <sxenos@google.com>
> >
> > This document describes what an obsolescence graph for
> > git would look like, the behavior of the evolve command,
> > and the changes planned for other commands.
>
> Thanks, this is a good discussion starter.
>
> > +Objective
> > +---------
> > +Track the edits to a commit over time in an obsolescence graph.
>
> I am not sure that we necessarily need this to be a graph. I think part of
> the problems with not being able to GC *any* of this is by this
> requirement to have it stored in a graph, rather than having mappings from
> which you could reconstruct any non-GC'ed parts of that graph, if you
> really want.
>
> > +Background
> > +----------
> > +Imagine you have three dependent changes up for review and you receive feedback
> > +that requires editing all three changes. While you're editing one, more feedback
> > +arrives on one of the others. What do you do?
> > +
> > +The evolve command is a convenient way to work with chains of commits that are
> > +under review. Whenever you rebase or amend a commit, the repository remembers
> > +that the old commit is obsolete and has been replaced by the new one. Then, at
> > +some point in the future, you can run "git evolve" and the correct sequence of
> > +rebases will occur in the correct order such that no commit has an obsolete
> > +parent.
> > +
> > +Part of making the "evolve" command work involves tracking the edits to a commit
> > +over time, which is why we need an obsolescence graph. However, the obsolescence
> > +graph will also bring other benefits:
> > +
> > +- Users can view the history of a commit directly (the sequence of amends and
> > +  rebases it has undergone, orthogonal to the history of the branch it is on).
> > +- It will be possible to quickly locate and list all the changes the user
> > +  currently has in progress.
> > +- It can be used as part of other high-level commands that combine or split
> > +  changes.
> > +- It can be used to decorate commits (in git log, gitk, etc) that are either
> > +  obsolete or are the tip of a work in progress.
> > +- By pushing and pulling the obsolescence graph, users can collaborate more
> > +  easily on changes-in-progress. This is better than pushing and pulling the
> > +  changes themselves since the obsolescence graph can be used to locate a more
> > +  specific merge base, allowing for better merges between different versions of
> > +  the same change.
> > +- It could be used to correctly rebase local changes and other local branches
> > +  after running git-filter-branch.
> > +- It can replace the change-id footer used by gerrit.
>
> Okay.
>
> > +Similar technologies
> > +--------------------
> > +There are some other technologies that address the same end-user problem.
> > +
> > +Rebase -i can be used to solve the same problem, but users can't easily switch
> > +tasks midway through an interactive rebase or have more than one interactive
> > +rebase going on at the same time. It can't handle the case where you have
> > +multiple changes sharing the same parent when that parent needs to be rebased
> > +and won't let you collaborate with others on resolving a complicated interactive
> > +rebase. You can think of rebase -i as a top-down approach and the evolve command
> > +as the bottom-up approach to the same problem.
> > +
> > +Several patch queue managers have been built on top of git (such as topgit,
> > +stgit, and quilt). They address the same user need. However they also rely on
> > +state managed outside git that needs to be kept in sync. Such state can be
> > +easily damaged when running a git native command that is unaware of the patch
> > +queue. They also typically require an explicit initialization step to be done by
> > +the user which creates workflow problems.
> > +
> > +Replacements (refs/replace) are superficially similar to obsolescences in that
> > +they describe that one commit should be replaced by another. However, they
> > +differ in both how they are created and how they are intended to be used.
> > +Obsolescences are created automatically by the commands a user runs, and they
> > +describe the user’s intent to perform a future rebase. Obsolete commits still
> > +appear in branches, logs, etc like normal commits (possibly with an extra
> > +decoration that marks them as obsolete). Replacements are typically created
> > +explicitly by the user, they are meant to be kept around for a long time, and
> > +they describe a replacement to be applied at read-time rather than as the input
> > +to a future operation. When a replaced commit is queried, it is typically hidden
> > +and swapped out with its replacement as though the replacement has already
> > +occurred.
>
> Why is this missing most notably `hg evolve`? Also, there should be *at
> least* a brief introduction how `hg evolve` works. They do have the
> benefit of real-world testing, and probably encountered problems and came
> up with solutions, and we would be remiss if we did not learn from them.
>
> Also, please do not forget `git imerge`.
>
> Further, I see that this document tries to suggest a proliferation of new
> commands (`git change`, `git evolve`, `git obslog` and whatever I glanced
> over). This smells a little bit like it wants to be condensed into a
> single-purpose command, maybe `evolve`, maybe something better if you can
> think of anything.
>
> I guess I will have to stop now and read up on how `hg evolve` works. It
> is a bit of a pity that that was not described in this document, first
> thing, as it forces everybody who is interested in this patch to duplicate
> my effort and also go hunt for information about Mercurial.
>
> Ciao,
> Johannes
>
> > +Goals
> > +-----
> > +Legend: Goals marked with P0 are required. Goals marked with Pn should be
> > +attempted unless they interfere with goals marked with Pn-1.
> > +
> > +P0. All commands that modify commits (such as the normal commit --amend or
> > +    rebase command) should mark the old commit as being obsolete and replaced by
> > +    the new one. No additional commands should be required to keep the
> > +    obsolescence graph up-to-date.
> > +P0. Any commit that may be involved in a future evolve command should not be
> > +    garbage collected. Specifically:
> > +    - Commits that obsolete another should not be garbage collected until
> > +      user-specified conditions have occurred and the change has expired from
> > +      the reflog. User specified conditions for removing changes include:
> > +      - The user explicitly deleted the change.
> > +      - The change was merged into a specific branch.
> > +    - Commits that have been obsoleted by another should not be garbage
> > +      collected if any of their replacements are still being retained.
> > +P0. A commit can be obsoleted by more than one replacement (called divergence).
> > +P0. Must be able to resolve divergence (convergence).
> > +P1. Users should be able to share chains of obsolete changes in order to
> > +    collaborate on WIP changes.
> > +P2. Such sharing should be at the user’s option. That is, it should be possible
> > +    to directly share a change without also sharing the file states or commit
> > +    comments from the obsolete changes that led up to it, and the choice not to
> > +    share those commits should not require changing any commit hashes.
> > +P2. It should be possible to discard part or all of the obsolescence graph
> > +    without discarding the commits themselves that are already present in
> > +    branches and the reflog.
> > +
> > +
> > +Overview
> > +========
> > +We introduce the notion of “meta-commits” which describe how one commit was
> > +created from other commits. A branch of meta-commits is known as a change.
> > +Changes are created and updated automatically whenever a user runs a command
> > +that creates a commit. They are used for locating obsolete commits, providing a
> > +list of a user’s unsubmitted work in progress, and providing a stable name for
> > +each unsubmitted change.
> > +
> > +Users can exchange edit histories by pushing and fetching changes.
> > +
> > +New commands will be introduced for manipulating changes and resolving
> > +divergence between them. Existing commands that create commits will be updated
> > +to modify the meta-commit graph and create changes where necessary.
> > +
> > +Example usage
> > +-------------
> > +# First create three dependent changes
> > +$ echo foo>bar.txt && git add .
> > +$ git commit -m "This is a test"
> > +created change metas/this_is_a_test
> > +$ echo foo2>bar2.txt && git add .
> > +$ git commit -m "This is also a test"
> > +created change metas/this_is_also_a_test
> > +$ echo foo3>bar3.txt && git add .
> > +$ git commit -m "More testing"
> > +created change metas/more_testing
> > +
> > +# List all our changes in progress
> > +$ git change -l
> > +metas/this_is_a_test
> > +metas/this_is_also_a_test
> > +* metas/more_testing
> > +metas/some_change_already_merged_upstream
> > +
> > +# Now modify the earliest change, using its stable name
> > +$ git reset --hard metas/this_is_a_test
> > +$ echo morefoo>>bar.txt && git add . && git commit --amend --no-edit
> > +
> > +# Use git-evolve to fix up any dependent changes
> > +$ git evolve
> > +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> > +rebasing metas/more_testing onto metas/this_is_also_a_test
> > +Done
> > +
> > +# Use git-obslog to view the history of the this_is_a_test change
> > +$ git obslog
> > +93f110 metas/this_is_a_test@{0} commit (amend): This is a test
> > +930219 metas/this_is_a_test@{1} commit: This is a test
> > +
> > +# Now create an unrelated change
> > +$ git reset --hard origin/master
> > +$ echo newchange>unrelated.txt && git add .
> > +$ git commit -m "Unrelated change"
> > +created change metas/unrelated_change
> > +
> > +# Fetch the latest code from origin/master and use git-evolve
> > +# to rebase all dependent changes.
> > +$ git fetch origin master
> > +$ git evolve origin/master
> > +deleting metas/some_change_already_merged_upstream
> > +rebasing metas/this_is_a_test onto origin/master
> > +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> > +rebasing metas/more_testing onto metas/this_is_also_a_test
> > +rebasing metas/unrelated_change onto origin/master
> > +Conflict detected! Resolve it and then use git evolve --continue to resume.
> > +
> > +# Sort out the conflict
> > +$ git mergetool
> > +$ git evolve --continue
> > +Done
> > +
> > +# Share the full history of edits for the this_is_a_test change
> > +# with a review server
> > +$ git push origin metas/this_is_a_test:refs/for/master
> > +# Share the latest commit for “Unrelated change”, without history
> > +$ git push origin HEAD:refs/for/master
> > +
> > +Detailed design
> > +===============
> > +Obsolescence information is stored as a graph of meta-commits. A meta-commit is
> > +a specially-formatted merge commit that describes how one commit was created
> > +from others.
> > +
> > +Meta-commits look like this:
> > +
> > +$ git cat-file -p <example_meta_commit>
> > +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> > +parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> > +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> > +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> > +author Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> > +committer Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> > +parent-type content
> > +parent-type obsolete
> > +parent-type origin
> > +
> > +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by
> > +cherry-picking commit 7e1bbcd3”.
> > +
> > +The tree for meta-commits is always the empty tree whose hash matches
> > +4b825dc642cb6eb9a060e54bf8d69288fbee4904 exactly, but future versions of git may
> > +attach other trees here. For forward-compatibility fsck should ignore such trees
> > +if found on future repository versions. Similarly, current versions of git
> > +should always fill in an empty commit comment and tools like fsck should ignore
> > +the content of the commit comment if present in a future repository version.
> > +This will allow future versions of git to add metadata to the meta-commit
> > +comments or tree without breaking forwards compatibility.
> > +
> > +Parent-type
> > +-----------
> > +The “parent-type” field in the commit header identifies a commit as a
> > +meta-commit and indicates the meaning for each of its parents. It is never
> > +present for normal commits. It is a list of enum values whose order matches the
> > +order of the parents. Possible parent types are:
> > +
> > +- content: the content parent identifies the commit that this meta-commit is
> > +  describing.
> > +- obsolete: indicates that this parent is made obsolete by the content parent.
> > +- origin: indicates that this parent was generated from the given commit.
> > +
> > +There must be exactly one content parent for each meta-commit and it must always
> > +be the first parent. The content commit will always be a normal commit and not a
> > +meta-commit. However, future versions of git may create meta-commits for other
> > +meta-commits and the fsck tool must be aware of this for forwards compatibility.
> > +
> > +A meta-commit can have zero or more obsolete parents. An amend operation creates
> > +a single obsolete parent. A merge used to resolve divergence (see divergence,
> > +below) will create multiple obsolete parents. A meta-commit may have zero
> > +obsolete parents if it describes a cherry-pick or squash merge that copies one
> > +or more commits but does not replace them.
> > +
> > +A meta-commit can have zero or more origin parents. A cherry-pick creates a
> > +single origin parent. Certain types of squash merge will create multiple origin
> > +parents.
> > +
> > +An obsolete parent or origin parent may be either a normal commit (indicating
> > +the oldest-known version of a change) or another meta-commit (for a change that
> > +has already been modified one or more times).
> > +
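
The parent-type rules above can be sketched as a small validating data type. This is an illustration only: the class name, the tuple layout, and the helper method are hypothetical and are not part of the proposal.

```python
from dataclasses import dataclass
from typing import List, Tuple

PARENT_TYPES = ("content", "obsolete", "origin")


@dataclass(frozen=True)
class MetaCommit:
    # Ordered (parent-type, commit hash) pairs, mirroring the header order.
    parents: Tuple[Tuple[str, str], ...]

    def __post_init__(self):
        types = [kind for kind, _ in self.parents]
        if any(kind not in PARENT_TYPES for kind in types):
            raise ValueError("unknown parent-type")
        # Exactly one content parent, and it must be the first parent.
        if types.count("content") != 1 or types[0] != "content":
            raise ValueError("need exactly one content parent, listed first")

    @property
    def content(self) -> str:
        return self.parents[0][1]

    def of_type(self, kind: str) -> List[str]:
        return [commit for k, commit in self.parents if k == kind]


# The example meta-commit from the document, with abbreviated hashes.
example = MetaCommit((
    ("content", "aa7ce555"),
    ("obsolete", "d64309ee"),
    ("origin", "7e1bbcd3"),
))
```

Zero obsolete or origin parents are accepted, matching the "zero or more" wording above.
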
> > +Changes
> > +-------
> > +A branch of meta-commits describes how a commit was produced and what previous
> > +commits it is based on. It is also an identifier for a thing the user is
> > +currently working on. We refer to such a meta-branch as a change.
> > +
> > +Local changes are stored in the new refs/metas namespace. Remote changes are
> > +stored in the refs/remotemetas/<remotename> namespace.
> > +
> > +The list of changes in refs/metas is more than just a mechanism for the evolve
> > +command to locate obsolete commits. It is also a convenient list of all of a
> > +user’s work in progress and their current state - a list of things they’re
> > +likely to want to come back to.
> > +
> > +Strictly speaking, it is the presence of the branch in the refs/metas namespace
> > +that marks a branch as being a change, not the fact that it points to a
> > +metacommit. Metacommits are only created when a commit is amended or rebased, so
> > +in the case where a change points to a commit that has never been modified, the
> > +change points to that initial commit rather than a metacommit.
> > +
> > +Obsolescence
> > +------------
> > +A commit is considered obsolete if it is reachable via “obsolete” edges
> > +anywhere in the history of a change and it isn’t the head of that change.
> > +Commits may be the content for 0 or more meta-commits. If the same commit
> > +appears in multiple changes, it is not obsolete if it is the head of any of
> > +those changes.
> > +
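
A minimal sketch of the obsolescence rule above, assuming a hypothetical in-memory model: `metas` maps a meta-commit id to its content commit and obsolete parents, any id missing from `metas` is a normal commit, and `changes` maps change names to their heads.

```python
def content_of(node, metas):
    """A meta-commit stands for its content commit; a normal commit for itself."""
    return metas[node]["content"] if node in metas else node


def obsolete_commits(changes, metas):
    """Commits reachable via obsolete edges that are not the head of any change."""
    head_commits = {content_of(head, metas) for head in changes.values()}
    found = set()
    for head in changes.values():
        stack = list(metas.get(head, {}).get("obsolete", []))
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            found.add(content_of(node, metas))
            stack.extend(metas.get(node, {}).get("obsolete", []))
    return found - head_commits


# D replaces B, as in the commit --amend example later in the document.
metas = {"Dmeta": {"content": "D", "obsolete": ["B"]}}
changes = {"metas/foo": "A", "metas/bar": "Dmeta", "metas/baz": "C"}
result = obsolete_commits(changes, metas)

# If another change still has B as its head, B is not obsolete.
shared = obsolete_commits({"metas/x": "Dmeta", "metas/y": "B"}, metas)
```
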
> > +Divergence
> > +----------
> > +From the user’s perspective, two changes are divergent if they both ask for
> > +different replacements to the same commit. More precisely, a target commit is
> > +considered divergent if there is more than one commit at the head of a change in
> > +refs/metas that leads to the target commit via an unbroken chain of “obsolete”
> > +edges.
> > +
> > +Much like a merge conflict, divergence is a situation that requires user
> > +intervention to resolve. The evolve command will stop when it encounters
> > +divergence and prompt the user to resolve the problem. Users can solve the
> > +problem in several ways:
> > +
> > +- Discard one of the changes (by deleting its change branch).
> > +- Merge the two changes (producing a single change branch).
> > +- Copy one of the changes (keep both commits, but one of them gets a new
> > +  metacommit appended to its history that is connected to its predecessor via an
> > +  origin edge rather than an obsolete edge. That new change no longer obsoletes
> > +  the original.)
> > +
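
The divergence definition above can be sketched as follows. The model is hypothetical and matches no existing git API: `metas` maps a meta-commit id to its content commit and obsolete parents, and `changes` maps change names in refs/metas to their heads.

```python
def divergent_commits(changes, metas):
    """Commits reached, via obsolete edges, from more than one change head."""
    def content_of(node):
        return metas[node]["content"] if node in metas else node

    reached_by = {}  # target commit -> set of head commits that lead to it
    for head in changes.values():
        head_commit = content_of(head)
        stack = list(metas.get(head, {}).get("obsolete", []))
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            reached_by.setdefault(content_of(node), set()).add(head_commit)
            stack.extend(metas.get(node, {}).get("obsolete", []))
    return {target for target, heads in reached_by.items() if len(heads) > 1}


# Commit X was amended twice, independently, into Y and Z.
metas = {
    "M1": {"content": "Y", "obsolete": ["X"]},
    "M2": {"content": "Z", "obsolete": ["X"]},
}
diverged = divergent_commits({"metas/a": "M1", "metas/b": "M2"}, metas)

# Resolution by discarding one change branch.
after_discard = divergent_commits({"metas/a": "M1"}, metas)

# Resolution by merging: W obsoletes both earlier versions.
metas["M3"] = {"content": "W", "obsolete": ["M1", "M2"]}
after_merge = divergent_commits({"metas/a": "M3"}, metas)
```

The copy option is not modeled here, since it depends on origin edges, which this sketch ignores.
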
> > +Obsolescence across cherry-picks
> > +--------------------------------
> > +By default the evolve command will treat cherry-picks and squash merges as being
> > +completely separate from the original. Further amendments to the original commit
> > +will have no effect on the cherry-picked copy. However, this behavior may not be
> > +desirable in all circumstances.
> > +
> > +The evolve command may at some point support an option to look for cases where
> > +the source of a cherry-pick or squash merge has itself been amended, and
> > +automatically apply that same change to the cherry-picked copy. In such cases,
> > +it would traverse origin edges rather than ignoring them, and would treat a
> > +commit with origin edges as being obsolete if any of its origins were obsolete.
> > +
> > +Garbage collection
> > +------------------
> > +For GC purposes, meta-commits are normal commits. Just as a commit causes its
> > +parents and tree to be retained, a meta-commit also causes its parents to be
> > +retained.
> > +
> > +Change creation
> > +---------------
> > +Changes are created automatically whenever the user runs a command like “commit”
> > +that has the semantics of creating a new change. They also move forward
> > +automatically even if they’re not checked out. For example, whenever the user
> > +runs a command like “commit --amend” that modifies a commit, all branches in
> > +refs/metas that pointed to the old commit move forward to point to its
> > +replacement instead. This also happens when the user is working from a detached
> > +head.
> > +
> > +This does not mean that every commit has a corresponding change. By default,
> > +changes only exist for recent locally-created commits. Users may explicitly pull
> > +changes from other users or keep their changes around for a long time, but
> > +either behavior requires a user to opt-in. Code review systems like gerrit may
> > +also choose to keep changes around forever.
> > +
> > +Note that the changes in refs/metas serve a dual function as both a way to
> > +identify obsolete changes and as a way for the user to keep track of their work
> > +in progress. If we were only concerned with identifying obsolete changes, it
> > +would be sufficient to create the change branch lazily the first time a commit
> > +is obsoleted. Addressing the second use - of refs/metas as a mechanism for
> > +keeping track of work in progress - is the reason for eagerly creating the
> > +change on first commit.
> > +
> > +Change naming
> > +-------------
> > +When a change is first created, the only requirement for its name is that it
> > +must be unique. Good names would also serve as useful mnemonics and be easy to
> > +type. For example, a short word from the commit message containing no numbers or
> > +special characters and that shows up with low frequency in other commit messages
> > +would make a good choice.
> > +
> > +Different users may prefer different heuristics for their change names. For this
> > +reason a new hook will be introduced to compute change names. Git will invoke
> > +the hook for all newly-created changes and will append a numeric suffix if the
> > +name isn’t unique. The default heuristics are not specified by this proposal and
> > +may change during implementation.
> > +
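
As an illustration of the naming scheme, here is a sketch of a name generator with a pluggable hook and a numeric suffix for uniqueness. Both the slug heuristic and the suffix format are assumptions; the proposal leaves the default heuristics unspecified.

```python
import re


def default_hook(message):
    """Hypothetical default: lower-case the subject and join words with '_'."""
    slug = re.sub(r"[^a-z0-9]+", "_", message.lower()).strip("_")
    return slug or "change"


def unique_change_name(message, existing, hook=default_hook):
    """Ask the hook for a name, then append a numeric suffix until it is unused."""
    base = hook(message)
    name = base
    counter = 1
    while "metas/" + name in existing:
        counter += 1
        name = f"{base}{counter}"
    return "metas/" + name


first = unique_change_name("This is a test", existing=set())
second = unique_change_name("This is a test", existing={first})
custom = unique_change_name("More testing", set(), hook=lambda msg: "mt")
```
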
> > +Change deletion
> > +---------------
> > +Changes are normally only interesting to a user while a commit is still in
> > +development and under review. Once the commit has been submitted wherever it
> > +is going, its change can be discarded.
> > +
> > +The normal way of deleting changes makes this easy to do - changes are deleted
> > +by the evolve command when it detects that the change is present in an upstream
> > +branch. It does this in two ways: if the latest commit in a change either shows
> > +up in the branch history or the change becomes empty after a rebase, it is
> > +considered merged and the change is discarded. In this context, an “upstream
> > +branch” is any branch passed in as the upstream argument of the evolve command.
> > +
> > +In case this sometimes deletes a useful change, such automatic deletions are
> > +recorded in the reflog allowing them to be easily recovered.
> > +
> > +Sharing changes
> > +---------------
> > +Change histories are shared by pushing or fetching meta-commits and change
> > +branches. This provides users with a lot of control of what to share and
> > +repository implementations with control over what to retain.
> > +
> > +Users that only want to share the content of a commit can do so by pushing the
> > +commit itself as they currently would. Users that want to share an edit history
> > +for the commit can push its change, which would point to a meta-commit rather
> > +than the commit itself if there is any history to share. Note that multiple
> > +changes can refer to the same commits, so it’s possible to construct and push a
> > +different history for the same commit in order to remove sensitive or irrelevant
> > +intermediate states.
> > +
> > +Imagine the user is working on a change “mychange” that is currently the latest
> > +commit on master, they have two ways to share it:
> > +
> > +# User shares just a commit without its history
> > +$ git push origin master
> > +
> > +# User shares the full history of the commit to a review system
> > +$ git push origin metas/mychange:refs/for/master
> > +
> > +# User fetches a collaborator’s modifications to their change
> > +$ git fetch remotename metas/mychange
> > +# Which updates the ref remotemetas/remotename/mychange
> > +
> > +This will cause more intermediate states to be shared with the server than would
> > +have been shared previously. A review system like gerrit would need to keep
> > +track of which states had been explicitly pushed versus other intermediate
> > +states in order to de-emphasize (or hide) the extra intermediate states from the
> > +user interface.
> > +
> > +Merge-base
> > +----------
> > +Merge-base will be changed to search the meta-commit graph for common ancestors
> > +as well as the commit graph, and will generally prefer results from the
> > +meta-commit graph over the commit graph. Merge-base will consider meta-commits
> > +from all changes, and will traverse both origin and obsolete edges.
> > +
> > +The reason for this is that - when merging two versions of the same commit
> > +together - an earlier version of that same commit will usually be much more
> > +similar than their common parent. This should make the workflow of collaborating
> > +on unsubmitted patches as convenient as the workflow for collaborating in a
> > +topic branch by eliminating repeated merges.
> > +
> > +User interface
> > +--------------
> > +All git porcelain commands that create commits are classified as having one of
> > +four behaviors: modify, create, copy, or import. These behaviors are discussed
> > +in more detail below.
> > +
> > +Modify commands
> > +---------------
> > +Modification commands (commit --amend, rebase) will mark the old commit as
> > +obsolete by creating a new meta-commit that references the old one as an
> > +obsolete parent. In the event that multiple changes point to the same commit,
> > +this is done independently for every such change.
> > +
> > +More specifically, modifications work like this:
> > +
> > +1. Locate all existing changes for which the old commit is the content for the
> > +   head of the change branch. If no such branch exists, create one that points
> > +   to the old commit. Changes that include this commit in their history but not
> > +   at their head are explicitly not included.
> > +2. For every such change, create a new meta-commit that references the new
> > +   commit as its content and references the old head of the change as an
> > +   obsolete parent.
> > +3. Move the change branch forward to point to the new meta-commit.
> > +
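
The three steps above can be sketched against a hypothetical in-memory model, where `changes` maps change names to heads and `metas` maps meta-commit ids to their parents. The function name, the generated ids, and the `fallback_name` parameter are illustrative only.

```python
def record_modify(changes, metas, old_commit, new_commit, fallback_name):
    """Record that new_commit replaces old_commit; return the changes moved."""
    def content_of(node):
        return metas[node]["content"] if node in metas else node

    # Step 1: changes whose head has old_commit as its content.
    targets = [name for name, head in changes.items()
               if content_of(head) == old_commit]
    if not targets:
        changes[fallback_name] = old_commit
        targets = [fallback_name]

    for name in targets:
        # Step 2: a new meta-commit with the old head as an obsolete parent.
        meta_id = f"meta({new_commit})#{len(metas)}"
        metas[meta_id] = {"content": new_commit, "obsolete": [changes[name]]}
        # Step 3: move the change branch forward.
        changes[name] = meta_id
    return targets


changes = {"metas/foo": "A", "metas/bar": "B", "metas/baz": "C"}
metas = {}
moved = record_modify(changes, metas, "B", "D", fallback_name="metas/new")
first_head = changes["metas/bar"]

# A second amend chains onto the previous meta-commit, not onto B directly.
record_modify(changes, metas, "D", "E", fallback_name="metas/new")
second_head = changes["metas/bar"]
```
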
> > +Copy commands
> > +-------------
> > +Copy commands (cherry-pick, merge --squash) create a new meta-commit that
> > +references the old commits as origin parents. Besides the fact that the new
> > +parents are tagged differently, copy commands work the same way as modify
> > +commands.
> > +
> > +Create commands
> > +---------------
> > +Creation commands (commit, merge) create a new commit and a new change that
> > +points to that commit. They do not create any meta-commits.
> > +
> > +Import commands
> > +---------------
> > +Import commands (fetch, pull) do not create any new meta-commits or changes
> > +unless that is specifically what they are importing. For example, the fetch
> > +command would update remotemetas/origin/change35 and fetch all referenced
> > +meta-commits if asked to do so directly, but it wouldn’t create any changes or
> > +meta-commits for commits discovered on the master branch when running “git fetch
> > +origin master”.
> > +
> > +Other commands
> > +--------------
> > +Some commands don’t fit cleanly into one of the above categories.
> > +
> > +Semantically, filter-branch should be treated as a modify command, but doing so
> > +is likely to create a lot of irrelevant clutter in the changes namespace and the
> > +large number of extra change refs may introduce performance problems. We
> > +recommend treating filter-branch as an import command initially, but making it
> > +behave more like a modify command in future follow-up work. One possible
> > +solution may be to treat commits that are part of existing changes as being
> > +modified but to avoid creating changes for other rewritten changes.
> > +
> > +Once the evolve command can handle obsolescence across cherry-picks, such
> > +cherry-picks will result in a hybrid move-and-copy operation. It will create
> > +cherry-picks that replace other cherry-picks, which will have both origin edges
> > +(pointing to the new source commit being picked) and obsolete edges (pointing to
> > +the previous cherry-pick being replaced).
> > +
> > +Evolve
> > +------
> > +The evolve command performs the correct sequence of rebases such that no change
> > +has an obsolete parent. The syntax looks like this:
> > +
> > +git evolve [--abort][--continue][--quit] [upstream…]
> > +
> > +It takes an optional list of upstream branches. All changes whose parent shows
> > +up in the history of one of the upstream branches will be rebased onto the
> > +upstream branch before resolving obsolete parents.
> > +
> > +Any change whose latest state is found in an upstream branch (or that ends up
> > +empty after rebase) will be deleted. This is the normal mechanism for deleting
> > +changes. Changes are created automatically on the first commit, and are deleted
> > +automatically when evolve determines that they’ve been merged upstream.
> > +
> > +Orphan commits are commits with obsolete parents. The evolve command then
> > +repeatedly rebases orphan commits with non-orphan parents until there are either
> > +no orphan commits left, a merge conflict is discovered, or a divergent parent is
> > +discovered.
> > +
> > +The --abort option returns all changes to the state they were in prior to
> > +invoking evolve, and the --quit option terminates the current evolution without
> > +changing the current state.
> > +
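
A rough sketch of the orphan-rebasing loop, under simplifying assumptions that are not part of the proposal: every commit has a single parent, each obsolete commit has exactly one successor (no divergence), rebases never conflict, and a rebased commit is named by appending a quote to the old name.

```python
def evolve(parent, successor, heads):
    """Rebase orphans until no live commit has an obsolete parent.

    parent:    commit -> its parent commit
    successor: obsolete commit -> the commit that replaced it
    heads:     set of live (non-obsolete) commits
    Returns the rebases performed as (new_commit, new_parent) pairs.
    """
    def latest(commit):
        while commit in successor:
            commit = successor[commit]
        return commit

    log = []
    while True:
        orphans = sorted(c for c in heads if parent.get(c) in successor)
        # Only rebase onto a replacement that is not itself an orphan.
        ready = [c for c in orphans
                 if parent.get(latest(parent[c])) not in successor]
        if not ready:
            break
        old = ready[0]
        new = old + "'"
        parent[new] = latest(parent[old])
        successor[old] = new        # the rebased commit obsoletes the old one
        heads.remove(old)
        heads.add(new)
        log.append((new, parent[new]))
    return log


# A <- B <- C, where A has been amended into A2.
parent = {"A": "root", "A2": "root", "B": "A", "C": "B"}
successor = {"A": "A2"}
heads = {"A2", "B", "C"}
steps = evolve(parent, successor, heads)
```

C only becomes an orphan once B has been rebased, which is why the order of the rebases falls out of the loop rather than being computed up front.
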
> > +Checkout
> > +--------
> > +Running checkout on a change by name has the same effect as checking out a
> > +detached head pointing to the latest commit on that change-branch. There is no
> > +need to ever have HEAD point to a change since changes always move forward when
> > +necessary, no matter what branch the user has checked out.
> > +
> > +Meta-commits themselves cannot be checked out by their hash.
> > +
> > +Reset
> > +-----
> > +Resetting a branch to a change by name is the same as resetting to the commit at
> > +that change’s head.
> > +
> > +Commit
> > +------
> > +Commit --amend gets modify semantics and will move existing changes forward. The
> > +normal form of commit gets create semantics and will create a new change.
> > +
> > +$ touch foo && git add . && git commit -m "foo" && git tag A
> > +$ touch bar && git add . && git commit -m "bar" && git tag B
> > +$ touch baz && git add . && git commit -m "baz" && git tag C
> > +
> > +This produces the following commits:
> > +A(tree=[foo])
> > +B(tree=[foo, bar], parent=A)
> > +C(tree=[foo, bar, baz], parent=B)
> > +
> > +...along with three changes:
> > +change/foo = A
> > +change/bar = B
> > +change/baz = C
> > +
> > +Running commit --amend does the following:
> > +$ git checkout B
> > +$ touch zoom && git add . && git commit --amend -m "bar and zoom"
> > +$ git tag D
> > +
> > +Commits:
> > +A(tree=[foo])
> > +B(tree=[foo, bar], parent=A)
> > +C(tree=[foo, bar, baz], parent=B)
> > +D(tree=[foo, bar, zoom], parent=A)
> > +Dmeta(content=D, obsoletes=B)
> > +
> > +Changes:
> > +change/foo = A
> > +change/bar = Dmeta
> > +change/baz = C
> > +
> > +Merge
> > +-----
> > +Merge gets create, modify, or copy semantics based on what is being merged and
> > +the options being used.
> > +
> > +The --squash version of merge gets copy semantics (it produces a new change that
> > +is marked as a copy of all the original changes that were squashed into it).
> > +
> > +The “modify” version of merge replaces both of the original commits with the
> > +resulting merge commit. This is one of the standard mechanisms for resolving
> > +divergence. The parents of the merge commit are the parents of the two commits
> > +being merged. The resulting commit will not be a merge commit if both of the
> > +original commits had the same parent or if one was the parent of the other.
> > +
> > +The “create” version of merge creates a new change pointing to a merge commit
> > +that has both original commits as parents. The result is what merge produces now
> > +- a new merge commit. However, this version of merge doesn’t directly resolve
> > +divergence.
> > +
> > +To select between these two behaviors, merge gets new “--amend” and “--noamend”
> > +options which select between the “modify” and “create” behaviors respectively,
> > +with noamend being the default.
> > +
> > +For example, imagine we created two divergent changes like this:
> > +
> > +$ touch foo && git add . && git commit -m "foo" && git tag A
> > +$ touch bar && git add . && git commit -m "bar" && git tag B
> > +$ touch baz && git add . && git commit --amend -m "bar and baz"
> > +$ git tag C
> > +$ git checkout B
> > +$ touch bam && git add . && git commit --amend -m "bar and bam"
> > +$ git tag D
> > +
> > +At this point the commit graph looks like this:
> > +
> > +A(tree=[foo])
> > +B(tree=[bar], parent=A)
> > +C(tree=[bar, baz], parent=A)
> > +D(tree=[bar, bam], parent=A)
> > +Cmeta(content=C, obsoletes=B)
> > +Dmeta(content=D, obsoletes=B)
> > +
> > +There would be three active changes with heads pointing as follows:
> > +
> > +change/changeA=A
> > +change/changeB=Cmeta
> > +change/changeB2=Dmeta
> > +
> > +ChangeB and changeB2 are divergent at this point. Let’s consider what happens if
> > +we perform each type of merge between changeB and changeB2.
> > +
> > +Merge example: Amend merge
> > +One way to resolve divergent changes is to use an amend merge. Recall that HEAD
> > +is pointing to D at this point.
> > +
> > +$ git merge --amend change/changeB
> > +
> > +Here we’ve asked for an amend merge since we’re trying to resolve divergence
> > +between two versions of the same change. There are no conflicts so we end up
> > +with this:
> > +
> > +E(tree=[bar, baz, bam], parent=A)
> > +Emeta(content=E, obsoletes=[Cmeta, Dmeta])
> > +
> > +With the following branches:
> > +
> > +change/changeA=A
> > +change/changeB=Emeta
> > +change/changeB2=Emeta
> > +
> > +Notice that the result of the “amend merge” is a replacement for C and D rather
> > +than a new commit with C and D as parents (as a normal merge would have
> > +produced). The parents of the amend merge are the parents of C and D which - in
> > +this case - is just A, so the result is not a merge commit. Also notice that
> > +changeB and changeB2 are now aliases for the same change.
> > +
> > +Merge example: Noamend merge
> > +Consider what would have happened if we’d used a noamend merge instead. Recall
> > +that HEAD was at D and our branches looked like this:
> > +
> > +change/changeA=A
> > +change/changeB=Cmeta
> > +change/changeB2=Dmeta
> > +
> > +$ git merge --noamend change/changeB
> > +
> > +That would produce the sort of merge we’d normally expect today:
> > +
> > +F(tree=[bar, baz, bam], parent=[C, D])
> > +
> > +And our changes would look like this:
> > +change/changeA=A
> > +change/changeB=Cmeta
> > +change/changeB2=Dmeta
> > +change/changeF=F
> > +
> > +In this case, changeB and changeB2 are still divergent and we’ve created a new
> > +change for our merge commit. However, this is just a temporary state. The next
> > +time we run the “evolve” command, it will discover the divergence but also
> > +discover the merge commit F that resolves it. Evolve will suggest converting F
> > +into an amend merge in order to resolve the divergence and will display the
> > +command for doing so.
> > +
> > +Change
> > +------
> > +The “change” command can be used to list, rename, reset or delete changes. It
> > +takes arguments similar to the “branch” command.
> > +
> > +The -l argument lists all local changes that aren’t present in the given branch.
> > +If the branch name is omitted, all local changes are listed.
> > +
> > +The -r argument lists all remote changes.
> > +
> > +The -m argument renames a change, given its old and new name.
> > +
> > +The -d argument deletes a change. This is one way to resolve divergence.
> > +
> > +The -n argument renames the current change, or creates a change of the given
> > +name for the current commit if no such change exists yet. If given an optional
> > +commit hash, the change is created for that commit rather than head. If there
> > +are multiple local changes for the same commit and they are all aliases for the
> > +same metacommit hash, they are all deleted except the newly-created name. If
> > +given the name of a metacommit, the new change points to that metacommit.
> > +
> > +The --purge argument deletes all obsolete changes and all changes that are
> > +present in the given branch. Note that such changes can be recovered from the
> > +reflog.
> > +
> > +Combined with the GC protection that is offered, this is intended to facilitate
> > +a workflow that relies on changes instead of branches. Users could choose to
> > +work with no local branches and use changes instead - both for mailing list and
> > +gerrit workflows.
> > +
> > +Log
> > +---
> > +When a commit is shown in git log that is part of a change, it is decorated with
> > +extra change information. If it is the head of a change, the name of the change
> > +is shown next to the list of branches. If it is obsolete, it is decorated with
> > +the word “obsolete”.
> > +
> > +Obslog
> > +------
> > +The obslog command lists the change history for the current commit.
> > +
> > +Rebase
> > +------
> > +In general the rebase command is treated as a modify command. When a change is
> > +rebased, the new commit replaces the original.
> > +
> > +Rebase --abort is special. Its intent is to restore git to the state it had
> > +prior to running rebase. It should move back any changes to point to the refs
> > +they had prior to running rebase and delete any new changes that were created as
> > +part of the rebase. To achieve this, rebase will save the state of all changes
> > +in refs/metas prior to running rebase and will restore the entire namespace
> > +if the rebase is aborted (deleting any newly-created changes). Newly-created
> > +metacommits are left in place, but will have no effect until garbage collected
> > +since metacommits are only used if they are reachable from refs/metas.
> > +
> > +Other options considered
> > +========================
> > +We considered several other options for storing the obsolescence graph. This
> > +section describes the other options and why they were rejected.
> > +
> > +Commit header
> > +-------------
> > +Add an “obsoletes” field to the commit header that points backwards from a
> > +commit to the previous commits it obsoletes.
> > +
> > +Pros:
> > +- Very simple
> > +- Easy to traverse from a commit to the previous commits it obsoletes.
> > +Cons:
> > +- Adds a cost to the storage format, even for commits where the change history
> > +  is uninteresting.
> > +- Unconditionally prevents the change history from being garbage collected.
> > +- Always causes the change history to be shared when pushing or pulling changes.
> > +
> > +Git notes
> > +---------
> > +Instead of storing obsolescence information in metacommits, the metacommit
> > +content could go in a new notes namespace - say refs/notes/metacommit. Each note
> > +would contain the list of obsolete and origin parents, and an automerger could
> > +be supplied to make it easy to merge the metacommit notes from different remotes.
> > +
> > +Pros:
> > +- Easy to locate all commits obsoleted by a given commit (since there would only
> > +  be one metacommit for any given commit).
> > +Cons:
> > +- Wrong GC behavior (obsolete commits wouldn’t automatically be retained by GC)
> > +  unless we introduced a special case for these kinds of notes.
> > +- No way to selectively share or pull the metacommits for one specific change.
> > +  It would be all-or-nothing, which would be expensive. This could be addressed
> > +  by changes to the protocol, but this would be invasive.
> > +- Requires custom auto-merging behavior on fetch.
> > +
> > +Tags
> > +----
> > +Put the content of the metacommit in a message attached to a tag on the
> > +replacement commit. This is very similar to the git notes approach and has the
> > +same pros and cons.
> > +
> > +Simple forward references
> > +-------------------------
> > +Record an edge from an obsolete commit to its replacement in this form:
> > +
> > +refs/obsoletes/<A>
> > +
> > +pointing to commit <B> as an indication that B is the replacement for the
> > +obsolete commit A.
> > +
> > +Pros:
> > +- Protects <B> from being garbage collected.
> > +- Fast lookup for the evolve operation, without additional search structures
> > +  (“what is the replacement for <A>?” is very fast).
> > +
> > +Cons:
> > +- Can’t represent divergence (which is a P0 requirement).
> > +- Creates lots of refs (which can be inefficient)
> > +- Doesn’t provide a way to fetch only refs for a specific change.
> > +- The obslog command requires a search of all refs.
> > +
> > +Complex forward references
> > +--------------------------
> > +Record an edge from an obsolete commit to its replacement in this form:
> > +
> > +refs/obsoletes/<change_id>/obs<A>_<B>
> > +
> > +Pointing to commit <B> as an indication that B is the replacement for obsolete
> > +commit A.
> > +
> > +Pros:
> > +- Permits sharing and fetching refs for only a specific change.
> > +- Supports divergence
> > +- Protects <B> from being garbage collected.
> > +
> > +Cons:
> > +- Creates lots of refs, which is inefficient.
> > +- Doesn’t provide a good lookup structure for lookups in either direction.
> > +
> > +Backward references
> > +-------------------
> > +Record an edge from a replacement commit to the obsolete one in this form:
> > +
> > +refs/obsolescences/<B>
> > +
> > +Cons:
> > +- Doesn’t provide a way to resolve divergence (which is a P0 requirement).
> > +- Doesn’t protect <B> from being garbage collected (which could be fixed by
> > +  combining this with a refs/metas namespace, as in the metacommit variant).
> > +
> > +Obsolescences file
> > +------------------
> > +Create a custom file (or files) in .git recording obsolescences.
> > +
> > +Pros:
> > +- Can store exactly the information we want with exactly the performance we want
> > +  for all operations. For example, there could be a disk-based hashtable
> > +  permitting constant time lookups in either direction.
> > +
> > +Cons:
> > +- Handling GC, pushing, and pulling would all require custom solutions. GC
> > +  issues could be addressed with a repository format extension.
> > +
> > +Squash points
> > +-------------
> > +We create and update change branches in refs/metas at the same time we
> > +would in the metacommit proposal. However, rather than pointing to a metacommit
> > +branch they point to normal commits and are treated as “squash points” - markers
> > +for sequences of commits intended to be squashed together on submission.
> > +
> > +Amends and rebases work differently than they do now. Rather than actually
> > +containing the desired state of a commit, they contain a delta from the previous
> > +version along with a squash point indicating that the preceding changes are
> > +intended to be squashed on submission. Specifically, amends would become new
> > +changes and rebases would become merge commits with the old commit and new
> > +parent as parents.
> > +
> > +When the changes are finally submitted, the squashes are executed, producing the
> > +final version of the commit.
> > +
> > +In addition to the squash points, git would maintain a set of “nosquash” tags
> > +for commits that were used as ancestors of a change that are not meant to be
> > +included in the squash.
> > +
> > +For example, if we have this commit graph:
> > +
> > +A(...)
> > +B(parent=A)
> > +C(parent=B)
> > +
> > +...and we amend B to produce D, we’d get:
> > +
> > +A(...)
> > +B(parent=A)
> > +C(parent=B)
> > +D(parent=B)
> > +
> > +...along with a new change branch indicating D should be squashed with its
> > +parents when submitted:
> > +
> > +change/changeB = D
> > +change/changeC = C
> > +
> > +We’d also create a nosquash tag for A indicating that A shouldn’t be included
> > +when changeB is squashed.
> > +
> > +If a user amends the change again, they’d get:
> > +
> > +A(...)
> > +B(parent=A)
> > +C(parent=B)
> > +D(parent=B)
> > +E(parent=D)
> > +
> > +change/changeB = E
> > +change/changeC = C
> > +
> > +Pros:
> > +- Good GC behavior.
> > +- Provides a natural way to share changes (they’re just normal branches).
> > +- Merge-base works automatically without special cases.
> > +- Rewriting the obslog would be easy using existing git commands.
> > +- No new data types needed.
> > +Cons:
> > +- No way to connect the squashed version of a change to the original, so no way
> > +  to automatically clean up old changes. This also means users lose all benefits
> > +  of the evolve command if they prematurely squash their commits. This may occur
> > +  if a user thinks a change is ready for submission, squashes it, and then later
> > +  discovers an additional change to make.
> > +- Histories would look very cluttered (users would see all previous edits to
> > +  their commit in the commit log, and all previous rebases would show up as
> > +  merges). Could be quite hard for users to tell what is going on. (Possible
> > +  fix: also implement a new smart log feature that displays the log as though
> > +  the squashes had occurred).
> > +- Need to change the current behavior of current commands (like amend and
> > +  rebase) in ways that will be unexpected to many users.
> > --
> > 2.19.1.930.g4563a0d9d0-goog
> >
> >
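
The orphan-resolution loop described in the quoted patch (rebase orphan
commits until none remain or a divergent parent is found) can be sketched
roughly as follows. This is only an illustrative model, not the proposed
implementation: the dictionaries `parent` and `successors`, the helper names,
and the convention of naming a rebased commit by appending `'` are all
assumptions made for the example, and merge conflicts are not modeled.

```python
class DivergenceError(Exception):
    """Raised when an obsolete commit has more than one replacement."""


def latest(commit, successors):
    """Follow obsolescence edges to the newest version of a commit."""
    while successors.get(commit):
        replacements = successors[commit]
        if len(replacements) > 1:
            raise DivergenceError(commit)
        commit = replacements[0]
    return commit


def is_orphan(commit, parent, successors):
    """A commit is an orphan if its parent has been obsoleted."""
    p = parent[commit]
    return p is not None and bool(successors.get(p))


def evolve(parent, successors):
    """Rebase live orphans one at a time; return (old commit, new parent) pairs."""
    rebased = []
    while True:
        # Live orphans whose rebase target is itself stable (not an orphan).
        candidates = sorted(
            c for c in parent
            if not successors.get(c)
            and is_orphan(c, parent, successors)
            and not is_orphan(latest(parent[c], successors), parent, successors)
        )
        if not candidates:
            return rebased
        old = candidates[0]
        target = latest(parent[old], successors)
        new = old + "'"            # hypothetical name for the rebased commit
        parent[new] = target
        successors[old] = [new]    # the old commit is now obsolete
        rebased.append((old, target))
```

For a chain A <- B <- C where A has been amended to A2, calling
`evolve({"A": None, "B": "A", "C": "B", "A2": None}, {"A": ["A2"]})` rebases B
onto A2 and then C onto the new B.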

Stefan Xenos Nov. 17, 2018, 11:44 p.m. UTC | #7

Resending this without HTML enabled... sorry if you receive it twice.

> The file name and the title are in a mismatch.

Good point. However, the focus of this proposal really is supposed to
be on the underlying data structure, not just the evolve command
(which is the driving use-case for the graph but not the only
important one). I think I'll fix the mismatch by renaming both the
title and document to "change graph" if that seems acceptable. I'll
also expand the "objective" paragraph to mention the evolve command.

> Perhaps "three sequential patches"?

I've added a quick informal definition of the word "change", along
with a cross-reference to the precise definition later in the
document.

> These two paragraphs could be moved lower, under a "Semi-Related Work"

Good point. I'll keep the patch queue managers here since they really
can be used to solve the same problem that evolve addresses, but I'll
move the replacements paragraph down to a new section on semi-related
work. There was also a request to discuss git-imerge which I'll insert
there.

> Instead, I would try to use the term "patch" to describe a change to the codebase

I know you didn't finish the document but later on I define the term
"change" to have essentially this meaning. I've moved the definition
earlier in the document to make the earlier sections easier to
understand. Given the choice of the word "patch" or "change" for this
definition, I prefer to use "change" since gerrit already defines it
in this way and the word "patch" already has a meaning in git (a file
containing a diff).

> (Making a note so I come back to this. I hope to learn what you mean by this "more specific merge base".)

Let's say we have commits:

P <- C

Then two people amend C in different ways producing:

P <- C
P <- C1
P <- C2

...then we try to resolve the divergence by merging C1 and C2. Without
the change graph, the closest merge-base (ancestor) would be P. With
the change graph, the closest merge base would be C.
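
To make that concrete, here is a small sketch that computes merge bases for
the P/C/C1/C2 example, first over the commit graph alone and then with the
obsolescence edges added. It is a toy model rather than anything from the
proposal: the dictionaries and function names are made up for illustration,
and "merge base" here means a common ancestor that is not an ancestor of any
other common ancestor.

```python
def ancestors(commit, edges):
    """Return the commit plus everything reachable from it through edges."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(edges.get(c, []))
    return seen


def merge_bases(a, b, edges):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(a, edges) & ancestors(b, edges)
    return {
        c for c in common
        if not any(o != c and c in ancestors(o, edges) for o in common)
    }


# Commit graph: C, C1 and C2 all have P as their parent.
parents = {"C": ["P"], "C1": ["P"], "C2": ["P"]}
# Change graph: C1 and C2 each obsolete C.
obsoletes = {"C1": ["C"], "C2": ["C"]}
# Treat obsolescence edges as extra ancestry edges.
combined = {c: parents.get(c, []) + obsoletes.get(c, []) for c in parents}

print(merge_bases("C1", "C2", parents))   # {'P'}
print(merge_bases("C1", "C2", combined))  # {'C'}
```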

> If we GC'd commit A but still have the newer A', I can either think that

I'm not sure I followed that. Are you suggesting a change to the
proposal or asking for a clarification?

On Fri, Nov 16, 2018 at 1:36 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/14/2018 7:55 PM, sxenos@google.com wrote:
> > From: Stefan Xenos <sxenos@google.com>
> >
> > This document describes what an obsolescence graph for
> > git would look like, the behavior of the evolve command,
> > and the changes planned for other commands.
>
> Thanks for putting this together!
>
> > diff --git a/Documentation/technical/evolve.txt b/Documentation/technical/evolve.txt
> ...
> > +Git Obsolescence Graph
> > +======================
> > +
> > +Objective
> > +---------
> > +Track the edits to a commit over time in an obsolescence graph.
>
> The file name and the title are in a mismatch.
>
> I'd prefer if the title was "Git Evolve Design Document" and this opening
> paragraph was about the reasons we want a 'git evolve' command. Here is my
> attempt:
>
>    The proposed 'git evolve' command will help users craft a high-quality
>    commit history in their topic branches. By working to improve commits one
>    at a time, then running 'git evolve', users can rewrite recent history
>    with more options than interactive rebase. The core benefit is that users
>    can pause their progress and move to other branches before returning to
>    where they left off. Users can also share progress with others using
>    standard 'push', 'fetch', and 'format-patch' commands.
>
> > +Background
> > +----------
>
> Perhaps you can call this "Example"?
>
> > +Imagine you have three dependent changes up for review and you receive feedback
> > +that requires editing all three changes. While you're editing one, more feedback
> > +arrives on one of the others. What do you do?
>
> "three dependent changes" sounds vague enough to me to possibly confuse
> readers. Perhaps "three sequential patches"?
>
> > +- Users can view the history of a commit directly (the sequence of amends and
> > +  rebases it has undergone, orthogonal to the history of the branch it is on).
>
> "the history of a commit" doesn't semantically work, as a commit is an
> immutable Git object.
>
> Instead, I would try to use the term "patch" to describe a change to the
> codebase, one that takes the form of a list of commits that improve on each
> other (but don't actually have each other in their commit history). This
> means that the lifetime of a patch is described by the commits that are
> amended or rebased.
>
> > +- By pushing and pulling the obsolescence graph, users can collaborate more
> > +  easily on changes-in-progress. This is better than pushing and pulling the
> > +  changes themselves since the obsolescence graph can be used to locate a more
> > +  specific merge base, allowing for better merges between different versions of
> > +  the same change.
>
> (Making a note so I come back to this. I hope to learn what you mean by
> this "more specific merge base".)
>
> > +
> > +Similar technologies
> > +--------------------
> > ... It can't handle the case where you have
> > +multiple changes sharing the same parent when that parent needs to be rebased
>
> Perhaps this could be made more concrete by describing commit history and a
> specific workflow change using 'git evolve'.
>
> Suppose we have two topic branches, topic1 and topic2, that point to commits
> A and B, respectively. Suppose further that A and B have a common parent C
> with parent D. If we rebase topic1 relative to D, then we create new commits
> C' and A' that are newer versions of commits C and A. It would be nice to
> easily update topic2 to be on a new commit B' with parent C'. Currently, a
> user needs to know that C updated to C', and use
> 'git rebase --onto C' C topic2'. Instead, if we have a marker showing that
> C' is an updated version of C, 'git log topic2' would show that topic2 can
> be updated, and the 'git evolve' command would perform the correct action to
> make B' with parent C'.
>
> (This paragraph above is an example of "what can happen now is complicated
> and demands that the user keep some information in their memory" and "the
> new workflow is simpler and helps users make the right decision". I think we
> could use more of these at the start to sell the idea.)
>
>
> > +and won't let you collaborate with others on resolving a complicated interactive
> > +rebase.
>
> In the same sentence, we have an even more complicated workflow mentioned as
> an aside. This could be fleshed out more concretely. It could help to
> describe that the current model is for users to share "fixup!" commits and
> then one performs an interactive rebase to apply those fixups in the correct
> order. If a user instead shares an amended commit, then we are in a difficult
> state to apply those changes. The new workflow would be to share amended
> commits and 'git evolve' inserts the correct amended commits in the right
> order.
>
> I'm a big proponent of the teaching philosophy of "examples first". It's
> easier to talk abstractly after going through some concrete examples.
>
> >   You can think of rebase -i as a top-down approach and the evolve command
> > +as the bottom-up approach to the same problem.
>
> This comparison is important. Perhaps it is more specific to say
> "interactive rebase splits a plan to rewrite history into independent units
> of work, while evolve collects independent units of work into a plan to
> rewrite history."
>
> > +
> > +Several patch queue managers have been built on top of git...
> > +
> > +Replacements (refs/replace) are superficially...
>
> These two paragraphs could be moved lower, under a "Semi-Related Work"
> section, because they describe things that are a bit similar, but are unable
> to help us solve the problem at hand.
>
> > +
> > +Goals
> > +-----
> > +Legend: Goals marked with P0 are required. Goals marked with Pn should be
> > +attempted unless they interfere with goals marked with Pn-1.
>
> I like the prioritization here.
>
> > +P0. Any commit that may be involved in a future evolve command should not be
> > +    garbage collected.
>
> I wonder about the priority here. If we GC'd commit A but still have the
> newer A', I can either think that
>
> 1. We will no longer need to run 'git evolve', or
> 2. We run 'git evolve' on something that can reach A', but A' already
>    contains all the information we need to produce a "final" commit A''.
>
> I apologize that I'm not able to read the whole thing right now, and I will
> pick up reading from here again soon. Hopefully the feedback above is
> constructive in the meantime.
>
> Thanks,
> -Stolee

Stefan Xenos Nov. 18, 2018, 10:27 p.m. UTC | #8

> I don't think this counts as a typical modification and is probably hard to detect automatically.

Clever use of commands! (Aside: wouldn't it be easier to just use
git commit --amend, though?)

Either way, I agree that there should be a way to manually create a
change graph or modify one into any possible shape. I've updated the
"change" command to do what you want - the new version will have
subcommands for creating arbitrary change graphs.

> subject line will change over time and the original one may become irrelevant.

There's a section on change naming further down the document. My
criteria for name selection were that good names should be unique,
short to type, and memorable to the user. Being relevant to the commit
wasn't actually a requirement for me except insofar as it helps make
them memorable. If we agree that these are reasonable criteria, commit
hashes wouldn't be as good a choice since they'd satisfy the
uniqueness criterion but would not be short or memorable. I expect that
whatever criteria we select probably won't be optimal for all users
which is why the design also includes a new hook for name selection. I
believe that selected words from the commit comment should cover all
three criteria in the majority of cases, and that the hook and the
"change rename" command should cover the remaining corner cases. This
breaks the "git change" symmetry with "git branch", but after
responding to other messages regarding that command, I'm starting to
think that's not really a problem.
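
As a rough illustration of deriving a name from the commit comment, the
sketch below turns a subject line into something like metas/this_is_a_test.
The word-selection rule (first five words), the numeric suffix used to keep
names unique, and the function name `change_name` are all assumptions made
for this example; the actual rule is whatever the naming section and the hook
end up specifying.

```python
import re


def change_name(subject, existing, max_words=5):
    """Build a short, unique change name from a commit subject line."""
    # Keep only lowercase alphanumeric words, and only the first few of them.
    words = re.findall(r"[a-z0-9]+", subject.lower())[:max_words]
    base = "metas/" + ("_".join(words) if words else "change")
    # Hypothetical uniqueness rule: append 2, 3, ... until the name is free.
    name, n = base, 2
    while name in existing:
        name = f"{base}{n}"
        n += 1
    return name


existing = set()
for subject in ["This is a test", "This is also a test", "More testing"]:
    existing.add(change_name(subject, existing))
```

After this loop, `existing` holds metas/this_is_a_test,
metas/this_is_also_a_test and metas/more_testing, matching the names in the
example usage section of the patch.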

> How do we group changes of a topic together? I think branch-diff could take advantage of that.

Could you clarify your use-case for me? I'm not sure what you mean by
"changes of a topic". Are you referring to gerrit topics here? Topic
branches? Or are you asking for some way for end-users to classify and
organize their unsubmitted changes?

> Could we just organize it like a normal history?
> Basically all commits will be linked in a new merge history.

From what I can tell, you're suggesting the following changes:
1. Reorder the parents such that the content parent comes last rather
than first.
2. Move parent-type from the structured portion of the header to the
unstructured portion of the commit message.

I'm fine with 1 if that makes something easier.

Regarding 2, I can see some good reasons to put parent-type in the
header rather than the user-readable portion of the commit message:
- fsck can rely on them when checking the database for validity (for
example, it can assert that the current repository version doesn't
attach a non-empty tree, that the content parent always points to a
real commit, the commit message is empty, that the number of
parent-types matches the number of parents, that the enum values are
valid, that the parent orders are correct, etc.).
- accidental collisions are impossible (users can't accidentally
corrupt their database by adding or removing the word "parent-type" in
a commit message).
- it doesn't spam the user-readable region with machine-readable
repository internals.
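
To show the kind of validity checks that structured headers make possible,
here is a small sketch that checks the text of a metacommit in the format
shown in the proposal. It is not fsck code: the function name, the error
strings, and the rules that exactly one content parent exists and comes first
are assumptions for illustration. The empty-tree hash is the standard SHA-1
empty tree used in the proposal's example, and the author and committer
lines use a placeholder identity.

```python
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
PARENT_TYPES = {"content", "obsolete", "origin"}

SAMPLE = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "parent aa7ce55545bf2c14bef48db91af1a74e2347539a\n"
    "parent d64309ee51d0af12723b6cb027fc9f195b15a5e9\n"
    "parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136\n"
    "author A U Thor <author@example.com> 1540841596 -0700\n"
    "committer A U Thor <author@example.com> 1540841596 -0700\n"
    "parent-type content\n"
    "parent-type obsolete\n"
    "parent-type origin\n"
)


def check_metacommit(raw):
    """Return a list of problems found in a metacommit's raw text."""
    header, _, message = raw.partition("\n\n")
    fields = []
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        fields.append((key, value))
    trees = [v for k, v in fields if k == "tree"]
    parents = [v for k, v in fields if k == "parent"]
    types = [v for k, v in fields if k == "parent-type"]

    errors = []
    if trees != [EMPTY_TREE]:
        errors.append("tree must be the empty tree")
    if len(types) != len(parents):
        errors.append(
            f"expected {len(parents)} parent-type headers, found {len(types)}"
        )
    for t in types:
        if t not in PARENT_TYPES:
            errors.append(f"unknown parent-type value: {t}")
    # Assumed ordering rule: exactly one content parent, listed first.
    if types[:1] != ["content"] or types.count("content") != 1:
        errors.append("content parent must come first and appear exactly once")
    if message.strip():
        errors.append("commit message must be empty")
    return errors
```

With trailers in the commit message instead, the same checks would have to
parse free-form text that users can edit by hand.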

> This makes it possible to just use "git log --first-parent
> --patch" (or "git log --oneline --graph") to examine the change.

The "git log --oneline --graph" thing should work fine with the
proposal as it currently is, but I'm not sure that the --first-parent
--patch thing would be very useful no matter how we order the parents.
The metacommits have empty trees and commit messages, so such a log
would just list the metacommit hashes and nothing else. That certainly
has some utility, but I'd guess it's probably not what you were going
for. Were you intending to suggest that the metacommit should also use
the same tree and commit message as its content commit? If so, we
briefly considered this option while preparing this proposal. That
would make some commands do approximately the right thing for free.
However, when we started working through the use-cases (for example,
checking out a metacommit) we found that all the ones we looked at
would still need special cases for metacommits and those special cases
wouldn't be much simpler than they'd be with an empty tree and
message. Admittedly, git log wasn't one of the use-cases we worked
through.

  - Stefan

On Fri, Nov 16, 2018 at 10:07 PM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Thu, Nov 15, 2018 at 2:00 AM <sxenos@google.com> wrote:
> > +Goals
> > +-----
> > +Legend: Goals marked with P0 are required. Goals marked with Pn should be
> > +attempted unless they interfere with goals marked with Pn-1.
> > +
> > +P0. All commands that modify commits (such as the normal commit --amend or
> > +    rebase command) should mark the old commit as being obsolete and replaced by
> > +    the new one. No additional commands should be required to keep the
> > +    obsolescence graph up-to-date.
>
> I sometimes "modify" a commit by "git reset @^", pick up the changes
> then "git commit -c @{1}". I don't think this counts as a typical
> modification and is probably hard to detect automatically. But I hope
> there's some way for me to tell git "yes this is a modified commit of
> that one, record that!".
>
> > +Example usage
> > +-------------
> > +# First create three dependent changes
> > +$ echo foo>bar.txt && git add .
> > +$ git commit -m "This is a test"
> > +created change metas/this_is_a_test
>
> I guess as an example, how the name metas/this_is_a_test is
> constructed does not matter much. But it's probably better to stick
> with some sort of id because subject line will change over time and
> the original one may become irrelevant. Perhaps we could use the
> original commit id as name.
>
> > +$ echo foo2>bar2.txt && git add .
> > +$ git commit -m "This is also a test"
> > +created change metas/this_is_also_a_test
> > +$ echo foo3>bar3.txt && git add .
> > +$ git commit -m "More testing"
> > +created change metas/more_testing
> > +
> > +# List all our changes in progress
> > +$ git change -l
> > +metas/this_is_a_test
> > +metas/this_is_also_a_test
> > +* metas/more_testing
> > +metas/some_change_already_merged_upstream
> > +
> > +# Now modify the earliest change, using its stable name
> > +$ git reset --hard metas/this_is_a_test
> > +$ echo morefoo>>bar.txt && git add . && git commit --amend --no-edit
> > +
> > +# Use git-evolve to fix up any dependent changes
> > +$ git evolve
> > +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> > +rebasing metas/more_testing onto metas/this_is_also_a_test
> > +Done
> > +
> > +# Use git-obslog to view the history of the this_is_a_test change
> > +$ git obslog
> > +93f110 metas/this_is_a_test@{0} commit (amend): This is a test
> > +930219 metas/this_is_a_test@{1} commit: This is a test
> > +
> > +# Now create an unrelated change
> > +$ git reset --hard origin/master
> > +$ echo newchange>unrelated.txt && git add .
> > +$ git commit -m "Unrelated change"
> > +created change metas/unrelated_change
> > +
> > +# Fetch the latest code from origin/master and use git-evolve
> > +# to rebase all dependent changes.
> > +$ git fetch origin master
> > +$ git evolve origin/master
> > +deleting metas/some_change_already_merged_upstream
> > +rebasing metas/this_is_a_test onto origin/master
> > +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> > +rebasing metas/more_testing onto metas/this_is_also_a_test
> > +rebasing metas/unrelated_change onto origin/master
> > +Conflict detected! Resolve it and then use git evolve --continue to resume.
> > +
> > +# Sort out the conflict
> > +$ git mergetool
> > +$ git evolve --continue
> > +Done
> > +
> > +# Share the full history of edits for the this_is_a_test change
> > +# with a review server
> > +$ git push origin metas/this_is_a_test:refs/for/master
> > +# Share the latest commit for “Unrelated change”, without history
> > +$ git push origin HEAD:refs/for/master
>
> How do we group changes of a topic together? I think branch-diff could
> take advantage of that.
>
> > +Detailed design
> > +===============
> > +Obsolescence information is stored as a graph of meta-commits. A meta-commit is
> > +a specially-formatted merge commit that describes how one commit was created
> > +from others.
> > +
> > +Meta-commits look like this:
> > +
> > +$ git cat-file -p <example_meta_commit>
> > +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> > +parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> > +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> > +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> > +author Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> > +committer Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> > +parent-type content
> > +parent-type obsolete
> > +parent-type origin
> > +
> > +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by
> > +cherry-picking commit 7e1bbcd3”.
>
> This feels a bit forced. Could we just organize it like a normal
> history? Something like
>
> *
> |\
> | * last version of the commit
> *
> |\
> | * second last version of the commit
> *
> |\
>
> Basically all commits will be linked in a new merge history. Real
> commits are on the second parent, first parent is to link changes
> together. This makes it possible to just use "git log --first-parent
> --patch" (or "git log --oneline --graph") to examine the change. More
> details (e.g. parent-type) could be stored as normal trailers in the
> commit message of these merges.
> --
> Duy

Stefan Xenos Nov. 18, 2018, 10:29 p.m. UTC | #9

> This breaks the "git change" symmetry with "git branch", but after
> responding to other messages regarding that command, I'm starting to
> think that's not really a problem.

Sorry, I appended that sentence to the wrong paragraph. It should have
gone with the previous one, regarding the "git change" command.

On Sun, Nov 18, 2018 at 2:27 PM Stefan Xenos <sxenos@google.com> wrote:
>
> > I don't think this counts as a typical modification and is probably hard to detect automatically.
>
> Clever use of commands! (aside: wouldn't it be easier to just use
> git commit --amend, though?)
>
> Either way, I agree that there should be a way to manually create a
> change graph or modify one into any possible shape. I've updated the
> "change" command to do what you want - the new version will have
> subcommands for creating arbitrary change graphs.
>
> > subject line will change over time and the original one may become irrelevant.
>
> There's a section on change naming further down the document. My
> criteria for name selection were that good names should be unique,
> short to type, and memorable to the user. Being relevant to the commit
> wasn't actually a requirement for me except insofar as it helps make
> them memorable. If we agree that these are reasonable criteria, commit
> hashes wouldn't be as good a choice since they'd satisfy the
> uniqueness criterion but would not be short or memorable. I expect that
> whatever criteria we select probably won't be optimal for all users
> which is why the design also includes a new hook for name selection. I
> believe that selected words from the commit comment should cover all
> three criteria in the majority of cases, and that the hook and the
> "change rename" command should cover the remaining corner cases. This
> breaks the "git change" symmetry with "git branch", but after
> responding to other messages regarding that command, I'm starting to
> think that's not really a problem.
>
> > How do we group changes of a topic together? I think branch-diff could take advantage of that.
>
> Could you clarify your use-case for me? I'm not sure what you mean by
> "changes of a topic". Are you referring to gerrit topics here? Topic
> branches? Or are you asking for some way for end-users to classify and
> organize their unsubmitted changes?
>
> > Could we just organize it like a normal history?
> > Basically all commits will be linked in a new merge history.
>
> From what I can tell, you're suggesting the following changes:
> 1. Reorder the parents such that the content parent comes last rather
> than first.
> 2. Move parent-type from the structured portion of the header to the
> unstructured portion of the commit message.
>
> I'm fine with 1 if that makes something easier.
>
> Regarding 2, I can see some good reasons to put parent-type in the
> header rather than the user-readable portion of the commit message:
> - fsck can rely on them when checking the database for validity (for
> example, it can assert that the current repository version doesn't
> attach a non-empty tree, that the content parent always points to a
> real commit, the commit message is empty, that the number of
> parent-types matches the number of parents, that the enum values are
> valid, that the parent orders are correct, etc.).
> - accidental collisions are impossible (users can't accidentally
> corrupt their database by adding or removing the word "parent-type" in
> a commit message).
> - it doesn't spam the user-readable region with machine-readable
> repository internals.
>
> > This makes it possible to just use "git log --first-parent
> > --patch" (or "git log --oneline --graph") to examine the change.
>
> The "git log --oneline --graph" thing should work fine with the
> proposal as it currently is, but I'm not sure that the --first-parent
> --patch thing would be very useful no matter how we order the parents.
> The metacommits have empty trees and commit messages, so such a log
> would just list the metacommit hashes and nothing else. That certainly
> has some utility, but I'd guess it's probably not what you were going
> for. Were you intending to suggest that the metacommit should also use
> the same tree and commit message as its content commit? If so, we
> briefly considered this option while preparing this proposal. That
> would make some commands do approximately the right thing for free.
> However, when we started working through the use-cases (for example,
> checking out a metacommit) we found that all the ones we looked at
> would still need special cases for metacommits and those special cases
> wouldn't be much simpler than they'd be with an empty tree and
> message. Admittedly, git log wasn't one of the use-cases we worked
> through.
>
>   - Stefan
>
> On Fri, Nov 16, 2018 at 10:07 PM Duy Nguyen <pclouds@gmail.com> wrote:
> >
> > On Thu, Nov 15, 2018 at 2:00 AM <sxenos@google.com> wrote:
> > > +Goals
> > > +-----
> > > +Legend: Goals marked with P0 are required. Goals marked with Pn should be
> > > +attempted unless they interfere with goals marked with Pn-1.
> > > +
> > > +P0. All commands that modify commits (such as the normal commit --amend or
> > > +    rebase command) should mark the old commit as being obsolete and replaced by
> > > +    the new one. No additional commands should be required to keep the
> > > +    obsolescence graph up-to-date.
> >
> > I sometimes "modify" a commit by "git reset @^", pick up the changes
> > then "git commit -c @{1}". I don't think this counts as a typical
> > modification and is probably hard to detect automatically. But I hope
> > there's some way for me to tell git "yes this is a modified commit of
> > that one, record that!".
> >
> > > +Example usage
> > > +-------------
> > > +# First create three dependent changes
> > > +$ echo foo>bar.txt && git add .
> > > +$ git commit -m "This is a test"
> > > +created change metas/this_is_a_test
> >
> > I guess as an example, how the name metas/this_is_a_test is
> > constructed does not matter much. But it's probably better to stick
> > with some sort of id because subject line will change over time and
> > the original one may become irrelevant. Perhaps we could use the
> > original commit id as name.
> >
> > > +$ echo foo2>bar2.txt && git add .
> > > +$ git commit -m "This is also a test"
> > > +created change metas/this_is_also_a_test
> > > +$ echo foo3>bar3.txt && git add .
> > > +$ git commit -m "More testing"
> > > +created change metas/more_testing
> > > +
> > > +# List all our changes in progress
> > > +$ git change -l
> > > +metas/this_is_a_test
> > > +metas/this_is_also_a_test
> > > +* metas/more_testing
> > > +metas/some_change_already_merged_upstream
> > > +
> > > +# Now modify the earliest change, using its stable name
> > > +$ git reset --hard metas/this_is_a_test
> > > +$ echo morefoo>>bar.txt && git add . && git commit --amend --no-edit
> > > +
> > > +# Use git-evolve to fix up any dependent changes
> > > +$ git evolve
> > > +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> > > +rebasing metas/more_testing onto metas/this_is_also_a_test
> > > +Done
> > > +
> > > +# Use git-obslog to view the history of the this_is_a_test change
> > > +$ git obslog
> > > +93f110 metas/this_is_a_test@{0} commit (amend): This is a test
> > > +930219 metas/this_is_a_test@{1} commit: This is a test
> > > +
> > > +# Now create an unrelated change
> > > +$ git reset --hard origin/master
> > > +$ echo newchange>unrelated.txt && git add .
> > > +$ git commit -m "Unrelated change"
> > > +created change metas/unrelated_change
> > > +
> > > +# Fetch the latest code from origin/master and use git-evolve
> > > +# to rebase all dependent changes.
> > > +$ git fetch origin master
> > > +$ git evolve origin/master
> > > +deleting metas/some_change_already_merged_upstream
> > > +rebasing metas/this_is_a_test onto origin/master
> > > +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> > > +rebasing metas/more_testing onto metas/this_is_also_a_test
> > > +rebasing metas/unrelated_change onto origin/master
> > > +Conflict detected! Resolve it and then use git evolve --continue to resume.
> > > +
> > > +# Sort out the conflict
> > > +$ git mergetool
> > > +$ git evolve --continue
> > > +Done
> > > +
> > > +# Share the full history of edits for the this_is_a_test change
> > > +# with a review server
> > > +$ git push origin metas/this_is_a_test:refs/for/master
> > > +# Share the latest commit for “Unrelated change”, without history
> > > +$ git push origin HEAD:refs/for/master
> >
> > How do we group changes of a topic together? I think branch-diff could
> > take advantage of that.
> >
> > > +Detailed design
> > > +===============
> > > +Obsolescence information is stored as a graph of meta-commits. A meta-commit is
> > > +a specially-formatted merge commit that describes how one commit was created
> > > +from others.
> > > +
> > > +Meta-commits look like this:
> > > +
> > > +$ git cat-file -p <example_meta_commit>
> > > +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> > > +parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> > > +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> > > +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> > > +author Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> > > +committer Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> > > +parent-type content
> > > +parent-type obsolete
> > > +parent-type origin
> > > +
> > > +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by
> > > +cherry-picking commit 7e1bbcd3”.
> >
> > This feels a bit forced. Could we just organize it like a normal
> > history? Something like
> >
> > *
> > |\
> > | * last version of the commit
> > *
> > |\
> > | * second last version of the commit
> > *
> > |\
> >
> > Basically all commits will be linked in a new merge history. Real
> > commits are on the second parent, first parent is to link changes
> > together. This makes it possible to just use "git log --first-parent
> > --patch" (or "git log --oneline --graph") to examine the change. More
> > details (e.g. parent-type) could be stored as normal trailers in the
> > commit message of these merges.
> > --
> > Duy

Junio C Hamano Nov. 18, 2018, 11:20 p.m. UTC | #10

Stefan Xenos <sxenos@google.com> writes:

>> I don't think this counts as a typical modification and is probably hard to detect automatically.
>
> Clever use of commands! (aside: wouldn't it be easier to just use
> git commit --amend, though?)

When an original commit is mostly an early part of a feature, mixed
with a small but urgent bugfix, it is not unusual to start your
work from "reset HEAD^" (or "reset --soft HEAD^") and recreate a
commit that has the main part of the change from the original,
leaving the remainder in the working tree to be worked into another
bugfix commit, most likely to be on a new branch forked from an
earlier point in the history, i.e.

	git reset HEAD^
	git add -p
	git commit -c @{1}
	git checkout -m -b a-small-bugfix-split-out master
	edit
	git commit -a

I agree with both of you that we want to have a way to mark that the
first commit we made by partially committing what was in the
original came from the original one, and also that the second one
has contents from the same original one.

It is unclear, without human involvement, if we can mechanically
infer that anything that used to be built on top of the original
commit would want to be rebuilt on top of the first half of the
split commit (i.e. the early part of the feature with the bugfix
separated out) but not on the other half (i.e. the bugfix alone).

Stefan Xenos Nov. 19, 2018, 12:36 a.m. UTC | #11

> Am I correct to understand that the reason why a commit object is
> (ab|re)used to represent a meta-commit is because by doing so we
> would get connectivity (i.e. fetching & pushing would transfer all
> the associated objects along) for free, and by not representing it
> as a new and different object type, existing implementations can
> just pass them along without understanding what they are, and as
> long as these are not mixed as parts of the main history of the
> project (e.g. when enumerating commits that have aa7ce5 as their
> parent, because somebody else obsoleted aa7ce5 and you want to
> evolve anything that built on it, you do not want to mistake the
> above "meta" commit as a commit that is part of the ordinary history
> and rebuild on top of the new version of aa7ce5, which would lead to
> a disaster), everything would work just fine?

Yes, sir. That's it exactly. My first draft of the proposal suggested
creating a new top-level object type, but when I started digging
through the code I realized that the new object was so similar to a
commit that there was no need for a new type.

> Perhaps you'd use something like "presence of parent-type header
> marks that a commit is a meta-commit and not part of the main
> history".

Yes, that's called out explicitly as part of the proposal (see the
first sentence in the Parent-type subsection). Fsck would enforce this
invariant.
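
As a rough illustration of the fsck checks discussed in this thread (and not part of the proposal itself), here is a minimal Python sketch. It assumes the raw layout shown for <example_meta_commit>; the function names, the error strings, and the rule that exactly one content parent is listed first are assumptions made for the example.

```python
# Hypothetical fsck-style checks for a metacommit; names and rules are illustrative.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's well-known empty tree
PARENT_TYPES = {"content", "obsolete", "origin"}


def parse_commit(raw):
    """Split raw commit text into (tree, parents, parent_types, message)."""
    header, _, message = raw.partition("\n\n")
    tree, parents, parent_types = None, [], []
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        if key == "tree":
            tree = value
        elif key == "parent":
            parents.append(value)
        elif key == "parent-type":
            parent_types.append(value)
    return tree, parents, parent_types, message


def check_metacommit(raw):
    """Return a list of problems; an empty list means every check passed."""
    tree, parents, parent_types, message = parse_commit(raw)
    if not parent_types:
        # Presence of a parent-type header is what marks a metacommit.
        return ["no parent-type header: this is an ordinary commit"]
    problems = []
    if tree != EMPTY_TREE:
        problems.append("tree is not empty")
    if message.strip():
        problems.append("commit message is not empty")
    if len(parent_types) != len(parents):
        problems.append("parent-type count does not match parent count")
    if any(kind not in PARENT_TYPES for kind in parent_types):
        problems.append("unknown parent-type value")
    # Assumed ordering rule: exactly one content parent, listed first.
    if parent_types[0] != "content" or parent_types.count("content") != 1:
        problems.append("content parent must come first and be unique")
    return problems


# The example metacommit from the design doc, with an empty message.
EXAMPLE = "\n".join([
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904",
    "parent aa7ce55545bf2c14bef48db91af1a74e2347539a",
    "parent d64309ee51d0af12723b6cb027fc9f195b15a5e9",
    "parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136",
    "author Stefan Xenos <sxenos@gmail.com> 1540841596 -0700",
    "committer Stefan Xenos <sxenos@gmail.com> 1540841596 -0700",
    "parent-type content",
    "parent-type obsolete",
    "parent-type origin",
    "",
    "",
])
```

Calling check_metacommit(EXAMPLE) yields an empty list, while dropping one of the parent-type lines makes the count check fail. A real fsck would work on the object database rather than on text, so treat this only as a statement of the invariants.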

> How are these meta commits anchored so that they won't be reclaimed by
> repack?

They would either be anchored by a ref in the metas/ namespace (for
active changes currently under consideration by evolve) or by the
reflog (for recently deleted changes).

> I do not see any "parent" field used to chain them together,

They point to one another using the usual "parent" field present in
all commit objects. For an example of what the raw struct would look
like with parent pointers, see the top of the "Detailed design"
section or search the doc for the string <example_meta_commit>. For
examples of how the metacommits in a change graph would be connected
after various operations, see the "Commit" section and the "Merge"
section. Please let me know if any of these examples are
insufficiently explained or if there are any other examples you'd like
to see.

> but I do not think we can afford to spend one ref per meta
> commit, as refs are not designed to point into each and every object
> in the repository.

Agreed. This is actually one of the reasons I'm proposing the use of
chains of meta-commits as opposed to using a purely ref-based
approach. I describe several other ref-based approaches in the "Other
options considered" section, and I made essentially the same point
there. We only create refs in the metas/ namespace to point to the
head of each change, and the rest of the commits and metacommits used
by the graph are reached via the parent pointers in the metacommits.
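
A toy Python model of that reachability argument, offered as a sketch rather than as git's actual implementation: one ref names the head metacommit of a change, and everything else is found by following parent pointers. The object names (M1, M2, A1, A2, P), the ref name, and the choice to point the obsolete parent at the previous metacommit are all assumptions made for illustration.

```python
# Toy object store: each object maps to the objects its parent pointers reach.
# P is an ordinary commit; A1 and A2 are two versions of a commit built on P.
# M1 and M2 are metacommits describing A1 and A2 (hypothetical layout).
OBJECTS = {
    "P": [],
    "A1": ["P"],
    "A2": ["P"],
    "M1": ["A1"],        # content parent only
    "M2": ["A2", "M1"],  # content parent, then obsolete parent
}

# Only the head of the change gets a ref.
REFS = {"refs/metas/example": "M2"}


def reachable(tips, objects):
    """Return every object reachable from the given tips via parent pointers."""
    seen, stack = set(), list(tips)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(objects[obj])
    return seen


kept_by_change = reachable(REFS.values(), OBJECTS)
kept_by_branch = reachable(["A2"], OBJECTS)
```

In this model the single ref keeps M2, M1, A2, A1 and P alive, whereas a branch pointing at A2 alone would keep only A2 and P. That is the sense in which the older versions are reached through the metacommit chain rather than through one ref each.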

> I have a moderately strong opposition against "origin" thing.  If
> aa7ce555 replaces d64309ee, in order for the tool to be able to
> "evolve" other histories that build on top of d64309ee, it only
> needs the history between aa7ce555 and d64309ee and it would not
> matter how aa7ce555 was built relative to its parent.

I see I haven't justified the "origin" thing well enough. I'll
elaborate in the document, but here's the short version. The "origin"
edges are needed to address several use-cases:

1. Gerrit needs to know about cherry picks.

This is one of the lesser-known things that Gerrit uses the change-id
footers for, and if we want to be able to eliminate the Gerrit
change-id footers we need to record and communicate information about
cherry-picks somehow. This is the main reason for the origin edges:
the early drafts of this proposal didn't have them, but it came up when
I asked a kind Gerrit maintainer whether the proposal would be
sufficient to eliminate Gerrit's change-ids. However, there may be
alternatives I didn't think of. If we were to omit the origin edges,
can you suggest an alternative way for git to record the fact that one
commit was cherry-picked from another and communicate this fact to
Gerrit?

I see that I forgot to call out "replacing gerrit change-ids" as an
explicit goal. I'll add that to the doc.

2. Obsolescence across cherry-picks.

In your example, it *may* actually matter how aa7ce55 was constructed.
One such scenario is what I'm calling obsolescence across
cherry-picks. Let me describe the use-case for it:

Alice creates commit A1.

Bob cherry-picks A1 to another branch, producing B1. At this point,
Bob has a metacommit saying that A1 is the origin of B1.

Alice amends A1, producing A2. She shares this with Bob.

At this point, Bob probably wants to amend B1 to include whatever
bugfix Alice did in A2 since the thing he cherry-picked is now out of
date. That's what the obsolescence across cherry-picks feature does.
If Bob runs evolve with this option enabled, the evolve command will
produce B2 by amending B1 with whatever diff Alice did between A1 and
A2... and this only works if we have origin edges. Without the origin
edge, the evolve command wouldn't know that B1 came from the now
out-of-date A1. The commit B2 that results from this would have both
origin and replacement edges. It replaces B1 but it was formed by
cherry-picking A2.
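
The Alice and Bob scenario can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the proposed implementation: metacommits are plain dicts, the helper name find_stale_picks is made up, the graph is assumed to be acyclic, and divergence (one commit replaced by several) is ignored.

```python
def find_stale_picks(metacommits):
    """Return (pick, old_source, new_source) for picks whose origin was replaced."""
    # Map each obsolete commit to the commit that replaced it.
    replaced_by = {}
    for meta in metacommits:
        for old in meta.get("obsolete", []):
            replaced_by[old] = meta["content"]

    stale = []
    for meta in metacommits:
        if meta["content"] in replaced_by:
            continue  # this pick has itself been replaced already
        for source in meta.get("origin", []):
            newest = source
            while newest in replaced_by:  # follow the chain of replacements
                newest = replaced_by[newest]
            if newest != source:
                stale.append((meta["content"], source, newest))
    return stale


# Bob cherry-picked A1 as B1; Alice then amended A1 into A2.
graph = [
    {"content": "B1", "origin": ["A1"]},
    {"content": "A2", "obsolete": ["A1"]},
]
before = find_stale_picks(graph)

# Evolving produces B2, which replaces B1 and was formed by cherry-picking A2.
graph.append({"content": "B2", "obsolete": ["B1"], "origin": ["A2"]})
after = find_stale_picks(graph)
```

Before the evolve step the helper reports that B1 was picked from the now-replaced A1 and that A2 is its newest version. After B2 is recorded with both an obsolete and an origin edge, nothing is left to update.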

I'm currently unsure if obsolescence across cherry-picks should be on
or off by default. I was thinking of making it off-by-default
initially and then possibly flipping the default after users have a
chance to try it and give feedback.

3. Merge-base.

Origin edges will always point to a better merge base (ancestor for
three-way merges) than the content's parent. For example, consider
doing a cherry-pick followed by a rebase:

$ git checkout -b source
# Stage some stuff
$ git commit -m "A" && git tag A
# Stage some more stuff
$ git commit --amend -m "B" && git tag B
$ git checkout -b dest && git reset --hard HEAD^1
$ git cherry-pick A
$ git checkout source
$ git rebase dest

We cherry-picked an old version of a change, and then rebased a new
version of that same change onto a branch containing the old one.
With today's code we'd pick A's parent commit as the common ancestor
and the user would have to resolve a bunch of merge conflicts since
both A and B are versions of the same patch and touch a lot of the
same lines. We know that B is the newer version but the merge tool
doesn't know that B is newer than A. If we had origin edges, we would
know that the latest commit on the dest branch was a cherry-pick of A
and would traverse its origin edge before the content edge to look for
ancestors. That would select A as the common ancestor, and since B
applies cleanly on top of A there would be no conflicts and the
automerge would most likely succeed.
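
A toy Python model of that ancestor search, offered only as a sketch: it is not git's merge-base algorithm, and the node names (P for the shared parent, A_picked for the cherry-pick of A on dest) and the merge_base helper are assumptions made for the example. The only idea it demonstrates is that walking origin and obsolete edges before ordinary parents changes which common ancestor is found first.

```python
from collections import deque


def walk(start, parents, meta_edges, use_meta):
    """Breadth-first ancestor walk; meta edges are visited before ordinary parents."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        extra = meta_edges.get(node, []) if use_meta else []
        for nxt in extra + parents.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def merge_base(ours, theirs, parents, meta_edges, use_meta=True):
    """Return the first ancestor of ours that is also an ancestor of theirs."""
    theirs_ancestors = set(walk(theirs, parents, meta_edges, use_meta))
    for node in walk(ours, parents, meta_edges, use_meta):
        if node in theirs_ancestors:
            return node
    return None


# Ordinary history: A, B and the cherry-pick all sit directly on P.
parents = {"A": ["P"], "B": ["P"], "A_picked": ["P"]}
# Change graph: B replaces A (obsolete edge); A_picked was picked from A (origin edge).
meta_edges = {"B": ["A"], "A_picked": ["A"]}

base_today = merge_base("A_picked", "B", parents, meta_edges, use_meta=False)
base_with_origin = merge_base("A_picked", "B", parents, meta_edges, use_meta=True)
```

Without the meta edges the search settles on P, which is the conflict-prone base described above. With them it settles on A, so only the difference between A and B has to be applied on top of the cherry-pick.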

Now, this may seem like a crazy thing to do. Why would you rebase two
different versions of the same change on top of one another? This
scenario is likely to crop up when users start using the change graph
to collaborate on the same WIP change. They'll be using rebase, merge,
and cherry-pick to resolve divergence and incorporate changes from
other users... so rebases and cherry-picks of different versions of
the same change would be commonplace.

> The user may
> have typed/developed it from scratch, the user may have borrowed 70%
> of its change from 7e1bbcd while remaining 30% was done from
> scratch, or it was a concatenation of the change made in 7e1bbcd and
> another commit.

I don't think the amount they developed from scratch invalidates any
of the three main use-cases (above). Even if we have no idea how many
manual edits were made, the origin parents would still be useful for
communication with gerrit, locating a better merge base, and giving
the user the option of obsolescence across cherry-picks. The proposal
calls for commands like squash merge to create multiple origin
parents, so concatenations *would* be recorded accurately if there was
ever a reason to treat them differently (I'm not sure any of my 3 main
use-cases need to).

> One half of my point being that we can do _without_ it, and in all
> cases, aa7ce555, if leaving the fact that it was derived from
> 7e1bbcd is so important, can mention that in its log message how it
> relates to the "origin" thing.

Log messages would be sufficient for communicating cherry-picks to the
user, but wouldn't address any of the driving use-cases for origin
parents, all of which need the information to be machine-readable.

Now, admittedly the obsolescence-across-cherry-picks and merge-base
use-cases are minor features due to the fact that cherry-picks and
squash merges are themselves uncommon. A lot of users would probably
never notice the difference. However, it would be very disappointing
if gerrit's change-ids needed to stick around just for the sake of
this one missing corner case.

> And the other half is that while I consider the "origin" thing is
> unnecessary for the above reasons, having it means we need to not
> just transfer the history leading to aa7ce555 and d64309ee (which
> are necessary anyway while we have histories to transplant from
> d64309ee to aa7ce555) but also have to pull in the history leading
> to 7e1bbcd and we cannot discard it.

I'll assume that by "history" you're referring to the change graph
(the metacommits) and not the branches (the commits), which would have
no origin edges or connection between replacements.

If the user has kept a change around in their metas namespace, it's an
indication that they (or their collaborators) are still working on it
and want its history to be retained. I don't necessarily see this as a
problem because if collaborators are still editing a change that the
local user cherry-picked, it's plausible that the change may be the
subject of a future obsolescence-over-cherry-pick in which case having
the history around is necessary.

You're right that our default position should be not to retain extra
objects unless there's a compelling reason to do so, and this proposal
should have explained that reason. Now that I've explained the reason
do you still have a strong objection to the "origin" parents, or have
I overlooked a use-case?

  - Stefan

Junio C Hamano Nov. 19, 2018, 2:15 a.m. UTC | #12

Stefan Xenos <sxenos@google.com> writes:

>> And the other half is that while I consider the "origin" thing is
>> unnecessary for the above reasons, having it means we need to not
>> just transfer the history leading to aa7ce555 and d64309ee (which
>> are necessary anyway while we have histories to transplant from
>> d64309ee to aa7ce555) but also have to pull in the history leading
>> to 7e1bbcd and we cannot discard it.
>
> I'll assume that by "history" you're referring to the change graph
> (the metacommits) and not the branches (the commits), which would have
> no origin edges or connection between replacements.

I meant the project's history, not the meta-graph thing.

By having a "this was cherry-picked from that commit" in a commit
that is not GC'ed, the original commit that no longer has any
relevance (because the newer one that is the result of the
cherry-pick is the surviving version people will be building on) is
kept reachable.  It is very much deliberate that "cherry-pick -x"
does not make the "origin" reachable and merely notes it in the
human readable form that is ignored by the reachability machinery.

> If the user has kept a change around in their metas namespace, it's an
> indication that they (or their collaborators) are still working on it
> and want its history to be retained.

This is where we differ.  If commit X was rewritten (perhaps with
help from change cherry-picked from commit Z, or without any) to
produce Y, I do agree that it would be logical to keep X around
until every commit that depends on it is migrated to be on top of Y.
But we do not need Z to transplant what used to be on X on top of Y,
do we?  So I do agree that in such a situation they want the
relevant parts of the history retained, but I do not agree that
"origin" is among them.

	Side note.  As long as we have commits yet to be migrated to
	be on Y that are still on X, we do not need the meta-commit
	to protect X from getting GC'ed, as X is reachable from
	these "need to be updated" branch tips anyway.  What we gain
	from extra reachability brought in by the meta commits is
	that by fetching the "change", we get Y (and its ancestors),
	even if we are not following any branch that contains it, so
	that we can migrate those that are still based on X to it.

Stefan Xenos Nov. 19, 2018, 3:33 a.m. UTC | #13

> I meant the project's history, not the meta-graph thing.

In that case, we agree. The proposal suggests that "origin" should be
reachable from the meta-graph for the cherry-picked commit, NOT the
cherry-picked commit itself. Does that resolve our disagreement, or is
reachability from the meta-graph also undesirable for you?

> By having a "this was cherry-picked from that commit" in a commit
> that is not GC'ed, the original commit that no longer has any
> relevance (because the newer one that is the result of the
> cherry-pick is the surviving version people will be building on) is
> kept reachable.  It is very much deliberate that "cherry-pick -x"
> does not make the "origin" reachable and merely notes it in the
> human readable form that is ignored by the reachability machinery.

Hmm. It sounds like you may be arguing against reachability from the
cherry-picked commit (which we agree on). I'm arguing for reachability
ONLY from the meta-graph. From your reply it's not completely clear to
me whether you also disapprove of reachability from the meta-graph or
if you thought the origin edges would be present on the cherry-picked
commit itself. Could you clarify? I suspect it may be the latter,
which suggests ambiguity in the proposal. If you could point to the
text that gave the impression origin parents would be present in the
cherry-picked commits themselves, I'll fix it.

> This is where we differ.  If commit X was rewritten (perhaps with
> help from change cherry-picked from commit Z, or without any) to
> produce Y, I do agree that it would be logical to keep X around
> until every commit that depends on it is migrated to be on top of Y.

The scenario you describe would not produce an origin edge in the
metacommit graph. If the user amended X, there would be no origin
edges - just a replacement. If you cherry-picked Z you'd get no
replacements and just an origin. In neither case would you get both
types of parent. I'd suggest we focus on the cherry-pick scenario
since it's the simplest real-world use case that produces origin
parents. All the more complex scenarios involving both parent types
only occur if you start from that simple case, so if you convince me
that the origin-only use case is unnecessary or undesirable, it would
also follow that the more complex origin-plus-obsolete-parent use case
is unnecessary.
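
To make the two simple cases concrete, here is a small Python sketch of which parents a metacommit would carry after each operation. It is an illustration only: the helper names are made up, commit names stand in for hashes, and it assumes the parents point directly at commits as in the <example_meta_commit> description.

```python
# Which extra edge each simple operation records, per the description above.
EDGE_FOR_OPERATION = {"amend": "obsolete", "cherry-pick": "origin"}


def metacommit_parents(operation, new_commit, other_commit):
    """Return (parent, parent-type) pairs for the metacommit describing new_commit.

    other_commit is the replaced commit for an amend, or the source commit
    for a cherry-pick.
    """
    return [(new_commit, "content"), (other_commit, EDGE_FOR_OPERATION[operation])]


def render_header(parents):
    """Render the parent and parent-type header lines in metacommit order."""
    lines = ["parent " + commit for commit, _ in parents]
    lines += ["parent-type " + kind for _, kind in parents]
    return "\n".join(lines)


amended = metacommit_parents("amend", "Y", "X")        # Y replaces X
picked = metacommit_parents("cherry-pick", "Y", "Z")   # Y was picked from Z
```

Neither result contains both an obsolete and an origin parent; a metacommit with both would only arise from a later step that starts from the cherry-pick case.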

So, if you don't mind - let me simplify that use-case: "If commit Z is
cherry-picked to produce Y, is there any need to keep Z around?". I
don't think we need X in the example to answer that question.

> But we do not need Z to transplant what used to be on X on top of Y,
> do we?

That's correct. The origin parent would be used to incorporate amended
versions of Z into Y, not to transplant things. It would also be used
to locate ancestors when merging code based on Z with code based on Y.

> So I do agree that in such a situation they want the
> relevant parts of the history retained, but I do not agree that
> "origin" is among them.

You may be entirely right, but at this point I'm not certain whether
we're disagreeing or miscommunicating. :-(

Junio C Hamano Nov. 19, 2018, 3:45 a.m. UTC | #14

Stefan Xenos <sxenos@google.com> writes:

>> I meant the project's history, not the meta-graph thing.
>
> In that case, we agree. The proposal suggests that "origin" should be
> reachable from the meta-graph for the cherry-picked commit, NOT the
> cherry-picked commit itself. Does that resolve our disagreement, or is
> reachability from the meta-graph also undesirable for you?

Sorry, I confused myself.

Yes, I do mind the "origin" thing in the meta history pinning
the old commit whose contents were cherry picked to create a new
commit, which is separate from the old commit that was rewritten to
create a new commit.  The latter (i.e. the old one) I do not mind to
get retrieved when such a meta commit is fetched, and all of us of
course would want the new one, too (which is the whole point of
adding the meta commit to help other commits built on the old one
migrate to the new one).  But I simply do not see the point of
having to drag the history leading to "origin", and that is why I am
moderately against recording "the change in this came from that
commit via cherry-pick" in a meta commit.

Junio C Hamano Nov. 19, 2018, 4:15 a.m. UTC | #15

Stefan Xenos <sxenos@google.com> writes:

> The scenario you describe would not produce an origin edge in the
> metacommit graph. If the user amended X, there would be no origin
> edges - just a replacement. If you cherry-picked Z you'd get no
> replacements and just an origin. In neither case would you get both
> types of parent.

OK, that makes things a lot simpler.

I can see why we want to record "commit X obsoletes commit Y" to
help the "evolve" feature, which was the original motivation that
started this whole discussion.  But it is not immediately obvious to
me how it would help to have "Z was cherry-picked from W" in
"evolve".

The whole point of cherry-picking an old commit W to produce a new
commit Z is because the developer wanted to use the change between
W^ and W in a context that is quite different from W^, so it would
make no sense to "evolve" anything that was built on top of W on top
of Z.

It is of course OK to build a different feature that can take
advantage of the cherry-pick information on top of the same meta
commit concept in later steps, and to ensure that is doable, the
initial meta commit design must be done in a way that is flexible
enough to be extended, but it is not clear to me if this "origin"
thing is "while this does not have much to do with 'evolve', let's
throw in fields that would help another feature while we are at it"
or "in addition to 'X obsoletes Y', we need the cherry-pick
information for 'evolve' feature because..." (and because it is not
clear, I am assuming that it is the former).  If we can design the
"evolve" thing with only the "contents" and "obsoletes", that would
allow us to limit the scope of discussion we need to have around
meta commit and have something that works earlier, wouldn't it?

Thanks.

SZEDER Gábor Nov. 19, 2018, 3:55 p.m. UTC | #16

On Sat, Nov 17, 2018 at 12:30:58PM -0800, Stefan Xenos wrote:
> > Further, I see that this document tries to suggest a proliferation of new commands
> 
> It does. Let me explain a bit about the reasoning behind this
> breakdown of commands. My main priority was to keep the commands as
> consistent with existing git commands as possible. Secondary goals
> were:
> - Mapping a single intent to a single command where possible makes it
> easier to explain what that command does.
> - Having lots of simpler commands as opposed to a few complex commands
> makes them easier to type.
> - Command names are more descriptive than lettered arguments.

Subcommand names and --long-options are just as descriptive.

> Git already has a "log" and "reflog" command for displaying two
> different types of log,

No, there is 'git log' for displaying logs in various ways, and there
is 'git reflog' which not only displays reflogs, but also operates on
them, e.g. deletes specific reflog entries or expires old entries,
necessitating and justifying the dedicated 'git reflog' command.

> so putting "obslog" on its own command makes
> it consistent with the existing logs, easier to type, and keeps the
> command simple.

> - We could turn "obslog" into an extra option on the "log" command,
> but that would be inconsistent with reflog and would complicate the
> already-complex log command.

On one hand, it's unclear to me what additional operations the
proposed 'git obslog' command will perform besides showing the log of
a change.  If there are no such operations, then it can't really be
compared to 'git reflog' to justify a dedicated 'git obslog' command.

OTOH, note that 'git log' already has a '--walk-reflogs' option, and
indeed 'git reflog [show]' is implemented via the common log
machinery.  And this is not a mere implementation detail, because "git
reflog show accepts any of the options accepted by git log" (quoting
its manpage), making it possible to filter, limit and format reflog
entries, e.g.:

  git reflog --format='%h %cd %s' --author=szeder -5 branch file

I think 'git obslog' should allow the same when showing the log of a
change.

> Personally, I don't
> consider a proliferation of new commands to be inherently bad (or
> inherently good, really). Is there a reason new commands should be
> avoided?

If a user wants to deal with reflogs, then there is 'git help reflog'
which in one manpage describes the concept, and how to list and
expire reflogs and delete individual entries from a reflog using the
various subcommands.  If a user wants to deal with stashes, then there
is 'git help stash', which in one manpage describes the concept, and
how to create, list, show, apply, delete, etc. stashes using the
various subcommands.  See where this is going?  The same applies to
bisect, notes, remotes, rerere, submodules, worktree; maybe there are
more.  This is a Good Thing.

By adding several new commands users will have to consult the manpages
of 'evolve', 'change', 'obslog', etc., even though the commands and
the concepts are closely related.

Stefan Xenos Nov. 19, 2018, 8:14 p.m. UTC | #17

> But it is not immediately obvious to me how it would help to have "Z was cherry-picked from W" in "evolve".

The evolve command would use it for handling the
obsolescence-over-cherry-pick (OOCP) feature. If someone cherry-picks
a commit and then amends the original, the evolve command would give
you the option of applying the same amendment to the cherry-picked
version.

Are you claiming that this is undesirable, or are you claiming that
this could be accomplished without origin parents?

> the developer wanted to use the change between W^ and W in a context that is quite different from

I guess that depends on the reason for doing the cherry-pick. A very
common scenario I see for cherry-picks is cherry-picking a bugfix from
a development branch to a maintenance branch. In that situation, if
there was a better version of the original bugfix you'd also want to
update the cherry-pick on the maintenance branch to use the better
version of the fix. That's what OOCP does.

> make no sense to "evolve" anything that was built on top of W on top of Z.

Agreed. But that's not what evolve would do with the origin edges. It
would be looking for amendments of W, not children of W.
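
To make the distinction concrete, here is a minimal Python sketch of the OOCP lookup described above. It is only an illustration under stated assumptions: the `MetaCommit` class, its field names, and both helper functions are hypothetical and not part of the proposal, and divergence (a commit with more than one replacement) is ignored for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MetaCommit:
    """Hypothetical in-memory model of a meta-commit (names are assumptions)."""
    content: str                                        # commit this meta-commit describes
    obsolete: List[str] = field(default_factory=list)   # commits it replaces
    origin: List[str] = field(default_factory=list)     # commits it was cherry-picked from


def latest_version(commit: str, metas: List[MetaCommit]) -> str:
    """Follow replacement edges from `commit` to its newest successor."""
    replaced_by = {old: m.content for m in metas for old in m.obsolete}
    seen = set()
    while commit in replaced_by and commit not in seen:
        seen.add(commit)  # guard against accidental cycles
        commit = replaced_by[commit]
    return commit


def stale_cherry_picks(metas: List[MetaCommit]) -> List[Tuple[str, str, str]]:
    """List (cherry_pick, origin, amended_origin) where the origin was amended.

    Only amendments of the origin are considered; children of the origin
    never enter the picture.
    """
    stale = []
    for m in metas:
        for src in m.origin:
            newest = latest_version(src, metas)
            if newest != src:
                stale.append((m.content, src, newest))
    return stale


metas = [
    MetaCommit(content="Z", origin=["W"]),      # Z was cherry-picked from W
    MetaCommit(content="W2", obsolete=["W"]),   # W was later amended into W2
]
print(stale_cherry_picks(metas))  # [('Z', 'W', 'W2')]
```

In this model, evolve would offer to apply the W-to-W2 amendment to Z. Nothing that was merely built on top of W is touched.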

> It is of course OK to build a different feature that can take advantage of the cherry-pick information on top of the same meta commit concept in later steps

All valid points - we could build a useful "evolve" command without
origin edges (and without OOCP), we could easily add origin parents
later to a design that just supported obsolete and content parents,
and the decision about /when/ to add origin parents is orthogonal to
the decision about /if/ to add them.

Let's explore the "when" question. I think there's a compelling reason
to add them as soon as possible - namely, gerrit. If and when we come
to some sort of agreement on this proposal, gerrit could start adding
tooling to understand change graphs as an alternative to change-id
footers. That work could proceed in parallel with the work in git-core
once we know what the data structures look like, but it can't start
until the data structures are sufficient to address all the use cases
that were previously covered by change-id. At the moment, meta-commits
without origin parents would not cover all of gerrit's use-cases so
this would block adoption in gerrit.

  - Stefan

Jonathan Nieder Nov. 19, 2018, 8:26 p.m. UTC | #18

Hi,

Stefan Xenos wrote:

> Let's explore the "when" question. I think there's a compelling reason
> to add them as soon as possible - namely, gerrit. If and when we come
> to some sort of agreement on this proposal, gerrit could start adding
> tooling to understand change graphs as an alternative to change-id
> footers. That work could proceed in parallel with the work in git-core
> once we know what the data structures look like, but it can't start
> until the data structures are sufficient to address all the use cases
> that were previously covered by change-id. At the moment, meta-commits
> without origin parents would not cover all of gerrit's use-cases so
> this would block adoption in gerrit.

By this, are you referring to the "Cherry-picks" list in the Gerrit
web UI?

Thanks,
Jonathan

Stefan Xenos Nov. 19, 2018, 9:32 p.m. UTC | #19

> Subcommand names and --long-options are just as descriptive.

Yeah, I'm convinced about the descriptiveness. If you check the latest
version of the patch, I've already updated the "change" command to use
subcommands rather than lettered arguments.

> If a user wants to deal with reflogs, then there is 'git help reflog'

I guess it depends on whether you prefer having a single big help page
(risk: user may see irrelevant content), or a number of small help
pages (risk: user may need to follow cross-references). My guess is
that we should probably try to hit the sweet spot that minimizes the
amount of irrelevant information on a help page, the probability of
needing to follow a cross-reference to understand context, and the
amount of content that needs to be duplicated between pages.

But assuming we add a bunch of formatting options to obslog that match
log, it may make sense to just have an "--obslog" argument to log.

> I think 'git obslog' should allow the same when showing the log of a change.

Sounds good. We should probably also change the default formatting for
the obslog command to be some sort of description of what changed
since the commit message will probably be very similar for every
entry. I'll update the proposal to mention formatting options once we
sort out where obslog will actually live.

> By adding several new commands users will have to consult the manpages of 'evolve',
> 'change', 'obslog', etc., even though the commands and the concepts are closely related.

I'm not sure that's the case. There is some common background material
you'd need to understand in order to use any of those commands ("what
are changes?"), but the same could be said of pretty much any git
command ("what are commits?"). Assuming the user knows what a change
is, I'm pretty sure I could write a self-contained description for
evolve, change, or obslog that doesn't require cross-referencing any
of the other commands... and the evolve command could probably be
understood even without understanding changes.

But since several comments have focused on the commands, let's brainstorm!

Here are some options that occur to me:
1. Three commands: evolve + change + obslog as top-level commands (the
current proposal). Change gets a bunch of subcommands for manipulating
the change graph and metas/ namespace.
2. All top-level: evolve + lschange + mkchange + rmchange +
prunechange + obslog. None of the commands get subcommands.
3. Everything under change: "change evolve", "change obslog" become
new change subcommands.
4. Evolve as a rebase argument, obslog as a log argument. Use "rebase
--evolve" to initiate evolve and use "log --obslog" to initiate
obslog. The change command stays as it is in the proposal.
5. Two commands: evolve + change. obslog becomes a "log" argument.

Note that there will be more "evolve"-specific arguments in the
future. For most transformations that evolve uses, there will be a
matching argument to enable or disable that transformation and as we
add transformations we'll also add arguments.

If we go with option 3, it would make for a very cluttered help page.
For example, if you're looking for information on how to use evolve,
you'd have to scroll past a bunch of formatting information that is
only relevant to obslog... and if you're looking for the formatting
options, you'd have to scroll past a bunch of
transformation-enablement options that are only relevant to evolve.

Based on your log feedback above, I'm thinking #5 may make sense.

  - Stefan

Junio C Hamano Nov. 20, 2018, 1:03 a.m. UTC | #20

Stefan Xenos <sxenos@google.com> writes:

>> But it is not immediately obvious to me how it would help to have
>> "Z was cherry-picked from W" in "evolve".
>
> The evolve command would use it for handling the
> obsolescence-over-cherry-pick (OOCP) feature. If someone cherry-picks
> a commit and then amends the original, the evolve command would give
> you the option of applying the same amendment to the cherry-picked
> version.

Yeah, I missed that case when I was formulating my thought on how we
can start smaller and simpler to get the ball rolling.  And for
"this commit and anything built on top of it need to be adjusted
since that other commit, which this commit was made by cherry-picking
it, has been obsoleted" to work, the "origin" commit pointed at by
the meta commit must be made available.

> Are you claiming that this is undesirable, or are you claiming that
> this could be accomplished without origin parents?

I was trying to see if this is something we can leave out to limit
the initial scope.

Jonathan Nieder Nov. 20, 2018, 1:09 a.m. UTC | #21

Hi,

Stefan Xenos wrote:

> But since several comments have focused on the commands, let's brainstorm!
>
> Here are some options that occur to me:
>
> 1. Three commands: evolve + change + obslog as top-level commands (the
> current proposal). Change gets a bunch of subcommands for manipulating
> the change graph and metas/ namespace.
>
> 2. All top-level: evolve + lschange + mkchange + rmchange +
> prunechange + obslog. None of the commands get subcommands.
>
> 3. Everything under change: "change evolve", "change obslog" become
> new change subcommands.
>
> 4. Evolve as a rebase argument, obslog as a log argument. Use "rebase
> --evolve" to initiate evolve and use "log --obslog" to initiate
> obslog. The change command stays as it is in the proposal.
>
> 5. Two commands: evolve + change. obslog becomes a "log" argument.
>
> Note that there will be more "evolve"-specific arguments in the
> future. For most transformations that evolve uses, there will be a
> matching argument to enable or disable that transformation and as we
> add transformations we'll also add arguments.
>
> If we go with option 3, it would make for a very cluttered help page.
> For example, if you're looking for information on how to use evolve,
> you'd have to scroll past a bunch of formatting information that is
> only relevant to obslog... and if you're looking for the formatting
> options, you'd have to scroll past a bunch of
> transformation-enablement options that are only relevant to evolve.
>
> Based on your log feedback above, I'm thinking #5 may make sense.

(5) sounds good to me, too.  Thanks, both, for your thoughtfulness.

Jonathan

Jonathan Nieder Nov. 20, 2018, 1:18 a.m. UTC | #22

Ævar Arnfjörð Bjarmason wrote:
> On Thu, Nov 15 2018, sxenos@google.com wrote:

>> +Parent-type
>> +-----------
>> +The “parent-type” field in the commit header identifies a commit as a
>> +meta-commit and indicates the meaning for each of its parents. It is never
>> +present for normal commits.
[...]
> I think it's worth pointing out, for those that are rusty on commit
> object details (but I checked), that the reason for it not being:
>
>     tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
>     parent aa7ce55545bf2c14bef48db91af1a74e2347539a
>     parent-type content
>     parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
>     parent-type obsolete
>     parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
>     parent-type origin
>     author Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
>     committer Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
>
> Which would be easier to read, is that we're very sensitive to the order
> of the first few fields (tree -> parent -> author -> committer) and fsck
> will error out if we interjected a new field.

By the way, in the spirit of limiting the initial scope, I wonder
whether the parent-type fields can be stored in the commit message
initially.

Elsewhere in this thread it was mentioned that the parent-type is a
field to allow tools like "git fsck" to understand the meaning of
these parent relationships (for example, to forbid a commit
referencing a meta-commit).  The same could be done using special
commit message text, though.

The advantage of such an approach would be that we could experiment
without changing the official object format at all.  If experiments
revealed a different set of information to store, we could update the
format without having to maintain the memory of the older format in
"git fsck"'s understanding of commit object fields.  So even though I
think that in the end we would want to put this information in the
commit object header, I'm tempted to suspect that the benefits of
putting it in the commit message to start outweigh the costs (in
particular, of having to migrate to another format later).

Thanks,
Jonathan

Ævar Arnfjörð Bjarmason Nov. 20, 2018, 9:43 a.m. UTC | #23

On Tue, Nov 20 2018, Jonathan Nieder wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Nov 15 2018, sxenos@google.com wrote:
>
>>> +Parent-type
>>> +-----------
>>> +The “parent-type” field in the commit header identifies a commit as a
>>> +meta-commit and indicates the meaning for each of its parents. It is never
>>> +present for normal commits.
> [...]
>> I think it's worth pointing out, for those that are rusty on commit
>> object details (but I checked), that the reason for it not being:
>>
>>     tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
>>     parent aa7ce55545bf2c14bef48db91af1a74e2347539a
>>     parent-type content
>>     parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
>>     parent-type obsolete
>>     parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
>>     parent-type origin
>>     author Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
>>     committer Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
>>
>> Which would be easier to read, is that we're very sensitive to the order
>> of the first few fields (tree -> parent -> author -> committer) and fsck
>> will error out if we interjected a new field.
>
> By the way, in the spirit of limiting the initial scope, I wonder
> whether the parent-type fields can be stored in the commit message
> initially.
>
> Elsewhere in this thread it was mentioned that the parent-type is a
> field to allow tools like "git fsck" to understand the meaning of
> these parent relationships (for example, to forbid a commit
> referencing a meta-commit).  The same could be done using special
> commit message text, though.
>
> The advantage of such an approach would be that we could experiment
> without changing the official object format at all.  If experiments
> revealed a different set of information to store, we could update the
> format without having to maintain the memory of the older format in
> "git fsck"'s understanding of commit object fields.  So even though I
> think that in the end we would want to put this information in the
> commit object header, I'm tempted to suspect that the benefits of
> putting it in the commit message to start outweigh the costs (in
> particular, of having to migrate to another format later).

I think it sounds better to just make it, in the header:

    x-evolve-pt content
    x-evolve-pt obsolete
    x-evolve-pt origin

Where "pt = parent-type". We could of course spell that out too, but in
this case "x-evolve-pt" is exactly the same number of bytes as
"parent-type", so nobody can object that it takes more space :)

We'd then carry some documentation where we say everything except "x-*-"
is reserved, and that we'd like to know about new "*" there before it's
used, so it can be documented.

Putting it in the commit message just sounds like a hack around not
having namespaced headers. If we'd like to keep this then tools would
need to parse both (potentially unpacking a lot of the commit message
object, it can be quite big in some cases...).
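
As a rough illustration of what reading such headers could look like, here is a minimal Python sketch. It rests on assumptions that the thread does not settle: that the type lines follow the committer line, that the i-th type line describes the i-th parent, and that either spelling ("parent-type" or "x-evolve-pt") may appear. The function name `parent_types` is hypothetical.

```python
from typing import List, Tuple

# Both spellings floated in this thread; which one (if any) is adopted is open.
TYPE_KEYS = ("parent-type", "x-evolve-pt")


def parent_types(raw_commit: str) -> List[Tuple[str, str]]:
    """Pair each parent with its type; return [] for a normal commit.

    Assumes the i-th type header describes the i-th parent header.
    Only the header block is scanned; the message is never parsed.
    """
    header, _, _message = raw_commit.partition("\n\n")
    parents: List[str] = []
    types: List[str] = []
    for line in header.split("\n"):
        key, _, value = line.partition(" ")
        if key == "parent":
            parents.append(value)
        elif key in TYPE_KEYS:
            types.append(value)
    if not types:
        return []  # no type headers: an ordinary commit
    if len(types) != len(parents):
        raise ValueError("expected exactly one type header per parent")
    return list(zip(parents, types))


raw = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "parent aa7ce55545bf2c14bef48db91af1a74e2347539a\n"
    "parent d64309ee51d0af12723b6cb027fc9f195b15a5e9\n"
    "parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136\n"
    "author Stefan Xenos <sxenos@gmail.com> 1540841596 -0700\n"
    "committer Stefan Xenos <sxenos@gmail.com> 1540841596 -0700\n"
    "x-evolve-pt content\n"
    "x-evolve-pt obsolete\n"
    "x-evolve-pt origin\n"
    "\n"
    "example meta-commit\n"
)
for parent, kind in parent_types(raw):
    print(kind, parent)
```

Because the existing tree, parent, author, committer order is left untouched and the new lines come after it, a reader like this never needs to look past the first blank line.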

Phillip Wood Nov. 20, 2018, 12:18 p.m. UTC | #24

Hi Stefan

Thanks for working on this, I think it could be a really useful addition
to git. I'd echo Gábor's comments about making commands descriptive and
easy for the user to find. Git has aliases, accepts abbreviated option
names and has shell completion, so I don't think typing is really such a
problem. From your reply it looks like you've taken those concerns on
board. I've got some more comments below.

On 15/11/2018 00:55, sxenos@google.com wrote:
> From: Stefan Xenos <sxenos@google.com>
> 
> This document describes what an obsolescence graph for
> git would look like, the behavior of the evolve command,
> and the changes planned for other commands.
> 
> Signed-off-by: Stefan Xenos <sxenos@google.com>
> ---
>   Documentation/technical/evolve.txt | 885 +++++++++++++++++++++++++++++
>   1 file changed, 885 insertions(+)
>   create mode 100644 Documentation/technical/evolve.txt
> 
> diff --git a/Documentation/technical/evolve.txt b/Documentation/technical/evolve.txt
> new file mode 100644
> index 0000000000..88470eada3
> --- /dev/null
> +++ b/Documentation/technical/evolve.txt
> @@ -0,0 +1,885 @@
> +Git Obsolescence Graph
> +======================
> +
> +Objective
> +---------
> +Track the edits to a commit over time in an obsolescence graph.
> +
> +Background
> +----------
> +Imagine you have three dependent changes up for review and you receive feedback
> +that requires editing all three changes. While you're editing one, more feedback
> +arrives on one of the others. What do you do?
> +
> +The evolve command is a convenient way to work with chains of commits that are
> +under review. Whenever you rebase or amend a commit, the repository remembers
> +that the old commit is obsolete and has been replaced by the new one. Then, at
> +some point in the future, you can run "git evolve" and the correct sequence of
> +rebases will occur in the correct order such that no commit has an obsolete
> +parent.
> +
> +Part of making the "evolve" command work involves tracking the edits to a commit
> +over time, which is why we need an obsolescence graph. However, the obsolescence
> +graph will also bring other benefits:
> +
> +- Users can view the history of a commit directly (the sequence of amends and
> +  rebases it has undergone, orthogonal to the history of the branch it is on).
> +- It will be possible to quickly locate and list all the changes the user
> +  currently has in progress.
> +- It can be used as part of other high-level commands that combine or split
> +  changes.
> +- It can be used to decorate commits (in git log, gitk, etc) that are either
> +  obsolete or are the tip of a work in progress.
> +- By pushing and pulling the obsolescence graph, users can collaborate more
> +  easily on changes-in-progress. This is better than pushing and pulling the
> +  changes themselves since the obsolescence graph can be used to locate a more
> +  specific merge base, allowing for better merges between different versions of
> +  the same change.
> +- It could be used to correctly rebase local changes and other local branches
> +  after running git-filter-branch.
> +- It can replace the change-id footer used by gerrit.
> +
> +Similar technologies
> +--------------------
> +There are some other technologies that address the same end-user problem.
> +
> +Rebase -i can be used to solve the same problem, but users can't easily switch
> +tasks midway through an interactive rebase or have more than one interactive
> +rebase going on at the same time. It can't handle the case where you have
> +multiple changes sharing the same parent when that parent needs to be rebased
> +and won't let you collaborate with others on resolving a complicated interactive
> +rebase. You can think of rebase -i as a top-down approach and the evolve command
> +as the bottom-up approach to the same problem.
> +
> +Several patch queue managers have been built on top of git (such as topgit,
> +stgit, and quilt). They address the same user need. However they also rely on
> +state managed outside git that needs to be kept in sync. Such state can be
> +easily damaged when running a git native command that is unaware of the patch
> +queue. They also typically require an explicit initialization step to be done by
> +the user which creates workflow problems.
> +
> +Replacements (refs/replace) are superficially similar to obsolescences in that
> +they describe that one commit should be replaced by another. However, they
> +differ in both how they are created and how they are intended to be used.
> +Obsolescences are created automatically by the commands a user runs, and they
> +describe the user’s intent to perform a future rebase. Obsolete commits still
> +appear in branches, logs, etc like normal commits (possibly with an extra
> +decoration that marks them as obsolete). Replacements are typically created
> +explicitly by the user, they are meant to be kept around for a long time, and
> +they describe a replacement to be applied at read-time rather than as the input
> +to a future operation. When a replaced commit is queried, it is typically hidden
> +and swapped out with its replacement as though the replacement has already
> +occurred.
> +
> +Goals
> +-----
> +Legend: Goals marked with P0 are required. Goals marked with Pn should be
> +attempted unless they interfere with goals marked with Pn-1.
> +
> +P0. All commands that modify commits (such as the normal commit --amend or
> +    rebase command) should mark the old commit as being obsolete and replaced by
> +    the new one. No additional commands should be required to keep the
> +    obsolescence graph up-to-date.
> +P0. Any commit that may be involved in a future evolve command should not be
> +    garbage collected. Specifically:
> +    - Commits that obsolete another should not be garbage collected until
> +      user-specified conditions have occurred and the change has expired from
> +      the reflog. User specified conditions for removing changes include:
> +      - The user explicitly deleted the change.
> +      - The change was merged into a specific branch.
> +    - Commits that have been obsoleted by another should not be garbage
> +      collected if any of their replacements are still being retained.
> +P0. A commit can be obsoleted by more than one replacement (called divergence).
> +P0. Must be able to resolve divergence (convergence).
> +P1. Users should be able to share chains of obsolete changes in order to
> +    collaborate on WIP changes.
> +P2. Such sharing should be at the user’s option. That is, it should be possible
> +    to directly share a change without also sharing the file states or commit
> +    comments from the obsolete changes that led up to it, and the choice not to
> +    share those commits should not require changing any commit hashes.
> +P2. It should be possible to discard part or all of the obsolescence graph
> +    without discarding the commits themselves that are already present in
> +    branches and the reflog.
> +
> +
> +Overview
> +========
> +We introduce the notion of “meta-commits” which describe how one commit was
> +created from other commits. A branch of meta-commits is known as a change.
> +Changes are created and updated automatically whenever a user runs a command
> +that creates a commit. They are used for locating obsolete commits, providing a
> +list of a user’s unsubmitted work in progress, and providing a stable name for
> +each unsubmitted change.
> +
> +Users can exchange edit histories by pushing and fetching changes.
> +
> +New commands will be introduced for manipulating changes and resolving
> +divergence between them. Existing commands that create commits will be updated
> +to modify the meta-commit graph and create changes where necessary.
> +
> +Example usage
> +-------------
> +# First create three dependent changes
> +$ echo foo>bar.txt && git add .
> +$ git commit -m "This is a test"
> +created change metas/this_is_a_test
> +$ echo foo2>bar2.txt && git add .
> +$ git commit -m "This is also a test"
> +created change metas/this_is_also_a_test
> +$ echo foo3>bar3.txt && git add .
> +$ git commit -m "More testing"
> +created change metas/more_testing
> +
> +# List all our changes in progress
> +$ git change -l
> +metas/this_is_a_test
> +metas/this_is_also_a_test
> +* metas/more_testing
> +metas/some_change_already_merged_upstream

I'm a bit confused why it is creating a meta ref per commit rather than 
one for the current branch.

> +
> +# Now modify the earliest change, using its stable name
> +$ git reset --hard metas/this_is_a_test
> +$ echo morefoo>>bar.txt && git add . && git commit --amend --no-edit
> +
> +# Use git-evolve to fix up any dependent changes
> +$ git evolve
> +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> +rebasing metas/more_testing onto metas/this_is_also_a_test
> +Done

What happens if the original commits are currently checked out with local
changes? With worktrees there could be several branches currently
checked out.

> +# Use git-obslog to view the history of the this_is_a_test change
> +$ git obslog
> +93f110 metas/this_is_a_test@{0} commit (amend): This is a test
> +930219 metas/this_is_a_test@{1} commit: This is a test
> +
> +# Now create an unrelated change
> +$ git reset --hard origin/master
> +$ echo newchange>unrelated.txt && git add .
> +$ git commit -m "Unrelated change"
> +created change metas/unrelated_change
> +
> +# Fetch the latest code from origin/master and use git-evolve
> +# to rebase all dependent changes.
> +$ git fetch origin master
> +$ git evolve origin/master

I've not really used mercurial but I did watch a talk about 'hg evolve' 
a while ago. I got the impression they had put quite a lot of effort 
into having evolve automatically run and resolve divergences when 
pulling and rebasing. Is there a long-term plan for git to do the same?

> +deleting metas/some_change_already_merged_upstream
> +rebasing metas/this_is_a_test onto origin/master
> +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> +rebasing metas/more_testing onto metas/this_is_also_a_test
> +rebasing metas/unrelated_change onto origin/master
> +Conflict detected! Resolve it and then use git evolve --continue to resume.
> +
> +# Sort out the conflict
> +$ git mergetool
> +$ git evolve --continue
> +Done
> +
> +# Share the full history of edits for the this_is_a_test change
> +# with a review server
> +$ git push origin metas/this_is_a_test:refs/for/master
> +# Share the latest commit for “Unrelated change”, without history
> +$ git push origin HEAD:refs/for/master
> +
> +Detailed design
> +===============
> +Obsolescence information is stored as a graph of meta-commits. A meta-commit is
> +a specially-formatted merge commit that describes how one commit was created
> +from others.
> +
> +Meta-commits look like this:
> +
> +$ git cat-file -p <example_meta_commit>
> +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> +parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> +author Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> +committer Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> +parent-type content
> +parent-type obsolete
> +parent-type origin
> +
> +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by
> +cherry-picking commit 7e1bbcd3”.
> +
> +The tree for meta-commits is always the empty tree whose hash matches
> +4b825dc642cb6eb9a060e54bf8d69288fbee4904 exactly, but future versions of git may
> +attach other trees here. For forward-compatibility fsck should ignore such trees
> +if found on future repository versions. Similarly, current versions of git
> +should always fill in an empty commit comment and tools like fsck should ignore
> +the content of the commit comment if present in a future repository version.
> +This will allow future versions of git to add metadata to the meta-commit
> +comments or tree without breaking forwards compatibility.
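
For anyone checking the empty-tree id quoted above: it is the SHA-1 of a tree object with a zero-length body. A minimal sketch in Python (the helper name git_object_id is illustrative, not part of the proposal, and it assumes a SHA-1 repository):

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    # Git hashes "<type> <size>" followed by a NUL byte and the payload.
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


# A tree with no entries has an empty payload.
EMPTY_TREE = git_object_id("tree", b"")
print(EMPTY_TREE)
```

An fsck check along these lines would compare a meta-commit's tree against this constant, and per the text above would tolerate other trees on future repository versions.
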
> +
> +Parent-type
> +-----------
> +The “parent-type” field in the commit header identifies a commit as a
> +meta-commit and indicates the meaning for each of its parents. It is never
> +present for normal commits. It is a list of enum values whose order matches the
> +order of the parents. Possible parent types are:
> +
> +- content: the content parent identifies the commit that this meta-commit is
> +  describing.
> +- obsolete: indicates that this parent is made obsolete by the content parent.
> +- origin: indicates that this parent was generated from the given commit.
> +
> +There must be exactly one content parent for each meta-commit and it is always
> +the first parent. The content commit will always be a normal commit and not a
> +meta-commit. However, future versions of git may create meta-commits for other
> +meta-commits and the fsck tool must be aware of this for forwards compatibility.
> +
> +A meta-commit can have zero or more obsolete parents. An amend operation creates
> +a single obsolete parent. A merge used to resolve divergence (see divergence,
> +below) will create multiple obsolete parents. A meta-commit may have zero
> +obsolete parents if it describes a cherry-pick or squash merge that copies one
> +or more commits but does not replace them.
> +
> +A meta-commit can have zero or more origin parents. A cherry-pick creates a
> +single origin parent. Certain types of squash merge will create multiple origin
> +parents.
> +
> +An obsolete parent or origin parent may be either a normal commit (indicating
> +the oldest-known version of a change) or another meta-commit (for a change that
> +has already been modified one or more times).
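
A rough sketch of how the parent-type rules could be checked when reading a meta-commit, written in Python against the cat-file output quoted earlier. The MetaCommit class and parse_meta_commit function are hypothetical names for illustration; the proposal does not define a parser.

```python
from dataclasses import dataclass, field

PARENT_TYPES = {"content", "obsolete", "origin"}


@dataclass
class MetaCommit:
    content: str
    obsolete: list = field(default_factory=list)
    origin: list = field(default_factory=list)


def parse_meta_commit(raw: str) -> MetaCommit:
    parents, types = [], []
    for line in raw.splitlines():
        if not line:
            break  # a blank line ends the commit header
        key, _, value = line.partition(" ")
        if key == "parent":
            parents.append(value)
        elif key == "parent-type":
            types.append(value)
    # The parent-type list is matched to the parents by position.
    if len(parents) != len(types):
        raise ValueError("one parent-type is required per parent")
    if set(types) - PARENT_TYPES:
        raise ValueError("unknown parent-type")
    # Exactly one content parent, and it must be the first parent.
    if not types or types[0] != "content" or types.count("content") != 1:
        raise ValueError("exactly one content parent, listed first")
    meta = MetaCommit(content=parents[0])
    for parent, kind in zip(parents[1:], types[1:]):
        getattr(meta, kind).append(parent)
    return meta


EXAMPLE = """\
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent aa7ce55545bf2c14bef48db91af1a74e2347539a
parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
parent-type content
parent-type obsolete
parent-type origin
"""
meta = parse_meta_commit(EXAMPLE)
```

Here meta.content is the aa7ce555 commit, with d64309ee as its single obsolete parent and 7e1bbcd3 as its single origin parent, matching the prose reading of the example.
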
> +
> +Changes
> +-------
> +A branch of meta-commits describes how a commit was produced and what previous
> +commits it is based on. It is also an identifier for a thing the user is
> +currently working on. We refer to such a meta-branch as a change.
> +
> +Local changes are stored in the new refs/metas namespace. Remote changes are
> +stored in the refs/remotemetas/<remotename> namespace.

Can I suggest using refs/remote/<remotename>/metas? It would be really 
nice to get to a point where git stores all its remote refs under the 
same hierarchy, so shared notes could go under 
refs/remote/<remotename>/notes and branches eventually end up under 
refs/remote/<remotename>/heads etc. git-p4 already uses 
refs/remote/<remotename>/p4 for its remote refs.

> +
> +The list of changes in refs/metas is more than just a mechanism for the evolve
> +command to locate obsolete commits. It is also a convenient list of all of a
> +user’s work in progress and their current state - a list of things they’re
> +likely to want to come back to.

I think this could be useful (although I guess you can get the branches 
you've been working on recently from HEAD's reflog quite easily). This 
sounds like hg show-work. Is it worth mentioning that when you add a 
section about hg evolve?

> +
> +Strictly speaking, it is the presence of the branch in the refs/metas namespace
> +that marks a branch as being a change, not the fact that it points to a
> +metacommit. Metacommits are only created when a commit is amended or rebased, so
> +in the case where a change points to a commit that has never been modified, the
> +change points to that initial commit rather than a metacommit.
> +
> +Obsolescence
> +------------
> +A commit is considered obsolete if it is reachable via “obsolete” edges
> +anywhere in the history of a change and it isn’t the head of that change.
> +Commits may be the content for 0 or more meta-commits. If the same commit
> +appears in multiple changes, it is not obsolete if it is the head of any of
> +those changes.
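
The obsolescence rule can be sketched as a small graph walk. This is only a model in Python: meta-commits are plain dicts keyed by a made-up id, changes map a name to a head (either a meta-commit id or a plain commit), and the function names are illustrative.

```python
def content_of(node, metas):
    # A change head or parent may name a meta-commit or a plain commit.
    return metas[node]["content"] if node in metas else node


def obsolete_commits(changes, metas):
    # Commits at the head of any change are never obsolete.
    head_commits = {content_of(head, metas) for head in changes.values()}
    seen, reached = set(), set()
    for head in changes.values():
        # Follow only "obsolete" edges; origin edges are ignored here.
        stack = list(metas.get(head, {}).get("obsolete", []))
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            reached.add(content_of(node, metas))
            stack.extend(metas.get(node, {}).get("obsolete", []))
    return reached - head_commits


# State after the "commit --amend" example in the Commit section of the patch.
metas = {"Dmeta": {"content": "D", "obsolete": ["B"]}}
changes = {"foo": "A", "bar": "Dmeta", "baz": "C"}
print(obsolete_commits(changes, metas))
```

With that state only B is reported. If another change still had B as its head, B would drop out of the result, which is the last sentence of the paragraph above.
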
> +
> +Divergence
> +----------
> +From the user’s perspective, two changes are divergent if they both ask for
> +different replacements to the same commit. More precisely, a target commit is
> +considered divergent if there is more than one commit at the head of a change in
> +refs/metas that leads to the target commit via an unbroken chain of “obsolete”
> +edges.
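
The "more precisely" definition can be modelled the same way. A minimal Python sketch, assuming the same dict-based representation of meta-commits and changes (all names are illustrative): for every target commit it collects the distinct head commits that reach it through obsolete edges, and reports targets reached from more than one.

```python
def divergent_commits(changes, metas):
    def content_of(node):
        return metas[node]["content"] if node in metas else node

    reached_by = {}
    # Changes that alias the same head count once.
    for head in set(changes.values()):
        head_commit = content_of(head)
        stack = list(metas.get(head, {}).get("obsolete", []))
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            reached_by.setdefault(content_of(node), set()).add(head_commit)
            stack.extend(metas.get(node, {}).get("obsolete", []))
    return {t: heads for t, heads in reached_by.items() if len(heads) > 1}


# The divergent state from the Merge section of the patch.
metas = {
    "Cmeta": {"content": "C", "obsolete": ["B"]},
    "Dmeta": {"content": "D", "obsolete": ["B"]},
}
changes = {"changeA": "A", "changeB": "Cmeta", "changeB2": "Dmeta"}
before = divergent_commits(changes, metas)

# After the amend merge, both changes point at Emeta.
metas["Emeta"] = {"content": "E", "obsolete": ["Cmeta", "Dmeta"]}
changes.update(changeB="Emeta", changeB2="Emeta")
after = divergent_commits(changes, metas)
```

In this model B is divergent before the amend merge (reached from both C and D) and nothing is divergent afterwards, since E is the only head that leads to B, C and D.
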
> +
> +Much like a merge conflict, divergence is a situation that requires user
> +intervention to resolve. The evolve command will stop when it encounters
> +divergence and prompt the user to resolve the problem. Users can solve the
> +problem in several ways:
> +
> +- Discard one of the changes (by deleting its change branch).
> +- Merge the two changes (producing a single change branch).

I assume this won't create merge commits for the actual commits though, 
just merge the meta branches and create some new commits that are each 
the result of something like 'merge-recursive original-commit 
our-new-version their-new-version'

> +- Copy one of the changes (keep both commits, but one of them gets a new
> +  metacommit appended to its history that is connected to its predecessor via an
> +  origin edge rather than an obsolete edge. That new change no longer obsoletes
> +  the original.)
> +
> +Obsolescence across cherry-picks
> +--------------------------------
> +By default the evolve command will treat cherry-picks and squash merges as being
> +completely separate from the original. Further amendments to the original commit
> +will have no effect on the cherry-picked copy. However, this behavior may not be
> +desirable in all circumstances.
> +
> +The evolve command may at some point support an option to look for cases where
> +the source of a cherry-pick or squash merge has itself been amended, and
> +automatically apply that same change to the cherry-picked copy. In such cases,
> +it would traverse origin edges rather than ignoring them, and would treat a
> +commit with origin edges as being obsolete if any of its origins were obsolete.

This explains why we have 'origin' fields in the meta commits. It might 
be worth putting a forward reference or note earlier on to explain why 
recording the origin is useful. (I didn't find the argument that gerrit 
needs it very convincing on its own, but it is actually more general 
than gerrit's specific use case.)

> +
> +Garbage collection
> +------------------
> +For GC purposes, meta-commits are normal commits. Just as a commit causes its
> +parents and tree to be retained, a meta-commit also causes its parents to be
> +retained.
> +
> +Change creation
> +---------------
> +Changes are created automatically whenever the user runs a command like “commit”
> +that has the semantics of creating a new change. They also move forward
> +automatically even if they’re not checked out. For example, whenever the user
> +runs a command like “commit --amend” that modifies a commit, all branches in
> +refs/metas that pointed to the old commit move forward to point to its
> +replacement instead. This also happens when the user is working from a detached
> +head.
> +
> +This does not mean that every commit has a corresponding change. By default,
> +changes only exist for recent locally-created commits. Users may explicitly pull
> +changes from other users or keep their changes around for a long time, but
> +either behavior requires a user to opt-in. Code review systems like gerrit may
> +also choose to keep changes around forever.
> +
> +Note that the changes in refs/metas serve a dual function as both a way to
> +identify obsolete changes and as a way for the user to keep track of their work
> +in progress. If we were only concerned with identifying obsolete changes, it
> +would be sufficient to create the change branch lazily the first time a commit
> +is obsoleted. Addressing the second use - of refs/metas as a mechanism for
> +keeping track of work in progress - is the reason for eagerly creating the
> +change on first commit.
> +
> +Change naming
> +-------------
> +When a change is first created, the only requirement for its name is that it
> +must be unique. Good names would also serve as useful mnemonics and be easy to
> +type. For example, a short word from the commit message containing no numbers or
> +special characters and that shows up with low frequency in other commit messages
> +would make a good choice.
> +
> +Different users may prefer different heuristics for their change names. For this
> +reason a new hook will be introduced to compute change names. Git will invoke
> +the hook for all newly-created changes and will append a numeric suffix if the
> +name isn’t unique. The default heuristics are not specified by this proposal and
> +may change during implementation.
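
To make the naming scheme concrete, here is one possible shape for it in Python. The proposal leaves the default heuristic unspecified, so default_change_name below is purely an assumption modelled on the names in the usage example (metas/this_is_a_test), and the suffix format is a guess as well.

```python
import re


def default_change_name(message: str) -> str:
    # Assumed heuristic: lowercase the subject line and join words with "_".
    subject = message.splitlines()[0] if message else ""
    slug = re.sub(r"[^a-z0-9]+", "_", subject.lower()).strip("_")
    return slug or "change"


def unique_change_name(message, existing, name_hook=default_change_name):
    # name_hook stands in for the proposed hook; git would add the suffix.
    base = name_hook(message)
    if base not in existing:
        return base
    suffix = 2
    while f"{base}{suffix}" in existing:
        suffix += 1
    return f"{base}{suffix}"


print(unique_change_name("This is a test", set()))
print(unique_change_name("This is a test", {"this_is_a_test"}))
```

Keeping the uniqueness check outside the hook means a user-supplied heuristic never has to look at the existing refs.
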
> +
> +Change deletion
> +---------------
> +Changes are normally only interesting to a user while a commit is still in
> +development and under review. Once the commit has been submitted wherever it is
> +going, its change can be discarded.
> +
> +The normal way of deleting changes makes this easy to do - changes are deleted
> +by the evolve command when it detects that the change is present in an upstream
> +branch. It does this in two ways: if the latest commit in a change either shows
> +up in the branch history or the change becomes empty after a rebase, it is
> +considered merged and the change is discarded. In this context, an “upstream
> +branch” is any branch passed in as the upstream argument of the evolve command.
> +
> +In case this sometimes deletes a useful change, such automatic deletions are
> +recorded in the reflog allowing them to be easily recovered.
> +
> +Sharing changes
> +---------------
> +Change histories are shared by pushing or fetching meta-commits and change
> +branches. This provides users with a lot of control of what to share and
> +repository implementations with control over what to retain.
> +
> +Users that only want to share the content of a commit can do so by pushing the
> +commit itself as they currently would. Users that want to share an edit history
> +for the commit can push its change, which would point to a meta-commit rather
> +than the commit itself if there is any history to share. Note that multiple
> +changes can refer to the same commits, so it’s possible to construct and push a
> +different history for the same commit in order to remove sensitive or irrelevant
> +intermediate states.
> +
> +Imagine the user is working on a change “mychange” that is currently the latest
> +commit on master. They have two ways to share it:
> +
> +# User shares just a commit without its history
> +> git push origin master
> +
> +# User shares the full history of the commit to a review system
> +> git push origin change/mychange:refs/for/master

Should this be meta/mychange:refs/for/master or have I missed something?

> +
> +# User fetches a collaborator’s modifications to their change
> +> git fetch remotename change/mychange
> +# Which updates the ref remotechange/remotename/mychange
> +
> +This will cause more intermediate states to be shared with the server than would
> +have been shared previously. A review system like gerrit would need to keep
> +track of which states had been explicitly pushed versus other intermediate
> +states in order to de-emphasize (or hide) the extra intermediate states from the
> +user interface.
> +
> +Merge-base
> +----------
> +Merge-base will be changed to search the meta-commit graph for common ancestors
> +as well as the commit graph, and will generally prefer results from the
> +meta-commit graph over the commit graph. Merge-base will consider meta-commits
> +from all changes, and will traverse both origin and obsolete edges.
> +
> +The reason for this is that - when merging two versions of the same commit
> +together - an earlier version of that same commit will usually be much more
> +similar than their common parent. This should make the workflow of collaborating
> +on unsubmitted patches as convenient as the workflow for collaborating in a
> +topic branch by eliminating repeated merges.
> +
> +User interface
> +--------------
> +All git porcelain commands that create commits are classified as having one of
> +four behaviors: modify, create, copy, or import. These behaviors are discussed
> +in more detail below.
> +
> +Modify commands
> +---------------
> +Modification commands (commit --amend, rebase) will mark the old commit as
> +obsolete by creating a new meta-commit that references the old one as an
> +obsolete parent. In the event that multiple changes point to the same commit,
> +this is done independently for every such change.
> +
> +More specifically, modifications work like this:
> +
> +1. Locate all existing changes for which the old commit is the content for the
> +   head of the change branch. If no such branch exists, create one that points
> +   to the old commit. Changes that include this commit in their history but not
> +   at their head are explicitly not included.
> +2. For every such change, create a new meta-commit that references the new
> +   commit as its content and references the old head of the change as an
> +   obsolete parent.
> +3. Move the change branch forward to point to the new meta-commit.
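
The three steps above can be sketched in Python as follows. This is a model only: meta-commits are dicts keyed by a made-up, content-derived id (so aliased changes end up sharing one meta-commit, as they would with real object hashes), and record_modification and new_change_name are hypothetical names.

```python
def record_modification(old_commit, new_commit, changes, metas, new_change_name):
    def content_of(node):
        return metas[node]["content"] if node in metas else node

    # Step 1: changes whose head has old_commit as its content. Changes that
    # only have old_commit deeper in their history are deliberately skipped.
    names = [n for n, head in changes.items() if content_of(head) == old_commit]
    if not names:
        changes[new_change_name] = old_commit
        names = [new_change_name]

    for name in names:
        old_head = changes[name]
        # Step 2: new meta-commit with the new commit as content and the
        # old head of the change as its obsolete parent.
        meta_id = f"meta({new_commit} obsoletes {old_head})"
        metas[meta_id] = {"content": new_commit, "obsolete": [old_head]}
        # Step 3: move the change forward.
        changes[name] = meta_id
    return names


changes = {"foo": "A", "bar": "B", "baz": "C"}
metas = {}
moved = record_modification("B", "D", changes, metas, new_change_name="unused")
```

After this call changes["bar"] names a meta-commit with content D and obsolete parent B, while foo and baz are untouched, which matches the Dmeta state shown in the Commit section.
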
> +
> +Copy commands
> +-------------
> +Copy commands (cherry-pick, merge --squash) create a new meta-commit that
> +references the old commits as origin parents. Besides the fact that the new
> +parents are tagged differently, copy commands work the same way as modify
> +commands.
> +
> +Create commands
> +---------------
> +Creation commands (commit, merge) create a new commit and a new change that
> +points to that commit. They do not create any meta-commits.
> +
> +Import commands
> +---------------
> +Import commands (fetch, pull) do not create any new meta-commits or changes
> +unless that is specifically what they are importing. For example, the fetch
> +command would update remotechange/origin/change35 and fetch all referenced
> +meta-commits if asked to do so directly, but it wouldn’t create any changes or
> +meta-commits for commits discovered on the master branch when running “git fetch
> +origin master”.
> +
> +Other commands
> +--------------
> +Some commands don’t fit cleanly into one of the above categories.
> +
> +Semantically, filter-branch should be treated as a modify command, but doing so
> +is likely to create a lot of irrelevant clutter in the changes namespace and the
> +large number of extra change refs may introduce performance problems. We
> +recommend treating filter-branch as an import command initially, but making it
> +behave more like a modify command in future follow-up work. One possible
> +solution may be to treat commits that are part of existing changes as being
> +modified but to avoid creating changes for other rewritten changes.
> +
> +Once the evolve command can handle obsolescence across cherry-picks, such
> +cherry-picks will result in a hybrid move-and-copy operation. It will create
> +cherry-picks that replace other cherry-picks, which will have both origin edges
> +(pointing to the new source commit being picked) and obsolete edges (pointing to
> +the previous cherry-pick being replaced).
> +
> +Evolve
> +------
> +The evolve command performs the correct sequence of rebases such that no change
> +has an obsolete parent. The syntax looks like this:
> +
> +git evolve [--abort][--continue][--quit] [upstream…]
> +
> +It takes an optional list of upstream branches. All changes whose parent shows
> +up in the history of one of the upstream branches will be rebased onto the
> +upstream branch before resolving obsolete parents.
> +
> +Any change whose latest state is found in an upstream branch (or that ends up
> +empty after rebase) will be deleted. This is the normal mechanism for deleting
> +changes. Changes are created automatically on the first commit, and are deleted
> +automatically when evolve determines that they’ve been merged upstream.
> +
> +Orphan commits are commits with obsolete parents. The evolve command then
> +repeatedly rebases orphan commits with non-orphan parents until there are either
> +no orphan commits left, a merge conflict is discovered, or a divergent parent is
> +discovered.
> +
> +The --abort option returns all changes to the state they were in prior to
> +invoking evolve, and the --quit option terminates the current evolution without
> +changing the current state.
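
One way to read the orphan loop, sketched in Python. Several things here are assumptions rather than part of the proposal: commits have a single parent, obsolescence is flattened into a replaced_by map (so divergence is assumed to be resolved already), merge conflicts are not modelled, and an "orphan" is taken to mean a non-obsolete commit whose parent is obsolete.

```python
def latest(commit, replaced_by):
    # Follow replacements until reaching a commit that is not obsolete.
    while commit in replaced_by:
        commit = replaced_by[commit]
    return commit


def evolve(parents, replaced_by):
    """Rebase orphans until none remain; returns (old, new, onto) tuples."""
    def is_orphan(commit):
        return commit not in replaced_by and parents.get(commit) in replaced_by

    log = []
    while True:
        # Only rebase orphans whose new base is not itself an orphan.
        ready = [c for c in parents
                 if is_orphan(c) and not is_orphan(latest(parents[c], replaced_by))]
        if not ready:
            return log
        for old in ready:
            onto = latest(parents[old], replaced_by)
            new = old + "'"
            while new in parents:
                new += "'"
            parents[new] = onto      # the rebased copy
            replaced_by[old] = new   # the rebase obsoletes the original
            log.append((old, new, onto))


# A <- B <- C, where A has been amended to A2 (as in the usage example).
parents = {"A": None, "B": "A", "C": "B", "A2": None}
replaced_by = {"A": "A2"}
steps = evolve(parents, replaced_by)
```

For this input the sketch rebases B onto A2 first and only then C onto the rebased B, the same order as the "rebasing ... onto ..." output in the usage example.
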
> +
> +Checkout
> +--------
> +Running checkout on a change by name has the same effect as checking out a
> +detached head pointing to the latest commit on that change-branch. There is no
> +need to ever have HEAD point to a change since changes always move forward when
> +necessary, no matter what branch the user has checked out.
> +
> +Meta-commits themselves cannot be checked out by their hash.
> +
> +Reset
> +-----
> +Resetting a branch to a change by name is the same as resetting to the commit at
> +that change’s head.
> +
> +Commit
> +------
> +Commit --amend gets modify semantics and will move existing changes forward. The
> +normal form of commit gets create semantics and will create a new change.
> +
> +$ touch foo && git add . && git commit -m "foo" && git tag A
> +$ touch bar && git add . && git commit -m "bar" && git tag B
> +$ touch baz && git add . && git commit -m "baz" && git tag C
> +
> +This produces the following commits:
> +A(tree=[foo])
> +B(tree=[foo, bar], parent=A)
> +C(tree=[foo, bar, baz], parent=B)
> +
> +...along with three changes:
> +change/foo = A
> +change/bar = B
> +change/baz = C
> +
> +Running commit --amend does the following:
> +$ git checkout B
> +$ touch zoom && git add . && git commit --amend -m "bar and zoom"
> +$ git tag D
> +
> +Commits:
> +A(tree=[foo])
> +B(tree=[foo, bar], parent=A)
> +C(tree=[foo, bar, baz], parent=B)
> +D(tree=[foo, bar, zoom], parent=A)
> +Dmeta(content=D, obsolete=B)
> +
> +Changes:
> +change/foo = A
> +change/bar = Dmeta
> +change/baz = C
> +
> +Merge
> +-----
> +Merge gets create, modify, or copy semantics based on what is being merged and
> +the options being used.
> +
> +The --squash version of merge gets copy semantics (it produces a new change that
> +is marked as a copy of all the original changes that were squashed into it).
> +
> +The “modify” version of merge replaces both of the original commits with the
> +resulting merge commit. This is one of the standard mechanisms for resolving
> +divergence. The parents of the merge commit are the parents of the two commits
> +being merged. The resulting commit will not be a merge commit if both of the
> +original commits had the same parent or if one was the parent of the other.
> +
> +The “create” version of merge creates a new change pointing to a merge commit
> +that has both original commits as parents. The result is what merge produces now
> +- a new merge commit. However, this version of merge doesn’t directly resolve
> +divergence.
> +
> +To select between these two behaviors, merge gets new “--amend” and “--noamend”
> +options which select between the “modify” and “create” behaviors respectively,
> +with noamend being the default.
> +
> +For example, imagine we created two divergent changes like this:
> +
> +$ touch foo && git add . && git commit -m "foo" && git tag A
> +$ touch bar && git add . && git commit -m "bar" && git tag B
> +$ touch baz && git add . && git commit --amend -m "bar and baz"
> +$ git tag C
> +$ git checkout B
> +$ touch bam && git add . && git commit --amend -m "bar and bam"
> +$ git tag D
> +
> +At this point the commit graph looks like this:
> +
> +A(tree=[foo])
> +B(tree=[bar], parent=A)
> +C(tree=[bar, baz], parent=A)
> +D(tree=[bar, bam], parent=A)
> +Cmeta(content=C, obsoletes=B)
> +Dmeta(content=D, obsoletes=B)
> +
> +There would be three active changes with heads pointing as follows:
> +
> +change/changeA=A
> +change/changeB=Cmeta
> +change/changeB2=Dmeta
> +
> +ChangeB and changeB2 are divergent at this point. Let’s consider what happens if
> +we perform each type of merge between changeB and changeB2.
> +
> +Merge example: Amend merge
> +One way to resolve divergent changes is to use an amend merge. Recall that HEAD
> +is currently pointing to D at this point.
> +
> +$ git merge --amend change/changeB
> +
> +Here we’ve asked for an amend merge since we’re trying to resolve divergence
> +between two versions of the same change. There are no conflicts so we end up
> +with this:
> +
> +E(tree=[bar, baz, bam], parent=A)
> +Emeta(content=E, obsoletes=[Cmeta, Dmeta])
> +
> +With the following branches:
> +
> +change/changeA=A
> +change/changeB=Emeta
> +change/changeB2=Emeta
> +
> +Notice that the result of the “amend merge” is a replacement for C and D rather
> +than a new commit with C and D as parents (as a normal merge would have
> +produced). The parents of the amend merge are the parents of C and D which - in
> +this case - is just A, so the result is not a merge commit. Also notice that
> +changeB and changeB2 are now aliases for the same change.
> +
> +Merge example: Noamend merge
> +Consider what would have happened if we’d used a noamend merge instead. Recall
> +that HEAD was at D and our branches looked like this:
> +
> +change/changeA=A
> +change/changeB=Cmeta
> +change/changeB2=Dmeta
> +
> +$ git merge --noamend change/changeB
> +
> +That would produce the sort of merge we’d normally expect today:
> +
> +F(tree=[bar, baz, bam], parent=[C, D])
> +
> +And our changes would look like this:
> +change/changeA=A
> +change/changeB=Cmeta
> +change/changeB2=Dmeta
> +change/changeF=F
> +
> +In this case, changeB and changeB2 are still divergent and we’ve created a new
> +change for our merge commit. However, this is just a temporary state. The next
> +time we run the “evolve” command, it will discover the divergence but also
> +discover the merge commit F that resolves it. Evolve will suggest converting F
> +into an amend merge in order to resolve the divergence and will display the
> +command for doing so.
> +
> +Change
> +------
> +The “change” command can be used to list, rename, reset or delete changes. It
> +takes arguments similar to the “branch” command.
> +
> +The -l argument lists all local changes that aren’t present in the given branch.
> +If the branch name is omitted, all local changes are listed.
> +
> +The -r argument lists all remote changes.
> +
> +The -m argument renames a change, given its old and new name.
> +
> +The -d argument deletes a change. This is one way to resolve divergence.
> +
> +The -n argument renames the current change, or creates a change of the given
> +name for the current commit if no such change exists yet. If given an optional
> +commit hash, the change is created for that commit rather than head. If there
> +are multiple local changes for the same commit and they are all aliases for the
> +same metacommit hash, they are all deleted except the newly-created name. If
> +given the name of a metacommit, the new change points to that metacommit.
> +
> +The --purge argument deletes all obsolete changes and all changes that are
> +present in the given branch. Note that such changes can be recovered from the
> +reflog.
> +
> +Combined with the GC protection that is offered, this is intended to facilitate
> +a workflow that relies on changes instead of branches. Users could choose to
> +work with no local branches and use changes instead - both for mailing list and
> +gerrit workflows.
> +
> +Log
> +---
> +When a commit is shown in git log that is part of a change, it is decorated with
> +extra change information. If it is the head of a change, the name of the change
> +is shown next to the list of branches. If it is obsolete, it is decorated with
> +the word “obsolete”.
> +
> +Obslog
> +------
> +The obslog command lists the change history for the current commit.
> +
> +Rebase
> +------

I think it would make sense to have this next to the sections on commit 
--amend and merge. I was wondering about rebase when I was reading 
those sections.

Best wishes

Phillip

> +In general the rebase command is treated as a modify command. When a change is
> +rebased, the new commit replaces the original.
> +
> +Rebase --abort is special. Its intent is to restore git to the state it had
> +prior to running rebase. It should move back any changes to point to the refs
> +they had prior to running rebase and delete any new changes that were created as
> +part of the rebase. To achieve this, rebase will save the state of all changes
> +in refs/metas prior to running rebase and will restore the entire namespace
> +when the rebase is aborted (deleting any newly-created changes). Newly-created
> +metacommits are left in place, but will have no effect until garbage collected
> +since metacommits are only used if they are reachable from refs/metas.
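
The save-and-restore behaviour described for rebase --abort amounts to a snapshot of the refs/metas namespace. A minimal Python sketch, with the namespace modelled as a dict and RebaseAborted and rebase_transaction as made-up names:

```python
import contextlib


class RebaseAborted(Exception):
    """Stands in for the user running rebase --abort."""


@contextlib.contextmanager
def rebase_transaction(changes):
    saved = dict(changes)  # state of refs/metas before the rebase starts
    try:
        yield changes
    except RebaseAborted:
        # Restore the whole namespace; this also drops new changes.
        changes.clear()
        changes.update(saved)


refs = {"bar": "B", "baz": "C"}
metas = {}
with rebase_transaction(refs):
    metas["Bmeta"] = {"content": "B2", "obsolete": ["B"]}
    refs["bar"] = "Bmeta"
    refs["extra"] = "X"
    raise RebaseAborted()
```

After the abort, refs is back to its original two entries while the Bmeta entry is still present in metas. It is simply unreachable from any change, which mirrors the note above about newly-created metacommits being left in place.
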
> +
> +Other options considered
> +========================
> +We considered several other options for storing the obsolescence graph. This
> +section describes the other options and why they were rejected.
> +
> +Commit header
> +-------------
> +Add an “obsoletes” field to the commit header that points backwards from a
> +commit to the previous commits it obsoletes.
> +
> +Pros:
> +- Very simple
> +- Easy to traverse from a commit to the previous commits it obsoletes.
> +Cons:
> +- Adds a cost to the storage format, even for commits where the change history
> +  is uninteresting.
> +- Unconditionally prevents the change history from being garbage collected.
> +- Always causes the change history to be shared when pushing or pulling changes.
> +
> +Git notes
> +---------
> +Instead of storing obsolescence information in metacommits, the metacommit
> +content could go in a new notes namespace - say refs/notes/metacommit. Each note
> +would contain the list of obsolete and origin parents, and an automerger could
> +be supplied to make it easy to merge the metacommit notes from different remotes.
> +
> +Pros:
> +- Easy to locate all commits obsoleted by a given commit (since there would only
> +  be one metacommit for any given commit).
> +Cons:
> +- Wrong GC behavior (obsolete commits wouldn’t automatically be retained by GC)
> +  unless we introduced a special case for these kinds of notes.
> +- No way to selectively share or pull the metacommits for one specific change.
> +  It would be all-or-nothing, which would be expensive. This could be addressed
> +  by changes to the protocol, but this would be invasive.
> +- Requires custom auto-merging behavior on fetch.
> +
> +Tags
> +----
> +Put the content of the metacommit in a message attached to a tag on the
> +replacement commit. This is very similar to the git notes approach and has the
> +same pros and cons.
> +
> +Simple forward references
> +-------------------------
> +Record an edge from an obsolete commit to its replacement in this form:
> +
> +refs/obsoletes/<A>
> +
> +pointing to commit <B> as an indication that B is the replacement for the
> +obsolete commit A.
> +
> +Pros:
> +- Protects <B> from being garbage collected.
> +- Fast lookup for the evolve operation, without additional search structures
> +  (“what is the replacement for <A>?” is very fast).
> +
> +Cons:
> +- Can’t represent divergence (which is a P0 requirement).
> +- Creates lots of refs (which can be inefficient)
> +- Doesn’t provide a way to fetch only refs for a specific change.
> +- The obslog command requires a search of all refs.
> +
> +Complex forward references
> +--------------------------
> +Record an edge from an obsolete commit to its replacement in this form:
> +
> +refs/obsoletes/<change_id>/obs<A>_<B>
> +
> +Pointing to commit <B> as an indication that B is the replacement for obsolete
> +commit A.
> +
> +Pros:
> +- Permits sharing and fetching refs for only a specific change.
> +- Supports divergence
> +- Protects <B> from being garbage collected.
> +
> +Cons:
> +- Creates lots of refs, which is inefficient.
> +- Doesn’t provide a good lookup structure for lookups in either direction.
> +
> +Backward references
> +-------------------
> +Record an edge from a replacement commit to the obsolete one in this form:
> +
> +refs/obsolescences/<B>
> +
> +Cons:
> +- Doesn’t provide a way to resolve divergence (which is a P0 requirement).
> +- Doesn’t protect <B> from being garbage collected (which could be fixed by
> +  combining this with a refs/metas namespace, as in the metacommit variant).
> +
> +Obsolescences file
> +------------------
> +Create a custom file (or files) in .git recording obsolescences.
> +
> +Pros:
> +- Can store exactly the information we want with exactly the performance we want
> +  for all operations. For example, there could be a disk-based hashtable
> +  permitting constant time lookups in either direction.
> +
> +Cons:
> +- Handling GC, pushing, and pulling would all require custom solutions. GC
> +  issues could be addressed with a repository format extension.
> +
> +Squash points
> +-------------
> +We create and update change branches in refs/metas at the same time we
> +would in the metacommit proposal. However, rather than pointing to a metacommit
> +branch they point to normal commits and are treated as “squash points” - markers
> +for sequences of commits intended to be squashed together on submission.
> +
> +Amends and rebases work differently than they do now. Rather than actually
> +containing the desired state of a commit, they contain a delta from the previous
> +version along with a squash point indicating that the preceding changes are
> +intended to be squashed on submission. Specifically, amends would become new
> +changes and rebases would become merge commits with the old commit and new
> +parent as parents.
> +
> +When the changes are finally submitted, the squashes are executed, producing the
> +final version of the commit.
> +
> +In addition to the squash points, git would maintain a set of “nosquash” tags
> +for commits that were used as ancestors of a change that are not meant to be
> +included in the squash.
> +
> +For example, if we have this commit graph:
> +
> +A(...)
> +B(parent=A)
> +C(parent=B)
> +
> +...and we amend B to produce D, we’d get:
> +
> +A(...)
> +B(parent=A)
> +C(parent=B)
> +D(parent=B)
> +
> +...along with a new change branch indicating D should be squashed with its
> +parents when submitted:
> +
> +change/changeB = D
> +change/changeC = C
> +
> +We’d also create a nosquash tag for A indicating that A shouldn’t be included
> +when changeB is squashed.
> +
> +If a user amends the change again, they’d get:
> +
> +A(...)
> +B(parent=A)
> +C(parent=B)
> +D(parent=B)
> +E(parent=D)
> +
> +change/changeB = E
> +change/changeC = C
> +
> +Pros:
> +- Good GC behavior.
> +- Provides a natural way to share changes (they’re just normal branches).
> +- Merge-base works automatically without special cases.
> +- Rewriting the obslog would be easy using existing git commands.
> +- No new data types needed.
> +
> +Cons:
> +- No way to connect the squashed version of a change to the original, so no way
> +  to automatically clean up old changes. This also means users lose all benefits
> +  of the evolve command if they prematurely squash their commits. This may occur
> +  if a user thinks a change is ready for submission, squashes it, and then later
> +  discovers an additional change to make.
> +- Histories would look very cluttered (users would see all previous edits to
> +  their commit in the commit log, and all previous rebases would show up as
> +  merges). Could be quite hard for users to tell what is going on. (Possible
> +  fix: also implement a new smart log feature that displays the log as though
> +  the squashes had occurred).
> +- Need to change the current behavior of current commands (like amend and
> +  rebase) in ways that will be unexpected to many users.
>
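To make the squash-point example above concrete, here is a minimal sketch of how the set of commits to squash for changeB might be computed. It assumes a change is squashed by walking from its head towards the root and stopping at any commit carrying a nosquash tag. The function name and the dict-based graph are illustrative only and are not part of the proposal.

```python
# Hypothetical model of the "squash points" example; not git's implementation.
# Commit graph after the second amend: E is the head of changeB.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "E": ["D"],
}
# The text says a nosquash tag is created for A.
nosquash = {"A"}


def commits_to_squash(head, parents, nosquash):
    """Collect head and its ancestors, stopping at nosquash-tagged commits."""
    result = []
    seen = set()
    stack = [head]
    while stack:
        commit = stack.pop()
        if commit in seen or commit in nosquash:
            continue
        seen.add(commit)
        result.append(commit)
        stack.extend(parents[commit])
    return result


print(commits_to_squash("E", parents, nosquash))
```

Under these assumptions changeB squashes E, D and B, and A is left out. The text does not say how the boundary for changeC would be marked, so the sketch does not try to model it.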

Phillip Wood Nov. 20, 2018, 12:59 p.m. UTC | #25

On 20/11/2018 12:18, Phillip Wood wrote:
> On 15/11/2018 00:55, sxenos@google.com wrote:
>> From: Stefan Xenos <sxenos@google.com>
>> +Divergence
>> +----------
>> +From the user’s perspective, two changes are divergent if they both ask for
>> +different replacements to the same commit. More precisely, a target commit is
>> +considered divergent if there is more than one commit at the head of a change in
>> +refs/metas that leads to the target commit via an unbroken chain of “obsolete”
>> +edges.
>> +
>> +Much like a merge conflict, divergence is a situation that requires user
>> +intervention to resolve. The evolve command will stop when it encounters
>> +divergence and prompt the user to resolve the problem. Users can solve the
>> +problem in several ways:
>> +
>> +- Discard one of the changes (by deleting its change branch).
>> +- Merge the two changes (producing a single change branch).
> 
> I assume this won't create merge commits for the actual commits though,
> just merge the meta branches and create some new commits that are each
> the result of something like 'merge-recursive original-commit
> our-new-version their-new-version'

That should have been

merge-recursive original-commit^ -- our-new-version their-new-version

Best Wishes

Phillip

Phillip Wood Nov. 20, 2018, 1:03 p.m. UTC | #26

On 15/11/2018 00:55, sxenos@google.com wrote:
> From: Stefan Xenos <sxenos@google.com>
>
> +Obsolescence across cherry-picks
> +--------------------------------
> +By default the evolve command will treat cherry-picks and squash merges as being
> +completely separate from the original. Further amendments to the original commit
> +will have no effect on the cherry-picked copy. However, this behavior may not be
> +desirable in all circumstances.
> +
> +The evolve command may at some point support an option to look for cases where
> +the source of a cherry-pick or squash merge has itself been amended, and
> +automatically apply that same change to the cherry-picked copy. In such cases,
> +it would traverse origin edges rather than ignoring them, and would treat a
> +commit with origin edges as being obsolete if any of its origins were obsolete.

If a merge has been cherry-picked we cannot update it as we don't record 
which parent was used for the pick, however it is probably not a problem 
in practice - I think it is unusual to amend merges.
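The optional rule in the quoted paragraph (a commit with origin edges is obsolete if any of its origins is obsolete) could be modelled roughly as below. This is only a sketch: the `replaced` set, the `origins` mapping and the `follow_origins` flag are names made up for illustration, standing in for whatever the evolve command would read from the metacommits.

```python
def is_obsolete(commit, replaced, origins, follow_origins=False):
    """Return True if commit should be treated as obsolete.

    replaced: set of commits that already have a replacement.
    origins:  maps a cherry-picked commit to the commits it was picked from.
    follow_origins: hypothetical switch for the optional behaviour.
    """
    if commit in replaced:
        return True
    if not follow_origins:
        # Default behaviour: origin edges are ignored.
        return False
    return any(
        is_obsolete(origin, replaced, origins, True)
        for origin in origins.get(commit, ())
    )


replaced = {"P"}              # P was amended, so it has a replacement
origins = {"P_pick": ["P"]}   # P_pick was cherry-picked from P

print(is_obsolete("P_pick", replaced, origins))
print(is_obsolete("P_pick", replaced, origins, follow_origins=True))
```

With the default the cherry-picked copy is left alone. With the option enabled it is treated as obsolete because its origin P has been replaced.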

Best Wishes

Phillip

Stefan Xenos Nov. 20, 2018, 5:27 p.m. UTC | #27

> I was trying to see if this is something we can leave out to limit the initial scope.

Oh, in that case, "yes". :-) If there's a need to cut something,
origin parents would be a viable candidate.

I was thinking that this file could document the final goal so that if
anyone else wanted to contribute to the implementation, we would be
heading in the same direction. It seems reasonable that an early
implementation may omit origin parents. Since the actual
implementation will lag behind the spec, I'll add a status section to
the top of the document where we can describe the delta between plan
and implementation.

Also, I'm now convinced we're talking about the same thing. :-)

> > Are you claiming that this is undesirable, or are you claiming that
> > this could be accomplished without origin parents?
>
> I was trying to see if this is something we can leave out to limit
> the initial scope.

Stefan Xenos Nov. 20, 2018, 5:45 p.m. UTC | #28

This sounds like a proposal for general namespacing. I like it - that
would pave the way for other header extensions - but that should
probably be the subject of a separate proposal (who owns the content
of a namespace, what is the process for adding a new namespace or a
new attribute within a namespace, what order should the header
attributes appear in, what problem is namespacing there to solve, when
do we use a namespaced attribute versus a "reserved" attribute, etc.).

x-evolve-pt seems reasonable to me. If you're keen on this and want to
document the namespacing proposal, I'll conform to it. However, if
we don't have formal rules for namespaces in place yet, it might be better
to avoid the use of an x- prefix for now, just in case I accidentally
squat on a name that breaks whatever namespacing rules we eventually
come up with.

Since we're talking bytes, a more compact representation of
parent-type could use single-letter codes:
x-evolve-pt c r o
(where c=content, r=replace/obsolete, o=origin)
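As a rough illustration of how a reader of such a header might pair each parent with its type, here is a small sketch. It assumes the codes appear on a single x-evolve-pt line in the same order as the parent lines. The parser itself and the abbreviated header (author and committer lines omitted) are illustrative only; the hashes are the ones from the example quoted below.

```python
# Single-letter codes as suggested above; "r" stands for replace/obsolete.
PARENT_TYPE_CODES = {"c": "content", "r": "obsolete", "o": "origin"}


def parse_parent_types(header_lines):
    """Pair each parent hash with its type, or return None for a normal commit."""
    parents = []
    codes = None
    for line in header_lines:
        key, _, value = line.partition(" ")
        if key == "parent":
            parents.append(value)
        elif key == "x-evolve-pt":
            codes = value.split()
    if codes is None:
        return None  # no parent-type field: not a metacommit
    if len(codes) != len(parents):
        raise ValueError("x-evolve-pt must list one code per parent")
    return [(parent, PARENT_TYPE_CODES[code]) for parent, code in zip(parents, codes)]


header = [
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904",
    "parent aa7ce55545bf2c14bef48db91af1a74e2347539a",
    "parent d64309ee51d0af12723b6cb027fc9f195b15a5e9",
    "parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136",
    "x-evolve-pt c r o",
]
for parent, kind in parse_parent_types(header):
    print(parent[:7], kind)
```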

  - Stefan
On Tue, Nov 20, 2018 at 1:43 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Tue, Nov 20 2018, Jonathan Nieder wrote:
>
> > Ævar Arnfjörð Bjarmason wrote:
> >> On Thu, Nov 15 2018, sxenos@google.com wrote:
> >
> >>> +Parent-type
> >>> +-----------
> >>> +The “parent-type” field in the commit header identifies a commit as a
> >>> +meta-commit and indicates the meaning for each of its parents. It is never
> >>> +present for normal commits.
> > [...]
> >> I think it's worth pointing out for those that are rusty on commit
> >> object details (but I checked) is that the reason for it not being:
> >>
> >>     tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> >>     parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> >>     parent-type content
> >>     parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> >>     parent-type obsolete
> >>     parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> >>     parent-type origin
> >>     author Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> >>     committer Stefan Xenos <sxenos@gmail.com> 1540841596 -0700
> >>
> >> Which would be easier to read, is that we're very sensitive to the order
> >> of the first few fields (tree -> parent -> author -> committer) and fsck
> >> will error out if we interjected a new field.
> >
> > By the way, in the spirit of limiting the initial scope, I wonder
> > whether the parent-type fields can be stored in the commit message
> > initially.
> >
> > Elsewhere in this thread it was mentioned that the parent-type is a
> > field to allow tools like "git fsck" to understand the meaning of
> > these parent relationships (for example, to forbid a commit
> > referencing a meta-commit).  The same could be done using special
> > commit message text, though.
> >
> > The advantage of such an approach would be that we could experiment
> > without changing the official object format at all.  If experiments
> > revealed a different set of information to store, we could update the
> > format without having to maintain the memory of the older format in
> > "git fsck"'s understanding of commit object fields.  So even though I
> > think that in the end we would want to put this information in the
> > commit object header, I'm tempted to suspect that the benefits of
> > putting it in the commit message to start outweigh the costs (in
> > particular, of having to migrate to another format later).
>
> I think it sounds better to just make it, in the header:
>
>     x-evolve-pt content
>     x-evolve-pt obsolete
>     x-evolve-pt origin
>
> Where "pt = parent-type", we could of course spell that out too, but in
> this case "x-evolve-pt" is the exact same number of bytes as
> "parent-type", so nobody can object that it takes more space:)
>
> We'd then carry some documentation where we say everything except "x-*-"
> is reserved, and that we'd like to know about new "*" there before it's
> used, so it can be documented.
>
> Putting it in the commit message just sounds like a hack around not
> having namespaced headers. If we'd like to keep this then tools would
> need to parse both (potentially unpacking a lot of the commit message
> object, it can be quite big in some cases...).

Stefan Xenos Nov. 20, 2018, 8:19 p.m. UTC | #29

> This explains why we have 'origin' fields in the meta commits, it might
> be worth putting a forward reference or note earlier on to explain why
> recording the origin is useful. (I didn't find gerrit needs it very
> convincing on its own but it is actually more general than gerrit's
> specific use case)

I'll add the forward reference.

TBH, gerrit is the main reason I added it - so I'm interested in why
you didn't find the gerrit use-case convincing. Can you elaborate? (If
there's some other way around the gerrit requirement, we might not
need the origin parents)

> Should this be meta/mychange:refs/for/master or have I missed something?

It should be metas/mychange/.... It's already fixed in the v2 patch.

I really wanted to use the namespace "changes", but gerrit is
squatting on that. I tried "change", but that breaks the plural naming
scheme and may get confused with gerrit's namespace, so I settled on
"metas".

> I think it would make sense to have this next to the sections on commit
> --amend and merge I was wondering what about rebase when I was reading
> those sections.

Will do.

> I'm a bit confused why it is creating a meta ref per commit rather than
> one for the current branch.

I tried to explain that later in the doc. Meta refs serve two purposes
- they act as stable names for changes (or at least the commits at the
head of each change) and they point to the metacommits that are
currently in use. For both purposes, we need a ref per commit. For the
"stable name" case, this should be obvious - something that just
points to a branch couldn't provide different names for each commit on
that branch. The metacommit case is less obvious - the set of
metacommits for one change aren't connected to the metacommits for any
other change. The "parents" of a metacommit are older versions of the
same change. They don't point to the metacommits from the parent
change. That means that there is no single ref we could create for a
branch that would reach all the necessary metacommits.
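A toy model of the reachability argument, in case it helps. It assumes each metacommit's metacommit parents are only older versions of the same change, and it ignores the content parents, which point at ordinary commits rather than metacommits. All names below are invented for the illustration.

```python
# Metacommit -> older metacommits of the same change (toy data).
metacommit_parents = {
    "metaB_v1": [],
    "metaB_v2": ["metaB_v1"],
    "metaC_v1": [],
    "metaC_v2": ["metaC_v1"],
}
refs = {
    "refs/metas/changeB": "metaB_v2",
    "refs/metas/changeC": "metaC_v2",
}


def reachable(start, parents):
    """Return every metacommit reachable from start, including start."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents[node])
    return seen


from_b = reachable(refs["refs/metas/changeB"], metacommit_parents)
from_c = reachable(refs["refs/metas/changeC"], metacommit_parents)
print(sorted(from_b))
print(sorted(from_c))
print(from_b.isdisjoint(from_c))
```

The two sets never overlap in this model, which is why one ref per change is needed to keep every metacommit reachable.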

> I got the impression they had put quite a lot of effort
> into having evolve automatically run and resolve divergences when
> pulling and rebasing, is there a long term plan for git to do the same?

IMO, we should add anything to the plan if doing so improves the
workflow of our users... but it sounds like you're referring to
mercurial features I've never used. Could you point me to specific
docs on the feature you want and/or make a concrete suggestion about
how it might work?

I never use pull so it slipped my mind. It would probably make sense
to have the option of doing an automatic evolve after pull (actually,
once the feature is stable, most users would probably want it to be
the default). How do you think it should be triggered? "git pull
--evolve"? or perhaps "git pull --rebase=evolve"? We should probably
also introduce a new "evolve" enum value to branch.<name>.rebase
config value. I'll use "--evolve" for now. It may make sense to add
"--evolve" to every git command that performs an automatic evolve when
done.

> What happens if the original commit is currently checked out with local
> changes?

For a start, I'll probably just display an error message if the
current working tree is dirty ("Please stash"). Long term, I'd like it
to work like rebase --autostash. It should stash your changes, do the
evolve, return to the evolved version of the original change, and
reapply the stash. I'll add this to the doc.

> Can I suggest using refs/remote/<remotename>/metas. I

Ooh! Great idea! I'll update the doc.

> I think this could be useful (although I guess you can get the branches
> you've been working on recently from HEAD's reflog quite easily).

The changes list is different from the reflog. It's a list of all your
unsubmitted patches - regardless of their age or what branch they're
on. They may not have corresponding branches: you may have been
working on them with a detached head, or there may be multiple changes
on the same branch. You might not have visited them recently, in which
case they wouldn't be in the reflog at all. You may have reset to an
older version of the change, in which case they'd be in the reflog but
the reflog and change point to different places. If you've used gerrit
before, the "changes" list will contain pretty much the same content
as the gerrit dashboard, except that it works locally.

>> +Much like a merge conflict, divergence is a situation that requires user
>> +intervention to resolve. The evolve command will stop when it encounters
>> +divergence and prompt the user to resolve the problem. Users can solve the
>> +problem in several ways:
>> +
>> +- Discard one of the changes (by deleting its change branch).
>> +- Merge the two changes (producing a single change branch).
>
> I assume this won't create merge commits for the actual commits though,
> just merge the meta branches and create some new commits that are each
> the result of something like 'merge-recursive original-commit
> our-new-version their-new-version'

It depends on which version of merge you use. I've proposed a new
"merge --amend" argument specifically for resolving divergence. It
avoids creating merge commits as long as there's only one parent
remaining after combining the parents of the commits being merged.
Basically, if the two things being merged are divergent commits, it
would resolve the divergence without creating a new merge commit...
but if the divergent commits had different parents or were themselves
merge commits, the result may still be a merge commit.
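Roughly, the parent rule for "merge --amend" could be sketched like this. It assumes "combining the parents" means taking the union of both parent lists with duplicates dropped; that reading, and the helper names, are assumptions made for the illustration.

```python
def combined_parents(ours_parents, theirs_parents):
    """Union of both parent lists, preserving order and dropping duplicates."""
    combined = []
    for parent in list(ours_parents) + list(theirs_parents):
        if parent not in combined:
            combined.append(parent)
    return combined


def result_is_merge_commit(ours_parents, theirs_parents):
    """A merge commit is only needed when more than one parent remains."""
    return len(combined_parents(ours_parents, theirs_parents)) > 1


# Two divergent amendments of the same commit, both on top of P.
print(combined_parents(["P"], ["P"]), result_is_merge_commit(["P"], ["P"]))
# Divergent versions that were rebased onto different parents.
print(combined_parents(["P"], ["Q"]), result_is_merge_commit(["P"], ["Q"]))
```

In the first case a single parent remains, so the divergence is resolved with an ordinary commit. In the second case two parents remain, so the result is still a merge commit.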

If you run the normal version of merge, it *would* create a merge
commit and leave the changes divergent. However, one of the
transformations on the evolve command will look for this situation and
resolve it. Specifically, if it encounters two divergent changes but
exactly one child change contains a merge that would resolve that
divergence, the transformation will merge all three changes, squash
them together, and make all three changes point to the result. I'm not
sure what to call this transformation, but it serves a useful purpose:
it allows users to use either form of merge to resolve the divergence.
If they use the "--amend" version of merge, no merge commit is created
and the divergence is resolved immediately. If they use the normal
version of merge, a merge commit is created (as it is now) and the
evolve command figures out later whether that merge was intended to
resolve divergence. This avoids putting any magic in the merge command
itself, avoids changing the existing behavior of the merge command,
and it means that most users won't need to learn about "merge --amend"
and can't paint themselves into a corner by accidentally
using the wrong kind of merge. Power users can disable this
transformation and resolve their divergence explicitly using --amend.
Novices can just use the defaults and things will probably work.

It can get more complex, though. If there are two or more child
changes containing merge commits that resolve divergence, this
transformation would happen separately for each one and the resulting
merges would themselves become divergent (since they are two
conflicting solutions to the same problem). This may happen if the
user unnecessarily resolved the same divergence multiple times with
different merge commits. At that point, one of several things would
happen. If after rebasing the merge, the result automerges to exactly
the same thing (which would happen if both merges were the result of
running the automerger on incremental versions of the same two
changes), the divergence would instantly resolve itself because the
two changes are aliases. Otherwise, this new divergence would be
treated like any other and evolve would eventually try to apply the
same algorithm recursively on the new divergent changes.
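For reference, the divergence test from the doc that all of these transformations rely on can be modelled as below. This is a simplification: it records "obsolete" edges directly between commits rather than on metacommits, and the ref names and helper functions are made up for the sketch.

```python
def reaches(head, target, obsoletes):
    """True if target is reachable from head via one or more obsolete edges."""
    stack = list(obsoletes.get(head, ()))
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == target:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(obsoletes.get(commit, ()))
    return False


def is_divergent(target, change_heads, obsoletes):
    """Divergent means more than one distinct change head replaces target."""
    heads = {
        head for head in change_heads.values()
        if reaches(head, target, obsoletes)
    }
    return len(heads) > 1


# newer commit -> commits it obsoletes
obsoletes = {"B1": ["B"], "B2": ["B1"], "B3": ["B"]}
change_heads = {"metas/ours": "B2", "metas/theirs": "B3"}

print(is_divergent("B", change_heads, obsoletes))
print(is_divergent("B1", change_heads, obsoletes))
```

Here B is divergent because both B2 and B3 claim to replace it, while B1 is not because only B2 leads back to it. Two changes that point at the same head commit count once, which matches the "aliases" case described above.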

I'll elaborate more on the supported transformations in the doc for
the evolve command.

Stefan Xenos Nov. 20, 2018, 8:24 p.m. UTC | #30

> If a merge has been cherry-picked we cannot update it as we don't record
> which parent was used for the pick, however it is probably not a problem
> in practice - I think it is unusual to amend merges.

I've read and reread that sentence several times and don't fully
understand it. Could you elaborate?

It sounds scary, though. With the evolve command, amending merges will
need to be supported. If you create a merge and then amend one of its
parent commits, the evolve command will need to rebase the merge and
point one or both parents to the replacement instead.

  - Stefan

Jonathan Nieder Nov. 20, 2018, 10:06 p.m. UTC | #31

Stefan Xenos wrote:
> On Tue, Nov 20, 2018 at 1:43 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:

>> I think it sounds better to just make it, in the header:
>>
>>     x-evolve-pt content
>>     x-evolve-pt obsolete
>>     x-evolve-pt origin
>>
>> Where "pt = parent-type", we could of course spell that out too, but in
>> this case "x-evolve-pt" is the exact same number of bytes as
>> "parent-type", so nobody can object that it takes more space:)
>>
>> We'd then carry some documentation where we say everything except "x-*-"
>> is reserved, and that we'd like to know about new "*" there before it's
>> used, so it can be documented.
[...]
>                                                      that should
> probably be the subject of a separate proposal (who owns the content
> of a namespace, what is the process for adding a new namespace or a
> new attribute within a namespace, what order should the header
> attributes appear in, what problem is namespacing there to solve, when
> do we use a namespaced attribute versus a "reserved" attribute, etc.).

Agreed.  There are reasons that I prefer not to go in this direction,
but regardless, it would be the subject of a separate thread if you want
to pursue it.

>> Putting it in the commit message just sounds like a hack around not
>> having namespaced headers. If we'd like to keep this then tools would
>> need to parse both (potentially unpacking a lot of the commit message
>> object, it can be quite big in some cases...).

On the contrary: putting it in the commit message is a way to
experiment with the workflow without changing the object format at
all.

I don't think we should underestimate the value of that ability.

I don't understand what you're referring to by parsing both.  Are you
saying that if the experiment proves successful, we wouldn't be able
to migrate completely to a new format?  That sounds worrying to me ---
I want the ability to experiment and to act on what we learn from an
experiment, including when it touches on formats.

Thanks,
Jonathan

Stefan Xenos Nov. 20, 2018, 11:45 p.m. UTC | #32

> putting it in the commit message is a way to
> experiment with the workflow without changing the object format

As long as we're talking about a temporary state of affairs for users
that have opted in, and we're explicit about the fact that future
versions of git won't understand the change graphs that are produced
during that temporary state of affairs, I'm fine with using the commit
message. We can move it to the header prior to enabling the feature by
default.

- Stefan




Jonathan Nieder Nov. 21, 2018, 1:33 a.m. UTC | #33

Stefan Xenos wrote:
> Jonathan Nieder wrote:

>> putting it in the commit message is a way to
>> experiment with the workflow without changing the object format
>
> As long as we're talking about a temporary state of affairs for users
> that have opted in, and we're explicit about the fact that future
> versions of git won't understand the change graphs that are produced
> during that temporary state of affairs, I'm fine with using the commit
> message. We can move it to the header prior to enabling the feature by
> default.

Yay!  I think that addresses both my and Ævar's concerns.  Also, if
you run into an issue that requires changing the object format
earlier, that's fine and we can handle the situation when it comes.

I don't have a strong opinion about whether this would go in the
design doc.  I suppose the doc could have an "implementation plan"
section describing temporary stopping points on the way to the final
result, but it's not necessary to include that.

Thanks for the quick and thoughtful replies.

Jonathan

Phillip Wood Nov. 21, 2018, 12:14 p.m. UTC | #34

Hi Stefan

On 20/11/2018 20:24, Stefan Xenos wrote:
>> If a merge has been cherry-picked we cannot update it as we don't record
>> which parent was used for the pick, however it is probably not a problem
>> in practice - I think it is unusual to amend merges.
> 
> I've read and reread that sentence several times and don't fully
> understand it. Could you elaborate?

Sorry if I wasn't very clear. To cherry-pick (or revert) a merge commit
one has to specify a parent of the commit being picked with -m for
cherry-pick to use as the merge base for the three way merge that
creates the new commit. If the original merge is updated then evolve
won't know which parent to use as the merge base when evolving the
cherry-picked version of the merge as the parent is not recorded in the
meta commit.
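To illustrate the gap, here is a sketch of what an origin record would need in order to re-pick a merge. The `mainline` field is hypothetical: it stands for the parent number given to cherry-pick with -m, which the proposed metacommit does not store. The class and function names are likewise invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OriginRecord:
    """What evolve would know about a cherry-picked commit (toy model)."""
    picked_from: str
    # Hypothetical field: the -m parent number used for the pick.
    # The proposed metacommit format has no place for this.
    mainline: Optional[int] = None


def can_evolve_pick(origin, origin_parent_count):
    """A picked merge can only be redone if the mainline parent is known."""
    if origin_parent_count <= 1:
        return True  # ordinary commit: the single parent is the base
    return origin.mainline is not None


print(can_evolve_pick(OriginRecord("abc123"), origin_parent_count=1))
print(can_evolve_pick(OriginRecord("abc123"), origin_parent_count=2))
print(can_evolve_pick(OriginRecord("abc123", mainline=1), origin_parent_count=2))
```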

> It sounds scary, though. With the evolve command, amending merges will
> need to be supported.

Evolving a merge should be fine, I was just referring to merges that
have been cherry-picked.


Best Wishes

Phillip

(Thanks for your reply to my other message, I'm still digesting it at
the moment, once I've done that and found the references to mercurial
using commit obsolescence information in rebase and pull I'll reply.)


Stefan Xenos Nov. 21, 2018, 7:10 p.m. UTC | #35

> I don't have a strong opinion about whether this would go in the
> design doc.  I suppose the doc could have an "implementation plan"
> section describing temporary stopping points on the way to the final
> result, but it's not necessary to include that.

As long as this is something I'm just doing for fun and nobody needs
to coordinate anything with me, I was planning to just document the
endpoint and then work on whatever seems interesting at any given
moment. Of course, if I found a job/team that would let me do this as
my day job, I'd be more willing to commit to deliverables.

  - Stefan

Phillip Wood Jan. 15, 2019, 11:16 a.m. UTC | #36

Hi Stefan

Sorry for the delay in replying to your message, Christmas and the New
Year got in the way.

On 20/11/2018 20:19, Stefan Xenos wrote:
>> This explains why we have 'origin' fields in the meta commits, it might
>> be worth putting a forward reference or note earlier on to explain why
>> recording the origin is useful. (I didn't find gerrit needs it very
>> convincing on its own but it is actually more general than gerrit's
>> specific use case)
> 
> I'll add the forward reference.
> 
> TBH, gerrit is the main reason I added it - so I'm interested in why
> you didn't find the gerrit use-case convincing. Can you elaborate? (If
> there's some other way around the gerrit requirement, we might not
> need the origin parents)

As someone who does not use gerrit, it felt that the origin was just
being added to enable gerrit to remove change-id tags, it wasn't
immediately clear what the benefit to git was. I think being able to
evolve cherry-picks is a useful feature in its own right.

> 
>> Should this be meta/mychange:refs/for/master or have I missed something?
> 
> It should be metas/mychange/.... It's already fixed in the v2 patch.
> 
> I really wanted to use the namespace "changes", but gerrit is
> squatting on that. I tried "change", but that breaks the plural naming
> scheme and may get confused with gerrit's namespace, so I settled on
> "metas".

It's a shame "changes" is taken; that would have been a good match.

>> I think it would make sense to have this next to the sections on commit
>> --amend and merge I was wondering what about rebase when I was reading
>> those sections.
> 
> Will do.
> 
>> I'm a bit confused why it is creating a meta ref per commit rather than
>> one for the current branch.
> 
> I tried to explain that later in the doc. meta refs serve two purposes
> - they act as stable names for changes (or at least the commits at the
> head of each change)

I think this is interesting. One thing I've thought might be useful, but
never got round to experimenting with, is names for commits so that it is
easy to refer to them when creating fixups or rebasing.

> and they point to the metacommits that are
> currently in use. For both purposes, we need a ref per commit. For the
> "stable name" case, this should be obvious - something that just
> points to a branch couldn't provide different names for each commit on
> that branch. The metacommit case is less obvious - the set of
> metacommits for one change aren't connected to the metacommits for any
> other change. The "parents" of a metacommit are older versions of the
> same change. They don't point to the metacommits from the parent
> change. That means that there is no single ref we could create for a
> branch that would reach all the necessary metacommits.

Thanks for the explanation. So to push the evolution of a branch we have
to push a bunch of refs rather than just one, and branches with many
commits will create many refs. I do wonder how well this scales in the
absence of something like reftable[1]. If the metacommits are only kept
around for things that are being actively worked on then hopefully there
shouldn't be any problems for individual developers. I'm not sure about
servers that are used by lots of different groups of developers, all with
their own metacommits, though. Is there a namespacing problem there if
two developers pushing to the same remote have commits with the same
names? (When I was thinking about the named commits above I had
envisaged them being prefixed with a branch name and, for ease of use, an
unqualified name would resolve to that name on the current branch.)

[1]
https://public-inbox.org/git/CAP8UFD0PPZSjBnxCA7ez91vBuatcHKQ+JUWvTD1iHcXzPBjPBg@mail.gmail.com/
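
To make the "bunch of refs" point concrete, here is a minimal sketch using
only existing git commands. It assumes the per-change refs live under
refs/metas/ (the namespace name comes from the discussion above; the exact
layout is an assumption), and it uses ordinary commits as stand-ins for
metacommits, since the metacommit format is not implemented. The ref names
change1..change3 are hypothetical.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/server.git"
git init -q "$work/client"
cd "$work/client"
git config user.name "Example"
git config user.email "example@example.com"
git remote add origin "$work/server.git"

# Three commits on one branch, each given its own ref under refs/metas/
# (assumed layout; ordinary commits stand in for metacommits).
for n in 1 2 3; do
    echo "change $n" > "file$n.txt"
    git add "file$n.txt"
    git commit -q -m "change $n"
    git update-ref "refs/metas/change$n" HEAD
done

# A single wildcard refspec pushes every ref in the namespace, but the
# remote still ends up storing one ref per change.
git push -q origin 'refs/metas/*:refs/metas/*'

pushed=$(git --git-dir="$work/server.git" for-each-ref refs/metas | wc -l | tr -d ' ')
echo "refs under refs/metas/ on the remote: $pushed"
```

The refspec keeps the command line short, but the number of refs on the
server still grows with the number of changes, which is the scaling
question. The destination side of a refspec can be remapped (for example
to a per-developer prefix), which is one possible way to avoid name
clashes; whether the design would do that is not settled here.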

>> I got the impression they had put quite a lot of effort
>> into having evolve automatically run and resolve divergences when
>> pulling and rebasing, is there a long term plan for git to do the same?
> 
> IMO, we should add anything to the plan if doing so improves the
> workflow of our users... but it sounds like you're referring to
> mercurial features I've never used. Could you point me to specific
> docs on the feature you want and/or make a concrete suggestion about
> how it might work?

There are some comments on the wiki page[2]. There is a link at the
bottom to some more info for rebase, but I only glanced at that.
Unfortunately the talk I watched, which is linked to from another wiki
page[3], seems to have been taken down.

I think rebase automatically dropping commits that have been dropped
upstream when pulling and rebasing could be useful - I think your latest
draft[4] included support for that with the 'abandoned' parent-type.

[2]
https://www.mercurial-scm.org/wiki/ChangesetEvolutionDevel#Using_Obsolescence_Marker_during_Rebase

[3] https://www.mercurial-scm.org/wiki/ChangesetEvolution

[4] https://public-inbox.org/git/20181218164612.233602-1-sxenos@google.com/

> I never use pull so it slipped my mind. It would probably make sense
> to have the option of doing an automatic evolve after pull (actually,
> once the feature is stable, most users would probably want it to be
> the default). 

I think making it the default in the long term is a good goal. In the
meantime, having pull tell users if they need to run evolve would be useful.

> How do you think it should be triggered? "git pull
> --evolve"? or perhaps "git pull --rebase=evolve"? We should probably
> also introduce a new "evolve" enum value to branch.<name>.rebase
> config value. I'll use "--evolve" for now. It may make sense to add
> "--evolve" to every git command that performs an automatic evolve when
> done.

I think using "--evolve" makes sense: the user will want to be able to
use the obsolescence graph with pull when merging as well as rebasing.

>> What happens if the original commits are currently checked out with local
>> changes?
> 
> For a start, I'll probably just display an error message if the
> current working tree is dirty ("Please stash"). Long term, I'd like it
> to work like rebase --autostash. It should stash your changes, do the
> evolve, return to the evolved version of the original change, and
> reapply the stash. I'll add this to the doc.
> 
>> Can I suggest using refs/remote/<remotename>/metas. I
> 
> Ooh! Great idea! I'll update the doc.
> 
>> I think this could be useful (although I guess you can get the branches
>> you've been working on recently from HEAD's reflog quite easily).
> 
> The changes list is different from the reflog. It's a list of all your
> unsubmitted patches - regardless of their age or what branch they're
> on. They may not have corresponding branches: you may have been
> working on them with a detached head, or there may be multiple changes
> on the same branch. You might not have visited them recently, in which
> case they wouldn't be in the reflog at all. You may have reset to an
> older version of the change, in which case they'd be in the reflog but
> the reflog and change point to different places. If you've used gerrit
> before, the "changes" list will contain pretty much the same content
> as the gerrit dashboard, except that it works locally.

One thing I like to do sometimes when I've rebased a lot and
something has gone wrong is
  git rev-list -g $branch | git log --oneline --graph --stdin ^$branch@{u}
which shows all the rebased versions coming off the upstream branch so I
can see when I accidentally dropped a commit etc. If I've been
developing on my laptop and desktop then I only get the history from
the machine I run it on. It would be nice to be able to get the complete
history, but I think that is subtly different from what evolve tracks as
it won't record resets.
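
A minimal, self-contained sketch of that reflog trick, run in a throwaway
repository. The branch names main and topic are assumptions for
illustration, and ^main stands in for ^$branch@{u} because the scratch
repository has no upstream configured. Since "rev-list -g" cannot take
exclusions such as ^main itself, the list is piped into a second command,
as in the one-liner above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
echo base > file.txt
git add file.txt
git commit -q -m "base"

# Rewrite the same commit twice so the reflog holds three versions of it.
git checkout -q -b topic
echo v1 > file.txt
git commit -q -a -m "topic: v1"
echo v2 > file.txt
git commit -q -a --amend -m "topic: v2"
echo v3 > file.txt
git commit -q -a --amend -m "topic: v3"

# Walk topic's reflog and show every recorded version that is not on main.
git rev-list -g topic | git log --oneline --graph --stdin ^main

# Count those versions: the current commit plus the two amended-away ones.
versions=$(git rev-list -g topic | git rev-list --stdin ^main | wc -l | tr -d ' ')
echo "versions of the topic commit found via the reflog: $versions"
```

Only the reflog of the repository where this runs is consulted, which is
why the history from another machine never shows up.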

>>> +Much like a merge conflict, divergence is a situation that requires user
>>> +intervention to resolve. The evolve command will stop when it encounters
>>> +divergence and prompt the user to resolve the problem. Users can solve the
>>> +problem in several ways:
>>> +
>>> +- Discard one of the changes (by deleting its change branch).
>>> +- Merge the two changes (producing a single change branch).
>>
>> I assume this won't create merge commits for the actual commits though,
>> just merge the meta branches and create some new commits that are each
>> the result of something like 'merge-recursive original-commit
>> our-new-version their-new-version'
> 
> It depends on which version of merge you use. I've proposed a new
> "merge --amend" argument specifically for resolving divergence. It
> avoids creating merge commits as long as there's only one parent
> remaining after combining the parents of the commits being merged.
> Basically, if the two things being merged are divergent commits, it
> would resolve the divergence without creating a new merge commit...
> but if the divergent commits had different parents or were themselves
> merge commits, the result may still be a merge commit.
> 
> If you run the normal version of merge, it *would* create a merge
> commit and leave the changes divergent. However, one of the
> transformations on the evolve command will look for this situation and
> resolve it. Specifically, if it encounters two divergent changes but
> exactly one child change contains a merge that would resolve that
> divergence, the transformation will merge all three changes, squash
> them together, and make all three changes point to the result. I'm not
> sure what to call this transformation, but it serves a useful purpose:
> it allows users to use either form of merge to resolve the divergence.
> If they use the "--amend" version of merge, no merge commit is created
> and the divergence is resolved immediately. If they use the normal
> version of merge, a merge commit is created (as it is now) and the
> evolve command figures out later whether that merge was intended to
> resolve divergence. This avoids putting any magic in the merge command
> itself, avoids changing the existing behavior of the merge command,
> and it means that most users won't need to learn about "merge --amend"
> and can't paint themselves into a corner by accidentally
> using the wrong kind of merge. Power users can disable this
> transformation and resolve their divergence explicitly using --amend.
> Novices can just use the defaults and things will probably work.
> 
> It can get more complex, though. If there are two or more child
> changes containing merge commits that resolve divergence, this
> transformation would happen separately for each one and the resulting
> merges would themselves become divergent (since they are two
> conflicting solutions to the same problem). This may happen if the
> user unnecessarily resolved the same divergence multiple times with
> different merge commits. At that point, one of several things would
> happen. If after rebasing the merge, the result automerges to exactly
> the same thing (which would happen if both merges were the result of
> running the automerger on incremental versions of the same two
> changes), the divergence would instantly resolve itself because the
> two changes are aliases. Otherwise, this new divergence would be
> treated like any other and evolve would eventually try to apply the
> same algorithm recursively on the new divergent changes.
> 
> I'll elaborate more on the supported transformations in the doc for
> the evolve command.

Thanks for the explanation. As you say, it can get quite complicated, so I
think good user documentation would be important. I'm conflicted about
having merge create non-merge commits, as I think users are used to it
creating a commit with more than one parent. One of the things I found
difficult to understand when I first started using git was the
difference between a merge commit and performing a three-way merge. The
documentation tends to just say 'merge' even though the two concepts are
somewhat orthogonal: rebase and cherry-pick use the same three-way
merge algorithm as 'git merge' but create commits with a single parent.
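
The distinction can be seen with existing commands. The following is a
small sketch in a throwaway repository (the branch names main, topic and
side are hypothetical): it cherry-picks one commit and merges another
branch, then counts the parents of each resulting commit. It does not use
the proposed "merge --amend", which does not exist yet.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "base"

# Two side branches, each adding its own file on top of the base commit.
git checkout -q -b topic
echo topic > topic.txt
git add topic.txt
git commit -q -m "topic change"
git checkout -q -b side main
echo side > side.txt
git add side.txt
git commit -q -m "side change"

# Give main a commit of its own so neither operation is a fast-forward.
git checkout -q main
echo main > main.txt
git add main.txt
git commit -q -m "main change"

# cherry-pick runs a three-way merge but records a single parent.
git cherry-pick topic >/dev/null
picked_parents=$(git show -s --format=%P HEAD | wc -w | tr -d ' ')

# 'git merge' runs a three-way merge and records both parents.
git merge -q --no-edit side
merge_parents=$(git show -s --format=%P HEAD | wc -w | tr -d ' ')

echo "parents after cherry-pick: $picked_parents"
echo "parents after merge:       $merge_parents"
```

Both operations combine changes from two lines of history; only the
number of parents recorded on the result differs.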

Best Wishes

Phillip
