Message ID | pull.493.v2.git.1602021913.gitgitgadget@gmail.com |
---|---|
Series | subtree: Fix handling of complex history |
Hi Tom,

On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:

> Fixes several issues that could occur when running subtree split on large
> repos with more complex history.
>
> 1. A merge commit could bypass the known start point of the subtree, which
> would cause the entire history to be processed recursively, leading to a
> stack overflow / segfault after reading a few hundred commits. Older
> commits are now explicitly recorded as irrelevant so that the recursive
> process can terminate on any mainline commit rather than only on subtree
> joins and initial commits.
>
> 2. It is possible for a repo to contain subtrees that lack the metadata
> that is usually present in add/join commit messages (git-svn at least
> can produce such a structure). The new use/ignore/map commands allow the
> user to provide that information for any problematic commits.
>
> 3. A mainline commit that does not contain the subtree folder could be
> erroneously identified as a subtree commit, which would add the entire
> mainline history to the subtree. Commits will now only be used as is if
> all their parents are already identified as subtree commits. While the
> new code can still be tripped up by unusual folder structures, the
> completely unambiguous solution turned out to involve a significant
> performance penalty, and the new ignore / use commands provide a
> workaround for that scenario.

I gave this as thorough a review as I could (which is not saying too much, as I am not exactly familiar with git subtree's inner workings). Hopefully some of my comments and suggestions are helpful.

At some stage, especially given the problems I pointed out with the implementation detail that is a flat directory with a potentially insane number of files in it, I think it would make a lot of sense to go ahead and turn this into a built-in Git command, implemented in C, and with a more robust file system layout of its cache.

Ciao,
Dscho
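The two rules from points 1 and 3 of the cover letter (stop the walk at commits recorded as irrelevant; reuse a commit as-is only when every parent is already a subtree commit) can be illustrated with a minimal Python sketch. It rests on loudly stated assumptions: the commit graph is a plain dict of hypothetical commit IDs, and `classify`, `subtree_commits` and `irrelevant` are illustrative names that do not exist in git-subtree, which is a shell script with its own cache format. The sketch also walks history with an explicit stack rather than recursion, so a deep history cannot exhaust the call stack.

```python
from typing import Dict, List, Set


# Hypothetical model for illustration only: commit IDs are strings and
# `parents` maps each commit to its parent IDs. This is NOT how the
# git-subtree shell script actually stores or walks history.
def classify(start: str,
             parents: Dict[str, List[str]],
             subtree_commits: Set[str],
             irrelevant: Set[str]) -> Dict[str, str]:
    """Label every commit reached from `start` as "subtree", "mainline",
    or "irrelevant", using an explicit stack instead of recursion."""
    result: Dict[str, str] = {}
    # Each entry is (commit, parents_done); parents_done=True means all of
    # the commit's parents have already been labelled.
    stack = [(start, False)]
    while stack:
        commit, parents_done = stack.pop()
        if commit in result:
            continue
        # Point 1: stop at commits recorded as irrelevant instead of
        # walking the whole mainline history behind them.
        if commit in irrelevant:
            result[commit] = "irrelevant"
            continue
        if commit in subtree_commits:
            result[commit] = "subtree"
            continue
        ps = parents.get(commit, [])
        if not parents_done:
            # Revisit this commit once all of its parents are labelled.
            stack.append((commit, True))
            stack.extend((p, False) for p in ps if p not in result)
            continue
        # Point 3: reuse the commit as-is only if *every* parent is
        # already known to be a subtree commit.
        if ps and all(result[p] == "subtree" for p in ps):
            result[commit] = "subtree"
        else:
            result[commit] = "mainline"
    return result


if __name__ == "__main__":
    graph = {
        "M3": ["M2", "S2"],  # merge of mainline and subtree history
        "M2": ["M1"],
        "M1": ["M0"],
        "M0": [],
        "S2": ["S1"],
        "S1": [],
    }
    labels = classify("M3", graph, subtree_commits={"S1"}, irrelevant={"M1"})
    print(sorted(labels.items()))
    # [('M1', 'irrelevant'), ('M2', 'mainline'), ('M3', 'mainline'), ('S1', 'subtree'), ('S2', 'subtree')]
```

Note that `M0` is never visited because `M1` is marked irrelevant. The explicit stack is a choice made for this sketch only; it is one way to sidestep the recursion-depth failure described in point 1, not necessarily what the patch series does.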