[v2,2/2] git-p4: speed up search for branch parent

Message ID 41b3a23f682cddb3720de14723854c5956f25704.1620215786.git.gitgitgadget@gmail.com
State Accepted
Commit 6b79818bfbd39a5d41d15003375a9f382f70ad0e
Series git-p4: speed up search for branch parent

Commit Message

Joachim Kuebart May 5, 2021, 11:56 a.m. UTC
From: Joachim Kuebart <joachim.kuebart@gmail.com>

For every new branch that git-p4 imports, it needs to find the commit
where it branched off its parent branch. While p4 doesn't record this
information explicitly, the first changelist on a branch is usually an
identical copy of the parent branch.

The method searchParent() tries to find a commit in the history of the
given "parent" branch whose tree exactly matches the initial changelist
of the new branch, "target". The code iterates through the parent
commits and compares each of them to this initial changelist using
diff-tree.

Since we already know the tree object name we are looking for, spawning
diff-tree for each commit is wasteful.

To optimize this, use the "--format" option of "rev-list" to obtain the
tree object name of each commit in the history, and look for the commit
whose tree name is exactly the same as the tree of the target commit.
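The lookup described above can be sketched as a standalone Python function. This is a minimal sketch that mirrors the loop in the patch, not the git-p4 code itself; the names `find_commit_with_tree` and `rev_list_with_trees` are hypothetical, and the sample lines use made-up short hashes. When `--format` is given, `rev-list` prints a `commit <sha>` header line before each formatted line, which is why such lines are skipped.

```python
import subprocess
from typing import Iterable, List, Optional


def find_commit_with_tree(rev_list_lines: Iterable[str],
                          target_tree: str) -> Optional[str]:
    """Return the first commit whose tree name equals target_tree.

    Expects output of `git rev-list --format="%H %T" ...`, where each
    formatted "<commit> <tree>" line is preceded by a "commit <sha>" header.
    """
    for line in rev_list_lines:
        if line.startswith("commit "):
            # Header line emitted by rev-list for every commit; skip it.
            continue
        commit, tree = line.strip().split(" ")
        # Equal tree names mean identical snapshots; no diff process needed.
        if tree == target_tree:
            return commit
    return None


def rev_list_with_trees(parent: str) -> List[str]:
    """Hypothetical helper: run rev-list with the flags used in the patch.

    Assumes git is on PATH and the current directory is a repository.
    """
    result = subprocess.run(
        ["git", "rev-list", "--format=%H %T", "--no-merges", parent],
        capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Made-up sample output, newest commit first (rev-list's default order).
sample = [
    "commit c3\n", "c3 t3\n",
    "commit c2\n", "c2 t2\n",
    "commit c1\n", "c1 t1\n",
]
print(find_commit_with_tree(sample, "t2"))  # c2
print(find_commit_with_tree(sample, "t9"))  # None
```

Comparing two strings per commit replaces one diff-tree process per commit; the only processes left are a single rev-parse for the target tree and a single rev-list.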

This results in a considerable speed-up, at least on Windows. On one
Windows machine with a fairly large repository of about 16000 commits in
the parent branch, the current code takes over 7 minutes, while the new
code takes just over 10 seconds for the same changelist:

Before:

    $ time git p4 sync
    Importing from/into multiple branches
    Depot paths: //depot
    Importing revision 31274 (100.0%)
    Updated branches: b1

    real    7m41.458s
    user    0m0.000s
    sys     0m0.077s

After:

    $ time git p4 sync
    Importing from/into multiple branches
    Depot paths: //depot
    Importing revision 31274 (100.0%)
    Updated branches: b1

    real    0m10.235s
    user    0m0.000s
    sys     0m0.062s
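
The two "real" times above can be turned into a speed-up factor. A quick check in Python; the `to_seconds` helper is purely illustrative:

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a `time` reading such as 7m41.458s into seconds."""
    return minutes * 60 + seconds


before = to_seconds(7, 41.458)  # old code: 461.458 s
after = to_seconds(0, 10.235)   # new code: 10.235 s
print(f"speed-up: {before / after:.1f}x")  # speed-up: 45.1x
```

So on this machine and repository, the new search is roughly 45 times faster.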

Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Luke Diamand <luke@diamand.org>
 git-p4.py | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/git-p4.py b/git-p4.py
index 09c9e93ac401..d34a1946b754 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -3600,19 +3600,18 @@  def importNewBranch(self, branch, maxChange):
         return True

     def searchParent(self, parent, branch, target):
-        parentFound = False
-        for blob in read_pipe_lines(["git", "rev-list", "--reverse",
+        targetTree = read_pipe(["git", "rev-parse",
+                                "{}^{{tree}}".format(target)]).strip()
+        for line in read_pipe_lines(["git", "rev-list", "--format=%H %T",
                                      "--no-merges", parent]):
-            blob = blob.strip()
-            if len(read_pipe(["git", "diff-tree", blob, target])) == 0:
-                parentFound = True
+            if line.startswith("commit "):
+                continue
+            commit, tree = line.strip().split(" ")
+            if tree == targetTree:
                 if self.verbose:
-                    print("Found parent of %s in commit %s" % (branch, blob))
-                break
-        if parentFound:
-            return blob
-        else:
-            return None
+                    print("Found parent of %s in commit %s" % (branch, commit))
+                return commit
+        return None

     def importChanges(self, changes, origin_revision=0):
         cnt = 1
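In the new code, `"{}^{{tree}}".format(target)` relies on `str.format` escaping: doubled braces produce literal braces, so git receives the revision `<target>^{tree}`, which `rev-parse` peels to the tree object name of that commit. A short illustration, with a hypothetical ref name standing in for `target`:

```python
# In str.format, "{}" is the replacement field that receives `target`,
# while "{{" and "}}" are escapes that emit literal "{" and "}".
target = "refs/remotes/p4/b1"  # hypothetical ref name
spec = "{}^{{tree}}".format(target)
print(spec)  # refs/remotes/p4/b1^{tree}
```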