diff mbox series

[8/8] diffcore-rename: determine which relevant_sources are no longer relevant

Message ID 495c10937b7f2b23b7d2a52c254e02ae9ce810a1.1615674128.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 9bd342137eb4dfe3852f657f8afcc637e68b1439
Headers show
Series Optimization batch 10: avoid detecting even more irrelevant renames | expand

Commit Message

Elijah Newren March 13, 2021, 10:22 p.m. UTC
From: Elijah Newren <newren@gmail.com>

As noted a few commits ago ("diffcore-rename: only compute
dir_rename_count for relevant directories"), when a source file rename
is used as part of directory rename detection, we need to increment
counts for each ancestor directory in dirs_removed with value
RELEVANT_FOR_SELF.  However, a few commits ago ("diffcore-rename: check
if we have enough renames for directories early on"), we may have
downgraded all relevant ancestor directories from RELEVANT_FOR_SELF to
RELEVANT_FOR_ANCESTOR.

For a given file, if no ancestor directory is found in dirs_removed with
a value of RELEVANT_FOR_SELF, then we can downgrade
relevant_source[PATH] from RELEVANT_LOCATION to RELEVANT_NO_MORE.  This
means we can skip detecting a rename for that particular path (and any
other paths in the same directory).

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:        5.680 s ±  0.096 s     5.665 s ±  0.129 s
    mega-renames:     13.812 s ±  0.162 s    11.435 s ±  0.158 s
    just-one-mega:   506.0  ms ±  3.9  ms   494.2  ms ±  6.1  ms

While this improvement looks rather modest for these testcases (because
all the previous optimizations were sufficient to nearly remove all time
spent in rename detection already),  consider this alternative testcase
tweaked from the ones in commit 557ac0350d as follows

    <Same initial setup as commit 557ac0350d, then...>
    $ git switch -c add-empty-file v5.5
    $ >drivers/gpu/drm/i915/new-empty-file
    $ git add drivers/gpu/drm/i915/new-empty-file
    $ git commit -m "new file"
    $ git switch 5.4-rename
    $ git cherry-pick --strategy=ort add-empty-file

For this testcase, we see the following improvement:

                            Before                  After
    pick-empty:        1.936 s ±  0.024 s     688.1 ms ±  4.2 ms

So roughly a factor of 3 speedup.  At $DAYJOB, there was a particular
repository and cherry-pick that inspired this optimization; for that
case I saw a speedup factor of 7 with this optimization.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)
diff mbox series

Patch

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 9844cd48788e..7cc24592617e 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -1129,7 +1129,7 @@  static void handle_early_known_dir_renames(struct dir_rename_info *info,
 	 * a majority.
 	 */
 
-	int i;
+	int i, new_num_src;
 	struct hashmap_iter iter;
 	struct strmap_entry *entry;
 
@@ -1193,6 +1193,55 @@  static void handle_early_known_dir_renames(struct dir_rename_info *info,
 				      RELEVANT_FOR_ANCESTOR);
 		}
 	}
+
+	for (i = 0, new_num_src = 0; i < rename_src_nr; i++) {
+		struct diff_filespec *one = rename_src[i].p->one;
+		int val;
+
+		val = strintmap_get(relevant_sources, one->path);
+
+		/*
+		 * sources that were not found in relevant_sources should
+		 * have already been removed by a prior call to
+		 * remove_unneeded_paths_from_src()
+		 */
+		assert(val != -1);
+
+		if (val == RELEVANT_LOCATION) {
+			int removable = 1;
+			char *dir = get_dirname(one->path);
+			while (1) {
+				char *freeme = dir;
+				int res = strintmap_get(dirs_removed, dir);
+
+				/* Quit if not found or irrelevant */
+				if (res == NOT_RELEVANT)
+					break;
+				/* If RELEVANT_FOR_SELF, can't remove */
+				if (res == RELEVANT_FOR_SELF) {
+					removable = 0;
+					break;
+				}
+				/* Else continue searching upwards */
+				assert(res == RELEVANT_FOR_ANCESTOR);
+				dir = get_dirname(dir);
+				free(freeme);
+			}
+			free(dir);
+			if (removable) {
+				strintmap_set(relevant_sources, one->path,
+					      RELEVANT_NO_MORE);
+				continue;
+			}
+		}
+
+		if (new_num_src < i)
+			memcpy(&rename_src[new_num_src], &rename_src[i],
+			       sizeof(struct diff_rename_src));
+		new_num_src++;
+	}
+
+	rename_src_nr = new_num_src;
 }
 
 void diffcore_rename_extended(struct diff_options *options,