
[1/1] merge-ort: begin performance work; instrument with trace2_region_* calls

Message ID 20210108205111.2197944-2-newren@gmail.com
State Superseded
Series And so it begins...merge/rename performance work

Commit Message

Elijah Newren Jan. 8, 2021, 8:51 p.m. UTC
Add some timing instrumentation for both merge-ort and diffcore-rename;
I used these to measure and optimize performance in both, and several
future patch series will build on these to reduce the timings of some
select testcases.

=== Setup ===

The primary testcase I used involved rebasing a random topic in the
linux kernel (consisting of 35 patches) against an older version.  I
added two variants, one where I rename a toplevel directory, and another
where I only rebase one patch instead of the whole topic.  The setup is
as follows:

  $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  $ git branch hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e
  $ git branch hwmon-just-one fd8bdb23b91876ac1e624337bb88dc1dcc21d67e~34
  $ git branch base 4703d9119972bf586d2cca76ec6438f819ffa30e
  $ git switch -c 5.4-renames v5.4
  $ git mv drivers pilots  # Introduce over 26,000 renames
  $ git commit -m "Rename drivers/ to pilots/"

=== Testcases ===

Now with REBASE standing for either "git rebase [--merge]" (using
merge-recursive) or "test-tool fast-rebase" (using merge-ort), the
testcases are:

Testcase #1: no-renames

  $ git checkout v5.4^0
  $ REBASE --onto HEAD base hwmon-updates

  Note: technically the name is misleading; there are some renames, but
  very few.  Rename detection only takes about half the overall time.

Testcase #2: mega-renames

  $ git checkout 5.4-renames^0
  $ REBASE --onto HEAD base hwmon-updates

Testcase #3: just-one-mega

  $ git checkout 5.4-renames^0
  $ REBASE --onto HEAD base hwmon-just-one

=== Timing results ===

Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
10 runs for the other two cases):

                  merge-recursive         merge-ort
  no-renames:        18.912 s ±  0.174 s    12.975 s ±  0.037 s
  mega-renames:    5964.031 s ± 10.459 s  5154.338 s ± 19.139 s
  just-one-mega:    149.583 s ±  0.751 s   146.703 s ±  0.852 s
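
Expressed as ratios, the hyperfine means above work out as follows (a
quick sanity check on the reported numbers, not new measurements):

```python
# Speedup of merge-ort over merge-recursive implied by the means above.
timings = {
    #                merge-recursive  merge-ort  (mean seconds)
    "no-renames":    (18.912,   12.975),
    "mega-renames":  (5964.031, 5154.338),
    "just-one-mega": (149.583,  146.703),
}
speedup = {name: recursive / ort for name, (recursive, ort) in timings.items()}
```

So merge-ort is roughly 1.46x, 1.16x, and 1.02x faster on the three
testcases, respectively.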

A single re-run of each with some breakdowns:

                                  ---  no-renames  ---
                            merge-recursive   merge-ort
  overall runtime:              19.302 s        13.017 s
  inexact rename detection:      7.603 s         7.695 s
  everything else:              11.699 s         5.322 s

                                  --- mega-renames ---
                            merge-recursive   merge-ort
  overall runtime:            5950.195 s      5132.851 s
  inexact rename detection:   5746.309 s      5119.215 s
  everything else:             203.886 s        13.636 s

                                  --- just-one-mega ---
                            merge-recursive   merge-ort
  overall runtime:             151.001 s       146.478 s
  inexact rename detection:    143.448 s       145.901 s
  everything else:               7.553 s         0.577 s
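
In each breakdown, "everything else" is simply the overall runtime minus
inexact rename detection; checking the reported numbers:

```python
# Sanity check: "everything else" = overall runtime - inexact rename
# detection, for each testcase and each backend, using the numbers above.
breakdown = {
    #                 merge-recursive        merge-ort
    #                 (overall, inexact)     (overall, inexact)
    "no-renames":    ((19.302, 7.603),       (13.017, 7.695)),
    "mega-renames":  ((5950.195, 5746.309),  (5132.851, 5119.215)),
    "just-one-mega": ((151.001, 143.448),    (146.478, 145.901)),
}
everything_else = {
    name: tuple(round(overall - inexact, 3) for overall, inexact in pair)
    for name, pair in breakdown.items()
}
```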

=== Timing observations ===

1) no-renames

1a) merge-ort is faster than merge-recursive, which is nice.  However,
this still should not be considered good enough.  Although the "merge"
backend to rebase (merge-recursive) is sometimes faster than the "apply"
backend, this is one of those cases where it is not.  In fact, even
merge-ort is slower.  The "apply" backend can complete this testcase in
    6.940 s ± 0.485 s
which is about 2x faster than merge-ort and 3x faster than
merge-recursive.  One goal of the merge-ort performance work will be to
make it faster than git-am on this (and similar) testcases.
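
The "2x" and "3x" figures follow directly from the means quoted above:

```python
# Ratio of the overall no-renames means to the apply backend's 6.940s mean.
apply_backend = 6.940
merge_ort = 12.975
merge_recursive = 18.912
ort_ratio = merge_ort / apply_backend              # ~1.9x
recursive_ratio = merge_recursive / apply_backend  # ~2.7x
```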

2) mega-renames

2a) Obviously rename detection is a huge cost; it's where most of the time
is spent.  We need to cut that down.  If we could somehow infinitely
parallelize it and drive its time to 0, the merge-recursive time would
drop to about 204s, and the merge-ort time would drop to about 14s.  I
think this particular stat shows I've subtly baked a couple performance
improvements into merge-ort[A] (one of them large) and into
fast-rebase[B] already.

    [A] Avoid quadratic behavior with O(N) insertions or removals
        of entries in the index & avoid unconditional dropping and
        re-reading of the index
    [B] Avoid updating the on-disk index or the working directory
        for intermediate patches -- only update at the end
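
The quadratic behavior mentioned in [A] is the generic array-backed-index
problem; here is a toy sketch (plain Python, not git's index code) of why
per-entry removal is quadratic while a single filtered rebuild is linear:

```python
# Toy illustration: an array-backed index must shift its tail on every
# single-entry removal, so removing K of N entries one at a time costs
# O(N*K) element moves; one filtered rebuild touches each entry once.
def remove_one_at_a_time(entries, doomed):
    entries = list(entries)
    moves = 0
    for d in doomed:
        i = entries.index(d)
        moves += len(entries) - i - 1  # tail elements shifted left
        del entries[i]
    return entries, moves

def remove_in_one_pass(entries, doomed):
    doomed = set(doomed)
    kept = [e for e in entries if e not in doomed]
    return kept, len(entries)  # one pass over all N entries

all_entries = [f"file{i:04}" for i in range(1000)]
to_remove = all_entries[:500]
slow_result, shifts = remove_one_at_a_time(all_entries, to_remove)
fast_result, passes = remove_in_one_pass(all_entries, to_remove)
```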

2b) Rename detection is somehow ~10% cheaper for merge-ort than
merge-recursive.  This was and is a big surprise to me.  Both of them
call diff_tree_oid() and diffcore_std() with the EXACT same inputs.  I
don't have an explanation, but it is very consistent even after
re-running many times.  Interestingly, the rename detection for the
first patch is more expensive (just barely) for merge-ort than
merge-recursive, and that is also consistent.  I won't investigate this
further, as I'm just going to focus on 1a & 2a.
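
For reference, the "~10%" figure comes straight out of the mega-renames
breakdown above:

```python
# Inexact rename detection cost from the mega-renames breakdown.
recursive_rename_time = 5746.309
ort_rename_time = 5119.215
savings = (recursive_rename_time - ort_rename_time) / recursive_rename_time
```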

3) just-one-mega

3a) Not much to say here; it just gives some flavor for how rebasing
only one patch compares to rebasing 35.

=== Goals ===

This patch is obviously just the beginning.  Here are some of my goals
that this measurement will help us achieve:

* Drive the cost of rename detection down considerably for merges
* After the above has been achieved, see if there are other slowness
  factors (which would have previously been overshadowed by rename
  detection costs) which we can then focus on and also optimize.
* Ensure our rebase testcase that requires little rename detection
  is noticeably faster with merge-ort than with apply-based rebase.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c |  8 +++++++
 merge-ort.c       | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

Comments

Taylor Blau Jan. 8, 2021, 8:59 p.m. UTC | #1
On Fri, Jan 08, 2021 at 12:51:11PM -0800, Elijah Newren wrote:
> Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
> 10 runs for the other two cases):

Ah, I love hyperfine. In case you don't already have this in your
arsenal, the following `--prepare` step is useful for measuring
cold-cache performance:

    --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'

> === Goals ===
>
> This patch is obviously just the beginning.  Here are some of my goals
> that this measurement will help us achieve:
>
> * Drive the cost of rename detection down considerably for merges
> * After the above has been achieved, see if there are other slowness
>   factors (which would have previously been overshadowed by rename
>   detection costs) which we can then focus on and also optimize.
> * Ensure our rebase testcase that requires little rename detection
>   is noticeably faster with merge-ort than with apply-based rebase.

These are great, and I am looking forward to your work.

> Signed-off-by: Elijah Newren <newren@gmail.com>

Thanks, this patch looks good to me.

Thanks,
Taylor
Elijah Newren Jan. 8, 2021, 9:50 p.m. UTC | #2
On Fri, Jan 8, 2021 at 12:59 PM Taylor Blau <ttaylorr@github.com> wrote:
>
> On Fri, Jan 08, 2021 at 12:51:11PM -0800, Elijah Newren wrote:
> > Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
> > 10 runs for the other two cases):
>
> Ah, I love hyperfine. In case you don't already have this in your
> arsenal, the following `--prepare` step is useful for measuring
> cold-cache performance:
>
>     --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'

/proc/sys/vm/drop_caches is definitely useful for cold-cache
measurements and I've used it in other projects for that purpose.  I
think cold-cache testing makes sense for various I/O intensive areas
such as object lookup, but I ignored it here as I felt the merge code
is really about algorithmic performance.  So, I instead went the other
direction and ensured warm-cache testing by using a warmup run, in
order to ensure that I wasn't putting one of the tests at an unfair
disadvantage.  (Side note: My script that runs the tests actually does
more than the warmup run to ensure a fair playing field.  For example,
the script expires reflogs and runs a git prune before each hyperfine
invocation, to make sure that each hyperfine run starts with a fully
repacked repository with no loose objects.  Without the expire &
prune, enough perf testing of rebases in a short time period will
result in a mysterious and gradual slowdown of all the test runs even
without code changes...).
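
The expire-and-prune prep described in that parenthetical could be
sketched like so (the commands are assumptions -- this is not the actual
script, and `git repack -a -d` is a guess at how "fully repacked" was
achieved):

```python
import subprocess

# Sketch (assumed commands, not the actual script): before each hyperfine
# invocation, expire reflogs and prune so every run starts from a fully
# repacked repository with no loose objects.
PREP = [
    ["git", "reflog", "expire", "--expire=now", "--all"],
    ["git", "repack", "-a", "-d"],  # assumption about the repack step
    ["git", "prune"],
]

def prep_repo(run=subprocess.run):
    """Run each prep command, failing loudly if any of them errors."""
    for cmd in PREP:
        run(cmd, check=True)
```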

> > === Goals ===
> >
> > This patch is obviously just the beginning.  Here are some of my goals
> > that this measurement will help us achieve:
> >
> > * Drive the cost of rename detection down considerably for merges
> > * After the above has been achieved, see if there are other slowness
> >   factors (which would have previously been overshadowed by rename
> >   detection costs) which we can then focus on and also optimize.
> > * Ensure our rebase testcase that requires little rename detection
> >   is noticeably faster with merge-ort than with apply-based rebase.
>
> These are great, and I am looking forward to your work.
>
> > Signed-off-by: Elijah Newren <newren@gmail.com>
>
> Thanks, this patch looks good to me.

As always, thanks for taking a look.
Taylor Blau Jan. 8, 2021, 9:55 p.m. UTC | #3
On Fri, Jan 08, 2021 at 01:50:34PM -0800, Elijah Newren wrote:
> On Fri, Jan 8, 2021 at 12:59 PM Taylor Blau <ttaylorr@github.com> wrote:
> >
> > On Fri, Jan 08, 2021 at 12:51:11PM -0800, Elijah Newren wrote:
> > > Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
> > > 10 runs for the other two cases):
> >
> > Ah, I love hyperfine. In case you don't already have this in your
> > arsenal, the following `--prepare` step is useful for measuring
> > cold-cache performance:
> >
> >     --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'
>
> /proc/sys/vm/drop_caches is definitely useful for cold-cache
> measurements and I've used it in other projects for that purpose.  I
> think cold-cache testing makes sense for various I/O intensive areas
> such as object lookup, but I ignored it here as I felt the merge code
> is really about algorithmic performance.

Yes, I agree that the interesting thing here is algorithmic performance
more so than I/O.

> So, I instead went the other direction and ensured warm-cache testing
> by using a warmup run, in order to ensure that I wasn't putting one of
> the tests at an unfair disadvantage.

I often use it for both. Combining that `--prepare` step with at least
one `--warmup` invocation is useful to make sure that your I/O cache is
warmed only with the things it might want to read during your timing
tests. (Probably one `--warmup` without dumping the cache is fine, since
you will likely end up evicting things out of your cache that you don't
care about, but I digress..)

Thanks,
Taylor
Elijah Newren Jan. 9, 2021, 12:52 a.m. UTC | #4
On Fri, Jan 8, 2021 at 1:55 PM Taylor Blau <ttaylorr@github.com> wrote:
>
> On Fri, Jan 08, 2021 at 01:50:34PM -0800, Elijah Newren wrote:
> > On Fri, Jan 8, 2021 at 12:59 PM Taylor Blau <ttaylorr@github.com> wrote:
> > >
> > > On Fri, Jan 08, 2021 at 12:51:11PM -0800, Elijah Newren wrote:
> > > > Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
> > > > 10 runs for the other two cases):
> > >
> > > Ah, I love hyperfine. In case you don't already have this in your
> > > arsenal, the following `--prepare` step is useful for measuring
> > > cold-cache performance:
> > >
> > >     --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'
> >
> > /proc/sys/vm/drop_caches is definitely useful for cold-cache
> > measurements and I've used it in other projects for that purpose.  I
> > think cold-cache testing makes sense for various I/O intensive areas
> > such as object lookup, but I ignored it here as I felt the merge code
> > is really about algorithmic performance.
>
> Yes, I agree that the interesting thing here is algorithmic performance
> moreso than I/O.
>
> > So, I instead went the other direction and ensured warm-cache testing
> > by using a warmup run, in order to ensure that I wasn't putting one of
> > the tests at an unfair disadvantage.
>
> I often use it for both. Combining that `--prepare` step with at least
> one `--warmup` invocation is useful to make sure that your I/O cache is
> warmed only with the things it might want to read during your timing
> tests. (Probably one `--warmup` without dumping the cache is fine, since
> you will likely end up evicting things out of your cache that you don't
> care about, but I digress..)

Ah, that hadn't occurred to me, but it makes sense.  Thanks for the
tip; I may give it a try at some point.  I worry slightly that it
might increase the run-to-run noise instead of decreasing it since I'm
committing sins by not running the performance tests on a quiet server
but on my laptop with a full GUI running -- a few-year-old,
nearly-bottom-of-the-line refurbished grade-B Dell laptop with spinny
disks.  Dropping disk caches would lower the risk of needing to spend
time evicting other things from the warm cache, but would increase the
risk that some background GUI thing or system daemon needs to read
from the hard disk when it wouldn't have needed to otherwise, and if
the timing of that disk read is unfortunately placed, then it could
slow down I/O I care about.  I guess there's only one way to find out
if it'd help or hurt though...

Patch

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 90db9ebd6d..8fe6c9384b 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -465,6 +465,7 @@  void diffcore_rename(struct diff_options *options)
 	int num_destinations, dst_cnt;
 	struct progress *progress = NULL;
 
+	trace2_region_enter("diff", "setup", options->repo);
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
 
@@ -510,14 +511,17 @@  void diffcore_rename(struct diff_options *options)
 			register_rename_src(p);
 		}
 	}
+	trace2_region_leave("diff", "setup", options->repo);
 	if (rename_dst_nr == 0 || rename_src_nr == 0)
 		goto cleanup; /* nothing to do */
 
+	trace2_region_enter("diff", "exact renames", options->repo);
 	/*
 	 * We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
 	 */
 	rename_count = find_exact_renames(options);
+	trace2_region_leave("diff", "exact renames", options->repo);
 
 	/* Did we only want exact renames? */
 	if (minimum_score == MAX_SCORE)
@@ -545,6 +549,7 @@  void diffcore_rename(struct diff_options *options)
 		break;
 	}
 
+	trace2_region_enter("diff", "inexact renames", options->repo);
 	if (options->show_rename_progress) {
 		progress = start_delayed_progress(
 				_("Performing inexact rename detection"),
@@ -600,11 +605,13 @@  void diffcore_rename(struct diff_options *options)
 	if (detect_rename == DIFF_DETECT_COPY)
 		rename_count += find_renames(mx, dst_cnt, minimum_score, 1);
 	free(mx);
+	trace2_region_leave("diff", "inexact renames", options->repo);
 
  cleanup:
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst.  The original list is still in *q.
 	 */
+	trace2_region_enter("diff", "write back to queue", options->repo);
 	DIFF_QUEUE_CLEAR(&outq);
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
@@ -680,5 +687,6 @@  void diffcore_rename(struct diff_options *options)
 		strintmap_clear(break_idx);
 		FREE_AND_NULL(break_idx);
 	}
+	trace2_region_leave("diff", "write back to queue", options->repo);
 	return;
 }
diff --git a/merge-ort.c b/merge-ort.c
index 8f4ca4fe83..0f3ad78f3c 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -752,7 +752,9 @@  static int collect_merge_info(struct merge_options *opt,
 	init_tree_desc(t + 1, side1->buffer, side1->size);
 	init_tree_desc(t + 2, side2->buffer, side2->size);
 
+	trace2_region_enter("merge", "traverse_trees", opt->repo);
 	ret = traverse_trees(NULL, 3, t, &info);
+	trace2_region_leave("merge", "traverse_trees", opt->repo);
 
 	return ret;
 }
@@ -2095,9 +2097,12 @@  static void detect_regular_renames(struct merge_options *opt,
 	diff_opts.show_rename_progress = opt->show_rename_progress;
 	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
 	diff_setup_done(&diff_opts);
+
+	trace2_region_enter("diff", "diffcore_rename", opt->repo);
 	diff_tree_oid(&merge_base->object.oid, &side->object.oid, "",
 		      &diff_opts);
 	diffcore_std(&diff_opts);
+	trace2_region_leave("diff", "diffcore_rename", opt->repo);
 
 	if (diff_opts.needed_rename_limit > renames->needed_limit)
 		renames->needed_limit = diff_opts.needed_rename_limit;
@@ -2196,9 +2201,12 @@  static int detect_and_process_renames(struct merge_options *opt,
 
 	memset(&combined, 0, sizeof(combined));
 
+	trace2_region_enter("merge", "regular renames", opt->repo);
 	detect_regular_renames(opt, merge_base, side1, MERGE_SIDE1);
 	detect_regular_renames(opt, merge_base, side2, MERGE_SIDE2);
+	trace2_region_leave("merge", "regular renames", opt->repo);
 
+	trace2_region_enter("merge", "directory renames", opt->repo);
 	need_dir_renames =
 	  !opt->priv->call_depth &&
 	  (opt->detect_directory_renames == MERGE_DIRECTORY_RENAMES_TRUE ||
@@ -2220,8 +2228,11 @@  static int detect_and_process_renames(struct merge_options *opt,
 				 &renames->dir_renames[1],
 				 &renames->dir_renames[2]);
 	QSORT(combined.queue, combined.nr, compare_pairs);
+	trace2_region_leave("merge", "directory renames", opt->repo);
 
+	trace2_region_enter("merge", "process renames", opt->repo);
 	clean &= process_renames(opt, &combined);
+	trace2_region_leave("merge", "process renames", opt->repo);
 
 	/* Free memory for renames->pairs[] and combined */
 	for (s = MERGE_SIDE1; s <= MERGE_SIDE2; s++) {
@@ -2903,20 +2914,30 @@  static void process_entries(struct merge_options *opt,
 						   STRING_LIST_INIT_NODUP,
 						   NULL, 0 };
 
+	trace2_region_enter("merge", "process_entries setup", opt->repo);
 	if (strmap_empty(&opt->priv->paths)) {
 		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
 		return;
 	}
 
 	/* Hack to pre-allocate plist to the desired size */
+	trace2_region_enter("merge", "plist grow", opt->repo);
 	ALLOC_GROW(plist.items, strmap_get_size(&opt->priv->paths), plist.alloc);
+	trace2_region_leave("merge", "plist grow", opt->repo);
 
 	/* Put every entry from paths into plist, then sort */
+	trace2_region_enter("merge", "plist copy", opt->repo);
 	strmap_for_each_entry(&opt->priv->paths, &iter, e) {
 		string_list_append(&plist, e->key)->util = e->value;
 	}
+	trace2_region_leave("merge", "plist copy", opt->repo);
+
+	trace2_region_enter("merge", "plist special sort", opt->repo);
 	plist.cmp = string_list_df_name_compare;
 	string_list_sort(&plist);
+	trace2_region_leave("merge", "plist special sort", opt->repo);
+
+	trace2_region_leave("merge", "process_entries setup", opt->repo);
 
 	/*
 	 * Iterate over the items in reverse order, so we can handle paths
@@ -2927,6 +2948,7 @@  static void process_entries(struct merge_options *opt,
 	 * (because it allows us to know whether the directory is still in
 	 * the way when it is time to process the file at the same path).
 	 */
+	trace2_region_enter("merge", "processing", opt->repo);
 	for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
 		char *path = entry->string;
 		/*
@@ -2945,7 +2967,9 @@  static void process_entries(struct merge_options *opt,
 			process_entry(opt, path, ci, &dir_metadata);
 		}
 	}
+	trace2_region_leave("merge", "processing", opt->repo);
 
+	trace2_region_enter("merge", "process_entries cleanup", opt->repo);
 	if (dir_metadata.offsets.nr != 1 ||
 	    (uintptr_t)dir_metadata.offsets.items[0].util != 0) {
 		printf("dir_metadata.offsets.nr = %d (should be 1)\n",
@@ -2960,6 +2984,7 @@  static void process_entries(struct merge_options *opt,
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
 	string_list_clear(&dir_metadata.offsets, 0);
+	trace2_region_leave("merge", "process_entries cleanup", opt->repo);
 }
 
 /*** Function Grouping: functions related to merge_switch_to_result() ***/
@@ -3118,12 +3143,15 @@  void merge_switch_to_result(struct merge_options *opt,
 	if (result->clean >= 0 && update_worktree_and_index) {
 		struct merge_options_internal *opti = result->priv;
 
+		trace2_region_enter("merge", "checkout", opt->repo);
 		if (checkout(opt, head, result->tree)) {
 			/* failure to function */
 			result->clean = -1;
 			return;
 		}
+		trace2_region_leave("merge", "checkout", opt->repo);
 
+		trace2_region_enter("merge", "record_conflicted", opt->repo);
 		if (record_conflicted_index_entries(opt, opt->repo->index,
 						    &opti->paths,
 						    &opti->conflicted)) {
@@ -3131,6 +3159,7 @@  void merge_switch_to_result(struct merge_options *opt,
 			result->clean = -1;
 			return;
 		}
+		trace2_region_leave("merge", "record_conflicted", opt->repo);
 	}
 
 	if (display_update_msgs) {
@@ -3140,6 +3169,8 @@  void merge_switch_to_result(struct merge_options *opt,
 		struct string_list olist = STRING_LIST_INIT_NODUP;
 		int i;
 
+		trace2_region_enter("merge", "display messages", opt->repo);
+
 		/* Hack to pre-allocate olist to the desired size */
 		ALLOC_GROW(olist.items, strmap_get_size(&opti->output),
 			   olist.alloc);
@@ -3161,6 +3192,8 @@  void merge_switch_to_result(struct merge_options *opt,
 		/* Also include needed rename limit adjustment now */
 		diff_warn_rename_limit("merge.renamelimit",
 				       opti->renames.needed_limit, 0);
+
+		trace2_region_leave("merge", "display messages", opt->repo);
 	}
 
 	merge_finalize(opt, result);
@@ -3202,6 +3235,7 @@  static void merge_start(struct merge_options *opt, struct merge_result *result)
 	int i;
 
 	/* Sanity checks on opt */
+	trace2_region_enter("merge", "sanity checks", opt->repo);
 	assert(opt->repo);
 
 	assert(opt->branch1 && opt->branch2);
@@ -3228,11 +3262,13 @@  static void merge_start(struct merge_options *opt, struct merge_result *result)
 	assert(opt->obuf.len == 0);
 
 	assert(opt->priv == NULL);
+	trace2_region_leave("merge", "sanity checks", opt->repo);
 
 	/* Default to histogram diff.  Actually, just hardcode it...for now. */
 	opt->xdl_opts = DIFF_WITH_ALG(opt, HISTOGRAM_DIFF);
 
 	/* Initialization of opt->priv, our internal merge data */
+	trace2_region_enter("merge", "allocate/init", opt->repo);
 	opt->priv = xcalloc(1, sizeof(*opt->priv));
 
 	/* Initialization of various renames fields */
@@ -3265,6 +3301,8 @@  static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 * subset of the overall paths that have special output.
 	 */
 	strmap_init(&opt->priv->output);
+
+	trace2_region_leave("merge", "allocate/init", opt->repo);
 }
 
 /*** Function Grouping: merge_incore_*() and their internal variants ***/
@@ -3280,6 +3318,7 @@  static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 {
 	struct object_id working_tree_oid;
 
+	trace2_region_enter("merge", "collect_merge_info", opt->repo);
 	if (collect_merge_info(opt, merge_base, side1, side2) != 0) {
 		/*
 		 * TRANSLATORS: The %s arguments are: 1) tree hash of a merge
@@ -3292,10 +3331,16 @@  static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 		result->clean = -1;
 		return;
 	}
+	trace2_region_leave("merge", "collect_merge_info", opt->repo);
 
+	trace2_region_enter("merge", "renames", opt->repo);
 	result->clean = detect_and_process_renames(opt, merge_base,
 						   side1, side2);
+	trace2_region_leave("merge", "renames", opt->repo);
+
+	trace2_region_enter("merge", "process_entries", opt->repo);
 	process_entries(opt, &working_tree_oid);
+	trace2_region_leave("merge", "process_entries", opt->repo);
 
 	/* Set return values */
 	result->tree = parse_tree_indirect(&working_tree_oid);
@@ -3396,9 +3441,15 @@  void merge_incore_nonrecursive(struct merge_options *opt,
 			       struct tree *side2,
 			       struct merge_result *result)
 {
+	trace2_region_enter("merge", "incore_nonrecursive", opt->repo);
+
+	trace2_region_enter("merge", "merge_start", opt->repo);
 	assert(opt->ancestor != NULL);
 	merge_start(opt, result);
+	trace2_region_leave("merge", "merge_start", opt->repo);
+
 	merge_ort_nonrecursive_internal(opt, merge_base, side1, side2, result);
+	trace2_region_leave("merge", "incore_nonrecursive", opt->repo);
 }
 
 void merge_incore_recursive(struct merge_options *opt,
@@ -3407,9 +3458,15 @@  void merge_incore_recursive(struct merge_options *opt,
 			    struct commit *side2,
 			    struct merge_result *result)
 {
+	trace2_region_enter("merge", "incore_recursive", opt->repo);
+
 	/* We set the ancestor label based on the merge_bases */
 	assert(opt->ancestor == NULL);
 
+	trace2_region_enter("merge", "merge_start", opt->repo);
 	merge_start(opt, result);
+	trace2_region_leave("merge", "merge_start", opt->repo);
+
 	merge_ort_internal(opt, merge_bases, side1, side2, result);
+	trace2_region_leave("merge", "incore_recursive", opt->repo);
 }