[1/1] merge-ort: begin performance work; instrument with trace2_region_* calls

Add some timing instrumentation for both merge-ort and diffcore-rename;
I used these to measure and optimize performance in both, and several
future patch series will build on these to reduce the timings of some
select testcases.
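
As an illustration of how such region timings can be read back, here is a minimal sketch that sums elapsed time per region from a trace2 event log (for example, one written with GIT_TRACE2_EVENT pointing at a file).  It assumes the JSON event format in which "region_leave" events carry "category", "label" and "t_rel" (elapsed seconds) fields.  The helper name region_totals and the sample category/label values are hypothetical and are not taken from this patch.

```python
import json
from collections import defaultdict


def region_totals(lines):
    """Sum elapsed seconds per (category, label) over region_leave events."""
    totals = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Only region_leave events carry the elapsed time of a region.
        if event.get("event") != "region_leave":
            continue
        key = (event.get("category"), event.get("label"))
        totals[key] += event.get("t_rel", 0.0)
    return dict(totals)


# Hypothetical sample lines, only to show the shape of the input.
sample = [
    '{"event":"region_enter","category":"merge","label":"renames"}',
    '{"event":"region_leave","category":"merge","label":"renames","t_rel":1.5}',
    '{"event":"region_leave","category":"merge","label":"renames","t_rel":0.5}',
    '{"event":"region_leave","category":"diff","label":"inexact renames","t_rel":0.25}',
]

for (category, label), seconds in sorted(region_totals(sample).items()):
    print(f"{category:8} {label:20} {seconds:8.3f} s")
```

Regions can nest, so totals for different labels may overlap; the sketch only aggregates repeated occurrences of the same region.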

=== Setup ===

The primary testcase I used involved rebasing a random topic in the
linux kernel (consisting of 35 patches) against an older version.  I
added two variants, one where I rename a toplevel directory, and another
where I only rebase one patch instead of the whole topic.  The setup is
as follows:

  $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  $ cd linux-stable
  $ git branch hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e
  $ git branch hwmon-just-one fd8bdb23b91876ac1e624337bb88dc1dcc21d67e~34
  $ git branch base 4703d9119972bf586d2cca76ec6438f819ffa30e
  $ git switch -c 5.4-renames v5.4
  $ git mv drivers pilots  # Introduce over 26,000 renames
  $ git commit -m "Rename drivers/ to pilots/"

=== Testcases ===

Now with REBASE standing for either "git rebase [--merge]" (using
merge-recursive) or "test-tool fast-rebase" (using merge-ort), the
testcases are:

Testcase #1: no-renames

  $ git checkout v5.4^0
  $ REBASE --onto HEAD base hwmon-updates

  Note: technically the name is misleading; there are some renames, but
  very few.  Rename detection accounts for only about half of the
  overall time (roughly 40% for merge-recursive and 60% for merge-ort
  in the breakdown below).

Testcase #2: mega-renames

  $ git checkout 5.4-renames^0
  $ REBASE --onto HEAD base hwmon-updates

Testcase #3: just-one-mega

  $ git checkout 5.4-renames^0
  $ REBASE --onto HEAD base hwmon-just-one

=== Timing results ===

Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
10 runs for the other two cases):

                  merge-recursive         merge-ort
  no-renames:        18.912 s ±  0.174 s    12.975 s ±  0.037 s
  mega-renames:    5964.031 s ± 10.459 s  5154.338 s ± 19.139 s
  just-one-mega:    149.583 s ±  0.751 s   146.703 s ±  0.852 s

A single re-run of each with some breakdowns:

                                  ---  no-renames  ---
                            merge-recursive   merge-ort
  overall runtime:              19.302 s        13.017 s
  inexact rename detection:      7.603 s         7.695 s
  everything else:              11.699 s         5.322 s

                                  --- mega-renames ---
                            merge-recursive   merge-ort
  overall runtime:            5950.195 s      5132.851 s
  inexact rename detection:   5746.309 s      5119.215 s
  everything else:             203.886 s        13.636 s

                                  --- just-one-mega ---
                            merge-recursive   merge-ort
  overall runtime:             151.001 s       146.478 s
  inexact rename detection:    143.448 s       145.901 s
  everything else:               7.553 s         0.577 s
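
The "everything else" rows are simply the overall runtime minus the inexact rename detection time.  A small sketch that recomputes them, along with the share of runtime spent in rename detection, from the numbers in the tables above (the names BREAKDOWN and summarize are illustrative only):

```python
# (overall, inexact rename detection) in seconds, copied from the tables above.
BREAKDOWN = {
    "no-renames": {
        "merge-recursive": (19.302, 7.603),
        "merge-ort": (13.017, 7.695),
    },
    "mega-renames": {
        "merge-recursive": (5950.195, 5746.309),
        "merge-ort": (5132.851, 5119.215),
    },
    "just-one-mega": {
        "merge-recursive": (151.001, 143.448),
        "merge-ort": (146.478, 145.901),
    },
}


def summarize(overall, renames):
    """Return (everything-else seconds, rename share of overall runtime)."""
    return overall - renames, renames / overall


for testcase, backends in BREAKDOWN.items():
    for backend, (overall, renames) in backends.items():
        rest, share = summarize(overall, renames)
        print(f"{testcase:14} {backend:16} rest={rest:9.3f} s  renames={share:6.1%}")
```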

=== Timing observations ===

1) no-renames

1a) merge-ort is faster than merge-recursive, which is nice.  However,
this still should not be considered good enough.  Although the "merge"
backend to rebase (merge-recursive) is sometimes faster than the "apply"
backend, this is one of those cases where it is not.  In fact, even
merge-ort is slower than "apply", which can complete this testcase in
    6.940 s ± 0.485 s
which is about 2x faster than merge-ort and 3x faster than
merge-recursive.  One goal of the merge-ort performance work will be to
make it faster than git-am on this (and similar) testcases.
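
The "about 2x" and "about 3x" figures follow directly from the mean hyperfine timings quoted above for the no-renames testcase.  A quick check (the constant and function names are illustrative only):

```python
# Mean timings in seconds for the no-renames testcase, from the text above.
APPLY = 6.940
MERGE_ORT = 12.975
MERGE_RECURSIVE = 18.912


def speedup(slower, faster):
    """How many times faster the second timing is than the first."""
    return slower / faster


print(f"apply vs merge-ort:           {speedup(MERGE_ORT, APPLY):.2f}x")
print(f"apply vs merge-recursive:     {speedup(MERGE_RECURSIVE, APPLY):.2f}x")
print(f"merge-ort vs merge-recursive: {speedup(MERGE_RECURSIVE, MERGE_ORT):.2f}x")
```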

2) mega-renames

2a) Obviously rename detection is a huge cost; it's where most of the time
is spent.  We need to cut that down.  If we could somehow infinitely
parallelize it and drive its time to 0, the merge-recursive time would
drop to about 204s, and the merge-ort time would drop to about 14s.  I
think this particular stat shows I've subtly baked a couple performance
improvements into merge-ort[A] (one of them large) and into
fast-rebase[B] already.

    [A] Avoid quadratic behavior with O(N) insertions or removals
        of entries in the index & avoid unconditional dropping and
        re-reading of the index
    [B] Avoid updating the on-disk index or the working directory
        for intermediate patches -- only update at the end
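
To illustrate the general pattern behind [A]: keeping a sorted array sorted after every single insertion shifts O(N) entries each time, so N insertions cost O(N^2) overall, whereas collecting the entries and sorting once costs O(N log N).  The sketch below is a generic illustration under that assumption and is not git's index code; the function names and generated paths are hypothetical.

```python
import bisect


def build_by_sorted_insertion(paths):
    """Keep the list sorted after every insertion.

    Each insort shifts up to O(N) existing entries, so building the whole
    list this way is O(N^2) in the worst case.
    """
    entries = []
    for path in paths:
        bisect.insort(entries, path)
    return entries


def build_in_one_batch(paths):
    """Collect everything first, then sort once: O(N log N) overall."""
    return sorted(paths)


# Hypothetical paths, generated only to exercise both strategies.
paths = [f"pilots/dir{i % 97}/file{i}.c" for i in range(10_000)]

# Both strategies produce the same ordered result; only the cost differs.
assert build_by_sorted_insertion(paths) == build_in_one_batch(paths)
```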

2b) Rename detection is somehow ~10% cheaper for merge-ort than for
merge-recursive.  This was and is a big surprise to me.  Both of them
call diff_tree_oid() and diffcore_std() with the EXACT same inputs.  I
don't have an explanation, but it is very consistent even after
re-running many times.  Interestingly, the rename detection for the
first patch is more expensive (just barely) for merge-ort than
merge-recursive, and that is also consistent.  I won't investigate this
further, as I'm just going to focus on 1a & 2a.

3) just-one-mega

3a) Not much to say here; it just gives some flavor for how rebasing
only one patch compares to rebasing 35.

=== Goals ===

This patch is obviously just the beginning.  Here are some of my goals
that this measurement will help us achieve:

* Drive the cost of rename detection down considerably for merges
* After the above has been achieved, see if there are other slowness
  factors (which would have previously been overshadowed by rename
  detection costs) which we can then focus on and also optimize.
* Ensure our rebase testcase that requires little rename detection
  is noticeably faster with merge-ort than with apply-based rebase.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c |  8 +++++++
 merge-ort.c       | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

Message ID	20210108205111.2197944-2-newren@gmail.com (mailing list archive)
State	Superseded
Headers	From: Elijah Newren <newren@gmail.com>
	To: git@vger.kernel.org
	Cc: Derrick Stolee <dstolee@microsoft.com>, Jeff King <peff@peff.net>, Jonathan Nieder <jrnieder@gmail.com>, Jonathan Tan <jonathantanmy@google.com>, Taylor Blau <me@ttaylorr.com>, Elijah Newren <newren@gmail.com>
	Subject: [PATCH 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls
	Date: Fri, 8 Jan 2021 12:51:11 -0800
	In-Reply-To: <20210108205111.2197944-1-newren@gmail.com>
Series	And so it begins...merge/rename performance work
