From patchwork Tue May  4 02:12:07 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren <newren@gmail.com>
X-Patchwork-Id: 12237297
Return-Path: <git-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 7772AC433ED
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:27 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 5924B61090
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:27 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229698AbhEDCNT (ORCPT <rfc822;git@archiver.kernel.org>);
        Mon, 3 May 2021 22:13:19 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36200 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229665AbhEDCNS (ORCPT <rfc822;git@vger.kernel.org>);
        Mon, 3 May 2021 22:13:18 -0400
Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com
 [IPv6:2a00:1450:4864:20::42e])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E8428C06174A
        for <git@vger.kernel.org>; Mon,  3 May 2021 19:12:23 -0700 (PDT)
Received: by mail-wr1-x42e.google.com with SMTP id l13so5968105wru.11
        for <git@vger.kernel.org>; Mon, 03 May 2021 19:12:23 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=message-id:in-reply-to:references:from:date:subject:fcc
         :content-transfer-encoding:mime-version:to:cc;
        bh=zXSwx3+sF61LG1Vzw98++skHdpw2Mzzq9AIj1bC/DAg=;
        b=toN9pYwEhQHZMHuIvbLQAEmckSkRP5niGReBp/rQaHLDqj9S6fq2/10fYypV3Z1lpa
         rn0doOQe53aFdVQ3XeVy0+vMy+RU82mlUyS0dRtUKgEDi0yRLWPxxedHxqVKgSvmgKQ5
         MwvWkbDQBs9O9x4iq1xrXyylTKh3B1BRDMKWWINZ/ffPPiEsFMLQpH3C4S3NsDpjaP54
         yOkbmnWgKM+atpBISlO8deKGme4tvYadmrpTnPhXzJfxGhrxxWDCG8pCTShwrS6TJ2HT
         B5d+GR2+egaMB/3XVfRJAPs/WmjC14i5dd6V62To3iZWecP/apDVPR9qN5fQmPRtw4mw
         AseQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:message-id:in-reply-to:references:from:date
         :subject:fcc:content-transfer-encoding:mime-version:to:cc;
        bh=zXSwx3+sF61LG1Vzw98++skHdpw2Mzzq9AIj1bC/DAg=;
        b=Mxd18zOOeA3v20mXXymgCx0XpuLeO0Mc4uusXKd+PIXO+ENXEbG0fOZS6E/GwH9xzR
         iV2F9v+Qq5Sn3kWnJl+epW35vml8P+f7mU3jUjDyImnZHC0MdhmP8K/21k2X+qQCP2I1
         TXczsnh+txu1NoHWG/mK28g8vBc7o4GqNvx+mbr4yKpERaZOJ8MI3kHo/hxvINhQnW/9
         TI1zZlqhutqPUGzrSizawieXd9eO5S+uOHpIOjAMWEi5or7dbbsAhj2XxHaa2TAqZsar
         KojGBDpC0cyu8zClN0zF3Y8+gT8T+3i5D/A3MIBzuMI7Trp0gZVSSCkGQ2q9+ZlfQLEg
         QRyg==
X-Gm-Message-State: AOAM5322+g7aBlZ+gf11p6nPtLsZTBthZn+RdvofYHtXjbFEi13N9xBF
        ikz8wHIV0CGtDzUZ4jKO2KhPehim8zM=
X-Google-Smtp-Source: 
 ABdhPJx30mX48RDYnycq5qbWNJ3qLvAgFx/90BS5JDKBgVf1/BLJxxOYBdGYkntM/45UfxVQmLCIzA==
X-Received: by 2002:a05:6000:1449:: with SMTP id
 v9mr28803432wrx.82.1620094342697;
        Mon, 03 May 2021 19:12:22 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id
 d9sm5348516wrp.47.2021.05.03.19.12.22
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 03 May 2021 19:12:22 -0700 (PDT)
Message-Id: 
 <129136a10c9d37c9cd3d875dfbbd91c452c37554.1620094339.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
References: <pull.859.git.1616621553.gitgitgadget@gmail.com>
        <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
Date: Tue, 04 May 2021 02:12:07 +0000
Subject: [PATCH v2 01/13] t6423: rename file within directory that other side
 renamed
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
        Jonathan Tan <jonathantanmy@google.com>,
        Taylor Blau <me@ttaylorr.com>,
        Elijah Newren <newren@gmail.com>,
        Derrick Stolee <stolee@gmail.com>,
        Elijah Newren <newren@gmail.com>,
        Elijah Newren <newren@gmail.com>
Precedence: bulk
List-ID: <git.vger.kernel.org>
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren <newren@gmail.com>

From: Elijah Newren <newren@gmail.com>

Add a new testcase where one side of history renames:
   olddir/ -> newdir/
and the other side of history renames:
   olddir/a -> olddir/alpha

When using merge.directoryRenames=true, it seems logical to expect the
file to end up at newdir/alpha.  Unfortunately, both merge-recursive and
merge-ort currently see this as a rename/rename conflict:

   olddir/a -> newdir/a
vs.
   olddir/a -> newdir/alpha

Suggesting that there's some extra logic we probably want to add
somewhere to allow this case to run without triggering a conflict.  For
now simply document this known issue.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t6423-merge-rename-directories.sh | 58 +++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
index 7134769149fc..be84d22419d9 100755
--- a/t/t6423-merge-rename-directories.sh
+++ b/t/t6423-merge-rename-directories.sh
@@ -4966,6 +4966,64 @@ test_expect_success '12g: Testcase with two kinds of "relevant" renames' '
 	)
 '
 
+# Testcase 12h, Testcase with two kinds of "relevant" renames
+#   Commit O: olddir/{a_1, b}
+#   Commit A: newdir/{a_2, b}
+#   Commit B: olddir/{alpha_1, b}
+#   Expected: newdir/{alpha_2, b}
+
+test_setup_12h () {
+	test_create_repo 12h &&
+	(
+		cd 12h &&
+
+		mkdir olddir &&
+		test_seq 3 8 >olddir/a &&
+		>olddir/b &&
+		git add olddir &&
+		git commit -m orig &&
+
+		git branch O &&
+		git branch A &&
+		git branch B &&
+
+		git switch A &&
+		test_seq 3 10 >olddir/a &&
+		git add olddir/a &&
+		git mv olddir newdir &&
+		git commit -m A &&
+
+		git switch B &&
+
+		git mv olddir/a olddir/alpha &&
+		git commit -m B
+	)
+}
+
+test_expect_failure '12h: renaming a file within a renamed directory' '
+	test_setup_12h &&
+	(
+		cd 12h &&
+
+		git checkout A^0 &&
+
+		test_might_fail git -c merge.directoryRenames=true merge -s recursive B^0 &&
+
+		git ls-files >tracked &&
+		test_line_count = 2 tracked &&
+
+		test_path_is_missing olddir/a &&
+		test_path_is_file newdir/alpha &&
+		test_path_is_file newdir/b &&
+
+		git rev-parse >actual \
+			HEAD:newdir/alpha  HEAD:newdir/b &&
+		git rev-parse >expect \
+			A:newdir/a         O:oldir/b &&
+		test_cmp expect actual
+	)
+'
+
 ###########################################################################
 # SECTION 13: Checking informational and conflict messages
 #

From patchwork Tue May  4 02:12:08 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren <newren@gmail.com>
X-Patchwork-Id: 12237305
Return-Path: <git-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 29496C433B4
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:30 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 045066135F
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:29 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229737AbhEDCNX (ORCPT <rfc822;git@archiver.kernel.org>);
        Mon, 3 May 2021 22:13:23 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36220 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229723AbhEDCNV (ORCPT <rfc822;git@vger.kernel.org>);
        Mon, 3 May 2021 22:13:21 -0400
Received: from mail-wr1-x42d.google.com (mail-wr1-x42d.google.com
 [IPv6:2a00:1450:4864:20::42d])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 29962C061761
        for <git@vger.kernel.org>; Mon,  3 May 2021 19:12:26 -0700 (PDT)
Received: by mail-wr1-x42d.google.com with SMTP id h4so7550481wrt.12
        for <git@vger.kernel.org>; Mon, 03 May 2021 19:12:26 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=message-id:in-reply-to:references:from:date:subject:fcc
         :content-transfer-encoding:mime-version:to:cc;
        bh=sTEqJ82I+6KIpx8o1efNz5CWuT3px5YAGUFJJ8sthkM=;
        b=PJUhUuYqC6hgMpwNdLyAuFT00W9AAAMvCAvoWFsP5FzjCsiT6OO+gndNDKSjubk9Ip
         YNLJpusZCJD2Uvhf6MIt8lmChhsO083eG8Pj/yPlZ7zB5b17QFLaq26H0qbw73SFh4fd
         0+x4QI/6sieKwE6RXTG4mKpfOBKvZB8uhO0utwEWUds2oVORxpiSU0T+F7aH4gyp1KbL
         PNj4jWzHUHNPcGjBc3vROM0deoGkfZ5wVec3L3Z4Uz+kL5od08JOASpU1wBjsa8DUQuS
         UKr4FSOcB7GgEdJfayNX+e5NomgE+8qHPRuI6us02xkSoE0hLNt5AGwbEfoNr2AwvWoO
         uwkw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:message-id:in-reply-to:references:from:date
         :subject:fcc:content-transfer-encoding:mime-version:to:cc;
        bh=sTEqJ82I+6KIpx8o1efNz5CWuT3px5YAGUFJJ8sthkM=;
        b=mgvPqJOtOxb44V4BsdP93Ab9qrh8EwHoBDjqSbmxAwGsn/k9Nre7gC4wIwvGDg9mLJ
         iIroZiq6i2rvuYEUz5Scu/yblYShlukBVTnTmaHW9iFGFkUWsWwG+2TMg8lowXzyIXpZ
         Pk5qK2K0v+hQzLZoeAjoihZ7yyw20PTiDtFtaRiCBD3nx6TMcD6VEq3DQvMyJiBMsr0m
         fA7tBvK2tsKIsCDRei4QRVpd2G7Rb2seTZumYTjckxU/sbE7RRb2uNIthdaOkCyYbn7+
         T2wlJJyBQuyPUn725knMmJFLfmdCCNTdFK1Rg3+utqdLdwLyfkQMLZ5xpdy8obfsOBCi
         WhGQ==
X-Gm-Message-State: AOAM531I/YaRsPaP2DqreQUCiVTqtGGWFQH6Oar/mNXxYDoN3pk/enyA
        y2zqj3ifi+bveT+X1D3NtuVLv64C4iM=
X-Google-Smtp-Source: 
 ABdhPJzzxnJ4jVbQJu13SQIxGhM3fblAYxDx2F0FL9wkBFzmXVbrwU/4nvdfF4iqDxl6/cLW0aNDjA==
X-Received: by 2002:a05:6000:144:: with SMTP id
 r4mr27941715wrx.128.1620094343417;
        Mon, 03 May 2021 19:12:23 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id
 f25sm14787146wrd.67.2021.05.03.19.12.22
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 03 May 2021 19:12:23 -0700 (PDT)
Message-Id: 
 <027c446286d9a7679af61cbd2939d68ae7bc4563.1620094339.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
References: <pull.859.git.1616621553.gitgitgadget@gmail.com>
        <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
Date: Tue, 04 May 2021 02:12:08 +0000
Subject: [PATCH v2 02/13] Documentation/technical: describe remembering
 renames optimization
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
        Jonathan Tan <jonathantanmy@google.com>,
        Taylor Blau <me@ttaylorr.com>,
        Elijah Newren <newren@gmail.com>,
        Derrick Stolee <stolee@gmail.com>,
        Elijah Newren <newren@gmail.com>,
        Elijah Newren <newren@gmail.com>
Precedence: bulk
List-ID: <git.vger.kernel.org>
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren <newren@gmail.com>

From: Elijah Newren <newren@gmail.com>

Remembering renames on the upstream side of history in an early merge of
a rebase or cherry-pick for re-use in a latter merge of the same
operation makes pretty good intuitive sense.  However, trying to show
that it doesn't cause some subtle behavioral difference or some funny
edge or corner case is much more involved.  And, in fact, it does
introduce a subtle behavioral change.

Document all the assumptions, special cases, and logic involved in such
an optimization, and describe why this optimization is safe under the
current optimizations/features/etc. -- even when the subtle behavioral
change is triggered.

Part of the point of adding this document that goes over the
optimization in such laborious detail, is that it is possible that
significant future changes (optimizations or feature changes) could
interact with this optimization in interesting ways; this document is
here to help folks making big changes sanity check that the assumptions
and arguments underlying this optimization are still valid.  (As a side
note, creating this document forced me to review things in sufficient
detail that I found I was not properly caching directory-rename-induced
renames, resulting in the code not being aware of those renames and
causing unnecessary diffcore_rename_extended() calls in subsequent
merges.)

A subsequent commit will add several testcases based on this document
meant to stress-test the implementation and also document the case with
the subtle behavioral change.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 .../technical/remembering-renames.txt         | 669 ++++++++++++++++++
 1 file changed, 669 insertions(+)
 create mode 100644 Documentation/technical/remembering-renames.txt

diff --git a/Documentation/technical/remembering-renames.txt b/Documentation/technical/remembering-renames.txt
new file mode 100644
index 000000000000..e7c298d83df2
--- /dev/null
+++ b/Documentation/technical/remembering-renames.txt
@@ -0,0 +1,669 @@
+Rebases and cherry-picks involve a sequence of merges whose results are
+recorded as new single-parent commits.  The first parent side of those
+merges represent the "upstream" side, and often include a far larger set of
+changes than the second parent side.  Traditionally, the renames on the
+first-parent side of that sequence of merges were repeatedly re-detected
+for every merge.  This file explains why it is safe and effective during
+rebases and cherry-picks to remember renames on the upstream side of
+history as an optimization, assuming all merges are automatic and clean
+(i.e. no conflicts and not interrupted for user input or editing).
+
+Outline:
+
+  0. Assumptions
+
+  1. How rebasing and cherry-picking work
+
+  2. Why the renames on MERGE_SIDE1 in any given pick are *always* a
+     superset of the renames on MERGE_SIDE1 for the next pick.
+
+  3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also
+     a rename on MERGE_SIDE1 for the next pick
+
+  4. A detailed description of the the counter-examples to #3.
+
+  5. Why the special cases in #4 are still fully reasonable to use to pair
+     up files for three-way content merging in the merge machinery, and why
+     they do not affect the correctness of the merge.
+
+  6. Interaction with skipping of "irrelevant" renames
+
+  7. Additional items that need to be cached
+
+  8. How directory rename detection interacts with the above and why this
+     optimization is still safe even if merge.directoryRenames is set to
+     "true".
+
+
+=== 0. Assumptions ===
+
+There are two assumptions that will hold throughout this document:
+
+  * The upstream side where commits are transplanted to is treated as the
+    first parent side when rebase/cherry-pick call the merge machinery
+
+  * All merges are fully automatic
+
+and a third that will hold in sections 2-5 for simplicity, that I'll later
+address in section 8:
+
+  * No directory renames occur
+
+
+Let me explain more about each assumption and why I include it:
+
+
+The first assumption is merely for the purposes of making this document
+clearer; the optimization implementation does not actually depend upon it.
+However, the assumption does hold in all cases because it reflects the way
+that both rebase and cherry-pick where implemented; and the implementation
+of cherry-pick and rebase are not readily changeable for backwards
+compatibility reasons (see for example the discussion of the --ours and
+--theirs flag in the documentation of `git checkout`, particularly the
+comments about how they behave with rebase).  The optimization avoids
+checking first-parent-ness, though.  It checks the conditions that make the
+optimization valid instead, so it would still continue working if someone
+changed the parent ordering that cherry-pick and rebase use.  But making
+this assumption does make this document much clearer and prevents me from
+having to repeat every example twice.
+
+If the second assumption is violated, then the optimization simply is
+turned off and thus isn't relevant to consider.  The second assumption can
+also be stated as "there is no interruption for a user to resolve conflicts
+or to just further edit or tweak files".  While real rebases and
+cherry-picks are often interrupted (either because it's an interactive
+rebase where the user requested to stop and edit, or because there were
+conflicts that the user needs to resolve), the cache of renames is not
+stored on disk, and thus is thrown away as soon as the rebase or cherry
+pick stops for the user to resolve the operation.
+
+The third assumption makes sections 2-5 simpler, and allows people to
+understand the basics of why this optimization is safe and effective, and
+then I can go back and address the specifics in section 8.  It is probably
+also worth noting that if directory renames do occur, then the default of
+merge.directoryRenames being set to "conflict" means that the operation
+will stop for users to resolve the conflicts and the cache will be thrown
+away, and thus that there won't be an optimization to apply.  So, the only
+reason we need to address directory renames specifically, is that some
+users will have set merge.directoryRenames to "true" to allow the merges to
+continue to proceed automatically.  The optimization is still safe with
+this config setting, but we have to discuss a few more cases to show why;
+this discussion is deferred until section 8.
+
+
+=== 1. How rebasing and cherry-picking work ===
+
+Consider the following setup (from the git-rebase manpage):
+
+		     A---B---C topic
+		    /
+	       D---E---F---G main
+
+After rebasing or cherry-picking topic onto main, this will appear as:
+
+			     A'--B'--C' topic
+			    /
+	       D---E---F---G main
+
+The way the commits A', B', and C' are created is through a series of
+merges, where rebase or cherry-pick sequentially uses each of the three
+A-B-C commits in a special merge operation.  Let's label the three commits
+in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2.  For
+this picture, the three commits for each of the three merges would be:
+
+To create A':
+   MERGE_BASE:   E
+   MERGE_SIDE1:  G
+   MERGE_SIDE2:  A
+
+To create B':
+   MERGE_BASE:   A
+   MERGE_SIDE1:  A'
+   MERGE_SIDE2:  B
+
+To create C':
+   MERGE_BASE:   B
+   MERGE_SIDE1:  B'
+   MERGE_SIDE2:  C
+
+Sometimes, folks are surprised that these three-way merges are done.  It
+can be useful in understanding these three-way merges to view them in a
+slightly different light.  For example, in creating C', you can view it as
+either:
+
+  * Apply the changes between B & C to B'
+  * Apply the changes between B & B' to C
+
+Conceptually the two statements above are the same as a three-way merge of
+B, B', and C, at least the parts before you decide to record a commit.
+
+
+=== 2. Why the renames on MERGE_SIDE1 in any given pick are always a ===
+===    superset of the renames on MERGE_SIDE1 for the next pick.     ===
+
+The merge machinery uses the filenames it is fed from MERGE_BASE,
+MERGE_SIDE1, and MERGE_SIDE2.  It will only move content to a different
+filename under one of three conditions:
+
+  * To make both pieces of a conflict available to a user during conflict
+    resolution (examples: directory/file conflict, add/add type conflict
+    such as symlink vs. regular file)
+
+  * When MERGE_SIDE1 renames the file.
+
+  * When MERGE_SIDE2 renames the file.
+
+First, let's remember what commits are involved in the first and second
+picks of the cherry-pick or rebase sequence:
+
+To create A':
+   MERGE_BASE:   E
+   MERGE_SIDE1:  G
+   MERGE_SIDE2:  A
+
+To create B':
+   MERGE_BASE:   A
+   MERGE_SIDE1:  A'
+   MERGE_SIDE2:  B
+
+So, in particular, we need to show that the renames between E and G are a
+superset of those between A and A'.
+
+A' is created by the first merge.  A' will only have renames for one of the
+three reasons listed above.  The first case, a conflict, results in a
+situation where the cache is dropped and thus this optimization doesn't
+take effect, so we need not consider that case.  The third case, a rename
+on MERGE_SIDE2 (i.e. from G to A), will show up in A' but it also shows up
+in A -- therefore when diffing A and A' that path does not show up as a
+rename.  The only remaining way for renames to show up in A' is for the
+rename to come from MERGE_SIDE1.  Therefore, all renames between A and A'
+are a subset of those between E and G.  Equivalently, all renames between E
+and G are a superset of those between A and A'.
+
+
+=== 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_   ===
+===    always also a rename on MERGE_SIDE1 for the next pick.        ===
+
+Let's again look at the first two picks:
+
+To create A':
+   MERGE_BASE:   E
+   MERGE_SIDE1:  G
+   MERGE_SIDE2:  A
+
+To create B':
+   MERGE_BASE:   A
+   MERGE_SIDE1:  A'
+   MERGE_SIDE2:  B
+
+Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e.
+any given rename from E to G.  Let's use the filenames 'oldfile' and
+'newfile' for demonstration purposes.  That first pick will function as
+follows; when the rename is detected, the merge machinery will do a
+three-way content merge of the following:
+    E:oldfile
+    G:newfile
+    A:oldfile
+and produce a new result:
+    A':newfile
+
+Note above that I've assumed that E->A did not rename oldfile.  If that
+side did rename, then we most likely have a rename/rename(1to2) conflict
+that will cause the rebase or cherry-pick operation to halt and drop the
+in-memory cache of renames and thus doesn't need to be considered further.
+In the special case that E->A does rename the file but also renames it to
+newfile, then there is no conflict from the renaming and the merge can
+succeed.  In this special case, the rename is not valid to cache because
+the second merge will find A:newfile in the MERGE_BASE.  So a
+rename/rename(1to1) needs to be specially handled by pruning renames from
+the cache and decrementing the dir_rename_counts in the current and leading
+directories associated with those renames.  Or, since these are really
+rare, one could just take the easy way out and disable the remembering
+renames optimization when a rename/rename(1to1) happens.
+
+The previous paragraph handled the cases for E->A renaming oldfile, let's
+continue assuming that oldfile is not renamed in A.
+
+As per the diagram for creating B', MERGE_SIDE1 involves the changes from A
+to A'.  So, we are curious whether A:oldfile and A':newfile will be viewed
+as renames.  Note that:
+
+  * There will be no A':oldfile (because there could not have been a
+    G:oldfile as we do not do break detection in the merge machinery and
+    G:newfile was detected as a rename, and by the construction of the
+    rename above that merged cleanly, the merge machinery will ensure there
+    is no 'oldfile' in the result).
+
+  * There will be no A:newfile (if there had been, we would have had a
+    rename/add conflict).
+
+  * Clearly A:oldfile and A':newfile are "related" (A':newfile came from a
+    clean three-way content merge involving A:oldfile).
+
+We can also expound on the third point above, by noting that three-way
+content merges can also be viewed as applying the differences between the
+base and one side to the other side.  Thus we can view A':newfile as
+having been created by taking the changes between E:oldfile and G:newfile
+(which were detected as being related, i.e. <50% changed) to A:oldfile.
+
+Thus A:oldfile and A':newfile are just as related as E:oldfile and
+G:newfile are -- they have exactly identical differences.  Since the latter
+were detected as renames, A:oldfile and A':newfile should also be
+detectable as renames almost always.
+
+
+=== 4. A detailed description of the counter-examples to #3.         ===
+
+We already noted in section 3 that rename/rename(1to1) (i.e. both sides
+renaming a file the same way) was one counter-example.  The more
+interesting bit, though, is why did we need to use the "almost" qualifier
+when stating that A:oldfile and A':newfile are "almost" always detectable
+as renames?
+
+Let's repeat an earlier point that section 3 made:
+
+  A':newfile was created by applying the changes between E:oldfile and
+  G:newfile to A:oldfile.  The changes between E:oldfile and G:newfile were
+  <50% of the size of E:oldfile.
+
+If those changes that were <50% of the size of E:oldfile are also <50% of
+the size of A:oldfile, then A:oldfile and A':newfile will be detectable as
+renames.  However, if there is a dramatic size reduction between E:oldfile
+and A:oldfile (but the changes between E:oldfile, G:newfile, and A:oldfile
+still somehow merge cleanly), then traditional rename detection would not
+detect A:oldfile and A':newfile as renames.
+
+Here's an example where that can happen:
+  * E:oldfile had 20 lines
+  * G:newfile added 10 new lines at the beginning of the file
+  * A:oldfile kept the first 3 lines of the file, and deleted all the rest
+then
+  => A':newfile would have 13 lines, 3 of which matches those in A:oldfile.
+E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and
+A':newfile would not be.
+
+
+=== 5. Why the special cases in #4 are still fully reasonable to use to    ===
+===    pair up files for three-way content merging in the merge machinery, ===
+===    and why they do not affect the correctness of the merge.            ===
+
+In the rename/rename(1to1) case, A:newfile and A':newfile are not renames
+since they use the *same* filename.  However, files with the same filename
+are obviously fine to pair up for three-way content merging (the merge
+machinery has never employed break detection).  The interesting
+counter-example case is thus not the rename/rename(1to1) case, but the case
+where A did not rename oldfile.  That was the case that we spent most of
+the time discussing in sections 3 and 4.  The remainder of this section
+will be devoted to that case as well.
+
+So, even if A:oldfile and A':newfile aren't detectable as renames, why is
+it still reasonable to pair them up for three-way content merging in the
+merge machinery?  There are multiple reasons:
+
+  * As noted in sections 3 and 4, the diff between A:oldfile and A':newfile
+    is *exactly* the same as the diff between E:oldfile and G:newfile.  The
+    latter pair were detected as renames, so it seems unlikely to surprise
+    users for us to treat A:oldfile and A':newfile as renames.
+
+  * In fact, "oldfile" and "newfile" were at one point detected as renames
+    due to how they were constructed in the E..G chain.  And we used that
+    information once already in this rebase/cherry-pick.  I think users
+    would be unlikely to be surprised at us continuing to treat the files
+    as renames and would quickly understand why we had done so.
+
+  * Marking or declaring files as renames is *not* the end goal for merges.
+    Merges use renames to determine which files make sense to be paired up
+    for three-way content merges.
+
+  * A:oldfile and A':newfile were _already_ paired up in a three-way
+    content merge; that is how A':newfile was created.  In fact, that
+    three-way content merge was clean.  So using them again in a later
+    three-way content merge seems very reasonable.
+
+However, the above is focusing on the common scenarios.  Let's try to look
+at all possible unusual scenarios and compare without the optimization to
+with the optimization.  Consider the following theoretical cases; we will
+the dive into each to determine which of them are possible,
+and if so, what they mean:
+
+  1. Without the optimization, the second merge results in a conflict.
+     With the optimization, the second merge also results in a conflict.
+     Questions: Are the conflicts confusingly different?  Better in one case?
+
+  2. Without the optimization, the second merge results in NO conflict.
+     With the optimization, the second merge also results in NO conflict.
+     Questions: Are the merges the same?
+
+  3. Without the optimization, the second merge results in a conflict.
+     With the optimization, the second merge results in NO conflict.
+     Questions: Possible?  Bug, bugfix, or something else?
+
+  4. Without the optimization, the second merge results in NO conflict.
+     With the optimization, the second merge results in a conflict.
+     Questions: Possible?  Bug, bugfix, or something else?
+
+I'll consider all four cases, but out of order.
+
+The fourth case is impossible.  For the code without the remembering
+renames optimization to not get a conflict, B:oldfile would need to exactly
+match A:oldfile -- if it doesn't, there would be a modify/delete conflict.
+If A:oldfile matches B:oldfile exactly, then a three-way content merge
+between A:oldfile, A':newfile, and B:oldfile would have no conflict and
+just give us the version of newfile from A' as the result.
+
+From the same logic as the above paragraph, the second case would indeed
+result in identical merges.  When A:oldfile exactly matches B:oldfile, an
+undetected rename would say, "Oh, I see one side didn't modify 'oldfile'
+and the other side deleted it.  I'll delete it.  And I see you have this
+brand new file named 'newfile' in A', so I'll keep it."  That gives the
+same results as three-way content merging A:oldfile, A':newfile, and
+B:oldfile -- a removal of oldfile with the version of newfile from A'
+showing up in the result.
+
+The third case is interesting.  It means that A:oldfile and A':newfile were
+not just similar enough, but that the changes between them did not conflict
+with the changes between A:oldfile and B:oldfile.  This would validate our
+hunch that the files were similar enough to be used in a three-way content
+merge, and thus seems entirely correct for us to have used them that way.
+(Sidenote: One particular example here may be enlightening.  Let's say that
+B was an immediate revert of A.  B clearly would have been a clean revert
+of A, since A was B's immediate parent.  One would assume that if you can
+pick a commit, you should also be able to cherry-pick its immediate revert.
+However, this is one of those funny corner cases; without this
+optimization, we just successfully picked a commit cleanly, but we are
+unable to cherry-pick its immediate revert due to the size differences
+between E:oldfile and A:oldfile.)
+
+That leaves only the first case to consider -- when we get conflicts both
+with or without the optimization.  Without the optimization, we'll have a
+modify/delete conflict, where both A':newfile and B:oldfile are left in the
+tree for the user to deal with and no hints about the potential similarity
+between the two.  With the optimization, we'll have a three-way content
+merged A:oldfile, A':newfile, and B:oldfile with conflict markers
+suggesting we though the files were related but giving the user the chance
+to resolve.  As noted above, I don't think users will find us treating
+'oldfile' and 'newfile' as related as a surprise since they were between E
+and G.  In any event, though, this case shouldn't be concerning since we
+hit a conflict in both cases, told the user what we know, and asked them to
+resolve it.
+
+So, in summary, case 4 is impossible, case 2 yields the same behavior, and
+cases 1 and 3 seem to provide as good or better behavior with the
+optimization than without.
+
+
+=== 6. Interaction with skipping of "irrelevant" renames ===
+
+Previous optimizations involved skipping rename detection for paths
+considered to be "irrelevant".  See for example the following commits:
+
+  * 32a56dfb99 ("merge-ort: precompute subset of sources for which we
+		need rename detection", 2021-03-11)
+  * 2fd9eda462 ("merge-ort: precompute whether directory rename
+		detection is needed", 2021-03-11)
+  * 9bd342137e ("diffcore-rename: determine which relevant_sources are
+		no longer relevant", 2021-03-13)
+
+However, relevance is always determined by what the _other_ side of
+history has done, in terms of modifing a file that our side renamed,
+or adding a file to a directory which our side renamed.  However, this
+means that a path that is "irrelevant" when picking the first commit
+of a series in a rebase or cherry-pick, may suddenly become "relevant"
+when picking the next commit.
+
+The upshot of this is that we can only cache rename detection results
+for relevant paths, and need to re-check relevance in subsequent
+commits.  If those subsequent commits have additional paths that are
+relevant for rename detection, then we will need to redo rename
+detection -- though we can limit it to the paths for which we have not
+already detected renames.
+
+
+=== 7. Additional items that need to be cached ===
+
+It turns out we have to cache more than just renames; we also cache:
+
+  A) non-renames (i.e. unpaired deletes)
+  B) counts of renames within directories
+  C) sources that were marked as RELEVANT_LOCATION, but which were
+     downgraded to RELEVANT_NO_MORE
+  D) the toplevel trees involved in the merge
+
+These are all stored in struct rename_info, and respectively appear in
+  * cached_pairs (along side actual renames, just with a value of NULL)
+  * dir_rename_counts
+  * cached_irrelevant
+  * merge_trees
+
+The reason for (A) comes from the irrelevant renames skipping
+optimization discussed in section 6.  The fact that irrelevant renames
+are skipped means we only get a subset of the potential renames
+detected and subsequent commits may need to run rename detection on
+the upstream side on a subset of the remaining renames (to get the
+renames that are relevant for that later commit).  Since unpaired
+deletes are involved in rename detection too, we don't want to
+repeatedly check that those paths remain unpaired on the upstream side
+with every commit we are transplanting.
+
+The reason for (B) is that diffcore_rename_extended() is what
+generates the counts of renames by directory which is needed in
+directory rename detection, and if we don't run
+diffcore_rename_extended() again then we need to have the output from
+it, including dir_rename_counts, from the previous run.
+
+The reason for (C) is that merge-ort's tree traversal will again think
+those paths are relevant (marking them as RELEVANT_LOCATION), but the
+fact that they were downgraded to RELEVANT_NO_MORE means that
+dir_rename_counts already has the information we need for directory
+rename detection.  (A path which becomes RELEVANT_CONTENT in a
+subsequent commit will be removed from cached_irrelevant.)
+
+The reason for (D) is that is how we determine whether the remember
+renames optimization can be used.  In particular, remembering that our
+sequence of merges looks like:
+
+   Merge 1:
+   MERGE_BASE:   E
+   MERGE_SIDE1:  G
+   MERGE_SIDE2:  A
+   => Creates    A'
+
+   Merge 2:
+   MERGE_BASE:   A
+   MERGE_SIDE1:  A'
+   MERGE_SIDE2:  B
+   => Creates    B'
+
+It is the fact that the trees A and A' appear both in Merge 1 and in
+Merge 2, with A as a parent of A' that allows this optimization.  So
+we store the trees to compare with what we are asked to merge next
+time.
+
+
+=== 8. How directory rename detection interacts with the above and   ===
+===    why this optimization is still safe even if                   ===
+===    merge.directoryRenames is set to "true".                      ===
+
+As noted in the assumptions section:
+
+    """
+    ...if directory renames do occur, then the default of
+    merge.directoryRenames being set to "conflict" means that the operation
+    will stop for users to resolve the conflicts and the cache will be
+    thrown away, and thus that there won't be an optimization to apply.
+    So, the only reason we need to address directory renames specifically,
+    is that some users will have set merge.directoryRenames to "true" to
+    allow the merges to continue to proceed automatically.
+    """
+
+Let's remember that we need to look at how any given pick affects the next
+one.  So let's again use the first two picks from the diagram in section
+one:
+
+  First pick does this three-way merge:
+    MERGE_BASE:   E
+    MERGE_SIDE1:  G
+    MERGE_SIDE2:  A
+    => creates A'
+
+  Second pick does this three-way merge:
+    MERGE_BASE:   A
+    MERGE_SIDE1:  A'
+    MERGE_SIDE2:  B
+    => creates B'
+
+Now, directory rename detection exists so that if one side of history
+renames a directory, and the other side adds a new file to the old
+directory, then the merge (with merge.directoryRenames=true) can move the
+file into the new directory.  There are two qualitatively different ways to
+add a new file to an old directory: create a new file, or rename a file
+into that directory.  Also, directory renames can be done on either side of
+history, so there are four cases to consider:
+
+  * MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to   old dir
+  * MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames  file into old dir
+  * MERGE_SIDE1 adds new file to   old dir, MERGE_SIDE2 renames old dir
+  * MERGE_SIDE1 renames  file into old dir, MERGE_SIDE2 renames old dir
+
+One last note before we consider these four cases: There are some
+important properties about how we implement this optimization with
+respect to directory rename detection that we need to bear in mind
+while considering all of these cases:
+
+  * rename caching occurs *after* applying directory renames
+
+  * a rename created by directory rename detection is recorded for the side
+    of history that did the directory rename.
+
+  * dir_rename_counts, the nested map of
+	{oldname => {newname => count}},
+    is cached between runs as well.  This basically means that directory
+    rename detection is also cached, though only on the side of history
+    that we cache renames for (MERGE_SIDE1 as far as this document is
+    concerned; see the assumptions section).  Two interesting sub-notes
+    about these counts:
+
+    * If we need to perform rename-detection again on the given side (e.g.
+      some paths are relevant for rename detection that weren't before),
+      then we clear dir_rename_counts and recompute it, making use of
+      cached_pairs.  The reason it is important to do this is optimizations
+      around RELEVANT_LOCATION exist to prevent us from computing
+      unnecessary renames for directory rename detection and from computing
+      dir_rename_counts for irrelevant directories; but those same renames
+      or directories may become necessary for subsequent merges.  The
+      easiest way to "fix up" dir_rename_counts in such cases is to just
+      recompute it.
+
+    * If we prune rename/rename(1to1) entries from the cache, then we also
+      need to update dir_rename_counts to decrement the counts for the
+      involved directory and any relevant parent directories (to undo what
+      update_dir_rename_counts() in diffcore-rename.c incremented when the
+      rename was initially found).  If we instead just disable the
+      remembering renames optimization when the exceedingly rare
+      rename/rename(1to1) cases occur, then dir_rename_counts will get
+      re-computed the next time rename detection occurs, as noted above.
+
+  * the side with multiple commits to pick, is the side of history that we
+    do NOT cache renames for.  Thus, there are no additional commits to
+    change the number of renames in a directory, except for those done by
+    directory rename detection (which always pad the majority).
+
+  * the "renames" we cache are modified slightly by any directory rename,
+    as noted below.
+
+Now, with those notes out of the way, let's go through the four cases
+in order:
+
+Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir
+
+  This case looks like this:
+
+    MERGE_BASE:   E,   Has olddir/
+    MERGE_SIDE1:  G,   Renames olddir/ -> newdir/
+    MERGE_SIDE2:  A,   Adds olddir/newfile
+    => creates    A',  With newdir/newfile
+
+    MERGE_BASE:   A,   Has olddir/newfile
+    MERGE_SIDE1:  A',  Has newdir/newfile
+    MERGE_SIDE2:  B,   Modifies olddir/newfile
+    => expected   B',  with threeway-merged newdir/newfile from above
+
+  In this case, with the optimization, note that after the first commit:
+    * MERGE_SIDE1 remembers olddir/ -> newdir/
+    * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
+  Given the cached rename noted above, the second merge can proceed as
+  expected without needing to perform rename detection from A -> A'.
+
+Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames  file into old dir
+
+  This case looks like this:
+    MERGE_BASE:   E    oldfile, olddir/
+    MERGE_SIDE1:  G    oldfile, olddir/ -> newdir/
+    MERGE_SIDE2:  A    oldfile -> olddir/newfile
+    => creates    A',  With newdir/newfile representing original oldfile
+
+    MERGE_BASE:   A    olddir/newfile
+    MERGE_SIDE1:  A'   newdir/newfile
+    MERGE_SIDE2:  B    modify olddir/newfile
+    => expected   B',  with threeway-merged newdir/newfile from above
+
+  In this case, with the optimization, note that after the first commit:
+    * MERGE_SIDE1 remembers olddir/ -> newdir/
+    * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
+		  (NOT oldfile -> newdir/newfile; compare to case with
+		   (p->status == 'R' && new_path) in possibly_cache_new_pair())
+
+  Given the cached rename noted above, the second merge can proceed as
+  expected without needing to perform rename detection from A -> A'.
+
+Case 3: MERGE_SIDE1 adds new file to   old dir, MERGE_SIDE2 renames old dir
+
+  This case looks like this:
+
+    MERGE_BASE:   E,   Has olddir/
+    MERGE_SIDE1:  G,   Adds olddir/newfile
+    MERGE_SIDE2:  A,   Renames olddir/ -> newdir/
+    => creates    A',  With newdir/newfile
+
+    MERGE_BASE:   A,   Has newdir/, but no notion of newdir/newfile
+    MERGE_SIDE1:  A',  Has newdir/newfile
+    MERGE_SIDE2:  B,   Has newdir/, but no notion of newdir/newfile
+    => expected   B',  with newdir/newfile from A'
+
+  In this case, with the optimization, note that after the first commit there
+  were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed.
+  But the second merge didn't need any renames so this is fine.
+
+Case 4: MERGE_SIDE1 renames  file into old dir, MERGE_SIDE2 renames old dir
+
+  This case looks like this:
+
+    MERGE_BASE:   E,   Has olddir/
+    MERGE_SIDE1:  G,   Renames oldfile -> olddir/newfile
+    MERGE_SIDE2:  A,   Renames olddir/ -> newdir/
+    => creates    A',  With newdir/newfile representing original oldfile
+
+    MERGE_BASE:   A,   Has oldfile
+    MERGE_SIDE1:  A',  Has newdir/newfile
+    MERGE_SIDE2:  B,   Modifies oldfile
+    => expected   B',  with threeway-merged newdir/newfile from above
+
+  In this case, with the optimization, note that after the first commit:
+    * MERGE_SIDE1 remembers oldfile -> newdir/newfile
+		  (NOT oldfile -> olddir/newfile; compare to case of second
+		   block under p->status == 'R' in possibly_cache_new_pair())
+    * MERGE_SIDE2 renames are tossed because only MERGE_SIDE1 is remembered
+
+  Given the cached rename noted above, the second merge can proceed as
+  expected without needing to perform rename detection from A -> A'.
+
+Finally, I'll just note here that interactions with the
+skip-irrelevant-renames optimization means we sometimes don't detect
+renames for any files within a directory that was renamed, in which
+case we will not have been able to detect any rename for the directory
+itself.  In such a case, we do not know whether the directory was
+renamed; we want to be careful to avoid cacheing some kind of "this
+directory was not renamed" statement.  If we did, then a subsequent
+commit being rebased could add a file to the old directory, and the
+user would expect it to end up in the correct directory -- something
+our erroneous "this directory was not renamed" cache would preclude.

From patchwork Tue May  4 02:12:09 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren <newren@gmail.com>
X-Patchwork-Id: 12237299
Return-Path: <git-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 7EB74C43461
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:28 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 590AF61168
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:28 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229726AbhEDCNV (ORCPT <rfc822;git@archiver.kernel.org>);
        Mon, 3 May 2021 22:13:21 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36210 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229665AbhEDCNU (ORCPT <rfc822;git@vger.kernel.org>);
        Mon, 3 May 2021 22:13:20 -0400
Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com
 [IPv6:2a00:1450:4864:20::433])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4EF89C061574
        for <git@vger.kernel.org>; Mon,  3 May 2021 19:12:25 -0700 (PDT)
Received: by mail-wr1-x433.google.com with SMTP id a4so7599892wrr.2
        for <git@vger.kernel.org>; Mon, 03 May 2021 19:12:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=message-id:in-reply-to:references:from:date:subject:fcc
         :content-transfer-encoding:mime-version:to:cc;
        bh=xXgdDHvUd75yXjM8blIM7wuwHKtZQ9KdEC4eYenxuok=;
        b=n5BlTY+KZFRf66WaNeTe+igI0CJB1vNqjZ94avX7RZrhgPVHmghPU7iDjal56zNr5l
         ec0lrDFtxDBKMugFQrNe/GMmLhsgSlkMlUhidUpWC79AETRBabqeM0FdfFFShLmtLpd+
         vulm0Qo6VRhPQ1E00Kyc3d4bCuECbBXe8J86tQ0nXH6N0ToulJ7LGDMC3J/0E9xapjoC
         OP89FKpf1tHxAzLM3jH7AZCGjWB3DA9bUZRoubv4mFS3iJZ/FkLiKDMEjFN4DJGGAmJc
         IWz8Oxb/p64X4B9lxe1Rdy5yohZj3m40N1a5YuyTA4vJlHrbtStWGjVQW4Ky4GEo6C/N
         J2Uw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:message-id:in-reply-to:references:from:date
         :subject:fcc:content-transfer-encoding:mime-version:to:cc;
        bh=xXgdDHvUd75yXjM8blIM7wuwHKtZQ9KdEC4eYenxuok=;
        b=o2X5BYBPb7k+ty7qx1luU+v8kmhkUafzd86MXV9ZM6Hg3iTFDfS+x4xxK/2o2CVv+L
         PKOEYMIMr6Ytjbmw3VbFe1pmUjBxAw83BH6aWUeSx4Z4bt41RbtSv5NG7mgysfyw68Sc
         L+ELLPbyhP/99+lNeKo6oT8SEwwOEWAD6szQuzVvT0SjIvEb9uTfwYsWvYck9jsrqs65
         FqD9Gm08h4vYAWtoiJUnjNXefGSoZMeo5EL3MmMqb6md/mi8YYGmVeeeppHWC215t9pB
         1XaDlojYTQh7t1o+Mfxwby38nZrI/h9bVHMml9PzNWDcg9mvH8tkGcRJEaG01TMon9uJ
         cmog==
X-Gm-Message-State: AOAM530SAgwGL5aKaqpZ7shO1t3+9PfoHXnmGDNmCEqcOxtF5sIRtlMD
        +/DloR6/gR12gPfJz9X7q2A2xjXuAcE=
X-Google-Smtp-Source: 
 ABdhPJy6DKDmQ0k3bL+Cx0Nc0TsMi6UBkopgVq3iE4m2ogLaFgeVJ+3Nh/p3TBaF9pcYTBKgFyA9kg==
X-Received: by 2002:a5d:6551:: with SMTP id
 z17mr15653425wrv.372.1620094344109;
        Mon, 03 May 2021 19:12:24 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id
 q12sm14137241wrx.17.2021.05.03.19.12.23
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 03 May 2021 19:12:23 -0700 (PDT)
Message-Id: 
 <7dc273d458ea4e2b64706dcb9ce043abd0983b65.1620094339.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
References: <pull.859.git.1616621553.gitgitgadget@gmail.com>
        <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
Date: Tue, 04 May 2021 02:12:09 +0000
Subject: [PATCH v2 03/13] fast-rebase: change assert() to BUG()
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
        Jonathan Tan <jonathantanmy@google.com>,
        Taylor Blau <me@ttaylorr.com>,
        Elijah Newren <newren@gmail.com>,
        Derrick Stolee <stolee@gmail.com>,
        Elijah Newren <newren@gmail.com>,
        Elijah Newren <newren@gmail.com>
Precedence: bulk
List-ID: <git.vger.kernel.org>
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren <newren@gmail.com>

From: Elijah Newren <newren@gmail.com>

assert() can succinctly document expectations for the code, and do so in
a way that may be useful to future folks trying to refactor the code and
change basic assumptions; it allows them to more quickly find some
places where their violations of previous assumptions trips things up.

Unfortunately, assert() can surround a function call with important
side-effects, which is a huge mistake since some users will compile with
assertions disabled.  I've had to debug such mistakes before in other
codebases, so I should know better.  Luckily, this was only in test
code, but it's still very embarrassing.  Change an assert() to an if
(...) BUG (...).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/helper/test-fast-rebase.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/helper/test-fast-rebase.c b/t/helper/test-fast-rebase.c
index 373212256a66..39fb7f41e8c1 100644
--- a/t/helper/test-fast-rebase.c
+++ b/t/helper/test-fast-rebase.c
@@ -124,7 +124,8 @@ int cmd__fast_rebase(int argc, const char **argv)
 	assert(oideq(&onto->object.oid, &head));
 
 	hold_locked_index(&lock, LOCK_DIE_ON_ERROR);
-	assert(repo_read_index(the_repository) >= 0);
+	if (repo_read_index(the_repository) < 0)
+		BUG("Could not read index");
 
 	repo_init_revisions(the_repository, &revs, NULL);
 	revs.verbose_header = 1;

From patchwork Tue May  4 02:12:10 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren <newren@gmail.com>
X-Patchwork-Id: 12237301
Return-Path: <git-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id AFE65C43460
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:29 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 91A37611CB
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:29 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229727AbhEDCNW (ORCPT <rfc822;git@archiver.kernel.org>);
        Mon, 3 May 2021 22:13:22 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36218 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229722AbhEDCNU (ORCPT <rfc822;git@vger.kernel.org>);
        Mon, 3 May 2021 22:13:20 -0400
Received: from mail-wr1-x430.google.com (mail-wr1-x430.google.com
 [IPv6:2a00:1450:4864:20::430])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 28719C06174A
        for <git@vger.kernel.org>; Mon,  3 May 2021 19:12:26 -0700 (PDT)
Received: by mail-wr1-x430.google.com with SMTP id n2so7617438wrm.0
        for <git@vger.kernel.org>; Mon, 03 May 2021 19:12:26 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=message-id:in-reply-to:references:from:date:subject:fcc
         :content-transfer-encoding:mime-version:to:cc;
        bh=gUnXWMvnUu3h+DNbCL2dlFWVnyBiQFKDUiNGfM7jfOE=;
        b=ROBgGiKSGJrakczz0JXlDaVqy6xTWBhf1OOCzjZ0n6+sVVBDJ5pf8FbnrbaTUNkZ8S
         dgbvJe8iacuWZ/pbQ0/+YGSoAlWFNT4CM4yfBB0nSHhYU37NifAA5uwRkgTKyWf4p2bW
         cNOQIbo5Y8oDbaSmjrFnpTogiaXO/uoHtXx+cFVxWYBMokTLlJK+RE0OZiHdx2hWnyqI
         STkoDd2sHp6HCgeMbioh35NuDG9xnaSenC2rL/sm16x8IeA5WQ9OTTYPZ9pQnIaCL4b6
         KHDV+jPF28TVjFNYJ+/MYkpRxZDxNw8uGNBpj+1xfdfWSVCjnHLFWhsm0LjJRPn7hb9j
         PkcQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:message-id:in-reply-to:references:from:date
         :subject:fcc:content-transfer-encoding:mime-version:to:cc;
        bh=gUnXWMvnUu3h+DNbCL2dlFWVnyBiQFKDUiNGfM7jfOE=;
        b=EpEM4migRK1vogwcXFxILC3cvgdzEl6jsSuCjX2ckAGbnAKtySa4Z/pbFFH0xNrL5U
         IVtUr5TEDTgmqlGJzaGgu1dZ42ren7vDA06kHnYjocsaknJQfxtqplqxBPCsfISFjLyp
         d+bnkVwu+BV61/gnUJL5Z1Tga7sf78M4E5Op+egharGWE60FftWeOn5ixwPQHt+onB66
         w4lP/NnGnsH+1IDHyUm0WDaMl8JjrD7h/EocoalUYdjw7QVrtXJm9Mw6qCCpgT54ZJ/o
         dr5xvvanP21RyxZ5sh1C8Ybeb2oVV7wuPldWY5CoNRyuX9W0trScxqmsyiNEGn7CHxIK
         WvaQ==
X-Gm-Message-State: AOAM531NU0WnggDdtzHeBQlVwgVZqoq3gkQRVGPr/P/asTCC1bRVTjnq
        kKiXOVBOxBL4Pusx6Xmmmshxn4TPZTE=
X-Google-Smtp-Source: 
 ABdhPJxpRIqBGXRZp15x47cb2AmoTcKPbSRTQnwoF8n0CDZoIYKmqOC4v7zgUtf0Dlr/s20hzTvsMw==
X-Received: by 2002:adf:f80f:: with SMTP id
 s15mr28470133wrp.341.1620094344917;
        Mon, 03 May 2021 19:12:24 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id
 j13sm19217954wrd.81.2021.05.03.19.12.24
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 03 May 2021 19:12:24 -0700 (PDT)
Message-Id: 
 <887b151c26ff0f175f2da431d77cd377bd066990.1620094339.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
References: <pull.859.git.1616621553.gitgitgadget@gmail.com>
        <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
Date: Tue, 04 May 2021 02:12:10 +0000
Subject: [PATCH v2 04/13] fast-rebase: write conflict state to working tree,
 index, and HEAD
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
        Jonathan Tan <jonathantanmy@google.com>,
        Taylor Blau <me@ttaylorr.com>,
        Elijah Newren <newren@gmail.com>,
        Derrick Stolee <stolee@gmail.com>,
        Elijah Newren <newren@gmail.com>,
        Elijah Newren <newren@gmail.com>
Precedence: bulk
List-ID: <git.vger.kernel.org>
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren <newren@gmail.com>

From: Elijah Newren <newren@gmail.com>

Previously, when fast-rebase hit a conflict, it simply aborted and left
HEAD, the index, and the working tree where they were before the
operation started.  While fast-rebase does not support restarting from a
conflicted state, write the conflicted state out anyway as it gives us a
way to see what the conflicts are and write tests that check for them.

This will be important in the upcoming commits, because sequencer.c is
only superficially integrated with merge-ort.c; in particular, it calls
merge_switch_to_result() after EACH merge instead of only calling it at
the end of all the sequence of merges (or when a conflict is hit).  This
not only causes needless updates to the working copy and index, but also
causes all intermediate data to be freed and tossed, preventing caching
information from one merge to the next.  However, integrating
sequencer.c more deeply with merge-ort.c is a big task, and making this
small extension to fast-rebase.c provides us with a simple way to test
the edge and corner cases that we want to make sure continue working.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/helper/test-fast-rebase.c | 51 +++++++++++++++++++++++--------------
 1 file changed, 32 insertions(+), 19 deletions(-)

diff --git a/t/helper/test-fast-rebase.c b/t/helper/test-fast-rebase.c
index 39fb7f41e8c1..fc2d46090434 100644
--- a/t/helper/test-fast-rebase.c
+++ b/t/helper/test-fast-rebase.c
@@ -91,7 +91,6 @@ int cmd__fast_rebase(int argc, const char **argv)
 	struct commit *last_commit = NULL, *last_picked_commit = NULL;
 	struct object_id head;
 	struct lock_file lock = LOCK_INIT;
-	int clean = 1;
 	struct strvec rev_walk_args = STRVEC_INIT;
 	struct rev_info revs;
 	struct commit *commit;
@@ -176,11 +175,10 @@ int cmd__fast_rebase(int argc, const char **argv)
 		free((char*)merge_opt.ancestor);
 		merge_opt.ancestor = NULL;
 		if (!result.clean)
-			die("Aborting: Hit a conflict and restarting is not implemented.");
+			break;
 		last_picked_commit = commit;
 		last_commit = create_commit(result.tree, commit, last_commit);
 	}
-	fprintf(stderr, "\nDone.\n");
 	/* TODO: There should be some kind of rev_info_free(&revs) call... */
 	memset(&revs, 0, sizeof(revs));
 
@@ -189,24 +187,39 @@ int cmd__fast_rebase(int argc, const char **argv)
 	if (result.clean < 0)
 		exit(128);
 
-	strbuf_addf(&reflog_msg, "finish rebase %s onto %s",
-		    oid_to_hex(&last_picked_commit->object.oid),
-		    oid_to_hex(&last_commit->object.oid));
-	if (update_ref(reflog_msg.buf, branch_name.buf,
-		       &last_commit->object.oid,
-		       &last_picked_commit->object.oid,
-		       REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR)) {
-		error(_("could not update %s"), argv[4]);
-		die("Failed to update %s", argv[4]);
+	if (result.clean) {
+		fprintf(stderr, "\nDone.\n");
+		strbuf_addf(&reflog_msg, "finish rebase %s onto %s",
+			    oid_to_hex(&last_picked_commit->object.oid),
+			    oid_to_hex(&last_commit->object.oid));
+		if (update_ref(reflog_msg.buf, branch_name.buf,
+			       &last_commit->object.oid,
+			       &last_picked_commit->object.oid,
+			       REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR)) {
+			error(_("could not update %s"), argv[4]);
+			die("Failed to update %s", argv[4]);
+		}
+		if (create_symref("HEAD", branch_name.buf, reflog_msg.buf) < 0)
+			die(_("unable to update HEAD"));
+		strbuf_release(&reflog_msg);
+		strbuf_release(&branch_name);
+
+		prime_cache_tree(the_repository, the_repository->index,
+				 result.tree);
+	} else {
+		fprintf(stderr, "\nAborting: Hit a conflict.\n");
+		strbuf_addf(&reflog_msg, "rebase progress up to %s",
+			    oid_to_hex(&last_picked_commit->object.oid));
+		if (update_ref(reflog_msg.buf, "HEAD",
+			       &last_commit->object.oid,
+			       &head,
+			       REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR)) {
+			error(_("could not update %s"), argv[4]);
+			die("Failed to update %s", argv[4]);
+		}
 	}
-	if (create_symref("HEAD", branch_name.buf, reflog_msg.buf) < 0)
-		die(_("unable to update HEAD"));
-	strbuf_release(&reflog_msg);
-	strbuf_release(&branch_name);
-
-	prime_cache_tree(the_repository, the_repository->index, result.tree);
 	if (write_locked_index(&the_index, &lock,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
 		die(_("unable to write %s"), get_index_file());
-	return (clean == 0);
+	return (result.clean == 0);
 }

From patchwork Tue May  4 02:12:11 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren <newren@gmail.com>
X-Patchwork-Id: 12237309
Return-Path: <git-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 4F008C43461
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:33 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 2EAC9611CD
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:33 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229770AbhEDCN0 (ORCPT <rfc822;git@archiver.kernel.org>);
        Mon, 3 May 2021 22:13:26 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36222 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229732AbhEDCNW (ORCPT <rfc822;git@vger.kernel.org>);
        Mon, 3 May 2021 22:13:22 -0400
Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com
 [IPv6:2a00:1450:4864:20::42e])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 27F20C061763
        for <git@vger.kernel.org>; Mon,  3 May 2021 19:12:27 -0700 (PDT)
Received: by mail-wr1-x42e.google.com with SMTP id h4so7550529wrt.12
        for <git@vger.kernel.org>; Mon, 03 May 2021 19:12:27 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=message-id:in-reply-to:references:from:date:subject:fcc
         :content-transfer-encoding:mime-version:to:cc;
        bh=c+UKrxYhl2gCd0tRi0F4OvfFV2iYJN0KI6IcSt63GXg=;
        b=OyM6rwFFPAYpes1i51cplsqttjA9ajvs2MmK3rwzknEOXck+nZMdMbk8ORzthyXKbe
         i40edbjzFhYTScFATXkCqhx7gJOgaypfOfxvkf2Su7nlOwLv6OjSahGCF7Rss7VI9Tpn
         M5Cd89CFxFj+BLOuuPbOeKsgaZpoL+TmKBbDiBxz73w09B8ZxsgKNQk5hPNYS22aZAUI
         iW3B8OmcJiInkQshFiZg/YcGQHQjBlXRda5Va54VWeBQ3C6ftAYMuc+JOB4PgtREI6Tr
         NU1hEPcs9xjzT486c9sm+c3UNXl3RggcRwvtef6gECqta1F07eFdXhghIfYpGhlod8rU
         pZIw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:message-id:in-reply-to:references:from:date
         :subject:fcc:content-transfer-encoding:mime-version:to:cc;
        bh=c+UKrxYhl2gCd0tRi0F4OvfFV2iYJN0KI6IcSt63GXg=;
        b=cGxEXIvwjuXauFUhg7ntr9aZQYq3Lb818UYt+gXQxHyJVrRSB+uf1lnseTnce+MNB6
         Ut4X5LzfpC+dhyfXio6q3nGB3ZmwbYx7ewPxTGg8u2AcBAKlP013SSvT2m6zfTW2WWnE
         Gy2HoODO+aHyZ1oiFYg/ZJ8UdxoT1P6zPjZsWr2rcSnLgQNSYeiUvfwETAodqAlJnUSP
         l5oiDAUNAKWYCXLoZErYDpPploRevmIwGS0Duqib+uVnImQEK+KM1xXoQvOn9FuRhuIG
         cYLTYANfyLPmZluo+a4Ht6qb0uKGRIJ4El4TZipTQm+w7kJdCiUMmBWg9CnaRBSj+GCO
         l3mg==
X-Gm-Message-State: AOAM533xPzc0etv1WHvUF0wdxO9k0KwaTbPmZAhXupPciQoQQ3XktnhL
        ssP/hKWC3HH7Lp9C/ZzUKEEZTZrOdis=
X-Google-Smtp-Source: 
 ABdhPJxBFUVcO8Jp70Dak8fdR1Kd3cimuikwlf9wRGbKKBH+J9vX837Z2T5Q7O/QisGVdQO+vSMI8g==
X-Received: by 2002:adf:e611:: with SMTP id p17mr2098838wrm.161.1620094345788;
        Mon, 03 May 2021 19:12:25 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id
 o13sm271280wmh.34.2021.05.03.19.12.25
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 03 May 2021 19:12:25 -0700 (PDT)
Message-Id: 
 <54c126f3809815acde0fc078f1bebb51cfe78edb.1620094339.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
References: <pull.859.git.1616621553.gitgitgadget@gmail.com>
        <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
Date: Tue, 04 May 2021 02:12:11 +0000
Subject: [PATCH v2 05/13] t6429: testcases for remembering renames
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
        Jonathan Tan <jonathantanmy@google.com>,
        Taylor Blau <me@ttaylorr.com>,
        Elijah Newren <newren@gmail.com>,
        Derrick Stolee <stolee@gmail.com>,
        Elijah Newren <newren@gmail.com>,
        Elijah Newren <newren@gmail.com>
Precedence: bulk
List-ID: <git.vger.kernel.org>
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren <newren@gmail.com>

From: Elijah Newren <newren@gmail.com>

We will soon be adding an optimization that caches (in memory only,
never written to disk) upstream renames during a sequence of merges such
as occurs during a cherry-pick or rebase operation.  Add several tests
meant to stress such an implementation to ensure it does the right
thing, and include a test whose outcome we will later change due to this
optimization as well.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 .../technical/remembering-renames.txt         |  14 +-
 t/t6429-merge-sequence-rename-caching.sh      | 692 ++++++++++++++++++
 2 files changed, 700 insertions(+), 6 deletions(-)
 create mode 100755 t/t6429-merge-sequence-rename-caching.sh

diff --git a/Documentation/technical/remembering-renames.txt b/Documentation/technical/remembering-renames.txt
index e7c298d83df2..a1d3499ca727 100644
--- a/Documentation/technical/remembering-renames.txt
+++ b/Documentation/technical/remembering-renames.txt
@@ -214,12 +214,14 @@ in-memory cache of renames and thus doesn't need to be considered further.
 In the special case that E->A does rename the file but also renames it to
 newfile, then there is no conflict from the renaming and the merge can
 succeed.  In this special case, the rename is not valid to cache because
-the second merge will find A:newfile in the MERGE_BASE.  So a
-rename/rename(1to1) needs to be specially handled by pruning renames from
-the cache and decrementing the dir_rename_counts in the current and leading
-directories associated with those renames.  Or, since these are really
-rare, one could just take the easy way out and disable the remembering
-renames optimization when a rename/rename(1to1) happens.
+the second merge will find A:newfile in the MERGE_BASE (see also the new
+testcases in t6429 with "rename same file identically" in their
+description).  So a rename/rename(1to1) needs to be specially handled by
+pruning renames from the cache and decrementing the dir_rename_counts in
+the current and leading directories associated with those renames.  Or,
+since these are really rare, one could just take the easy way out and
+disable the remembering renames optimization when a rename/rename(1to1)
+happens.
 
 The previous paragraph handled the cases for E->A renaming oldfile, let's
 continue assuming that oldfile is not renamed in A.
diff --git a/t/t6429-merge-sequence-rename-caching.sh b/t/t6429-merge-sequence-rename-caching.sh
new file mode 100755
index 000000000000..f47d8924ee73
--- /dev/null
+++ b/t/t6429-merge-sequence-rename-caching.sh
@@ -0,0 +1,692 @@
+#!/bin/sh
+
+test_description="remember regular & dir renames in sequence of merges"
+
+. ./test-lib.sh
+
+#
+# NOTE 1: this testfile tends to not only rename files, but modify on both
+#         sides; without modifying on both sides, optimizations can kick in
+#         which make rename detection irrelevant or trivial.  We want to make
+#         sure that we are triggering rename caching rather than rename
+#         bypassing.
+#
+# NOTE 2: this testfile uses 'test-tool fast-rebase' instead of either
+#         cherry-pick or rebase.  sequencer.c is only superficially
+#         integrated with merge-ort; it calls merge_switch_to_result()
+#         after EACH merge, which updates the index and working copy AND
+#         throws away the cached results (because merge_switch_to_result()
+#         is only supposed to be called at the end of the sequence).
+#         Integrating them more deeply is a big task, so for now the tests
+#         use 'test-tool fast-rebase'.
+#
+
+
+#
+# In the following simple testcase:
+#   Base:     numbers_1, values_1
+#   Upstream: numbers_2, values_2
+#   Topic_1:  sequence_3
+#   Topic_2:  scruples_3
+# or, in english, rename numbers -> sequence in the first commit, and rename
+# values -> scruples in the second commit.
+#
+# This shouldn't be a challenge, it's just verifying that cached renames isn't
+# preventing us from finding new renames.
+#
+test_expect_success 'caching renames does not preclude finding new ones' '
+	test_create_repo caching-renames-and-new-renames &&
+	(
+		cd caching-renames-and-new-renames &&
+
+		test_seq 2 10 >numbers &&
+		test_seq 2 10 >values &&
+		git add numbers values &&
+		git commit -m orig &&
+
+		git branch upstream &&
+		git branch topic &&
+
+		git switch upstream &&
+		test_seq 1 10 >numbers &&
+		test_seq 1 10 >values &&
+		git add numbers values &&
+		git commit -m "Tweaked both files" &&
+
+		git switch topic &&
+
+		test_seq 2 12 >numbers &&
+		git add numbers &&
+		git mv numbers sequence &&
+		git commit -m A &&
+
+		test_seq 2 12 >values &&
+		git add values &&
+		git mv values scruples &&
+		git commit -m B &&
+
+		#
+		# Actual testing
+		#
+
+		git switch upstream &&
+
+		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		#git cherry-pick upstream~1..topic
+
+		git ls-files >tracked-files &&
+		test_line_count = 2 tracked-files &&
+		test_seq 1 12 >expect &&
+		test_cmp expect sequence &&
+		test_cmp expect scruples
+	)
+'
+
+#
+# In the following testcase:
+#   Base:     numbers_1
+#   Upstream: rename numbers_1 -> sequence_2
+#   Topic_1:  numbers_3
+#   Topic_2:  numbers_1
+# or, in english, the first commit on the topic branch modifies numbers by
+# shrinking it (dramatically) and the second commit on topic reverts its
+# parent.
+#
+# Can git apply both patches?
+#
+# Traditional cherry-pick/rebase will fail to apply the second commit, the
+# one that reverted its parent, because despite detecting the rename from
+# 'numbers' to 'sequence' for the first commit, it fails to detect that
+# rename when picking the second commit.  That's "reasonable" given the
+# dramatic change in size of the file, but remembering the rename and
+# reusing it is reasonable too.
+#
+# Rename detection (diffcore_rename_extended()) will run twice here; it is
+# not needed on the topic side of history for either of the two commits
+# being merged, but it is needed on the upstream side of history for each
+# commit being picked.
+test_expect_success 'cherry-pick both a commit and its immediate revert' '
+	test_create_repo pick-commit-and-its-immediate-revert &&
+	(
+		cd pick-commit-and-its-immediate-revert &&
+
+		test_seq 11 30 >numbers &&
+		git add numbers &&
+		git commit -m orig &&
+
+		git branch upstream &&
+		git branch topic &&
+
+		git switch upstream &&
+		test_seq 1 30 >numbers &&
+		git add numbers &&
+		git mv numbers sequence &&
+		git commit -m "Renamed (and modified) numbers -> sequence" &&
+
+		git switch topic &&
+
+		test_seq 11 13 >numbers &&
+		git add numbers &&
+		git commit -m A &&
+
+		git revert HEAD &&
+
+		#
+		# Actual testing
+		#
+
+		git switch upstream &&
+
+		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+		export GIT_TRACE2_PERF &&
+
+		test_might_fail test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		#git cherry-pick upstream~1..topic &&
+
+		grep region_enter.*diffcore_rename trace.output >calls &&
+		test_line_count = 2 calls
+	)
+'
+
+#
+# In the following testcase:
+#   Base:     sequence_1
+#   Upstream: rename sequence_1 -> values_2
+#   Topic_1:  rename sequence_1 -> values_3
+#   Topic_2:  add unrelated sequence_4
+# or, in english, both sides rename sequence -> values, and then the second
+# commit on the topic branch adds an unrelated file called sequence.
+#
+# This testcase presents no problems for git traditionally, but having both
+# sides do the same rename in effect "uses it up" and if it remains cached,
+# could cause a spurious rename/add conflict.
+#
+test_expect_success 'rename same file identically, then reintroduce it' '
+	test_create_repo rename-rename-1to1-then-add-old-filename &&
+	(
+		cd rename-rename-1to1-then-add-old-filename &&
+
+		test_seq 3 8 >sequence &&
+		git add sequence &&
+		git commit -m orig &&
+
+		git branch upstream &&
+		git branch topic &&
+
+		git switch upstream &&
+		test_seq 1 8 >sequence &&
+		git add sequence &&
+		git mv sequence values &&
+		git commit -m "Renamed (and modified) sequence -> values" &&
+
+		git switch topic &&
+
+		test_seq 3 10 >sequence &&
+		git add sequence &&
+		git mv sequence values &&
+		git commit -m A &&
+
+		test_write_lines A B C D E F G H I J >sequence &&
+		git add sequence &&
+		git commit -m B &&
+
+		#
+		# Actual testing
+		#
+
+		git switch upstream &&
+
+		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+		export GIT_TRACE2_PERF &&
+
+		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		#git cherry-pick upstream~1..topic &&
+
+		git ls-files >tracked &&
+		test_line_count = 2 tracked &&
+		test_path_is_file values &&
+		test_path_is_file sequence &&
+
+		grep region_enter.*diffcore_rename trace.output >calls &&
+		test_line_count = 2 calls
+	)
+'
+
+#
+# In the following testcase:
+#   Base:     olddir/{valuesZ_1, valuesY_1, valuesX_1}
+#   Upstream: rename olddir/valuesZ_1 -> dirA/valuesZ_2
+#             rename olddir/valuesY_1 -> dirA/valuesY_2
+#             rename olddir/valuesX_1 -> dirB/valuesX_2
+#   Topic_1:  rename olddir/valuesZ_1 -> dirA/valuesZ_3
+#             rename olddir/valuesY_1 -> dirA/valuesY_3
+#   Topic_2:  add olddir/newfile
+#   Expected Pick1: dirA/{valuesZ, valuesY}, dirB/valuesX
+#   Expected Pick2: dirA/{valuesZ, valuesY}, dirB/{valuesX, newfile}
+#
+# This testcase presents no problems for git traditionally, but having both
+# sides do the same renames in effect "use it up" but if the renames remain
+# cached, the directory rename could put newfile in the wrong directory.
+#
+test_expect_success 'rename same file identically, then add file to old dir' '
+	test_create_repo rename-rename-1to1-then-add-file-to-old-dir &&
+	(
+		cd rename-rename-1to1-then-add-file-to-old-dir &&
+
+		mkdir olddir/ &&
+		test_seq 3 8 >olddir/valuesZ &&
+		test_seq 3 8 >olddir/valuesY &&
+		test_seq 3 8 >olddir/valuesX &&
+		git add olddir &&
+		git commit -m orig &&
+
+		git branch upstream &&
+		git branch topic &&
+
+		git switch upstream &&
+		test_seq 1 8 >olddir/valuesZ &&
+		test_seq 1 8 >olddir/valuesY &&
+		test_seq 1 8 >olddir/valuesX &&
+		git add olddir &&
+		mkdir dirA &&
+		git mv olddir/valuesZ olddir/valuesY dirA &&
+		git mv olddir/ dirB/ &&
+		git commit -m "Renamed (and modified) values*" &&
+
+		git switch topic &&
+
+		test_seq 3 10 >olddir/valuesZ &&
+		test_seq 3 10 >olddir/valuesY &&
+		git add olddir &&
+		mkdir dirA &&
+		git mv olddir/valuesZ olddir/valuesY dirA &&
+		git commit -m A &&
+
+		>olddir/newfile &&
+		git add olddir/newfile &&
+		git commit -m B &&
+
+		#
+		# Actual testing
+		#
+
+		git switch upstream &&
+		git config merge.directoryRenames true &&
+
+		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+		export GIT_TRACE2_PERF &&
+
+		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		#git cherry-pick upstream~1..topic &&
+
+		git ls-files >tracked &&
+		test_line_count = 4 tracked &&
+		test_path_is_file dirA/valuesZ &&
+		test_path_is_file dirA/valuesY &&
+		test_path_is_file dirB/valuesX &&
+		test_path_is_file dirB/newfile &&
+
+		grep region_enter.*diffcore_rename trace.output >calls &&
+		test_line_count = 3 calls
+	)
+'
+
+#
+# In the following testcase, upstream renames a directory, and the topic branch
+# first adds a file to the directory, then later renames the directory
+# differently:
+#   Base:     olddir/a
+#             olddir/b
+#   Upstream: rename olddir/ -> newdir/
+#   Topic_1:  add olddir/newfile
+#   Topic_2:  rename olddir/ -> otherdir/
+#
+# Here we are just concerned that cached renames might prevent us from seeing
+# the rename conflict, and we want to ensure that we do get a conflict.
+#
+# While at it, also test that we do rename detection three times.  We have to
+# detect renames on the upstream side of history once for each merge, plus
+# Topic_2 has renames.
+#
+test_expect_success 'cached dir rename does not prevent noticing later conflict' '
+	test_create_repo dir-rename-cache-not-occluding-later-conflict &&
+	(
+		cd dir-rename-cache-not-occluding-later-conflict &&
+
+		mkdir olddir &&
+		test_seq 3 10 >olddir/a &&
+		test_seq 3 10 >olddir/b &&
+		git add olddir &&
+		git commit -m orig &&
+
+		git branch upstream &&
+		git branch topic &&
+
+		git switch upstream &&
+		test_seq 3 10 >olddir/a &&
+		test_seq 3 10 >olddir/b &&
+		git add olddir &&
+		git mv olddir newdir &&
+		git commit -m "Dir renamed" &&
+
+		git switch topic &&
+
+		>olddir/newfile &&
+		git add olddir/newfile &&
+		git commit -m A &&
+
+		test_seq 1 8 >olddir/a &&
+		test_seq 1 8 >olddir/b &&
+		git add olddir &&
+		git mv olddir otherdir &&
+		git commit -m B &&
+
+		#
+		# Actual testing
+		#
+
+		git switch upstream &&
+		git config merge.directoryRenames true &&
+
+		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+		export GIT_TRACE2_PERF &&
+
+		test_must_fail test-tool fast-rebase --onto HEAD upstream~1 topic >output &&
+		#git cherry-pick upstream..topic &&
+
+		grep CONFLICT..rename/rename output &&
+
+		grep region_enter.*diffcore_rename trace.output >calls &&
+		test_line_count = 3 calls
+	)
+'
+
+# Helper for the next two tests
+test_setup_upstream_rename () {
+	test_create_repo $1 &&
+	(
+		cd $1 &&
+
+		test_seq 3 8 >somefile &&
+		test_seq 3 8 >relevant-rename &&
+		git add somefile relevant-rename &&
+		mkdir olddir &&
+		test_write_lines a b c d e f g >olddir/a &&
+		test_write_lines z y x w v u t >olddir/b &&
+		git add olddir &&
+		git commit -m orig &&
+
+		git branch upstream &&
+		git branch topic &&
+
+		git switch upstream &&
+		test_seq 1 8 >somefile &&
+		test_seq 1 8 >relevant-rename &&
+		git add somefile relevant-rename &&
+		git mv relevant-rename renamed &&
+		echo h >>olddir/a &&
+		echo s >>olddir/b &&
+		git add olddir &&
+		git mv olddir newdir &&
+		git commit -m "Dir renamed"
+	)
+}
+
+#
+# In the following testcase, upstream renames a file in the toplevel directory
+# as well as its only directory:
+#   Base:     relevant-rename_1
+#             somefile
+#             olddir/a
+#             olddir/b
+#   Upstream: rename relevant-rename_1 -> renamed_2
+#             rename olddir/           -> newdir/
+#   Topic_1:  relevant-rename_3
+#   Topic_2:  olddir/newfile_1
+#   Topic_3:  olddir/newfile_2
+#
+# In this testcase, since the first commit being picked only modifies a
+# file in the toplevel directory, the directory rename is irrelevant for
+# that first merge.  However, we need to notice the directory rename for
+# the merge that picks the second commit, and we don't want the third
+# commit to mess up its location either.  We want to make sure that
+# olddir/newfile doesn't exist in the result and that newdir/newfile does.
+#
+# We also expect rename detection to occur three times.  Although it is
+# typically needed two times per commit, there are no deleted files on the
+# topic side of history, so we only need to detect renames on the upstream
+# side for each of the 3 commits we need to pick.
+#
+test_expect_success 'dir rename unneeded, then add new file to old dir' '
+	test_setup_upstream_rename dir-rename-unneeded-until-new-file &&
+	(
+		cd dir-rename-unneeded-until-new-file &&
+
+		git switch topic &&
+
+		test_seq 3 10 >relevant-rename &&
+		git add relevant-rename &&
+		git commit -m A &&
+
+		echo foo >olddir/newfile &&
+		git add olddir/newfile &&
+		git commit -m B &&
+
+		echo bar >>olddir/newfile &&
+		git add olddir/newfile &&
+		git commit -m C &&
+
+		#
+		# Actual testing
+		#
+
+		git switch upstream &&
+		git config merge.directoryRenames true &&
+
+		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+		export GIT_TRACE2_PERF &&
+
+		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		#git cherry-pick upstream..topic &&
+
+		grep region_enter.*diffcore_rename trace.output >calls &&
+		test_line_count = 3 calls &&
+
+		git ls-files >tracked &&
+		test_line_count = 5 tracked &&
+		test_path_is_missing olddir/newfile &&
+		test_path_is_file newdir/newfile
+	)
+'
+
+#
+# The following testcase is *very* similar to the last one, but instead of
+# adding a new olddir/newfile, it renames somefile -> olddir/newfile:
+#   Base:     relevant-rename_1
+#             somefile_1
+#             olddir/a
+#             olddir/b
+#   Upstream: rename relevant-rename_1 -> renamed_2
+#             rename olddir/           -> newdir/
+#   Topic_1:  relevant-rename_3
+#   Topic_2:  rename somefile -> olddir/newfile_2
+#   Topic_3:  modify olddir/newfile_3
+#
+# In this testcase, since the first commit being picked only modifies a
+# file in the toplevel directory, the directory rename is irrelevant for
+# that first merge.  However, we need to notice the directory rename for
+# the merge that picks the second commit, and we don't want the third
+# commit to mess up its location either.  We want to make sure that
+# neither somefile or olddir/newfile exists in the result and that
+# newdir/newfile does.
+#
+# This testcase needs one more call to rename detection than the last
+# testcase, because of the somefile -> olddir/newfile rename in Topic_2.
+test_expect_success 'dir rename unneeded, then rename existing file into old dir' '
+	test_setup_upstream_rename dir-rename-unneeded-until-file-moved-inside &&
+	(
+		cd dir-rename-unneeded-until-file-moved-inside &&
+
+		git switch topic &&
+
+		test_seq 3 10 >relevant-rename &&
+		git add relevant-rename &&
+		git commit -m A &&
+
+		test_seq 1 10 >somefile &&
+		git add somefile &&
+		git mv somefile olddir/newfile &&
+		git commit -m B &&
+
+		test_seq 1 12 >olddir/newfile &&
+		git add olddir/newfile &&
+		git commit -m C &&
+
+		#
+		# Actual testing
+		#
+
+		git switch upstream &&
+		git config merge.directoryRenames true &&
+
+		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+		export GIT_TRACE2_PERF &&
+
+		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		#git cherry-pick upstream..topic &&
+
+		grep region_enter.*diffcore_rename trace.output >calls &&
+		test_line_count = 4 calls &&
+
+		test_path_is_missing somefile &&
+		test_path_is_missing olddir/newfile &&
+		test_path_is_file newdir/newfile &&
+		git ls-files >tracked &&
+		test_line_count = 4 tracked
+	)
+'
+
+# Helper for the next two tests
+test_setup_topic_rename () {
+	test_create_repo $1 &&
+	(
+		cd $1 &&
+
+		test_seq 3 8 >somefile &&
+		mkdir olddir &&
+		test_seq 3 8 >olddir/a &&
+		echo b >olddir/b &&
+		git add olddir somefile &&
+		git commit -m orig &&
+
+		git branch upstream &&
+		git branch topic &&
+
+		git switch topic &&
+		test_seq 1 8 >somefile &&
+		test_seq 1 8 >olddir/a &&
+		git add somefile olddir/a &&
+		git mv olddir newdir &&
+		git commit -m "Dir renamed" &&
+
+		test_seq 1 10 >somefile &&
+		git add somefile &&
+		mkdir olddir &&
+		>olddir/unrelated-file &&
+		git add olddir &&
+		git commit -m "Unrelated file in recreated old dir"
+	)
+}
+
+#
+# In the following testcase, the first commit on the topic branch renames
+# a directory, while the second recreates the old directory and places a
+# file into it:
+#   Base:     somefile
+#             olddir/a
+#             olddir/b
+#   Upstream: olddir/newfile
+#   Topic_1:  somefile_2
+#             rename olddir/ -> newdir/
+#   Topic_2:  olddir/unrelated-file
+#
+# Note that the first pick should merge:
+#   Base:     somefile
+#             olddir/{a,b}
+#   Upstream: olddir/newfile
+#   Topic_1:  rename olddir/ -> newdir/
+# For which the expected result (assuming merge.directoryRenames=true) is
+# clearly:
+#   Result:   somefile
+#             newdir/{a, b, newfile}
+#
+# While the second pick does the following three-way merge:
+#   Base (Topic_1):           somefile
+#                             newdir/{a,b}
+#   Upstream (Result from 1): same files as base, but adds newdir/newfile
+#   Topic_2:                  same files as base, but adds olddir/unrelated-file
+#
+# The second merge is pretty trivial; upstream adds newdir/newfile, and
+# topic_2 adds olddir/unrelated-file.  We're just testing that we don't
+# accidentally cache directory renames somehow and rename
+# olddir/unrelated-file to newdir/unrelated-file.
+#
+# This testcase should only need one call to diffcore_rename_extended().
+test_expect_success 'caching renames only on upstream side, part 1' '
+	test_setup_topic_rename cache-renames-only-upstream-add-file &&
+	(
+		cd cache-renames-only-upstream-add-file &&
+
+		git switch upstream &&
+
+		>olddir/newfile &&
+		git add olddir/newfile &&
+		git commit -m "Add newfile" &&
+
+		#
+		# Actual testing
+		#
+
+		git switch upstream &&
+
+		git config merge.directoryRenames true &&
+
+		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+		export GIT_TRACE2_PERF &&
+
+		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		#git cherry-pick upstream..topic &&
+
+		grep region_enter.*diffcore_rename trace.output >calls &&
+		test_line_count = 1 calls &&
+
+		git ls-files >tracked &&
+		test_line_count = 5 tracked &&
+		test_path_is_missing newdir/unrelated-file &&
+		test_path_is_file olddir/unrelated-file &&
+		test_path_is_file newdir/newfile &&
+		test_path_is_file newdir/b &&
+		test_path_is_file newdir/a &&
+		test_path_is_file somefile
+	)
+'
+
+#
+# The following testcase is *very* similar to the last one, but instead of
+# adding a new olddir/newfile, it renames somefile -> olddir/newfile:
+#   Base:     somefile
+#             olddir/a
+#             olddir/b
+#   Upstream: somefile_1 -> olddir/newfile
+#   Topic_1:  rename olddir/ -> newdir/
+#             somefile_2
+#   Topic_2:  olddir/unrelated-file
+#             somefile_3
+#
+# Much like the previous test, this case is actually trivial and we are just
+# making sure there isn't some spurious directory rename caching going on
+# for the wrong side of history.
+#
+#
+# This testcase should only need three calls to diffcore_rename_extended(),
+# because there are no renames on the topic side of history for picking
+# Topic_2.
+#
+test_expect_success 'caching renames only on upstream side, part 2' '
+	test_setup_topic_rename cache-renames-only-upstream-rename-file &&
+	(
+		cd cache-renames-only-upstream-rename-file &&
+
+		git switch upstream &&
+
+		git mv somefile olddir/newfile &&
+		git commit -m "Add newfile" &&
+
+		#
+		# Actual testing
+		#
+
+		git switch upstream &&
+
+		git config merge.directoryRenames true &&
+
+		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
+		export GIT_TRACE2_PERF &&
+
+		test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		#git cherry-pick upstream..topic &&
+
+		grep region_enter.*diffcore_rename trace.output >calls &&
+		test_line_count = 3 calls &&
+
+		git ls-files >tracked &&
+		test_line_count = 4 tracked &&
+		test_path_is_missing newdir/unrelated-file &&
+		test_path_is_file olddir/unrelated-file &&
+		test_path_is_file newdir/newfile &&
+		test_path_is_file newdir/b &&
+		test_path_is_file newdir/a
+	)
+'
+
+test_done

From patchwork Tue May  4 02:12:12 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren <newren@gmail.com>
X-Patchwork-Id: 12237307
Return-Path: <git-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id A19FFC433ED
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:32 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 734FB611CD
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:32 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229748AbhEDCNY (ORCPT <rfc822;git@archiver.kernel.org>);
        Mon, 3 May 2021 22:13:24 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36224 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229731AbhEDCNW (ORCPT <rfc822;git@vger.kernel.org>);
        Mon, 3 May 2021 22:13:22 -0400
Received: from mail-wr1-x42d.google.com (mail-wr1-x42d.google.com
 [IPv6:2a00:1450:4864:20::42d])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C6F98C061574
        for <git@vger.kernel.org>; Mon,  3 May 2021 19:12:27 -0700 (PDT)
Received: by mail-wr1-x42d.google.com with SMTP id v12so7571623wrq.6
        for <git@vger.kernel.org>; Mon, 03 May 2021 19:12:27 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=message-id:in-reply-to:references:from:date:subject:fcc
         :content-transfer-encoding:mime-version:to:cc;
        bh=XS1CQERRQXHEjAgU2tvYDdnFDoyFf+4tzduFgU874gM=;
        b=ZBtHeCb6ndXLPEP1gLxYnsf6JaMH9znorj4YBeGEEiDw0WEcw4uEalmfgVOlTcqPSz
         E+TTlBcArWAjLQXYw2Q198o3VnTpZIax5yqH54tvcU22T4Ytpc0YPSbfCrg92ilXbmCu
         JOF0yYPseTGAJDWiWb3qZUm9CRsZ497x4v76SoSvxMMdNlkbM3J/hxyqYg9xA77w1NEm
         IcyANvZB02IvM1ZQvPIi6/aagUWQAFd6r4yEp/pu0+DVynf6cbqMOgfOLSisRA7PMk0B
         xdlc1ZPjfxYCgevqRfTqD64jj/k18YxrNZEL6+XVphWjLn96n2LtnITJiJimXRgbbPtc
         2TSw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:message-id:in-reply-to:references:from:date
         :subject:fcc:content-transfer-encoding:mime-version:to:cc;
        bh=XS1CQERRQXHEjAgU2tvYDdnFDoyFf+4tzduFgU874gM=;
        b=Zzx5eML4FgC65wCOzVBCgCb7nreROTzo4tQ8dNX2Mkcf7dYsQTO1qqyXgQKByXtVMC
         Ikpk5hDiKkmrzUCgMdUc8foK7kqlSXZ9ywjdQqN7rwel2fQB6MX8aUul55xSrnBAEf5U
         kSJf3elmS+hLb4lTXrfrAeaE78mWb3fnmueWGWFaW6JtpXni1d/t/25g6u40LFkTPwXX
         UOwtjAkaR2Q2e4sib+zaVFgRQ3ZnHllWKu5ShzW2Vn8AO9MV1U3owHBE0sxlBBSnoT+z
         6AlN+v61y/k3Qe8cGXmFM9tKhj44bzuLljMSddU1LchOjyV0aNEXcTonqEt2ktABUlBH
         IyNQ==
X-Gm-Message-State: AOAM533NjJamxPwD2ilyVwX7CGUmh/RU+fSEFylQM7R9D1gsm71O/mTV
        BNZJrz2zY5U8lmGBzssfR3/6UhDGOMs=
X-Google-Smtp-Source: 
 ABdhPJwjcqBTxcNSu7ge6WlNIpHEWuxsLdgOGXohMxOUEi6tZ5eX8usfjzIY95Jqagl8GLBoT375sQ==
X-Received: by 2002:a5d:4d09:: with SMTP id z9mr20213426wrt.131.1620094346588;
        Mon, 03 May 2021 19:12:26 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id
 u9sm1168604wmc.38.2021.05.03.19.12.26
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 03 May 2021 19:12:26 -0700 (PDT)
Message-Id: 
 <2a9e73de2beef5f51ad76fe1d9aaaed926a5fce8.1620094339.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
References: <pull.859.git.1616621553.gitgitgadget@gmail.com>
        <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
Date: Tue, 04 May 2021 02:12:12 +0000
Subject: [PATCH v2 06/13] merge-ort: add data structures for in-memory caching
 of rename detection
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
        Jonathan Tan <jonathantanmy@google.com>,
        Taylor Blau <me@ttaylorr.com>,
        Elijah Newren <newren@gmail.com>,
        Derrick Stolee <stolee@gmail.com>,
        Elijah Newren <newren@gmail.com>,
        Elijah Newren <newren@gmail.com>
Precedence: bulk
List-ID: <git.vger.kernel.org>
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren <newren@gmail.com>

From: Elijah Newren <newren@gmail.com>

When there are many renames between the old base of a series of commits
and the new base for a series of commits, the sequence of merges
employed to transplant those commits (from a cherry-pick or rebase
operation) will repeatedly detect the exact same renames.  This is
wasted effort.

Add data structures which will be used to cache rename detection
results, along with the initialization and deallocation of these data
structures.  Future commits will populate these caches, detect the
appropriate circumstances when they can be used, and employ them to
avoid re-detecting the same renames repeatedly.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index b1795d838eaf..8602f88a960c 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -140,6 +140,37 @@ struct rename_info {
 	int callback_data_nr, callback_data_alloc;
 	char *callback_data_traverse_path;
 
+	/*
+	 * cached_pairs: Caching of renames and deletions.
+	 *
+	 * These are mappings recording renames and deletions of individual
+	 * files (not directories).  They are thus a map from an old
+	 * filename to either NULL (for deletions) or a new filename (for
+	 * renames).
+	 */
+	struct strmap cached_pairs[3];
+
+	/*
+	 * cached_target_names: just the destinations from cached_pairs
+	 *
+	 * We sometimes want a fast lookup to determine if a given filename
+	 * is one of the destinations in cached_pairs.  cached_target_names
+	 * is thus duplicative information, but it provides a fast lookup.
+	 */
+	struct strset cached_target_names[3];
+
+	/*
+	 * cached_irrelevant: Caching of rename_sources that aren't relevant.
+	 *
+	 * cached_pairs records both renames and deletes.  Sometimes we
+	 * do not know if a path is a rename or a delete because we pass
+	 * RELEVANT_LOCATION to diffcore_rename_extended() and based on
+	 * various optimizations it returns without detecting whether that
+	 * path is actually a rename or a delete.  We need to cache such
+	 * paths too, but separately from cached_pairs.
+	 */
+	struct strset cached_irrelevant[3];
+
 	/*
 	 * needed_limit: value needed for inexact rename detection to run
 	 *
@@ -382,6 +413,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		reinitialize ? strmap_partial_clear : strmap_clear;
 	void (*strintmap_func)(struct strintmap *) =
 		reinitialize ? strintmap_partial_clear : strintmap_clear;
+	void (*strset_func)(struct strset *) =
+		reinitialize ? strset_partial_clear : strset_clear;
 
 	/*
 	 * We marked opti->paths with strdup_strings = 0, so that we
@@ -425,6 +458,9 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		strmap_func(&renames->dir_renames[i], 0);
 
 		strintmap_func(&renames->relevant_sources[i]);
+		strset_func(&renames->cached_target_names[i]);
+		strmap_func(&renames->cached_pairs[i], 1);
+		strset_func(&renames->cached_irrelevant[i]);
 	}
 
 	if (!reinitialize) {
@@ -3667,6 +3703,12 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 					 NULL, 0);
 		strintmap_init_with_options(&renames->relevant_sources[i],
 					    0, NULL, 0);
+		strmap_init_with_options(&renames->cached_pairs[i],
+					 NULL, 1);
+		strset_init_with_options(&renames->cached_irrelevant[i],
+					 NULL, 1);
+		strset_init_with_options(&renames->cached_target_names[i],
+					 NULL, 0);
 	}
 
 	/*

From patchwork Tue May  4 02:12:13 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren <newren@gmail.com>
X-Patchwork-Id: 12237303
Return-Path: <git-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 1185AC43461
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:31 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id E16E6611CD
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:30 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229742AbhEDCNY (ORCPT <rfc822;git@archiver.kernel.org>);
        Mon, 3 May 2021 22:13:24 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36234 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229722AbhEDCNW (ORCPT <rfc822;git@vger.kernel.org>);
        Mon, 3 May 2021 22:13:22 -0400
Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com
 [IPv6:2a00:1450:4864:20::42b])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6CBBAC06174A
        for <git@vger.kernel.org>; Mon,  3 May 2021 19:12:28 -0700 (PDT)
Received: by mail-wr1-x42b.google.com with SMTP id l2so7576819wrm.9
        for <git@vger.kernel.org>; Mon, 03 May 2021 19:12:28 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=message-id:in-reply-to:references:from:date:subject:fcc
         :content-transfer-encoding:mime-version:to:cc;
        bh=VOu8FCKXmIIrUMbvuOjzcUv6+GzE8vjNh+KiKPLNun0=;
        b=vaBxRHy1LSamdAlQVSFpN1SfEr9BWssyGSs59al2h4YOwTYrAjULii6aihi7vLnLCp
         JJqvkIqJlNpNXD4TiNIfi9ECBIGdZJikKGegRP1aynR6pFZmILQ2//FNXs+NPqTsYT+W
         tTMr3elQeJtSrRrkpnQHmVd5CgvpUr5JAahoygB6JasJIiTlBAjaezs1XE44f8NFKd8t
         uerLX7PGYMJJ5h3nBCA3qgBIgOtzTTYtWsjpWqhgEtZPRTywwe3wQQlPpVUX/I+z/u8k
         DlGoONvKVhUIvuOiPLPNa2x5uyNqcV5dwWnAPfufqEnw6y23siBpEqS6vyoIC6jKkIuT
         nYzQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:message-id:in-reply-to:references:from:date
         :subject:fcc:content-transfer-encoding:mime-version:to:cc;
        bh=VOu8FCKXmIIrUMbvuOjzcUv6+GzE8vjNh+KiKPLNun0=;
        b=P3qB4VbqGdbzg3HmQi2g0+2RQeCsOl7uNyBrkirfMlGu+IOACuEf7Dlkub8Uw/SEfW
         oMOGsD5c1LPaBjdsG7OcJ/kEe8mUGLFZFMmtWDmBEuzroRmT+eFoYO/Vsj74nT2hW4vl
         3vLZFkOAHvFO7g4LNX+C1eBMPuFuLCuu+dPBBh9WQMdCy9eRAD4kTg0UaRHMFSV3aJlL
         5oDSRm4QgA7pXb5PoUOphuZ8FEds+eV3H+cmFBAKnrQHMUtw31SMEu9VGj10Y0cMhDbL
         rFMOdyaOo7xCbWJvfGc+Pw/kwKLYh1Q+Xb9BRA1B7JHGX6z/MjkfRys9CiUbXCLvBP50
         7J/Q==
X-Gm-Message-State: AOAM532/LuuCPzgF2wB62yJGFAG2h6G+3OHoNNLG4pH+Xwn3fWJEPYrn
        IVfwkvMHOQrovddCGCakSw6HVyncJG4=
X-Google-Smtp-Source: 
 ABdhPJzetg1DpRrrfk7cJIYNomrDAnwEOHQGTaaQAIZqi9DSJDgB1hatIRnDOknqyCv7af2t3hRerg==
X-Received: by 2002:adf:d22c:: with SMTP id k12mr29045526wrh.25.1620094347276;
        Mon, 03 May 2021 19:12:27 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id
 a2sm15119918wrt.82.2021.05.03.19.12.26
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 03 May 2021 19:12:26 -0700 (PDT)
Message-Id: 
 <02d517f052a35a952c726e7e941650ce424abb85.1620094339.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
References: <pull.859.git.1616621553.gitgitgadget@gmail.com>
        <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
Date: Tue, 04 May 2021 02:12:13 +0000
Subject: [PATCH v2 07/13] merge-ort: populate caches of rename detection
 results
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
        Jonathan Tan <jonathantanmy@google.com>,
        Taylor Blau <me@ttaylorr.com>,
        Elijah Newren <newren@gmail.com>,
        Derrick Stolee <stolee@gmail.com>,
        Elijah Newren <newren@gmail.com>,
        Elijah Newren <newren@gmail.com>
Precedence: bulk
List-ID: <git.vger.kernel.org>
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren <newren@gmail.com>

From: Elijah Newren <newren@gmail.com>

Fill in cache_pairs, cached_target_names, and cached_irrelevant based on
rename detection results.  Future commits will make use of these values.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 70 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 8602f88a960c..5523fc9e86b3 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2333,6 +2333,65 @@ static void resolve_diffpair_statuses(struct diff_queue_struct *q)
 	}
 }
 
+static void possibly_cache_new_pair(struct rename_info *renames,
+				    struct diff_filepair *p,
+				    unsigned side,
+				    char *new_path)
+{
+	char *old_value;
+	int dir_renamed_side = 0;
+
+	if (new_path) {
+		/*
+		 * Directory renames happen on the other side of history from
+		 * the side that adds new files to the old directory.
+		 */
+		dir_renamed_side = 3 - side;
+	} else {
+		int val = strintmap_get(&renames->relevant_sources[side],
+					p->one->path);
+		if (val == RELEVANT_NO_MORE) {
+			assert(p->status == 'D');
+			strset_add(&renames->cached_irrelevant[side],
+				   p->one->path);
+		}
+		if (val <= 0)
+			return;
+	}
+
+	if (p->status == 'D') {
+		/*
+		 * If we already had this delete, we'll just set it's value
+		 * to NULL again, so no harm.
+		 */
+		strmap_put(&renames->cached_pairs[side], p->one->path, NULL);
+	} else if (p->status == 'R') {
+		if (new_path) {
+			new_path = xstrdup(new_path);
+			old_value = strmap_put(&renames->cached_pairs[dir_renamed_side],
+					       p->two->path, new_path);
+			strset_add(&renames->cached_target_names[dir_renamed_side],
+				   new_path);
+			assert(!old_value);
+		}
+		if (!new_path)
+			new_path = p->two->path;
+		new_path = xstrdup(new_path);
+		old_value = strmap_put(&renames->cached_pairs[side],
+				       p->one->path, new_path);
+		strset_add(&renames->cached_target_names[side],
+			   new_path);
+		free(old_value);
+	} else if (p->status == 'A' && new_path) {
+		new_path = xstrdup(new_path);
+		old_value = strmap_put(&renames->cached_pairs[dir_renamed_side],
+				       p->two->path, new_path);
+		strset_add(&renames->cached_target_names[dir_renamed_side],
+			   new_path);
+		assert(!old_value);
+	}
+}
+
 static int compare_pairs(const void *a_, const void *b_)
 {
 	const struct diff_filepair *a = *((const struct diff_filepair **)a_);
@@ -2415,6 +2474,7 @@ static int collect_renames(struct merge_options *opt,
 		char *new_path; /* non-NULL only with directory renames */
 
 		if (p->status != 'A' && p->status != 'R') {
+			possibly_cache_new_pair(renames, p, side_index, NULL);
 			diff_free_filepair(p);
 			continue;
 		}
@@ -2426,11 +2486,11 @@ static int collect_renames(struct merge_options *opt,
 						      &collisions,
 						      &clean);
 
+		possibly_cache_new_pair(renames, p, side_index, new_path);
 		if (p->status != 'R' && !new_path) {
 			diff_free_filepair(p);
 			continue;
 		}
-
 		if (new_path)
 			apply_directory_rename_modifications(opt, p, new_path);
 
@@ -3701,8 +3761,16 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 					 NULL, 1);
 		strmap_init_with_options(&renames->dir_renames[i],
 					 NULL, 0);
+		/*
+		 * relevant_sources uses -1 for the default, because we need
+		 * to be able to distinguish not-in-strintmap from valid
+		 * relevant_source values from enum file_rename_relevance.
+		 * In particular, possibly_cache_new_pair() expects a negative
+		 * value for not-found entries.
+		 */
 		strintmap_init_with_options(&renames->relevant_sources[i],
-					    0, NULL, 0);
+					    -1 /* explicitly invalid */,
+					    NULL, 0);
 		strmap_init_with_options(&renames->cached_pairs[i],
 					 NULL, 1);
 		strset_init_with_options(&renames->cached_irrelevant[i],

From patchwork Tue May  4 02:12:14 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren <newren@gmail.com>
X-Patchwork-Id: 12237311
Return-Path: <git-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 32BA8C43462
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:34 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 14358611CB
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:34 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229779AbhEDCN1 (ORCPT <rfc822;git@archiver.kernel.org>);
        Mon, 3 May 2021 22:13:27 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36236 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229739AbhEDCNY (ORCPT <rfc822;git@vger.kernel.org>);
        Mon, 3 May 2021 22:13:24 -0400
Received: from mail-wr1-x42c.google.com (mail-wr1-x42c.google.com
 [IPv6:2a00:1450:4864:20::42c])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 42B1BC061574
        for <git@vger.kernel.org>; Mon,  3 May 2021 19:12:29 -0700 (PDT)
Received: by mail-wr1-x42c.google.com with SMTP id l2so7576839wrm.9
        for <git@vger.kernel.org>; Mon, 03 May 2021 19:12:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=message-id:in-reply-to:references:from:date:subject:fcc
         :content-transfer-encoding:mime-version:to:cc;
        bh=KP2sE1SB6p3AhFcbQXGhLgaueKFXRAEFyG69JIqUoPU=;
        b=R5SVUb2aD1OE3U7UZ3XkWgvcLSxlMGuEBzoNoE2h3sc1nkpkcaypa8+Pu0UBHv98rn
         3bGOcU2+GK7QpPMVuc4dZWJPWx3/jucLJWo6bcOcDyRxkpImn206vfrRKr9dOf+7SkNJ
         iIY6ryQ9cdvy4E1YSXvSJ1k/6VK58hgk2o01aMoyxAsyKh3kHg2+rU8aXWb1f7uhTu5o
         ItpeyOHZb+xyzldcBXwTfxnba7B4l64kS+urtaqKMvZhYvRMB1AKfH8DllEoeYcGkqDV
         rZ/M6cXvJQ7HlOcUHJiusL/zU0lWGmZZQuOSKvDNUYQn6juIvEZ5ifqAcJDqBtaeQmkL
         DEqw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:message-id:in-reply-to:references:from:date
         :subject:fcc:content-transfer-encoding:mime-version:to:cc;
        bh=KP2sE1SB6p3AhFcbQXGhLgaueKFXRAEFyG69JIqUoPU=;
        b=Qefs0eA1oZ/Xm+DpcHqnKfA0YmY49NajlF0eK1i/Se1cm+AjPLQWmuY3vDnWPCovog
         odaCrmKf3HB4JqT0ov0NVqBJzwbXVe8iM/5Ld6h7iwwm/o2zJaq3JMf0v6wfsF/vjwkb
         a5atbC3UbTgSvYc/hBaZxXLpwdAQgJtuVtg7ovb3kmYtBIKcwjMIN+Qf87lKNMx+SNJE
         9sT3vYmqTWrdLpXI8rRmbpbWCavWZNtpWGcbUtyDYpLJ76wA6QI4+DYDVlzlYl7EJ3fU
         Y+um/zvajwWaM4UBuQgSeOlzrslNcrYSQj7jKZZr+dBtYkb0y3e5mB+dOi85dp2dJCAl
         YjJQ==
X-Gm-Message-State: AOAM532w7Z3zUx7L2dlx7B3kR0bn+CaXSOJULnIIvKOHDpJlB2x/P4e7
        vLFZH+zeodNSnljpHHCi3W8akKjMxdU=
X-Google-Smtp-Source: 
 ABdhPJyG5YFkPibJA/gpggsptpxbJcqJc4J7FCdWaTG+dfI106luI2Pu1iv88n7ZZ0QItOnsnBT02w==
X-Received: by 2002:a5d:45cb:: with SMTP id
 b11mr28797876wrs.343.1620094347986;
        Mon, 03 May 2021 19:12:27 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id
 c2sm1167876wmr.22.2021.05.03.19.12.27
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 03 May 2021 19:12:27 -0700 (PDT)
Message-Id: 
 <731b6bd15531fc7883a2c70275cea24ac686ab03.1620094339.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
References: <pull.859.git.1616621553.gitgitgadget@gmail.com>
        <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
Date: Tue, 04 May 2021 02:12:14 +0000
Subject: [PATCH v2 08/13] merge-ort: add code to check for whether cached
 renames can be reused
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
        Jonathan Tan <jonathantanmy@google.com>,
        Taylor Blau <me@ttaylorr.com>,
        Elijah Newren <newren@gmail.com>,
        Derrick Stolee <stolee@gmail.com>,
        Elijah Newren <newren@gmail.com>,
        Elijah Newren <newren@gmail.com>
Precedence: bulk
List-ID: <git.vger.kernel.org>
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren <newren@gmail.com>

From: Elijah Newren <newren@gmail.com>

We need to know when renames detected in a previous merge operation can
be reused in a later merge operation.  Consider the following setup
(from the git-rebase manpage):

                     A---B---C topic
                    /
               D---E---F---G master

After rebasing, this will appear as:

                             A'--B'--C' topic
                            /
               D---E---F---G master

Further, let's say that 'oldfile' was renamed to 'newfile' between E
and G.  The rebase or cherry-pick of A onto G will involve a three-way
merge between E (as the merge base) and G and A.  After detecting the
rename between E:oldfile and G:newfile, there will be a three-way
content merge of the following:
    E:oldfile
    G:newfile
    A:oldfile
and produce a new result:
    A':newfile

Now, when we want to pick B onto A', we will need to do a three-way
merge between A (as the merge-base) and A' and B.  This will involve
a three-way content merge of
    A:oldfile
    A':newfile
    B:oldfile
but only if we can detect that A:oldfile is similar enough to A':newfile
to be used together in a three-way content merge, i.e. only if we can
detect that A:oldfile and A':newfile are a rename.  But we already know
that A:oldfile and A':newfile are similar enough to be used in a
three-way content merge, because that is precisely where A':newfile came
from in the previous merge.

Note that A & A' both appear in both merges.  That gives us the
condition under which we can reuse renames.

There are a couple important points about this optimization:

  - If the rebase or cherry-pick halts for user conflicts, these caches
    are NOT saved anywhere.  Thus, resuming a halted rebase or
    cherry-pick will result in no reused renames for the next commit.
    This is intentional, as user resolution can change files
    significantly and in ways that violate the similarity assumptions
    here.

  - Technically, in a *very* narrow case this might give slightly
    different results for rename detection.  Using the example above,
    if:
      * E:oldfile had 20 lines
      * G:newfile added 10 new lines at the beginning of the file
      * A:oldfile deleted all but the first three lines of the file
    then
      => A':newfile would have 13 lines, 3 of which matches those
         in A:oldfile.

    Consider the two cases:
      * Without this optimization:
        - the next step of the rebase operation (moving B to B')
          would not detect the rename betwen A:oldfile and A':newfile
        - we'd thus get a modify/delete conflict with the rebase
          operation halting for the user to resolve, and have both
          A':newfile and B:oldfile sitting in the working tree.
      * With this optimization:
        - the rename between A:oldfile and A':newfile would be detected
          via the cache of renames
        - a three-way merge between A:oldfile, A':newfile, and B:oldfile
          would commence and be written to A':newfile

    Now, is the difference in behavior a bug...or a bugfix?  I can't
    tell.  Given that A:oldfile and A':newfile are not very similar,
    when we three-way merge with B:oldfile it seems likely we'll hit a
    conflict for the user to resolve.  And it shouldn't be too hard for
    users to see why we did that three-way merge; oldfile and newfile
    *were* renames somewhere in the sequence.  So, most of these corner
    cases will still behave similarly -- namely, a conflict given to the
    user to resolve.  Also, consider the interesting case when commit B
    is a clean revert of commit A.  Without this optimization, a rebase
    could not both apply a weird patch like A and then immediately
    revert it; users would be forced to resolve merge conflicts.  With
    this optimization, it would successfully apply the clean revert.
    So, there is certainly at least one case that behaves better.  Even
    if it's considered a "difference in behavior", I think both behaviors
    are reasonable, and the time savings provided by this optimization
    justify using the slightly altered rename heuristics.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 64 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 5523fc9e86b3..a342cc6344fd 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -140,6 +140,30 @@ struct rename_info {
 	int callback_data_nr, callback_data_alloc;
 	char *callback_data_traverse_path;
 
+	/*
+	 * merge_trees: trees passed to the merge algorithm for the merge
+	 *
+	 * merge_trees records the trees passed to the merge algorithm.  But,
+	 * this data also is stored in merge_result->priv.  If a sequence of
+	 * merges are being done (such as when cherry-picking or rebasing),
+	 * the next merge can look at this and re-use information from
+	 * previous merges under certain cirumstances.
+	 *
+	 * See also all the cached_* variables.
+	 */
+	struct tree *merge_trees[3];
+
+	/*
+	 * cached_pairs_valid_side: which side's cached info can be reused
+	 *
+	 * See the description for merge_trees.  For repeated merges, at most
+	 * only one side's cached information can be used.  Valid values:
+	 *   MERGE_SIDE2: cached data from side2 can be reused
+	 *   MERGE_SIDE1: cached data from side1 can be reused
+	 *   0:           no cached data can be reused
+	 */
+	int cached_pairs_valid_side;
+
 	/*
 	 * cached_pairs: Caching of renames and deletions.
 	 *
@@ -462,6 +486,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		strmap_func(&renames->cached_pairs[i], 1);
 		strset_func(&renames->cached_irrelevant[i]);
 	}
+	renames->cached_pairs_valid_side = 0;
+	renames->dir_rename_mask = 0;
 
 	if (!reinitialize) {
 		struct hashmap_iter iter;
@@ -484,8 +510,6 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		strmap_clear(&opti->output, 0);
 	}
 
-	renames->dir_rename_mask = 0;
-
 	/* Clean out callback_data as well. */
 	FREE_AND_NULL(renames->callback_data);
 	renames->callback_data_nr = renames->callback_data_alloc = 0;
@@ -3802,6 +3826,35 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	trace2_region_leave("merge", "allocate/init", opt->repo);
 }
 
+static void merge_check_renames_reusable(struct merge_options *opt,
+					 struct merge_result *result,
+					 struct tree *merge_base,
+					 struct tree *side1,
+					 struct tree *side2)
+{
+	struct rename_info *renames;
+	struct tree **merge_trees;
+	struct merge_options_internal *opti = result->priv;
+
+	if (!opti)
+		return;
+
+	renames = &opti->renames;
+	merge_trees = renames->merge_trees;
+	/* merge_trees[0..2] will only be NULL if opti is */
+	assert(merge_trees[0] && merge_trees[1] && merge_trees[2]);
+
+	/* Check if we meet a condition for re-using cached_pairs */
+	if (     oideq(&merge_base->object.oid, &merge_trees[2]->object.oid) &&
+		 oideq(     &side1->object.oid, &result->tree->object.oid))
+		renames->cached_pairs_valid_side = MERGE_SIDE1;
+	else if (oideq(&merge_base->object.oid, &merge_trees[1]->object.oid) &&
+		 oideq(     &side2->object.oid, &result->tree->object.oid))
+		renames->cached_pairs_valid_side = MERGE_SIDE2;
+	else
+		renames->cached_pairs_valid_side = 0; /* neither side valid */
+}
+
 /*** Function Grouping: merge_incore_*() and their internal variants ***/
 
 /*
@@ -3949,7 +4002,16 @@ void merge_incore_nonrecursive(struct merge_options *opt,
 
 	trace2_region_enter("merge", "merge_start", opt->repo);
 	assert(opt->ancestor != NULL);
+	merge_check_renames_reusable(opt, result, merge_base, side1, side2);
 	merge_start(opt, result);
+	/*
+	 * Record the trees used in this merge, so if there's a next merge in
+	 * a cherry-pick or rebase sequence it might be able to take advantage
+	 * of the cached_pairs in that next merge.
+	 */
+	opt->priv->renames.merge_trees[0] = merge_base;
+	opt->priv->renames.merge_trees[1] = side1;
+	opt->priv->renames.merge_trees[2] = side2;
 	trace2_region_leave("merge", "merge_start", opt->repo);
 
 	merge_ort_nonrecursive_internal(opt, merge_base, side1, side2, result);

From patchwork Tue May  4 02:12:15 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren <newren@gmail.com>
X-Patchwork-Id: 12237321
Return-Path: <git-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id A1325C433ED
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:40 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 81EC161090
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:40 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229807AbhEDCNc (ORCPT <rfc822;git@archiver.kernel.org>);
        Mon, 3 May 2021 22:13:32 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36238 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229781AbhEDCN1 (ORCPT <rfc822;git@vger.kernel.org>);
        Mon, 3 May 2021 22:13:27 -0400
Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com
 [IPv6:2a00:1450:4864:20::334])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D89A8C06174A
        for <git@vger.kernel.org>; Mon,  3 May 2021 19:12:29 -0700 (PDT)
Received: by mail-wm1-x334.google.com with SMTP id i128so2415054wma.5
        for <git@vger.kernel.org>; Mon, 03 May 2021 19:12:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=message-id:in-reply-to:references:from:date:subject:fcc
         :content-transfer-encoding:mime-version:to:cc;
        bh=ztc3ZjtU9cT/TBaH9V9aKolleMMZadYpb0AXUv6dt0I=;
        b=HRs8QFG1v64aUKDmXRxc8Y+tTw/PD//yUo7vdNEDbZBGDM+NmC01VgJY93RKLbaTKM
         gNS9vCEIoeKlu18tTYpZdRWW60b80vebup97j+Kt4Np+ng/o5bhbr+d53Tu/H247q9uH
         p1qNUOjURsSQzuNMbjCWqarBMh6J66/CGCBZFxg6+ZE+cM3+qu/LttRl4RyXAJpO2jIM
         gRrUa3OKCv8Pmlf3Cex/xNq57MvnNgXq95lj46m83vduoAlulwtpbu31YU8GHQG3ynwL
         h8p+ad7Uhu609VVmheqAfAz2I7Eb5/d913WWtDtQrV2RfIeYkKO0jtoLtvbr0lF67tG2
         TLoQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:message-id:in-reply-to:references:from:date
         :subject:fcc:content-transfer-encoding:mime-version:to:cc;
        bh=ztc3ZjtU9cT/TBaH9V9aKolleMMZadYpb0AXUv6dt0I=;
        b=mT4sszPOjEH5qQLKqQ0ycTQjho8R/1IUjZGvhmdpSP7qR5KzzhvBpo34TYTLbW31ZB
         8aY0JuPbr+zMr3JjsetHCtUwj4Q/J/ocjl/B/L1OYEWu8OZ0GsPSgWJJ+ReU0Sjcafhk
         peamm7U6LarVwmuAfjVcwPOfqyb0s4UlUfF6QxG85Xq9ppp4e237BnLEwAgnw7cC2LW9
         VeBqEjvDuNW4Sz6+z2rNec+0cx04Ed1KZJZQLYvzkmYqTe5g5g22pBTqPPJE13VOAA//
         miKU/Tf4lpU2ScV/Ddk0S8pfFMYpjQW9wvQIJP9wlbm8ccWBHYf8/1rBVtL/z5On43pU
         kTXA==
X-Gm-Message-State: AOAM531TJlx88QhKyHxWSCIccN2ftymzwdfpD0nUCSJPzwCp0vbXPohz
        pBZA4+YlwcuGp3OBjgwa+q+IudYwqnA=
X-Google-Smtp-Source: 
 ABdhPJy+/JkaxP/F7nig5MNpfQCLSPF+qj2B8867zNxS5LtmkmUaJO94fWF7s7CraYVEod/o3NiwFw==
X-Received: by 2002:a05:600c:19cc:: with SMTP id
 u12mr1393232wmq.171.1620094348659;
        Mon, 03 May 2021 19:12:28 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id
 b7sm6191599wri.83.2021.05.03.19.12.28
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 03 May 2021 19:12:28 -0700 (PDT)
Message-Id: 
 <becd451030183e6271646a4bf006c2466822326e.1620094339.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
References: <pull.859.git.1616621553.gitgitgadget@gmail.com>
        <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
Date: Tue, 04 May 2021 02:12:15 +0000
Subject: [PATCH v2 09/13] merge-ort: avoid accidental API mis-use
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
        Jonathan Tan <jonathantanmy@google.com>,
        Taylor Blau <me@ttaylorr.com>,
        Elijah Newren <newren@gmail.com>,
        Derrick Stolee <stolee@gmail.com>,
        Elijah Newren <newren@gmail.com>,
        Elijah Newren <newren@gmail.com>
Precedence: bulk
List-ID: <git.vger.kernel.org>
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren <newren@gmail.com>

From: Elijah Newren <newren@gmail.com>

Previously, callers of the merge-ort API could have passed an
uninitialized value for struct merge_result *result.  However, we want
to check result to see if it has cached renames from a previous merge
that we can reuse; such values would be found behind result->priv.
However, if result->priv is uninitialized, attempting to access behind
it will give a segfault.  So, we need result->priv to be NULL (which
will be the case if the caller does a memset(&result, 0)), or be written
by a previous call to the merge-ort machinery.  Documenting this
requirement may help, but despite being the person who introduced this
requirement, I still missed it once and it did not fail in a very clear
way and led to a long debugging session.

Add a _properly_initialized field to merge_result; that value will be
0 if the caller zero'ed the merge_result, it will be set to a very
specific value by a previous run by the merge-ort machinery, and if it's
uninitialized it will most likely either be 0 or some value that does
not match the specific one we'd expect allowing us to throw a much more
meaningful error.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 7 +++++++
 merge-ort.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index a342cc6344fd..e6a02fa928f5 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -53,6 +53,8 @@ enum merge_side {
 	MERGE_SIDE2 = 2
 };
 
+static unsigned RESULT_INITIALIZED = 0x1abe11ed; /* unlikely accidental value */
+
 struct traversal_callback_data {
 	unsigned long mask;
 	unsigned long dirmask;
@@ -3746,6 +3748,10 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	assert(opt->obuf.len == 0);
 
 	assert(opt->priv == NULL);
+	if (result->_properly_initialized != 0 &&
+	    result->_properly_initialized != RESULT_INITIALIZED)
+		BUG("struct merge_result passed to merge_incore_*recursive() must be zeroed or filled with values from a previous run");
+	assert(!!result->priv == !!result->_properly_initialized);
 	if (result->priv) {
 		opt->priv = result->priv;
 		result->priv = NULL;
@@ -3905,6 +3911,7 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 	result->clean &= strmap_empty(&opt->priv->conflicted);
 	if (!opt->priv->call_depth) {
 		result->priv = opt->priv;
+		result->_properly_initialized = RESULT_INITIALIZED;
 		opt->priv = NULL;
 	}
 }
diff --git a/merge-ort.h b/merge-ort.h
index d53a0a339f33..c011864ffeb1 100644
--- a/merge-ort.h
+++ b/merge-ort.h
@@ -29,6 +29,8 @@ struct merge_result {
 	 * !clean) and to print "CONFLICT" messages.  Not for external use.
 	 */
 	void *priv;
+	/* Also private */
+	unsigned _properly_initialized;
 };
 
 /*

From patchwork Tue May  4 02:12:16 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren <newren@gmail.com>
X-Patchwork-Id: 12237313
Return-Path: <git-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id B1EDAC433B4
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:36 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 52A17611CB
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:36 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229784AbhEDCN2 (ORCPT <rfc822;git@archiver.kernel.org>);
        Mon, 3 May 2021 22:13:28 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36246 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229751AbhEDCNY (ORCPT <rfc822;git@vger.kernel.org>);
        Mon, 3 May 2021 22:13:24 -0400
Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com
 [IPv6:2a00:1450:4864:20::434])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 867F7C061574
        for <git@vger.kernel.org>; Mon,  3 May 2021 19:12:30 -0700 (PDT)
Received: by mail-wr1-x434.google.com with SMTP id x7so7580808wrw.10
        for <git@vger.kernel.org>; Mon, 03 May 2021 19:12:30 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=message-id:in-reply-to:references:from:date:subject:fcc
         :content-transfer-encoding:mime-version:to:cc;
        bh=r3cDq5vY4YQZr9QUMEa8RHAdsPDS6ptvqm6tEkstoD4=;
        b=mqfiH0pOU8Fv/J0571Bw6vVRiDEWYWLM69Om8FhnyMAx4g3GV9D8mpOe8CC5BgwLDE
         5agzuGZLZkJu1oG20XZLxtUgB1ROhNY7HKxhgeghi9QyvEGTRmk8kLV7Ts+hwnnCFeu/
         254f6CZlWMpilH8N7GxejcMeNI2d6ZHWWJ3V7XmhZsB9wOpcYuivpCdBpuznvsYOfJDW
         IPKIzsjkzAScIaLViYNFjzb7/xq6d8ew9Vdr9limz5w2T23/rlwHNhemtjOkR72FJgkm
         DLnRUSkrK5GlBpOqi+6cDq5G0UTezFiyBSKMdwjoVuULsCyeeSgis/6ov0NVLfTUA3DL
         ZOVA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:message-id:in-reply-to:references:from:date
         :subject:fcc:content-transfer-encoding:mime-version:to:cc;
        bh=r3cDq5vY4YQZr9QUMEa8RHAdsPDS6ptvqm6tEkstoD4=;
        b=P25O5YWVtqPqo7KoQJXsBfOBUGMrpP7/nxUwPiWlJlnZpHv/KFZt3BVq9EZ5jleXEa
         JN+NcUsfCdQU7utgJfZlPup12s1F6vb/hoXJPcJ9xGtPCRL/Y0JbJrTzubcSHV/xK+Cs
         mdHBujFCD5MF+/nPgvcSxAsOEbNSYkdbll2Apx8P+DjuUYaSv53T+MjUvoNp0fNgbPsU
         S+1BRfQ0DNouG+3KKHlFmVDwYAbN/U9/zU2r3X2sQO6HDkDRPw6EXjefBVE51Ir20cGl
         PTxu/9/pPPnrhg582X6vx0UdlF7zs2XJ++dmk4TmghjLkpneagI20l5edSpILdPnVtI8
         t7UA==
X-Gm-Message-State: AOAM532mNMwDs9KeZBz+vD3SDlmvKPIUx8UMTlIbGFwCZHh4+tknRVlr
        PEpTxdOwXaiFLvxbT/T3G4IS+xtGnwg=
X-Google-Smtp-Source: 
 ABdhPJzPP+k0jZSCxs2CH4tkRfICfmI4/UmY1Bx3rjK2CxjZJ+4HoqA+dInmLISt50PrEEd2/1ZF9Q==
X-Received: by 2002:a5d:6648:: with SMTP id f8mr29759240wrw.396.1620094349365;
        Mon, 03 May 2021 19:12:29 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id
 k16sm1146442wmi.44.2021.05.03.19.12.28
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 03 May 2021 19:12:29 -0700 (PDT)
Message-Id: 
 <2163dded5798e8cfd0d8a19d77475cd31286c34b.1620094339.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
References: <pull.859.git.1616621553.gitgitgadget@gmail.com>
        <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
Date: Tue, 04 May 2021 02:12:16 +0000
Subject: [PATCH v2 10/13] merge-ort: preserve cached renames for the
 appropriate side
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
        Jonathan Tan <jonathantanmy@google.com>,
        Taylor Blau <me@ttaylorr.com>,
        Elijah Newren <newren@gmail.com>,
        Derrick Stolee <stolee@gmail.com>,
        Elijah Newren <newren@gmail.com>,
        Elijah Newren <newren@gmail.com>
Precedence: bulk
List-ID: <git.vger.kernel.org>
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren <newren@gmail.com>

From: Elijah Newren <newren@gmail.com>

Previous commits created an in-memory cache of the results of rename
detection, and added logic to detect when that cache could appropriately
be used in a subsequent merge operation -- but we were still
unconditionally clearing the cache with each new merge operation anyway.
If it is valid to reuse the cache from one of the two sides of history,
preserve that side.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index e6a02fa928f5..b524a2db2769 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -476,17 +476,18 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	/* Free memory used by various renames maps */
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
 		strintmap_func(&renames->dirs_removed[i]);
-
-		partial_clear_dir_rename_count(&renames->dir_rename_count[i]);
-		if (!reinitialize)
-			strmap_clear(&renames->dir_rename_count[i], 1);
-
 		strmap_func(&renames->dir_renames[i], 0);
-
 		strintmap_func(&renames->relevant_sources[i]);
-		strset_func(&renames->cached_target_names[i]);
-		strmap_func(&renames->cached_pairs[i], 1);
-		strset_func(&renames->cached_irrelevant[i]);
+		if (!reinitialize)
+			assert(renames->cached_pairs_valid_side == 0);
+		if (i != renames->cached_pairs_valid_side) {
+			strset_func(&renames->cached_target_names[i]);
+			strmap_func(&renames->cached_pairs[i], 1);
+			strset_func(&renames->cached_irrelevant[i]);
+			partial_clear_dir_rename_count(&renames->dir_rename_count[i]);
+			if (!reinitialize)
+				strmap_clear(&renames->dir_rename_count[i], 1);
+		}
 	}
 	renames->cached_pairs_valid_side = 0;
 	renames->dir_rename_mask = 0;
@@ -2443,6 +2444,7 @@ static void detect_regular_renames(struct merge_options *opt,
 		return;
 	}
 
+	partial_clear_dir_rename_count(&renames->dir_rename_count[side_index]);
 	repo_diff_setup(opt->repo, &diff_opts);
 	diff_opts.flags.recursive = 1;
 	diff_opts.flags.rename_empty = 0;

From patchwork Tue May  4 02:12:17 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren <newren@gmail.com>
X-Patchwork-Id: 12237315
Return-Path: <git-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 40E9EC433ED
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:37 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 251B7611CB
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:37 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229795AbhEDCNa (ORCPT <rfc822;git@archiver.kernel.org>);
        Mon, 3 May 2021 22:13:30 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36252 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229764AbhEDCNZ (ORCPT <rfc822;git@vger.kernel.org>);
        Mon, 3 May 2021 22:13:25 -0400
Received: from mail-wm1-x32a.google.com (mail-wm1-x32a.google.com
 [IPv6:2a00:1450:4864:20::32a])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 52B44C0613ED
        for <git@vger.kernel.org>; Mon,  3 May 2021 19:12:31 -0700 (PDT)
Received: by mail-wm1-x32a.google.com with SMTP id
 p6-20020a05600c3586b029014131bbe5c7so341652wmq.3
        for <git@vger.kernel.org>; Mon, 03 May 2021 19:12:31 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=message-id:in-reply-to:references:from:date:subject:fcc
         :content-transfer-encoding:mime-version:to:cc;
        bh=w4GJi3ty/atVcewhO2bTD/C+Z0hRpCTJO44uI8VwQmI=;
        b=dNDNWiIhMEJWoR0xQ3STXdDDsMTJ3hkOB5fRGKfaEZyCSXT62wiZUkryHDt3bgTSIN
         h61kVoI4xTYKKVOI/yGQe2wtEFWkgg1S4zkRy+5cNv+cXApfc7pv/a9olp3IQoiJj8mN
         tNDn/CwVGXKdVqwLibyr6iJkvwoBHgotsy7JU4Eh3c563DZ1E/eekZ2NKT2TnhGbFNwQ
         OhPr8J+uG8TwA1o/1aSVGtgStxdvoiqx0h0+2yKMFusOpa68gzhdChZT0ADKhyvpyhqk
         cP/gwUM0kSgNGT80+qf8hTN+/RIK0Y8PuJ7um5JCMbUIrrCcjR8sHD0YQilQyrbGTbcm
         tsUg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:message-id:in-reply-to:references:from:date
         :subject:fcc:content-transfer-encoding:mime-version:to:cc;
        bh=w4GJi3ty/atVcewhO2bTD/C+Z0hRpCTJO44uI8VwQmI=;
        b=bJjdE4P6C1PZ5WLNHokf/AlXmNa2iBCjbCdYWrzQh7W5ESUl9Ze1vszs/huoZBpKwk
         2xHkv4riYb1LeXSKEOI2uyXPYtz9U4pl9zssVCddGBYqdSSu0fEUro8y9y4UfErFNov5
         XyN8g8uE1LtMHGe+7XkbGJ+19lkilI8rUuJ4abtBjzoXkTVzSfaC0TMh5lxETnTrsfeY
         42Wx4HxHmFcLVZFmgRvkdSx8FHWEJ6GDmw9qyPG0U7rCbEJhZEEBj5dU8cVeyU897K2/
         oHpEAQtVNvqRqyHxOH3T9bwo65RiIePTBc/aDSVRJ2YDpTU58/SvtHKJnjFTbX0q8oH4
         hhGg==
X-Gm-Message-State: AOAM533mSXRhehn9Lfq+i8f0EBvEUayCk2zMGQhwXQ2SZ9TdYE2oC+sJ
        Fou4jKBJVQDmpDIyaxmxcfNL5b+wONk=
X-Google-Smtp-Source: 
 ABdhPJzENA9CGYXVPTjtffEbWlm3r2K6at1/+wjFCwA4vaP5QsbRvun5qOngnpCjpt2Ti7W6FzaMoQ==
X-Received: by 2002:a05:600c:2315:: with SMTP id
 21mr23913902wmo.39.1620094350085;
        Mon, 03 May 2021 19:12:30 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id
 61sm14955849wrm.52.2021.05.03.19.12.29
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 03 May 2021 19:12:29 -0700 (PDT)
Message-Id: 
 <c0c1956e75c6881303691669dfdbfbb0c0d9b1d4.1620094339.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
References: <pull.859.git.1616621553.gitgitgadget@gmail.com>
        <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
Date: Tue, 04 May 2021 02:12:17 +0000
Subject: [PATCH v2 11/13] merge-ort: add helper functions for using cached
 renames
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
        Jonathan Tan <jonathantanmy@google.com>,
        Taylor Blau <me@ttaylorr.com>,
        Elijah Newren <newren@gmail.com>,
        Derrick Stolee <stolee@gmail.com>,
        Elijah Newren <newren@gmail.com>,
        Elijah Newren <newren@gmail.com>
Precedence: bulk
List-ID: <git.vger.kernel.org>
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren <newren@gmail.com>

From: Elijah Newren <newren@gmail.com>

If we have a usable rename cache, then we can remove from
relevant_sources all the paths that were cached;
diffcore_rename_extended() can then consider an even smaller set of
relevant_sources in its rename detection.

However, when diffcore_rename_extended() is done, we will need to take
the renames it detected and then add back in all the ones we had cached
from before.

Add helper functions for doing these two operations; the next commit
will make use of them.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index b524a2db2769..741970cd05e7 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2360,6 +2360,53 @@ static void resolve_diffpair_statuses(struct diff_queue_struct *q)
 	}
 }
 
+MAYBE_UNUSED
+static void prune_cached_from_relevant(struct rename_info *renames,
+				       unsigned side)
+{
+	/* Reason for this function described in add_pair() */
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	/* Remove from relevant_sources all entries in cached_pairs[side] */
+	strmap_for_each_entry(&renames->cached_pairs[side], &iter, entry) {
+		strintmap_remove(&renames->relevant_sources[side],
+				 entry->key);
+	}
+	/* Remove from relevant_sources all entries in cached_irrelevant[side] */
+	strset_for_each_entry(&renames->cached_irrelevant[side], &iter, entry) {
+		strintmap_remove(&renames->relevant_sources[side],
+				 entry->key);
+	}
+}
+
+MAYBE_UNUSED
+static void use_cached_pairs(struct merge_options *opt,
+			     struct strmap *cached_pairs,
+			     struct diff_queue_struct *pairs)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	/*
+	 * Add to side_pairs all entries from renames->cached_pairs[side_index].
+	 * (Info in cached_irrelevant[side_index] is not relevant here.)
+	 */
+	strmap_for_each_entry(cached_pairs, &iter, entry) {
+		struct diff_filespec *one, *two;
+		const char *old_name = entry->key;
+		const char *new_name = entry->value;
+		if (!new_name)
+			new_name = old_name;
+
+		/* We don't care about oid/mode, only filenames and status */
+		one = alloc_filespec(old_name);
+		two = alloc_filespec(new_name);
+		diff_queue(pairs, one, two);
+		pairs->queue[pairs->nr-1]->status = entry->value ? 'R' : 'D';
+	}
+}
+
 static void possibly_cache_new_pair(struct rename_info *renames,
 				    struct diff_filepair *p,
 				    unsigned side,

From patchwork Tue May  4 02:12:18 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren <newren@gmail.com>
X-Patchwork-Id: 12237317
Return-Path: <git-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id D29E1C43461
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:37 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id B13B7611CB
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:37 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229766AbhEDCNa (ORCPT <rfc822;git@archiver.kernel.org>);
        Mon, 3 May 2021 22:13:30 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36246 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229768AbhEDCN0 (ORCPT <rfc822;git@vger.kernel.org>);
        Mon, 3 May 2021 22:13:26 -0400
Received: from mail-wr1-x431.google.com (mail-wr1-x431.google.com
 [IPv6:2a00:1450:4864:20::431])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF8DAC06138B
        for <git@vger.kernel.org>; Mon,  3 May 2021 19:12:31 -0700 (PDT)
Received: by mail-wr1-x431.google.com with SMTP id m9so7595602wrx.3
        for <git@vger.kernel.org>; Mon, 03 May 2021 19:12:31 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=message-id:in-reply-to:references:from:date:subject:fcc
         :content-transfer-encoding:mime-version:to:cc;
        bh=CobXKWM5+11X/E2p/lSwwF9swOXqfgnDizKF7JRoeQc=;
        b=GNCnv8DfxGy3alE0Dq2D1VWZDsAZN6pA/vpDkVT9EzuBXiXYQMzbePHNQ/QIbUPsfF
         J6Jjf837B9uoaaj0sv6mJHLOYIVublPAroSezk9DN43x3HlpaGTb0gW4Rbmp8iGVUx9w
         Qad+2fiH8m3kkvAE6Nobs7djB1EOIUgsZ5MhomW8oKqnvh+raVtH5R7njqkQTkpXu3or
         629sAtxMtwc7XGj7s4OX32NEqHHdU9ZpxmDYug26pfKkuxi24dsLU6blxWVdtq9RUF3V
         hBmxkWb+LuzZ6jTNxU6JJ/0Oo2TfUUX2SnChGGFbOz5Hm6DSDkc/6G94uXi+mR9Nu9DY
         TmSg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:message-id:in-reply-to:references:from:date
         :subject:fcc:content-transfer-encoding:mime-version:to:cc;
        bh=CobXKWM5+11X/E2p/lSwwF9swOXqfgnDizKF7JRoeQc=;
        b=lc4yGc6GwuH0PgkQBLhHp8ra9Qv78ydodCnnz0yUAFO+Ga2f60UcxiyDLRnsVTjbx6
         szMHBiqISPglLz506l0hUq5+yiTM4LYeM6Fajlow81wHJr0ur4MO1/3uaTrW68a5fNQU
         Sze2GcCaW4oaigkVEh/4Hqx93fw85TLSykJzcPRYV3YrZ3GOE+vdWnrF74CvPeoLeobh
         XKfcEFN7Wbrco2aJO6qolS8ISgHeol/HGtXn+9TLM28KIC/9pcLLO7s5xCG94wop/erG
         KK6ltEilHhutXoCuMpRYh1t7yqfHQvp82En2t+nFMRT0AhaodPkups9rJz68/CPeUg61
         nFJw==
X-Gm-Message-State: AOAM531bdd5WxB3poDrBozkiKixaM/vsahu2CC7FAH2/STy6GPGFVilm
        KvDtUdIYEjKy9CZhBFxEkv3LhExqdAY=
X-Google-Smtp-Source: 
 ABdhPJwmwcM3iVWwvBDotQ+NxbZVdc4t1i60bzu0+fKL2NQDjQRzl6tROA+waeJWmX3N5e7p7sntyA==
X-Received: by 2002:adf:f212:: with SMTP id
 p18mr28166440wro.120.1620094350696;
        Mon, 03 May 2021 19:12:30 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id
 o9sm1003768wmh.19.2021.05.03.19.12.30
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 03 May 2021 19:12:30 -0700 (PDT)
Message-Id: 
 <28b622a8261b8bf9190ecc9ce4189334ca553109.1620094339.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
References: <pull.859.git.1616621553.gitgitgadget@gmail.com>
        <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
Date: Tue, 04 May 2021 02:12:18 +0000
Subject: [PATCH v2 12/13] merge-ort: handle interactions of caching and
 rename/rename(1to1) cases
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
        Jonathan Tan <jonathantanmy@google.com>,
        Taylor Blau <me@ttaylorr.com>,
        Elijah Newren <newren@gmail.com>,
        Derrick Stolee <stolee@gmail.com>,
        Elijah Newren <newren@gmail.com>,
        Elijah Newren <newren@gmail.com>
Precedence: bulk
List-ID: <git.vger.kernel.org>
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren <newren@gmail.com>

From: Elijah Newren <newren@gmail.com>

As documented in Documentation/technical/remembering-renames.txt, and as
tested for in the two testcases in t6429 with "rename same file
identically" in their description, there is one case where we need to
have renames in one commit NOT be cached for the next commit in our
rebase sequence -- namely, rename/rename(1to1) cases.  Rather than
specifically trying to uncache those and fix up dir_rename_counts() to
match (which would also be valid but more work), we simply disable the
optimization when this really rare type of rename occurs.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 741970cd05e7..2fc98b803d1c 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2100,6 +2100,9 @@ static int process_renames(struct merge_options *opt,
 			VERIFY_CI(side2);
 
 			if (!strcmp(pathnames[1], pathnames[2])) {
+				struct rename_info *ri = &opt->priv->renames;
+				int j;
+
 				/* Both sides renamed the same way */
 				assert(side1 == side2);
 				memcpy(&side1->stages[0], &base->stages[0],
@@ -2109,6 +2112,16 @@ static int process_renames(struct merge_options *opt,
 				base->merged.is_null = 1;
 				base->merged.clean = 1;
 
+				/*
+				 * Disable remembering renames optimization;
+				 * rename/rename(1to1) is incredibly rare, and
+				 * just disabling the optimization is easier
+				 * than purging cached_pairs,
+				 * cached_target_names, and dir_rename_counts.
+				 */
+				for (j = 0; j < 3; j++)
+					ri->merge_trees[j] = NULL;
+
 				/* We handled both renames, i.e. i+1 handled */
 				i++;
 				/* Move to next rename */
@@ -3896,7 +3909,22 @@ static void merge_check_renames_reusable(struct merge_options *opt,
 
 	renames = &opti->renames;
 	merge_trees = renames->merge_trees;
-	/* merge_trees[0..2] will only be NULL if opti is */
+
+	/*
+	 * Handle case where previous merge operation did not want cache to
+	 * take effect, e.g. because rename/rename(1to1) makes it invalid.
+	 */
+	if (!merge_trees[0]) {
+		assert(!merge_trees[0] && !merge_trees[1] && !merge_trees[2]);
+		renames->cached_pairs_valid_side = 0; /* neither side valid */
+		return;
+	}
+
+	/*
+	 * Handle other cases; note that merge_trees[0..2] will only
+	 * be NULL if opti is, or if all three were manually set to
+	 * NULL by e.g. rename/rename(1to1) handling.
+	 */
 	assert(merge_trees[0] && merge_trees[1] && merge_trees[2]);
 
 	/* Check if we meet a condition for re-using cached_pairs */

From patchwork Tue May  4 02:12:19 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Elijah Newren <newren@gmail.com>
X-Patchwork-Id: 12237319
Return-Path: <git-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 8B889C43460
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:38 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 74DA361090
	for <git@archiver.kernel.org>; Tue,  4 May 2021 02:12:38 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229796AbhEDCNb (ORCPT <rfc822;git@archiver.kernel.org>);
        Mon, 3 May 2021 22:13:31 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36252 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229773AbhEDCN1 (ORCPT <rfc822;git@vger.kernel.org>);
        Mon, 3 May 2021 22:13:27 -0400
Received: from mail-wr1-x430.google.com (mail-wr1-x430.google.com
 [IPv6:2a00:1450:4864:20::430])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BA05AC061761
        for <git@vger.kernel.org>; Mon,  3 May 2021 19:12:32 -0700 (PDT)
Received: by mail-wr1-x430.google.com with SMTP id l2so7576917wrm.9
        for <git@vger.kernel.org>; Mon, 03 May 2021 19:12:32 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=message-id:in-reply-to:references:from:date:subject:mime-version
         :content-transfer-encoding:fcc:to:cc;
        bh=VHq4eFBssq3qbrsKbo9x1MjyEgTohsx7I6PXz2mZkXA=;
        b=NBJCfvvaq7cq89AjY+6JpzEJjr0zGP6TO2FxGbZNR9eOqUyGGEYUxNfbv8Xx/npRS6
         KUSNqoUxq+Z96jUPmXtNSkzK4YVut9zauXMB5+fwvfCoGhNMMKNefG45Etg5poaS7SVi
         iln2m4xT3BwxD+PM/Xw36ZcdJDqIo7tcCkoycK5DeJRBtLT3lu/4FTAZuvcYfWWMN0B5
         UT7kd2jAdEPLGwRTOgjVepK97S8M4pRkl87zrZWc3SjeWl1kt1MLXjlg7Bq6b6HaD8sz
         jpMU5bv2Yw41pYG7+C3YP2sKWyYfpDxvPDhNETB8aQH9B0Gwtofz3RU1M5D/qYo8+SsL
         MFxA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:message-id:in-reply-to:references:from:date
         :subject:mime-version:content-transfer-encoding:fcc:to:cc;
        bh=VHq4eFBssq3qbrsKbo9x1MjyEgTohsx7I6PXz2mZkXA=;
        b=CYKEbTQfzyQQT3G8Od0BUEwAtM/DiEnu/2yVKB7z8r6rXwaMt9kVvDkMWmqUiW5lrX
         o92Q0ZMpCVYu7wB+XatYlN80oW+nnBS5gor28DBpm1hcPm9J7/yGdl+5SafRg5RfQ+3f
         z0CXMLrLdpqylwVtYqVhTWTDagFOGnDHz7pOiQzar+kVwJ9WRUi2tk/LPrUQrb1NHFmI
         8eNuvRmbbVYFW6ZKrR9TgQvI4y2EF+ABF7hXOg+0XkcROz31wpzl39Q3dTEVTrBsGk5t
         gvLkr5xICSw57VHXBirPxX3wNjwObQxAIaa7ErfaKM0SOUJdn9/NDO6ok8b4rTXseCZe
         HOZg==
X-Gm-Message-State: AOAM531v+0NAcxc9uDMy08yng/9Q+j2+jKxtl1oEZ/qQc2ZbVGpUonf5
        I09AHglGQwYxc6jBqLLdmH9hfXIu7vs=
X-Google-Smtp-Source: 
 ABdhPJx9YrXqmsj7Xya/jszSnH+Seyh+a0CIOSrx71fZKTmdHzg56uBHWYOQgH4X2MUPnofJ98z6Rw==
X-Received: by 2002:adf:f60d:: with SMTP id
 t13mr19877400wrp.189.1620094351477;
        Mon, 03 May 2021 19:12:31 -0700 (PDT)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id
 n7sm13814751wri.14.2021.05.03.19.12.30
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 03 May 2021 19:12:31 -0700 (PDT)
Message-Id: 
 <91b6768adf2d1777219fb2d83cc2363f1497dbbd.1620094339.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
References: <pull.859.git.1616621553.gitgitgadget@gmail.com>
        <pull.859.v2.git.1620094339.gitgitgadget@gmail.com>
Date: Tue, 04 May 2021 02:12:19 +0000
Subject: [PATCH v2 13/13] merge-ort, diffcore-rename: employ cached renames
 when possible
MIME-Version: 1.0
Fcc: Sent
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
        Jonathan Tan <jonathantanmy@google.com>,
        Taylor Blau <me@ttaylorr.com>,
        Elijah Newren <newren@gmail.com>,
        Derrick Stolee <stolee@gmail.com>,
        Elijah Newren <newren@gmail.com>,
        Elijah Newren <newren@gmail.com>
Precedence: bulk
List-ID: <git.vger.kernel.org>
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren <newren@gmail.com>

From: Elijah Newren <newren@gmail.com>

When there are many renames between the old base of a series of commits
and the new base, the way sequencer.c, merge-recursive.c, and
diffcore-rename.c have traditionally split the work resulted in
redetecting the same renames with each and every commit being
transplanted.  To address this, the last several commits have been
creating a cache of rename detection results, determining when it was
safe to use such a cache in subsequent merge operations, adding helper
functions, and so on.  See the previous half dozen commit messages for
additional discussion of this optimization, particularly the message a
few commits ago entitled "add code to check for whether cached renames
can be reused".  This commit finally ties all of that work together,
modifying the merge algorithm to make use of these cached renames.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:        5.665 s ±  0.129 s     5.622 s ±  0.059 s
    mega-renames:     11.435 s ±  0.158 s    10.127 s ±  0.073 s
    just-one-mega:   494.2  ms ±  6.1  ms   500.3  ms ±  3.8  ms

That's a fairly small improvement, but mostly because the previous
optimizations were so effective for these particular testcases; this
optimization only kicks in when the others don't.  If we undid the
basename-guided rename detection and skip-irrelevant-renames
optimizations, then we'd see that this series by itself improved
performance as follows:

                   Before Basename Series   After Just This Series
    no-renames:      13.815 s ±  0.062 s      5.697 s ±  0.080 s
    mega-renames:  1799.937 s ±  0.493 s    205.709 s ±  0.457 s

Since this optimization kicks in to help accelerate cases where the
previous optimizations do not apply, this last comparison shows that
this cached-renames optimization has the potential to help signficantly
in cases that don't meet the requirements for the other optimizations to
be effective.

The changes made in this optimization also lay some important groundwork
for a future optimization around having collect_merge_info() avoid
recursing into subtrees in more cases.

However, for this optimization to be effective, merge_switch_to_result()
should only be called when the rebase or cherry-pick operation has
either completed or hit a case where the user needs to resolve a
conflict or edit the result.  If it is called after every commit, as
sequencer.c does, then the working tree and index are needlessly updated
with every commit and the cached metadata is tossed, defeating this
optimization.  Refactoring sequencer.c to only call
merge_switch_to_result() at the end of the operation is a bigger
undertaking, and the practical benefits of this optimization will not be
realized until that work is performed.  Since `test-tool fast-rebase`
only updates at the end of the operation, it was used to obtain the
timings above.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c                        | 22 +++++++++--
 diffcore.h                               |  3 +-
 merge-ort.c                              | 47 ++++++++++++++++++++---
 t/t6429-merge-sequence-rename-caching.sh | 48 ++++++++++++++----------
 4 files changed, 90 insertions(+), 30 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 963ca582216b..3375e24659ea 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -568,7 +568,8 @@ static void update_dir_rename_counts(struct dir_rename_info *info,
 static void initialize_dir_rename_info(struct dir_rename_info *info,
 				       struct strintmap *relevant_sources,
 				       struct strintmap *dirs_removed,
-				       struct strmap *dir_rename_count)
+				       struct strmap *dir_rename_count,
+				       struct strmap *cached_pairs)
 {
 	struct hashmap_iter iter;
 	struct strmap_entry *entry;
@@ -633,6 +634,17 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
 					 rename_dst[i].p->two->path);
 	}
 
+	/* Add cached_pairs to counts */
+	strmap_for_each_entry(cached_pairs, &iter, entry) {
+		const char *old_name = entry->key;
+		const char *new_name = entry->value;
+		if (!new_name)
+			/* known delete; ignore it */
+			continue;
+
+		update_dir_rename_counts(info, dirs_removed, old_name, new_name);
+	}
+
 	/*
 	 * Now we collapse
 	 *    dir_rename_count: old_directory -> {new_directory -> count}
@@ -1247,7 +1259,8 @@ static void handle_early_known_dir_renames(struct dir_rename_info *info,
 void diffcore_rename_extended(struct diff_options *options,
 			      struct strintmap *relevant_sources,
 			      struct strintmap *dirs_removed,
-			      struct strmap *dir_rename_count)
+			      struct strmap *dir_rename_count,
+			      struct strmap *cached_pairs)
 {
 	int detect_rename = options->detect_rename;
 	int minimum_score = options->rename_score;
@@ -1363,7 +1376,8 @@ void diffcore_rename_extended(struct diff_options *options,
 		/* Preparation for basename-driven matching. */
 		trace2_region_enter("diff", "dir rename setup", options->repo);
 		initialize_dir_rename_info(&info, relevant_sources,
-					   dirs_removed, dir_rename_count);
+					   dirs_removed, dir_rename_count,
+					   cached_pairs);
 		trace2_region_leave("diff", "dir rename setup", options->repo);
 
 		/* Utilize file basenames to quickly find renames. */
@@ -1560,5 +1574,5 @@ void diffcore_rename_extended(struct diff_options *options,
 
 void diffcore_rename(struct diff_options *options)
 {
-	diffcore_rename_extended(options, NULL, NULL, NULL);
+	diffcore_rename_extended(options, NULL, NULL, NULL, NULL);
 }
diff --git a/diffcore.h b/diffcore.h
index f5c6de4841ed..533b30e21e7f 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -181,7 +181,8 @@ void diffcore_rename(struct diff_options *);
 void diffcore_rename_extended(struct diff_options *options,
 			      struct strintmap *relevant_sources,
 			      struct strintmap *dirs_removed,
-			      struct strmap *dir_rename_count);
+			      struct strmap *dir_rename_count,
+			      struct strmap *cached_pairs);
 void diffcore_merge_broken(void);
 void diffcore_pickaxe(struct diff_options *);
 void diffcore_order(const char *orderfile);
diff --git a/merge-ort.c b/merge-ort.c
index 2fc98b803d1c..17dc3deb3c73 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -753,15 +753,48 @@ static void add_pair(struct merge_options *opt,
 	struct rename_info *renames = &opt->priv->renames;
 	int names_idx = is_add ? side : 0;
 
-	if (!is_add) {
+	if (is_add) {
+		if (strset_contains(&renames->cached_target_names[side],
+				    pathname))
+			return;
+	} else {
 		unsigned content_relevant = (match_mask == 0);
 		unsigned location_relevant = (dir_rename_mask == 0x07);
 
+		/*
+		 * If pathname is found in cached_irrelevant[side] due to
+		 * previous pick but for this commit content is relevant,
+		 * then we need to remove it from cached_irrelevant.
+		 */
+		if (content_relevant)
+			/* strset_remove is no-op if strset doesn't have key */
+			strset_remove(&renames->cached_irrelevant[side],
+				      pathname);
+
+		/*
+		 * We do not need to re-detect renames for paths that we already
+		 * know the pairing, i.e. for cached_pairs (or
+		 * cached_irrelevant).  However, handle_deferred_entries() needs
+		 * to loop over the union of keys from relevant_sources[side] and
+		 * cached_pairs[side], so for simplicity we set relevant_sources
+		 * for all the cached_pairs too and then strip them back out in
+		 * prune_cached_from_relevant() at the beginning of
+		 * detect_regular_renames().
+		 */
 		if (content_relevant || location_relevant) {
 			/* content_relevant trumps location_relevant */
 			strintmap_set(&renames->relevant_sources[side], pathname,
 				      content_relevant ? RELEVANT_CONTENT : RELEVANT_LOCATION);
 		}
+
+		/*
+		 * Avoid creating pair if we've already cached rename results.
+		 * Note that we do this after setting relevant_sources[side]
+		 * as noted in the comment above.
+		 */
+		if (strmap_contains(&renames->cached_pairs[side], pathname) ||
+		    strset_contains(&renames->cached_irrelevant[side], pathname))
+			return;
 	}
 
 	one = alloc_filespec(pathname);
@@ -2349,7 +2382,9 @@ static inline int possible_side_renames(struct rename_info *renames,
 static inline int possible_renames(struct rename_info *renames)
 {
 	return possible_side_renames(renames, 1) ||
-	       possible_side_renames(renames, 2);
+	       possible_side_renames(renames, 2) ||
+	       !strmap_empty(&renames->cached_pairs[1]) ||
+	       !strmap_empty(&renames->cached_pairs[2]);
 }
 
 static void resolve_diffpair_statuses(struct diff_queue_struct *q)
@@ -2373,7 +2408,6 @@ static void resolve_diffpair_statuses(struct diff_queue_struct *q)
 	}
 }
 
-MAYBE_UNUSED
 static void prune_cached_from_relevant(struct rename_info *renames,
 				       unsigned side)
 {
@@ -2393,7 +2427,6 @@ static void prune_cached_from_relevant(struct rename_info *renames,
 	}
 }
 
-MAYBE_UNUSED
 static void use_cached_pairs(struct merge_options *opt,
 			     struct strmap *cached_pairs,
 			     struct diff_queue_struct *pairs)
@@ -2494,6 +2527,7 @@ static void detect_regular_renames(struct merge_options *opt,
 	struct diff_options diff_opts;
 	struct rename_info *renames = &opt->priv->renames;
 
+	prune_cached_from_relevant(renames, side_index);
 	if (!possible_side_renames(renames, side_index)) {
 		/*
 		 * No rename detection needed for this side, but we still need
@@ -2522,7 +2556,8 @@ static void detect_regular_renames(struct merge_options *opt,
 	diffcore_rename_extended(&diff_opts,
 				 &renames->relevant_sources[side_index],
 				 &renames->dirs_removed[side_index],
-				 &renames->dir_rename_count[side_index]);
+				 &renames->dir_rename_count[side_index],
+				 &renames->cached_pairs[side_index]);
 	trace2_region_leave("diff", "diffcore_rename", opt->repo);
 	resolve_diffpair_statuses(&diff_queued_diff);
 
@@ -2629,6 +2664,8 @@ static int detect_and_process_renames(struct merge_options *opt,
 	trace2_region_enter("merge", "regular renames", opt->repo);
 	detect_regular_renames(opt, MERGE_SIDE1);
 	detect_regular_renames(opt, MERGE_SIDE2);
+	use_cached_pairs(opt, &renames->cached_pairs[1], &renames->pairs[1]);
+	use_cached_pairs(opt, &renames->cached_pairs[2], &renames->pairs[2]);
 	trace2_region_leave("merge", "regular renames", opt->repo);
 
 	trace2_region_enter("merge", "directory renames", opt->repo);
diff --git a/t/t6429-merge-sequence-rename-caching.sh b/t/t6429-merge-sequence-rename-caching.sh
index f47d8924ee73..035edc40b1eb 100755
--- a/t/t6429-merge-sequence-rename-caching.sh
+++ b/t/t6429-merge-sequence-rename-caching.sh
@@ -101,10 +101,10 @@ test_expect_success 'caching renames does not preclude finding new ones' '
 # dramatic change in size of the file, but remembering the rename and
 # reusing it is reasonable too.
 #
-# Rename detection (diffcore_rename_extended()) will run twice here; it is
-# not needed on the topic side of history for either of the two commits
-# being merged, but it is needed on the upstream side of history for each
-# commit being picked.
+# We do test here that we expect rename detection to only be run once total
+# (the topic side of history doesn't need renames, and with caching we
+# should be able to only run rename detection on the upstream side one
+# time.)
 test_expect_success 'cherry-pick both a commit and its immediate revert' '
 	test_create_repo pick-commit-and-its-immediate-revert &&
 	(
@@ -140,11 +140,11 @@ test_expect_success 'cherry-pick both a commit and its immediate revert' '
 		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
 		export GIT_TRACE2_PERF &&
 
-		test_might_fail test-tool fast-rebase --onto HEAD upstream~1 topic &&
+		test-tool fast-rebase --onto HEAD upstream~1 topic &&
 		#git cherry-pick upstream~1..topic &&
 
 		grep region_enter.*diffcore_rename trace.output >calls &&
-		test_line_count = 2 calls
+		test_line_count = 1 calls
 	)
 '
 
@@ -304,9 +304,11 @@ test_expect_success 'rename same file identically, then add file to old dir' '
 # Here we are just concerned that cached renames might prevent us from seeing
 # the rename conflict, and we want to ensure that we do get a conflict.
 #
-# While at it, also test that we do rename detection three times.  We have to
-# detect renames on the upstream side of history once for each merge, plus
-# Topic_2 has renames.
+# While at it, though, we do test that we only try to detect renames 2
+# times and not three.  (The first merge needs to detect renames on the
+# upstream side.  Traditionally, the second merge would need to detect
+# renames on both sides of history, but our caching of upstream renames
+# should avoid the need to re-detect upstream renames.)
 #
 test_expect_success 'cached dir rename does not prevent noticing later conflict' '
 	test_create_repo dir-rename-cache-not-occluding-later-conflict &&
@@ -357,7 +359,7 @@ test_expect_success 'cached dir rename does not prevent noticing later conflict'
 		grep CONFLICT..rename/rename output &&
 
 		grep region_enter.*diffcore_rename trace.output >calls &&
-		test_line_count = 3 calls
+		test_line_count = 2 calls
 	)
 '
 
@@ -412,10 +414,17 @@ test_setup_upstream_rename () {
 # commit to mess up its location either.  We want to make sure that
 # olddir/newfile doesn't exist in the result and that newdir/newfile does.
 #
-# We also expect rename detection to occur three times.  Although it is
-# typically needed two times per commit, there are no deleted files on the
-# topic side of history, so we only need to detect renames on the upstream
-# side for each of the 3 commits we need to pick.
+# We also test that we only do rename detection twice.  We never need
+# rename detection on the topic side of history, but we do need it twice on
+# the upstream side of history.  For the first topic commit, we only need
+# the
+#   relevant-rename -> renamed
+# rename, because olddir is unmodified by Topic_1.  For Topic_2, however,
+# the new file being added to olddir means files that were previously
+# irrelevant for rename detection are now relevant, forcing us to repeat
+# rename detection for the paths we don't already have cached.  Topic_3 also
+# tweaks olddir/newfile, but the renames in olddir/ will have been cached
+# from the second rename detection run.
 #
 test_expect_success 'dir rename unneeded, then add new file to old dir' '
 	test_setup_upstream_rename dir-rename-unneeded-until-new-file &&
@@ -450,7 +459,7 @@ test_expect_success 'dir rename unneeded, then add new file to old dir' '
 		#git cherry-pick upstream..topic &&
 
 		grep region_enter.*diffcore_rename trace.output >calls &&
-		test_line_count = 3 calls &&
+		test_line_count = 2 calls &&
 
 		git ls-files >tracked &&
 		test_line_count = 5 tracked &&
@@ -516,7 +525,7 @@ test_expect_success 'dir rename unneeded, then rename existing file into old dir
 		#git cherry-pick upstream..topic &&
 
 		grep region_enter.*diffcore_rename trace.output >calls &&
-		test_line_count = 4 calls &&
+		test_line_count = 3 calls &&
 
 		test_path_is_missing somefile &&
 		test_path_is_missing olddir/newfile &&
@@ -648,9 +657,8 @@ test_expect_success 'caching renames only on upstream side, part 1' '
 # for the wrong side of history.
 #
 #
-# This testcase should only need three calls to diffcore_rename_extended(),
-# because there are no renames on the topic side of history for picking
-# Topic_2.
+# This testcase should only need two calls to diffcore_rename_extended(),
+# both for the first merge, one for each side of history.
 #
 test_expect_success 'caching renames only on upstream side, part 2' '
 	test_setup_topic_rename cache-renames-only-upstream-rename-file &&
@@ -677,7 +685,7 @@ test_expect_success 'caching renames only on upstream side, part 2' '
 		#git cherry-pick upstream..topic &&
 
 		grep region_enter.*diffcore_rename trace.output >calls &&
-		test_line_count = 3 calls &&
+		test_line_count = 2 calls &&
 
 		git ls-files >tracked &&
 		test_line_count = 4 tracked &&