From patchwork Thu Mar 11 00:38:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12129865 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1FC7DC433DB for ; Thu, 11 Mar 2021 00:39:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D1EA564FBF for ; Thu, 11 Mar 2021 00:39:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229813AbhCKAiu (ORCPT ); Wed, 10 Mar 2021 19:38:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50522 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230288AbhCKAif (ORCPT ); Wed, 10 Mar 2021 19:38:35 -0500 Received: from mail-wm1-x333.google.com (mail-wm1-x333.google.com [IPv6:2a00:1450:4864:20::333]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B5997C061760 for ; Wed, 10 Mar 2021 16:38:34 -0800 (PST) Received: by mail-wm1-x333.google.com with SMTP id l19so264573wmh.1 for ; Wed, 10 Mar 2021 16:38:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=OnN/B9G0iQPf8qsCwAjIg6HB7RrPcPM8hs2LL4tvh7M=; b=eZNNMQr8YsaGfBCkY+B+fYFD2BMPKs1fWiBYpGQfllRADPO/zFNVLgHEUhzBIHuw7a I76d+y12P9/pT28eyjeNuQ1He4zzU11dscZu4Wn9pvmvrx4x7yA/sRLzHXYQrSBjeBvw Lv3UTx08CXWsAqCMsuHpQej9CJ3gjk2h4COqLH/hmsMj8gEob6/7o1Qg8PAWhxOe40ti Gfa+Oz7HXNBToQUdUmKrUwRwW7gPR0b9FmGpTp/eJfEpIgaPabn8iKhgLEaqNSJBJ10A aWvdffRCrHZ8W94mEay8swz8qymY20naKEqIwF/g3WIFewtUivYOL5fXWghtDFZqPTdn PN7w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=OnN/B9G0iQPf8qsCwAjIg6HB7RrPcPM8hs2LL4tvh7M=; b=Nc0C3d7G+S/320+CUENBvkx2NpASdkKNvg4ejSv5SzReUWFft4DycUR+DGfGE5ITS+ W4RcHQ3f1kC3ZPfiHzZciovd/HJ4QEuC/5vfgt6DRXLkZVNJ30/7/xMnz4rJVde/fMRJ zWol/dk6vaSW/tRBwX9F3Q0f1o9L5D4vxHB7C4lHMxtRCm31q1koZg8nQxfDYKIOw0/H 34PBOFNRDdayHLV+zk5Sx3VPO9QPfSfJf4CRlep8hnA4Kh7OT5ZOICVDoqstzGiQR7Rv IoIAFLuYk5DExJtmkzYgXXdvv12jrb+A1M3oOneXByiyJ747tjnuxbndgW/J6BixATIw EvAg== X-Gm-Message-State: AOAM5319tbrkKq1y3QBd5rplE0PVrrFAD5lIX/1n5K30mK+wCZTVU2qg F4b3f+4d0AqrglKzRWNjMBBwURcPi74= X-Google-Smtp-Source: ABdhPJyaWn+wkkaz1+doBrw89wOq59/GcxvWsupKam2SqH/XyOJuNoE+xWg/fFJ/uNV9ZddIQ96gqw== X-Received: by 2002:a7b:c195:: with SMTP id y21mr5619088wmi.178.1615423113495; Wed, 10 Mar 2021 16:38:33 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id l15sm1030343wru.38.2021.03.10.16.38.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 Mar 2021 16:38:33 -0800 (PST) Message-Id: <3b253f5a63edfac5d4d2f4f60f5c824bcfafead0.1615423111.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 11 Mar 2021 00:38:24 +0000 Subject: [PATCH v3 1/8] diffcore-rename: enable filtering possible rename sources Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Junio C Hamano , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Elijah Newren , Derrick Stolee , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Add the ability to diffcore_rename_extended() to allow external callers to declare that they only need renames detected for a subset of source files, and use that information to skip detecting renames for them. There are two important pieces to this optimization that may not be obvious at first glance: * We do not require callers to just filter the filepairs out to remove the non-relevant sources, because exact rename detection is fast and when it finds a match it can remove both a source and a destination whereas the relevant_sources filter can only remove a source. * We need to filter out the source pairs in a preliminary pass instead of adding a strset_contains(relevant_sources, one->path) check within the nested matrix loop. The reason for that is if we have 30k renames, doing 30k * 30k = 900M strset_contains() calls becomes extraordinarily expensive and defeats the performance gains from this change; we only want to do 30k such calls instead. If callers pass NULL for relevant_sources, that is special cases to treat all sources as relevant. Since all callers currently pass NULL, this optimization does not yet have any effect. Subsequent commits will have merge-ort compute a set of relevant_sources to restrict which sources we detect renames for, and have merge-ort pass that set of relevant_sources to diffcore_rename_extended(). A note about filtering order: Some may be curious why we don't filter out irrelevant sources at the same time we filter out exact renames. While that technically could be done at this point, there are two reasons to defer it: First, was to reinforce a lesson that was too easy to forget. As I mentioned above, in the past I filtered irrelevant sources out before exact rename checking, and then discovered that exact renames' ability to remove both sources and destinations was an important consideration and thus doing the filtering after exact rename checking would speed things up. Then at some point I realized that basename matching could also remove both sources and destinations, and decided to put irrelevant source filtering after basename filtering. That slowed things down a lot. But, despite learning about this important ordering, in later restructuring I forgot and made the same mistake of putting the filtering after basename guided rename detection again. So, I have this series of patches structured to do the irrelevant filtering last to start to show how much extra it costs, and then add relevant filtering in to find_basename_matches() to show how much it speeds things up. Basically, it's a way to reinforce something that apparently was too easy to forget, and make sure the commit messages record this lesson. Second, the items in the "relevant_sources" in this patch series will include all sources that *might be* relevant. It has to be conservative and catch anything that might need a rename, but in the patch series after this one we'll find ways to weed out more of the *might be* relevant ones. Unfortunately, merge-ort does not have sufficient information to weed those ones out, and there isn't enough information at the time of filtering exact renames out to remove the extra ones either. It has to be deferred. So the deferral is in part to simplify some later additions. Signed-off-by: Elijah Newren --- diffcore-rename.c | 26 +++++++++++++++++++------- diffcore.h | 1 + merge-ort.c | 1 + 3 files changed, 21 insertions(+), 7 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 1fe902ed2af0..7f6115fd9018 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -991,11 +991,12 @@ static int find_renames(struct diff_score *mx, return count; } -static void remove_unneeded_paths_from_src(int detecting_copies) +static void remove_unneeded_paths_from_src(int detecting_copies, + struct strset *interesting) { int i, new_num_src; - if (detecting_copies) + if (detecting_copies && !interesting) return; /* nothing to remove */ if (break_idx) return; /* culling incompatible with break detection */ @@ -1022,12 +1023,18 @@ static void remove_unneeded_paths_from_src(int detecting_copies) * from rename_src here. */ for (i = 0, new_num_src = 0; i < rename_src_nr; i++) { + struct diff_filespec *one = rename_src[i].p->one; + /* * renames are stored in rename_dst, so if a rename has * already been detected using this source, we can just * remove the source knowing rename_dst has its info. */ - if (rename_src[i].p->one->rename_used) + if (!detecting_copies && one->rename_used) + continue; + + /* If we don't care about the source path, skip it */ + if (interesting && !strset_contains(interesting, one->path)) continue; if (new_num_src < i) @@ -1040,6 +1047,7 @@ static void remove_unneeded_paths_from_src(int detecting_copies) } void diffcore_rename_extended(struct diff_options *options, + struct strset *relevant_sources, struct strset *dirs_removed, struct strmap *dir_rename_count) { @@ -1060,6 +1068,8 @@ void diffcore_rename_extended(struct diff_options *options, want_copies = (detect_rename == DIFF_DETECT_COPY); if (dirs_removed && (break_idx || want_copies)) BUG("dirs_removed incompatible with break/copy detection"); + if (break_idx && relevant_sources) + BUG("break detection incompatible with source specification"); if (!minimum_score) minimum_score = DEFAULT_RENAME_SCORE; @@ -1127,9 +1137,10 @@ void diffcore_rename_extended(struct diff_options *options, /* * Cull sources: * - remove ones corresponding to exact renames + * - remove ones not found in relevant_sources */ trace2_region_enter("diff", "cull after exact", options->repo); - remove_unneeded_paths_from_src(want_copies); + remove_unneeded_paths_from_src(want_copies, relevant_sources); trace2_region_leave("diff", "cull after exact", options->repo); } else { /* Determine minimum score to match basenames */ @@ -1148,7 +1159,7 @@ void diffcore_rename_extended(struct diff_options *options, * - remove ones involved in renames (found via exact match) */ trace2_region_enter("diff", "cull after exact", options->repo); - remove_unneeded_paths_from_src(want_copies); + remove_unneeded_paths_from_src(want_copies, NULL); trace2_region_leave("diff", "cull after exact", options->repo); /* Preparation for basename-driven matching. */ @@ -1167,9 +1178,10 @@ void diffcore_rename_extended(struct diff_options *options, /* * Cull sources, again: * - remove ones involved in renames (found via basenames) + * - remove ones not found in relevant_sources */ trace2_region_enter("diff", "cull basename", options->repo); - remove_unneeded_paths_from_src(want_copies); + remove_unneeded_paths_from_src(want_copies, relevant_sources); trace2_region_leave("diff", "cull basename", options->repo); } @@ -1342,5 +1354,5 @@ void diffcore_rename_extended(struct diff_options *options, void diffcore_rename(struct diff_options *options) { - diffcore_rename_extended(options, NULL, NULL); + diffcore_rename_extended(options, NULL, NULL, NULL); } diff --git a/diffcore.h b/diffcore.h index c6ba64abd198..737c93a6cc79 100644 --- a/diffcore.h +++ b/diffcore.h @@ -166,6 +166,7 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count); void diffcore_break(struct repository *, int); void diffcore_rename(struct diff_options *); void diffcore_rename_extended(struct diff_options *options, + struct strset *relevant_sources, struct strset *dirs_removed, struct strmap *dir_rename_count); void diffcore_merge_broken(void); diff --git a/merge-ort.c b/merge-ort.c index 467404cc0a35..aba0b9fa54c3 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -2029,6 +2029,7 @@ static void detect_regular_renames(struct merge_options *opt, diff_queued_diff = renames->pairs[side_index]; trace2_region_enter("diff", "diffcore_rename", opt->repo); diffcore_rename_extended(&diff_opts, + NULL, &renames->dirs_removed[side_index], &renames->dir_rename_count[side_index]); trace2_region_leave("diff", "diffcore_rename", opt->repo); From patchwork Thu Mar 11 00:38:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12129873 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3FABEC433E6 for ; Thu, 11 Mar 2021 00:39:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F0CF464FCA for ; Thu, 11 Mar 2021 00:39:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229734AbhCKAit (ORCPT ); Wed, 10 Mar 2021 19:38:49 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50524 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230341AbhCKAif (ORCPT ); Wed, 10 Mar 2021 19:38:35 -0500 Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5EF86C061574 for ; Wed, 10 Mar 2021 16:38:35 -0800 (PST) Received: by mail-wr1-x434.google.com with SMTP id v15so25364491wrx.4 for ; Wed, 10 Mar 2021 16:38:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=OoGVy7XI/LiOyejMW1pDhlCHg7nAygWY1bt1LdPodSI=; b=WPCXryaoz24sN84gxp3fUMWP5D2jAsnUfUGWOtRek7/sJKl4hQjRVxnLAZPtc+IEWw sEaWy//YqP08FMooXiaj5T29FL2dny8yKYZCeNkDQEn5zTL7kuV6KtO50zYfmpeS4h1c n+LV5gRyu4oQRPoy3kfL9Qd5gm7UjvmZXCIQs8155qlfvsZGgpweaqMvpR3d98f36ZQW Y6Oc640yZ7lQbbWovArIKqyR1lbGK1TA97nMiElzd4GLDIAQPQqf2duwiG5yVZnj3mHO 4LbYNhJ1PzD0Z8mKkIA52jOv+d34Hp7Htnj7FomWnQd3c/LtvGbSG/LRmB52NqMWLLJe oF2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=OoGVy7XI/LiOyejMW1pDhlCHg7nAygWY1bt1LdPodSI=; b=Cjm/DdpMeELCmfHBBI9PcqnhAWIZCB2WSSsOYtkAjLcgBqf9f/56NT7ytHNq05aX4Q fqPn4tYoSWToBatDMutj6hJDuZytJmnRrzsK+MM/a8qikw+uxhubRD/ZzAJPgJvhkKu6 vGMGWrvOOXJ14pLvlskyP5aottb8e22ykyWcNNkD9PMQF7NjgvGCzmJFDHuUdHMY+hQQ j1zt7/mc2dwUpysRGhGFVjT0NnPLkOwwYsFeRc45/l137RM1QVgoP7j5Q+489uL4HqYZ 7dWEy48bitLyghne4lzdl0O2ylGVfKRb3UUeFavzWPl6tlQ5jORl5FbzO5v/B01Hsj1w DQkw== X-Gm-Message-State: AOAM531yeHzaVnM0Ez+26RVeHC7QRUyW7aSNHClR+3N966Doc802daUH WfNA/iG39bgtBU4d9eD4Q6pUqjtiebI= X-Google-Smtp-Source: ABdhPJxCppzrLv/X5jLneV8b8TzuY1l0B5XR045390Lh5mHYMs81z4+t/1UG2AO1IwxnuAQGCm3PAQ== X-Received: by 2002:adf:b1c9:: with SMTP id r9mr6154312wra.51.1615423114039; Wed, 10 Mar 2021 16:38:34 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id y1sm1044591wmq.29.2021.03.10.16.38.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 Mar 2021 16:38:33 -0800 (PST) Message-Id: In-Reply-To: References: Date: Thu, 11 Mar 2021 00:38:25 +0000 Subject: [PATCH v3 2/8] merge-ort: precompute subset of sources for which we need rename detection Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Junio C Hamano , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Elijah Newren , Derrick Stolee , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren rename detection works by trying to pair all file deletions (or "sources") with all file additions (or "destinations"), checking similarity, and then marking the sufficiently similar ones as renames. This can be expensive if there are many sources and destinations on a given side of history as it results in an N x M comparison matrix. However, there are many cases where we can compute in advance that detecting renames for some of the sources provides no useful information and thus that we can exclude those sources from the matrix. To see why, first note that the merge machinery uses detected renames in two ways: * directory rename detection: when one side of history renames a directory, and the other side of history adds new files to that directory, we want to be able to warn the user about the need to chose whether those new files stay in the old directory or move to the new one. * three-way content merging: in order to do three-way content merging of files, we need three different file versions. If one side of history renamed a file, then some of the content for the file is found under a different path than in the merge base or on the other side of history. Add a simple testcase showing the two kinds of reasons renames are relevant; it's a testcase that will only pass if we detect both kinds of needed renames. Other than the testcase added above, this commit concentrates just on the three-way content merging; it will punt and mark all sources as needed for directory rename detection, and leave it to future commits to narrow that down more. The point of three-way content merging is to reconcile changes made on *both* sides of history. What if the file wasn't modified on both sides? There are two possibilities: * If it wasn't modified on the renamed side: -> then we get to do exact rename detection, which is cheap. * If it wasn't modified on the unrenamed side: -> then detection of a rename for that source file is irrelevant That latter claim might be surprising at first, so let's walk through a case to show why rename detection for that source file is irrelevant. Let's use two filenames, old.c & new.c, with the following abbreviated object ids (and where the value '000000' is used to denote that the file is missing in that commit): old.c new.c MERGE_BASE: 01d01d 000000 MERGE_SIDE1: 01d01d 000000 MERGE_SIDE2: 000000 5e1ec7 If the rename *isn't* detected: then old.c looks like it was unmodified on one side and deleted on the other and should thus be removed. new.c looks like a new file we should keep as-is. If the rename *is* detected: then a three-way content merge is done. Since the version of the file in MERGE_BASE and MERGE_SIDE1 are identical, the three-way merge will produce exactly the version of the file whose abbreviated object id is 5e1ec7. It will record that file at the path new.c, while removing old.c from the directory. Note that these two results are identical -- a single file named 'new.c' with object id 5e1ec7. In other words, it doesn't matter if the rename is detected in the case where the file is unmodified on the unrenamed side. Use this information to compute whether we need rename detection for each source created in add_pair(). It's probably worth noting that there used to be a few other edge or corner cases besides three-way content merges and directory rename detection where lack of rename detection could have affected the result, but those cases actually highlighted where conflict resolution methods were not consistent with each other. Fixing those inconsistencies were thus critically important to enabling this optimization. That work involved the following: * bringing consistency to add/add, rename/add, and rename/rename conflict types, as done back in the topic merged at commit ac193e0e0a ("Merge branch 'en/merge-path-collision'", 2019-01-04), and further extended in commits 2a7c16c980 ("t6422, t6426: be more flexible for add/add conflicts involving renames", 2020-08-10) and e8eb99d4a6 ("t642[23]: be more flexible for add/add conflicts involving pair renames", 2020-08-10) * making rename/delete more consistent with modify/delete as done in commits 1f3c9ba707 ("t6425: be more flexible with rename/delete conflict messages", 2020-08-10) and 727c75b23f ("t6404, t6423: expect improved rename/delete handling in ort backend", 2020-10-26) Since the set of relevant_sources we compute has not yet been narrowed down for directory rename detection, we do not pass it to diffcore_rename_extended() yet. That will be done after subsequent commits narrow down the list of relevant_sources needed for directory rename detection reasons. Signed-off-by: Elijah Newren --- merge-ort.c | 35 ++++++++++++-- t/t6423-merge-rename-directories.sh | 71 +++++++++++++++++++++++++++++ 2 files changed, 103 insertions(+), 3 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index aba0b9fa54c3..83aa4c08121f 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -88,6 +88,20 @@ struct rename_info { */ struct strmap dir_renames[3]; + /* + * relevant_sources: deleted paths for which we need rename detection + * + * relevant_sources is a set of deleted paths on each side of + * history for which we need rename detection. If a path is deleted + * on one side of history, we need to detect if it is part of a + * rename if either + * * we need to detect renames for an ancestor directory + * * the file is modified/deleted on the other side of history + * If neither of those are true, we can skip rename detection for + * that path. + */ + struct strset relevant_sources[3]; + /* * needed_limit: value needed for inexact rename detection to run * @@ -358,6 +372,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, strmap_clear(&renames->dir_rename_count[i], 1); strmap_func(&renames->dir_renames[i], 0); + + strset_func(&renames->relevant_sources[i]); } if (!reinitialize) { @@ -533,12 +549,21 @@ static void add_pair(struct merge_options *opt, struct name_entry *names, const char *pathname, unsigned side, - unsigned is_add /* if false, is_delete */) + unsigned is_add /* if false, is_delete */, + unsigned match_mask) { struct diff_filespec *one, *two; struct rename_info *renames = &opt->priv->renames; int names_idx = is_add ? side : 0; + if (!is_add) { + unsigned content_relevant = (match_mask == 0); + unsigned location_relevant = 1; /* FIXME: compute this */ + + if (content_relevant || location_relevant) + strset_add(&renames->relevant_sources[side], pathname); + } + one = alloc_filespec(pathname); two = alloc_filespec(pathname); fill_filespec(is_add ? two : one, @@ -575,11 +600,13 @@ static void collect_rename_info(struct merge_options *opt, /* Check for deletion on side */ if ((filemask & 1) && !(filemask & side_mask)) - add_pair(opt, names, fullname, side, 0 /* delete */); + add_pair(opt, names, fullname, side, 0 /* delete */, + match_mask & filemask); /* Check for addition on side */ if (!(filemask & 1) && (filemask & side_mask)) - add_pair(opt, names, fullname, side, 1 /* add */); + add_pair(opt, names, fullname, side, 1 /* add */, + match_mask & filemask); } } @@ -3228,6 +3255,8 @@ static void merge_start(struct merge_options *opt, struct merge_result *result) NULL, 1); strmap_init_with_options(&renames->dir_renames[i], NULL, 0); + strset_init_with_options(&renames->relevant_sources[i], + NULL, 0); } /* diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh index 4ab133f489ca..4c568050dd27 100755 --- a/t/t6423-merge-rename-directories.sh +++ b/t/t6423-merge-rename-directories.sh @@ -4895,6 +4895,77 @@ test_expect_merge_algorithm failure success '12f: Trivial directory resolve, cac ) ' +# Testcase 12g, Testcase with two kinds of "relevant" renames +# Commit O: somefile_O, subdir/{a_O,b_O} +# Commit A: somefile_A, subdir/{a_O,b_O,c_A} +# Commit B: newfile_B, newdir/{a_B,b_B} +# Expected: newfile_{merged}, newdir/{a_B,b_B,c_A} + +test_setup_12g () { + test_create_repo 12g && + ( + cd 12g && + + mkdir -p subdir && + test_write_lines upon a time there was a >somefile && + test_write_lines 1 2 3 4 5 6 7 8 9 10 >subdir/a && + test_write_lines one two three four five six >subdir/b && + git add . && + test_tick && + git commit -m "O" && + + git branch O && + git branch A && + git branch B && + + git switch A && + test_write_lines once upon a time there was a >somefile && + > subdir/c && + git add somefile subdir/c && + test_tick && + git commit -m "A" && + + git checkout B && + git mv somefile newfile && + git mv subdir newdir && + echo repo >>newfile && + test_write_lines 1 2 3 4 5 6 7 8 9 10 11 >newdir/a && + test_write_lines one two three four five six seven >newdir/b && + git add newfile newdir && + test_tick && + git commit -m "B" + ) +} + +test_expect_success '12g: Testcase with two kinds of "relevant" renames' ' + test_setup_12g && + ( + cd 12g && + + git checkout A^0 && + + git -c merge.directoryRenames=true merge -s recursive B^0 && + + test_write_lines once upon a time there was a repo >expect && + test_cmp expect newfile && + + git ls-files -s >out && + test_line_count = 4 out && + + git rev-parse >actual \ + HEAD:newdir/a HEAD:newdir/b HEAD:newdir/c && + git rev-parse >expect \ + B:newdir/a B:newdir/b A:subdir/c && + test_cmp expect actual && + + test_must_fail git rev-parse HEAD:subdir/a && + test_must_fail git rev-parse HEAD:subdir/b && + test_must_fail git rev-parse HEAD:subdir/c && + test_path_is_missing subdir/ && + test_path_is_file newdir/c + ) +' + ########################################################################### # SECTION 13: Checking informational and conflict messages # From patchwork Thu Mar 11 00:38:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12129869 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82A90C43381 for ; Thu, 11 Mar 2021 00:39:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3A6A864D99 for ; Thu, 11 Mar 2021 00:39:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229805AbhCKAiv (ORCPT ); Wed, 10 Mar 2021 19:38:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50532 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230397AbhCKAig (ORCPT ); Wed, 10 Mar 2021 19:38:36 -0500 Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D44D5C061760 for ; Wed, 10 Mar 2021 16:38:35 -0800 (PST) Received: by mail-wr1-x434.google.com with SMTP id d15so25361956wrv.5 for ; Wed, 10 Mar 2021 16:38:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=fcQkjCw6zC06N7gSUFzG0EzW9qGvc97YSBo5+h04zSE=; b=aMBGdwreoGCIZbtaMq5yrMLmywq4rFNjwuqdqpnPCUSEWSQy4VSzVDztasgXwCId07 YU0eJGpNVAKLTqo59r2FzUUs/cnwJmYSKfY03W5lxB3kF1z00+5pX/CDSu2EFJItsiY6 7jEjTobvg7v2nmxLKH0Euss4uV7mDYR3BM4tRTuax3h9IDJsFGf0rgKocbefqjS915fm 4sRS1dlUB4hIDoxJNN9sLOh0E8gMGXzfwonKZi7KYsjCB/1LWLOW2SqQHTMwJYmGJkMo zKxxghTvWcZbv+umPNyvAI6PefC1cI/H+OyZGg35GTsGWC9/FH6sElq3OzqAPFDxhYZp 5/SA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=fcQkjCw6zC06N7gSUFzG0EzW9qGvc97YSBo5+h04zSE=; b=HjnP3dYJ06rcp+iMlDFFGrKfDiSDtMQ6kgODKF96FbXVcegEiEyOMUpWyaQqeR+LFc 7qaGwU5y+roixOYrEghef5zdE8sHDOu9wvapQkotfDlOkKn52rQDXAhMA2aFoDsFRbfC KlZEaghoFHZuyRoZaSE8tpFEzT2TtJOp6NbJIUB7JCJAZAeK+vDV+tlpWoN+3Hvvtzo+ XLJQi/zHI6pCW/rk5Dil6mx2KXzuSk7DXki/WS8enx5et0Ixi9+QQwku2DKFHYoWGUka OLzjhopk5d1Tivt6IP2Qz2vF9HuDDiBPOff1MdSdMM1xHMw/MabP/jxAHK14J0C3wpko OXqA== X-Gm-Message-State: AOAM531Ozf2/dyMTzhr1Ql/k1AYsmG6iDBfeO/XoQ7zL1LIoPFyOr+k7 7zTmo0mqLBAl/q1VUohYw2VDpgdEP0I= X-Google-Smtp-Source: ABdhPJwMjJ8JymazwUMAY6L4bnvyIFHaOBQsIdN5BIqUo1Wvg7gn4uWW8ouXEp6PkkMtGmq0bwGO6Q== X-Received: by 2002:adf:df10:: with SMTP id y16mr6006493wrl.372.1615423114556; Wed, 10 Mar 2021 16:38:34 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id h25sm1273362wml.32.2021.03.10.16.38.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 Mar 2021 16:38:34 -0800 (PST) Message-Id: In-Reply-To: References: Date: Thu, 11 Mar 2021 00:38:26 +0000 Subject: [PATCH v3 3/8] merge-ort: add data structures for an alternate tree traversal Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Junio C Hamano , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Elijah Newren , Derrick Stolee , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren In order to determine whether directory rename detection is needed, we as a pre-requisite need a way to traverse through all the files in a given tree before visiting any directories within that tree. traverse_trees() only iterates through the entries in the order they appear, so add some data structures that will store all the entries as we iterate through them in traverse_trees(), which will allow us to re-traverse them in our desired order. Signed-off-by: Elijah Newren --- merge-ort.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/merge-ort.c b/merge-ort.c index 83aa4c08121f..d49cfa8b030b 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -51,6 +51,12 @@ enum merge_side { MERGE_SIDE2 = 2 }; +struct traversal_callback_data { + unsigned long mask; + unsigned long dirmask; + struct name_entry names[3]; +}; + struct rename_info { /* * All variables that are arrays of size 3 correspond to data tracked @@ -102,6 +108,22 @@ struct rename_info { */ struct strset relevant_sources[3]; + /* + * callback_data_*: supporting data structures for alternate traversal + * + * We sometimes need to be able to traverse through all the files + * in a given tree before all immediate subdirectories within that + * tree. Since traverse_trees() doesn't do that naturally, we have + * a traverse_trees_wrapper() that stores any immediate + * subdirectories while traversing files, then traverses the + * immediate subdirectories later. These callback_data* variables + * store the information for the subdirectories so that we can do + * that traversal order. + */ + struct traversal_callback_data *callback_data; + int callback_data_nr, callback_data_alloc; + char *callback_data_traverse_path; + /* * needed_limit: value needed for inexact rename detection to run * @@ -396,6 +418,10 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, } strmap_clear(&opti->output, 0); } + + /* Clean out callback_data as well. */ + FREE_AND_NULL(renames->callback_data); + renames->callback_data_nr = renames->callback_data_alloc = 0; } static int err(struct merge_options *opt, const char *err, ...) From patchwork Thu Mar 11 00:38:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12129875 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88703C4332D for ; Thu, 11 Mar 2021 00:39:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 576CB64FD2 for ; Thu, 11 Mar 2021 00:39:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230051AbhCKAiu (ORCPT ); Wed, 10 Mar 2021 19:38:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50534 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230319AbhCKAig (ORCPT ); Wed, 10 Mar 2021 19:38:36 -0500 Received: from mail-wm1-x336.google.com (mail-wm1-x336.google.com [IPv6:2a00:1450:4864:20::336]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 65E5FC061574 for ; Wed, 10 Mar 2021 16:38:36 -0800 (PST) Received: by mail-wm1-x336.google.com with SMTP id m20-20020a7bcb940000b029010cab7e5a9fso12149289wmi.3 for ; Wed, 10 Mar 2021 16:38:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=ijB+6Mdz6kINhfvqDLL7RglzBui/VGymAbp4nuQDrWc=; b=TWhDMG6scPGAGR/GMgxU5JymLAwCPxDLodleRB8LxnG4qWxU4y6ZarBY51TOUF5hA+ M24dK/CSY+NDUyU6B+luhYJSgP8ZrwTJxyPC3z/ieE4XAYnEQHpDKc67GF/GsHxlvLNE 8V3mWDvjgudogfCmhcOVnqvGNrocha2feqAxmznaA9BTzMCi//NjtjR1r7V/Z/9RVMmW p3qF7T1UazMmF/1R011qq+sS6HiA6MVdFfj3FZFkjb7c1lyAWwqJJadczl9ZJTecza4o dB48KaUhoyvlFPV5wUBz1ocKgPEe7eXXtu3fPeM1dtW/XU5Iij16QhmofgQwjYJ5o/l0 Ji7Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=ijB+6Mdz6kINhfvqDLL7RglzBui/VGymAbp4nuQDrWc=; b=WkmkQZc24w77qF554rjqIRydcoUGN7ZUXv7OqVh6VqvKpwS/baUMfl+99h43/6Blq2 /jA8uakcENXem+XtYivZvseambVam7SrooZvbO+qw3VTtfxPIVeJ4YkL05XN/9aFFRkm xLNfQ5C9pACb3p0AXO8J5FlWa51OgxeJTjLocZtkt8Z3a8Gi8TXV2TExztTeMMRFTTdC nxIJBP8g33kQjcdeLvKe+0KxAXZiKI5ikk5y9mdQROdq2cW0AhbFeyyPzRaOIFAWjb3T gSrL5K+l9eSCYH6SHF7Bz7ngaPk9X0UZAZfMS6opAfvQ19ntt9DAtlWCwkGhbjOxDV7M s3Ag== X-Gm-Message-State: AOAM533jktotJ/Ug4bod3sjMhLHzvxSoxGOxM8XDSW5+p/37tk1WPVN5 VOOJ/Mw2a8pboIXyu2ELEeo0o/t655o= X-Google-Smtp-Source: ABdhPJxIOugkWnRs2F2NdXnqVfslWpkspc5BMg7Jon5zC9qMdq4lZIC8b5rmJIhDOpiQ/a3sUOdWfA== X-Received: by 2002:a05:600c:2945:: with SMTP id n5mr5676257wmd.78.1615423115255; Wed, 10 Mar 2021 16:38:35 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id d16sm980853wrx.79.2021.03.10.16.38.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 Mar 2021 16:38:34 -0800 (PST) Message-Id: In-Reply-To: References: Date: Thu, 11 Mar 2021 00:38:27 +0000 Subject: [PATCH v3 4/8] merge-ort: introduce wrappers for alternate tree traversal Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Junio C Hamano , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Elijah Newren , Derrick Stolee , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Add traverse_trees_wrapper() and traverse_trees_wrapper_callback() functions. The former runs traverse_trees() with info->fn set to traverse_trees_wrapper_callback, in order to simply save all the entries without processing or recursing into any of them. This step allows extra computation to be done (e.g. checking some condition across all files) that can be used later. Then, after that is completed, it iterates over all the saved entries and calls the original info->fn callback with the saved data. Currently, this does nothing more than marginally slowing down the tree traversal since we do not take advantage of the opportunity to compute anything special in traverse_trees_wrapper_callback(), and thus the real callback will be called identically as it would have been without this extra wrapper. However, a subsequent commit will add some special computation of some values that the real callback will be able to use. Signed-off-by: Elijah Newren --- merge-ort.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 71 insertions(+) diff --git a/merge-ort.c b/merge-ort.c index d49cfa8b030b..f8f7d06d481a 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -512,6 +512,77 @@ static char *unique_path(struct strmap *existing_paths, /*** Function Grouping: functions related to collect_merge_info() ***/ +static int traverse_trees_wrapper_callback(int n, + unsigned long mask, + unsigned long dirmask, + struct name_entry *names, + struct traverse_info *info) +{ + struct merge_options *opt = info->data; + struct rename_info *renames = &opt->priv->renames; + + assert(n==3); + + if (!renames->callback_data_traverse_path) + renames->callback_data_traverse_path = xstrdup(info->traverse_path); + + ALLOC_GROW(renames->callback_data, renames->callback_data_nr + 1, + renames->callback_data_alloc); + renames->callback_data[renames->callback_data_nr].mask = mask; + renames->callback_data[renames->callback_data_nr].dirmask = dirmask; + COPY_ARRAY(renames->callback_data[renames->callback_data_nr].names, + names, 3); + renames->callback_data_nr++; + + return mask; +} + +/* + * Much like traverse_trees(), BUT: + * - read all the tree entries FIRST, saving them + * - note that the above step provides an opportunity to compute necessary + * additional details before the "real" traversal + * - loop through the saved entries and call the original callback on them + */ +MAYBE_UNUSED +static int traverse_trees_wrapper(struct index_state *istate, + int n, + struct tree_desc *t, + struct traverse_info *info) +{ + int ret, i, old_offset; + traverse_callback_t old_fn; + char *old_callback_data_traverse_path; + struct merge_options *opt = info->data; + struct rename_info *renames = &opt->priv->renames; + + old_callback_data_traverse_path = renames->callback_data_traverse_path; + old_fn = info->fn; + old_offset = renames->callback_data_nr; + + renames->callback_data_traverse_path = NULL; + info->fn = traverse_trees_wrapper_callback; + ret = traverse_trees(istate, n, t, info); + if (ret < 0) + return ret; + + info->traverse_path = renames->callback_data_traverse_path; + info->fn = old_fn; + for (i = old_offset; i < renames->callback_data_nr; ++i) { + info->fn(n, + renames->callback_data[i].mask, + renames->callback_data[i].dirmask, + renames->callback_data[i].names, + info); + } + + renames->callback_data_nr = old_offset; + free(renames->callback_data_traverse_path); + renames->callback_data_traverse_path = old_callback_data_traverse_path; + info->traverse_path = NULL; + return 0; +} + static void setup_path_info(struct merge_options *opt, struct string_list_item *result, const char *current_dir_name, From patchwork Thu Mar 11 00:38:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12129871 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 58B4DC433E9 for ; Thu, 11 Mar 2021 00:39:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1A05E64FCD for ; Thu, 11 Mar 2021 00:39:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230212AbhCKAiv (ORCPT ); Wed, 10 Mar 2021 19:38:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230409AbhCKAih (ORCPT ); Wed, 10 Mar 2021 19:38:37 -0500 Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 05C9FC061574 for ; Wed, 10 Mar 2021 16:38:37 -0800 (PST) Received: by mail-wr1-x42e.google.com with SMTP id u16so25338544wrt.1 for ; Wed, 10 Mar 2021 16:38:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=XMmTPgcNN7fjlnFPTnl/MMazJMOaP5Kcor5uUNPOJEM=; b=W+qA/dzaN0ZxI5LNQ+uy8zVmbiA1sHL2wwXVFBVRwStbP7N5GUQimpRBK6Vu+4K/GM ntmPNXvM42Y5XeP2TZ6ELDnwBoEbCbnlfmQi75hSlcnq8vJuRhg/rLsyc3Lz4nuOa8FQ 0YO+CZwu94QjrIcfAiFYZ1hasytHbJZaZ7nj1wxz5ovXDmS/J/4fGDUo75QPvqepqdPf BsXudSUbP3mjV7uewI0WlPq2Jiqevi02GCcx4+S74/Vv3cl8+5of+AXQ+Vf9dEHWEY9N 5HqfTwwfG1Y0hbdxMqbRYgdCQ4LLjBIqJMgIBg8c7JqtcSv2kVKL5+jk0Ja7lzWAMSO9 cnKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=XMmTPgcNN7fjlnFPTnl/MMazJMOaP5Kcor5uUNPOJEM=; b=q2ba9tKFYxDF/5aLQ/YbcMEFpKMBmit8vZYaklcy4iwyRInCqJG74S6hp6eL4FzdOc RgMpfVKREnBuAwU5hUe+JqLI79E9vnbe/md0RZP05ZZfcE8w6PextMNhOhYWh34Tzvl8 ptlrc1Kf1Rmg29TluB4PzChiI1Iy/prr8doRUrVn1yGsCoklSIacnPzqiJ4EWyqztxOt nkf/stqzTByyzkSVmM6VFr02XP/idopEiyEmRtmGV0mg0hPKZhw+2B0xPpTzb1oNE21J /D1EVGGZ95/1m36yl/Ov2DKfDmN4ccZRLsLNZCYmuue04MCOy+F4yXnXMBcT7bcq9TAE 7HKA== X-Gm-Message-State: AOAM532arSaenpmKM6kBzaXcj62UdDvAkJ/o3EFFhxxVMAumv0GroGO/ NGQtfcxV6+uI0jBFkYii3uWra4iFUcw= X-Google-Smtp-Source: ABdhPJyePovR+kbJPH6SdbTn75Uj1sAr7SxpL2JLgUzBXi3o6CX8QZYtcZ7NWZhv97dDfX2Yin5fuw== X-Received: by 2002:a05:6000:1a8c:: with SMTP id f12mr5929214wry.173.1615423115812; Wed, 10 Mar 2021 16:38:35 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id m17sm1004329wrx.92.2021.03.10.16.38.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 Mar 2021 16:38:35 -0800 (PST) Message-Id: In-Reply-To: References: Date: Thu, 11 Mar 2021 00:38:28 +0000 Subject: [PATCH v3 5/8] merge-ort: precompute whether directory rename detection is needed Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Junio C Hamano , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Elijah Newren , Derrick Stolee , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren The point of directory rename detection is that if one side of history renames a directory, and the other side adds new files under the old directory, then the merge can move those new files into the new directory. This leads to the following important observation: * If the other side does not add any new files under the old directory, we do not need to detect any renames for that directory. Similarly, directory rename detection had an important requirement: * If a directory still exists on one side of history, it has not been renamed on that side of history. (See section 4 of t6423 or Documentation/technical/directory-rename-detection.txt for more details). Using these two bits of information, we note that directory rename detection is only needed in cases where (1) directories exist in the merge base and on one side of history (i.e. dirmask == 3 or dirmask == 5), and (2) where there is some new file added to that directory on the side where it still exists (thus where the file has filemask == 2 or filemask == 4, respectively). This has to be done in two steps, because we have the dirmask when we are first considering the directory, and won't get the filemasks for the files within it until we recurse into that directory. So, we save dir_rename_mask = dirmask - 1 when we hit a directory that is missing on one side, and then later look for cases of filemask == dir_rename_mask One final note is that as soon as we hit a directory that needs directory rename detection, we will need to detect renames in all subdirectories of that directory as well due to the "majority rules" decision when files are renamed into different directory hierarchies. We arbitrarily use the special value of 0x07 to record when we've hit such a directory. The combination of all the above mean that we introduce a variable named dir_rename_mask (couldn't think of a better name) which has one of the following values as we traverse into a directory: * 0x00: directory rename detection not needed * 0x02 or 0x04: directory rename detection only needed if files added * 0x07: directory rename detection definitely needed We then pass this value through to add_pairs() so that it can mark location_relevant as true only when dir_rename_mask is 0x07. Signed-off-by: Elijah Newren --- merge-ort.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 61 insertions(+), 6 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index f8f7d06d481a..5840832cf3ed 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -108,6 +108,14 @@ struct rename_info { */ struct strset relevant_sources[3]; + /* + * dir_rename_mask: + * 0: optimization removing unmodified potential rename source okay + * 2 or 4: optimization okay, but must check for files added to dir + * 7: optimization forbidden; need rename source in case of dir rename + */ + unsigned dir_rename_mask:3; + /* * callback_data_*: supporting data structures for alternate traversal * @@ -419,6 +427,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, strmap_clear(&opti->output, 0); } + renames->dir_rename_mask = 0; + /* Clean out callback_data as well. */ FREE_AND_NULL(renames->callback_data); renames->callback_data_nr = renames->callback_data_alloc = 0; @@ -520,12 +530,16 @@ static int traverse_trees_wrapper_callback(int n, { struct merge_options *opt = info->data; struct rename_info *renames = &opt->priv->renames; + unsigned filemask = mask & ~dirmask; assert(n==3); if (!renames->callback_data_traverse_path) renames->callback_data_traverse_path = xstrdup(info->traverse_path); + if (filemask && filemask == renames->dir_rename_mask) + renames->dir_rename_mask = 0x07; + ALLOC_GROW(renames->callback_data, renames->callback_data_nr + 1, renames->callback_data_alloc); renames->callback_data[renames->callback_data_nr].mask = mask; @@ -544,7 +558,6 @@ static int traverse_trees_wrapper_callback(int n, * additional details before the "real" traversal * - loop through the saved entries and call the original callback on them */ -MAYBE_UNUSED static int traverse_trees_wrapper(struct index_state *istate, int n, struct tree_desc *t, @@ -556,6 +569,8 @@ static int traverse_trees_wrapper(struct index_state *istate, struct merge_options *opt = info->data; struct rename_info *renames = &opt->priv->renames; + assert(renames->dir_rename_mask == 2 || renames->dir_rename_mask == 4); + old_callback_data_traverse_path = renames->callback_data_traverse_path; old_fn = info->fn; old_offset = renames->callback_data_nr; @@ -647,7 +662,8 @@ static void add_pair(struct merge_options *opt, const char *pathname, unsigned side, unsigned is_add /* if false, is_delete */, - unsigned match_mask) + unsigned match_mask, + unsigned dir_rename_mask) { struct diff_filespec *one, *two; struct rename_info *renames = &opt->priv->renames; @@ -655,7 +671,7 @@ static void add_pair(struct merge_options *opt, if (!is_add) { unsigned content_relevant = (match_mask == 0); - unsigned location_relevant = 1; /* FIXME: compute this */ + unsigned location_relevant = (dir_rename_mask == 0x07); if (content_relevant || location_relevant) strset_add(&renames->relevant_sources[side], pathname); @@ -679,6 +695,36 @@ static void collect_rename_info(struct merge_options *opt, struct rename_info *renames = &opt->priv->renames; unsigned side; + /* + * Update dir_rename_mask (determines ignore-rename-source validity) + * + * dir_rename_mask helps us keep track of when directory rename + * detection may be relevant. Basically, whenver a directory is + * removed on one side of history, and a file is added to that + * directory on the other side of history, directory rename + * detection is relevant (meaning we have to detect renames for all + * files within that directory to deduce where the directory + * moved). Also, whenever a directory needs directory rename + * detection, due to the "majority rules" choice for where to move + * it (see t6423 testcase 1f), we also need to detect renames for + * all files within subdirectories of that directory as well. + * + * Here we haven't looked at files within the directory yet, we are + * just looking at the directory itself. So, if we aren't yet in + * a case where a parent directory needed directory rename detection + * (i.e. dir_rename_mask != 0x07), and if the directory was removed + * on one side of history, record the mask of the other side of + * history in dir_rename_mask. + */ + if (renames->dir_rename_mask != 0x07 && + (dirmask == 3 || dirmask == 5)) { + /* simple sanity check */ + assert(renames->dir_rename_mask == 0 || + renames->dir_rename_mask == (dirmask & ~1)); + /* update dir_rename_mask; have it record mask of new side */ + renames->dir_rename_mask = (dirmask & ~1); + } + /* Update dirs_removed, as needed */ if (dirmask == 1 || dirmask == 3 || dirmask == 5) { /* absent_mask = 0x07 - dirmask; sides = absent_mask/2 */ @@ -698,12 +744,14 @@ static void collect_rename_info(struct merge_options *opt, /* Check for deletion on side */ if ((filemask & 1) && !(filemask & side_mask)) add_pair(opt, names, fullname, side, 0 /* delete */, - match_mask & filemask); + match_mask & filemask, + renames->dir_rename_mask); /* Check for addition on side */ if (!(filemask & 1) && (filemask & side_mask)) add_pair(opt, names, fullname, side, 1 /* add */, - match_mask & filemask); + match_mask & filemask, + renames->dir_rename_mask); } } @@ -721,12 +769,14 @@ static int collect_merge_info_callback(int n, */ struct merge_options *opt = info->data; struct merge_options_internal *opti = opt->priv; + struct rename_info *renames = &opt->priv->renames; struct string_list_item pi; /* Path Info */ struct conflict_info *ci; /* typed alias to pi.util (which is void*) */ struct name_entry *p; size_t len; char *fullpath; const char *dirname = opti->current_dir_name; + unsigned prev_dir_rename_mask = renames->dir_rename_mask; unsigned filemask = mask & ~dirmask; unsigned match_mask = 0; /* will be updated below */ unsigned mbase_null = !(mask & 1); @@ -867,8 +917,13 @@ static int collect_merge_info_callback(int n, original_dir_name = opti->current_dir_name; opti->current_dir_name = pi.string; - ret = traverse_trees(NULL, 3, t, &newinfo); + if (renames->dir_rename_mask == 0 || + renames->dir_rename_mask == 0x07) + ret = traverse_trees(NULL, 3, t, &newinfo); + else + ret = traverse_trees_wrapper(NULL, 3, t, &newinfo); opti->current_dir_name = original_dir_name; + renames->dir_rename_mask = prev_dir_rename_mask; for (i = MERGE_BASE; i <= MERGE_SIDE2; i++) free(buf[i]); From patchwork Thu Mar 11 00:38:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12129879 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CFF64C43331 for ; Thu, 11 Mar 2021 00:39:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A8CAA64D99 for ; Thu, 11 Mar 2021 00:39:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230260AbhCKAiw (ORCPT ); Wed, 10 Mar 2021 19:38:52 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50538 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230473AbhCKAih (ORCPT ); Wed, 10 Mar 2021 19:38:37 -0500 Received: from mail-wm1-x333.google.com (mail-wm1-x333.google.com [IPv6:2a00:1450:4864:20::333]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8AB67C061574 for ; Wed, 10 Mar 2021 16:38:37 -0800 (PST) Received: by mail-wm1-x333.google.com with SMTP id r10-20020a05600c35cab029010c946c95easo11768166wmq.4 for ; Wed, 10 Mar 2021 16:38:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=kd5td6wc9FWn/SjExJSr1jffv9qVUuehEeLs/f0/N9o=; b=HEoEP0ffM8A2Qs6QS96VSn7MNtjmtiYYC1iJm7K7SwSb1uGqp/ZKLDMxjX4zpnFwpb uNQ9HZkJxkR3bEKpXSPqio3hzFmypQ9SFKi9RHyfcjsBaaV7j3N3NupbmNwpIbeH1JZY D/X0mWnfhnwnIGStrTtk9PHfgsHUTeMDmQr/I10fTX253E4UH/aDOpzKRoTRAwGSsgSS x0tqMuMHG8C2e1MAzvZmM9nnKF7m7xUYHba/B5oKmiY+KJAoMuuBfAQzBhpZvRs4KkNg N4e2BwiJykrH6klBZ9VuQVlSpHnmlXZSmcGJ6IPADbJQnRrEdXuHUKV4a5kpYo01/LtJ Oqow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=kd5td6wc9FWn/SjExJSr1jffv9qVUuehEeLs/f0/N9o=; b=pY+csx4NlcowLOzX8SGeNSRmYstrFLXWrlqxluB+csL7QS22J4dvbE2rfs1Jju2/7F /idsVPqYg9yHiGdYuaqnZqNoaBX5Wy6sh8fpmyXtEbxWgtwK6etSRznpeI2bwq/2sKE6 Pf0x+3HoF+vnHNvwPTQa/43K80d0Ur0hIyGEJrKIEGq7/1UV97tC+91k7xTmxx8PZpU9 mfqs9r+XfDCZmhwwphLCh9Em3loloex4FDfe5AMQM9xlGAzKSgXlHVM+SNkEAwaNbMJo Aa9/cHFnnAgLh+nr/mfb0Jp1TJfpbiPjwzoMImFcgcVJj7dlc18PY9TS4V5ZJZsotACy NsXw== X-Gm-Message-State: AOAM5339w9r7QjCeT9gCtk0ZiNrMs40d2/IHpG8n6TA0rsq3PznSq7ka VZ2QxY/jeqI253vUPMY3j6M/lnl2tJ4= X-Google-Smtp-Source: ABdhPJwEadQV7Byq3dTOXJO3lN6MR4HYy0RrH7G0Q6CxKJdzes7OiZdaqru/LO+8cW4ATYkhIlsUig== X-Received: by 2002:a05:600c:c4:: with SMTP id u4mr5712144wmm.27.1615423116349; Wed, 10 Mar 2021 16:38:36 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a3sm1019488wrt.68.2021.03.10.16.38.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 Mar 2021 16:38:36 -0800 (PST) Message-Id: <375c9b36861b2068f34e3d6e5073ab768644a030.1615423112.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 11 Mar 2021 00:38:29 +0000 Subject: [PATCH v3 6/8] merge-ort: use relevant_sources to filter possible rename sources MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Junio C Hamano , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Elijah Newren , Derrick Stolee , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren The past several commits determined conditions when rename sources might be needed, and filled a relevant_sources strset with those paths. Pass these along to diffcore_rename_extended() to use to limit the sources that we need to detect renames for. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 12.596 s ± 0.061 s 6.003 s ± 0.048 s mega-renames: 130.465 s ± 0.259 s 114.009 s ± 0.236 s just-one-mega: 3.958 s ± 0.010 s 3.489 s ± 0.017 s Signed-off-by: Elijah Newren --- merge-ort.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/merge-ort.c b/merge-ort.c index 5840832cf3ed..eea14024c657 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -2208,7 +2208,7 @@ static void detect_regular_renames(struct merge_options *opt, diff_queued_diff = renames->pairs[side_index]; trace2_region_enter("diff", "diffcore_rename", opt->repo); diffcore_rename_extended(&diff_opts, - NULL, + &renames->relevant_sources[side_index], &renames->dirs_removed[side_index], &renames->dir_rename_count[side_index]); trace2_region_leave("diff", "diffcore_rename", opt->repo); From patchwork Thu Mar 11 00:38:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12129881 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E86F5C4332E for ; Thu, 11 Mar 2021 00:39:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C69BE64FC4 for ; Thu, 11 Mar 2021 00:39:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230228AbhCKAiv (ORCPT ); Wed, 10 Mar 2021 19:38:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50546 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230250AbhCKAii (ORCPT ); Wed, 10 Mar 2021 19:38:38 -0500 Received: from mail-wm1-x332.google.com (mail-wm1-x332.google.com [IPv6:2a00:1450:4864:20::332]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2A43EC061574 for ; Wed, 10 Mar 2021 16:38:38 -0800 (PST) Received: by mail-wm1-x332.google.com with SMTP id d139-20020a1c1d910000b029010b895cb6f2so11779283wmd.5 for ; Wed, 10 Mar 2021 16:38:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=dMhr+Gv2VLBJYwJz6UREAFs9FMhAGm7N3yrGCW/kKxQ=; b=K25FXreiM1dNTe+1qjAOSrBGkoUo/NaOJ+/LYhpjft2SyDvUyyaD3H5bqs6Uk3Dd19 U7pVyXl4J6Vux8f1Wn1C4+H5jkI7PDsZABLDPh3kzqwfrWvHyZ19VZlqRbnEz0xmCWbO GkBRRl8zKdOe95PzJa7tXRCBtupeHBLYOx3qb5AEpYjW9V72aKe7XsM7PQ/ZOLwGTJXy qaYpSscORpyxbUPiLnSLt0QLbpP9E3gyvGpF7erBhNwgqJjd0+Kc3rlBVT06fBdgciCg 9B8VLssGsa/tBMRFhToNGJS7FphG1FyFI5HppURbqqLd9MjoD1Z6z0+ht9Y3U8O7kuWY By9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=dMhr+Gv2VLBJYwJz6UREAFs9FMhAGm7N3yrGCW/kKxQ=; b=iTxrSvG9gn9vaAqSYZ5vPoxDmah8HImldApkhVb5RP6A4ZvlRbKsEQmByqm3W90YZD ppBAJC/pc6S+lIi9iOO2eRB745ZHZAyZX0zumgad34HmE0DeSuX+xxhKT4zJjq63J4SC qnYJvh8zdifSH/2LVTW+ZgUu8+u2QldFCfxcl+ip1djvTw1DlwWsmBXFvm5YolV5EW7Q 8mqkldBf6tY6uSql86SS8lIvHxpfygkzH2IpXgmYzF3UzgQefolXKpC7buOijdMl5SjY a5iQGDh3sCCFjeFWLPh7HfmMK9GkransK7nSIZIIQtaTAqM/wKXZYZ65Vp0DRoIzun/O Nm/Q== X-Gm-Message-State: AOAM5313DGBIpKdocyy8lgce/GM0s8lDzI2LWoFQKkV6xGSwJnAdNvZp wLg2p9hf3E3SGBwoM1Ppn+3ym1sDopw= X-Google-Smtp-Source: ABdhPJykR2VOMFkJofUQJMnuriU7ieUm87aea6H//uLdCxa73Vs0NXhnDd3VJqOcmPAOxwa02whQCA== X-Received: by 2002:a05:600c:214d:: with SMTP id v13mr5728557wml.7.1615423116915; Wed, 10 Mar 2021 16:38:36 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id j12sm1012625wrx.59.2021.03.10.16.38.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 Mar 2021 16:38:36 -0800 (PST) Message-Id: <80a0c27a74ad0314a84e956acf233f6de3b16252.1615423112.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 11 Mar 2021 00:38:30 +0000 Subject: [PATCH v3 7/8] merge-ort: skip rename detection entirely if possible MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Junio C Hamano , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Elijah Newren , Derrick Stolee , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren diffcore_rename_extended() will do a bunch of setup, then check for exact renames, then abort before inexact rename detection if there are no more sources or destinations that need to be matched. It will sometimes be the case, however, that either * we start with neither any sources or destinations * we start with no *relevant* sources In the first of these two cases, the setup and exact rename detection will be very cheap since there are 0 files to operate on. In the second case, it is quite possible to have thousands of files with none of the source ones being relevant. Avoid calling diffcore_rename_extended() or even some of the setup before diffcore_rename_extended() when we can determine that rename detection is unnecessary. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 6.003 s ± 0.048 s 5.708 s ± 0.111 s mega-renames: 114.009 s ± 0.236 s 102.171 s ± 0.440 s just-one-mega: 3.489 s ± 0.017 s 3.471 s ± 0.015 s Signed-off-by: Elijah Newren --- merge-ort.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) diff --git a/merge-ort.c b/merge-ort.c index eea14024c657..bd089cb9a76f 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -2157,6 +2157,19 @@ static int process_renames(struct merge_options *opt, return clean_merge; } +static inline int possible_side_renames(struct rename_info *renames, + unsigned side_index) +{ + return renames->pairs[side_index].nr > 0 && + !strset_empty(&renames->relevant_sources[side_index]); +} + +static inline int possible_renames(struct rename_info *renames) +{ + return possible_side_renames(renames, 1) || + possible_side_renames(renames, 2); +} + static void resolve_diffpair_statuses(struct diff_queue_struct *q) { /* @@ -2193,6 +2206,16 @@ static void detect_regular_renames(struct merge_options *opt, struct diff_options diff_opts; struct rename_info *renames = &opt->priv->renames; + if (!possible_side_renames(renames, side_index)) { + /* + * No rename detection needed for this side, but we still need + * to make sure 'adds' are marked correctly in case the other + * side had directory renames. + */ + resolve_diffpair_statuses(&renames->pairs[side_index]); + return; + } + repo_diff_setup(opt->repo, &diff_opts); diff_opts.flags.recursive = 1; diff_opts.flags.rename_empty = 0; @@ -2310,6 +2333,8 @@ static int detect_and_process_renames(struct merge_options *opt, int need_dir_renames, s, clean = 1; memset(&combined, 0, sizeof(combined)); + if (!possible_renames(renames)) + goto cleanup; trace2_region_enter("merge", "regular renames", opt->repo); detect_regular_renames(opt, MERGE_SIDE1); @@ -2344,6 +2369,25 @@ static int detect_and_process_renames(struct merge_options *opt, clean &= process_renames(opt, &combined); trace2_region_leave("merge", "process renames", opt->repo); + goto simple_cleanup; /* collect_renames() handles some of cleanup */ + +cleanup: + /* + * Free now unneeded filepairs, which would have been handled + * in collect_renames() normally but we skipped that code. + */ + for (s = MERGE_SIDE1; s <= MERGE_SIDE2; s++) { + struct diff_queue_struct *side_pairs; + int i; + + side_pairs = &renames->pairs[s]; + for (i = 0; i < side_pairs->nr; ++i) { + struct diff_filepair *p = side_pairs->queue[i]; + diff_free_filepair(p); + } + } + +simple_cleanup: /* Free memory for renames->pairs[] and combined */ for (s = MERGE_SIDE1; s <= MERGE_SIDE2; s++) { free(renames->pairs[s].queue); From patchwork Thu Mar 11 00:38:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12129877 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B0BA5C4332B for ; Thu, 11 Mar 2021 00:39:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8215464FBA for ; Thu, 11 Mar 2021 00:39:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230319AbhCKAiw (ORCPT ); Wed, 10 Mar 2021 19:38:52 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50548 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231209AbhCKAij (ORCPT ); Wed, 10 Mar 2021 19:38:39 -0500 Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9BBBAC061574 for ; Wed, 10 Mar 2021 16:38:38 -0800 (PST) Received: by mail-wr1-x434.google.com with SMTP id b18so25343767wrn.6 for ; Wed, 10 Mar 2021 16:38:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=C9kP6y0QQbh5bjK+FNiTeo+2CujVbad8UAP6L8gcidQ=; b=N/a7RaSEn5vbWk0rZ6cB4Kk3r2kPDOF1x5bPVwpusn0z5Rl53qd0CD4pU0b3Qgvbqg QTbgLoeyAquvX6IY8z9XfkMkHI9v/+Zph2siYQvNMUvMkLGJvajsD96mdoGdjR8dvpE/ Woo2ePu7hAafZ8Vk8JT4v7u9WsAJy9/smcN/6g5y+TCnXE8OH5bAz+QFsE8R0au5wGop MrvqH9IP4x1N6eMBpSZLiyBRJOthGBvFhPUX4ITkMdk+MP6MKXeSTj2YTacOdgNq4+yZ dPJdSGfgVS4XVRvBwCbNfr+mblNbpVxOQkvohFejCtJB++/+fqamw3De+4uCvpqR+zYO CLeg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=C9kP6y0QQbh5bjK+FNiTeo+2CujVbad8UAP6L8gcidQ=; b=G6H0pbc1ckppF36vhXL0VShO3rlBO2Y+L0KbTF1CnvWK9fS4aMOFa76RGJWn/HflMK 9sIMwi/WISgghfIY2kj7j85gs5bDP+TdKM5r5bYgMHHJfWvYwUz8Do+lO5JabaA8m7AN DYUaETCpAJPnY6N9qH6Of3+XM2Fx9eDQJiaRkjzramD093F/868dQCuEyJj+HphTvTBh vSZwBaN2JL5ZTK7eE1FYCTldVvG8s87rTKefbnLxtZTxatJxq/DqetUnffF5ZOcoRqwi vrORLJJ1mwuTzwY61OcJQ2PKwloY89QuuVUrVto9dfNOvpUNhk2AhTbrDk4HzNLcdYgS um7Q== X-Gm-Message-State: AOAM531rWcg9RvRig2RgxBrhkp/gEEsgc/3iF6cQ/VU470Si/R9KsuKv A8Ft61jqAXfoKBerE+ZXXQaiHmDJtQ4= X-Google-Smtp-Source: ABdhPJzMZSZYf2nmpo9lxPpL2TM+VZoq3NXxOEXTcqOAN/wY0Ea1mBBObnuwq+0skSKrh3jEBDrYoA== X-Received: by 2002:a5d:63d2:: with SMTP id c18mr5902365wrw.277.1615423117471; Wed, 10 Mar 2021 16:38:37 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id z3sm985867wrw.96.2021.03.10.16.38.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 Mar 2021 16:38:37 -0800 (PST) Message-Id: <98b0c7de5e70d62d47c3eeb3d290c6a234214f40.1615423112.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 11 Mar 2021 00:38:31 +0000 Subject: [PATCH v3 8/8] diffcore-rename: avoid doing basename comparisons for irrelevant sources MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Junio C Hamano , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Elijah Newren , Derrick Stolee , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren The basename comparison optimization implemented in find_basename_matches() is very beneficial since it allows a source to sometimes only be compared with one other file instead of N other files. When a match is found, both a source and destination can be removed from the matrix of inexact rename comparisons. In contrast, the irrelevant source optimization only allows us to remove a source from the matrix of inexact rename comparisons...but it has the advantage of allowing a source file to not even be loaded into memory at all and be compared to 0 other files. Generally, not even comparing is a bigger performance win, so when both optimizations could apply, prefer to use the irrelevant-source optimization. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.708 s ± 0.111 s 5.680 s ± 0.096 s mega-renames: 102.171 s ± 0.440 s 13.812 s ± 0.162 s just-one-mega: 3.471 s ± 0.015 s 506.0 ms ± 3.9 ms Signed-off-by: Elijah Newren --- diffcore-rename.c | 37 +++++++++++++++++++++++++++++++++---- 1 file changed, 33 insertions(+), 4 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 7f6115fd9018..e8508541be14 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -527,6 +527,7 @@ static void update_dir_rename_counts(struct dir_rename_info *info, } static void initialize_dir_rename_info(struct dir_rename_info *info, + struct strset *relevant_sources, struct strset *dirs_removed, struct strmap *dir_rename_count) { @@ -534,7 +535,7 @@ static void initialize_dir_rename_info(struct dir_rename_info *info, struct strmap_entry *entry; int i; - if (!dirs_removed) { + if (!dirs_removed && !relevant_sources) { info->setup = 0; return; } @@ -549,7 +550,20 @@ static void initialize_dir_rename_info(struct dir_rename_info *info, strmap_init_with_options(&info->dir_rename_guess, NULL, 0); /* Setup info->relevant_source_dirs */ - info->relevant_source_dirs = dirs_removed; + info->relevant_source_dirs = NULL; + if (dirs_removed || !relevant_sources) { + info->relevant_source_dirs = dirs_removed; /* might be NULL */ + } else { + info->relevant_source_dirs = xmalloc(sizeof(struct strintmap)); + strset_init(info->relevant_source_dirs); + strset_for_each_entry(relevant_sources, &iter, entry) { + char *dirname = get_dirname(entry->key); + if (!dirs_removed || + strset_contains(dirs_removed, dirname)) + strset_add(info->relevant_source_dirs, dirname); + free(dirname); + } + } /* * Loop setting up both info->idx_map, and doing setup of @@ -627,6 +641,13 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info, /* dir_rename_guess */ strmap_clear(&info->dir_rename_guess, 1); + /* relevant_source_dirs */ + if (info->relevant_source_dirs && + info->relevant_source_dirs != dirs_removed) { + strset_clear(info->relevant_source_dirs); + FREE_AND_NULL(info->relevant_source_dirs); + } + /* dir_rename_count */ if (!keep_dir_rename_count) { partial_clear_dir_rename_count(info->dir_rename_count); @@ -749,6 +770,7 @@ static int idx_possible_rename(char *filename, struct dir_rename_info *info) static int find_basename_matches(struct diff_options *options, int minimum_score, struct dir_rename_info *info, + struct strset *relevant_sources, struct strset *dirs_removed) { /* @@ -839,6 +861,11 @@ static int find_basename_matches(struct diff_options *options, intptr_t src_index; intptr_t dst_index; + /* Skip irrelevant sources */ + if (relevant_sources && + !strset_contains(relevant_sources, filename)) + continue; + /* * If the basename is unique among remaining sources, then * src_index will equal 'i' and we can attempt to match it @@ -1164,7 +1191,7 @@ void diffcore_rename_extended(struct diff_options *options, /* Preparation for basename-driven matching. */ trace2_region_enter("diff", "dir rename setup", options->repo); - initialize_dir_rename_info(&info, + initialize_dir_rename_info(&info, relevant_sources, dirs_removed, dir_rename_count); trace2_region_leave("diff", "dir rename setup", options->repo); @@ -1172,7 +1199,9 @@ void diffcore_rename_extended(struct diff_options *options, trace2_region_enter("diff", "basename matches", options->repo); rename_count += find_basename_matches(options, min_basename_score, - &info, dirs_removed); + &info, + relevant_sources, + dirs_removed); trace2_region_leave("diff", "basename matches", options->repo); /*