From patchwork Sun Feb 28 03:58:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12107985 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50FD2C433DB for ; Sun, 28 Feb 2021 03:59:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 170A664E38 for ; Sun, 28 Feb 2021 03:59:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230230AbhB1D7S (ORCPT ); Sat, 27 Feb 2021 22:59:18 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50106 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230090AbhB1D7K (ORCPT ); Sat, 27 Feb 2021 22:59:10 -0500 Received: from mail-wm1-x329.google.com (mail-wm1-x329.google.com [IPv6:2a00:1450:4864:20::329]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1A996C061756 for ; Sat, 27 Feb 2021 19:58:30 -0800 (PST) Received: by mail-wm1-x329.google.com with SMTP id e23so437473wmh.3 for ; Sat, 27 Feb 2021 19:58:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=HfxVlplU/cnoYrBNq0JnG8uKRsCkGyRl+116KdStR84=; b=lZkayiStyDXsIkyggVTQbCm0yCpU/rqNcp9g73y7tDPxtnl6lSl+sGa3gqJ8zsGm54 u6AwmJM1KAiCIkmhKxAzhd38dU3qcIrXhqVWT/sWJrEcX0Kpki9YTjbv+Dk/YMnfKg16 5NjO25aP4k5nJTM4jNH+R1HF+oKa1GydGYp8tWvlAToei+HQnLWT2p/bwv1UcG9kHblg HCmtHzG11AUN7C1U53DXJ2zG3BaGGiWTnKt5LgDp4Q475hYeDWJj+2/y1mwNML6/wJzU Nvgf28SIj0zgWJq3xHjeD6i9ulCAm6W7ncakSZkhE2Z+Y78B/q0Mrfqc4KgFoa99SIq4 ZoRA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=HfxVlplU/cnoYrBNq0JnG8uKRsCkGyRl+116KdStR84=; b=AODj1a/LYA9o7RGOqrQMxkkM0/pJ8FhuiMRzx4xqlTfjTO6INqDtWTtcOYSvKBAPnl mLnBRDNlqmaQkmZUJeeXMrKFZFMcKrChtWHmNDkX6UAdSlbNqct3Ly/9x9ILHqRW4nex aIR7nLOxjbXZ84x9zkuuHi3n5EaMm0dNLX/vVTY5a5P13s8YuEbHt4Pxjb6qErqm8B0R OWoYKds459+TeJSfU70FtgWDcP0VToIgxyMOqN5KU5r0TLEAfE37AsOZPO6LxWpUpRT6 fup2woobrgKwAK7bVXI6YHszJwdXnw7osWPqdvFzXhQYZM4Fhe+2AkHP9E6Pk1evZaL4 6cTg== X-Gm-Message-State: AOAM533wIIl+Pi5DFoii4CSVt2wu/8d2Y7SppyaoDrhgI/FEGcYfB562 kNQ41ScrSI1PYQAjM9TkfUN+gWAbrpU= X-Google-Smtp-Source: ABdhPJzBb7Tl+2P8vykufuCgM4okGNGJfwCxcWX+3hZWGciLGJ1rSVfVXlkjqH3/iaAXE3ANS7riTw== X-Received: by 2002:a1c:7301:: with SMTP id d1mr9525798wmb.33.1614484708805; Sat, 27 Feb 2021 19:58:28 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id u12sm15754606wmq.38.2021.02.27.19.58.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 27 Feb 2021 19:58:28 -0800 (PST) Message-Id: <064fa5de1e20ada3d9b2225d8561f4f6429bdf02.1614484707.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 28 Feb 2021 03:58:19 +0000 Subject: [PATCH 1/8] diffcore-rename: enable filtering possible rename sources Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Add the ability to diffcore_rename_extended() to allow external callers to declare that they only need renames detected for a subset of source files, and use that information to skip detecting renames for them. There are two important pieces to this optimization that may not be obvious at first glance: * We do not require callers to just filter the filepairs out to remove the non-relevant sources, because exact rename detection is fast and when it finds a match it can remove both a source and a destination whereas the relevant_sources filter can only remove a source. * We need to filter out the source pairs in a preliminary pass instead of adding a strset_contains(relevant_sources, one->path) check within the nested matrix loop. The reason for that is if we have 30k renames, doing 30k * 30k = 900M strset_contains() calls becomes extraordinarily expensive and defeats the performance gains from this change; we only want to do 30k such calls instead. If callers pass NULL for relevant_sources, that is special cases to treat all sources as relevant. Since all callers currently pass NULL, this optimization does not yet have any effect. Subsequent commits will have merge-ort compute a set of relevant_sources to restrict which sources we detect renames for, and have merge-ort pass that set of relevant_sources to diffcore_rename_extended(). Signed-off-by: Elijah Newren --- diffcore-rename.c | 26 +++++++++++++++++++------- diffcore.h | 1 + merge-ort.c | 1 + 3 files changed, 21 insertions(+), 7 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 1fe902ed2af0..7f6115fd9018 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -991,11 +991,12 @@ static int find_renames(struct diff_score *mx, return count; } -static void remove_unneeded_paths_from_src(int detecting_copies) +static void remove_unneeded_paths_from_src(int detecting_copies, + struct strset *interesting) { int i, new_num_src; - if (detecting_copies) + if (detecting_copies && !interesting) return; /* nothing to remove */ if (break_idx) return; /* culling incompatible with break detection */ @@ -1022,12 +1023,18 @@ static void remove_unneeded_paths_from_src(int detecting_copies) * from rename_src here. */ for (i = 0, new_num_src = 0; i < rename_src_nr; i++) { + struct diff_filespec *one = rename_src[i].p->one; + /* * renames are stored in rename_dst, so if a rename has * already been detected using this source, we can just * remove the source knowing rename_dst has its info. */ - if (rename_src[i].p->one->rename_used) + if (!detecting_copies && one->rename_used) + continue; + + /* If we don't care about the source path, skip it */ + if (interesting && !strset_contains(interesting, one->path)) continue; if (new_num_src < i) @@ -1040,6 +1047,7 @@ static void remove_unneeded_paths_from_src(int detecting_copies) } void diffcore_rename_extended(struct diff_options *options, + struct strset *relevant_sources, struct strset *dirs_removed, struct strmap *dir_rename_count) { @@ -1060,6 +1068,8 @@ void diffcore_rename_extended(struct diff_options *options, want_copies = (detect_rename == DIFF_DETECT_COPY); if (dirs_removed && (break_idx || want_copies)) BUG("dirs_removed incompatible with break/copy detection"); + if (break_idx && relevant_sources) + BUG("break detection incompatible with source specification"); if (!minimum_score) minimum_score = DEFAULT_RENAME_SCORE; @@ -1127,9 +1137,10 @@ void diffcore_rename_extended(struct diff_options *options, /* * Cull sources: * - remove ones corresponding to exact renames + * - remove ones not found in relevant_sources */ trace2_region_enter("diff", "cull after exact", options->repo); - remove_unneeded_paths_from_src(want_copies); + remove_unneeded_paths_from_src(want_copies, relevant_sources); trace2_region_leave("diff", "cull after exact", options->repo); } else { /* Determine minimum score to match basenames */ @@ -1148,7 +1159,7 @@ void diffcore_rename_extended(struct diff_options *options, * - remove ones involved in renames (found via exact match) */ trace2_region_enter("diff", "cull after exact", options->repo); - remove_unneeded_paths_from_src(want_copies); + remove_unneeded_paths_from_src(want_copies, NULL); trace2_region_leave("diff", "cull after exact", options->repo); /* Preparation for basename-driven matching. */ @@ -1167,9 +1178,10 @@ void diffcore_rename_extended(struct diff_options *options, /* * Cull sources, again: * - remove ones involved in renames (found via basenames) + * - remove ones not found in relevant_sources */ trace2_region_enter("diff", "cull basename", options->repo); - remove_unneeded_paths_from_src(want_copies); + remove_unneeded_paths_from_src(want_copies, relevant_sources); trace2_region_leave("diff", "cull basename", options->repo); } @@ -1342,5 +1354,5 @@ void diffcore_rename_extended(struct diff_options *options, void diffcore_rename(struct diff_options *options) { - diffcore_rename_extended(options, NULL, NULL); + diffcore_rename_extended(options, NULL, NULL, NULL); } diff --git a/diffcore.h b/diffcore.h index c6ba64abd198..737c93a6cc79 100644 --- a/diffcore.h +++ b/diffcore.h @@ -166,6 +166,7 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count); void diffcore_break(struct repository *, int); void diffcore_rename(struct diff_options *); void diffcore_rename_extended(struct diff_options *options, + struct strset *relevant_sources, struct strset *dirs_removed, struct strmap *dir_rename_count); void diffcore_merge_broken(void); diff --git a/merge-ort.c b/merge-ort.c index 467404cc0a35..aba0b9fa54c3 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -2029,6 +2029,7 @@ static void detect_regular_renames(struct merge_options *opt, diff_queued_diff = renames->pairs[side_index]; trace2_region_enter("diff", "diffcore_rename", opt->repo); diffcore_rename_extended(&diff_opts, + NULL, &renames->dirs_removed[side_index], &renames->dir_rename_count[side_index]); trace2_region_leave("diff", "diffcore_rename", opt->repo); From patchwork Sun Feb 28 03:58:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12107991 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 661F5C433E6 for ; Sun, 28 Feb 2021 03:59:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3016C64E21 for ; Sun, 28 Feb 2021 03:59:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230248AbhB1D70 (ORCPT ); Sat, 27 Feb 2021 22:59:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50108 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230148AbhB1D7L (ORCPT ); Sat, 27 Feb 2021 22:59:11 -0500 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9B5C5C061786 for ; Sat, 27 Feb 2021 19:58:30 -0800 (PST) Received: by mail-wr1-x42b.google.com with SMTP id e10so12270435wro.12 for ; Sat, 27 Feb 2021 19:58:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=epK0IZOqPbqDp9yCjCsz0W5BjBwKPxEN87dSlW9uDUY=; b=EDSuLkjv7p7sABhyGb+afWCNj9GmuJ8nUeU09gHu/JcB+9LTVxsh8xtPDGlr3v2qOY NPuSKtxJCD9kUNLqCg2hn7crzYKPM9DctMyBd9/QJMJys7XKpQm/AulVBDV/iLlxHI2W EQUQZ+j7iiCBlUJdqKAV5evg9yIun5zQfaWl9zg8ECuXi8kRVoZh0YCD0Y0AVTvrpsE8 neCW9a0lxO8qzkwEegoCL5NQVEInTOp4CrP9xt+nagcpBd5ybPitmwbW16n+WDp215dn hfvmB2Dg4wsF736peVXceI/TiYyZCLdsyS4l5QQe8n5zV7LeCQi5p6rtRnes2NkUbMQ7 FxUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=epK0IZOqPbqDp9yCjCsz0W5BjBwKPxEN87dSlW9uDUY=; b=kIwkuAJXqThb2A2rRKvZNxv1D5INnj+2/wpmOv7b8Zp20Nv6jcdeYw0Wxld1CnQJqk 85cZU5M42e2JsNlecXfZIVAhuSlP2sIVOyOocgzEhVstbPjXmu7XYdPIMe8GAB7kpPQG 1WB9JmYWlE8X76ecENDyRLhesa/VytECdMDYsWJJSMInLKcAQSmyyUDfyqTPekKxYagU hYsfu2tmyx5Oc0nWR4XYYJHsnKgMPIYG2+sCN5ADcASkuctEysly1ocgLdE8gUyYgSVG SYwme9gebbW1vmH1FpfRHu39Mf2RugaUb2b3dLExMUmNOrpg7ZTNZhR/P56ikSzzBPCi wf2g== X-Gm-Message-State: AOAM5335VyIbk/gDbW7FvQZL7s74sVE3qJAadsWIVTbtV0542GIUmxit Y699FF9Ta+ZcR2kfQBvWarThqsQXrXU= X-Google-Smtp-Source: ABdhPJxKhTT5W2JYhV1KxeyRbeTRQEaosZ9DJNq4zQs0+L9q6ynAvCql8KelVHaIoV+Uu7noEZZKpA== X-Received: by 2002:adf:b313:: with SMTP id j19mr10009489wrd.188.1614484709360; Sat, 27 Feb 2021 19:58:29 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id o9sm18007926wmc.8.2021.02.27.19.58.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 27 Feb 2021 19:58:29 -0800 (PST) Message-Id: <69b42a41e83a006408b6f875dd17a0714be5b9ad.1614484707.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 28 Feb 2021 03:58:20 +0000 Subject: [PATCH 2/8] merge-ort: precompute subset of sources for which we need rename detection Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren rename detection works by trying to pair all file deletions (or "sources") with all file additions (or "destinations"), checking similarity, and then marking the sufficiently similar ones as renames. This can be expensive if there are many sources and destinations on a given side of history as it results in an N x M comparison matrix. However, there are many cases where we can compute in advance that detecting renames for some of the sources provides no useful information and thus that we can exclude those sources from the matrix. To see why, first note that the merge machinery uses detected renames in two ways: * directory rename detection: when one side of history renames a directory, and the other side of history adds new files to that directory, we want to be able to warn the user about the need to chose whether those new files stay in the old directory or move to the new one. * three-way content merging: in order to do three-way content merging of files, we need three different file versions. If one side of history renamed a file, then some of the content for the file is found under a different path than in the merge base or on the other side of history. This commit concentrates just on the three-way content merging; it will punt and mark all sources as needed for directory rename detection, and leave it to future commits to narrow that down more. The point of three-way content merging is to reconcile changes made on *both* sides of history. What if the file wasn't modified on both sides? There are two possibilities: * If it wasn't modified on the renamed side: -> then we get to do exact rename detection, which is cheap. * If it wasn't modified on the unrenamed side: -> then detection of a rename for that source file is irrelevant That latter claim might be surprising at first, so let's walk through a case to show why rename detection for that source file is irrelevant. Let's use two filenames, old.c & new.c, with the following abbreviated object ids (and where the value '000000' is used to denote that the file is missing in that commit): old.c new.c MERGE_BASE: 01d01d 000000 MERGE_SIDE1: 01d01d 000000 MERGE_SIDE2: 000000 5e1ec7 If the rename *isn't* detected: then old.c looks like it was unmodified on one side and deleted on the other and should thus be removed. new.c looks like a new file we should keep as-is. If the rename *is* detected: then a three-way content merge is done. Since the version of the file in MERGE_BASE and MERGE_SIDE1 are identical, the three-way merge will produce exactly the version of the file whose abbreviated object id is 5e1ec7. It will record that file at the path new.c, while removing old.c from the directory. Note that these two results are identical -- a single file named 'new.c' with object id 5e1ec7. In other words, it doesn't matter if the rename is detected in the case where the file is unmodified on the unrenamed side. Use this information to compute whether we need rename detection for each source created in add_pair(). It's probably worth noting that there used to be a few other edge or corner cases besides three-way content merges and directory rename detection where lack of rename detection could have affected the result, but those cases actually highlighted where conflict resolution methods were not consistent with each other. Fixing those inconsistencies were thus critically important to enabling this optimization. That work involved the following: * bringing consistency to add/add, rename/add, and rename/rename conflict types, as done back in the topic merged at commit ac193e0e0a ("Merge branch 'en/merge-path-collision'", 2019-01-04), and further extended in commits 2a7c16c980 ("t6422, t6426: be more flexible for add/add conflicts involving renames", 2020-08-10) and e8eb99d4a6 ("t642[23]: be more flexible for add/add conflicts involving pair renames", 2020-08-10) * making rename/delete more consistent with modify/delete as done in commits 1f3c9ba707 ("t6425: be more flexible with rename/delete conflict messages", 2020-08-10) and 727c75b23f ("t6404, t6423: expect improved rename/delete handling in ort backend", 2020-10-26) Since the set of relevant_sources we compute has not yet been narrowed down for directory rename detection, we do not pass it to diffcore_rename_extended() yet. That will be done after subsequent commits narrow down the list of relevant_sources needed for directory rename detection reasons. Signed-off-by: Elijah Newren --- merge-ort.c | 35 ++++++++++++++++++++++++++++++++--- 1 file changed, 32 insertions(+), 3 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index aba0b9fa54c3..83aa4c08121f 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -88,6 +88,20 @@ struct rename_info { */ struct strmap dir_renames[3]; + /* + * relevant_sources: deleted paths for which we need rename detection + * + * relevant_sources is a set of deleted paths on each side of + * history for which we need rename detection. If a path is deleted + * on one side of history, we need to detect if it is part of a + * rename if either + * * we need to detect renames for an ancestor directory + * * the file is modified/deleted on the other side of history + * If neither of those are true, we can skip rename detection for + * that path. + */ + struct strset relevant_sources[3]; + /* * needed_limit: value needed for inexact rename detection to run * @@ -358,6 +372,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, strmap_clear(&renames->dir_rename_count[i], 1); strmap_func(&renames->dir_renames[i], 0); + + strset_func(&renames->relevant_sources[i]); } if (!reinitialize) { @@ -533,12 +549,21 @@ static void add_pair(struct merge_options *opt, struct name_entry *names, const char *pathname, unsigned side, - unsigned is_add /* if false, is_delete */) + unsigned is_add /* if false, is_delete */, + unsigned match_mask) { struct diff_filespec *one, *two; struct rename_info *renames = &opt->priv->renames; int names_idx = is_add ? side : 0; + if (!is_add) { + unsigned content_relevant = (match_mask == 0); + unsigned location_relevant = 1; /* FIXME: compute this */ + + if (content_relevant || location_relevant) + strset_add(&renames->relevant_sources[side], pathname); + } + one = alloc_filespec(pathname); two = alloc_filespec(pathname); fill_filespec(is_add ? two : one, @@ -575,11 +600,13 @@ static void collect_rename_info(struct merge_options *opt, /* Check for deletion on side */ if ((filemask & 1) && !(filemask & side_mask)) - add_pair(opt, names, fullname, side, 0 /* delete */); + add_pair(opt, names, fullname, side, 0 /* delete */, + match_mask & filemask); /* Check for addition on side */ if (!(filemask & 1) && (filemask & side_mask)) - add_pair(opt, names, fullname, side, 1 /* add */); + add_pair(opt, names, fullname, side, 1 /* add */, + match_mask & filemask); } } @@ -3228,6 +3255,8 @@ static void merge_start(struct merge_options *opt, struct merge_result *result) NULL, 1); strmap_init_with_options(&renames->dir_renames[i], NULL, 0); + strset_init_with_options(&renames->relevant_sources[i], + NULL, 0); } /* From patchwork Sun Feb 28 03:58:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12107989 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 77E52C433E9 for ; Sun, 28 Feb 2021 03:59:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4C05964E4B for ; Sun, 28 Feb 2021 03:59:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230148AbhB1D72 (ORCPT ); Sat, 27 Feb 2021 22:59:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50110 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230190AbhB1D7L (ORCPT ); Sat, 27 Feb 2021 22:59:11 -0500 Received: from mail-wm1-x330.google.com (mail-wm1-x330.google.com [IPv6:2a00:1450:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0342EC061788 for ; Sat, 27 Feb 2021 19:58:31 -0800 (PST) Received: by mail-wm1-x330.google.com with SMTP id u125so11212778wmg.4 for ; Sat, 27 Feb 2021 19:58:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=fcQkjCw6zC06N7gSUFzG0EzW9qGvc97YSBo5+h04zSE=; b=NLK1AJX8DsDdfcFMBPF2w8upNoZ2p/u9Ax0ytw1MR8LoS0K5Chrb3TgdZp68KbS6Xf afwl1H9o0Uaao1/+pYYp9KsbW+C+SKWzPtDyHGES0Sa8ACppPlI5lFv0uvCO6/Naq7pl iJaaHsIci3cmKlLwFNsBshTL1KrTaK6OGiEi+8eFhLabga4seh8VpBtExNn3FKu/0FcJ BtsGexJH5iYY/dhswWeokLi/BVH4Sr59UB6gniYrIvMX0X/i/az/+SsFavt9uUGP0dEd 41M5350bGxZuB2/pn0j5HzE4ztOhEVy1qaNW6BDo+2rZ/IPoETja2CJOoJ0S/eJHm4lr uugA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=fcQkjCw6zC06N7gSUFzG0EzW9qGvc97YSBo5+h04zSE=; b=Cw4Tzn8yuf2F5fxKrfj5HqACZj44hfCMhHfzujXUnOqjUtlnk+6HNWwWBAMAXmVi71 QrwUKoRiKrBmxK2ig/wAXTmksGAE5Owg/c+pEnzClHzm6lkzuBT8fgOxx0I2vNLUJrgm A/DakdZ4rssmi8qrBnMf08na0/coIFVHy+4VkM5fPF+V4Zhum7wLunkrL04mjdD37D1W 9Ghdq6O+zseRvnYGpH8AtdcTYn3cBLqBBXD5uswruVY4Ec05dimU12sTORljcYphq/h6 sKZjwO9VXZfqfnxtXmjsDQsd14NEnvtTSDeGYgmDqVLKAlmbc1e62rHf0eAI8bkWUlcq I5kg== X-Gm-Message-State: AOAM533xjfTLa1E4l4lpXNaRWLqjdanFepJHGhyrX2pZjRrUyiZLF0NU gLwg8vXOHLoIyzlP3vzwnuVI7NLiGlE= X-Google-Smtp-Source: ABdhPJxpsnJJagCFkl7o5yoIZLdJOEnGOKbYUqYlfcA3cD7YaMDQPjeoW8lSep3U1zqryICvmEWKrA== X-Received: by 2002:a05:600c:358d:: with SMTP id p13mr9867213wmq.152.1614484709853; Sat, 27 Feb 2021 19:58:29 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a198sm9913388wmd.11.2021.02.27.19.58.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 27 Feb 2021 19:58:29 -0800 (PST) Message-Id: <042ce66011efabcf15be697816d9a97b76877fbf.1614484707.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 28 Feb 2021 03:58:21 +0000 Subject: [PATCH 3/8] merge-ort: add data structures for an alternate tree traversal Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren In order to determine whether directory rename detection is needed, we as a pre-requisite need a way to traverse through all the files in a given tree before visiting any directories within that tree. traverse_trees() only iterates through the entries in the order they appear, so add some data structures that will store all the entries as we iterate through them in traverse_trees(), which will allow us to re-traverse them in our desired order. Signed-off-by: Elijah Newren --- merge-ort.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/merge-ort.c b/merge-ort.c index 83aa4c08121f..d49cfa8b030b 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -51,6 +51,12 @@ enum merge_side { MERGE_SIDE2 = 2 }; +struct traversal_callback_data { + unsigned long mask; + unsigned long dirmask; + struct name_entry names[3]; +}; + struct rename_info { /* * All variables that are arrays of size 3 correspond to data tracked @@ -102,6 +108,22 @@ struct rename_info { */ struct strset relevant_sources[3]; + /* + * callback_data_*: supporting data structures for alternate traversal + * + * We sometimes need to be able to traverse through all the files + * in a given tree before all immediate subdirectories within that + * tree. Since traverse_trees() doesn't do that naturally, we have + * a traverse_trees_wrapper() that stores any immediate + * subdirectories while traversing files, then traverses the + * immediate subdirectories later. These callback_data* variables + * store the information for the subdirectories so that we can do + * that traversal order. + */ + struct traversal_callback_data *callback_data; + int callback_data_nr, callback_data_alloc; + char *callback_data_traverse_path; + /* * needed_limit: value needed for inexact rename detection to run * @@ -396,6 +418,10 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, } strmap_clear(&opti->output, 0); } + + /* Clean out callback_data as well. */ + FREE_AND_NULL(renames->callback_data); + renames->callback_data_nr = renames->callback_data_alloc = 0; } static int err(struct merge_options *opt, const char *err, ...) From patchwork Sun Feb 28 03:58:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12107993 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 186DBC433DB for ; Sun, 28 Feb 2021 03:59:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E3FA964E21 for ; Sun, 28 Feb 2021 03:59:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230261AbhB1D73 (ORCPT ); Sat, 27 Feb 2021 22:59:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50114 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230206AbhB1D7L (ORCPT ); Sat, 27 Feb 2021 22:59:11 -0500 Received: from mail-wm1-x329.google.com (mail-wm1-x329.google.com [IPv6:2a00:1450:4864:20::329]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 76B28C06178A for ; Sat, 27 Feb 2021 19:58:31 -0800 (PST) Received: by mail-wm1-x329.google.com with SMTP id o16so11245364wmh.0 for ; Sat, 27 Feb 2021 19:58:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=qzZe6EEUXndnEwCz4cXC0NUftpDdVBGzVehx3WE1KBs=; b=tmcdGP0E5MWU8GXtANzpgewe1Yb9TN/4rlt3Im8oyLAdOpvdFtYyIrnntuGAXwNm49 uM4RMxbSPgXAu976UMqXfChrRfCnm5zniLrZwF4r23HO7c/uY+4tjw5JtT2HMBLirX9o GxPOVaV/WRZlP1qssnXFvjUM6vSlK9w4fEXCsrfCPIGwEmwk5VzlcgX1EglZppdU000m AGCgyZzm6zCzPt5wliHNpQVx+QFPWAvGtcfDKfuzVPgaOeh3jEpJ9rOFxQRfACGA0plq +mtaODyxORyVkZzYZh+u1oB2qESgLhNV7KgYQpH7mvlGjYZtb8UK6yBmyHdHqSuCrQTY /PNw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=qzZe6EEUXndnEwCz4cXC0NUftpDdVBGzVehx3WE1KBs=; b=C35beN9/V5esi4YuS/N0l3ZeL2Yonad4FarIZlDx5xPoqX+UF6YnlO1jmbwEz7RcJ1 GOHGv8bEar32SL+lzNlBl7YmZxTmkrHG1+y4ahjFUNuoB7vVJSoTYQXgJ6WTiZjx3zb4 GUNxYHhJPF5gnhOMUmgbQq/hDVV4oq7mYXnFy9pmGaf2xpKODiH+l9ZsPIlLXenCwf/+ uQ/ZuLLcUJY79DuZQw5FAQ1sp6kZmOrI28oAFBt+Fj0UdMrQkNslg2dJ6VvflaFyrkjP YjFAQ6JNs3XLaqosJ2V2/BJa+zns/fbBRG3dF56diQiAIhEqIyH+fz2xACotxi+2Tv+e A2zQ== X-Gm-Message-State: AOAM530Bn9C4jS51cGLlRJL6Zd18y379rrcKOqFXTYEFs97cTar9Luk4 zDrowQRPUft/VoMhIzdtRrjzloVk2+Y= X-Google-Smtp-Source: ABdhPJxR5NWA1D56TI9KSmqr9nRiZxwNSGisy9h2p1UBM8y131TX+akJEXhQFBgI2gCVC+LXPq0MlQ== X-Received: by 2002:a05:600c:4fd5:: with SMTP id o21mr9389775wmq.20.1614484710323; Sat, 27 Feb 2021 19:58:30 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id c62sm17536825wme.16.2021.02.27.19.58.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 27 Feb 2021 19:58:30 -0800 (PST) Message-Id: <7673e4c23bbb4eb1ee625467a6515ff52145e79c.1614484707.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 28 Feb 2021 03:58:22 +0000 Subject: [PATCH 4/8] merge-ort: introduce wrappers for alternate tree traversal Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Add traverse_trees_wrapper() and traverse_trees_wrapper_callback() functions. The former runs traverse_trees() with info->fn set to traverse_trees_wrapper_callback, in order to simply save all the entries without processing or recursing into any of them. This step allows extra computation to be done (e.g. checking some condition across all files) that can be used later. Then, after that is completed, it iterates over all the saved entries and calls the original info->fn callback with the saved data. Currently, this does nothing more than marginally slowing down the tree traversal since we do not take advantage of the opportunity to compute anything special in traverse_trees_wrapper_callback(), and thus the real callback will be called identically as it would have been without this extra wrapper. However, a subsequent commit will add some special computation of some values that the real callback will be able to use. Signed-off-by: Elijah Newren --- merge-ort.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 72 insertions(+) diff --git a/merge-ort.c b/merge-ort.c index d49cfa8b030b..bd2b93a31141 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -512,6 +512,78 @@ static char *unique_path(struct strmap *existing_paths, /*** Function Grouping: functions related to collect_merge_info() ***/ +static int traverse_trees_wrapper_callback(int n, + unsigned long mask, + unsigned long dirmask, + struct name_entry *names, + struct traverse_info *info) +{ + struct merge_options *opt = info->data; + struct rename_info *renames = &opt->priv->renames; + + assert(n==3); + + if (!renames->callback_data_traverse_path) + renames->callback_data_traverse_path = xstrdup(info->traverse_path); + + ALLOC_GROW(renames->callback_data, renames->callback_data_nr + 1, + renames->callback_data_alloc); + renames->callback_data[renames->callback_data_nr].mask = mask; + renames->callback_data[renames->callback_data_nr].dirmask = dirmask; + COPY_ARRAY(renames->callback_data[renames->callback_data_nr].names, + names, 3); + renames->callback_data_nr++; + + return mask; +} + +/* + * Much like traverse_trees(), BUT: + * - read all the tree entries FIRST, saving them + * - note that the above step provides an opportunity to compute necessary + * additional details before the "real" traversal + * - loop through the saved entries and call the original callback on them + */ +MAYBE_UNUSED +static int traverse_trees_wrapper(struct index_state *istate, + int n, + struct tree_desc *t, + struct traverse_info *info) +{ + int ret, i, old_offset; + traverse_callback_t old_fn; + char *old_callback_data_traverse_path; + struct merge_options *opt = info->data; + struct rename_info *renames = &opt->priv->renames; + + old_callback_data_traverse_path = renames->callback_data_traverse_path; + old_fn = info->fn; + old_offset = renames->callback_data_nr; + + renames->callback_data_traverse_path = NULL; + info->fn = traverse_trees_wrapper_callback; + ret = traverse_trees(istate, n, t, info); + if (ret < 0) + return ret; + + info->traverse_path = renames->callback_data_traverse_path; + info->fn = old_fn; + for (i = old_offset; i < renames->callback_data_nr; ++i) { + info->fn(n, + renames->callback_data[i].mask, + renames->callback_data[i].dirmask, + renames->callback_data[i].names, + info); + + } + + renames->callback_data_nr = old_offset; + free(renames->callback_data_traverse_path); + renames->callback_data_traverse_path = old_callback_data_traverse_path; + info->traverse_path = NULL; + return 0; +} + static void setup_path_info(struct merge_options *opt, struct string_list_item *result, const char *current_dir_name, From patchwork Sun Feb 28 03:58:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12107995 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B9C7C433E0 for ; Sun, 28 Feb 2021 03:59:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6ABCE64E38 for ; Sun, 28 Feb 2021 03:59:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230240AbhB1D74 (ORCPT ); Sat, 27 Feb 2021 22:59:56 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50256 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230063AbhB1D7v (ORCPT ); Sat, 27 Feb 2021 22:59:51 -0500 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0C0FCC06178B for ; Sat, 27 Feb 2021 19:58:32 -0800 (PST) Received: by mail-wr1-x436.google.com with SMTP id f12so8703653wrx.8 for ; Sat, 27 Feb 2021 19:58:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=eExpm0932XIGGpgZpJYMwF54JEpa10NnV/4dpF0C2CM=; b=Rg9S9M3oNkEWTE34iW8nAgZM3qF+tI+cb2kWcxzs9iySytifXc44MPSTGQ+h1d/gm1 KlAf3yC0yht2GYAB/fIV8d3OaeY6kqB0zh4GIWuO/nppvSPLAU/OnoA7iW6WChPEqEtI ZuAGv9X+wKN4IGrunKhdAcpPi1srwRg4X1g8Kq6kRMsDInB1UTnNJdm5TWWKsFKYu0qm cecazncip3TCalNrzYf0Xx++88zkc+a7UdMlSTJsHXdl2DhfQ+JFfhE/8MU9FXaQaXHR n637e5kQCuBtXSALD8WU97QZLaqhMUNrVyWkRmrLRudKIm0m77hiBO0wWXiHm+VyKByv L+3w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=eExpm0932XIGGpgZpJYMwF54JEpa10NnV/4dpF0C2CM=; b=fwyPs8Z0dHX5ee6C4XUc0NF6rSrgmUiKwVgudTsdLkGPHW8hr7tUsl60wz4lmqcUZC BMcQ/uOW0iiePfEJD85//2eAqU95GcmlMU8+5yU6PVPtHM2F8GxmYiEqUPLs2Jls8Q0e NgVnrEL0AZoRkYVZ9ORyHRfNx9GT9KsMDlIqUSO3QPxX//4f8639oofetsMCKZMB5jo6 FI1l04W0Q41BuCce5PxwLlCVJyzBM34C5xuiqWr493wqzq0Y1UJco98XsKFJLBPn7fXB 2bNKMwPtqbD+msCO20x1lS3O0R81d0qSN8zIuu/C2sBz5rBgS+KI3iM/GP9Wb0EQ+BL0 tc3Q== X-Gm-Message-State: AOAM532wM5Dmrt4C9TVNee5uuvN/ry/CAw/4zgD6s7SOPBimaWq6Nli8 fp8GNvhbDc58Bz+tW7k3VG3Y3nkUVgs= X-Google-Smtp-Source: ABdhPJytKAy4CwiwGFNOPdJfATz51VfLUO+HG8zpOY2vfFqL5K9EerIlWpOksvIjOQQiWfT8InzBmQ== X-Received: by 2002:a5d:620d:: with SMTP id y13mr10232246wru.88.1614484710837; Sat, 27 Feb 2021 19:58:30 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id 6sm21092582wra.63.2021.02.27.19.58.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 27 Feb 2021 19:58:30 -0800 (PST) Message-Id: <8dbf0a4525453078cf87ee9149d89227d18c96f0.1614484707.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 28 Feb 2021 03:58:23 +0000 Subject: [PATCH 5/8] merge-ort: precompute whether directory rename detection is needed Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren The point of directory rename detection is that if one side of history renames a directory, and the other side adds new files under the old directory, then the merge can move those new files into the new directory. This leads to the following important observation: * If the other side does not add any new files under the old directory, we do not need to detect any renames for that directory. Similarly, directory rename detection had an important requirement: * If a directory still exists on one side of history, it has not been renamed on that side of history. (See section 4 of t6423 or Documentation/technical/directory-rename-detection.txt for more details). Using these two bits of information, we note that directory rename detection is only needed in cases where (1) directories exist in the merge base and on one side of history (i.e. dirmask == 3 or dirmask == 5), and (2) where there is some new file added to that directory on the side where it still exists (thus where the file has filemask == 2 or filemask == 4, respectively). This has to be done in two steps, because we have the dirmask when we are first considering the directory, and won't get the filemasks for the files within it until we recurse into that directory. So, we save dir_rename_mask = dirmask - 1 when we hit a directory that is missing on one side, and then later look for cases of filemask == dir_rename_mask One final note is that as soon as we hit a directory that needs directory rename detection, we will need to detect renames in all subdirectories of that directory as well due to the "majority rules" decision when files are renamed into different directory hierarchies. We arbitrarily use the special value of 0x07 to record when we've hit such a directory. The combination of all the above mean that we introduce a variable named dir_rename_mask (couldn't think of a better name) which has one of the following values as we traverse into a directory: * 0x00: directory rename detection not needed * 0x02 or 0x04: directory rename detection only needed if files added * 0x07: directory rename detection definitely needed We then pass this value through to add_pairs() so that it can mark location_relevant as true only when dir_rename_mask is 0x07. Signed-off-by: Elijah Newren --- merge-ort.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 61 insertions(+), 6 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index bd2b93a31141..27acaa7380be 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -108,6 +108,14 @@ struct rename_info { */ struct strset relevant_sources[3]; + /* + * dir_rename_mask: + * 0: optimization removing unmodified potential rename source okay + * 2 or 4: optimization okay, but must check for files added to dir + * 7: optimization forbidden; need rename source in case of dir rename + */ + unsigned dir_rename_mask:3; + /* * callback_data_*: supporting data structures for alternate traversal * @@ -419,6 +427,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, strmap_clear(&opti->output, 0); } + renames->dir_rename_mask = 0; + /* Clean out callback_data as well. */ FREE_AND_NULL(renames->callback_data); renames->callback_data_nr = renames->callback_data_alloc = 0; @@ -520,12 +530,16 @@ static int traverse_trees_wrapper_callback(int n, { struct merge_options *opt = info->data; struct rename_info *renames = &opt->priv->renames; + unsigned filemask = mask & ~dirmask; assert(n==3); if (!renames->callback_data_traverse_path) renames->callback_data_traverse_path = xstrdup(info->traverse_path); + if (filemask && filemask == renames->dir_rename_mask) + renames->dir_rename_mask = 0x07; + ALLOC_GROW(renames->callback_data, renames->callback_data_nr + 1, renames->callback_data_alloc); renames->callback_data[renames->callback_data_nr].mask = mask; @@ -544,7 +558,6 @@ static int traverse_trees_wrapper_callback(int n, * additional details before the "real" traversal * - loop through the saved entries and call the original callback on them */ -MAYBE_UNUSED static int traverse_trees_wrapper(struct index_state *istate, int n, struct tree_desc *t, @@ -556,6 +569,8 @@ static int traverse_trees_wrapper(struct index_state *istate, struct merge_options *opt = info->data; struct rename_info *renames = &opt->priv->renames; + assert(renames->dir_rename_mask == 2 || renames->dir_rename_mask == 4); + old_callback_data_traverse_path = renames->callback_data_traverse_path; old_fn = info->fn; old_offset = renames->callback_data_nr; @@ -648,7 +663,8 @@ static void add_pair(struct merge_options *opt, const char *pathname, unsigned side, unsigned is_add /* if false, is_delete */, - unsigned match_mask) + unsigned match_mask, + unsigned dir_rename_mask) { struct diff_filespec *one, *two; struct rename_info *renames = &opt->priv->renames; @@ -656,7 +672,7 @@ static void add_pair(struct merge_options *opt, if (!is_add) { unsigned content_relevant = (match_mask == 0); - unsigned location_relevant = 1; /* FIXME: compute this */ + unsigned location_relevant = (dir_rename_mask == 0x07); if (content_relevant || location_relevant) strset_add(&renames->relevant_sources[side], pathname); @@ -680,6 +696,36 @@ static void collect_rename_info(struct merge_options *opt, struct rename_info *renames = &opt->priv->renames; unsigned side; + /* + * Update dir_rename_mask (determines ignore-rename-source validity) + * + * dir_rename_mask helps us keep track of when directory rename + * detection may be relevant. Basically, whenver a directory is + * removed on one side of history, and a file is added to that + * directory on the other side of history, directory rename + * detection is relevant (meaning we have to detect renames for all + * files within that directory to deduce where the directory + * moved). Also, whenever a directory needs directory rename + * detection, due to the "majority rules" choice for where to move + * it (see t6423 testcase 1f), we also need to detect renames for + * all files within subdirectories of that directory as well. + * + * Here we haven't looked at files within the directory yet, we are + * just looking at the directory itself. So, if we aren't yet in + * a case where a parent directory needed directory rename detection + * (i.e. dir_rename_mask != 0x07), and if the directory was removed + * on one side of history, record the mask of the other side of + * history in dir_rename_mask. + */ + if (renames->dir_rename_mask != 0x07 && + (dirmask == 3 || dirmask == 5)) { + /* simple sanity check */ + assert(renames->dir_rename_mask == 0 || + renames->dir_rename_mask == (dirmask & ~1)); + /* update dir_rename_mask; have it record mask of new side */ + renames->dir_rename_mask = (dirmask & ~1); + } + /* Update dirs_removed, as needed */ if (dirmask == 1 || dirmask == 3 || dirmask == 5) { /* absent_mask = 0x07 - dirmask; sides = absent_mask/2 */ @@ -699,12 +745,14 @@ static void collect_rename_info(struct merge_options *opt, /* Check for deletion on side */ if ((filemask & 1) && !(filemask & side_mask)) add_pair(opt, names, fullname, side, 0 /* delete */, - match_mask & filemask); + match_mask & filemask, + renames->dir_rename_mask); /* Check for addition on side */ if (!(filemask & 1) && (filemask & side_mask)) add_pair(opt, names, fullname, side, 1 /* add */, - match_mask & filemask); + match_mask & filemask, + renames->dir_rename_mask); } } @@ -722,12 +770,14 @@ static int collect_merge_info_callback(int n, */ struct merge_options *opt = info->data; struct merge_options_internal *opti = opt->priv; + struct rename_info *renames = &opt->priv->renames; struct string_list_item pi; /* Path Info */ struct conflict_info *ci; /* typed alias to pi.util (which is void*) */ struct name_entry *p; size_t len; char *fullpath; const char *dirname = opti->current_dir_name; + unsigned prev_dir_rename_mask = renames->dir_rename_mask; unsigned filemask = mask & ~dirmask; unsigned match_mask = 0; /* will be updated below */ unsigned mbase_null = !(mask & 1); @@ -868,8 +918,13 @@ static int collect_merge_info_callback(int n, original_dir_name = opti->current_dir_name; opti->current_dir_name = pi.string; - ret = traverse_trees(NULL, 3, t, &newinfo); + if (renames->dir_rename_mask == 0 || + renames->dir_rename_mask == 0x07) + ret = traverse_trees(NULL, 3, t, &newinfo); + else + ret = traverse_trees_wrapper(NULL, 3, t, &newinfo); opti->current_dir_name = original_dir_name; + renames->dir_rename_mask = prev_dir_rename_mask; for (i = MERGE_BASE; i <= MERGE_SIDE2; i++) free(buf[i]); From patchwork Sun Feb 28 03:58:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12107997 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88FBEC433DB for ; Sun, 28 Feb 2021 04:00:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5B77E64E4A for ; Sun, 28 Feb 2021 04:00:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230267AbhB1D75 (ORCPT ); Sat, 27 Feb 2021 22:59:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50258 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230112AbhB1D7v (ORCPT ); Sat, 27 Feb 2021 22:59:51 -0500 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 78B3FC06178C for ; Sat, 27 Feb 2021 19:58:32 -0800 (PST) Received: by mail-wr1-x42a.google.com with SMTP id f12so8703670wrx.8 for ; Sat, 27 Feb 2021 19:58:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=D0Uir+rhhF+WjYNrc7vEgebea/JdT67xx/aFC/0Z1LU=; b=Lm791Y0qY9PpERotd4A5ZAzYDJsPPqfsVGa9fDjDHWqjQAyX3EZ2QFFGXeAGQPs/E9 Et0WdA6Jb57jmNxeIxPNTPYCVpUUvle8upB64uck9s6tzJ/1Rpt+GMGlPrM7Fc2bT8H6 29eJeNZ6UZeBt5OX1h+/IghYBIdpyS70Rm6Cq3Wqb3gWaDg1iYPFSIUeAZ0SrB6iNMmJ OhriNu6/SUlncs+sdFOoC+WImKeb+di8OerYX8PSLXzRHCoRgigIKkO192O9gDtj6/74 xUQLdoFWcd6IQlSfPtz0G+0o+6spjIoQPUCFbP3X3F2O5MRCwzcHbau9wlq2HVZGqq40 X7gg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=D0Uir+rhhF+WjYNrc7vEgebea/JdT67xx/aFC/0Z1LU=; b=PRNSa8LvQZu8+k98RRVFdZ3wA2AvnsxFjxAFDr/Ero9x8TkWGk5UPUO/PyXpdnL8dX iViYXnx26lwHk9Z3fhjzLMyaYaF3QmNj+heiv2ojJpAWfVnnwojqvOsZyEUUwWN2o5oi oY7jch4N0tSfuJWZbNjeQOi2jQ7b99PmMRGSMHvXIdBkWp8EaUGbmV6DQi0TFqCOkm9V lTdysYrlhOHRu26bYZKudNcUW64Atw9FgXU93/2rMGoec0kMkHHA918O/31xnrGKW4qb sAaBmzoQSy/q0ME0zNWi735zizyWUzS4jJob7Czu4oHn6IJncaHKZvtBy1VtJTLFpwgW ofGQ== X-Gm-Message-State: AOAM530WRwQR1FLSdoj+Av4NYlfZa3/OZOHoe/t5gBLBkuOckflXo9kH WX8/z7BFFL2uHuOcJhfzeSD8pggYalY= X-Google-Smtp-Source: ABdhPJxnZlYTJ5ax3pbFXY5mrgbCZWZXSwIviAO8zig8jdraztsVnabBvdDhMbFcJBxg+u6VJVFtHg== X-Received: by 2002:adf:ce0a:: with SMTP id p10mr10319153wrn.255.1614484711340; Sat, 27 Feb 2021 19:58:31 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id f9sm13408147wro.77.2021.02.27.19.58.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 27 Feb 2021 19:58:31 -0800 (PST) Message-Id: <6b20977a5a811e3f3038aec58f615b621139f24b.1614484707.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 28 Feb 2021 03:58:24 +0000 Subject: [PATCH 6/8] merge-ort: use relevant_sources to filter possible rename sources MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren The past several commits determined conditions when rename sources might be needed, and filled a relevant_sources strset with those paths. Pass these along to diffcore_rename_extended() to use to limit the sources that we need to detect renames for. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 12.596 s ± 0.061 s 6.003 s ± 0.048 s mega-renames: 130.465 s ± 0.259 s 114.009 s ± 0.236 s just-one-mega: 3.958 s ± 0.010 s 3.489 s ± 0.017 s Signed-off-by: Elijah Newren --- merge-ort.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/merge-ort.c b/merge-ort.c index 27acaa7380be..66892c63cee2 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -2209,7 +2209,7 @@ static void detect_regular_renames(struct merge_options *opt, diff_queued_diff = renames->pairs[side_index]; trace2_region_enter("diff", "diffcore_rename", opt->repo); diffcore_rename_extended(&diff_opts, - NULL, + &renames->relevant_sources[side_index], &renames->dirs_removed[side_index], &renames->dir_rename_count[side_index]); trace2_region_leave("diff", "diffcore_rename", opt->repo); From patchwork Sun Feb 28 03:58:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12107999 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C7B4C433E0 for ; Sun, 28 Feb 2021 04:00:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C99ED64E4A for ; Sun, 28 Feb 2021 04:00:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230270AbhB1EAK (ORCPT ); Sat, 27 Feb 2021 23:00:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50262 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230125AbhB1D7v (ORCPT ); Sat, 27 Feb 2021 22:59:51 -0500 Received: from mail-wm1-x331.google.com (mail-wm1-x331.google.com [IPv6:2a00:1450:4864:20::331]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 06995C061793 for ; Sat, 27 Feb 2021 19:58:33 -0800 (PST) Received: by mail-wm1-x331.google.com with SMTP id u187so8620447wmg.4 for ; Sat, 27 Feb 2021 19:58:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=KWaxSZgzerRR2+AOXRgojJoGF8Bm1+wLklgOoYKNF54=; b=lIcqGOccJygHG5PqZjABJlYlXpSGFZ6JXBYesZ+1bvmesvYT7c2g9Q/23ClAjCi+ts krsO3hNS+kX3myma+C+ddHS6QOFFm4B6LX/hTOHTQggUQwHlOIAnXDcBUX+hn3ObQDhW zjjsCZmn1rJKm3fTRad9v+qYQoe/BsmBsdx2wwqoM5rrpQoc4qa7SALtfRYyUz7cUWYM j5LGS8IHEEf1hk31OJvTAJyXXtdSJzjlEKZC81YSV0UxnRQH0pYNKz62hUBVpmyNz56J 4UGpWs7bLLdOWcte8kWYPXQa7aGaGV3GlDHZfi++Zitm5LF4CT1FclwO0UVRGO8Mey29 Fnfg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=KWaxSZgzerRR2+AOXRgojJoGF8Bm1+wLklgOoYKNF54=; b=q0beUYts/YCRkJnafk8Z4Z+h+4lOZ3/QFmfGdF7OYZ6YsIN1CT681qKFZ6kcRE1xPc m3qvh9vl8j/E9UvJ1x7YIffSBZ4G5czkh9JdwIbqKi2zd4AyqKWKR9rMMI+7UM1MWqzh iBLB5qynaSRJEH+MxhDKMxnTo38URpXg2g/eWnitzQOd9jPZ8jZPEURlsSZzvH/uXuer PCd53aT0rWcaN4/f8oJOxOmJtV4BebirNYV64M55o1N41F8E2j0AMk+j+RNgsXUf+AL3 R4UgJE/iBFT/ToY3YF2M+LZ3g3H+WqOA2jUU/j1KTm+o38r7FGkIOkF7y5f8I8k70OgJ nd4Q== X-Gm-Message-State: AOAM530WqW0A6JgFCUWI8SbIN7EFpaH3iFOWn0nT026tMX5EtwmHkq0+ lgECFzJiHH7N9MzWad/AKGrTwQaspSg= X-Google-Smtp-Source: ABdhPJwf9rsbv82OQCIiuhuWaBgAG+GGPphRoe9+YumFGltUQ/ZRVsQi3cTzvrbic1UUeiDL7hf3GQ== X-Received: by 2002:a05:600c:21d8:: with SMTP id x24mr9798065wmj.29.1614484711866; Sat, 27 Feb 2021 19:58:31 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id p6sm5995254wru.2.2021.02.27.19.58.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 27 Feb 2021 19:58:31 -0800 (PST) Message-Id: In-Reply-To: References: Date: Sun, 28 Feb 2021 03:58:25 +0000 Subject: [PATCH 7/8] merge-ort: skip rename detection entirely if possible MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren diffcore_rename_extended() will do a bunch of setup, then check for exact renames, then abort before inexact rename detection if there are no more sources or destinations that need to be matched. It will sometimes be the case, however, that either * we start with neither any sources or destinations * we start with no *relevant* sources In the first of these two cases, the setup and exact rename detection will be very cheap since there are 0 files to operate on. In the second case, it is quite possible to have thousands of files with none of the source ones being relevant. Avoid calling diffcore_rename_extended() or even some of the setup before diffcore_rename_extended() when we can determine that rename detection is unnecessary. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 6.003 s ± 0.048 s 5.708 s ± 0.111 s mega-renames: 114.009 s ± 0.236 s 102.171 s ± 0.440 s just-one-mega: 3.489 s ± 0.017 s 3.471 s ± 0.015 s Signed-off-by: Elijah Newren --- merge-ort.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/merge-ort.c b/merge-ort.c index 66892c63cee2..8602c7b8936f 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -2158,6 +2158,19 @@ static int process_renames(struct merge_options *opt, return clean_merge; } +static inline int possible_side_renames(struct rename_info *renames, + unsigned side_index) +{ + return renames->pairs[side_index].nr > 0 && + !strset_empty(&renames->relevant_sources[side_index]); +} + +static inline int possible_renames(struct rename_info *renames) +{ + return possible_side_renames(renames, 1) || + possible_side_renames(renames, 2); +} + static void resolve_diffpair_statuses(struct diff_queue_struct *q) { /* @@ -2194,6 +2207,16 @@ static void detect_regular_renames(struct merge_options *opt, struct diff_options diff_opts; struct rename_info *renames = &opt->priv->renames; + if (!possible_side_renames(renames, side_index)) { + /* + * No rename detection needed for this side, but we still need + * to make sure 'adds' are marked correctly in case the other + * side had directory renames. + */ + resolve_diffpair_statuses(&renames->pairs[side_index]); + return; + } + repo_diff_setup(opt->repo, &diff_opts); diff_opts.flags.recursive = 1; diff_opts.flags.rename_empty = 0; @@ -2311,6 +2334,8 @@ static int detect_and_process_renames(struct merge_options *opt, int need_dir_renames, s, clean = 1; memset(&combined, 0, sizeof(combined)); + if (!possible_renames(renames)) + goto cleanup; trace2_region_enter("merge", "regular renames", opt->repo); detect_regular_renames(opt, MERGE_SIDE1); @@ -2345,6 +2370,26 @@ static int detect_and_process_renames(struct merge_options *opt, clean &= process_renames(opt, &combined); trace2_region_leave("merge", "process renames", opt->repo); + goto simple_cleanup; /* collect_renames() handles some of cleanup */ + +cleanup: + /* + * Free now unneeded filepairs, which would have been handled + * in collect_renames() normally but we're about to skip that + * code... + */ + for (s = MERGE_SIDE1; s <= MERGE_SIDE2; s++) { + struct diff_queue_struct *side_pairs; + int i; + + side_pairs = &renames->pairs[s]; + for (i = 0; i < side_pairs->nr; ++i) { + struct diff_filepair *p = side_pairs->queue[i]; + diff_free_filepair(p); + } + } + +simple_cleanup: /* Free memory for renames->pairs[] and combined */ for (s = MERGE_SIDE1; s <= MERGE_SIDE2; s++) { free(renames->pairs[s].queue); From patchwork Sun Feb 28 03:58:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12108001 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8063EC433E0 for ; Sun, 28 Feb 2021 04:00:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 54B6364E4A for ; Sun, 28 Feb 2021 04:00:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230268AbhB1EAL (ORCPT ); Sat, 27 Feb 2021 23:00:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50264 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230167AbhB1D7w (ORCPT ); Sat, 27 Feb 2021 22:59:52 -0500 Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F2BAEC061794 for ; Sat, 27 Feb 2021 19:58:34 -0800 (PST) Received: by mail-wm1-x32d.google.com with SMTP id m1so11257840wml.2 for ; Sat, 27 Feb 2021 19:58:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=C9kP6y0QQbh5bjK+FNiTeo+2CujVbad8UAP6L8gcidQ=; b=qlgPWAzLb7wXiDPZfGZRc+We2aRe32VVdTeE1x8m41tqEflLMYY2iX/qVA6o9lt9a4 Wu6+zIZVIYgzsW5gBNcy/XSKs39pmuupzb+lBx04yIH/rocEhFKoXNDU/GLFv2WpeJIF DFqzJM7ds4y586UV7rzK2V8QpS2Uu/ob97+hOgfcFbDoN6qBmwUYb/Q2xe4h+FlDnS0f xCktZSxt+mxGgEJokBcsvy6Bi2zQrOluvG/HVctpxo/i258uwwinn21wl5P1AvzMckUW P9OHIrzYEqPXvOa/UpLESiDTReeluu22qV9AHmoMAopnQ9Vltp5ZfR7YPnOLC70qQSxO UXCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=C9kP6y0QQbh5bjK+FNiTeo+2CujVbad8UAP6L8gcidQ=; b=K2q3M4yt6wDxlgfcxuF/bLpeZISkElQbfwicDBPuC23h9/cQFrmOKss5DsRq3T66uF 5FKVqjyzL+iWFKUAQUyS14k5FeGtaEhaF7ICkVf3D/H4msQbDBbOXCPGZCsoRSN4erHl tj48YIJGVeonqdLRudgnardz/SjMdXvHpzh/gfdyPqvZYWMIqv4txzeWyxBovYQPEU7n GiDEcVpfvlCYzvJTkkzm4KXoktkg+S2to0PWaHqcJ58DELeP6Q/lBumphCAMLW4ZN3ls SrFGU1bVjVCfcRm2UtUmQJoYwbGJ4hNs+RvxA6hoOW6eQGoavdni+aLR3YfVccVNjjME Qe/A== X-Gm-Message-State: AOAM533V4TI3oentWU//BDq2hHhsZRivwcrqa/DtwihLqwwNtVnN+sOS TXj6Hsz5YaDrQ5PRgfG0AoSMgVPy7yA= X-Google-Smtp-Source: ABdhPJyXFGanlMvM3c2YbyckSp52m9sIkTg++RidKTzZvf6kB7ID+89MEHbYABFnQATHcMoqATCq5g== X-Received: by 2002:a05:600c:35c6:: with SMTP id r6mr9523153wmq.83.1614484712374; Sat, 27 Feb 2021 19:58:32 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id h2sm22650837wrq.81.2021.02.27.19.58.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 27 Feb 2021 19:58:32 -0800 (PST) Message-Id: <8fed92b62f375f97551a4e07bae5e43f9030d0d2.1614484707.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 28 Feb 2021 03:58:26 +0000 Subject: [PATCH 8/8] diffcore-rename: avoid doing basename comparisons for irrelevant sources MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren The basename comparison optimization implemented in find_basename_matches() is very beneficial since it allows a source to sometimes only be compared with one other file instead of N other files. When a match is found, both a source and destination can be removed from the matrix of inexact rename comparisons. In contrast, the irrelevant source optimization only allows us to remove a source from the matrix of inexact rename comparisons...but it has the advantage of allowing a source file to not even be loaded into memory at all and be compared to 0 other files. Generally, not even comparing is a bigger performance win, so when both optimizations could apply, prefer to use the irrelevant-source optimization. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.708 s ± 0.111 s 5.680 s ± 0.096 s mega-renames: 102.171 s ± 0.440 s 13.812 s ± 0.162 s just-one-mega: 3.471 s ± 0.015 s 506.0 ms ± 3.9 ms Signed-off-by: Elijah Newren --- diffcore-rename.c | 37 +++++++++++++++++++++++++++++++++---- 1 file changed, 33 insertions(+), 4 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 7f6115fd9018..e8508541be14 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -527,6 +527,7 @@ static void update_dir_rename_counts(struct dir_rename_info *info, } static void initialize_dir_rename_info(struct dir_rename_info *info, + struct strset *relevant_sources, struct strset *dirs_removed, struct strmap *dir_rename_count) { @@ -534,7 +535,7 @@ static void initialize_dir_rename_info(struct dir_rename_info *info, struct strmap_entry *entry; int i; - if (!dirs_removed) { + if (!dirs_removed && !relevant_sources) { info->setup = 0; return; } @@ -549,7 +550,20 @@ static void initialize_dir_rename_info(struct dir_rename_info *info, strmap_init_with_options(&info->dir_rename_guess, NULL, 0); /* Setup info->relevant_source_dirs */ - info->relevant_source_dirs = dirs_removed; + info->relevant_source_dirs = NULL; + if (dirs_removed || !relevant_sources) { + info->relevant_source_dirs = dirs_removed; /* might be NULL */ + } else { + info->relevant_source_dirs = xmalloc(sizeof(struct strintmap)); + strset_init(info->relevant_source_dirs); + strset_for_each_entry(relevant_sources, &iter, entry) { + char *dirname = get_dirname(entry->key); + if (!dirs_removed || + strset_contains(dirs_removed, dirname)) + strset_add(info->relevant_source_dirs, dirname); + free(dirname); + } + } /* * Loop setting up both info->idx_map, and doing setup of @@ -627,6 +641,13 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info, /* dir_rename_guess */ strmap_clear(&info->dir_rename_guess, 1); + /* relevant_source_dirs */ + if (info->relevant_source_dirs && + info->relevant_source_dirs != dirs_removed) { + strset_clear(info->relevant_source_dirs); + FREE_AND_NULL(info->relevant_source_dirs); + } + /* dir_rename_count */ if (!keep_dir_rename_count) { partial_clear_dir_rename_count(info->dir_rename_count); @@ -749,6 +770,7 @@ static int idx_possible_rename(char *filename, struct dir_rename_info *info) static int find_basename_matches(struct diff_options *options, int minimum_score, struct dir_rename_info *info, + struct strset *relevant_sources, struct strset *dirs_removed) { /* @@ -839,6 +861,11 @@ static int find_basename_matches(struct diff_options *options, intptr_t src_index; intptr_t dst_index; + /* Skip irrelevant sources */ + if (relevant_sources && + !strset_contains(relevant_sources, filename)) + continue; + /* * If the basename is unique among remaining sources, then * src_index will equal 'i' and we can attempt to match it @@ -1164,7 +1191,7 @@ void diffcore_rename_extended(struct diff_options *options, /* Preparation for basename-driven matching. */ trace2_region_enter("diff", "dir rename setup", options->repo); - initialize_dir_rename_info(&info, + initialize_dir_rename_info(&info, relevant_sources, dirs_removed, dir_rename_count); trace2_region_leave("diff", "dir rename setup", options->repo); @@ -1172,7 +1199,9 @@ void diffcore_rename_extended(struct diff_options *options, trace2_region_enter("diff", "basename matches", options->repo); rename_count += find_basename_matches(options, min_basename_score, - &info, dirs_removed); + &info, + relevant_sources, + dirs_removed); trace2_region_leave("diff", "basename matches", options->repo); /*