From: Elijah Newren
Date: Sun, 14 Feb 2021 07:58:54 +0000
Subject: [PATCH 01/10] Move computation of dir_rename_count from merge-ort to diffcore-rename
To: git@vger.kernel.org

A previous commit noted that it is very common for people to move files
across directories while keeping their filename the same.
The last few commits took advantage of this and showed that we can
accelerate rename detection significantly using basenames. Files with the
same basename are likely rename candidates, so we can check those first
and remove them from the rename candidate pool if they are sufficiently
similar.

Unfortunately, the previous optimization was limited by the fact that
the remaining basenames after exact rename detection are not always
unique. Many repositories have hundreds of build files with the same
name (e.g. Makefile, .gitignore, build.gradle, etc.), and may even have
hundreds of source files with the same name. (For example, the Linux
kernel has 100 setup.c, 87 irq.c, and 112 core.c files. A repository at
$DAYJOB has a lot of ObjectFactory.java and Plugin.java files.) For
these files with non-unique basenames, we have to determine, or at least
guess, which directory they may have been relocated to. That is
precisely the job of directory rename detection.

However, there are two catches: (1) the directory rename detection code
has traditionally been part of the merge machinery rather than
diffcore-rename.c, and (2) directory rename detection currently runs
after regular rename detection is complete. The first catch is just an
implementation issue that can be overcome by some code shuffling. The
second requires us to add a further approximation: we only have access
to exact renames at this point, so we need to do directory rename
detection based on just exact renames. In some cases we won't have
exact renames, in which case this extra optimization won't apply. We
also choose not to apply the optimization unless we know that the
underlying directory was removed, which will require extra data to be
passed in to diffcore_rename_extended().

Also, even if we get a prediction about which directory a file may have
relocated to, we will still need to check to see if there is a file in
the predicted directory, and then compare the two files to see if they
meet the higher min_basename_score threshold required for marking the
two files as renames.

This commit and the next few will set up the necessary infrastructure to
do such computations. This commit merely moves the computation of
dir_rename_count from merge-ort.c to diffcore-rename.c, making slight
adjustments to the data structures based on the move. While the
diffstat looks large, viewing this commit with --color-moved makes it
clear that only about 20 lines changed. With this patch, the
computation of dir_rename_count is still only done after inexact rename
detection, but subsequent commits will add a preliminary computation of
dir_rename_count after exact rename detection, followed by some updates
after inexact rename detection.

Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 134 +++++++++++++++++++++++++++++++++++++++++++++-
 diffcore.h        |   5 ++
 merge-ort.c       | 132 ++------------------------------------------
 3 files changed, 141 insertions(+), 130 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 41558185ae1d..33cfc5848611 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -367,6 +367,125 @@ static int find_exact_renames(struct diff_options *options)
 	return renames;
 }
 
+static void dirname_munge(char *filename)
+{
+	char *slash = strrchr(filename, '/');
+	if (!slash)
+		slash = filename;
+	*slash = '\0';
+}
+
+static void increment_count(struct strmap *dir_rename_count,
+			    char *old_dir,
+			    char *new_dir)
+{
+	struct strintmap *counts;
+	struct strmap_entry *e;
+
+	/* Get the {new_dirs -> counts} mapping using old_dir */
+	e = strmap_get_entry(dir_rename_count, old_dir);
+	if (e) {
+		counts = e->value;
+	} else {
+		counts = xmalloc(sizeof(*counts));
+		strintmap_init_with_options(counts, 0, NULL, 1);
+		strmap_put(dir_rename_count, old_dir, counts);
+	}
+
+	/* Increment the count for new_dir */
+	strintmap_incr(counts, new_dir, 1);
+}
+
+static void update_dir_rename_counts(struct strmap *dir_rename_count,
+				     struct strset *dirs_removed,
+				     const char *oldname,
+				     const char *newname)
+{
+	char *old_dir = xstrdup(oldname);
+	char *new_dir = xstrdup(newname);
+	char new_dir_first_char = new_dir[0];
+	int first_time_in_loop = 1;
+
+	while (1) {
+		dirname_munge(old_dir);
+		dirname_munge(new_dir);
+
+		/*
+		 * When renaming
+		 *   "a/b/c/d/e/foo.c" -> "a/b/some/thing/else/e/foo.c"
+		 * then this suggests that both
+		 *   a/b/c/d/e/ => a/b/some/thing/else/e/
+		 *   a/b/c/d/ => a/b/some/thing/else/
+		 * so we want to increment counters for both. We do NOT,
+		 * however, also want to suggest that there was the following
+		 * rename:
+		 *   a/b/c/ => a/b/some/thing/
+		 * so we need to quit at that point.
+		 *
+		 * Note the when first_time_in_loop, we only strip off the
+		 * basename, and we don't care if that's different.
+		 */
+		if (!first_time_in_loop) {
+			char *old_sub_dir = strchr(old_dir, '\0')+1;
+			char *new_sub_dir = strchr(new_dir, '\0')+1;
+			if (!*new_dir) {
+				/*
+				 * Special case when renaming to root directory,
+				 * i.e. when new_dir == "". In this case, we had
+				 * something like
+				 *    a/b/subdir => subdir
+				 * and so dirname_munge() sets things up so that
+				 *    old_dir = "a/b\0subdir\0"
+				 *    new_dir = "\0ubdir\0"
+				 * We didn't have a '/' to overwrite a '\0' onto
+				 * in new_dir, so we have to compare differently.
+				 */
+				if (new_dir_first_char != old_sub_dir[0] ||
+				    strcmp(old_sub_dir+1, new_sub_dir))
+					break;
+			} else {
+				if (strcmp(old_sub_dir, new_sub_dir))
+					break;
+			}
+		}
+
+		if (strset_contains(dirs_removed, old_dir))
+			increment_count(dir_rename_count, old_dir, new_dir);
+		else
+			break;
+
+		/* If we hit toplevel directory ("") for old or new dir, quit */
+		if (!*old_dir || !*new_dir)
+			break;
+
+		first_time_in_loop = 0;
+	}
+
+	/* Free resources we don't need anymore */
+	free(old_dir);
+	free(new_dir);
+}
+
+static void compute_dir_rename_counts(struct strmap *dir_rename_count,
+				      struct strset *dirs_removed)
+{
+	int i;
+
+	/* Set up dir_rename_count */
+	for (i = 0; i < rename_dst_nr; ++i) {
+		/*
+		 * Make dir_rename_count contain a map of a map:
+		 *   old_directory -> {new_directory -> count}
+		 * In other words, for every pair look at the directories for
+		 * the old filename and the new filename and count how many
+		 * times that pairing occurs.
+		 */
+		update_dir_rename_counts(dir_rename_count, dirs_removed,
+					 rename_dst[i].p->one->path,
+					 rename_dst[i].p->two->path);
+	}
+}
+
 static const char *get_basename(const char *filename)
 {
 	/*
@@ -640,7 +759,9 @@ static void remove_unneeded_paths_from_src(int detecting_copies)
 	rename_src_nr = new_num_src;
 }
 
-void diffcore_rename(struct diff_options *options)
+void diffcore_rename_extended(struct diff_options *options,
+			      struct strset *dirs_removed,
+			      struct strmap *dir_rename_count)
 {
 	int detect_rename = options->detect_rename;
 	int minimum_score = options->rename_score;
@@ -653,6 +774,7 @@ void diffcore_rename(struct diff_options *options)
 	struct progress *progress = NULL;
 
 	trace2_region_enter("diff", "setup", options->repo);
+	assert(!dir_rename_count || strmap_empty(dir_rename_count));
 	want_copies = (detect_rename == DIFF_DETECT_COPY);
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
@@ -841,6 +963,11 @@ void diffcore_rename(struct diff_options *options)
 	trace2_region_leave("diff", "inexact renames", options->repo);
 
  cleanup:
+	/*
+	 * Now that renames have been computed, compute dir_rename_count */
+	if (dirs_removed && dir_rename_count)
+		compute_dir_rename_counts(dir_rename_count, dirs_removed);
+
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst. The original list is still in *q.
 	 */
@@ -923,3 +1050,8 @@ void diffcore_rename(struct diff_options *options)
 	trace2_region_leave("diff", "write back to queue", options->repo);
 	return;
 }
+
+void diffcore_rename(struct diff_options *options)
+{
+	diffcore_rename_extended(options, NULL, NULL);
+}
diff --git a/diffcore.h b/diffcore.h
index d2a63c5c71f4..db55d3853071 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -8,6 +8,8 @@
 
 struct diff_options;
 struct repository;
+struct strmap;
+struct strset;
 struct userdiff_driver;
 
 /* This header file is internal between diff.c and its diff transformers
@@ -161,6 +163,9 @@ void diff_q(struct diff_queue_struct *, struct diff_filepair *);
 
 void diffcore_break(struct repository *, int);
 void diffcore_rename(struct diff_options *);
+void diffcore_rename_extended(struct diff_options *options,
+			      struct strset *dirs_removed,
+			      struct strmap *dir_rename_count);
 void diffcore_merge_broken(void);
 void diffcore_pickaxe(struct diff_options *);
 void diffcore_order(const char *orderfile);
diff --git a/merge-ort.c b/merge-ort.c
index 603d30c52170..c4467e073b45 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1302,131 +1302,6 @@ static char *handle_path_level_conflicts(struct merge_options *opt,
 	return new_path;
 }
 
-static void dirname_munge(char *filename)
-{
-	char *slash = strrchr(filename, '/');
-	if (!slash)
-		slash = filename;
-	*slash = '\0';
-}
-
-static void increment_count(struct strmap *dir_rename_count,
-			    char *old_dir,
-			    char *new_dir)
-{
-	struct strintmap *counts;
-	struct strmap_entry *e;
-
-	/* Get the {new_dirs -> counts} mapping using old_dir */
-	e = strmap_get_entry(dir_rename_count, old_dir);
-	if (e) {
-		counts = e->value;
-	} else {
-		counts = xmalloc(sizeof(*counts));
-		strintmap_init_with_options(counts, 0, NULL, 1);
-		strmap_put(dir_rename_count, old_dir, counts);
-	}
-
-	/* Increment the count for new_dir */
-	strintmap_incr(counts, new_dir, 1);
-}
-
-static void update_dir_rename_counts(struct strmap *dir_rename_count,
-				     struct strset *dirs_removed,
-				     const char *oldname,
-				     const char *newname)
-{
-	char *old_dir = xstrdup(oldname);
-	char *new_dir = xstrdup(newname);
-	char new_dir_first_char = new_dir[0];
-	int first_time_in_loop = 1;
-
-	while (1) {
-		dirname_munge(old_dir);
-		dirname_munge(new_dir);
-
-		/*
-		 * When renaming
-		 *   "a/b/c/d/e/foo.c" -> "a/b/some/thing/else/e/foo.c"
-		 * then this suggests that both
-		 *   a/b/c/d/e/ => a/b/some/thing/else/e/
-		 *   a/b/c/d/ => a/b/some/thing/else/
-		 * so we want to increment counters for both. We do NOT,
-		 * however, also want to suggest that there was the following
-		 * rename:
-		 *   a/b/c/ => a/b/some/thing/
-		 * so we need to quit at that point.
-		 *
-		 * Note the when first_time_in_loop, we only strip off the
-		 * basename, and we don't care if that's different.
-		 */
-		if (!first_time_in_loop) {
-			char *old_sub_dir = strchr(old_dir, '\0')+1;
-			char *new_sub_dir = strchr(new_dir, '\0')+1;
-			if (!*new_dir) {
-				/*
-				 * Special case when renaming to root directory,
-				 * i.e. when new_dir == "". In this case, we had
-				 * something like
-				 *    a/b/subdir => subdir
-				 * and so dirname_munge() sets things up so that
-				 *    old_dir = "a/b\0subdir\0"
-				 *    new_dir = "\0ubdir\0"
-				 * We didn't have a '/' to overwrite a '\0' onto
-				 * in new_dir, so we have to compare differently.
-				 */
-				if (new_dir_first_char != old_sub_dir[0] ||
-				    strcmp(old_sub_dir+1, new_sub_dir))
-					break;
-			} else {
-				if (strcmp(old_sub_dir, new_sub_dir))
-					break;
-			}
-		}
-
-		if (strset_contains(dirs_removed, old_dir))
-			increment_count(dir_rename_count, old_dir, new_dir);
-		else
-			break;
-
-		/* If we hit toplevel directory ("") for old or new dir, quit */
-		if (!*old_dir || !*new_dir)
-			break;
-
-		first_time_in_loop = 0;
-	}
-
-	/* Free resources we don't need anymore */
-	free(old_dir);
-	free(new_dir);
-}
-
-static void compute_rename_counts(struct diff_queue_struct *pairs,
-				  struct strmap *dir_rename_count,
-				  struct strset *dirs_removed)
-{
-	int i;
-
-	for (i = 0; i < pairs->nr; ++i) {
-		struct diff_filepair *pair = pairs->queue[i];
-
-		/* File not part of directory rename if it wasn't renamed */
-		if (pair->status != 'R')
-			continue;
-
-		/*
-		 * Make dir_rename_count contain a map of a map:
-		 *   old_directory -> {new_directory -> count}
-		 * In other words, for every pair look at the directories for
-		 * the old filename and the new filename and count how many
-		 * times that pairing occurs.
-		 */
-		update_dir_rename_counts(dir_rename_count, dirs_removed,
-					 pair->one->path,
-					 pair->two->path);
-	}
-}
-
 static void get_provisional_directory_renames(struct merge_options *opt,
 					      unsigned side,
 					      int *clean)
@@ -1435,9 +1310,6 @@ static void get_provisional_directory_renames(struct merge_options *opt,
 	struct strmap_entry *entry;
 	struct rename_info *renames = &opt->priv->renames;
 
-	compute_rename_counts(&renames->pairs[side],
-			      &renames->dir_rename_count[side],
-			      &renames->dirs_removed[side]);
 	/*
 	 * Collapse
 	 *    dir_rename_count: old_directory -> {new_directory -> count}
@@ -2162,7 +2034,9 @@ static void detect_regular_renames(struct merge_options *opt,
 
 	diff_queued_diff = renames->pairs[side_index];
 	trace2_region_enter("diff", "diffcore_rename", opt->repo);
-	diffcore_rename(&diff_opts);
+	diffcore_rename_extended(&diff_opts,
+				 &renames->dirs_removed[side_index],
+				 &renames->dir_rename_count[side_index]);
 	trace2_region_leave("diff", "diffcore_rename", opt->repo);
 	resolve_diffpair_statuses(&diff_queued_diff);
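To make the counting rule in update_dir_rename_counts() concrete, here is a minimal, self-contained sketch in plain C. It is not git's code: it splits off one path component at a time into a separate buffer instead of relying on dirname_munge()'s in-place '\0' trick (so the root-directory special case needs no extra handling), assumes paths shorter than 256 bytes, and replaces strset_contains() with a linear search over a hypothetical removed[] array. The names dir_pairs_for_rename(), strip_last_component(), dir_was_removed() and struct dir_pair are illustrative only. Like the patch, the walk ignores a differing basename, then stops as soon as the stripped components differ, the old directory is not in the removed set, or either side reaches the toplevel directory.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PATH_LEN 256 /* assumption: all paths are shorter than this */
#define MAX_PAIRS 16

struct dir_pair {
	char old_dir[MAX_PATH_LEN];
	char new_dir[MAX_PATH_LEN];
};

/* Cut the last component off `path` in place and copy it into `last`. */
static void strip_last_component(char *path, char *last)
{
	char *slash = strrchr(path, '/');

	if (slash) {
		strcpy(last, slash + 1);
		*slash = '\0';
	} else {
		strcpy(last, path); /* no slash: the whole path is the component */
		path[0] = '\0';     /* and we are now at the toplevel ("") */
	}
}

/* Hypothetical stand-in for strset_contains(dirs_removed, dir). */
static int dir_was_removed(const char *dir, const char **removed, int nr)
{
	for (int i = 0; i < nr; i++)
		if (!strcmp(dir, removed[i]))
			return 1;
	return 0;
}

/*
 * Fill `out` with the (old_dir, new_dir) pairs that a single rename
 * would contribute to the counts; returns how many pairs were written.
 */
static int dir_pairs_for_rename(const char *oldname, const char *newname,
				const char **removed, int nr_removed,
				struct dir_pair *out, int max_out)
{
	char old_dir[MAX_PATH_LEN], new_dir[MAX_PATH_LEN];
	char old_last[MAX_PATH_LEN], new_last[MAX_PATH_LEN];
	int first_time = 1, n = 0;

	snprintf(old_dir, sizeof(old_dir), "%s", oldname);
	snprintf(new_dir, sizeof(new_dir), "%s", newname);

	while (n < max_out) {
		strip_last_component(old_dir, old_last);
		strip_last_component(new_dir, new_last);

		/* The basename may differ; later stripped components must match. */
		if (!first_time && strcmp(old_last, new_last))
			break;
		/* Only directories that disappeared can have been renamed. */
		if (!dir_was_removed(old_dir, removed, nr_removed))
			break;

		strcpy(out[n].old_dir, old_dir);
		strcpy(out[n].new_dir, new_dir);
		n++;

		/* Stop once either side has reached the toplevel directory. */
		if (!*old_dir || !*new_dir)
			break;
		first_time = 0;
	}
	return n;
}
```

Calling it for "a/b/c/d/e/foo.c" -> "a/b/some/thing/else/e/foo.c" with every old directory marked as removed yields exactly the two pairs from the patch comment, a/b/c/d/e => a/b/some/thing/else/e and a/b/c/d => a/b/some/thing/else; the walk stops at a/b/c because the stripped components "d" and "else" differ. Summing such pairs over all renames gives the old_directory -> {new_directory -> count} map that compute_dir_rename_counts() builds.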

From: Elijah Newren
Date: Sun, 14 Feb 2021 07:58:55 +0000
Subject: [PATCH 02/10] diffcore-rename: add functions for clearing dir_rename_count
To: git@vger.kernel.org

As we adjust the usage of dir_rename_count, we want to have functions
for clearing it out, or partially clearing it out. Add such functions.

Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 19 +++++++++++++++++++
 diffcore.h        |  2 ++
 merge-ort.c       | 12 +++---------
 3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 33cfc5848611..614a8d63012d 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -486,6 +486,25 @@ static void compute_dir_rename_counts(struct strmap *dir_rename_count,
 	}
 }
 
+void partial_clear_dir_rename_count(struct strmap *dir_rename_count)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	strmap_for_each_entry(dir_rename_count, &iter, entry) {
+		struct strintmap *counts = entry->value;
+		strintmap_clear(counts);
+	}
+	strmap_partial_clear(dir_rename_count, 1);
+}
+
+MAYBE_UNUSED
+static void clear_dir_rename_count(struct strmap *dir_rename_count)
+{
+	partial_clear_dir_rename_count(dir_rename_count);
+	strmap_clear(dir_rename_count, 1);
+}
+
 static const char *get_basename(const char *filename)
 {
 	/*
diff --git a/diffcore.h b/diffcore.h
index db55d3853071..c6ba64abd198 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -161,6 +161,8 @@ struct diff_filepair *diff_queue(struct diff_queue_struct *,
 				 struct diff_filespec *);
 void diff_q(struct diff_queue_struct *, struct diff_filepair *);
 
+void partial_clear_dir_rename_count(struct strmap *dir_rename_count);
+
 void diffcore_break(struct repository *, int);
 void diffcore_rename(struct diff_options *);
 void diffcore_rename_extended(struct diff_options *options,
diff --git a/merge-ort.c b/merge-ort.c
index c4467e073b45..467404cc0a35 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -351,17 +351,11 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 
 	/* Free memory used by various renames maps */
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
-		struct hashmap_iter iter;
-		struct strmap_entry *entry;
-
 		strset_func(&renames->dirs_removed[i]);
 
-		strmap_for_each_entry(&renames->dir_rename_count[i],
-				      &iter, entry) {
-			struct strintmap *counts = entry->value;
-			strintmap_clear(counts);
-		}
-		strmap_func(&renames->dir_rename_count[i], 1);
+		partial_clear_dir_rename_count(&renames->dir_rename_count[i]);
+		if (!reinitialize)
+			strmap_clear(&renames->dir_rename_count[i], 1);
 
 		strmap_func(&renames->dir_renames[i], 0);
 	}
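The difference between the two clearing helpers is easier to see with the map-of-maps spelled out. The sketch below is a stand-in, not git's strmap/strintmap API: struct outer_map, struct inner_counts and every function in it are hypothetical, backed by plain dynamic arrays. outer_partial_clear() follows the pattern of partial_clear_dir_rename_count(): clear every inner map, free the heap-allocated values (the "1" passed to strmap_partial_clear()), and keep the outer table allocated for reuse. outer_clear() additionally releases the table, in the spirit of the extra strmap_clear() call.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct strintmap: new_dir -> count. */
struct inner_counts {
	char **new_dirs;
	int *counts;
	size_t nr;
};

/* Hypothetical stand-in for one strmap entry: old_dir -> inner map. */
struct outer_entry {
	char *old_dir;
	struct inner_counts *counts; /* heap-allocated, like the xmalloc'd strintmap */
};

/* Hypothetical stand-in for struct strmap. */
struct outer_map {
	struct outer_entry *entries;
	size_t nr, alloc;
};

static char *dup_str(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Append old_dir with an empty inner map (no duplicate check). */
static struct inner_counts *outer_add(struct outer_map *m, const char *old_dir)
{
	struct inner_counts *counts;
	char *key;

	if (m->nr == m->alloc) {
		size_t new_alloc = m->alloc ? 2 * m->alloc : 4;
		struct outer_entry *grown =
			realloc(m->entries, new_alloc * sizeof(*grown));
		if (!grown)
			return NULL;
		m->entries = grown;
		m->alloc = new_alloc;
	}
	counts = calloc(1, sizeof(*counts));
	key = dup_str(old_dir);
	if (!counts || !key) {
		free(counts);
		free(key);
		return NULL;
	}
	m->entries[m->nr].old_dir = key;
	m->entries[m->nr].counts = counts;
	m->nr++;
	return counts;
}

/* Append new_dir with an initial count to an inner map. */
static int inner_add(struct inner_counts *c, const char *new_dir, int count)
{
	char **dirs = realloc(c->new_dirs, (c->nr + 1) * sizeof(*dirs));
	int *counts;

	if (!dirs)
		return -1;
	c->new_dirs = dirs;
	counts = realloc(c->counts, (c->nr + 1) * sizeof(*counts));
	if (!counts)
		return -1;
	c->counts = counts;
	c->new_dirs[c->nr] = dup_str(new_dir);
	c->counts[c->nr++] = count;
	return 0;
}

static void inner_clear(struct inner_counts *c)
{
	for (size_t i = 0; i < c->nr; i++)
		free(c->new_dirs[i]);
	free(c->new_dirs);
	free(c->counts);
	c->new_dirs = NULL;
	c->counts = NULL;
	c->nr = 0;
}

/* Empty every inner map and free the values, but keep the outer table. */
static void outer_partial_clear(struct outer_map *m)
{
	for (size_t i = 0; i < m->nr; i++) {
		inner_clear(m->entries[i].counts);
		free(m->entries[i].counts); /* the "free values" part */
		free(m->entries[i].old_dir);
	}
	m->nr = 0; /* m->entries and m->alloc are deliberately retained */
}

/* Full clear: partial clear, then release the table itself. */
static void outer_clear(struct outer_map *m)
{
	outer_partial_clear(m);
	free(m->entries);
	m->entries = NULL;
	m->alloc = 0;
}
```

Keeping the table is what makes the partial variant useful: in the merge-ort hunk above, clear_or_reinit_internal_opts() always calls partial_clear_dir_rename_count(), and only calls strmap_clear() when it is not reinitializing.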

From: Elijah Newren
Date: Sun, 14 Feb 2021 07:58:56 +0000
Subject: [PATCH 03/10] diffcore-rename: move dir_rename_counts into a dir_rename_info struct
To: git@vger.kernel.org

This is a purely cosmetic change for now, but we will be adding
additional information to the struct and changing where and how it is
set up and used in subsequent patches.

Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 39 ++++++++++++++++++++++++++-------------
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 614a8d63012d..7759c9a3a2ed 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -367,6 +367,11 @@ static int find_exact_renames(struct diff_options *options)
 	return renames;
 }
 
+struct dir_rename_info {
+	struct strmap *dir_rename_count;
+	unsigned setup;
+};
+
 static void dirname_munge(char *filename)
 {
 	char *slash = strrchr(filename, '/');
@@ -375,7 +380,7 @@ static void dirname_munge(char *filename)
 	*slash = '\0';
 }
 
-static void increment_count(struct strmap *dir_rename_count,
+static void increment_count(struct dir_rename_info *info,
 			    char *old_dir,
 			    char *new_dir)
 {
@@ -383,20 +388,20 @@ static void increment_count(struct strmap *dir_rename_count,
 	struct strmap_entry *e;
 
 	/* Get the {new_dirs -> counts} mapping using old_dir */
-	e = strmap_get_entry(dir_rename_count, old_dir);
+	e = strmap_get_entry(info->dir_rename_count, old_dir);
 	if (e) {
 		counts = e->value;
 	} else {
 		counts = xmalloc(sizeof(*counts));
 		strintmap_init_with_options(counts, 0, NULL, 1);
-		strmap_put(dir_rename_count, old_dir, counts);
+		strmap_put(info->dir_rename_count, old_dir, counts);
 	}
 
 	/* Increment the count for new_dir */
 	strintmap_incr(counts, new_dir, 1);
 }
 
-static void update_dir_rename_counts(struct strmap *dir_rename_count,
+static void update_dir_rename_counts(struct dir_rename_info *info,
 				     struct strset *dirs_removed,
 				     const char *oldname,
 				     const char *newname)
@@ -450,7 +455,7 @@ static void update_dir_rename_counts(struct strmap *dir_rename_count,
 		}
 
 		if (strset_contains(dirs_removed, old_dir))
-			increment_count(dir_rename_count, old_dir, new_dir);
+			increment_count(info, old_dir, new_dir);
 		else
 			break;
 
@@ -466,12 +471,15 @@ static void update_dir_rename_counts(struct strmap *dir_rename_count,
 	free(new_dir);
 }
 
-static void compute_dir_rename_counts(struct strmap *dir_rename_count,
-				      struct strset *dirs_removed)
+static void compute_dir_rename_counts(struct dir_rename_info *info,
+				      struct strset *dirs_removed,
+				      struct strmap *dir_rename_count)
 {
 	int i;
 
-	/* Set up dir_rename_count */
+	info->setup = 1;
+	info->dir_rename_count = dir_rename_count;
+
 	for (i = 0; i < rename_dst_nr; ++i) {
 		/*
 		 * Make dir_rename_count contain a map of a map:
@@ -480,7 +488,7 @@ static void compute_dir_rename_counts(struct strmap *dir_rename_count,
 		 * the old filename and the new filename and count how many
 		 * times that pairing occurs.
 		 */
-		update_dir_rename_counts(dir_rename_count, dirs_removed,
+		update_dir_rename_counts(info, dirs_removed,
 					 rename_dst[i].p->one->path,
 					 rename_dst[i].p->two->path);
 	}
@@ -499,10 +507,13 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count)
 }
 
 MAYBE_UNUSED
-static void clear_dir_rename_count(struct strmap *dir_rename_count)
+static void cleanup_dir_rename_info(struct dir_rename_info *info)
 {
-	partial_clear_dir_rename_count(dir_rename_count);
-	strmap_clear(dir_rename_count, 1);
+	if (!info->setup)
+		return;
+
+	partial_clear_dir_rename_count(info->dir_rename_count);
+	strmap_clear(info->dir_rename_count, 1);
 }
 
 static const char *get_basename(const char *filename)
@@ -791,8 +802,10 @@ void diffcore_rename_extended(struct diff_options *options,
 	int num_destinations, dst_cnt;
 	int num_sources, want_copies;
 	struct progress *progress = NULL;
+	struct dir_rename_info info;
 
 	trace2_region_enter("diff", "setup", options->repo);
+	info.setup = 0;
 	assert(!dir_rename_count || strmap_empty(dir_rename_count));
 	want_copies = (detect_rename == DIFF_DETECT_COPY);
 	if (!minimum_score)
@@ -985,7 +998,7 @@ void diffcore_rename_extended(struct diff_options *options,
 	/*
 	 * Now that renames have been computed, compute dir_rename_count */
 	if (dirs_removed && dir_rename_count)
-		compute_dir_rename_counts(dir_rename_count, dirs_removed);
+		compute_dir_rename_counts(&info, dirs_removed, dir_rename_count);
 
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst. The original list is still in *q.
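The new `unsigned setup` field acts as a guard: cleanup_dir_rename_info() returns early unless compute_dir_rename_counts() actually populated the struct, which is why diffcore_rename_extended() sets info.setup = 0 up front. (In this patch nothing calls cleanup_dir_rename_info() yet; it is still marked MAYBE_UNUSED.) Below is a minimal sketch of that guard pattern under simplifying assumptions: struct dir_rename_info_sketch, compute_counts() and cleanup_info() are hypothetical, and a one-element int array stands in for the strmap.

```c
/* Hypothetical, trimmed-down mirror of struct dir_rename_info. */
struct dir_rename_info_sketch {
	int *dir_rename_count; /* stand-in for struct strmap * */
	unsigned setup;        /* nonzero only once dir_rename_count is valid */
};

/* Mirrors compute_dir_rename_counts(): record the map, then fill it. */
static void compute_counts(struct dir_rename_info_sketch *info, int *storage)
{
	info->setup = 1;
	info->dir_rename_count = storage;
	info->dir_rename_count[0] += 1; /* pretend one rename was counted */
}

/*
 * Mirrors the early return in cleanup_dir_rename_info(): it never
 * dereferences dir_rename_count unless setup was recorded, so it is
 * safe to call even when compute_counts() never ran.
 * Returns 1 if anything was cleared, 0 otherwise.
 */
static int cleanup_info(struct dir_rename_info_sketch *info)
{
	if (!info->setup)
		return 0;
	info->dir_rename_count[0] = 0;
	return 1;
}
```

Because the guard reads only `setup`, dir_rename_count may be left uninitialized on paths that never compute it, as long as `setup` itself was cleared first.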
Date: Sun, 14 Feb 2021 07:58:57 +0000 Subject: [PATCH 04/10] diffcore-rename: extend cleanup_dir_rename_info() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren When diffcore_rename_extended() is passed a NULL dir_rename_count, we will still want to create a temporary one for use by find_basename_matches(), but
have it fully deallocated before diffcore_rename_extended() returns. When diffcore_rename_extended() is instead passed a dir_rename_count, we want to fill that strmap with appropriate values and return it. For our interim purposes, however, we may also add entries corresponding to directories that cannot have been renamed because they still exist on both sides. Extend cleanup_dir_rename_info() to handle these two different cases, cleaning up the relevant bits of information for each case. Signed-off-by: Elijah Newren --- diffcore-rename.c | 38 +++++++++++++++++++++++++++++++++++--- 1 file changed, 35 insertions(+), 3 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 7759c9a3a2ed..aa21d4e7175c 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -507,13 +507,45 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count) } MAYBE_UNUSED -static void cleanup_dir_rename_info(struct dir_rename_info *info) +static void cleanup_dir_rename_info(struct dir_rename_info *info, + struct strset *dirs_removed, + int keep_dir_rename_count) { + struct hashmap_iter iter; + struct strmap_entry *entry; + if (!info->setup) return; - partial_clear_dir_rename_count(info->dir_rename_count); - strmap_clear(info->dir_rename_count, 1); + if (!keep_dir_rename_count) { + partial_clear_dir_rename_count(info->dir_rename_count); + strmap_clear(info->dir_rename_count, 1); + FREE_AND_NULL(info->dir_rename_count); + } else { + /* + * Although dir_rename_count was passed in + * diffcore_rename_extended() and we want to keep it around and + * return it to that caller, we first want to remove any data + * associated with directories that weren't renamed.
+ */ + struct string_list to_remove = STRING_LIST_INIT_NODUP; + int i; + + strmap_for_each_entry(info->dir_rename_count, &iter, entry) { + const char *source_dir = entry->key; + struct strintmap *counts = entry->value; + + if (!strset_contains(dirs_removed, source_dir)) { + string_list_append(&to_remove, source_dir); + strintmap_clear(counts); + continue; + } + } + for (i=0; i<to_remove.nr; ++i) + strmap_remove(info->dir_rename_count, + to_remove.items[i].string, 1); + string_list_clear(&to_remove, 0); + } } static const char *get_basename(const char *filename) From patchwork Sun Feb 14 07:58:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12087135
Date: Sun, 14 Feb 2021 07:58:58 +0000 Subject: [PATCH 05/10] diffcore-rename: compute dir_rename_counts in stages Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren We want to first compute dir_rename_counts based only on exact renames, as that can provide useful information to find_basename_matches(). That will give us an incomplete result, which we can then augment later as basename and inexact rename matches are found. Signed-off-by: Elijah Newren --- diffcore-rename.c | 76 ++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 62 insertions(+), 14 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index aa21d4e7175c..489e9cb0871e 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -411,6 +411,28 @@ static void update_dir_rename_counts(struct dir_rename_info *info, char new_dir_first_char = new_dir[0]; int first_time_in_loop = 1; + if (!info->setup) + /* + * info->setup is 0 here in two cases: (1) all auxiliary + * vars (like dirs_removed) were NULL so + * initialize_dir_rename_info() returned early, or (2) + * either break detection or copy detection is active so + * that we never called initialize_dir_rename_info(). In + * the former case, we don't have enough info to know if + * directories were renamed (because dirs_removed lets us + * know about a necessary prerequisite, namely if they were + * removed), and in the latter, we don't care about + * directory renames or find_basename_matches. + * + * This matters because both basename and inexact matching + * will also call update_dir_rename_counts(). In either of + * the above two cases info->dir_rename_count will not + * have been properly initialized, which prevents us from + * updating it, but in these two cases we don't care about + * dir_rename_counts anyway, so we can just exit early.
+ */ + return; + while (1) { dirname_munge(old_dir); dirname_munge(new_dir); @@ -471,14 +493,22 @@ static void update_dir_rename_counts(struct dir_rename_info *info, free(new_dir); } -static void compute_dir_rename_counts(struct dir_rename_info *info, - struct strset *dirs_removed, - struct strmap *dir_rename_count) +static void initialize_dir_rename_info(struct dir_rename_info *info, + struct strset *dirs_removed, + struct strmap *dir_rename_count) { int i; + info->setup = 0; + if (!dirs_removed) + return; info->setup = 1; + info->dir_rename_count = dir_rename_count; + if (!info->dir_rename_count) { + info->dir_rename_count = xmalloc(sizeof(*dir_rename_count)); + strmap_init(info->dir_rename_count); + } for (i = 0; i < rename_dst_nr; ++i) { /* @@ -506,7 +536,6 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count) strmap_partial_clear(dir_rename_count, 1); } -MAYBE_UNUSED static void cleanup_dir_rename_info(struct dir_rename_info *info, struct strset *dirs_removed, int keep_dir_rename_count) @@ -561,7 +590,9 @@ static const char *get_basename(const char *filename) } static int find_basename_matches(struct diff_options *options, - int minimum_score) + int minimum_score, + struct dir_rename_info *info, + struct strset *dirs_removed) { /* * When I checked in early 2020, over 76% of file renames in linux @@ -669,6 +700,8 @@ static int find_basename_matches(struct diff_options *options, continue; record_rename_pair(dst_index, src_index, score); renames++; + update_dir_rename_counts(info, dirs_removed, + one->path, two->path); /* * Found a rename so don't need text anymore; if we @@ -752,7 +785,12 @@ static int too_many_rename_candidates(int num_destinations, int num_sources, return 1; } -static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, int copies) +static int find_renames(struct diff_score *mx, + int dst_cnt, + int minimum_score, + int copies, + struct dir_rename_info *info, + struct strset *dirs_removed) { int count = 0, 
i; @@ -769,6 +807,9 @@ static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, i continue; record_rename_pair(mx[i].dst, mx[i].src, mx[i].score); count++; + update_dir_rename_counts(info, dirs_removed, + rename_src[mx[i].src].p->one->path, + rename_dst[mx[i].dst].p->two->path); } return count; } @@ -840,6 +881,8 @@ void diffcore_rename_extended(struct diff_options *options, info.setup = 0; assert(!dir_rename_count || strmap_empty(dir_rename_count)); want_copies = (detect_rename == DIFF_DETECT_COPY); + if (dirs_removed && (break_idx || want_copies)) + BUG("dirs_removed incompatible with break/copy detection"); if (!minimum_score) minimum_score = DEFAULT_RENAME_SCORE; @@ -931,10 +974,17 @@ void diffcore_rename_extended(struct diff_options *options, remove_unneeded_paths_from_src(want_copies); trace2_region_leave("diff", "cull after exact", options->repo); + /* Preparation for basename-driven matching. */ + trace2_region_enter("diff", "dir rename setup", options->repo); + initialize_dir_rename_info(&info, + dirs_removed, dir_rename_count); + trace2_region_leave("diff", "dir rename setup", options->repo); + /* Utilize file basenames to quickly find renames. 
*/ trace2_region_enter("diff", "basename matches", options->repo); rename_count += find_basename_matches(options, - min_basename_score); + min_basename_score, + &info, dirs_removed); trace2_region_leave("diff", "basename matches", options->repo); /* @@ -1020,18 +1070,15 @@ void diffcore_rename_extended(struct diff_options *options, /* cost matrix sorted by most to least similar pair */ STABLE_QSORT(mx, dst_cnt * NUM_CANDIDATE_PER_DST, score_compare); - rename_count += find_renames(mx, dst_cnt, minimum_score, 0); + rename_count += find_renames(mx, dst_cnt, minimum_score, 0, + &info, dirs_removed); if (want_copies) - rename_count += find_renames(mx, dst_cnt, minimum_score, 1); + rename_count += find_renames(mx, dst_cnt, minimum_score, 1, + &info, dirs_removed); free(mx); trace2_region_leave("diff", "inexact renames", options->repo); cleanup: - /* - * Now that renames have been computed, compute dir_rename_count */ - if (dirs_removed && dir_rename_count) - compute_dir_rename_counts(&info, dirs_removed, dir_rename_count); - /* At this point, we have found some renames and copies and they * are recorded in rename_dst. The original list is still in *q. 
*/ @@ -1103,6 +1150,7 @@ void diffcore_rename_extended(struct diff_options *options, if (rename_dst[i].filespec_to_free) free_filespec(rename_dst[i].filespec_to_free); + cleanup_dir_rename_info(&info, dirs_removed, dir_rename_count != NULL); FREE_AND_NULL(rename_dst); rename_dst_nr = rename_dst_alloc = 0; FREE_AND_NULL(rename_src); From patchwork Sun Feb 14 07:58:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12087137
Date: Sun, 14 Feb 2021 07:58:59 +0000 Subject: [PATCH 06/10] diffcore-rename: add a mapping of destination names to their indices Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Elijah Newren Precedence: bulk List-ID:
X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Add an idx_map member to struct dir_rename_info, which tracks a mapping of the full filename to the index within rename_dst where that filename is found. We will later use this for quickly finding an array entry in rename_dst given the pathname. Signed-off-by: Elijah Newren --- diffcore-rename.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 489e9cb0871e..db569e4a0b0a 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -368,6 +368,7 @@ static int find_exact_renames(struct diff_options *options) } struct dir_rename_info { + struct strintmap idx_map; struct strmap *dir_rename_count; unsigned setup; }; @@ -509,10 +510,26 @@ static void initialize_dir_rename_info(struct dir_rename_info *info, info->dir_rename_count = xmalloc(sizeof(*dir_rename_count)); strmap_init(info->dir_rename_count); } + strintmap_init_with_options(&info->idx_map, -1, NULL, 0); + /* + * Loop setting up both info->idx_map, and doing setup of + * info->dir_rename_count. + */ for (i = 0; i < rename_dst_nr; ++i) { /* - * Make dir_rename_count contain a map of a map: + * For non-renamed files, make idx_map contain mapping of + * filename -> index (index within rename_dst, that is) + */ + if (!rename_dst[i].is_rename) { + char *filename = rename_dst[i].p->two->path; + strintmap_set(&info->idx_map, filename, i); + continue; + } + + /* + * For everything else (i.e. 
renamed files), make + * dir_rename_count contain a map of a map: * old_directory -> {new_directory -> count} * In other words, for every pair look at the directories for * the old filename and the new filename and count how many @@ -546,6 +563,9 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info, if (!info->setup) return; + /* idx_map */ + strintmap_clear(&info->idx_map); + if (!keep_dir_rename_count) { partial_clear_dir_rename_count(info->dir_rename_count); strmap_clear(info->dir_rename_count, 1); From patchwork Sun Feb 14 07:59:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12087141
Date: Sun, 14 Feb 2021 07:59:00 +0000 Subject: [PATCH 07/10] diffcore-rename: add a dir_rename_guess field to dir_rename_info Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren dir_rename_counts has a mapping of a mapping, in particular, it has old_dir => { new_dir => count } We want a simple mapping of old_dir => new_dir based on which new_dir had the highest count for a given old_dir. Introduce dir_rename_guess for this purpose. Signed-off-by: Elijah Newren --- diffcore-rename.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/diffcore-rename.c b/diffcore-rename.c index db569e4a0b0a..d24f104aa81c 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -369,6 +369,7 @@ static int find_exact_renames(struct diff_options *options) struct dir_rename_info { struct strintmap idx_map; + struct strmap dir_rename_guess; struct strmap *dir_rename_count; unsigned setup; }; @@ -381,6 +382,24 @@ static void dirname_munge(char *filename) *slash = '\0'; } +static const char *get_highest_rename_path(struct strintmap *counts) +{ + int highest_count = 0; + const char *highest_destination_dir = NULL; + struct hashmap_iter iter; + struct strmap_entry *entry; + + strintmap_for_each_entry(counts, &iter, entry) { + const char *destination_dir = entry->key; + intptr_t count = (intptr_t)entry->value; + if (count > highest_count) { + highest_count = count; + highest_destination_dir = destination_dir; + } + } + return highest_destination_dir; +} + static void increment_count(struct dir_rename_info *info, char *old_dir, char *new_dir) @@ -498,6 +517,8 @@ static void initialize_dir_rename_info(struct dir_rename_info *info, struct strset *dirs_removed, struct strmap *dir_rename_count) { + struct hashmap_iter iter; + struct strmap_entry *entry; int i; info->setup = 0; @@ -511,6 +532,7 @@ static void 
initialize_dir_rename_info(struct dir_rename_info *info, strmap_init(info->dir_rename_count); } strintmap_init_with_options(&info->idx_map, -1, NULL, 0); + strmap_init_with_options(&info->dir_rename_guess, NULL, 0); /* * Loop setting up both info->idx_map, and doing setup of @@ -539,6 +561,23 @@ static void initialize_dir_rename_info(struct dir_rename_info *info, rename_dst[i].p->one->path, rename_dst[i].p->two->path); } + + /* + * Now we collapse + * dir_rename_count: old_directory -> {new_directory -> count} + * down to + * dir_rename_guess: old_directory -> best_new_directory + * where best_new_directory is the one with the highest count. + */ + strmap_for_each_entry(info->dir_rename_count, &iter, entry) { + /* entry->key is source_dir */ + struct strintmap *counts = entry->value; + char *best_newdir; + + best_newdir = xstrdup(get_highest_rename_path(counts)); + strmap_put(&info->dir_rename_guess, entry->key, + best_newdir); + } } void partial_clear_dir_rename_count(struct strmap *dir_rename_count) @@ -566,6 +605,9 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info, /* idx_map */ strintmap_clear(&info->idx_map); + /* dir_rename_guess */ + strmap_clear(&info->dir_rename_guess, 1); + if (!keep_dir_rename_count) { partial_clear_dir_rename_count(info->dir_rename_count); strmap_clear(info->dir_rename_count, 1); From patchwork Sun Feb 14 07:59:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12087139
Date: Sun, 14 Feb 2021 07:59:01 +0000 Subject: [PATCH 08/10] diffcore-rename: add a new idx_possible_rename function Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren find_basename_matches() is great when both the remaining set of possible rename sources and the remaining set of possible rename destinations have exactly one file each with a given basename. It allows us to match up files that have been moved to different directories without changing filenames. When basenames are not unique, though, we want to be able to guess which directories the source files have been moved to. Since this is the job of directory rename detection, we employ it. However, since it is a directory rename detection idea, we also limit it to cases where we know there could have been a directory rename, i.e. where the source directory has been removed. This has to be signalled by dirs_removed being non-NULL and containing an entry for the relevant directory.
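To make the guessing step concrete, here is a minimal standalone C sketch of the lookup this enables. It is only an illustration under stated assumptions, not git's code: the names struct dir_guess, sketch_dirname(), sketch_basename() and sketch_possible_rename() are hypothetical, and plain arrays with linear search stand in for the dir_rename_guess strmap and the idx_map strintmap. Given a source path, it looks up the guessed new directory for the path's old directory, rebuilds the path under that directory with the same "%s/%s" construction the patch uses, and returns that path's index among the candidate destinations, or -1.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one dir_rename_guess entry: old_dir -> new_dir. */
struct dir_guess {
	const char *old_dir;
	const char *new_dir;
};

/* Return a malloc'd copy of everything before the last '/', or "" if none. */
static char *sketch_dirname(const char *path)
{
	const char *slash = strrchr(path, '/');
	size_t len = slash ? (size_t)(slash - path) : 0;
	char *dir = malloc(len + 1);

	if (!dir)
		return NULL;
	memcpy(dir, path, len);
	dir[len] = '\0';
	return dir;
}

/* Return the part of 'path' after the last '/'. */
static const char *sketch_basename(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/*
 * Return the index in dst[] of the path 'src_path' would have if its
 * directory were renamed as guesses[] says, or -1 if there is no guess
 * for that directory or no such destination exists.
 */
static int sketch_possible_rename(const char *src_path,
				  const struct dir_guess *guesses, size_t nr_guesses,
				  const char *const *dst, size_t nr_dst)
{
	const char *new_dir = NULL;
	const char *base;
	char *old_dir, *new_path;
	size_t i, len;
	int idx = -1;

	assert(src_path != NULL);
	base = sketch_basename(src_path);
	old_dir = sketch_dirname(src_path);
	if (!old_dir)
		return -1;

	/* Step 1: find the guessed destination directory, if any. */
	for (i = 0; i < nr_guesses; i++) {
		if (!strcmp(guesses[i].old_dir, old_dir)) {
			new_dir = guesses[i].new_dir;
			break;
		}
	}
	free(old_dir);
	if (!new_dir)
		return -1;

	/* Step 2: build "<new_dir>/<basename>" as the candidate path. */
	len = strlen(new_dir) + 1 + strlen(base) + 1;
	new_path = malloc(len);
	if (!new_path)
		return -1;
	snprintf(new_path, len, "%s/%s", new_dir, base);

	/* Step 3: look the candidate up among the remaining destinations. */
	for (i = 0; i < nr_dst; i++) {
		if (!strcmp(dst[i], new_path)) {
			idx = (int)i;
			break;
		}
	}
	free(new_path);
	return idx;
}
```

For example, with the single guess "src/old" -> "src/new", a source "src/old/Makefile" maps to whichever destination is "src/new/Makefile", even if many other Makefiles exist elsewhere. In the patch both lookups are hash-map hits rather than linear scans, and idx_map holds only destinations that exact rename detection has not already paired; the sketch ignores both details.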
Since merge-ort.c is currently the only caller that passes a non-NULL dirs_removed, this optimization is only effective for merge-ort right now. In the future, this condition could be reconsidered or we could modify other callers to pass the necessary strset. With that background, we can now describe the new function. Add an idx_possible_rename() function which combines the recently added dir_rename_guess and idx_map fields to provide the index within rename_dst of a potential match for a given file. Future commits will add checks after calling this function to compare the resulting 'likely rename' candidates to see if the two files meet the elevated min_basename_score threshold for marking them as actual renames. Signed-off-by: Elijah Newren --- diffcore-rename.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 81 insertions(+) diff --git a/diffcore-rename.c b/diffcore-rename.c index d24f104aa81c..1e4a56adde2c 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -374,6 +374,12 @@ struct dir_rename_info { unsigned setup; }; +static char *get_dirname(const char *filename) +{ + char *slash = strrchr(filename, '/'); + return slash ? xstrndup(filename, slash-filename) : xstrdup(""); +} + static void dirname_munge(char *filename) { char *slash = strrchr(filename, '/'); @@ -651,6 +657,81 @@ static const char *get_basename(const char *filename) return base ? base + 1 : filename; } +MAYBE_UNUSED +static int idx_possible_rename(char *filename, struct dir_rename_info *info) +{ + /* + * Our comparison of files with the same basename (see + * find_basename_matches() below) is only helpful when, after exact + * rename detection, we have exactly one file with a given basename + * among the rename sources and also exactly one file with + * that basename among the rename destinations. When we have + * multiple files with the same basename in either set, we do not + * know which to compare against. However, there are some
+	 * However, there are some filenames that occur in large numbers
+	 * (particularly build-related filenames such as 'Makefile',
+	 * '.gitignore', or 'build.gradle' that potentially exist within every
+	 * single subdirectory), and for performance we want to be able to
+	 * quickly find renames for these files too.
+	 *
+	 * The reason basename comparisons are a useful heuristic is that it
+	 * is common for people to move files across directories while keeping
+	 * their filename the same. If we had a way of determining or even
+	 * making a good educated guess about which directory these non-unique
+	 * basename files had been moved to, we could check it.
+	 * Luckily...
+	 *
+	 * When an entire directory is in fact renamed, we have two factors
+	 * helping us out:
+	 *   (a) the original directory disappeared, giving us a hint
+	 *       about when we can apply an extra heuristic.
+	 *   (b) we often have several files within that directory and
+	 *       subdirectories that are renamed without changes.
+	 * So, rules for a heuristic:
+	 *   (0) If the basename matches are non-unique (the condition under
+	 *       which this function is called) AND
+	 *   (1) the directory in which the file was found has disappeared
+	 *       (i.e. dirs_removed is non-NULL and has a relevant entry) THEN
+	 *   (2) use exact renames of files within the directory to determine
+	 *       where the directory is likely to have been renamed to. IF
+	 *       there is at least one exact rename from within that
+	 *       directory, we can proceed.
+	 *   (3) If there are multiple places the directory could have been
+	 *       renamed to based on exact renames, ignore all but one of them.
+	 *       Just use the destination with the most renames going to it.
+	 *   (4) Check if applying that directory rename to the original file
+	 *       would result in a destination filename that is in the
+	 *       potential rename set. If so, return the index of the
+	 *       destination file (the index within rename_dst).
+	 *   (5) Compare the original file and returned destination for
+	 *       similarity, and if they are sufficiently similar, record the
+	 *       rename.
+	 *
+	 * This function, idx_possible_rename(), is only responsible for (4).
+	 * The conditions/steps in (1)-(3) are handled via setting up
+	 * dir_rename_count and dir_rename_guess in
+	 * initialize_dir_rename_info(). Steps (0) and (5) are handled by
+	 * the caller of this function.
+	 */
+	char *old_dir, *new_dir, *new_path;
+	int idx;
+
+	if (!info->setup)
+		return -1;
+
+	old_dir = get_dirname(filename);
+	new_dir = strmap_get(&info->dir_rename_guess, old_dir);
+	free(old_dir);
+	if (!new_dir)
+		return -1;
+
+	new_path = xstrfmt("%s/%s", new_dir, get_basename(filename));
+
+	idx = strintmap_get(&info->idx_map, new_path);
+	free(new_path);
+	return idx;
+}
+
 static int find_basename_matches(struct diff_options *options,
 				 int minimum_score,
 				 struct dir_rename_info *info,
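Steps (3) and (4) of the heuristic described in the comment above can be sketched as standalone C. This is a simplified illustration under stated assumptions, not git's code: plain arrays with linear search stand in for git's strmap/strintmap, and every name below (struct dir_count, struct dir_guess, sketch_best_guess(), sketch_idx_possible_rename(), and the helpers) is hypothetical. Ties in step (3) go to the first candidate here; that is a choice made for the sketch, not documented git behavior. The sketch also avoids producing a leading slash when the guessed directory is the top level ("").

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for git's maps: small arrays, linear search. */
struct dir_count {          /* one candidate target for a removed directory */
	const char *new_dir;
	int count;          /* number of exact renames into new_dir */
};

struct dir_guess {          /* one entry of a dir_rename_guess-like map */
	const char *old_dir;
	const char *new_dir;
};

/* Step (3): pick the target with the most exact renames (first wins ties). */
static const char *sketch_best_guess(const struct dir_count *counts, int nr)
{
	const char *best = NULL;
	int best_count = 0;

	for (int i = 0; i < nr; i++) {
		if (counts[i].count > best_count) {
			best = counts[i].new_dir;
			best_count = counts[i].count;
		}
	}
	return best;
}

/* Like get_dirname() in the patch: copy of everything before the last '/'. */
static char *sketch_get_dirname(const char *filename)
{
	const char *slash = strrchr(filename, '/');
	size_t len = slash ? (size_t)(slash - filename) : 0;
	char *dir = malloc(len + 1);

	if (!dir)
		abort();
	memcpy(dir, filename, len);
	dir[len] = '\0';
	return dir;
}

static const char *sketch_get_basename(const char *filename)
{
	const char *slash = strrchr(filename, '/');
	return slash ? slash + 1 : filename;
}

/*
 * Step (4): map filename's directory through the guesses, re-attach the
 * basename, and return the index of that path in dst_paths (or -1).
 */
static int sketch_idx_possible_rename(const char *filename,
				      const struct dir_guess *guesses, int nr_guesses,
				      const char *const *dst_paths, int nr_dst)
{
	const char *new_dir = NULL;
	char new_path[4096];
	char *old_dir;

	assert(filename != NULL);
	old_dir = sketch_get_dirname(filename);
	for (int i = 0; i < nr_guesses; i++) {
		if (!strcmp(guesses[i].old_dir, old_dir)) {
			new_dir = guesses[i].new_dir;
			break;
		}
	}
	free(old_dir);
	if (!new_dir)
		return -1;

	/* A top-level target directory ("") gets no leading slash. */
	if (*new_dir)
		snprintf(new_path, sizeof(new_path), "%s/%s", new_dir,
			 sketch_get_basename(filename));
	else
		snprintf(new_path, sizeof(new_path), "%s",
			 sketch_get_basename(filename));

	for (int i = 0; i < nr_dst; i++)
		if (!strcmp(dst_paths[i], new_path))
			return i;
	return -1;
}
```

As in the patch, the lookup only proposes a candidate index; the caller still has to compare the two files' contents against min_basename_score before recording a rename.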
Date: Sun, 14 Feb 2021 07:59:02 +0000
Subject: [PATCH 09/10] diffcore-rename: limit dir_rename_counts computation to relevant dirs
To: git@vger.kernel.org
From: Elijah Newren

We are using info->dir_rename_count to count the number of other directories that files within a directory moved to. We only need this information for directories that disappeared, though, so we can return early from update_dir_rename_counts() for other paths. While dirs_removed provides the relevant information for us right now, we introduce a new info->relevant_source_dirs field because future optimizations will want to change how things are called somewhat.

Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 1e4a56adde2c..5de4497e04fa 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -371,6 +371,7 @@ struct dir_rename_info {
 	struct strintmap idx_map;
 	struct strmap dir_rename_guess;
 	struct strmap *dir_rename_count;
+	struct strset *relevant_source_dirs;
 	unsigned setup;
 };
 
@@ -460,7 +461,13 @@ static void update_dir_rename_counts(struct dir_rename_info *info,
 		return;
 
 	while (1) {
+		/* Get old_dir, skip if its directory isn't relevant.
+		 */
 		dirname_munge(old_dir);
+		if (info->relevant_source_dirs &&
+		    !strset_contains(info->relevant_source_dirs, old_dir))
+			break;
+
+		/* Get new_dir */
 		dirname_munge(new_dir);
 
 		/*
@@ -540,6 +547,9 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
 	strintmap_init_with_options(&info->idx_map, -1, NULL, 0);
 	strmap_init_with_options(&info->dir_rename_guess, NULL, 0);
 
+	/* Setup info->relevant_source_dirs */
+	info->relevant_source_dirs = dirs_removed;
+
 	/*
 	 * Loop setting up both info->idx_map, and doing setup of
 	 * info->dir_rename_count.
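The effect of the relevant_source_dirs check on update_dir_rename_counts() can be illustrated with a simplified, self-contained C sketch. It is not git's implementation: fixed-size buffers and a small array replace git's in-place munging and strset/strmap, and all names (struct count_table, is_relevant(), sketch_update_dir_rename_counts(), and so on) are hypothetical. The sketch assumes a walk-up rule in which renaming a/b/c/d/e/foo.c to a/b/some/thing/else/e/foo.c credits both a/b/c/d/e to a/b/some/thing/else/e and a/b/c/d to a/b/some/thing/else, stopping once the stripped components differ. As in the patch, a NULL relevant set means every directory counts, and the walk stops at the first irrelevant source directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_PATH_LEN 256
#define SKETCH_MAX_COUNTS 32

/* Hypothetical flat stand-in for git's dir_rename_count map of maps. */
struct rename_count {
	char old_dir[SKETCH_PATH_LEN];
	char new_dir[SKETCH_PATH_LEN];
	int count;
};

struct count_table {
	struct rename_count entries[SKETCH_MAX_COUNTS];
	int nr;
};

/* relevant is NULL-terminated; a NULL list means every directory counts. */
static int is_relevant(const char *const *relevant, const char *dir)
{
	if (!relevant)
		return 1;
	for (; *relevant; relevant++)
		if (!strcmp(*relevant, dir))
			return 1;
	return 0;
}

static struct rename_count *find_count(struct count_table *t,
				       const char *old_dir, const char *new_dir)
{
	for (int i = 0; i < t->nr; i++)
		if (!strcmp(t->entries[i].old_dir, old_dir) &&
		    !strcmp(t->entries[i].new_dir, new_dir))
			return &t->entries[i];
	return NULL;
}

static void increment_count(struct count_table *t, const char *old_dir,
			    const char *new_dir)
{
	struct rename_count *e = find_count(t, old_dir, new_dir);

	if (!e) {
		assert(t->nr < SKETCH_MAX_COUNTS);
		e = &t->entries[t->nr++];
		snprintf(e->old_dir, sizeof(e->old_dir), "%s", old_dir);
		snprintf(e->new_dir, sizeof(e->new_dir), "%s", new_dir);
		e->count = 0;
	}
	e->count++;
}

/* Drop the last component of path in place, copying it into last. */
static void strip_last(char *path, char *last)
{
	char *slash = strrchr(path, '/');

	snprintf(last, SKETCH_PATH_LEN, "%s", slash ? slash + 1 : path);
	if (slash)
		*slash = '\0';
	else
		path[0] = '\0';
}

static void sketch_update_dir_rename_counts(struct count_table *t,
					    const char *const *relevant,
					    const char *oldname,
					    const char *newname)
{
	char old_dir[SKETCH_PATH_LEN], new_dir[SKETCH_PATH_LEN];
	char old_last[SKETCH_PATH_LEN], new_last[SKETCH_PATH_LEN];
	int first = 1;

	snprintf(old_dir, sizeof(old_dir), "%s", oldname);
	snprintf(new_dir, sizeof(new_dir), "%s", newname);
	while (1) {
		strip_last(old_dir, old_last);
		/* The check this patch adds: stop at the first irrelevant dir. */
		if (!is_relevant(relevant, old_dir))
			break;
		strip_last(new_dir, new_last);
		/* Beyond the basename, keep walking only while components match. */
		if (!first && strcmp(old_last, new_last))
			break;
		increment_count(t, old_dir, new_dir);
		if (!*old_dir || !*new_dir)
			break;
		first = 0;
	}
}
```

With a relevant set containing only a/b/c/d, the rename above records nothing, because the very first source directory (a/b/c/d/e) is not in the set and the loop stops there.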
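PATCH 10/10 below replaces the loop over unique source basenames with a loop over every remaining source and falls back to idx_possible_rename() when a basename is not unique. The per-source decision can be sketched in standalone C under stated assumptions: string arrays stand in for rename_src/rename_dst and for the sources/dests strintmaps (a basename maps to an index only when exactly one path has it), a dst_used[] array stands in for rename_dst[].is_rename, and guessed_dst stands in for whatever idx_possible_rename() would return. base_of(), unique_index() and pick_destination() are hypothetical names, not git functions.

```c
#include <assert.h>
#include <string.h>

static const char *base_of(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/*
 * Hypothetical stand-in for a strintmap lookup: return the index of the one
 * path in paths[] with the given basename, or -1 if none or several have it.
 * *present reports whether any path has that basename at all.
 */
static int unique_index(const char *const *paths, int nr, const char *base,
			int *present)
{
	int found = -1, hits = 0;

	for (int i = 0; i < nr; i++) {
		if (!strcmp(base_of(paths[i]), base)) {
			found = i;
			hits++;
		}
	}
	*present = hits > 0;
	return hits == 1 ? found : -1;
}

/*
 * Decide which destination source i should be compared against, or -1.
 * guessed_dst is what idx_possible_rename() would return for source i; it
 * is consulted only when the basename is not unique on both sides.
 */
static int pick_destination(const char *const *srcs, int nr_src,
			    const char *const *dsts, int nr_dst,
			    const int *dst_used, int i, int guessed_dst)
{
	const char *base = base_of(srcs[i]);
	int src_present, dst_present;
	int src_index = unique_index(srcs, nr_src, base, &src_present);
	int dst_index = unique_index(dsts, nr_dst, base, &dst_present);

	assert(src_index == -1 || src_index == i);
	if (!dst_present)
		return -1;               /* basename absent among destinations */
	if (src_index == -1 || dst_index == -1)
		dst_index = guessed_dst; /* non-unique: use the directory guess */
	if (dst_index == -1 || dst_used[dst_index])
		return -1;               /* no guess, or dest already paired */
	return dst_index;
}
```

The dst_used check matters once sources no longer need unique basenames: two different sources could be guessed into the same destination, and only the first pairing should survive.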
Date: Sun, 14 Feb 2021 07:59:03 +0000
Subject: [PATCH 10/10] diffcore-rename: use directory rename guided basename comparisons
To: git@vger.kernel.org
From: Elijah Newren

Hook the work from the last several patches together so that when basenames in the sets of possible remaining rename sources or destinations aren't unique, we can guess which directory source files were renamed into. When that guess gives us a pairing of files, and those files are sufficiently similar, we record the two files as a rename and remove them from the large matrix of comparisons for inexact rename detection.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows:

                        Before                   After
    no-renames:      12.775 s ±  0.062 s      12.596 s ±  0.061 s
    mega-renames:   188.754 s ±  0.284 s     130.465 s ±  0.259 s
    just-one-mega:    5.599 s ±  0.019 s       3.958 s ±  0.010 s

Signed-off-by: Elijah Newren
---
 Documentation/gitdiffcore.txt |  2 +-
 diffcore-rename.c             | 32 +++++++++++++++++++++++---------
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 80fcf9542441..8673a5c5b2f2 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -186,7 +186,7 @@ mark a file pair as a rename and stop considering other candidates for
 better matches. At most, one comparison is done per file in this
 preliminary pass; so if there are several remaining ext.txt files
 throughout the directory hierarchy after exact rename detection, this
-preliminary step will be skipped for those files.
+preliminary step may be skipped for those files.
 
 Note.
 When the "-C" option is used with `--find-copies-harder`
 option, 'git diff-{asterisk}' commands feed unmodified filepairs to
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 5de4497e04fa..70a484b9b63e 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -667,7 +667,6 @@ static const char *get_basename(const char *filename)
 	return base ? base + 1 : filename;
 }
 
-MAYBE_UNUSED
 static int idx_possible_rename(char *filename, struct dir_rename_info *info)
 {
 	/*
@@ -780,8 +779,6 @@ static int find_basename_matches(struct diff_options *options,
 	int i, renames = 0;
 	struct strintmap sources;
 	struct strintmap dests;
-	struct hashmap_iter iter;
-	struct strmap_entry *entry;
 
 	/*
 	 * The prefeteching stuff wants to know if it can skip prefetching
@@ -831,17 +828,34 @@ static int find_basename_matches(struct diff_options *options,
 	}
 
 	/* Now look for basename matchups and do similarity estimation */
-	strintmap_for_each_entry(&sources, &iter, entry) {
-		const char *base = entry->key;
-		intptr_t src_index = (intptr_t)entry->value;
+	for (i = 0; i < rename_src_nr; ++i) {
+		char *filename = rename_src[i].p->one->path;
+		const char *base = NULL;
+		intptr_t src_index;
 		intptr_t dst_index;
 
-		if (src_index == -1)
-			continue;
-		if (0 <= (dst_index = strintmap_get(&dests, base))) {
+		/* Is this basename unique among remaining sources?
+		 */
+		base = get_basename(filename);
+		src_index = strintmap_get(&sources, base);
+		assert(src_index == -1 || src_index == i);
+
+		if (strintmap_contains(&dests, base)) {
 			struct diff_filespec *one, *two;
 			int score;
 
+			/* Find a matching destination, if possible */
+			dst_index = strintmap_get(&dests, base);
+			if (src_index == -1 || dst_index == -1) {
+				src_index = i;
+				dst_index = idx_possible_rename(filename, info);
+			}
+			if (dst_index == -1)
+				continue;
+
+			/* Ignore this dest if already used in a rename */
+			if (rename_dst[dst_index].is_rename)
+				continue; /* already used previously */
+
 			/* Estimate the similarity */
 			one = rename_src[src_index].p->one;
 			two = rename_dst[dst_index].p->two;