From patchwork Tue Feb 23 23:43:58 2021
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12101125
Date: Tue, 23 Feb 2021 23:43:58 +0000
Subject: [PATCH v2 01/10] Move computation of dir_rename_count from merge-ort to diffcore-rename
To: git@vger.kernel.org
Cc: Elijah Newren
From: Elijah Newren

A previous commit noted that it is very common for people to move files
across directories while keeping their filename the same.
The last few commits took advantage of this and showed that we can
accelerate rename detection significantly using basenames; since files
with the same basename serve as likely rename candidates, we can check
those first and remove them from the rename candidate pool if they are
sufficiently similar.

Unfortunately, the previous optimization was limited by the fact that
the remaining basenames after exact rename detection are not always
unique. Many repositories have hundreds of build files with the same
name (e.g. Makefile, .gitignore, build.gradle), and may even have
hundreds of source files with the same name. (For example, the Linux
kernel has 100 setup.c, 87 irq.c, and 112 core.c files. A repository
at $DAYJOB has a lot of ObjectFactory.java and Plugin.java files.)

For these files with non-unique basenames, we are faced with the task of
attempting to determine or guess which directory they may have been
relocated to. Such a task is precisely the job of directory rename
detection. However, there are two catches: (1) the directory rename
detection code has traditionally been part of the merge machinery rather
than diffcore-rename.c, and (2) directory rename detection currently
runs after regular rename detection is complete. The 1st catch is just
an implementation issue that can be overcome by some code shuffling.
The 2nd requires us to add a further approximation: we only have access
to exact renames at this point, so we need to do directory rename
detection based on just exact renames. In some cases we won't have
exact renames, in which case this extra optimization won't apply. We
also choose to not apply the optimization unless we know that the
underlying directory was removed, which will require extra data to be
passed in to diffcore_rename_extended().
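
As a rough illustration of the idea above (predicting a directory rename
from exact renames alone, and only for directories known to have been
removed), here is a minimal standalone sketch. It is not the code from
this series: fixed-size arrays stand in for the strmap/strset types, it
looks only at the immediate parent directory, and the names dir_counts,
record_exact_rename() and predict_new_dir() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PAIRS 64
#define MAX_PATH_LEN 256

/* One (old_dir, new_dir) pairing and how many exact renames support it. */
struct dir_pair_count {
	char old_dir[MAX_PATH_LEN];
	char new_dir[MAX_PATH_LEN];
	int count;
};

/* Hypothetical stand-in for the old_dir -> {new_dir -> count} map. */
struct dir_counts {
	struct dir_pair_count pairs[MAX_PAIRS];
	size_t nr;
};

/* Copy the directory part of path into out; "" means the toplevel. */
static void dir_of(const char *path, char *out)
{
	const char *slash = strrchr(path, '/');
	size_t len = slash ? (size_t)(slash - path) : 0;

	assert(len < MAX_PATH_LEN);
	memcpy(out, path, len);
	out[len] = '\0';
}

/* Linear search standing in for a set lookup of removed directories. */
static int is_removed(const char **dirs_removed, size_t nr, const char *dir)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(dirs_removed[i], dir))
			return 1;
	return 0;
}

/* Record one exact rename, but only if its old directory was removed. */
static void record_exact_rename(struct dir_counts *c,
				const char **dirs_removed, size_t removed_nr,
				const char *oldname, const char *newname)
{
	char old_dir[MAX_PATH_LEN], new_dir[MAX_PATH_LEN];
	size_t i;

	dir_of(oldname, old_dir);
	dir_of(newname, new_dir);
	if (!is_removed(dirs_removed, removed_nr, old_dir))
		return;

	/* Bump an existing pairing if we have seen it before. */
	for (i = 0; i < c->nr; i++) {
		if (!strcmp(c->pairs[i].old_dir, old_dir) &&
		    !strcmp(c->pairs[i].new_dir, new_dir)) {
			c->pairs[i].count++;
			return;
		}
	}

	/* Otherwise start a new pairing with a count of one. */
	assert(c->nr < MAX_PAIRS);
	strcpy(c->pairs[c->nr].old_dir, old_dir);
	strcpy(c->pairs[c->nr].new_dir, new_dir);
	c->pairs[c->nr].count = 1;
	c->nr++;
}

/* Return the most frequently seen new directory for old_dir, or NULL. */
static const char *predict_new_dir(const struct dir_counts *c,
				   const char *old_dir)
{
	const char *best = NULL;
	int best_count = 0;
	size_t i;

	for (i = 0; i < c->nr; i++) {
		if (strcmp(c->pairs[i].old_dir, old_dir))
			continue;
		if (c->pairs[i].count > best_count) {
			best_count = c->pairs[i].count;
			best = c->pairs[i].new_dir;
		}
	}
	return best;
}
```

In this sketch, recording the exact renames src/old/a.c -> lib/new/a.c,
src/old/b.c -> lib/new/b.c and src/old/c.c -> misc/c.c with src/old
marked as removed makes lib/new the predicted destination for src/old.
Such a prediction is only a hint about where to look for a file with a
matching basename.
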
Also, even if we get a prediction about which directory a file may have
relocated to, we will still need to check to see if there is a file in
the predicted directory, and then compare the two files to see if they
meet the higher min_basename_score threshold required for marking the
two files as renames.

This commit and the next few will set up the necessary infrastructure to
do such computations. This commit merely moves the computation of
dir_rename_count from merge-ort.c to diffcore-rename.c, making slight
adjustments to the data structures based on the move. While the
diffstat looks large, viewing this commit with --color-moved makes it
clear that only about 20 lines changed. With this patch, the
computation of dir_rename_count is still only done after inexact rename
detection, but subsequent commits will add a preliminary computation of
dir_rename_count after exact rename detection, followed by some updates
after inexact rename detection.

Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 134 +++++++++++++++++++++++++++++++++++++++++++++-
 diffcore.h        |   5 ++
 merge-ort.c       | 132 ++-------------------------------------------
 3 files changed, 141 insertions(+), 130 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 41558185ae1d..33cfc5848611 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -367,6 +367,125 @@ static int find_exact_renames(struct diff_options *options)
 	return renames;
 }

+static void dirname_munge(char *filename)
+{
+	char *slash = strrchr(filename, '/');
+	if (!slash)
+		slash = filename;
+	*slash = '\0';
+}
+
+static void increment_count(struct strmap *dir_rename_count,
+			    char *old_dir,
+			    char *new_dir)
+{
+	struct strintmap *counts;
+	struct strmap_entry *e;
+
+	/* Get the {new_dirs -> counts} mapping using old_dir */
+	e = strmap_get_entry(dir_rename_count, old_dir);
+	if (e) {
+		counts = e->value;
+	} else {
+		counts = xmalloc(sizeof(*counts));
+		strintmap_init_with_options(counts, 0, NULL, 1);
+		strmap_put(dir_rename_count, old_dir, counts);
+	}
+
+	/* Increment the count for new_dir */
+	strintmap_incr(counts, new_dir, 1);
+}
+
+static void update_dir_rename_counts(struct strmap *dir_rename_count,
+				     struct strset *dirs_removed,
+				     const char *oldname,
+				     const char *newname)
+{
+	char *old_dir = xstrdup(oldname);
+	char *new_dir = xstrdup(newname);
+	char new_dir_first_char = new_dir[0];
+	int first_time_in_loop = 1;
+
+	while (1) {
+		dirname_munge(old_dir);
+		dirname_munge(new_dir);
+
+		/*
+		 * When renaming
+		 *   "a/b/c/d/e/foo.c" -> "a/b/some/thing/else/e/foo.c"
+		 * then this suggests that both
+		 *   a/b/c/d/e/ => a/b/some/thing/else/e/
+		 *   a/b/c/d/ => a/b/some/thing/else/
+		 * so we want to increment counters for both. We do NOT,
+		 * however, also want to suggest that there was the following
+		 * rename:
+		 *   a/b/c/ => a/b/some/thing/
+		 * so we need to quit at that point.
+		 *
+		 * Note the when first_time_in_loop, we only strip off the
+		 * basename, and we don't care if that's different.
+		 */
+		if (!first_time_in_loop) {
+			char *old_sub_dir = strchr(old_dir, '\0')+1;
+			char *new_sub_dir = strchr(new_dir, '\0')+1;
+			if (!*new_dir) {
+				/*
+				 * Special case when renaming to root directory,
+				 * i.e. when new_dir == "". In this case, we had
+				 * something like
+				 *    a/b/subdir => subdir
+				 * and so dirname_munge() sets things up so that
+				 *    old_dir = "a/b\0subdir\0"
+				 *    new_dir = "\0ubdir\0"
+				 * We didn't have a '/' to overwrite a '\0' onto
+				 * in new_dir, so we have to compare differently.
+				 */
+				if (new_dir_first_char != old_sub_dir[0] ||
+				    strcmp(old_sub_dir+1, new_sub_dir))
+					break;
+			} else {
+				if (strcmp(old_sub_dir, new_sub_dir))
+					break;
+			}
+		}
+
+		if (strset_contains(dirs_removed, old_dir))
+			increment_count(dir_rename_count, old_dir, new_dir);
+		else
+			break;
+
+		/* If we hit toplevel directory ("") for old or new dir, quit */
+		if (!*old_dir || !*new_dir)
+			break;
+
+		first_time_in_loop = 0;
+	}
+
+	/* Free resources we don't need anymore */
+	free(old_dir);
+	free(new_dir);
+}
+
+static void compute_dir_rename_counts(struct strmap *dir_rename_count,
+				      struct strset *dirs_removed)
+{
+	int i;
+
+	/* Set up dir_rename_count */
+	for (i = 0; i < rename_dst_nr; ++i) {
+		/*
+		 * Make dir_rename_count contain a map of a map:
+		 *   old_directory -> {new_directory -> count}
+		 * In other words, for every pair look at the directories for
+		 * the old filename and the new filename and count how many
+		 * times that pairing occurs.
+		 */
+		update_dir_rename_counts(dir_rename_count, dirs_removed,
+					 rename_dst[i].p->one->path,
+					 rename_dst[i].p->two->path);
+	}
+}
+
 static const char *get_basename(const char *filename)
 {
 	/*
@@ -640,7 +759,9 @@ static void remove_unneeded_paths_from_src(int detecting_copies)
 	rename_src_nr = new_num_src;
 }

-void diffcore_rename(struct diff_options *options)
+void diffcore_rename_extended(struct diff_options *options,
+			      struct strset *dirs_removed,
+			      struct strmap *dir_rename_count)
 {
 	int detect_rename = options->detect_rename;
 	int minimum_score = options->rename_score;
@@ -653,6 +774,7 @@ void diffcore_rename(struct diff_options *options)
 	struct progress *progress = NULL;

 	trace2_region_enter("diff", "setup", options->repo);
+	assert(!dir_rename_count || strmap_empty(dir_rename_count));
 	want_copies = (detect_rename == DIFF_DETECT_COPY);
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
@@ -841,6 +963,11 @@ void diffcore_rename(struct diff_options *options)
 	trace2_region_leave("diff", "inexact renames", options->repo);

 cleanup:
+	/*
+	 * Now that renames have been computed, compute dir_rename_count */
+	if (dirs_removed && dir_rename_count)
+		compute_dir_rename_counts(dir_rename_count, dirs_removed);
+
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst. The original list is still in *q.
 	 */
@@ -923,3 +1050,8 @@ void diffcore_rename(struct diff_options *options)
 	trace2_region_leave("diff", "write back to queue", options->repo);
 	return;
 }
+
+void diffcore_rename(struct diff_options *options)
+{
+	diffcore_rename_extended(options, NULL, NULL);
+}
diff --git a/diffcore.h b/diffcore.h
index d2a63c5c71f4..db55d3853071 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -8,6 +8,8 @@

 struct diff_options;
 struct repository;
+struct strmap;
+struct strset;
 struct userdiff_driver;

 /* This header file is internal between diff.c and its diff transformers
@@ -161,6 +163,9 @@ void diff_q(struct diff_queue_struct *, struct diff_filepair *);

 void diffcore_break(struct repository *, int);
 void diffcore_rename(struct diff_options *);
+void diffcore_rename_extended(struct diff_options *options,
+			      struct strset *dirs_removed,
+			      struct strmap *dir_rename_count);
 void diffcore_merge_broken(void);
 void diffcore_pickaxe(struct diff_options *);
 void diffcore_order(const char *orderfile);
diff --git a/merge-ort.c b/merge-ort.c
index 603d30c52170..c4467e073b45 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1302,131 +1302,6 @@ static char *handle_path_level_conflicts(struct merge_options *opt,
 	return new_path;
 }

-static void dirname_munge(char *filename)
-{
-	char *slash = strrchr(filename, '/');
-	if (!slash)
-		slash = filename;
-	*slash = '\0';
-}
-
-static void increment_count(struct strmap *dir_rename_count,
-			    char *old_dir,
-			    char *new_dir)
-{
-	struct strintmap *counts;
-	struct strmap_entry *e;
-
-	/* Get the {new_dirs -> counts} mapping using old_dir */
-	e = strmap_get_entry(dir_rename_count, old_dir);
-	if (e) {
-		counts = e->value;
-	} else {
-		counts = xmalloc(sizeof(*counts));
-		strintmap_init_with_options(counts, 0, NULL, 1);
-		strmap_put(dir_rename_count, old_dir, counts);
-	}
-
-	/* Increment the count for new_dir */
-	strintmap_incr(counts, new_dir, 1);
-}
-
-static void update_dir_rename_counts(struct strmap *dir_rename_count,
-				     struct strset *dirs_removed,
-				     const char *oldname,
-				     const char *newname)
-{
-	char *old_dir = xstrdup(oldname);
-	char *new_dir = xstrdup(newname);
-	char new_dir_first_char = new_dir[0];
-	int first_time_in_loop = 1;
-
-	while (1) {
-		dirname_munge(old_dir);
-		dirname_munge(new_dir);
-
-		/*
-		 * When renaming
-		 *   "a/b/c/d/e/foo.c" -> "a/b/some/thing/else/e/foo.c"
-		 * then this suggests that both
-		 *   a/b/c/d/e/ => a/b/some/thing/else/e/
-		 *   a/b/c/d/ => a/b/some/thing/else/
-		 * so we want to increment counters for both. We do NOT,
-		 * however, also want to suggest that there was the following
-		 * rename:
-		 *   a/b/c/ => a/b/some/thing/
-		 * so we need to quit at that point.
-		 *
-		 * Note the when first_time_in_loop, we only strip off the
-		 * basename, and we don't care if that's different.
-		 */
-		if (!first_time_in_loop) {
-			char *old_sub_dir = strchr(old_dir, '\0')+1;
-			char *new_sub_dir = strchr(new_dir, '\0')+1;
-			if (!*new_dir) {
-				/*
-				 * Special case when renaming to root directory,
-				 * i.e. when new_dir == "". In this case, we had
-				 * something like
-				 *    a/b/subdir => subdir
-				 * and so dirname_munge() sets things up so that
-				 *    old_dir = "a/b\0subdir\0"
-				 *    new_dir = "\0ubdir\0"
-				 * We didn't have a '/' to overwrite a '\0' onto
-				 * in new_dir, so we have to compare differently.
-				 */
-				if (new_dir_first_char != old_sub_dir[0] ||
-				    strcmp(old_sub_dir+1, new_sub_dir))
-					break;
-			} else {
-				if (strcmp(old_sub_dir, new_sub_dir))
-					break;
-			}
-		}
-
-		if (strset_contains(dirs_removed, old_dir))
-			increment_count(dir_rename_count, old_dir, new_dir);
-		else
-			break;
-
-		/* If we hit toplevel directory ("") for old or new dir, quit */
-		if (!*old_dir || !*new_dir)
-			break;
-
-		first_time_in_loop = 0;
-	}
-
-	/* Free resources we don't need anymore */
-	free(old_dir);
-	free(new_dir);
-}
-
-static void compute_rename_counts(struct diff_queue_struct *pairs,
-				  struct strmap *dir_rename_count,
-				  struct strset *dirs_removed)
-{
-	int i;
-
-	for (i = 0; i < pairs->nr; ++i) {
-		struct diff_filepair *pair = pairs->queue[i];
-
-		/* File not part of directory rename if it wasn't renamed */
-		if (pair->status != 'R')
-			continue;
-
-		/*
-		 * Make dir_rename_count contain a map of a map:
-		 *   old_directory -> {new_directory -> count}
-		 * In other words, for every pair look at the directories for
-		 * the old filename and the new filename and count how many
-		 * times that pairing occurs.
-		 */
-		update_dir_rename_counts(dir_rename_count, dirs_removed,
-					 pair->one->path,
-					 pair->two->path);
-	}
-}
-
 static void get_provisional_directory_renames(struct merge_options *opt,
 					      unsigned side,
 					      int *clean)
@@ -1435,9 +1310,6 @@ static void get_provisional_directory_renames(struct merge_options *opt,
 	struct strmap_entry *entry;
 	struct rename_info *renames = &opt->priv->renames;

-	compute_rename_counts(&renames->pairs[side],
-			      &renames->dir_rename_count[side],
-			      &renames->dirs_removed[side]);
 	/*
 	 * Collapse
 	 *    dir_rename_count: old_directory -> {new_directory -> count}
@@ -2162,7 +2034,9 @@ static void detect_regular_renames(struct merge_options *opt,

 	diff_queued_diff = renames->pairs[side_index];
 	trace2_region_enter("diff", "diffcore_rename", opt->repo);
-	diffcore_rename(&diff_opts);
+	diffcore_rename_extended(&diff_opts,
+				 &renames->dirs_removed[side_index],
+				 &renames->dir_rename_count[side_index]);
 	trace2_region_leave("diff", "diffcore_rename", opt->repo);
 	resolve_diffpair_statuses(&diff_queued_diff);


From patchwork Tue Feb 23 23:43:59 2021
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12101123
Message-Id: <612da82f049cbe877cf924f743a3e4059483ce51.1614123848.git.gitgitgadget@gmail.com>
Date: Tue, 23 Feb 2021 23:43:59 +0000
Subject: [PATCH v2 02/10] diffcore-rename: add functions for clearing dir_rename_count
To: git@vger.kernel.org
Cc: Elijah Newren
From: Elijah Newren

As we adjust the usage of dir_rename_count, we want to have functions
for clearing it out, either fully or partially. Add such functions.

Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 19 +++++++++++++++++++
 diffcore.h        |  2 ++
 merge-ort.c       | 12 +++---------
 3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 33cfc5848611..614a8d63012d 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -486,6 +486,25 @@ static void compute_dir_rename_counts(struct strmap *dir_rename_count,
 	}
 }

+void partial_clear_dir_rename_count(struct strmap *dir_rename_count)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	strmap_for_each_entry(dir_rename_count, &iter, entry) {
+		struct strintmap *counts = entry->value;
+		strintmap_clear(counts);
+	}
+	strmap_partial_clear(dir_rename_count, 1);
+}
+
+MAYBE_UNUSED
+static void clear_dir_rename_count(struct strmap *dir_rename_count)
+{
+	partial_clear_dir_rename_count(dir_rename_count);
+	strmap_clear(dir_rename_count, 1);
+}
+
 static const char *get_basename(const char *filename)
 {
 	/*
diff --git a/diffcore.h b/diffcore.h
index db55d3853071..c6ba64abd198 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -161,6 +161,8 @@ struct diff_filepair *diff_queue(struct diff_queue_struct *,
 				 struct diff_filespec *);
 void diff_q(struct diff_queue_struct *, struct diff_filepair *);

+void partial_clear_dir_rename_count(struct strmap *dir_rename_count);
+
 void diffcore_break(struct repository *, int);
 void diffcore_rename(struct diff_options *);
 void diffcore_rename_extended(struct diff_options *options,
diff --git a/merge-ort.c b/merge-ort.c
index c4467e073b45..467404cc0a35 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -351,17 +351,11 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,

 	/* Free memory used by various renames maps */
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
-		struct hashmap_iter iter;
-		struct strmap_entry *entry;
-
 		strset_func(&renames->dirs_removed[i]);

-		strmap_for_each_entry(&renames->dir_rename_count[i],
-				      &iter, entry) {
-			struct strintmap *counts = entry->value;
-			strintmap_clear(counts);
-		}
-		strmap_func(&renames->dir_rename_count[i], 1);
+		partial_clear_dir_rename_count(&renames->dir_rename_count[i]);
+		if (!reinitialize)
+			strmap_clear(&renames->dir_rename_count[i], 1);

 		strmap_func(&renames->dir_renames[i], 0);
 	}

From patchwork Tue Feb 23 23:44:00 2021
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12101109
Message-Id: <93f98fc0b2644aab9e98b7a32e88561618c7d4c0.1614123848.git.gitgitgadget@gmail.com>
Date: Tue, 23 Feb 2021 23:44:00 +0000
Subject: [PATCH v2 03/10] diffcore-rename: move dir_rename_counts into a dir_rename_info struct
To: git@vger.kernel.org
Cc: Elijah Newren
From: Elijah Newren

This is a purely cosmetic change for now, but we will be adding
additional information to the struct and changing where and how it is
set up and used in subsequent patches.

Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 39 ++++++++++++++++++++++++++-------------
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 614a8d63012d..7759c9a3a2ed 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -367,6 +367,11 @@ static int find_exact_renames(struct diff_options *options)
 	return renames;
 }

+struct dir_rename_info {
+	struct strmap *dir_rename_count;
+	unsigned setup;
+};
+
 static void dirname_munge(char *filename)
 {
 	char *slash = strrchr(filename, '/');
@@ -375,7 +380,7 @@ static void dirname_munge(char *filename)
 	*slash = '\0';
 }

-static void increment_count(struct strmap *dir_rename_count,
+static void increment_count(struct dir_rename_info *info,
 			    char *old_dir,
 			    char *new_dir)
 {
@@ -383,20 +388,20 @@ static void increment_count(struct strmap *dir_rename_count,
 	struct strmap_entry *e;

 	/* Get the {new_dirs -> counts} mapping using old_dir */
-	e = strmap_get_entry(dir_rename_count, old_dir);
+	e = strmap_get_entry(info->dir_rename_count, old_dir);
 	if (e) {
 		counts = e->value;
 	} else {
 		counts = xmalloc(sizeof(*counts));
 		strintmap_init_with_options(counts, 0, NULL, 1);
-		strmap_put(dir_rename_count, old_dir, counts);
+		strmap_put(info->dir_rename_count, old_dir, counts);
 	}

 	/* Increment the count for new_dir */
 	strintmap_incr(counts, new_dir, 1);
 }

-static void update_dir_rename_counts(struct strmap *dir_rename_count,
+static void update_dir_rename_counts(struct dir_rename_info *info,
 				     struct strset *dirs_removed,
 				     const char *oldname,
 				     const char *newname)
@@ -450,7 +455,7 @@ static void update_dir_rename_counts(struct strmap *dir_rename_count,
 		}

 		if (strset_contains(dirs_removed, old_dir))
-			increment_count(dir_rename_count, old_dir, new_dir);
+			increment_count(info, old_dir, new_dir);
 		else
 			break;

@@ -466,12 +471,15 @@ static void update_dir_rename_counts(struct strmap *dir_rename_count,
 	free(new_dir);
 }

-static void compute_dir_rename_counts(struct strmap *dir_rename_count,
-				      struct strset *dirs_removed)
+static void compute_dir_rename_counts(struct dir_rename_info *info,
+				      struct strset *dirs_removed,
+				      struct strmap *dir_rename_count)
 {
 	int i;

-	/* Set up dir_rename_count */
+	info->setup = 1;
+	info->dir_rename_count = dir_rename_count;
+
 	for (i = 0; i < rename_dst_nr; ++i) {
 		/*
 		 * Make dir_rename_count contain a map of a map:
@@ -480,7 +488,7 @@ static void compute_dir_rename_counts(struct strmap *dir_rename_count,
 		 * the old filename and the new filename and count how many
 		 * times that pairing occurs.
 		 */
-		update_dir_rename_counts(dir_rename_count, dirs_removed,
+		update_dir_rename_counts(info, dirs_removed,
 					 rename_dst[i].p->one->path,
 					 rename_dst[i].p->two->path);
 	}
@@ -499,10 +507,13 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count)
 }

 MAYBE_UNUSED
-static void clear_dir_rename_count(struct strmap *dir_rename_count)
+static void cleanup_dir_rename_info(struct dir_rename_info *info)
 {
-	partial_clear_dir_rename_count(dir_rename_count);
-	strmap_clear(dir_rename_count, 1);
+	if (!info->setup)
+		return;
+
+	partial_clear_dir_rename_count(info->dir_rename_count);
+	strmap_clear(info->dir_rename_count, 1);
 }

 static const char *get_basename(const char *filename)
@@ -791,8 +802,10 @@ void diffcore_rename_extended(struct diff_options *options,
 	int num_destinations, dst_cnt;
 	int num_sources, want_copies;
 	struct progress *progress = NULL;
+	struct dir_rename_info info;

 	trace2_region_enter("diff", "setup", options->repo);
+	info.setup = 0;
 	assert(!dir_rename_count || strmap_empty(dir_rename_count));
 	want_copies = (detect_rename == DIFF_DETECT_COPY);
 	if (!minimum_score)
@@ -985,7 +998,7 @@ void diffcore_rename_extended(struct diff_options *options,
 	/*
 	 * Now that renames have been computed, compute dir_rename_count */
 	if (dirs_removed && dir_rename_count)
-		compute_dir_rename_counts(dir_rename_count, dirs_removed);
+		compute_dir_rename_counts(&info, dirs_removed, dir_rename_count);

 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst. The original list is still in *q.

From patchwork Tue Feb 23 23:44:01 2021
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12101129
Date: Tue, 23 Feb 2021 23:44:01 +0000
Subject: [PATCH v2 04/10] diffcore-rename: extend cleanup_dir_rename_info()
To: git@vger.kernel.org
Cc: Elijah Newren
From: Elijah Newren

When diffcore_rename_extended() is passed a NULL dir_rename_count, we
will still want to create a temporary one for use by
find_basename_matches(),
but have it fully deallocated before diffcore_rename_extended() returns. However, when diffcore_rename_extended() is passed a dir_rename_count, we want to fill that strmap with appropriate values and return it. However, for our interim purposes we may also add entries corresponding to directories that cannot have been renamed due to still existing on both sides. Extend cleanup_dir_rename_info() to handle these two different cases, cleaning up the relevant bits of information for each case. Signed-off-by: Elijah Newren --- diffcore-rename.c | 38 +++++++++++++++++++++++++++++++++++--- 1 file changed, 35 insertions(+), 3 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 7759c9a3a2ed..aa21d4e7175c 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -507,13 +507,45 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count) } MAYBE_UNUSED -static void cleanup_dir_rename_info(struct dir_rename_info *info) +static void cleanup_dir_rename_info(struct dir_rename_info *info, + struct strset *dirs_removed, + int keep_dir_rename_count) { + struct hashmap_iter iter; + struct strmap_entry *entry; + if (!info->setup) return; - partial_clear_dir_rename_count(info->dir_rename_count); - strmap_clear(info->dir_rename_count, 1); + if (!keep_dir_rename_count) { + partial_clear_dir_rename_count(info->dir_rename_count); + strmap_clear(info->dir_rename_count, 1); + FREE_AND_NULL(info->dir_rename_count); + } else { + /* + * Although dir_rename_count was passed in + * diffcore_rename_extended() and we want to keep it around and + * return it to that caller, we first want to remove any data + * associated with directories that weren't renamed. 
+		 */
+		struct string_list to_remove = STRING_LIST_INIT_NODUP;
+		int i;
+
+		strmap_for_each_entry(info->dir_rename_count, &iter, entry) {
+			const char *source_dir = entry->key;
+			struct strintmap *counts = entry->value;
+
+			if (!strset_contains(dirs_removed, source_dir)) {
+				string_list_append(&to_remove, source_dir);
+				strintmap_clear(counts);
+				continue;
+			}
+		}
+		for (i=0; i<to_remove.nr; ++i)
+			strmap_remove(info->dir_rename_count,
+				      to_remove.items[i].string, 1);
+		string_list_clear(&to_remove, 0);
+	}
 }
 
 static const char *get_basename(const char *filename)

From patchwork Tue Feb 23 23:44:02 2021
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12101147
Message-Id: <3a29cf9e526fba0227a7eec92c0c6bd58a7850f0.1614123848.git.gitgitgadget@gmail.com>
Date: Tue, 23 Feb 2021 23:44:02 +0000
Subject: [PATCH v2 05/10] diffcore-rename: compute dir_rename_counts in stages
To: git@vger.kernel.org
Cc: Elijah Newren
From: Elijah Newren

From: Elijah Newren

We want to first compute dir_rename_counts based just on exact renames to
start, as that can provide us useful information in
find_basename_matches(). That will give us an incomplete result, which we
can then later augment as basename and inexact rename matches are found.

Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 76 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 62 insertions(+), 14 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index aa21d4e7175c..489e9cb0871e 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -411,6 +411,28 @@ static void update_dir_rename_counts(struct dir_rename_info *info,
 	char new_dir_first_char = new_dir[0];
 	int first_time_in_loop = 1;
 
+	if (!info->setup)
+		/*
+		 * info->setup is 0 here in two cases: (1) all auxiliary
+		 * vars (like dirs_removed) were NULL so
+		 * initialize_dir_rename_info() returned early, or (2)
+		 * either break detection or copy detection are active so
+		 * that we never called initialize_dir_rename_info(). In
+		 * the former case, we don't have enough info to know if
+		 * directories were renamed (because dirs_removed lets us
+		 * know about a necessary prerequisite, namely if they were
+		 * removed), and in the latter, we don't care about
+		 * directory renames or find_basename_matches.
+		 *
+		 * This matters because both basename and inexact matching
+		 * will also call update_dir_rename_counts(). In either of
+		 * the above two cases info->dir_rename_counts will not
+		 * have been properly initialized which prevents us from
+		 * updating it, but in these two cases we don't care about
+		 * dir_rename_counts anyway, so we can just exit early.
+		 */
+		return;
+
 	while (1) {
 		dirname_munge(old_dir);
 		dirname_munge(new_dir);
@@ -471,14 +493,22 @@ static void update_dir_rename_counts(struct dir_rename_info *info,
 	free(new_dir);
 }
 
-static void compute_dir_rename_counts(struct dir_rename_info *info,
-				      struct strset *dirs_removed,
-				      struct strmap *dir_rename_count)
+static void initialize_dir_rename_info(struct dir_rename_info *info,
+				       struct strset *dirs_removed,
+				       struct strmap *dir_rename_count)
 {
 	int i;
 
+	info->setup = 0;
+	if (!dirs_removed)
+		return;
 	info->setup = 1;
+
 	info->dir_rename_count = dir_rename_count;
+	if (!info->dir_rename_count) {
+		info->dir_rename_count = xmalloc(sizeof(*dir_rename_count));
+		strmap_init(info->dir_rename_count);
+	}
 
 	for (i = 0; i < rename_dst_nr; ++i) {
 		/*
@@ -506,7 +536,6 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count)
 	strmap_partial_clear(dir_rename_count, 1);
 }
 
-MAYBE_UNUSED
 static void cleanup_dir_rename_info(struct dir_rename_info *info,
 				    struct strset *dirs_removed,
 				    int keep_dir_rename_count)
@@ -561,7 +590,9 @@ static const char *get_basename(const char *filename)
 }
 
 static int find_basename_matches(struct diff_options *options,
-				 int minimum_score)
+				 int minimum_score,
+				 struct dir_rename_info *info,
+				 struct strset *dirs_removed)
 {
 	/*
 	 * When I checked in early 2020, over 76% of file renames in linux
@@ -669,6 +700,8 @@ static int find_basename_matches(struct diff_options *options,
 				continue;
 			record_rename_pair(dst_index, src_index, score);
 			renames++;
+			update_dir_rename_counts(info, dirs_removed,
+						 one->path, two->path);
 
 			/*
 			 * Found a rename so don't need text anymore; if we
@@ -752,7 +785,12 @@ static int too_many_rename_candidates(int num_destinations, int num_sources,
 	return 1;
 }
 
-static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, int copies)
+static int find_renames(struct diff_score *mx,
+			int dst_cnt,
+			int minimum_score,
+			int copies,
+			struct dir_rename_info *info,
+			struct strset *dirs_removed)
 {
 	int count = 0, i;
 
@@ -769,6 +807,9 @@ static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, i
 			continue;
 		record_rename_pair(mx[i].dst, mx[i].src, mx[i].score);
 		count++;
+		update_dir_rename_counts(info, dirs_removed,
+					 rename_src[mx[i].src].p->one->path,
+					 rename_dst[mx[i].dst].p->two->path);
 	}
 	return count;
 }
@@ -840,6 +881,8 @@ void diffcore_rename_extended(struct diff_options *options,
 	info.setup = 0;
 	assert(!dir_rename_count || strmap_empty(dir_rename_count));
 	want_copies = (detect_rename == DIFF_DETECT_COPY);
+	if (dirs_removed && (break_idx || want_copies))
+		BUG("dirs_removed incompatible with break/copy detection");
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
 
@@ -931,10 +974,17 @@ void diffcore_rename_extended(struct diff_options *options,
 		remove_unneeded_paths_from_src(want_copies);
 		trace2_region_leave("diff", "cull after exact", options->repo);
 
+		/* Preparation for basename-driven matching. */
+		trace2_region_enter("diff", "dir rename setup", options->repo);
+		initialize_dir_rename_info(&info,
+					   dirs_removed, dir_rename_count);
+		trace2_region_leave("diff", "dir rename setup", options->repo);
+
 		/* Utilize file basenames to quickly find renames. */
 		trace2_region_enter("diff", "basename matches", options->repo);
 		rename_count += find_basename_matches(options,
-						      min_basename_score);
+						      min_basename_score,
+						      &info, dirs_removed);
 		trace2_region_leave("diff", "basename matches", options->repo);
 
 		/*
@@ -1020,18 +1070,15 @@ void diffcore_rename_extended(struct diff_options *options,
 	/* cost matrix sorted by most to least similar pair */
 	STABLE_QSORT(mx, dst_cnt * NUM_CANDIDATE_PER_DST, score_compare);
 
-	rename_count += find_renames(mx, dst_cnt, minimum_score, 0);
+	rename_count += find_renames(mx, dst_cnt, minimum_score, 0,
+				     &info, dirs_removed);
 	if (want_copies)
-		rename_count += find_renames(mx, dst_cnt, minimum_score, 1);
+		rename_count += find_renames(mx, dst_cnt, minimum_score, 1,
+					     &info, dirs_removed);
 	free(mx);
 	trace2_region_leave("diff", "inexact renames", options->repo);
 
  cleanup:
-	/*
-	 * Now that renames have been computed, compute dir_rename_count */
-	if (dirs_removed && dir_rename_count)
-		compute_dir_rename_counts(&info, dirs_removed, dir_rename_count);
-
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst. The original list is still in *q.
 	 */
@@ -1103,6 +1150,7 @@ void diffcore_rename_extended(struct diff_options *options,
 		if (rename_dst[i].filespec_to_free)
 			free_filespec(rename_dst[i].filespec_to_free);
 
+	cleanup_dir_rename_info(&info, dirs_removed, dir_rename_count != NULL);
 	FREE_AND_NULL(rename_dst);
 	rename_dst_nr = rename_dst_alloc = 0;
 	FREE_AND_NULL(rename_src);

From patchwork Tue Feb 23 23:44:03 2021
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12101131
Date: Tue, 23 Feb 2021 23:44:03 +0000
Subject: [PATCH v2 06/10] diffcore-rename: add a mapping of destination names to their indices
To: git@vger.kernel.org
Cc: Elijah Newren
From: Elijah Newren

From: Elijah Newren

Add an idx_map member to struct dir_rename_info, which tracks a mapping of
the full filename to the index within rename_dst where that filename is
found. We will later use this for quickly finding an array entry in
rename_dst given the pathname.

Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 489e9cb0871e..db569e4a0b0a 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -368,6 +368,7 @@ static int find_exact_renames(struct diff_options *options)
 }
 
 struct dir_rename_info {
+	struct strintmap idx_map;
 	struct strmap *dir_rename_count;
 	unsigned setup;
 };
@@ -509,10 +510,26 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
 		info->dir_rename_count = xmalloc(sizeof(*dir_rename_count));
 		strmap_init(info->dir_rename_count);
 	}
+	strintmap_init_with_options(&info->idx_map, -1, NULL, 0);
 
+	/*
+	 * Loop setting up both info->idx_map, and doing setup of
+	 * info->dir_rename_count.
+	 */
 	for (i = 0; i < rename_dst_nr; ++i) {
 		/*
-		 * Make dir_rename_count contain a map of a map:
+		 * For non-renamed files, make idx_map contain mapping of
+		 *   filename -> index (index within rename_dst, that is)
+		 */
+		if (!rename_dst[i].is_rename) {
+			char *filename = rename_dst[i].p->two->path;
+			strintmap_set(&info->idx_map, filename, i);
+			continue;
+		}
+
+		/*
+		 * For everything else (i.e. renamed files), make
+		 * dir_rename_count contain a map of a map:
 		 *   old_directory -> {new_directory -> count}
 		 * In other words, for every pair look at the directories for
 		 * the old filename and the new filename and count how many
@@ -546,6 +563,9 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info,
 	if (!info->setup)
 		return;
 
+	/* idx_map */
+	strintmap_clear(&info->idx_map);
+
 	if (!keep_dir_rename_count) {
 		partial_clear_dir_rename_count(info->dir_rename_count);
 		strmap_clear(info->dir_rename_count, 1);

From patchwork Tue Feb 23 23:44:04 2021
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12101145
Message-Id: <4983a1c2f908f02bc8a47f883c31652723ffde51.1614123848.git.gitgitgadget@gmail.com>
Date: Tue, 23 Feb 2021 23:44:04 +0000
Subject: [PATCH v2 07/10] diffcore-rename: add a dir_rename_guess field to dir_rename_info
To: git@vger.kernel.org
Cc: Elijah Newren
From: Elijah Newren

From: Elijah Newren

dir_rename_counts has a mapping of a mapping, in particular, it has
   old_dir => { new_dir => count }
We want a simple mapping of
   old_dir => new_dir
based on which new_dir had the highest count for a given old_dir.
Introduce dir_rename_guess for this purpose.

Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index db569e4a0b0a..d24f104aa81c 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -369,6 +369,7 @@ static int find_exact_renames(struct diff_options *options)
 
 struct dir_rename_info {
 	struct strintmap idx_map;
+	struct strmap dir_rename_guess;
 	struct strmap *dir_rename_count;
 	unsigned setup;
 };
@@ -381,6 +382,24 @@ static void dirname_munge(char *filename)
 	*slash = '\0';
 }
 
+static const char *get_highest_rename_path(struct strintmap *counts)
+{
+	int highest_count = 0;
+	const char *highest_destination_dir = NULL;
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	strintmap_for_each_entry(counts, &iter, entry) {
+		const char *destination_dir = entry->key;
+		intptr_t count = (intptr_t)entry->value;
+		if (count > highest_count) {
+			highest_count = count;
+			highest_destination_dir = destination_dir;
+		}
+	}
+	return highest_destination_dir;
+}
+
 static void increment_count(struct dir_rename_info *info,
 			    char *old_dir,
 			    char *new_dir)
@@ -498,6 +517,8 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
 				       struct strset *dirs_removed,
 				       struct strmap *dir_rename_count)
 {
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
 	int i;
 
 	info->setup = 0;
@@ -511,6 +532,7 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
 		strmap_init(info->dir_rename_count);
 	}
 	strintmap_init_with_options(&info->idx_map, -1, NULL, 0);
+	strmap_init_with_options(&info->dir_rename_guess, NULL, 0);
 
 	/*
 	 * Loop setting up both info->idx_map, and doing setup of
@@ -539,6 +561,23 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
 					 rename_dst[i].p->one->path,
 					 rename_dst[i].p->two->path);
 	}
+
+	/*
+	 * Now we collapse
+	 *    dir_rename_count: old_directory -> {new_directory -> count}
+	 * down to
+	 *    dir_rename_guess: old_directory -> best_new_directory
+	 * where best_new_directory is the one with the highest count.
+	 */
+	strmap_for_each_entry(info->dir_rename_count, &iter, entry) {
+		/* entry->key is source_dir */
+		struct strintmap *counts = entry->value;
+		char *best_newdir;
+
+		best_newdir = xstrdup(get_highest_rename_path(counts));
+		strmap_put(&info->dir_rename_guess, entry->key,
+			   best_newdir);
+	}
 }
 
 void partial_clear_dir_rename_count(struct strmap *dir_rename_count)
@@ -566,6 +605,9 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info,
 	/* idx_map */
 	strintmap_clear(&info->idx_map);
 
+	/* dir_rename_guess */
+	strmap_clear(&info->dir_rename_guess, 1);
+
 	if (!keep_dir_rename_count) {
 		partial_clear_dir_rename_count(info->dir_rename_count);
 		strmap_clear(info->dir_rename_count, 1);

From patchwork Tue Feb 23 23:44:05 2021
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12101133
Date: Tue, 23 Feb 2021 23:44:05 +0000
Subject: [PATCH v2 08/10] diffcore-rename: add a new idx_possible_rename function
To: git@vger.kernel.org
Cc: Elijah Newren
From: Elijah Newren

From: Elijah Newren

find_basename_matches() is great when both the remaining set of possible
rename sources and the remaining set of possible rename destinations have
exactly one file each with a given basename. It allows us to match up
files that have been moved to different directories without changing
filenames.

When basenames are not unique, though, we want to be able to guess which
directories the source files have been moved to. Since this is the job
of directory rename detection, we employ it. However, since it is a
directory rename detection idea, we also limit it to cases where we know
there could have been a directory rename, i.e. where the source directory
has been removed. This has to be signalled by dirs_removed being
non-NULL and containing an entry for the relevant directory.
Since merge-ort.c is the only caller that currently does so, this
optimization is only effective for merge-ort right now. In the future,
this condition could be reconsidered or we could modify other callers to
pass the necessary strset.

Anyway, that's a lot of background so that we can actually describe the
new function. Add an idx_possible_rename() function which combines the
recently added dir_rename_guess and idx_map fields to provide the index
within rename_dst of a potential match for a given file. Future commits
will add checks after calling this function to compare the resulting
'likely rename' candidates to see if the two files meet the elevated
min_basename_score threshold for marking them as actual renames.

Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index d24f104aa81c..1e4a56adde2c 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -374,6 +374,12 @@ struct dir_rename_info {
 	unsigned setup;
 };
 
+static char *get_dirname(const char *filename)
+{
+	char *slash = strrchr(filename, '/');
+	return slash ? xstrndup(filename, slash-filename) : xstrdup("");
+}
+
 static void dirname_munge(char *filename)
 {
 	char *slash = strrchr(filename, '/');
@@ -651,6 +657,81 @@ static const char *get_basename(const char *filename)
 	return base ? base + 1 : filename;
 }
 
+MAYBE_UNUSED
+static int idx_possible_rename(char *filename, struct dir_rename_info *info)
+{
+	/*
+	 * Our comparison of files with the same basename (see
+	 * find_basename_matches() below), is only helpful when after exact
+	 * rename detection we have exactly one file with a given basename
+	 * among the rename sources and also only exactly one file with
+	 * that basename among the rename destinations. When we have
+	 * multiple files with the same basename in either set, we do not
+	 * know which to compare against. However, there are some
+	 * filenames that occur in large numbers (particularly
+	 * build-related filenames such as 'Makefile', '.gitignore', or
+	 * 'build.gradle' that potentially exist within every single
+	 * subdirectory), and for performance we want to be able to quickly
+	 * find renames for these files too.
+	 *
+	 * The reason basename comparisons are a useful heuristic is that it
+	 * is common for people to move files across directories while keeping
+	 * their filename the same. If we had a way of determining or even
+	 * making a good educated guess about which directory these non-unique
+	 * basename files had moved the file to, we could check it.
+	 * Luckily...
+	 *
+	 * When an entire directory is in fact renamed, we have two factors
+	 * helping us out:
+	 *   (a) the original directory disappeared giving us a hint
+	 *       about when we can apply an extra heuristic.
+	 *   (b) we often have several files within that directory and
+	 *       subdirectories that are renamed without changes
+	 * So, rules for a heuristic:
+	 *   (0) If the basename matches are non-unique (the condition under
+	 *       which this function is called) AND
+	 *   (1) the directory in which the file was found has disappeared
+	 *       (i.e. dirs_removed is non-NULL and has a relevant entry) THEN
+	 *   (2) use exact renames of files within the directory to determine
+	 *       where the directory is likely to have been renamed to. IF
+	 *       there is at least one exact rename from within that
+	 *       directory, we can proceed.
+	 *   (3) If there are multiple places the directory could have been
+	 *       renamed to based on exact renames, ignore all but one of them.
+	 *       Just use the destination with the most renames going to it.
+	 *   (4) Check if applying that directory rename to the original file
+	 *       would result in a destination filename that is in the
+	 *       potential rename set. If so, return the index of the
+	 *       destination file (the index within rename_dst).
+	 *   (5) Compare the original file and returned destination for
+	 *       similarity, and if they are sufficiently similar, record the
+	 *       rename.
+	 *
+	 * This function, idx_possible_rename(), is only responsible for (4).
+	 * The conditions/steps in (1)-(3) are handled via setting up
+	 * dir_rename_count and dir_rename_guess in
+	 * initialize_dir_rename_info().  Steps (0) and (5) are handled by
+	 * the caller of this function.
+	 */
+	char *old_dir, *new_dir, *new_path;
+	int idx;
+
+	if (!info->setup)
+		return -1;
+
+	old_dir = get_dirname(filename);
+	new_dir = strmap_get(&info->dir_rename_guess, old_dir);
+	free(old_dir);
+	if (!new_dir)
+		return -1;
+
+	new_path = xstrfmt("%s/%s", new_dir, get_basename(filename));
+
+	idx = strintmap_get(&info->idx_map, new_path);
+	free(new_path);
+	return idx;
+}
+
 static int find_basename_matches(struct diff_options *options,
 				 int minimum_score,
 				 struct dir_rename_info *info,

From patchwork Tue Feb 23 23:44:06 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12101137
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
        aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
        DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
        HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
        MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham
        autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
        by smtp.lore.kernel.org (Postfix) with ESMTP id 4405BC433E0
        for ; Wed, 24 Feb 2021 00:00:31 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
        by mail.kernel.org (Postfix) with ESMTP id 0E73C64E7C
        for ; Wed, 24 Feb 2021 00:00:31 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S234188AbhBXAAE (ORCPT ); Tue, 23 Feb 2021 19:00:04
        -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36210 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S232008AbhBWXro (ORCPT );
        Tue, 23 Feb 2021 18:47:44 -0500
Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com
        [IPv6:2a00:1450:4864:20::42b])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D482CC0617AA
        for ; Tue, 23 Feb 2021 15:44:15 -0800 (PST)
Received: by mail-wr1-x42b.google.com with SMTP id v15so130739wrx.4
        for ; Tue, 23 Feb 2021 15:44:15 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=message-id:in-reply-to:references:from:date:subject:fcc
        :content-transfer-encoding:mime-version:to:cc;
        bh=Paoo+pQzJ8Xi5K9u8+x+UaLgBVJk1KVJ1L2MCTL/81Y=;
        b=MDRcqMAglAmWFssoaNKL2Jkp80c8z5aSWE6VpwQEqhXfMR+Fx2xU9UOnpRdoUXJfhP
        Xw1HQYsfuLaACFIfGUPlXa95joS5FNAbPsRznE2otZFvBGapepL5ZzNs6LjXCJviJJGb
        CX06wLwItSFgyUBBA5MsJ3KO8h2Ozv0M0pHXZgakPcWOKvPqzUduJm/CQJgwo80z9WRL
        XGNE45TUWC9kgoljeN3wuiw1dZCLo9UwkxZV6ziVUJWWSOUy6nOaRM2+htbNMhfnQLLM
        NLX4oS6lsFJhYOKLWjfquBRU+NjerCgfqZsC2sS+Kb+JAiq9zuXZ/hQM3rNY2lyH/c3V
        4TCw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:message-id:in-reply-to:references:from:date
        :subject:fcc:content-transfer-encoding:mime-version:to:cc;
        bh=Paoo+pQzJ8Xi5K9u8+x+UaLgBVJk1KVJ1L2MCTL/81Y=;
        b=dmuQW3c6TS/ex0MLlBMD5T+HRCyZcKJh0fENqqqAtrpdZusJCFCgxKE2AWfZlZ5unx
        OFzlf1dciQwriWxtRljnO16tvymSfODhvgkpGYtMxWJvY8Y0z32s43k8dKN1CbOOLUpn
        d7uavhcYUPj6e4PqFFdH1NxhDhaZL9pSN3H2jL8diT+SOxlY18mU+SN0r9wM5eAyz5D1
        Eqf4e9ZKWRa/43b2OD5a5UrU0a1fgG/uDIy90VEg5Pa60UYmLROqx9QrwnyjLu4mjgUo
        9mSfIy6uj7TEIbY/IcEakxT1tZ+FBFi7rA6HdmIwTwtydq8gn3DEPIh/L45Qt45yITTZ
        v9nw==
X-Gm-Message-State: AOAM531oPGpCp+A3HVIe4I2bypDThs9ZNByB/okRVsOjzm0Y7OaYdV8I
        wMi+9rhoQL8uSE4O0UDtVVXRJe9EJ9M=
X-Google-Smtp-Source: ABdhPJwEV/CYS0SaBqBBPOg/u1dXQDnDWYSzZEW9qENLgybP32LFEyIW5RhuP47kqM9x0qgt81CUvA==
X-Received: by 2002:adf:c14a:: with SMTP id
        w10mr3123648wre.282.1614123854623;
        Tue, 23 Feb 2021 15:44:14 -0800 (PST)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id d20sm395437wrc.12.2021.02.23.15.44.14
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 23 Feb 2021 15:44:14 -0800 (PST)
Message-Id: <4e095ea7c4390cb47828bbba50af876249983870.1614123848.git.gitgitgadget@gmail.com>
In-Reply-To:
References:
Date: Tue, 23 Feb 2021 23:44:06 +0000
Subject: [PATCH v2 09/10] diffcore-rename: limit dir_rename_counts computation
        to relevant dirs
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Elijah Newren , Elijah Newren
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren

From: Elijah Newren

We are using dir_rename_counts to count the number of other directories
that files within a directory moved to.  We only need this information
for directories that disappeared, though, so we can return early from
update_dir_rename_counts() for other paths.

While dirs_removed provides the relevant information for us right now,
we introduce a new info->relevant_source_dirs parameter because future
optimizations will want to change how things are called somewhat.

Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 1e4a56adde2c..5de4497e04fa 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -371,6 +371,7 @@ struct dir_rename_info {
 	struct strintmap idx_map;
 	struct strmap dir_rename_guess;
 	struct strmap *dir_rename_count;
+	struct strset *relevant_source_dirs;
 	unsigned setup;
 };
 
@@ -460,7 +461,13 @@ static void update_dir_rename_counts(struct dir_rename_info *info,
 		return;
 
 	while (1) {
+		/* Get old_dir, skip if its directory isn't relevant. */
 		dirname_munge(old_dir);
+		if (info->relevant_source_dirs &&
+		    !strset_contains(info->relevant_source_dirs, old_dir))
+			break;
+
+		/* Get new_dir */
 		dirname_munge(new_dir);
 
 		/*
@@ -540,6 +547,9 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
 	strintmap_init_with_options(&info->idx_map, -1, NULL, 0);
 	strmap_init_with_options(&info->dir_rename_guess, NULL, 0);
 
+	/* Setup info->relevant_source_dirs */
+	info->relevant_source_dirs = dirs_removed;
+
 	/*
 	 * Loop setting up both info->idx_map, and doing setup of
 	 * info->dir_rename_count.

From patchwork Tue Feb 23 23:44:07 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12101135
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
        aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
        DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
        HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
        MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham
        autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
        by smtp.lore.kernel.org (Postfix) with ESMTP id 74565C433DB
        for ; Wed, 24 Feb 2021 00:00:08 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
        by mail.kernel.org (Postfix) with ESMTP id 4014364EC9
        for ; Wed, 24 Feb 2021 00:00:08 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S234170AbhBWX7G (ORCPT ); Tue, 23 Feb 2021 18:59:06 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36212 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S233609AbhBWXro (ORCPT );
        Tue, 23 Feb 2021 18:47:44 -0500
Received: from mail-wr1-x42d.google.com (mail-wr1-x42d.google.com
        [IPv6:2a00:1450:4864:20::42d])
        by lindbergh.monkeyblade.net (Postfix) with
        ESMTPS id 62EBEC0617AB
        for ; Tue, 23 Feb 2021 15:44:16 -0800 (PST)
Received: by mail-wr1-x42d.google.com with SMTP id v15so130749wrx.4
        for ; Tue, 23 Feb 2021 15:44:16 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=message-id:in-reply-to:references:from:date:subject:mime-version
        :content-transfer-encoding:fcc:to:cc;
        bh=l/z2OvLd3svp846r/v6fGN490jMZ+a4VTznQAsWEl60=;
        b=kFF9C2j5ITKsKvA6ZZOcbgQb+VYSUuG9zQzn3BQBxb+pWSjBmf0Y/FWpNRpILahEBc
        fMJdG4kOCdAIV3ts+8sImmnTOSDEF6TPEqknMHUd7BP4q2Lh6TRayC5mH3318m3zwIAz
        UJqYJQZJ5P6MB7D/7YsLQWqGEzyWUuJeEDr29M7r2G8xif8Vk57NZvAk9UgYUCu9/shP
        YIOaCAPjnurTMEfJFRozKvm+SaU3J1MNAGe15AU6VE71aJ3mekornicZRUOFzC14M/7M
        tCsyRIt4j1MbY11Vkl5WZMEgYNgFJwAl1p+bwaQkIHyhS8PUkaATyy7t3741XcSs4iRd
        Wc5A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:message-id:in-reply-to:references:from:date
        :subject:mime-version:content-transfer-encoding:fcc:to:cc;
        bh=l/z2OvLd3svp846r/v6fGN490jMZ+a4VTznQAsWEl60=;
        b=ntfigzS2y4JHK52nlJw8iLWhwR2pPAnN6mKyxMbrccGmtD3Sj0lEPXB8klIDnGmPUe
        SEO3CgjDIbCySX9kaf4efdswcDPBsWn5vhQnOrm0p5y7ZqlFs5hrTbmlxqb3QgjLZk7w
        VVljJQ/My8eyGYOnwCG1zVWoWgQskvnEtI53qwlCzp8xW+xY97eReIgduFMhsm/+Qv4N
        d9tRT64bWESDdbxLjyS18HGgQbB54PaXFC8+t/5ccZAblCsrKzrGediAPYIM0ptjVaU2
        YzE3oaZam/RAzW6bkMwvcQgd8hBXUxg4WJt0qCkyolFXeCRtvI+EDrYaP5x0YoCXA07P
        LjRQ==
X-Gm-Message-State: AOAM530d4cnWmFMBm0wvNgSRV79EVurmbKm8T1Lvu0x2rBV8ckeK2vf/
        hIpB5Fpd6r94otBBNogZAquQJz2ULIQ=
X-Google-Smtp-Source: ABdhPJzoXCD6kKA7p0MMq8EqHfkyMqoTXve7HxkOwmWBDrVAyXpgJCF0taIQvEKBItOuE8xxQEfnXQ==
X-Received: by 2002:a05:6000:1542:: with SMTP id 2mr29045767wry.356.1614123855212;
        Tue, 23 Feb 2021 15:44:15 -0800 (PST)
Received: from [127.0.0.1] ([13.74.141.28])
        by smtp.gmail.com with ESMTPSA id v10sm366749wrq.22.2021.02.23.15.44.14
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 23 Feb 2021 15:44:14 -0800 (PST)
Message-Id:
        <805c101cfd849db3a5defb30775c7abbfec99f68.1614123848.git.gitgitgadget@gmail.com>
In-Reply-To:
References:
Date: Tue, 23 Feb 2021 23:44:07 +0000
Subject: [PATCH v2 10/10] diffcore-rename: use directory rename guided basename
        comparisons
MIME-Version: 1.0
Fcc: Sent
To: git@vger.kernel.org
Cc: Elijah Newren , Elijah Newren
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Elijah Newren

From: Elijah Newren

Hook the work from the last several patches together so that when
basenames in the sets of possible remaining rename sources or
destinations aren't unique, we can guess which directory source files
were renamed into.  When that guess gives us a pairing of files, and
those files are sufficiently similar, we record the two files as a
rename and remove them from the large matrix of comparisons for inexact
rename detection.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       12.775 s ±  0.062 s    12.596 s ±  0.061 s
    mega-renames:    188.754 s ±  0.284 s   130.465 s ±  0.259 s
    just-one-mega:     5.599 s ±  0.019 s     3.958 s ±  0.010 s

Signed-off-by: Elijah Newren
---
 Documentation/gitdiffcore.txt |  2 +-
 diffcore-rename.c             | 32 +++++++++++++++++++++++---------
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 80fcf9542441..8673a5c5b2f2 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -186,7 +186,7 @@ mark a file pair as a rename and stop considering other candidates for
 better matches.  At most, one comparison is done per file in this
 preliminary pass; so if there are several remaining ext.txt files
 throughout the directory hierarchy after exact rename detection, this
-preliminary step will be skipped for those files.
+preliminary step may be skipped for those files.
 
 Note.  When the "-C" option is used with `--find-copies-harder`
 option, 'git diff-{asterisk}' commands feed unmodified filepairs to
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 5de4497e04fa..70a484b9b63e 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -667,7 +667,6 @@ static const char *get_basename(const char *filename)
 	return base ? base + 1 : filename;
 }
 
-MAYBE_UNUSED
 static int idx_possible_rename(char *filename, struct dir_rename_info *info)
 {
 	/*
@@ -780,8 +779,6 @@ static int find_basename_matches(struct diff_options *options,
 	int i, renames = 0;
 	struct strintmap sources;
 	struct strintmap dests;
-	struct hashmap_iter iter;
-	struct strmap_entry *entry;
 
 	/*
 	 * The prefeteching stuff wants to know if it can skip prefetching
@@ -831,17 +828,34 @@ static int find_basename_matches(struct diff_options *options,
 	}
 
 	/* Now look for basename matchups and do similarity estimation */
-	strintmap_for_each_entry(&sources, &iter, entry) {
-		const char *base = entry->key;
-		intptr_t src_index = (intptr_t)entry->value;
+	for (i = 0; i < rename_src_nr; ++i) {
+		char *filename = rename_src[i].p->one->path;
+		const char *base = NULL;
+		intptr_t src_index;
 		intptr_t dst_index;
-		if (src_index == -1)
-			continue;
 
-		if (0 <= (dst_index = strintmap_get(&dests, base))) {
+		/* Is this basename unique among remaining sources? */
+		base = get_basename(filename);
+		src_index = strintmap_get(&sources, base);
+		assert(src_index == -1 || src_index == i);
+
+		if (strintmap_contains(&dests, base)) {
 			struct diff_filespec *one, *two;
 			int score;
 
+			/* Find a matching destination, if possible */
+			dst_index = strintmap_get(&dests, base);
+			if (src_index == -1 || dst_index == -1) {
+				src_index = i;
+				dst_index = idx_possible_rename(filename, info);
+			}
+			if (dst_index == -1)
+				continue;
+
+			/* Ignore this dest if already used in a rename */
+			if (rename_dst[dst_index].is_rename)
+				continue; /* already used previously */
+
 			/* Estimate the similarity */
 			one = rename_src[src_index].p->one;
 			two = rename_dst[dst_index].p->two;