Date: Sat, 27 Feb 2021 00:30:39 +0000
Subject: [PATCH v4 01/10] diffcore-rename: use directory rename guided basename comparisons
To: git@vger.kernel.org
Cc: Derrick Stolee, Elijah Newren, Junio C Hamano, Ævar Arnfjörð Bjarmason

From: Elijah Newren

A previous commit noted that it is very common for people to move files across directories
while keeping their filename the same. The last few commits took advantage of this and showed that we can accelerate rename detection significantly using basenames; since files with the same basename serve as likely rename candidates, we can check those first and remove them from the rename candidate pool if they are sufficiently similar.

Unfortunately, the previous optimization was limited by the fact that the remaining basenames after exact rename detection are not always unique. Many repositories have hundreds of build files with the same name (e.g. Makefile, .gitignore, build.gradle, etc.), and may even have hundreds of source files with the same name. (For example, the Linux kernel has 100 setup.c, 87 irq.c, and 112 core.c files. A repository at $DAYJOB has a lot of ObjectFactory.java and Plugin.java files.)

For these files with non-unique basenames, we are faced with the task of attempting to determine or guess which directory they may have been relocated to. Such a task is precisely the job of directory rename detection. However, there are two catches: (1) the directory rename detection code has traditionally been part of the merge machinery rather than diffcore-rename.c, and (2) directory rename detection currently runs after regular rename detection is complete. The first catch is just an implementation issue that can be overcome by some code shuffling. The second requires us to add a further approximation: we only have access to exact renames at this point, so we need to do directory rename detection based on just exact renames. In some cases we won't have exact renames, in which case this extra optimization won't apply. We also choose not to apply the optimization unless we know that the underlying directory was removed, which will require extra data to be passed in to diffcore_rename_extended().
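
To make the limitation concrete, the pairing rule that the earlier basename optimization depends on can be sketched as follows. This is a simplified illustration, not the diffcore-rename.c code (which keeps strintmaps keyed by basename rather than scanning arrays), and the names basename_of() and unique_basename_match() are hypothetical. A source is paired only when its basename occurs once among the remaining sources and exactly once among the destinations.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper mirroring get_basename(): text after the last '/'. */
static const char *basename_of(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/*
 * Return the index in dsts[] of the single destination sharing the
 * basename of srcs[src_idx], or -1 when that basename occurs more than
 * once among the sources, or zero or several times among destinations.
 */
static int unique_basename_match(const char *const srcs[], int nsrc,
				 const char *const dsts[], int ndst,
				 int src_idx)
{
	const char *base;
	int i, match = -1;

	assert(src_idx >= 0 && src_idx < nsrc);
	base = basename_of(srcs[src_idx]);

	for (i = 0; i < nsrc; i++)
		if (i != src_idx && !strcmp(basename_of(srcs[i]), base))
			return -1; /* ambiguous among sources */

	for (i = 0; i < ndst; i++) {
		if (strcmp(basename_of(dsts[i]), base))
			continue;
		if (match != -1)
			return -1; /* ambiguous among destinations */
		match = i;
	}
	return match;
}
```

With sources old/foo.c, a/Makefile, b/Makefile and destinations new/foo.c, x/Makefile, y/Makefile, only foo.c gets a candidate; both Makefiles yield -1, which is the case a directory rename guess is meant to cover.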

Also, even if we get a prediction about which directory a file may have relocated to, we will still need to check to see if there is a file in the predicted directory, and then compare the two files to see if they meet the higher min_basename_score threshold required for marking the two files as renames.

This commit introduces an idx_possible_rename() function which will do this directory rename detection for us and give us the index within rename_dst of the resulting filename. For now, this function is hardcoded to return -1 (not found) and just hooks up how its results would be used once we have a more complete implementation in place.

Reviewed-by: Derrick Stolee
Signed-off-by: Elijah Newren
---
 Documentation/gitdiffcore.txt |  2 +-
 diffcore-rename.c             | 42 ++++++++++++++++++++++++++++-------
 2 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 80fcf9542441..8673a5c5b2f2 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -186,7 +186,7 @@ mark a file pair as a rename and stop considering other candidates for
 better matches. At most, one comparison is done per file in this
 preliminary pass; so if there are several remaining ext.txt files
 throughout the directory hierarchy after exact rename detection, this
-preliminary step will be skipped for those files.
+preliminary step may be skipped for those files.
 
 Note. When the "-C" option is used with `--find-copies-harder`
 option, 'git diff-{asterisk}' commands feed unmodified filepairs to
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 41558185ae1d..b3055683bac2 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -379,6 +379,12 @@ static const char *get_basename(const char *filename)
 	return base ? base + 1 : filename;
 }
 
+static int idx_possible_rename(char *filename)
+{
+	/* Unconditionally return -1, "not found", for now */
+	return -1;
+}
+
 static int find_basename_matches(struct diff_options *options,
 				 int minimum_score)
 {
@@ -415,8 +421,6 @@ static int find_basename_matches(struct diff_options *options,
 	int i, renames = 0;
 	struct strintmap sources;
 	struct strintmap dests;
-	struct hashmap_iter iter;
-	struct strmap_entry *entry;
 
 	/*
 	 * The prefeteching stuff wants to know if it can skip prefetching
@@ -466,17 +470,39 @@ static int find_basename_matches(struct diff_options *options,
 	}
 
 	/* Now look for basename matchups and do similarity estimation */
-	strintmap_for_each_entry(&sources, &iter, entry) {
-		const char *base = entry->key;
-		intptr_t src_index = (intptr_t)entry->value;
+	for (i = 0; i < rename_src_nr; ++i) {
+		char *filename = rename_src[i].p->one->path;
+		const char *base = NULL;
+		intptr_t src_index;
 		intptr_t dst_index;
-		if (src_index == -1)
-			continue;
 
-		if (0 <= (dst_index = strintmap_get(&dests, base))) {
+		/*
+		 * If the basename is unique among remaining sources, then
+		 * src_index will equal 'i' and we can attempt to match it
+		 * to a unique basename in the destinations. Otherwise,
+		 * use directory rename heuristics, if possible.
+		 */
+		base = get_basename(filename);
+		src_index = strintmap_get(&sources, base);
+		assert(src_index == -1 || src_index == i);
+
+		if (strintmap_contains(&dests, base)) {
 			struct diff_filespec *one, *two;
 			int score;
 
+			/* Find a matching destination, if possible */
+			dst_index = strintmap_get(&dests, base);
+			if (src_index == -1 || dst_index == -1) {
+				src_index = i;
+				dst_index = idx_possible_rename(filename);
+			}
+			if (dst_index == -1)
+				continue;
+
+			/* Ignore this dest if already used in a rename */
+			if (rename_dst[dst_index].is_rename)
+				continue; /* already used previously */
+
 			/* Estimate the similarity */
 			one = rename_src[src_index].p->one;
 			two = rename_dst[dst_index].p->two;

Date: Sat, 27 Feb 2021 00:30:40 +0000
Subject: [PATCH v4 02/10] diffcore-rename: provide basic implementation of idx_possible_rename()

From: Elijah Newren

Add a new struct dir_rename_info with various values we need inside our idx_possible_rename() function introduced in the previous commit. Add a basic implementation for this function showing how we plan to use the variables, but which will just return early with a value of -1 (not found) when those variables are not set up.

Future commits will do the work necessary to set up those other variables so that idx_possible_rename() does not always return -1.

Reviewed-by: Derrick Stolee
Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 100 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 94 insertions(+), 6 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index b3055683bac2..edb0effb6ef4 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -367,6 +367,19 @@ static int find_exact_renames(struct diff_options *options)
 	return renames;
 }
 
+struct dir_rename_info {
+	struct strintmap idx_map;
+	struct strmap dir_rename_guess;
+	struct strmap *dir_rename_count;
+	unsigned setup;
+};
+
+static char *get_dirname(const char *filename)
+{
+	char *slash = strrchr(filename, '/');
+	return slash ? xstrndup(filename, slash - filename) : xstrdup("");
+}
+
 static const char *get_basename(const char *filename)
 {
 	/*
@@ -379,14 +392,86 @@ static const char *get_basename(const char *filename)
 	return base ? base + 1 : filename;
 }
 
-static int idx_possible_rename(char *filename)
+static int idx_possible_rename(char *filename, struct dir_rename_info *info)
 {
-	/* Unconditionally return -1, "not found", for now */
-	return -1;
+	/*
+	 * Our comparison of files with the same basename (see
+	 * find_basename_matches() below), is only helpful when after exact
+	 * rename detection we have exactly one file with a given basename
+	 * among the rename sources and also only exactly one file with
+	 * that basename among the rename destinations. When we have
+	 * multiple files with the same basename in either set, we do not
+	 * know which to compare against. However, there are some
+	 * filenames that occur in large numbers (particularly
+	 * build-related filenames such as 'Makefile', '.gitignore', or
+	 * 'build.gradle' that potentially exist within every single
+	 * subdirectory), and for performance we want to be able to quickly
+	 * find renames for these files too.
+	 *
+	 * The reason basename comparisons are a useful heuristic was that it
+	 * is common for people to move files across directories while keeping
+	 * their filename the same. If we had a way of determining or even
+	 * making a good educated guess about which directory these non-unique
+	 * basename files had moved the file to, we could check it.
+	 * Luckily...
+	 *
+	 * When an entire directory is in fact renamed, we have two factors
+	 * helping us out:
+	 *   (a) the original directory disappeared giving us a hint
+	 *       about when we can apply an extra heuristic.
+	 *   (b) we often have several files within that directory and
+	 *       subdirectories that are renamed without changes
+	 * So, rules for a heuristic:
+	 *   (0) If the basename matches are non-unique (the condition under
+	 *       which this function is called) AND
+	 *   (1) the directory in which the file was found has disappeared
+	 *       (i.e. dirs_removed is non-NULL and has a relevant entry) THEN
+	 *   (2) use exact renames of files within the directory to determine
+	 *       where the directory is likely to have been renamed to. IF
+	 *       there is at least one exact rename from within that
+	 *       directory, we can proceed.
+	 *   (3) If there are multiple places the directory could have been
+	 *       renamed to based on exact renames, ignore all but one of them.
+	 *       Just use the destination with the most renames going to it.
+	 *   (4) Check if applying that directory rename to the original file
+	 *       would result in a destination filename that is in the
+	 *       potential rename set. If so, return the index of the
+	 *       destination file (the index within rename_dst).
+	 *   (5) Compare the original file and returned destination for
+	 *       similarity, and if they are sufficiently similar, record the
+	 *       rename.
+	 *
+	 * This function, idx_possible_rename(), is only responsible for (4).
+	 * The conditions/steps in (1)-(3) will be handled via setting up
+	 * dir_rename_count and dir_rename_guess in a future
+	 * initialize_dir_rename_info() function. Steps (0) and (5) are
+	 * handled by the caller of this function.
+	 */
+	char *old_dir, *new_dir;
+	struct strbuf new_path = STRBUF_INIT;
+	int idx;
+
+	if (!info->setup)
+		return -1;
+
+	old_dir = get_dirname(filename);
+	new_dir = strmap_get(&info->dir_rename_guess, old_dir);
+	free(old_dir);
+	if (!new_dir)
+		return -1;
+
+	strbuf_addstr(&new_path, new_dir);
+	strbuf_addch(&new_path, '/');
+	strbuf_addstr(&new_path, get_basename(filename));
+
+	idx = strintmap_get(&info->idx_map, new_path.buf);
+	strbuf_release(&new_path);
+	return idx;
 }
 
 static int find_basename_matches(struct diff_options *options,
-				 int minimum_score)
+				 int minimum_score,
+				 struct dir_rename_info *info)
 {
 	/*
 	 * When I checked in early 2020, over 76% of file renames in linux
@@ -494,7 +579,7 @@ static int find_basename_matches(struct diff_options *options,
 			dst_index = strintmap_get(&dests, base);
 			if (src_index == -1 || dst_index == -1) {
 				src_index = i;
-				dst_index = idx_possible_rename(filename);
+				dst_index = idx_possible_rename(filename, info);
 			}
 			if (dst_index == -1)
 				continue;
@@ -677,8 +762,10 @@ void diffcore_rename(struct diff_options *options)
 	int num_destinations, dst_cnt;
 	int num_sources, want_copies;
 	struct progress *progress = NULL;
+	struct dir_rename_info info;
 
 	trace2_region_enter("diff", "setup", options->repo);
+	info.setup = 0;
 	want_copies = (detect_rename == DIFF_DETECT_COPY);
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
@@ -774,7 +861,8 @@ void diffcore_rename(struct diff_options *options)
 	/* Utilize file basenames to quickly find renames. */
 	trace2_region_enter("diff", "basename matches", options->repo);
 	rename_count += find_basename_matches(options,
-					      min_basename_score);
+					      min_basename_score,
+					      &info);
 	trace2_region_leave("diff", "basename matches", options->repo);
 
 	/*
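
Step (4) above, taken in isolation, amounts to a path rewrite followed by a lookup. The following is a minimal sketch under stated assumptions rather than the patch's implementation: plain arrays and linear scans stand in for info->dir_rename_guess and info->idx_map, fixed-size buffers stand in for strbuf, the info->setup check is omitted, and struct dir_guess, lookup_guess(), and predict_rename_idx() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one entry of info->dir_rename_guess. */
struct dir_guess {
	const char *old_dir;
	const char *new_dir;
};

/* Return the guessed new directory for dir, or NULL if there is none. */
static const char *lookup_guess(const struct dir_guess *g, int n,
				const char *dir)
{
	for (int i = 0; i < n; i++)
		if (!strcmp(g[i].old_dir, dir))
			return g[i].new_dir;
	return NULL;
}

/* Apply the guessed directory rename to filename; return its dsts[] index or -1. */
static int predict_rename_idx(const char *filename,
			      const struct dir_guess *guesses, int nguess,
			      const char *const dsts[], int ndst)
{
	const char *slash = strrchr(filename, '/');
	const char *base = slash ? slash + 1 : filename;
	size_t dirlen = slash ? (size_t)(slash - filename) : 0;
	char dir[256], candidate[512];
	const char *new_dir;

	/* Like get_dirname(): everything before the last '/', or "". */
	assert(dirlen < sizeof(dir));
	memcpy(dir, filename, dirlen);
	dir[dirlen] = '\0';

	new_dir = lookup_guess(guesses, nguess, dir);
	if (!new_dir)
		return -1;

	/* Like the strbuf code: new_dir + "/" + basename. */
	snprintf(candidate, sizeof(candidate), "%s/%s", new_dir, base);

	/* Linear scan standing in for the idx_map lookup. */
	for (int i = 0; i < ndst; i++)
		if (!strcmp(dsts[i], candidate))
			return i;
	return -1;
}
```

A hit only nominates a candidate; as in step (5), the caller must still run the similarity check against min_basename_score before recording a rename.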

Date: Sat, 27 Feb 2021 00:30:41 +0000
Subject: [PATCH v4 03/10] diffcore-rename: add a mapping of destination names to their indices

From: Elijah Newren

Compute a mapping of full filename to the index within rename_dst where that filename is found, and store it in idx_map. idx_possible_rename() needs this to quickly find an array entry in rename_dst given the pathname.

While at it, add placeholder initializations for dir_rename_count and dir_rename_guess; these will be more fully populated in subsequent commits.

Reviewed-by: Derrick Stolee
Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index edb0effb6ef4..8eeb8c73664c 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -380,6 +380,45 @@ static char *get_dirname(const char *filename)
 	return slash ? xstrndup(filename, slash - filename) : xstrdup("");
 }
 
+static void initialize_dir_rename_info(struct dir_rename_info *info)
+{
+	int i;
+
+	info->setup = 1;
+
+	strintmap_init_with_options(&info->idx_map, -1, NULL, 0);
+	strmap_init_with_options(&info->dir_rename_guess, NULL, 0);
+	info->dir_rename_count = NULL;
+
+	/*
+	 * Loop setting up both info->idx_map.
+	 */
+	for (i = 0; i < rename_dst_nr; ++i) {
+		/*
+		 * For non-renamed files, make idx_map contain mapping of
+		 *   filename -> index (index within rename_dst, that is)
+		 */
+		if (!rename_dst[i].is_rename) {
+			char *filename = rename_dst[i].p->two->path;
+			strintmap_set(&info->idx_map, filename, i);
+		}
+	}
+}
+
+static void cleanup_dir_rename_info(struct dir_rename_info *info)
+{
+	if (!info->setup)
+		return;
+
+	/* idx_map */
+	strintmap_clear(&info->idx_map);
+
+	/* dir_rename_guess */
+	strmap_clear(&info->dir_rename_guess, 1);
+
+	/* Nothing to do for dir_rename_count, yet */
+}
+
 static const char *get_basename(const char *filename)
 {
 	/*
@@ -858,6 +897,11 @@ void diffcore_rename(struct diff_options *options)
 	remove_unneeded_paths_from_src(want_copies);
 	trace2_region_leave("diff", "cull after exact", options->repo);
 
+	/* Preparation for basename-driven matching. */
+	trace2_region_enter("diff", "dir rename setup", options->repo);
+	initialize_dir_rename_info(&info);
+	trace2_region_leave("diff", "dir rename setup", options->repo);
+
 	/* Utilize file basenames to quickly find renames. */
 	trace2_region_enter("diff", "basename matches", options->repo);
 	rename_count += find_basename_matches(options,
@@ -1026,6 +1070,7 @@ void diffcore_rename(struct diff_options *options)
 		if (rename_dst[i].filespec_to_free)
 			free_filespec(rename_dst[i].filespec_to_free);
 
+	cleanup_dir_rename_info(&info);
 	FREE_AND_NULL(rename_dst);
 	rename_dst_nr = rename_dst_alloc = 0;
 	FREE_AND_NULL(rename_src);
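
The dir_rename_guess placeholder is meant to be filled in by steps (2) and (3) described in idx_possible_rename(): look at exact renames out of a directory and keep the destination directory that received the most of them. A hypothetical, simplified sketch of that vote follows. It is not the series' code: it credits only a file's immediate parent directory (update_dir_rename_counts() also credits ancestor directories while the trailing path components agree), it skips the dirs_removed check, and struct exact_rename, dir_of(), guess_dir_rename(), and MAX_CANDIDATES are invented for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_CANDIDATES 16 /* hypothetical cap on distinct target dirs */

/* Hypothetical record of one exact rename: old path -> new path. */
struct exact_rename {
	const char *old_path;
	const char *new_path;
};

/* Copy the directory part of path (everything before the last '/'). */
static void dir_of(const char *path, char *out, size_t outsz)
{
	const char *slash = strrchr(path, '/');
	size_t len = slash ? (size_t)(slash - path) : 0;

	assert(len < outsz);
	memcpy(out, path, len);
	out[len] = '\0';
}

/*
 * Among exact renames of files directly inside old_dir, count where each
 * one went and copy the most popular destination directory into best.
 * Returns that directory's count, or 0 if no rename left old_dir.
 */
static int guess_dir_rename(const struct exact_rename *r, int n,
			    const char *old_dir, char *best, size_t bestsz)
{
	char dirs[MAX_CANDIDATES][256];
	int counts[MAX_CANDIDATES] = { 0 };
	int ndirs = 0, best_i = -1;
	char tmp[256];

	for (int i = 0; i < n; i++) {
		int j;

		dir_of(r[i].old_path, tmp, sizeof(tmp));
		if (strcmp(tmp, old_dir))
			continue; /* not a rename out of old_dir */
		dir_of(r[i].new_path, tmp, sizeof(tmp));
		for (j = 0; j < ndirs; j++)
			if (!strcmp(dirs[j], tmp))
				break;
		if (j == ndirs) {
			if (ndirs == MAX_CANDIDATES)
				continue; /* sketch: ignore overflow */
			strcpy(dirs[ndirs++], tmp);
		}
		counts[j]++;
		if (best_i == -1 || counts[j] > counts[best_i])
			best_i = j; /* ties keep the earlier winner */
	}
	if (best_i == -1)
		return 0;
	assert(strlen(dirs[best_i]) < bestsz);
	strcpy(best, dirs[best_i]);
	return counts[best_i];
}
```

For instance, if old/a.c and old/b.c were renamed into new/ while old/c.c went to other/, then new/ wins the vote two to one and becomes the guess for old/.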

Date: Sat, 27 Feb 2021 00:30:42 +0000
Subject: [PATCH v4 04/10] Move computation of dir_rename_count from merge-ort to
 diffcore-rename

From: Elijah Newren

Move the computation of dir_rename_count from merge-ort.c to diffcore-rename.c, making slight adjustments to the data structures based on the move. While the diffstat looks large, viewing this commit with --color-moved makes it clear that only about 20 lines changed.

With this patch, the computation of dir_rename_count is still only done after inexact rename detection, but subsequent commits will add a preliminary computation of dir_rename_count after exact rename detection, followed by some updates after inexact rename detection.

Reviewed-by: Derrick Stolee
Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 138 +++++++++++++++++++++++++++++++++++++++++++++-
 diffcore.h        |   5 ++
 merge-ort.c       | 132 +-------------------------------------------
 3 files changed, 145 insertions(+), 130 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 8eeb8c73664c..39e23d57e7bc 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -380,6 +380,129 @@ static char *get_dirname(const char *filename)
 	return slash ? xstrndup(filename, slash - filename) : xstrdup("");
 }
 
+static void dirname_munge(char *filename)
+{
+	char *slash = strrchr(filename, '/');
+	if (!slash)
+		slash = filename;
+	*slash = '\0';
+}
+
+static void increment_count(struct strmap *dir_rename_count,
+			    char *old_dir,
+			    char *new_dir)
+{
+	struct strintmap *counts;
+	struct strmap_entry *e;
+
+	/* Get the {new_dirs -> counts} mapping using old_dir */
+	e = strmap_get_entry(dir_rename_count, old_dir);
+	if (e) {
+		counts = e->value;
+	} else {
+		counts = xmalloc(sizeof(*counts));
+		strintmap_init_with_options(counts, 0, NULL, 1);
+		strmap_put(dir_rename_count, old_dir, counts);
+	}
+
+	/* Increment the count for new_dir */
+	strintmap_incr(counts, new_dir, 1);
+}
+
+static void update_dir_rename_counts(struct strmap *dir_rename_count,
+				     struct strset *dirs_removed,
+				     const char *oldname,
+				     const char *newname)
+{
+	char *old_dir = xstrdup(oldname);
+	char *new_dir = xstrdup(newname);
+	char new_dir_first_char = new_dir[0];
+	int first_time_in_loop = 1;
+
+	while (1) {
+		dirname_munge(old_dir);
+		dirname_munge(new_dir);
+
+		/*
+		 * When renaming
+		 *   "a/b/c/d/e/foo.c" -> "a/b/some/thing/else/e/foo.c"
+		 * then this suggests that both
+		 *   a/b/c/d/e/ => a/b/some/thing/else/e/
+		 *   a/b/c/d/   => a/b/some/thing/else/
+		 * so we want to increment counters for both. We do NOT,
+		 * however, also want to suggest that there was the following
+		 * rename:
+		 *   a/b/c/ => a/b/some/thing/
+		 * so we need to quit at that point.
+		 *
+		 * Note that when first_time_in_loop, we only strip off the
+		 * basename, and we don't care if that's different.
+		 */
+		if (!first_time_in_loop) {
+			char *old_sub_dir = strchr(old_dir, '\0')+1;
+			char *new_sub_dir = strchr(new_dir, '\0')+1;
+			if (!*new_dir) {
+				/*
+				 * Special case when renaming to root directory,
+				 * i.e. when new_dir == "". In this case, we had
+				 * something like
+				 *    a/b/subdir => subdir
+				 * and so dirname_munge() sets things up so that
+				 *    old_dir = "a/b\0subdir\0"
+				 *    new_dir = "\0ubdir\0"
+				 * We didn't have a '/' to overwrite a '\0' onto
+				 * in new_dir, so we have to compare differently.
+				 */
+				if (new_dir_first_char != old_sub_dir[0] ||
+				    strcmp(old_sub_dir+1, new_sub_dir))
+					break;
+			} else {
+				if (strcmp(old_sub_dir, new_sub_dir))
+					break;
+			}
+		}
+
+		if (strset_contains(dirs_removed, old_dir))
+			increment_count(dir_rename_count, old_dir, new_dir);
+		else
+			break;
+
+		/* If we hit toplevel directory ("") for old or new dir, quit */
+		if (!*old_dir || !*new_dir)
+			break;
+
+		first_time_in_loop = 0;
+	}
+
+	/* Free resources we don't need anymore */
+	free(old_dir);
+	free(new_dir);
+}
+
+static void compute_dir_rename_counts(struct strmap *dir_rename_count,
+				      struct strset *dirs_removed)
+{
+	int i;
+
+	/* Set up dir_rename_count */
+	for (i = 0; i < rename_dst_nr; ++i) {
+		/* File not part of directory rename counts if not a rename */
+		if (!rename_dst[i].is_rename)
+			continue;
+
+		/*
+		 * Make dir_rename_count contain a map of a map:
+		 *   old_directory -> {new_directory -> count}
+		 * In other words, for every pair look at the directories for
+		 * the old filename and the new filename and count how many
+		 * times that pairing occurs.
+		 */
+		update_dir_rename_counts(dir_rename_count, dirs_removed,
+					 rename_dst[i].p->one->path,
+					 rename_dst[i].p->two->path);
+	}
+}
+
 static void initialize_dir_rename_info(struct dir_rename_info *info)
 {
 	int i;
@@ -790,7 +913,9 @@ static void remove_unneeded_paths_from_src(int detecting_copies)
 	rename_src_nr = new_num_src;
 }
 
-void diffcore_rename(struct diff_options *options)
+void diffcore_rename_extended(struct diff_options *options,
+			      struct strset *dirs_removed,
+			      struct strmap *dir_rename_count)
 {
 	int detect_rename = options->detect_rename;
 	int minimum_score = options->rename_score;
@@ -805,6 +930,7 @@ void diffcore_rename(struct diff_options *options)
 
 	trace2_region_enter("diff", "setup", options->repo);
 	info.setup = 0;
+	assert(!dir_rename_count || strmap_empty(dir_rename_count));
 	want_copies = (detect_rename == DIFF_DETECT_COPY);
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
@@ -999,6 +1125,11 @@ void diffcore_rename(struct diff_options *options)
 	trace2_region_leave("diff", "inexact renames", options->repo);
 
  cleanup:
+	/*
+	 * Now that renames have been computed, compute dir_rename_count */
+	if (dirs_removed && dir_rename_count)
+		compute_dir_rename_counts(dir_rename_count, dirs_removed);
+
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst.  The original list is still in *q.
 	 */
@@ -1082,3 +1213,8 @@ void diffcore_rename(struct diff_options *options)
 	trace2_region_leave("diff", "write back to queue", options->repo);
 	return;
 }
+
+void diffcore_rename(struct diff_options *options)
+{
+	diffcore_rename_extended(options, NULL, NULL);
+}
diff --git a/diffcore.h b/diffcore.h
index d2a63c5c71f4..db55d3853071 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -8,6 +8,8 @@
 
 struct diff_options;
 struct repository;
+struct strmap;
+struct strset;
 struct userdiff_driver;
 
 /* This header file is internal between diff.c and its diff transformers
@@ -161,6 +163,9 @@ void diff_q(struct diff_queue_struct *, struct diff_filepair *);
 
 void diffcore_break(struct repository *, int);
 void diffcore_rename(struct diff_options *);
+void diffcore_rename_extended(struct diff_options *options,
+			      struct strset *dirs_removed,
+			      struct strmap *dir_rename_count);
 void diffcore_merge_broken(void);
 void diffcore_pickaxe(struct diff_options *);
 void diffcore_order(const char *orderfile);
diff --git a/merge-ort.c b/merge-ort.c
index 603d30c52170..c4467e073b45 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1302,131 +1302,6 @@ static char *handle_path_level_conflicts(struct merge_options *opt,
 	return new_path;
 }
 
-static void dirname_munge(char *filename)
-{
-	char *slash = strrchr(filename, '/');
-	if (!slash)
-		slash = filename;
-	*slash = '\0';
-}
-
-static void increment_count(struct strmap *dir_rename_count,
-			    char *old_dir,
-			    char *new_dir)
-{
-	struct strintmap *counts;
-	struct strmap_entry *e;
-
-	/* Get the {new_dirs -> counts} mapping using old_dir */
-	e = strmap_get_entry(dir_rename_count, old_dir);
-	if (e) {
-		counts = e->value;
-	} else {
-		counts = xmalloc(sizeof(*counts));
-		strintmap_init_with_options(counts, 0, NULL, 1);
-		strmap_put(dir_rename_count, old_dir, counts);
-	}
-
-	/* Increment the count for new_dir */
-	strintmap_incr(counts, new_dir, 1);
-}
-
-static void update_dir_rename_counts(struct strmap *dir_rename_count,
-				     struct strset *dirs_removed,
-				     const char *oldname,
-				     const char *newname)
-{
-	char *old_dir = xstrdup(oldname);
-	char *new_dir = xstrdup(newname);
-	char new_dir_first_char = new_dir[0];
-	int first_time_in_loop = 1;
-
-	while (1) {
-		dirname_munge(old_dir);
-		dirname_munge(new_dir);
-
-		/*
-		 * When renaming
-		 *   "a/b/c/d/e/foo.c" -> "a/b/some/thing/else/e/foo.c"
-		 * then this suggests that both
-		 *   a/b/c/d/e/ => a/b/some/thing/else/e/
-		 *   a/b/c/d/   => a/b/some/thing/else/
-		 * so we want to increment counters for both.  We do NOT,
-		 * however, also want to suggest that there was the following
-		 * rename:
-		 *   a/b/c/ => a/b/some/thing/
-		 * so we need to quit at that point.
-		 *
-		 * Note the when first_time_in_loop, we only strip off the
-		 * basename, and we don't care if that's different.
-		 */
-		if (!first_time_in_loop) {
-			char *old_sub_dir = strchr(old_dir, '\0')+1;
-			char *new_sub_dir = strchr(new_dir, '\0')+1;
-			if (!*new_dir) {
-				/*
-				 * Special case when renaming to root directory,
-				 * i.e. when new_dir == "".  In this case, we had
-				 * something like
-				 *    a/b/subdir => subdir
-				 * and so dirname_munge() sets things up so that
-				 *    old_dir = "a/b\0subdir\0"
-				 *    new_dir = "\0ubdir\0"
-				 * We didn't have a '/' to overwrite a '\0' onto
-				 * in new_dir, so we have to compare differently.
-				 */
-				if (new_dir_first_char != old_sub_dir[0] ||
-				    strcmp(old_sub_dir+1, new_sub_dir))
-					break;
-			} else {
-				if (strcmp(old_sub_dir, new_sub_dir))
-					break;
-			}
-		}
-
-		if (strset_contains(dirs_removed, old_dir))
-			increment_count(dir_rename_count, old_dir, new_dir);
-		else
-			break;
-
-		/* If we hit toplevel directory ("") for old or new dir, quit */
-		if (!*old_dir || !*new_dir)
-			break;
-
-		first_time_in_loop = 0;
-	}
-
-	/* Free resources we don't need anymore */
-	free(old_dir);
-	free(new_dir);
-}
-
-static void compute_rename_counts(struct diff_queue_struct *pairs,
-				  struct strmap *dir_rename_count,
-				  struct strset *dirs_removed)
-{
-	int i;
-
-	for (i = 0; i < pairs->nr; ++i) {
-		struct diff_filepair *pair = pairs->queue[i];
-
-		/* File not part of directory rename if it wasn't renamed */
-		if (pair->status != 'R')
-			continue;
-
-		/*
-		 * Make dir_rename_count contain a map of a map:
-		 *   old_directory -> {new_directory -> count}
-		 * In other words, for every pair look at the directories for
-		 * the old filename and the new filename and count how many
-		 * times that pairing occurs.
-		 */
-		update_dir_rename_counts(dir_rename_count, dirs_removed,
-					 pair->one->path,
-					 pair->two->path);
-	}
-}
-
 static void get_provisional_directory_renames(struct merge_options *opt,
 					      unsigned side,
 					      int *clean)
@@ -1435,9 +1310,6 @@ static void get_provisional_directory_renames(struct merge_options *opt,
 	struct strmap_entry *entry;
 	struct rename_info *renames = &opt->priv->renames;
 
-	compute_rename_counts(&renames->pairs[side],
-			      &renames->dir_rename_count[side],
-			      &renames->dirs_removed[side]);
 	/*
 	 * Collapse
 	 *    dir_rename_count: old_directory -> {new_directory -> count}
@@ -2162,7 +2034,9 @@ static void detect_regular_renames(struct merge_options *opt,
 
 	diff_queued_diff = renames->pairs[side_index];
 	trace2_region_enter("diff", "diffcore_rename", opt->repo);
-	diffcore_rename(&diff_opts);
+	diffcore_rename_extended(&diff_opts,
+				 &renames->dirs_removed[side_index],
+				 &renames->dir_rename_count[side_index]);
 	trace2_region_leave("diff", "diffcore_rename", opt->repo);
 	resolve_diffpair_statuses(&diff_queued_diff);

From patchwork Sat Feb 27 00:30:43 2021
X-Patchwork-Id: 12107493
Message-Id: <2baf39d82f3ed40925addd778356b143f516f157.1614385849.git.gitgitgadget@gmail.com>
Date: Sat, 27 Feb 2021 00:30:43 +0000
Subject: [PATCH v4 05/10] diffcore-rename: add function for clearing dir_rename_count
To: git@vger.kernel.org
Cc: Derrick Stolee, Elijah Newren, Junio C Hamano, Ævar Arnfjörð Bjarmason
From: Elijah Newren

As we adjust the usage of dir_rename_count, we want a function for
clearing it out, either fully or partially.  Add a
partial_clear_dir_rename_count() function for this purpose.
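The difference between partially and fully clearing a map of maps can be sketched in plain C. This is a hedged, self-contained illustration, not git code. `struct outer_map`, `struct inner_counts`, `outer_init()`, `outer_incr()`, `partial_clear()` and `full_clear()` are hypothetical stand-ins for strmap and strintmap, and they use a fixed capacity to keep the sketch short. partial_clear_dir_rename_count() pairs strintmap_clear() with strmap_partial_clear(..., 1). The sketch mirrors that: it releases every inner count map, then empties the outer map but keeps the outer map's storage for reuse. Only the full clear frees that storage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DIRS 16 /* hypothetical fixed capacity, to keep the sketch short */

/* Hypothetical stand-in for a strintmap: new_dir -> count. */
struct inner_counts {
	size_t nr;
	const char *keys[MAX_DIRS];   /* not owned: caller keeps strings alive */
	int counts[MAX_DIRS];
};

/* Hypothetical stand-in for a strmap: old_dir -> struct inner_counts *. */
struct outer_map {
	size_t nr;
	const char **keys;            /* heap arrays, kept by partial_clear() */
	struct inner_counts **values;
};

static void outer_init(struct outer_map *m)
{
	m->nr = 0;
	m->keys = calloc(MAX_DIRS, sizeof(*m->keys));
	m->values = calloc(MAX_DIRS, sizeof(*m->values));
	assert(m->keys && m->values);
}

/* Increment the count for old_dir -> new_dir, creating entries as needed. */
static void outer_incr(struct outer_map *m, const char *old_dir,
		       const char *new_dir)
{
	struct inner_counts *c = NULL;
	size_t i;

	for (i = 0; i < m->nr && !c; i++)
		if (!strcmp(m->keys[i], old_dir))
			c = m->values[i];
	if (!c) {
		assert(m->nr < MAX_DIRS);
		c = calloc(1, sizeof(*c));
		assert(c);
		m->keys[m->nr] = old_dir;
		m->values[m->nr++] = c;
	}
	for (i = 0; i < c->nr; i++) {
		if (!strcmp(c->keys[i], new_dir)) {
			c->counts[i]++;
			return;
		}
	}
	assert(c->nr < MAX_DIRS);
	c->keys[c->nr] = new_dir;
	c->counts[c->nr++] = 1;
}

/*
 * Release every inner map, then empty the outer map but keep its arrays.
 * The inner storage is embedded in one block, so a single free() plays the
 * role of clearing the inner map and freeing the value.
 */
static void partial_clear(struct outer_map *m)
{
	size_t i;

	for (i = 0; i < m->nr; i++) {
		free(m->values[i]);
		m->values[i] = NULL;
	}
	m->nr = 0;    /* m->keys and m->values stay allocated for reuse */
}

/* A full clear also releases the outer map's own storage. */
static void full_clear(struct outer_map *m)
{
	partial_clear(m);
	free(m->keys);
	free(m->values);
	m->keys = NULL;
	m->values = NULL;
}
```

Reusing the emptied outer map after a partial clear avoids reallocating its table. That matches the reinitialize path in merge-ort, which calls strmap_clear() only when it is not reinitializing.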
Reviewed-by: Derrick Stolee
Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 12 ++++++++++++
 diffcore.h        |  2 ++
 merge-ort.c       | 12 +++---------
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 39e23d57e7bc..7dd475ff9a9f 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -528,6 +528,18 @@ static void initialize_dir_rename_info(struct dir_rename_info *info)
 	}
 }
 
+void partial_clear_dir_rename_count(struct strmap *dir_rename_count)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	strmap_for_each_entry(dir_rename_count, &iter, entry) {
+		struct strintmap *counts = entry->value;
+		strintmap_clear(counts);
+	}
+	strmap_partial_clear(dir_rename_count, 1);
+}
+
 static void cleanup_dir_rename_info(struct dir_rename_info *info)
 {
 	if (!info->setup)
diff --git a/diffcore.h b/diffcore.h
index db55d3853071..c6ba64abd198 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -161,6 +161,8 @@ struct diff_filepair *diff_queue(struct diff_queue_struct *,
 				 struct diff_filespec *);
 void diff_q(struct diff_queue_struct *, struct diff_filepair *);
 
+void partial_clear_dir_rename_count(struct strmap *dir_rename_count);
+
 void diffcore_break(struct repository *, int);
 void diffcore_rename(struct diff_options *);
 void diffcore_rename_extended(struct diff_options *options,
diff --git a/merge-ort.c b/merge-ort.c
index c4467e073b45..467404cc0a35 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -351,17 +351,11 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 
 	/* Free memory used by various renames maps */
 	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) {
-		struct hashmap_iter iter;
-		struct strmap_entry *entry;
-
 		strset_func(&renames->dirs_removed[i]);
 
-		strmap_for_each_entry(&renames->dir_rename_count[i],
-				      &iter, entry) {
-			struct strintmap *counts = entry->value;
-			strintmap_clear(counts);
-		}
-		strmap_func(&renames->dir_rename_count[i], 1);
+		partial_clear_dir_rename_count(&renames->dir_rename_count[i]);
+		if (!reinitialize)
+			strmap_clear(&renames->dir_rename_count[i], 1);
 
 		strmap_func(&renames->dir_renames[i], 0);
 	}

From patchwork Sat Feb 27 00:30:44 2021
X-Patchwork-Id: 12107495
Message-Id: <02f1f7c02d32dc37813abf5fcf767cc374b17fd3.1614385849.git.gitgitgadget@gmail.com>
Date: Sat, 27 Feb 2021 00:30:44 +0000
Subject: [PATCH v4 06/10] diffcore-rename: move dir_rename_counts into dir_rename_info struct
To: git@vger.kernel.org
Cc: Derrick Stolee, Elijah Newren, Junio C Hamano, Ævar Arnfjörð Bjarmason
From: Elijah Newren

This continues the migration of the directory rename detection code into
diffcore-rename, now taking the simple step of combining it with the
dir_rename_info struct.  Future commits will then make dir_rename_counts
be computed in stages, and add computation of dir_rename_guess.

Reviewed-by: Derrick Stolee
Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 7dd475ff9a9f..a1ccf14001f5 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -388,7 +388,7 @@ static void dirname_munge(char *filename)
 	*slash = '\0';
 }
 
-static void increment_count(struct strmap *dir_rename_count,
+static void increment_count(struct dir_rename_info *info,
 			    char *old_dir,
 			    char *new_dir)
 {
@@ -396,20 +396,20 @@ static void increment_count(struct strmap *dir_rename_count,
 	struct strmap_entry *e;
 
 	/* Get the {new_dirs -> counts} mapping using old_dir */
-	e = strmap_get_entry(dir_rename_count, old_dir);
+	e = strmap_get_entry(info->dir_rename_count, old_dir);
 	if (e) {
 		counts = e->value;
 	} else {
 		counts = xmalloc(sizeof(*counts));
 		strintmap_init_with_options(counts, 0, NULL, 1);
-		strmap_put(dir_rename_count, old_dir, counts);
+		strmap_put(info->dir_rename_count, old_dir, counts);
 	}
 
 	/* Increment the count for new_dir */
 	strintmap_incr(counts, new_dir, 1);
 }
 
-static void update_dir_rename_counts(struct strmap *dir_rename_count,
+static void update_dir_rename_counts(struct dir_rename_info *info,
 				     struct strset *dirs_removed,
 				     const char *oldname,
 				     const char *newname)
@@ -463,7 +463,7 @@ static void update_dir_rename_counts(struct strmap *dir_rename_count,
 		}
 
 		if (strset_contains(dirs_removed, old_dir))
-			increment_count(dir_rename_count, old_dir, new_dir);
+			increment_count(info, old_dir, new_dir);
 		else
 			break;
 
@@ -479,12 +479,15 @@ static void update_dir_rename_counts(struct strmap *dir_rename_count,
 	free(new_dir);
 }
 
-static void compute_dir_rename_counts(struct strmap *dir_rename_count,
-				      struct strset *dirs_removed)
+static void compute_dir_rename_counts(struct dir_rename_info *info,
+				      struct strset *dirs_removed,
+				      struct strmap *dir_rename_count)
 {
 	int i;
 
-	/* Set up dir_rename_count */
+	info->setup = 1;
+	info->dir_rename_count = dir_rename_count;
+
 	for (i = 0; i < rename_dst_nr; ++i) {
 		/* File not part of directory rename counts if not a rename */
 		if (!rename_dst[i].is_rename)
@@ -497,7 +500,7 @@ static void compute_dir_rename_counts(struct strmap *dir_rename_count,
 		 * the old filename and the new filename and count how many
 		 * times that pairing occurs.
 		 */
-		update_dir_rename_counts(dir_rename_count, dirs_removed,
+		update_dir_rename_counts(info, dirs_removed,
 					 rename_dst[i].p->one->path,
 					 rename_dst[i].p->two->path);
 	}
@@ -551,7 +554,9 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info)
 	/* dir_rename_guess */
 	strmap_clear(&info->dir_rename_guess, 1);
 
-	/* Nothing to do for dir_rename_count, yet */
+	/* dir_rename_count */
+	partial_clear_dir_rename_count(info->dir_rename_count);
+	strmap_clear(info->dir_rename_count, 1);
 }
 
 static const char *get_basename(const char *filename)
@@ -1140,7 +1145,7 @@ void diffcore_rename_extended(struct diff_options *options,
 	/*
 	 * Now that renames have been computed, compute dir_rename_count */
 	if (dirs_removed && dir_rename_count)
-		compute_dir_rename_counts(dir_rename_count, dirs_removed);
+		compute_dir_rename_counts(&info, dirs_removed, dir_rename_count);
 
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst.  The original list is still in *q.
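The directory walk in update_dir_rename_counts() is hard to follow because it munges strings in place with '\0' bytes. The same walk can be sketched on plain C strings. This is a hedged simplification, not git code. `pop_component()`, `in_set()` and `dir_rename_pairs()` are hypothetical, a linear array stands in for the `dirs_removed` strset, and fixed-size buffers replace xstrdup(). The sketch also only reports which old_dir => new_dir pairs would be counted, instead of incrementing a strintmap. Each stripped component is copied into its own buffer, so a rename into the root directory needs no special case.

```c
#include <assert.h>
#include <string.h>

#define MAX_PATH_LEN 256
#define MAX_PAIRS 16

/* One old_dir => new_dir pairing that would be counted. */
struct dir_pair {
	char old_dir[MAX_PATH_LEN];
	char new_dir[MAX_PATH_LEN];
};

/*
 * Remove the last component of `path` in place and copy it into `last`:
 * "a/b/c" becomes "a/b" with last = "c"; "c" becomes "" with last = "c".
 */
static void pop_component(char *path, char *last)
{
	char *slash = strrchr(path, '/');

	if (slash) {
		strcpy(last, slash + 1);
		*slash = '\0';
	} else {
		strcpy(last, path);
		path[0] = '\0';
	}
}

/* Hypothetical stand-in for strset_contains(dirs_removed, dir). */
static int in_set(const char *const *set, size_t n, const char *dir)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(set[i], dir))
			return 1;
	return 0;
}

/*
 * Strip the basename, then keep stripping one component from each side
 * while those components agree, recording each old_dir => new_dir pair
 * whose old_dir was removed.  Returns the number of pairs written.
 */
static size_t dir_rename_pairs(const char *oldname, const char *newname,
			       const char *const *dirs_removed, size_t n_removed,
			       struct dir_pair *out, size_t max_out)
{
	char old_dir[MAX_PATH_LEN], new_dir[MAX_PATH_LEN];
	char old_last[MAX_PATH_LEN], new_last[MAX_PATH_LEN];
	size_t nr = 0;
	int first_time_in_loop = 1;

	assert(strlen(oldname) < MAX_PATH_LEN && strlen(newname) < MAX_PATH_LEN);
	strcpy(old_dir, oldname);
	strcpy(new_dir, newname);

	while (nr < max_out) {
		pop_component(old_dir, old_last);
		pop_component(new_dir, new_last);

		/* The basename may differ; later components must match. */
		if (!first_time_in_loop && strcmp(old_last, new_last))
			break;

		if (!in_set(dirs_removed, n_removed, old_dir))
			break;
		strcpy(out[nr].old_dir, old_dir);
		strcpy(out[nr].new_dir, new_dir);
		nr++;

		/* Stop once either side reaches the toplevel directory "". */
		if (!*old_dir || !*new_dir)
			break;
		first_time_in_loop = 0;
	}
	return nr;
}
```

For "a/b/c/d/e/foo.c" -> "a/b/some/thing/else/e/foo.c", with all old directories removed, this reports a/b/c/d/e => a/b/some/thing/else/e and a/b/c/d => a/b/some/thing/else. It then stops because "d" and "else" differ, matching the example in the code comment.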
From patchwork Sat Feb 27 00:30:45 2021
X-Patchwork-Id: 12107497
Message-Id: <9c3436840534266ab257ab59e04364722bf3f6e9.1614385849.git.gitgitgadget@gmail.com>
Date: Sat, 27 Feb 2021 00:30:45 +0000
Subject: [PATCH v4 07/10] diffcore-rename: extend cleanup_dir_rename_info()
To: git@vger.kernel.org
Cc: Derrick Stolee, Elijah Newren, Junio C Hamano, Ævar Arnfjörð Bjarmason
From: Elijah Newren

When diffcore_rename_extended() is passed a NULL dir_rename_count, we
will still want to create a temporary one for use by
find_basename_matches(), but have it fully deallocated before
diffcore_rename_extended() returns.  However, when
diffcore_rename_extended() is passed a dir_rename_count, we want to fill
that strmap with appropriate values and return it.  For our interim
purposes, though, we may also add entries for directories that cannot
have been renamed because they still exist on both sides.

Extend cleanup_dir_rename_info() to handle these two different cases,
cleaning up the relevant bits of information for each case.

Reviewed-by: Derrick Stolee
Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 40 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index a1ccf14001f5..2cf9c47c6364 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -543,8 +543,15 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count)
 	strmap_partial_clear(dir_rename_count, 1);
 }
 
-static void cleanup_dir_rename_info(struct dir_rename_info *info)
+static void cleanup_dir_rename_info(struct dir_rename_info *info,
+				    struct strset *dirs_removed,
+				    int keep_dir_rename_count)
 {
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+	struct string_list to_remove = STRING_LIST_INIT_NODUP;
+	int i;
+
 	if (!info->setup)
 		return;
 
@@ -555,8 +562,33 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info)
 	strmap_clear(&info->dir_rename_guess, 1);
 
 	/* dir_rename_count */
-	partial_clear_dir_rename_count(info->dir_rename_count);
-	strmap_clear(info->dir_rename_count, 1);
+	if (!keep_dir_rename_count) {
+		partial_clear_dir_rename_count(info->dir_rename_count);
+		strmap_clear(info->dir_rename_count, 1);
+		FREE_AND_NULL(info->dir_rename_count);
+		return;
+	}
+
+	/*
+	 * Although dir_rename_count was passed in
+	 * diffcore_rename_extended() and we want to keep it around and
+	 * return it to that caller, we first want to remove any data
+	 * associated with directories that weren't renamed.
+	 */
+	strmap_for_each_entry(info->dir_rename_count, &iter, entry) {
+		const char *source_dir = entry->key;
+		struct strintmap *counts = entry->value;
+
+		if (!strset_contains(dirs_removed, source_dir)) {
+			string_list_append(&to_remove, source_dir);
+			strintmap_clear(counts);
+			continue;
+		}
+	}
+	for (i = 0; i < to_remove.nr; ++i)
+		strmap_remove(info->dir_rename_count,
+			      to_remove.items[i].string, 1);
+	string_list_clear(&to_remove, 0);
 }
 
 static const char *get_basename(const char *filename)
@@ -1218,7 +1250,7 @@ void diffcore_rename_extended(struct diff_options *options,
 		if (rename_dst[i].filespec_to_free)
 			free_filespec(rename_dst[i].filespec_to_free);
 
-	cleanup_dir_rename_info(&info);
+	cleanup_dir_rename_info(&info, dirs_removed, dir_rename_count != NULL);
 	FREE_AND_NULL(rename_dst);
 	rename_dst_nr = rename_dst_alloc = 0;
 	FREE_AND_NULL(rename_src);

From patchwork Sat Feb 27 00:30:46 2021
X-Patchwork-Id: 12107501
Message-Id: <6bd398d3707eb514d8758dcff0a21cae11c897b5.1614385849.git.gitgitgadget@gmail.com>
Date: Sat, 27 Feb 2021 00:30:46 +0000
Subject: [PATCH v4 08/10] diffcore-rename: compute dir_rename_counts in stages
To: git@vger.kernel.org
Cc: Derrick Stolee, Elijah Newren, Junio C Hamano, Ævar Arnfjörð Bjarmason
From: Elijah Newren

Compute dir_rename_counts based just on exact renames to start, as that
can provide us useful information in find_basename_matches().  This is
done by moving the code from compute_dir_rename_counts() into
initialize_dir_rename_info(), resulting in it being computed earlier and
based just on exact renames.  Since that's an incomplete result, we
augment the counts by calling update_dir_rename_counts() after each
basename-guided and inexact rename detection match is found.
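Staged counting can be sketched as follows, under simplifying assumptions. This is not git code. `struct rename_counts`, `count_rename()` and `best_new_dir()` are hypothetical, and only each file's immediate parent directory is counted, rather than the full upward walk of update_dir_rename_counts(). Counts from exact renames exist as soon as that stage finishes. A query such as the hypothetical best_new_dir() can therefore already be answered at that point, and later matches (basename-guided or inexact) just increment the same table.

```c
#include <assert.h>
#include <string.h>

#define MAX_PAIRS 32
#define MAX_DIR_LEN 128

/* One old_dir => new_dir pairing and how often it has been seen. */
struct pair_count {
	char old_dir[MAX_DIR_LEN];
	char new_dir[MAX_DIR_LEN];
	int count;
};

struct rename_counts {
	size_t nr;
	struct pair_count pairs[MAX_PAIRS];
};

/* Copy the parent directory of `path` into `dir` ("" at toplevel). */
static void parent_dir(const char *path, char *dir)
{
	const char *slash = strrchr(path, '/');
	size_t len = slash ? (size_t)(slash - path) : 0;

	assert(len < MAX_DIR_LEN);
	memcpy(dir, path, len);
	dir[len] = '\0';
}

/* Record one rename; only the immediate parent directories are counted. */
static void count_rename(struct rename_counts *rc, const char *oldname,
			 const char *newname)
{
	char old_dir[MAX_DIR_LEN], new_dir[MAX_DIR_LEN];
	size_t i;

	parent_dir(oldname, old_dir);
	parent_dir(newname, new_dir);
	for (i = 0; i < rc->nr; i++) {
		struct pair_count *p = &rc->pairs[i];
		if (!strcmp(p->old_dir, old_dir) && !strcmp(p->new_dir, new_dir)) {
			p->count++;
			return;
		}
	}
	assert(rc->nr < MAX_PAIRS);
	strcpy(rc->pairs[rc->nr].old_dir, old_dir);
	strcpy(rc->pairs[rc->nr].new_dir, new_dir);
	rc->pairs[rc->nr++].count = 1;
}

/* Return the most frequently seen destination of old_dir, or NULL. */
static const char *best_new_dir(const struct rename_counts *rc,
				const char *old_dir)
{
	const struct pair_count *best = NULL;
	size_t i;

	for (i = 0; i < rc->nr; i++) {
		const struct pair_count *p = &rc->pairs[i];
		if (!strcmp(p->old_dir, old_dir) && (!best || p->count > best->count))
			best = p;
	}
	return best ? best->new_dir : NULL;
}
```

Because each stage only adds to the same counters, the final totals do not depend on which stage found a given rename. Only the intermediate answers change as more matches arrive.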
Reviewed-by: Derrick Stolee
Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 110 +++++++++++++++++++++++++++++-----------------
 1 file changed, 70 insertions(+), 40 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 2cf9c47c6364..10f8f4a301e3 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -419,6 +419,28 @@ static void update_dir_rename_counts(struct dir_rename_info *info,
 	char new_dir_first_char = new_dir[0];
 	int first_time_in_loop = 1;
 
+	if (!info->setup)
+		/*
+		 * info->setup is 0 here in two cases: (1) all auxiliary
+		 * vars (like dirs_removed) were NULL so
+		 * initialize_dir_rename_info() returned early, or (2)
+		 * either break detection or copy detection are active so
+		 * that we never called initialize_dir_rename_info(). In
+		 * the former case, we don't have enough info to know if
+		 * directories were renamed (because dirs_removed lets us
+		 * know about a necessary prerequisite, namely if they were
+		 * removed), and in the latter, we don't care about
+		 * directory renames or find_basename_matches.
+		 *
+		 * This matters because both basename and inexact matching
+		 * will also call update_dir_rename_counts(). In either of
+		 * the above two cases info->dir_rename_counts will not
+		 * have been properly initialized which prevents us from
+		 * updating it, but in these two cases we don't care about
+		 * dir_rename_counts anyway, so we can just exit early.
+		 */
+		return;
+
 	while (1) {
 		dirname_munge(old_dir);
 		dirname_munge(new_dir);
@@ -479,45 +501,29 @@ static void update_dir_rename_counts(struct dir_rename_info *info,
 	free(new_dir);
 }
 
-static void compute_dir_rename_counts(struct dir_rename_info *info,
-				      struct strset *dirs_removed,
-				      struct strmap *dir_rename_count)
+static void initialize_dir_rename_info(struct dir_rename_info *info,
+				       struct strset *dirs_removed,
+				       struct strmap *dir_rename_count)
 {
 	int i;
 
-	info->setup = 1;
-	info->dir_rename_count = dir_rename_count;
-
-	for (i = 0; i < rename_dst_nr; ++i) {
-		/* File not part of directory rename counts if not a rename */
-		if (!rename_dst[i].is_rename)
-			continue;
-
-		/*
-		 * Make dir_rename_count contain a map of a map:
-		 *   old_directory -> {new_directory -> count}
-		 * In other words, for every pair look at the directories for
-		 * the old filename and the new filename and count how many
-		 * times that pairing occurs.
-		 */
-		update_dir_rename_counts(info, dirs_removed,
-					 rename_dst[i].p->one->path,
-					 rename_dst[i].p->two->path);
+	if (!dirs_removed) {
+		info->setup = 0;
+		return;
 	}
-}
-
-static void initialize_dir_rename_info(struct dir_rename_info *info)
-{
-	int i;
-
 	info->setup = 1;
 
+	info->dir_rename_count = dir_rename_count;
+	if (!info->dir_rename_count) {
+		info->dir_rename_count = xmalloc(sizeof(*dir_rename_count));
+		strmap_init(info->dir_rename_count);
+	}
 	strintmap_init_with_options(&info->idx_map, -1, NULL, 0);
 	strmap_init_with_options(&info->dir_rename_guess, NULL, 0);
-	info->dir_rename_count = NULL;
 
 	/*
-	 * Loop setting up both info->idx_map.
+	 * Loop setting up both info->idx_map, and doing setup of
+	 * info->dir_rename_count.
 	 */
 	for (i = 0; i < rename_dst_nr; ++i) {
 		/*
@@ -527,7 +533,20 @@ static void initialize_dir_rename_info(struct dir_rename_info *info)
 		if (!rename_dst[i].is_rename) {
 			char *filename = rename_dst[i].p->two->path;
 			strintmap_set(&info->idx_map, filename, i);
+			continue;
 		}
+
+		/*
+		 * For everything else (i.e. renamed files), make
+		 * dir_rename_count contain a map of a map:
+		 *   old_directory -> {new_directory -> count}
+		 * In other words, for every pair look at the directories for
+		 * the old filename and the new filename and count how many
+		 * times that pairing occurs.
+		 */
+		update_dir_rename_counts(info, dirs_removed,
+					 rename_dst[i].p->one->path,
+					 rename_dst[i].p->two->path);
 	}
 }
 
@@ -682,7 +701,8 @@ static int idx_possible_rename(char *filename, struct dir_rename_info *info)
 
 static int find_basename_matches(struct diff_options *options,
 				 int minimum_score,
-				 struct dir_rename_info *info)
+				 struct dir_rename_info *info,
+				 struct strset *dirs_removed)
 {
 	/*
 	 * When I checked in early 2020, over 76% of file renames in linux
@@ -810,6 +830,8 @@ static int find_basename_matches(struct diff_options *options,
 				continue;
 			record_rename_pair(dst_index, src_index, score);
 			renames++;
+			update_dir_rename_counts(info, dirs_removed,
+						 one->path, two->path);
 
 			/*
 			 * Found a rename so don't need text anymore; if we
@@ -893,7 +915,12 @@ static int too_many_rename_candidates(int num_destinations, int num_sources,
 	return 1;
 }
 
-static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, int copies)
+static int find_renames(struct diff_score *mx,
+			int dst_cnt,
+			int minimum_score,
+			int copies,
+			struct dir_rename_info *info,
+			struct strset *dirs_removed)
 {
 	int count = 0, i;
 
@@ -910,6 +937,9 @@ static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, i
 			continue;
 		record_rename_pair(mx[i].dst, mx[i].src, mx[i].score);
 		count++;
+		update_dir_rename_counts(info, dirs_removed,
+					 rename_src[mx[i].src].p->one->path,
+					 rename_dst[mx[i].dst].p->two->path);
 	}
 	return count;
 }
@@ -981,6 +1011,8 @@ void diffcore_rename_extended(struct diff_options *options,
 	info.setup = 0;
 	assert(!dir_rename_count || strmap_empty(dir_rename_count));
 	want_copies = (detect_rename == DIFF_DETECT_COPY);
+	if (dirs_removed && (break_idx || want_copies))
+		BUG("dirs_removed incompatible with break/copy detection");
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
 
@@ -1074,14 +1106,15 @@ void diffcore_rename_extended(struct diff_options *options,
 
 		/* Preparation for basename-driven matching. */
 		trace2_region_enter("diff", "dir rename setup", options->repo);
-		initialize_dir_rename_info(&info);
+		initialize_dir_rename_info(&info,
+					   dirs_removed, dir_rename_count);
 		trace2_region_leave("diff", "dir rename setup", options->repo);
 
 		/* Utilize file basenames to quickly find renames. */
 		trace2_region_enter("diff", "basename matches", options->repo);
 		rename_count += find_basename_matches(options,
 						      min_basename_score,
-						      &info);
+						      &info, dirs_removed);
 		trace2_region_leave("diff", "basename matches", options->repo);
 
 		/*
@@ -1167,18 +1200,15 @@ void diffcore_rename_extended(struct diff_options *options,
 	/* cost matrix sorted by most to least similar pair */
 	STABLE_QSORT(mx, dst_cnt * NUM_CANDIDATE_PER_DST, score_compare);
 
-	rename_count += find_renames(mx, dst_cnt, minimum_score, 0);
+	rename_count += find_renames(mx, dst_cnt, minimum_score, 0,
+				     &info, dirs_removed);
 	if (want_copies)
-		rename_count += find_renames(mx, dst_cnt, minimum_score, 1);
+		rename_count += find_renames(mx, dst_cnt, minimum_score, 1,
+					     &info, dirs_removed);
 	free(mx);
 	trace2_region_leave("diff", "inexact renames", options->repo);
 
  cleanup:
-	/*
-	 * Now that renames have been computed, compute dir_rename_count */
-	if (dirs_removed && dir_rename_count)
-		compute_dir_rename_counts(&info, dirs_removed, dir_rename_count);
-
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst. The original list is still in *q.
 	 */
From patchwork Sat Feb 27 00:30:47 2021
From: Elijah Newren
Date: Sat, 27 Feb 2021 00:30:47 +0000
Subject: [PATCH v4 09/10] diffcore-rename: limit dir_rename_counts computation to relevant dirs
To: git@vger.kernel.org
Cc: Derrick Stolee, Elijah Newren, Junio C Hamano, Ævar Arnfjörð Bjarmason

We are using dir_rename_counts to count the number of other directories
that files within a directory moved to. We only need this information
for directories that disappeared, though, so we can return early from
update_dir_rename_counts() for other paths.

If dirs_removed is passed to diffcore_rename_extended(), then it
provides the relevant bits of information for us to limit this counting
to relevant dirs. If dirs_removed is not passed, we would need to
compute some replacement in order to do this limiting. Introduce a new
info->relevant_source_dirs variable for this purpose, even though at
this stage we will only set it to dirs_removed for simplicity.

Reviewed-by: Derrick Stolee
Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 10f8f4a301e3..e5fa0cb555dd 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -371,6 +371,7 @@ struct dir_rename_info {
 	struct strintmap idx_map;
 	struct strmap dir_rename_guess;
 	struct strmap *dir_rename_count;
+	struct strset *relevant_source_dirs;
 	unsigned setup;
 };
 
@@ -442,7 +443,13 @@ static void update_dir_rename_counts(struct dir_rename_info *info,
 		return;
 
 	while (1) {
+		/* Get old_dir, skip if its directory isn't relevant. */
 		dirname_munge(old_dir);
+		if (info->relevant_source_dirs &&
+		    !strset_contains(info->relevant_source_dirs, old_dir))
+			break;
+
+		/* Get new_dir */
 		dirname_munge(new_dir);
 
 		/*
@@ -521,6 +528,9 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
 	strintmap_init_with_options(&info->idx_map, -1, NULL, 0);
 	strmap_init_with_options(&info->dir_rename_guess, NULL, 0);
 
+	/* Setup info->relevant_source_dirs */
+	info->relevant_source_dirs = dirs_removed;
+
 	/*
 	 * Loop setting up both info->idx_map, and doing setup of
 	 * info->dir_rename_count.
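The relevant_source_dirs check can be sketched the same way. In this hypothetical version, `struct dir_set` (a NULL-terminated array) stands in for a strset, and `count_relevant_levels()` walks up from a file's parent directory one component at a time, roughly as repeated dirname_munge() calls do, stopping at the first directory that is not in the set. A NULL set means no filtering, mirroring the `info->relevant_source_dirs &&` guard in the patch; as a simplification the top-level directory is never counted.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for a strset: a NULL-terminated array of
 * directory names (e.g. the directories that disappeared).
 */
struct dir_set {
	const char **dirs;
};

static int dir_set_contains(const struct dir_set *set, const char *dir)
{
	const char **d;

	for (d = set->dirs; *d; d++)
		if (!strcmp(*d, dir))
			return 1;
	return 0;
}

/*
 * Walk upward from old_path's parent directory, one component at a
 * time, and return how many source directories would have their counts
 * updated.  A NULL 'relevant' set means no filtering; otherwise the
 * walk stops at the first directory not in the set, mirroring the
 * early 'break' in the patch.  Simplification: the top-level directory
 * ("") is never counted.
 */
static int count_relevant_levels(const struct dir_set *relevant,
				 const char *old_path)
{
	char dir[256];
	char *slash;
	int levels = 0;

	assert(strlen(old_path) < sizeof(dir));
	strcpy(dir, old_path);
	while ((slash = strrchr(dir, '/')) != NULL) {
		*slash = '\0'; /* drop the last path component */
		if (relevant && !dir_set_contains(relevant, dir))
			break;
		levels++;
	}
	return levels;
}
```

With dirs_removed = {a/b/c, a/b}, a rename of a/b/c/foo.c updates counts for a/b/c and a/b but stops before a. Stopping at the first irrelevant directory seems safe for dirs_removed, since a directory that still exists keeps all of its ancestors in existence too.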
From patchwork Sat Feb 27 00:30:48 2021
From: Elijah Newren
Date: Sat, 27 Feb 2021 00:30:48 +0000
Subject: [PATCH v4 10/10] diffcore-rename: compute dir_rename_guess from dir_rename_counts
To: git@vger.kernel.org
Cc: Derrick Stolee, Elijah Newren, Junio C Hamano, Ævar Arnfjörð Bjarmason

dir_rename_counts has a mapping of a mapping; in particular, it has
    old_dir => { new_dir => count }
We want a simple mapping of
    old_dir => new_dir
based on which new_dir had the highest count for a given old_dir.
Compute this and store it in dir_rename_guess.

This is the final piece of the puzzle needed to make our guesses at
which directory files have been moved to when basenames aren't unique.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                       Before                 After
    no-renames:     12.775 s ± 0.062 s     12.596 s ± 0.061 s
    mega-renames:  188.754 s ± 0.284 s    130.465 s ± 0.259 s
    just-one-mega:   5.599 s ± 0.019 s      3.958 s ± 0.010 s

Reviewed-by: Derrick Stolee
Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index e5fa0cb555dd..1fe902ed2af0 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -389,6 +389,24 @@ static void dirname_munge(char *filename)
 	*slash = '\0';
 }
 
+static const char *get_highest_rename_path(struct strintmap *counts)
+{
+	int highest_count = 0;
+	const char *highest_destination_dir = NULL;
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	strintmap_for_each_entry(counts, &iter, entry) {
+		const char *destination_dir = entry->key;
+		intptr_t count = (intptr_t)entry->value;
+		if (count > highest_count) {
+			highest_count = count;
+			highest_destination_dir = destination_dir;
+		}
+	}
+	return highest_destination_dir;
+}
+
 static void increment_count(struct dir_rename_info *info,
 			    char *old_dir,
 			    char *new_dir)
@@ -512,6 +530,8 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
 				       struct strset *dirs_removed,
 				       struct strmap *dir_rename_count)
 {
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
 	int i;
 
 	if (!dirs_removed) {
@@ -558,6 +578,23 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
 					 rename_dst[i].p->one->path,
 					 rename_dst[i].p->two->path);
 	}
+
+	/*
+	 * Now we collapse
+	 *    dir_rename_count: old_directory -> {new_directory -> count}
+	 * down to
+	 *    dir_rename_guess: old_directory -> best_new_directory
+	 * where best_new_directory is the one with the highest count.
+	 */
+	strmap_for_each_entry(info->dir_rename_count, &iter, entry) {
+		/* entry->key is source_dir */
+		struct strintmap *counts = entry->value;
+		char *best_newdir;
+
+		best_newdir = xstrdup(get_highest_rename_path(counts));
+		strmap_put(&info->dir_rename_guess, entry->key,
+			   best_newdir);
+	}
 }
 
 void partial_clear_dir_rename_count(struct strmap *dir_rename_count)
@@ -682,10 +719,10 @@ static int idx_possible_rename(char *filename, struct dir_rename_info *info)
 	 * rename.
 	 *
 	 * This function, idx_possible_rename(), is only responsible for (4).
-	 * The conditions/steps in (1)-(3) will be handled via setting up
-	 * dir_rename_count and dir_rename_guess in a future
-	 * initialize_dir_rename_info() function. Steps (0) and (5) are
-	 * handled by the caller of this function.
+	 * The conditions/steps in (1)-(3) are handled via setting up
+	 * dir_rename_count and dir_rename_guess in
+	 * initialize_dir_rename_info(). Steps (0) and (5) are handled by
+	 * the caller of this function.
 	 */
 	char *old_dir, *new_dir;
 	struct strbuf new_path = STRBUF_INIT;
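Collapsing dir_rename_count into dir_rename_guess amounts to an argmax per source directory. The sketch below flattens the map of maps into an array of hypothetical `struct rename_count_entry` triples, and `best_new_dir()` is an illustrative name rather than a git function. Like get_highest_rename_path(), it starts from a highest count of 0 and only replaces the best candidate on a strictly greater count, so among tied counts the first entry scanned wins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened entry of old_dir -> {new_dir -> count}. */
struct rename_count_entry {
	const char *old_dir;
	const char *new_dir;
	int count;
};

/*
 * Argmax over the entries for one source directory, modelled on
 * get_highest_rename_path(): start from a highest count of 0 and keep
 * the first new_dir that strictly beats it.  Returns NULL when old_dir
 * has no entries with a positive count.
 */
static const char *best_new_dir(const struct rename_count_entry *entries,
				size_t nr, const char *old_dir)
{
	const char *best = NULL;
	int highest = 0;
	size_t i;

	assert(entries != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (strcmp(entries[i].old_dir, old_dir))
			continue; /* belongs to a different source directory */
		if (entries[i].count > highest) {
			highest = entries[i].count;
			best = entries[i].new_dir;
		}
	}
	return best;
}
```

With entries {old -> new: 3, old -> other: 5, lib -> src/lib: 2, lib -> vendor/lib: 2}, old collapses to other and lib collapses to src/lib. In the patch the scan order is a strintmap's iteration order, which presumably is not sorted, so ties there are best treated as broken arbitrarily.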