From patchwork Thu May 27 08:37:17 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12283707
Message-Id: <5055dfce32815c8c8ec250457df389d4cd02ee12.1622104642.git.gitgitgadget@gmail.com>
Date: Thu, 27 May 2021 08:37:17 +0000
Subject: [PATCH 1/5] merge-ort: replace string_list_df_name_compare with faster alternative
To: git@vger.kernel.org
Cc: Derrick Stolee, Jonathan Tan, Taylor Blau, Elijah Newren
From: Elijah Newren

Gathering accumulated times from trace2 output on the mega-renames
testcase, I saw the following timings (where I'm only showing a few
lines to highlight the portions of interest):

    10.120 : label:incore_nonrecursive
     4.462 : ..label:process_entries
     3.143 : ....label:process_entries setup
     2.988 : ......label:plist special sort
     1.305 : ....label:processing
     2.604 : ..label:collect_merge_info
     2.018 : ..label:merge_start
     1.018 : ..label:renames

In the above output, note that the 4.462 seconds for process_entries was
split as 3.143 seconds for "process_entries setup" and 1.305 seconds for
"processing" (and a little time for other stuff removed from the
highlight).  Most of the "process_entries setup" time was spent on
"plist special sort" which corresponds to the following code:

    trace2_region_enter("merge", "plist special sort", opt->repo);
    plist.cmp = string_list_df_name_compare;
    string_list_sort(&plist);
    trace2_region_leave("merge", "plist special sort", opt->repo);

In other words, in a merge strategy that would be invoked by passing
"-sort" to either rebase or merge, sorting an array takes more time than
anything else.  Serves me right for naming my merge strategy this way.

Rewrite the comparison function and remove as many levels of indirection
as possible (e.g. the old code had
    cmp_items() ->
      string_list_df_name_compare() ->
        df_name_compare()
now we just have sort_dirs_next_to_their_children()), and tweak it to be
as optimized as possible for our specific case.  These changes reduced
the time spent in "plist special sort" by ~25% in the mega-renames case.

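
As a rough, standalone illustration of the ordering such a comparison
function is after, here is a sketch over a plain array of C strings.  It
is not the merge-ort.c code: the names cmp_dirs_next_to_children,
sort_paths and demo_sort_order are made up for this example, it works on
"const char *" elements rather than string_list items, and it assumes
that treating the end of a string as '/' is the whole trick.  Unlike the
function in this patch, it also returns 0 for identical strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Compare two paths byte by byte, but pretend that the end of a string
 * is a '/' so that "foo" sorts as though it were "foo/".
 */
static int cmp_dirs_next_to_children(const void *a, const void *b)
{
	const char *one = *(const char *const *)a;
	const char *two = *(const char *const *)b;
	unsigned char c1, c2;

	/* Skip the common prefix. */
	while (*one && *one == *two) {
		one++;
		two++;
	}

	c1 = *one ? (unsigned char)*one : '/';
	c2 = *two ? (unsigned char)*two : '/';

	if (c1 == c2) {
		/* Either both strings ended, or one is a leading directory. */
		if (!*one && !*two)
			return 0;
		return *one ? 1 : -1;
	}
	return c1 - c2;
}

/* Hypothetical helper: sort an array of path strings in place. */
static void sort_paths(const char **paths, size_t n)
{
	qsort(paths, n, sizeof(*paths), cmp_dirs_next_to_children);
}

/* The directory "foo" ends up directly before its child "foo/bar". */
static void demo_sort_order(void)
{
	const char *paths[] = { "foo", "foo.txt", "foo/bar" };

	sort_paths(paths, 3);
	assert(!strcmp(paths[0], "foo.txt"));
	assert(!strcmp(paths[1], "foo"));
	assert(!strcmp(paths[2], "foo/bar"));
}
```

A plain strcmp() would instead order these as foo, foo.txt, foo/bar,
which separates the directory from the files underneath it.
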
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:        5.622 s ±  0.059 s     5.235 s ±  0.042 s
    mega-renames:     10.127 s ±  0.073 s     9.419 s ±  0.107 s
    just-one-mega:   500.3  ms ±  3.8  ms   480.1  ms ±  3.9  ms

Signed-off-by: Elijah Newren
---
 merge-ort.c | 64 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 41 insertions(+), 23 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 142d44d74d63..367aec4b7def 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2746,31 +2746,50 @@ static int detect_and_process_renames(struct merge_options *opt,
 
 /*** Function Grouping: functions related to process_entries() ***/
 
-static int string_list_df_name_compare(const char *one, const char *two)
+static int sort_dirs_next_to_their_children(const void *a, const void *b)
 {
-	int onelen = strlen(one);
-	int twolen = strlen(two);
 	/*
-	 * Here we only care that entries for D/F conflicts are
-	 * adjacent, in particular with the file of the D/F conflict
-	 * appearing before files below the corresponding directory.
-	 * The order of the rest of the list is irrelevant for us.
+	 * Here we only care that entries for directories appear adjacent
+	 * to and before files underneath the directory.  In other words,
+	 * we do not want the natural sorting of
+	 *     foo
+	 *     foo.txt
+	 *     foo/bar
+	 * Instead, we want "foo" to sort as though it were "foo/", so that
+	 * we instead get
+	 *     foo.txt
+	 *     foo
+	 *     foo/bar
+	 * To achieve this, we basically implement our own strcmp, except that
+	 * if we get to the end of either string instead of comparing NUL to
+	 * another character, we compare '/' to it.
 	 *
-	 * To achieve this, we sort with df_name_compare and provide
-	 * the mode S_IFDIR so that D/F conflicts will sort correctly.
-	 * We use the mode S_IFDIR for everything else for simplicity,
-	 * since in other cases any changes in their order due to
-	 * sorting cause no problems for us.
+	 * The reason to not use df_name_compare directly was that it was
+	 * just too expensive, so I had to reimplement it.
 	 */
-	int cmp = df_name_compare(one, onelen, S_IFDIR,
-				  two, twolen, S_IFDIR);
-	/*
-	 * Now that 'foo' and 'foo/bar' compare equal, we have to make sure
-	 * that 'foo' comes before 'foo/bar'.
-	 */
-	if (cmp)
-		return cmp;
-	return onelen - twolen;
+	const char *one = ((struct string_list_item *)a)->string;
+	const char *two = ((struct string_list_item *)b)->string;
+	unsigned char c1, c2;
+
+	while (*one && (*one == *two)) {
+		one++;
+		two++;
+	}
+
+	c1 = *one;
+	if (!c1)
+		c1 = '/';
+
+	c2 = *two;
+	if (!c2)
+		c2 = '/';
+
+	if (c1 == c2) {
+		/* Getting here means one is a leading directory of the other */
+		return (*one) ? 1 : -1;
+	}
+	else
+		return c1-c2;
 }
 
 static int read_oid_strbuf(struct merge_options *opt,
@@ -3481,8 +3500,7 @@ static void process_entries(struct merge_options *opt,
 	trace2_region_leave("merge", "plist copy", opt->repo);
 
 	trace2_region_enter("merge", "plist special sort", opt->repo);
-	plist.cmp = string_list_df_name_compare;
-	string_list_sort(&plist);
+	QSORT(plist.items, plist.nr, sort_dirs_next_to_their_children);
 	trace2_region_leave("merge", "plist special sort", opt->repo);
 
 	trace2_region_leave("merge", "process_entries setup", opt->repo);

From patchwork Thu May 27 08:37:18 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12283709
Message-Id: <7212816c8d4734b33a874835e53d9e13b3198971.1622104642.git.gitgitgadget@gmail.com>
Date: Thu, 27 May 2021 08:37:18 +0000
Subject: [PATCH 2/5] diffcore-rename: avoid unnecessary strdup'ing in break_idx
To: git@vger.kernel.org
Cc: Derrick Stolee, Jonathan Tan, Taylor Blau, Elijah Newren
From: Elijah Newren

The keys of break_idx are strings from the diff_filepairs of
diff_queued_diff.  break_idx is only used in locate_rename_dst(), and
that usage is always before any free'ing of the pairs (and thus the
strings in the pairs).  As such, there is no need to strdup these keys;
we can just reuse the existing strings as-is.

The merge logic doesn't make use of break detection, so this does not
affect the performance of any of my testcases.  It was just a minor
unrelated optimization noted in passing while looking at the code.

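
The lifetime argument above can be sketched outside of git.  What
follows is a hypothetical, linear-search table (keytab and all of its
functions are invented for this example; it is not git's strintmap and
makes no claim about how strintmap is implemented).  Its strdup_keys
flag is assumed to play the same role as the last argument passed to
strintmap_init_with_options() in this patch: when it is zero the table
stores the caller's pointer, which is only safe while the caller's
string outlives every lookup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string -> int map; NOT git's strintmap. */
struct keytab_entry {
	const char *key;
	int value;
};

struct keytab {
	struct keytab_entry *items;
	size_t nr, alloc;
	int default_value;
	int strdup_keys; /* nonzero: copy keys on insert; zero: borrow them */
};

static void keytab_init(struct keytab *t, int default_value, int strdup_keys)
{
	t->items = NULL;
	t->nr = t->alloc = 0;
	t->default_value = default_value;
	t->strdup_keys = strdup_keys;
}

static struct keytab_entry *keytab_find(const struct keytab *t, const char *key)
{
	size_t i;

	for (i = 0; i < t->nr; i++)
		if (!strcmp(t->items[i].key, key))
			return &t->items[i];
	return NULL;
}

static void keytab_set(struct keytab *t, const char *key, int value)
{
	struct keytab_entry *e = keytab_find(t, key);

	if (e) {
		e->value = value;
		return;
	}
	if (t->nr == t->alloc) {
		size_t alloc = t->alloc ? 2 * t->alloc : 8;
		struct keytab_entry *items = realloc(t->items, alloc * sizeof(*items));
		if (!items)
			abort();
		t->items = items;
		t->alloc = alloc;
	}
	if (t->strdup_keys) {
		/* Owning mode: one allocation and one copy per new key. */
		char *copy = malloc(strlen(key) + 1);
		if (!copy)
			abort();
		strcpy(copy, key);
		key = copy;
	}
	t->items[t->nr].key = key;
	t->items[t->nr].value = value;
	t->nr++;
}

static int keytab_get(const struct keytab *t, const char *key)
{
	struct keytab_entry *e = keytab_find(t, key);

	return e ? e->value : t->default_value;
}

static void keytab_clear(struct keytab *t)
{
	size_t i;

	if (t->strdup_keys)
		for (i = 0; i < t->nr; i++)
			free((char *)t->items[i].key);
	free(t->items);
	keytab_init(t, t->default_value, t->strdup_keys);
}

/* Borrowing mode: the table stores the caller's pointer as-is. */
static void demo_borrowed_keys(void)
{
	char path[] = "dir/file.c"; /* owned by the caller, outlives the table */
	struct keytab t;

	keytab_init(&t, -1, 0);
	keytab_set(&t, path, 3);
	assert(t.items[0].key == path);
	assert(keytab_get(&t, "dir/file.c") == 3);
	assert(keytab_get(&t, "missing") == -1);
	keytab_clear(&t);
}
```

In borrowing mode the per-key allocation and copy disappear, at the
cost of requiring that the table is no longer consulted once the
borrowed strings have been freed.
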
Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 3375e24659ea..e333a6d64791 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -54,7 +54,7 @@ static void register_rename_src(struct diff_filepair *p)
 	if (p->broken_pair) {
 		if (!break_idx) {
 			break_idx = xmalloc(sizeof(*break_idx));
-			strintmap_init(break_idx, -1);
+			strintmap_init_with_options(break_idx, -1, NULL, 0);
 		}
 		strintmap_set(break_idx, p->one->path, rename_dst_nr);
 	}

From patchwork Thu May 27 08:37:19 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12283713
Message-Id: <19150b5750586996383dc26f3801a9441486f9f0.1622104642.git.gitgitgadget@gmail.com>
Date: Thu, 27 May 2021 08:37:19 +0000
Subject: [PATCH 3/5] diffcore-rename: enable limiting rename detection to relevant destinations
To: git@vger.kernel.org
Cc: Derrick Stolee, Jonathan Tan, Taylor Blau, Elijah Newren
From: Elijah Newren

Our former optimizations focused on limiting rename detection to a
pre-specified set of relevant sources.  This was because the merge logic
only had a way of knowing which sources were relevant.  However, other
callers of rename detection might benefit from being able to limit
rename detection to a known set of relevant destinations.  In
particular, a properly implemented `git log --follow` might benefit from
such an ability.  Since the code to implement such limiting is very
similar to what we've already done, just implement it now even though we
do not yet have any callers making use of this ability.

Signed-off-by: Elijah Newren
---
 diffcore-rename.c | 48 +++++++++++++++++++++++++++++++++++++++++------
 diffcore.h        |  2 ++
 merge-ort.c       |  1 +
 3 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index e333a6d64791..8ff83a9f3b99 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -372,6 +372,7 @@ struct dir_rename_info {
 	struct strmap dir_rename_guess;
 	struct strmap *dir_rename_count;
 	struct strintmap *relevant_source_dirs;
+	struct strset *relevant_destination_dirs;
 	unsigned setup;
 };
 
@@ -491,8 +492,11 @@ static void update_dir_rename_counts(struct dir_rename_info *info,
 		    !strintmap_contains(info->relevant_source_dirs, old_dir))
 			break;
 
-		/* Get new_dir */
+		/* Get new_dir, skip if its directory isn't relevant. */
 		dirname_munge(new_dir);
+		if (info->relevant_destination_dirs &&
+		    !strset_contains(info->relevant_destination_dirs, new_dir))
+			break;
 
 		/*
 		 * When renaming
@@ -567,6 +571,7 @@ static void update_dir_rename_counts(struct dir_rename_info *info,
 
 static void initialize_dir_rename_info(struct dir_rename_info *info,
 				       struct strintmap *relevant_sources,
+				       struct strset *relevant_destinations,
 				       struct strintmap *dirs_removed,
 				       struct strmap *dir_rename_count,
 				       struct strmap *cached_pairs)
@@ -575,7 +580,7 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
 	struct strmap_entry *entry;
 	int i;
 
-	if (!dirs_removed && !relevant_sources) {
+	if (!dirs_removed && !relevant_sources && !relevant_destinations) {
 		info->setup = 0;
 		return;
 	}
@@ -589,6 +594,18 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
 	strintmap_init_with_options(&info->idx_map, -1, NULL, 0);
 	strmap_init_with_options(&info->dir_rename_guess, NULL, 0);
 
+	/* Setup info->relevant_destination_dirs */
+	info->relevant_destination_dirs = NULL;
+	if (relevant_destinations) {
+		info->relevant_destination_dirs = xmalloc(sizeof(struct strset));
+		strset_init(info->relevant_destination_dirs);
+		strset_for_each_entry(relevant_destinations, &iter, entry) {
+			char *dirname = get_dirname(entry->key);
+			strset_add(info->relevant_destination_dirs, dirname);
+			free(dirname);
+		}
+	}
+
 	/* Setup info->relevant_source_dirs */
 	info->relevant_source_dirs = NULL;
 	if (dirs_removed || !relevant_sources) {
@@ -700,6 +717,12 @@ static void cleanup_dir_rename_info(struct dir_rename_info *info,
 		FREE_AND_NULL(info->relevant_source_dirs);
 	}
 
+	/* relevant_destination_dirs */
+	if (info->relevant_destination_dirs) {
+		strset_clear(info->relevant_destination_dirs);
+		FREE_AND_NULL(info->relevant_destination_dirs);
+	}
+
 	/* dir_rename_count */
 	if (!keep_dir_rename_count) {
 		partial_clear_dir_rename_count(info->dir_rename_count);
@@ -827,6 +850,7 @@ static int find_basename_matches(struct diff_options *options,
 				 int minimum_score,
 				 struct dir_rename_info *info,
 				 struct strintmap *relevant_sources,
+				 struct strset *relevant_destinations,
 				 struct strintmap *dirs_removed)
 {
 	/*
@@ -949,9 +973,15 @@ static int find_basename_matches(struct diff_options *options,
 			if (rename_dst[dst_index].is_rename)
 				continue; /* already used previously */
 
-			/* Estimate the similarity */
 			one = rename_src[src_index].p->one;
 			two = rename_dst[dst_index].p->two;
+
+			/* Skip irrelevant destinations */
+			if (relevant_destinations &&
+			    !strset_contains(relevant_destinations, two->path))
+				continue;
+
+			/* Estimate the similarity */
 			score = estimate_similarity(options->repo, one, two,
 						    minimum_score, skip_unmodified);
 
@@ -1258,6 +1288,7 @@ static void handle_early_known_dir_renames(struct dir_rename_info *info,
 
 void diffcore_rename_extended(struct diff_options *options,
 			      struct strintmap *relevant_sources,
+			      struct strset *relevant_destinations,
 			      struct strintmap *dirs_removed,
 			      struct strmap *dir_rename_count,
 			      struct strmap *cached_pairs)
@@ -1376,8 +1407,8 @@ void diffcore_rename_extended(struct diff_options *options,
 	/* Preparation for basename-driven matching. */
 	trace2_region_enter("diff", "dir rename setup", options->repo);
 	initialize_dir_rename_info(&info, relevant_sources,
-				   dirs_removed, dir_rename_count,
-				   cached_pairs);
+				   relevant_destinations, dirs_removed,
+				   dir_rename_count, cached_pairs);
 	trace2_region_leave("diff", "dir rename setup", options->repo);
 
 	/* Utilize file basenames to quickly find renames. */
@@ -1386,6 +1417,7 @@ void diffcore_rename_extended(struct diff_options *options,
 					      min_basename_score,
 					      &info,
 					      relevant_sources,
+					      relevant_destinations,
 					      dirs_removed);
 	trace2_region_leave("diff", "basename matches", options->repo);
 
@@ -1441,6 +1473,10 @@ void diffcore_rename_extended(struct diff_options *options,
 		if (rename_dst[i].is_rename)
 			continue; /* exact or basename match already handled */
 
+		if (relevant_destinations &&
+		    !strset_contains(relevant_destinations, two->path))
+			continue;
+
 		m = &mx[dst_cnt * NUM_CANDIDATE_PER_DST];
 		for (j = 0; j < NUM_CANDIDATE_PER_DST; j++)
 			m[j].dst = -1;
@@ -1574,5 +1610,5 @@ void diffcore_rename_extended(struct diff_options *options,
 
 void diffcore_rename(struct diff_options *options)
 {
-	diffcore_rename_extended(options, NULL, NULL, NULL, NULL);
+	diffcore_rename_extended(options, NULL, NULL, NULL, NULL, NULL);
 }
diff --git a/diffcore.h b/diffcore.h
index 533b30e21e7f..435c7094f403 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -10,6 +10,7 @@ struct diff_options;
 struct repository;
 struct strintmap;
 struct strmap;
+struct strset;
 struct userdiff_driver;
 
 /* This header file is internal between diff.c and its diff transformers
@@ -180,6 +181,7 @@ void diffcore_break(struct repository *, int);
 void diffcore_rename(struct diff_options *);
 void diffcore_rename_extended(struct diff_options *options,
 			      struct strintmap *relevant_sources,
+			      struct strset *relevant_destinations,
 			      struct strintmap *dirs_removed,
 			      struct strmap *dir_rename_count,
 			      struct strmap *cached_pairs);
diff --git a/merge-ort.c b/merge-ort.c
index 367aec4b7def..db16cbc3bd33 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2568,6 +2568,7 @@ static void detect_regular_renames(struct merge_options *opt,
 	trace2_region_enter("diff", "diffcore_rename", opt->repo);
 	diffcore_rename_extended(&diff_opts,
 				 &renames->relevant_sources[side_index],
+				 NULL,
 				 &renames->dirs_removed[side_index],
 				 &renames->dir_rename_count[side_index],
 				 &renames->cached_pairs[side_index]);

From patchwork Thu May 27 08:37:20 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12283715
Message-Id: <98c9a419b313261a21d3794742181c27ed8cd0cb.1622104642.git.gitgitgadget@gmail.com>
Date: Thu, 27 May 2021 08:37:20 +0000
Subject: [PATCH 4/5] Fix various issues found in comments
To: git@vger.kernel.org
Cc: Derrick Stolee, Jonathan Tan, Taylor Blau, Elijah Newren
From: Elijah Newren

A random hodge-podge of incorrect or out-of-date comments that I found:

  * t6423 had a comment that has referred to the wrong test for years;
    fix it to refer to the right one.
  * diffcore-rename had a FIXME comment meant to remind myself to
    investigate if I could make another code change.  I later
    investigated and removed the FIXME, but while cherry-picking the
    patch to submit upstream I missed the later update.  Remove the
    comment now.
  * merge-ort had the early part of a comment for a function; I had
    meant to include the more involved description when I updated the
    function.  Update the comment now.

Signed-off-by: Elijah Newren
---
 diffcore-rename.c                   | 2 +-
 merge-ort.c                         | 8 +++++---
 t/t6423-merge-rename-directories.sh | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 8ff83a9f3b99..c9f7d59cf62b 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -1579,7 +1579,7 @@ void diffcore_rename_extended(struct diff_options *options,
 			/* all the usual ones need to be kept */
 			diff_q(&outq, p);
 		else
-			/* no need to keep unmodified pairs; FIXME: remove earlier? */
+			/* no need to keep unmodified pairs */
 			pair_to_free = p;
 
 		if (pair_to_free)
diff --git a/merge-ort.c b/merge-ort.c
index db16cbc3bd33..6cb103c8e855 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2533,7 +2533,7 @@ static int compare_pairs(const void *a_, const void *b_)
 	return strcmp(a->one->path, b->one->path);
 }
 
-/* Call diffcore_rename() to compute which files have changed on given side */
+/* Call diffcore_rename() to update deleted/added pairs into rename pairs */
 static void detect_regular_renames(struct merge_options *opt,
 				   unsigned side_index)
 {
@@ -2587,8 +2587,10 @@ static void detect_regular_renames(struct merge_options *opt,
 }
 
 /*
- * Get information of all renames which occurred in 'side_pairs', discarding
- * non-renames.
+ * Get information of all renames which occurred in 'side_pairs', making use
+ * of any implicit directory renames in side_dir_renames (also making use of
+ * implicit directory renames rename_exclusions as needed by
+ * check_for_directory_rename()).  Add all (updated) renames into result.
  */
 static int collect_renames(struct merge_options *opt,
 			   struct diff_queue_struct *result,
diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
index be84d22419d9..e834b7e6efe0 100755
--- a/t/t6423-merge-rename-directories.sh
+++ b/t/t6423-merge-rename-directories.sh
@@ -454,7 +454,7 @@ test_expect_success '1f: Split a directory into two other directories' '
 #   the directory renamed, but the files within it. (see 1b)
 #
 # If renames split a directory into two or more others, the directory
-#   with the most renames, "wins" (see 1c).  However, see the testcases
+#   with the most renames, "wins" (see 1f).  However, see the testcases
 #   in section 2, plus testcases 3a and 4a.
 ###########################################################################
 

From patchwork Thu May 27 08:37:21 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 12283717
2021 08:37:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235538AbhE0IjE (ORCPT ); Thu, 27 May 2021 04:39:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52338 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235516AbhE0IjB (ORCPT ); Thu, 27 May 2021 04:39:01 -0400 Received: from mail-wm1-x32f.google.com (mail-wm1-x32f.google.com [IPv6:2a00:1450:4864:20::32f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 32D63C061761 for ; Thu, 27 May 2021 01:37:28 -0700 (PDT) Received: by mail-wm1-x32f.google.com with SMTP id z137-20020a1c7e8f0000b02901774f2a7dc4so4670501wmc.0 for ; Thu, 27 May 2021 01:37:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=AAhC2SD+HHcObf7qc+SmjKN88DhPbGlfJOHemGfoooI=; b=JydjFhD4+JogLSDgkVpvtYIOddKiYI/iFeZe2SejGQwvi1/psv6fACDYHQQtLfdORV RQR3IvP+ywuogeU6GRG1BPUEKyQ+50qrfjDXyY1C2NZ6tPzCJgyf5JBl18qgHF2lnCUK dVvu0cbG9wEFXYxwCfBfUmZtwoP/gj6l63TFNGA61RlEi5XTdIAf9N7EdVwwV4RxjtHh BQpeA9+i3w/W0cFf0FKaJICfx9U9u3DyZW+cx8/UyApdLJn3RsU4bGJ7iIpjOOBhG6AJ FqCcO7arcwcFuT3Rw+jWkYBBaH7HQqz7ZXRRg51U0PO8GF5Hy/gIv/etf5hxR0i+TtHD 24Mg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=AAhC2SD+HHcObf7qc+SmjKN88DhPbGlfJOHemGfoooI=; b=gIJvj8Derrwkov86OIDgYrurEEwMx9LOOb0jq2Dqmgrahljb1v8Uq7SrBJd2+bVhmk odBXtxQbgabLZQ2ISqcwEPB+VjfWqC/4Bjpw/fK0i3zp3H51ByvG9tNhUY6xOZi/FqcG aKkdYetsKyZL28CcJWvFXK6rkLpLRV106GK3b+wzd1ZYPduaIUsiEr8afwwsVjOtU35H pEnEVY0x1PuY7mAVoJGzagmU0Gi9tagMYNSAlFSnDNuyHymJHBI2nIcJyyalybwG/kre 5XMoBoyN220YH2hNZqJOW2W6sWLcgbhH8agCvUhrg81OptlwofJQRkl7EyzLWQag41sL S5bw== X-Gm-Message-State: 
AOAM532wK/3daYHeYX0tde4rS43yQc3jX9CiaZNPW+KClW1z7dmt2SZO 6Mzc0fMasjvg0Eg61GzSl2Q4QVAZ/sI= X-Google-Smtp-Source: ABdhPJyTM5TymO5Z3KTTQYJV6nsU5QiDfMcB3cseGP2+WZ4solIgkf3u1JhzNWXv29iogkzGw+5Tuw== X-Received: by 2002:a1c:c911:: with SMTP id f17mr2312096wmb.45.1622104646830; Thu, 27 May 2021 01:37:26 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id t14sm8948381wmq.16.2021.05.27.01.37.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 27 May 2021 01:37:26 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Thu, 27 May 2021 08:37:21 +0000 Subject: [PATCH 5/5] merge-ort: miscellaneous touch-ups Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , Jonathan Tan , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Add some notes in the code about invariants with match_mask when adding pairs. Also add a comment that seems to have been left out in my work of pushing these changes upstream. Signed-off-by: Elijah Newren --- merge-ort.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/merge-ort.c b/merge-ort.c index 6cb103c8e855..e174a8734a41 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -765,6 +765,7 @@ static void add_pair(struct merge_options *opt, int names_idx = is_add ? 
side : 0; if (is_add) { + assert(match_mask == 0 || match_mask == 6); if (strset_contains(&renames->cached_target_names[side], pathname)) return; @@ -772,6 +773,8 @@ static void add_pair(struct merge_options *opt, unsigned content_relevant = (match_mask == 0); unsigned location_relevant = (dir_rename_mask == 0x07); + assert(match_mask == 0 || match_mask == 3 || match_mask == 5); + /* * If pathname is found in cached_irrelevant[side] due to * previous pick but for this commit content is relevant, @@ -3470,6 +3473,8 @@ static void process_entry(struct merge_options *opt, */ if (!ci->merged.clean) strmap_put(&opt->priv->conflicted, path, ci); + + /* Record metadata for ci->merged in dir_metadata */ record_entry_for_tree(dir_metadata, path, &ci->merged); }
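
Reader's note on the match_mask asserts: the sketch below is one way to read the two invariants, not code from merge-ort.c. It assumes match_mask is a 3-bit mask with bit 0 for the merge base, bit 1 for side 1, and bit 2 for side 2, where set bits mark versions that match each other. All names in it (MASK_BASE, sketch_match_mask, and so on) are hypothetical and exist only for illustration. Under that assumption, an added path has no base version, so the only possible non-zero value is 6 (both sides added identical content). A path that exists in the base and is missing on one side can only match the base on the remaining side, giving 3 or 5.

```c
#include <assert.h>

/* Hypothetical bit names; the layout is an assumption, see the note above. */
enum {
	MASK_BASE  = 1, /* bit 0: merge base */
	MASK_SIDE1 = 2, /* bit 1: side 1 */
	MASK_SIDE2 = 4  /* bit 2: side 2 */
};

/*
 * Illustrative helper (not part of merge-ort.c): build a match mask
 * from three pairwise comparisons of the versions of one path.
 */
static unsigned sketch_match_mask(int side1_matches_base,
				  int side2_matches_base,
				  int sides_match)
{
	if (side1_matches_base && side2_matches_base)
		return MASK_BASE | MASK_SIDE1 | MASK_SIDE2; /* 7 */
	if (side1_matches_base)
		return MASK_BASE | MASK_SIDE1;              /* 3 */
	if (side2_matches_base)
		return MASK_BASE | MASK_SIDE2;              /* 5 */
	if (sides_match)
		return MASK_SIDE1 | MASK_SIDE2;             /* 6 */
	return 0;
}

/* Mirrors the first assert: an add cannot involve the base bit. */
static int sketch_valid_for_add(unsigned match_mask)
{
	return match_mask == 0 || match_mask == 6;
}

/* Mirrors the second assert: a non-add can only match via the base. */
static int sketch_valid_for_delete(unsigned match_mask)
{
	return match_mask == 0 || match_mask == 3 || match_mask == 5;
}
```

With these helpers, sketch_valid_for_add(sketch_match_mask(0, 0, 1)) holds, while a mask of 3 or 5 would be rejected for an add because it includes the base bit.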