From patchwork Tue Jul 13 19:32:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12374777 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 76895C07E95 for ; Tue, 13 Jul 2021 19:33:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4C7FD61279 for ; Tue, 13 Jul 2021 19:33:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231499AbhGMTf6 (ORCPT ); Tue, 13 Jul 2021 15:35:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229944AbhGMTf6 (ORCPT ); Tue, 13 Jul 2021 15:35:58 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E2570C0613DD for ; Tue, 13 Jul 2021 12:33:07 -0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id t5so77756wrw.12 for ; Tue, 13 Jul 2021 12:33:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=UtksxVEuW8oPlg3lwNLFR4T2YJ+49CjWCzglLfAr+Zc=; b=dutAVEfHghJDAIHaD0hWzSx2FCY2mQtq3qXrG9kl38or964KR7f2AQTf6CaE3JDFod 1CQsoQHm9cUzlNVSlzv0fRSHbttt7TTP6/Nfh5YZps70rW6ZQuPL62qD9TlmtvtMbfD/ NPmwKmiZzxuMzqDhQoKJ+JGdPT4D52AwAMFGGAzrWNCCYn30D1uPiphnIzurjaE/rqyf j3V8rjXdROs5KsQ8Mxy/E97/IdcoYqnIarfLU3RWsEK6gROIXrFGC1xjtvez/Y9VZUFm HfXNBMwhmwc90JG+RQ0smp4qJHkUBzfz1g+Z/UAEbkHWUHsR8b/ynsVSP4denqLYtSRU m7hQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=UtksxVEuW8oPlg3lwNLFR4T2YJ+49CjWCzglLfAr+Zc=; b=QwdhMigwfKnVRn8ONiDN4rHuVrPBq9G3Krh7AMS8PwTvwT9ukTEjgiRPDlwcvGxNz2 9I/K68bvkqs5rCwDHvwOL8L60DOu5vKCZiWvDNdZnzXKLsI1Mxa5q6dvKJBsDYqNdg06 S9/rUh0w7V9qLqD2rSQ0yUnA6F3gT+F+BGkEDRY2edIEsSZBNoM3Eq9tBjkTTTdT32i8 VeSRW50S31n7Zy3EH3q2MtnGZcoTVAcFj9QSH4WASVbWY1Bp1GoBEviFknBirMScysUS VZnRxwiLRvNYYK8f5ryno5wJFOMoz/+KlFjvBFUmbepqCD9epm038pTyi/5e6LktSuCi spYw== X-Gm-Message-State: AOAM531aTqa+Rs7lAMeiRTE1y3pPbcunT4nTU6ZtqzwkqRvi4r4BX8De bihzCwN5WIhpfnZDJEYYmCdggH+x5xo= X-Google-Smtp-Source: ABdhPJyTv5u6xmTz8oo63seHA0n9CoZjBZAYnkpiDPhQZ0RDgjfoD7uxF/+W1oTCYtTSyYa/SJFqWA== X-Received: by 2002:a5d:4351:: with SMTP id u17mr7706826wrr.47.1626204786494; Tue, 13 Jul 2021 12:33:06 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id i15sm19965177wro.3.2021.07.13.12.33.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Jul 2021 12:33:06 -0700 (PDT) Message-Id: <5dca982c0b061ae8d6335d4f22b78811dced054a.1626204784.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 13 Jul 2021 19:32:57 +0000 Subject: [PATCH v2 1/7] merge-ort: resolve paths early when we have sufficient information Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Elijah Newren , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren When there are no directories involved at a given path, and all three sides have a file at that path, and two of the three sides of history match, we can immediately resolve the merge of that path in collect_merge_info() and do not need to wait until process_entries(). This is actually a very minor improvement: half the time when I run it, I see an improvement; the other half a slowdown. It seems to be in the range of noise. However, this idea serves as the beginning of some bigger optimizations coming in the following patches. Signed-off-by: Elijah Newren --- merge-ort.c | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/merge-ort.c b/merge-ort.c index e3a5dfc7b31..6299b4f9413 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -1023,6 +1023,43 @@ static int collect_merge_info_callback(int n, return mask; } + /* + * If the sides match, and all three paths are present and are + * files, then we can take either as the resolution. We can't do + * this with trees, because there may be rename sources from the + * merge_base. + */ + if (sides_match && filemask == 0x07) { + /* use side1 (== side2) version as resolution */ + setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, + names, names+1, side1_null, 0, + filemask, dirmask, 1); + return mask; + } + + /* + * If side1 matches mbase and all three paths are present and are + * files, then we can use side2 as the resolution. We cannot + * necessarily do so this for trees, because there may be rename + * destinations within side2. + */ + if (side1_matches_mbase && filemask == 0x07) { + /* use side2 version as resolution */ + setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, + names, names+2, side2_null, 0, + filemask, dirmask, 1); + return mask; + } + + /* Similar to above but swapping sides 1 and 2 */ + if (side2_matches_mbase && filemask == 0x07) { + /* use side1 version as resolution */ + setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, + names, names+1, side1_null, 0, + filemask, dirmask, 1); + return mask; + } + /* * Gather additional information used in rename detection. */ From patchwork Tue Jul 13 19:32:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12374779 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8ACC0C11F66 for ; Tue, 13 Jul 2021 19:33:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6C3626128E for ; Tue, 13 Jul 2021 19:33:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234548AbhGMTf7 (ORCPT ); Tue, 13 Jul 2021 15:35:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57504 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229944AbhGMTf6 (ORCPT ); Tue, 13 Jul 2021 15:35:58 -0400 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 83E8EC0613E9 for ; Tue, 13 Jul 2021 12:33:08 -0700 (PDT) Received: by mail-wr1-x432.google.com with SMTP id r11so98637wro.9 for ; Tue, 13 Jul 2021 12:33:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=jTYYZdm9bq7NPN5b67LXlVdIpBH+R0ThjCtl7RWxqOU=; b=uDpxEY5zdZlihGcPytYNdF0HjQew3eRlYI2/MibSAubbQeMPnsGk+pOYYQzZdVP+/N ZWA+WG0xFZaHf4x4DJVjIaLBUYaTwDBYIihV1opMxlUUsOyxMFEQYsmaf1QrbllplBI1 z2O6Cy+D4A1q81kDPM7XHGLet3MbZD7bXGzuM/lzc9ZkWozSuqRQ1ehn07y/srtb52Rj ljwBfa/HRLGGjbhIdrFhMdMcoT7gx3Wlmn/xcwHC1JrtrbZekbQJEgZnvQ8GNjgF0d8T WnL0vqp2mgpbx6CTRDX+ZuMHT1HegPLUNGYcAa2bHLSh7Mne9gE+YLOBVxxMlevFSneB ABSg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=jTYYZdm9bq7NPN5b67LXlVdIpBH+R0ThjCtl7RWxqOU=; b=qMMbLe5MsIdp7eQHdZ7YcOLz3Jb/EdTfXcAo0FMHi8Gv6rzPDaPT0m72SGHzr1+M8z /0CJW8wwWidkjJuPG6yy1x3uBgWYd4yEjER1HozaoYT1JwkS1EOfVvi8w4+FTzHXM2dX JrRCLJu8V6vENMpoAiWqjV0PZs1/qWHUMCDvmde0rcv1+sbe3vH+ysS86/AwmxH5IMeR w+Jf3iijdE0dWcQzdhEHemV2cMW/LNP5h7Qpv/Baqkbsv7zUQ1VUNyW6NMahfOQwOe6w HuAaPS/xrl4Ss5fpJ8ELfoiTkjliBPpGueLCjjiuIWLhEvXQGNsNTCQXf8JruFAvdVYn 8SwQ== X-Gm-Message-State: AOAM5320GQBo9896dczumWPHqVdpHN6QwoRyfJ93Ka1bLCrpKrK2pp8I 81C0N8x8V0yqYpGn2YOWOpK6yMJN57A= X-Google-Smtp-Source: ABdhPJyeddWyVZx45z/3BoD113UCAhRah/iRzoKvh0Ncq1B8T34j+/tdY/BjxXyL+06VeF9Hb4yt8g== X-Received: by 2002:adf:f9cb:: with SMTP id w11mr7933189wrr.57.1626204787079; Tue, 13 Jul 2021 12:33:07 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id o29sm14478023wms.13.2021.07.13.12.33.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Jul 2021 12:33:06 -0700 (PDT) Message-Id: <8aea3713902b7d006f527ccd76faf6f944529bdb.1626204784.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 13 Jul 2021 19:32:58 +0000 Subject: [PATCH v2 2/7] merge-ort: add some more explanations in collect_merge_info_callback() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Elijah Newren , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren The previous patch possibly raises some questions about whether additional cases in collect_merge_info_callback() can be handled early. Add some explanations in the form of comments to help explain these better. While we're at it, add a few comments to denote what a few boolean '0' or '1' values stand for. Signed-off-by: Elijah Newren --- merge-ort.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index 6299b4f9413..843fa693145 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -1018,8 +1018,8 @@ static int collect_merge_info_callback(int n, if (side1_matches_mbase && side2_matches_mbase) { /* mbase, side1, & side2 all match; use mbase as resolution */ setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, - names, names+0, mbase_null, 0, - filemask, dirmask, 1); + names, names+0, mbase_null, 0 /* df_conflict */, + filemask, dirmask, 1 /* resolved */); return mask; } @@ -1061,14 +1061,24 @@ static int collect_merge_info_callback(int n, } /* - * Gather additional information used in rename detection. + * Sometimes we can tell that a source path need not be included in + * rename detection -- namely, whenever either + * side1_matches_mbase && side2_null + * or + * side2_matches_mbase && side1_null + * However, we call collect_rename_info() even in those cases, + * because exact renames are cheap and would let us remove both a + * source and destination path. We'll cull the unneeded sources + * later. */ collect_rename_info(opt, names, dirname, fullpath, filemask, dirmask, match_mask); /* - * Record information about the path so we can resolve later in - * process_entries. + * None of the special cases above matched, so we have a + * provisional conflict. (Rename detection might allow us to + * unconflict some more cases, but that comes later so all we can + * do now is record the different non-null file hashes.) */ setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, names, NULL, 0, df_conflict, filemask, dirmask, 0); From patchwork Tue Jul 13 19:32:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12374783 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BAFF9C07E95 for ; Tue, 13 Jul 2021 19:33:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A16ED610CB for ; Tue, 13 Jul 2021 19:33:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234622AbhGMTgC (ORCPT ); Tue, 13 Jul 2021 15:36:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57508 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234540AbhGMTf7 (ORCPT ); Tue, 13 Jul 2021 15:35:59 -0400 Received: from mail-wr1-x429.google.com (mail-wr1-x429.google.com [IPv6:2a00:1450:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 290AEC0613E9 for ; Tue, 13 Jul 2021 12:33:09 -0700 (PDT) Received: by mail-wr1-x429.google.com with SMTP id i94so132428wri.4 for ; Tue, 13 Jul 2021 12:33:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=pR0h5LFWhviNFphByFwqYkm2jiu1RUK/t+Oni4vKV4Y=; b=ImFjiAzr1PDlvc5xRtUW2xWww1BQM+tMaCjHJCHSy7K3NFzg6d0Al+ozeHbiSSyKn+ eJlAUcU2YLFw+z7eu5jVQ3DRp8EyTMAzMqs5+SB0DvlaBXuhfLy8Z8a8dMpB5LcjKy6b JZZfblfBlusw1HsNRCM4jOWGwQSMLRDG7VesYA61A5mX+v60oHYQWFO2O3e5q3BFJjAy TYKHz0FethCBwp01wsOOSfw1Cc1AnTgN6geQcWG8CFyonlF0p+bGo01cP7eFwI2gYf+G 2+sDe/2yeUq7BXYmWDqJOouOOLhT6zWrV/gsPPC/jvdroCAgl1AmPJPQcVnIFyb9N0me i0sg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=pR0h5LFWhviNFphByFwqYkm2jiu1RUK/t+Oni4vKV4Y=; b=NDbAzEahR/cp3Stw5174GGltXbylVwsh83e/AXh5ULn1VLJhBGhrevSrWlv7KomFB6 vnxvvVXWdyxvJHkdjN577vsCx5kJFmekH7yj2Zj02fkKRMzUrAtRIqcxbH1QcKOYMqBn df2/yrXhFMoFCnE0DB0ggcqfaAyr/a0u7FlMJvoCQJKG4EL/4mwb7FmjjoE2GA/iEqkc W/K5WklAxcB53ariJKnSdIPRgCV0X+srJe/84Zi9JJ2RFjV5MCW7SgOPvwx3BJwQRvWT OmcB47NVj9QdfuHU7ltkH0SGV40BYcvS3oFVr/O/3qMHx4p8u7zqO6q9NQPS8y0A4mq4 j07g== X-Gm-Message-State: AOAM532rTj8HWXndPucpFriSoEh8mQlRn75zkLhA1Ra4uGBrLwJFuijU q/kDpZSfzfMPaYIffOFrMcuMOjkSU50= X-Google-Smtp-Source: ABdhPJxaK32kkL/4Vh2e3D/noPnb03PBU2UIEVwUxXQ3UZd9fZ6ZrCUfNEsSy/yAns+ab0i2qsqJrA== X-Received: by 2002:adf:ef03:: with SMTP id e3mr3922717wro.316.1626204787676; Tue, 13 Jul 2021 12:33:07 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id s6sm9738518wrt.45.2021.07.13.12.33.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Jul 2021 12:33:07 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Tue, 13 Jul 2021 19:32:59 +0000 Subject: [PATCH v2 3/7] merge-ort: add data structures for allowable trivial directory resolves Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Elijah Newren , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren As noted a few commits ago, we can resolve individual files early if all three sides of the merge have a file at the path and two of the three sides match. We would really like to do the same thing with directories, because being able to do a trivial directory resolve means we don't have to recurse into the directory, potentially saving us a huge amount of time in both collect_merge_info() and process_entries(). Unfortunately, resolving directories early would mean missing any renames whose source or destination is underneath that directory. If we somehow knew there weren't any renames under the directory in question, then we could resolve it early. Sadly, it is impossible to determine whether there are renames under the directory in question without recursing into it, and this has traditionally kept us from ever implementing such an optimization. In commit f89b4f2bee ("merge-ort: skip rename detection entirely if possible", 2021-03-11), we added an additional reason that rename detection could be skipped entirely -- namely, if no *relevant* sources were present. Without completing collect_merge_info_callback(), we do not yet know if there are no relevant sources. However, we do know that if the current directory on one side matches the merge base, then every source file within that directory will not be RELEVANT_CONTENT, and a few simple checks can often let us rule out RELEVANT_LOCATION as well. This suggests we can just defer recursing into such directories until the end of collect_merge_info. Since the deferred directories are known to not add any relevant sources due to the above properties, then if there are no relevant sources after we've traversed all paths other than the deferred ones, then we know there are not any relevant sources. Under those conditions, rename detection is unnecessary, and that means we can resolve the deferred directories without recursing into them. Note that the logic for skipping rename detection was also modified further in commit 76e253793c ("merge-ort, diffcore-rename: employ cached renames when possible", 2021-01-30); in particular rename detection can be skipped if we already have cached renames for each relevant source. We can take advantage of this information as well with our deferral of recursing into directories where one side matches the merge base. Add some data structures that we will use to do these deferrals, with some lengthy comments explaining their purpose. Signed-off-by: Elijah Newren --- merge-ort.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+) diff --git a/merge-ort.c b/merge-ort.c index 843fa693145..3d3f00b3b45 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -119,6 +119,51 @@ struct rename_info { */ struct strintmap relevant_sources[3]; + /* + * possible_trivial_merges: directories we defer recursing into + * + * possible_trivial_merges is a map of directory names to + * dir_rename_mask. When we detect that a directory is unchanged on + * one side, we can sometimes resolve the directory without recursing + * into it. Renames are the only things that can prevent such an + * optimization. However, for rename sources: + * - If no parent directory needed directory rename detection, then + * no path under such a directory can be a relevant_source. + * and for rename destinations: + * - If no cached rename has a target path under the directory AND + * - If there are no unpaired relevant_sources elsewhere in the + * repository + * then we don't need any path under this directory for a rename + * destination. The only way to know the last item above is to defer + * handling such directories until the end of collect_merge_info(), + * in handle_deferred_entries(). + * + * For each we store dir_rename_mask, since that's the only bit of + * information we need, other than the path, to resume the recursive + * traversal. + */ + struct strintmap possible_trivial_merges[3]; + + /* + * trivial_merges_okay: if trivial directory merges are okay + * + * See possible_trivial_merges above. The "no unpaired + * relevant_sources elsewhere in the repository" is a single boolean + * per merge side, which we store here. Note that while 0 means no, + * 1 only means "maybe" rather than "yes"; we optimistically set it + * to 1 initially and only clear when we determine it is unsafe to + * do trivial directory merges. + */ + unsigned trivial_merges_okay[3]; + + /* + * target_dirs: ancestor directories of rename targets + * + * target_dirs contains all directory names that are an ancestor of + * any rename destination. + */ + struct strset target_dirs[3]; + /* * dir_rename_mask: * 0: optimization removing unmodified potential rename source okay @@ -490,6 +535,9 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, strintmap_func(&renames->dirs_removed[i]); strmap_func(&renames->dir_renames[i], 0); strintmap_func(&renames->relevant_sources[i]); + strintmap_func(&renames->possible_trivial_merges[i]); + strset_func(&renames->target_dirs[i]); + renames->trivial_merges_okay[i] = 1; /* 1 == maybe */ if (!reinitialize) assert(renames->cached_pairs_valid_side == 0); if (i != renames->cached_pairs_valid_side) { @@ -4045,12 +4093,17 @@ static void merge_start(struct merge_options *opt, struct merge_result *result) strintmap_init_with_options(&renames->relevant_sources[i], -1 /* explicitly invalid */, NULL, 0); + strintmap_init_with_options(&renames->possible_trivial_merges[i], + 0, NULL, 0); + strset_init_with_options(&renames->target_dirs[i], + NULL, 1); strmap_init_with_options(&renames->cached_pairs[i], NULL, 1); strset_init_with_options(&renames->cached_irrelevant[i], NULL, 1); strset_init_with_options(&renames->cached_target_names[i], NULL, 0); + renames->trivial_merges_okay[i] = 1; /* 1 == maybe */ } /* From patchwork Tue Jul 13 19:33:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12374785 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09AD6C11F66 for ; Tue, 13 Jul 2021 19:33:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E4E15610CB for ; Tue, 13 Jul 2021 19:33:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234665AbhGMTgF (ORCPT ); Tue, 13 Jul 2021 15:36:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57514 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234615AbhGMTgE (ORCPT ); Tue, 13 Jul 2021 15:36:04 -0400 Received: from mail-wm1-x32a.google.com (mail-wm1-x32a.google.com [IPv6:2a00:1450:4864:20::32a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B1816C0613DD for ; Tue, 13 Jul 2021 12:33:09 -0700 (PDT) Received: by mail-wm1-x32a.google.com with SMTP id f8-20020a1c1f080000b029022d4c6cfc37so1079357wmf.5 for ; Tue, 13 Jul 2021 12:33:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=35ri/oVrMbO6uA75NOizGkqjB2GcqWgsCqqYDqf0wbs=; b=PrMxo0flnzhyUSk9x4sjR4L+YjttVIdpdy6z2nTVNX1FE/hLnkxH2s9mOmrri6kJhY humu3M01rg8eGSzo6ggpt+JZpeTLPBHBCkF0t1Jq+q/NJEAbkPdV+10Kwl6ChzXrPKoz D9dxmzoufgJEj8NGldi6tU4vankrgrDXlZoCDZx1yRz6z30i6cBM7rzoUOTHKqWqbpfp JcrWXsg+s/4Gmad/DlqEDKShKAZZ2UJOcvG6KM24z5xigTWzK7zmUYvUiqb0T8vGNIgW gOXsGd1Qt9fKsM+StM1QRAURrpmMiU/Jq7Su+kdGmzkSAs9l++m3Xaw4XJqB0k9PNVUS /5UQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=35ri/oVrMbO6uA75NOizGkqjB2GcqWgsCqqYDqf0wbs=; b=bCokp8BGS/bgHQ75yAuAkfi/5uA6j9C8ngtzbz9rRb2uT5NXINOyfn23H8NbEpUKR1 XI7dC034bSgeTUvo5t3hLoCxe9T2Sr7WHGgbVoA2+5tGOchtciquXzkMdOxKkvT6irYt 3hJP82aqAF+En3xhHk4wH8zJtEN8c4HgxE1lDuZCDpnNkHEKogo53rjm3pOc0eDbWYsj K3g4W1+Jdvru6tk9gx8dGzV0XCZmqgRYydkcV7YBuVlWuofliAIJM1o25Pb7SqXWyTcZ bPNEQEOssufl668KGUriSXOz0kh9ynCGuX/f4nYW9wkoPcuxvrk9DDMk6T5nBqMKhmdU 3DWw== X-Gm-Message-State: AOAM531d2h3e0lAVWB2rkMFBQrYHAa7KL1W0C0sTcFZfHVsGOVgA2rFY MFz9uw0juFU0XRujQQDPQrL8AxB24K8= X-Google-Smtp-Source: ABdhPJyjgiQsiOofyKidDUR3h2ogC0hNT7QvYfCfIUFtpKRFLv411xltBNtnOcrvZb8UcWmbhJ3drA== X-Received: by 2002:a7b:c934:: with SMTP id h20mr6826318wml.59.1626204788329; Tue, 13 Jul 2021 12:33:08 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id p11sm17640180wro.78.2021.07.13.12.33.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Jul 2021 12:33:08 -0700 (PDT) Message-Id: <7e28323b624ad2d8d12123783f00f5a8fbb248e8.1626204784.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 13 Jul 2021 19:33:00 +0000 Subject: [PATCH v2 4/7] merge-ort: add a handle_deferred_entries() helper function Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Elijah Newren , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren In order to allow trivial directory resolution, we first need to be able to gather more information to determine if the optimization is safe. To enable that, we need a way of deferring the recursion into the directory until a later time. Naturally, deferring the entry into a subtree means that we need some function that will later recurse into the subdirectory exactly the same way that collect_merge_info_callback() would have done. Add a helper function that does this. For now this function is not used but a subsequent commit will change that. Future commits will also make the function sometimes resolve directories instead of traversing inside. Signed-off-by: Elijah Newren --- merge-ort.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) diff --git a/merge-ort.c b/merge-ort.c index 3d3f00b3b45..eb0e18d7546 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -1196,6 +1196,70 @@ static int collect_merge_info_callback(int n, return mask; } +MAYBE_UNUSED +static int handle_deferred_entries(struct merge_options *opt, + struct traverse_info *info) +{ + struct rename_info *renames = &opt->priv->renames; + struct hashmap_iter iter; + struct strmap_entry *entry; + int side, ret = 0; + + for (side = MERGE_SIDE1; side <= MERGE_SIDE2; side++) { + renames->trivial_merges_okay[side] = 0; + strintmap_for_each_entry(&renames->possible_trivial_merges[side], + &iter, entry) { + const char *path = entry->key; + unsigned dir_rename_mask = (intptr_t)entry->value; + struct conflict_info *ci; + unsigned dirmask; + struct tree_desc t[3]; + void *buf[3] = {NULL,}; + int i; + + ci = strmap_get(&opt->priv->paths, path); + VERIFY_CI(ci); + dirmask = ci->dirmask; + + info->name = path; + info->namelen = strlen(path); + info->pathlen = info->namelen + 1; + + for (i = 0; i < 3; i++, dirmask >>= 1) { + if (i == 1 && ci->match_mask == 3) + t[1] = t[0]; + else if (i == 2 && ci->match_mask == 5) + t[2] = t[0]; + else if (i == 2 && ci->match_mask == 6) + t[2] = t[1]; + else { + const struct object_id *oid = NULL; + if (dirmask & 1) + oid = &ci->stages[i].oid; + buf[i] = fill_tree_descriptor(opt->repo, + t+i, oid); + } + } + + ci->match_mask &= ci->filemask; + opt->priv->current_dir_name = path; + renames->dir_rename_mask = dir_rename_mask; + if (renames->dir_rename_mask == 0 || + renames->dir_rename_mask == 0x07) + ret = traverse_trees(NULL, 3, t, info); + else + ret = traverse_trees_wrapper(NULL, 3, t, info); + + for (i = MERGE_BASE; i <= MERGE_SIDE2; i++) + free(buf[i]); + + if (ret < 0) + return ret; + } + } + return ret; +} + static int collect_merge_info(struct merge_options *opt, struct tree *merge_base, struct tree *side1, From patchwork Tue Jul 13 19:33:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12374789 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B0D43C07E96 for ; Tue, 13 Jul 2021 19:33:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8DC996128E for ; Tue, 13 Jul 2021 19:33:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234745AbhGMTgI (ORCPT ); Tue, 13 Jul 2021 15:36:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57520 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234172AbhGMTgG (ORCPT ); Tue, 13 Jul 2021 15:36:06 -0400 Received: from mail-wr1-x429.google.com (mail-wr1-x429.google.com [IPv6:2a00:1450:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3900EC0613E9 for ; Tue, 13 Jul 2021 12:33:10 -0700 (PDT) Received: by mail-wr1-x429.google.com with SMTP id f17so120628wrt.6 for ; Tue, 13 Jul 2021 12:33:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=IH+wXJBQVGwRASbH9nLVutMF1YTBqHdkkh0Trwn5GRI=; b=CMFhg/ROcw3K02h3cZ9efbnhS283L76M8x9nT2ajeuTw9aMCRnowt8bInJ7o3R3m9o ySlgPuVhu7fiSkksVW9FzT1gOWiZCLjG8Qi7O9tQeW2ctS6chCl8eR4GVVrB7U62NQGR XUqn5setqu89xKWH0XNvFIdnFu6ov/sFqEQLNpJrE2U4JrGV8th5uHx6Zy/xg10rZehQ 6OkOTvIN3Hll8LT04V/b0QxDl8ivNSVleRg9ZJ0z7P724pJTirep39ScJ7cNKPoLTt0G EXx6MgWTGAQxvaBtMniP2afQ9ZvJ9jylEG22ABJZDtPlqt4qyQ0i6WvDncHyLrGVfdKJ LCRA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=IH+wXJBQVGwRASbH9nLVutMF1YTBqHdkkh0Trwn5GRI=; b=U3Lqh2wm+rTcuV6YbijobHmWwA+kWdXF4mx6LzEtk9UvmfxUaZbsZvcgOsDku8mzwg uyFqkh4ET0pL1xIpI9BG6C1sekqgpdz7curisnS1WZRM4i7IvvDaN2la5uJ6TQZcsxoG 2eYVwUPip8CYzb56Cw/bv8ldkURUlBwBSc5anYm3N3sFbSRBKCKI+DumuHQw+tqVoVOg YPPRmfV3bKqehMFhXA+0AhxfosOjUivrWMnN8SAGK95Q/wEu2YW9IdPr8qWx02CyXXIu /uiDyE8fGCgIIGrfe4LKk7KrGtnpN9cSpKYj782m1yFax/8IZcZX4JbWSXEDP7kMepMn JYXA== X-Gm-Message-State: AOAM533eZMVu67jqGqmJDqe3COje4ZcehhpKx3Par8rEkcyVR290iXrv +3OgGL0IjZYFbq/iYhE1C76oLKdAczA= X-Google-Smtp-Source: ABdhPJw6Ni3Ia3eKIZCU29UMJEqVNy1B9dEdwwSgczulLW3LkmX+0+5pPCahHAulwRFaiMcdBZPcqw== X-Received: by 2002:adf:de84:: with SMTP id w4mr7936554wrl.104.1626204788898; Tue, 13 Jul 2021 12:33:08 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id j6sm10764427wrm.97.2021.07.13.12.33.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Jul 2021 12:33:08 -0700 (PDT) Message-Id: <317553eadb68ce162b5ebea24045a9ca75342e91.1626204784.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 13 Jul 2021 19:33:01 +0000 Subject: [PATCH v2 5/7] merge-ort: defer recursing into directories when merge base is matched Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Elijah Newren , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren When one side of history matches the merge base (including when the merge base has no entry for the given directory), have collect_merge_info_callback() defer recursing into the directory. To ensure those entries are eventually handled, add a call to handled_deferred_entries() in collect_merge_info() after traverse_trees() returns. Note that the condition in collect_merge_info_callback() may look more complicated than necessary at first glance; renames->trivial_merges_okay[side] is always true until handle_deferred_entries() is called, and possible_trivial_merges[side] is always empty right now (and in the future won't be filled until handle_deferred_entries() is called). However, when handle_deferred_entries() calls traverse_trees() for the relevant deferred directories, those traverse_trees() calls will once again end up in collect_merge_info_callback() for all the entries under those subdirectories. The extra conditions are there for such deferred cases and will be used more as we do more with those variables. Signed-off-by: Elijah Newren --- merge-ort.c | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index eb0e18d7546..bf0712d18a0 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -1141,8 +1141,35 @@ static int collect_merge_info_callback(int n, struct tree_desc t[3]; void *buf[3] = {NULL, NULL, NULL}; const char *original_dir_name; - int i, ret; + int i, ret, side; + /* + * Check for whether we can avoid recursing due to one side + * matching the merge base. The side that does NOT match is + * the one that might have a rename destination we need. + */ + assert(!side1_matches_mbase || !side2_matches_mbase); + side = side1_matches_mbase ? MERGE_SIDE2 : + side2_matches_mbase ? MERGE_SIDE1 : MERGE_BASE; + if (filemask == 0 && (dirmask == 2 || dirmask == 4)) { + /* + * Also defer recursing into new directories; set up a + * few variables to let us do so. + */ + ci->match_mask = (7 - dirmask); + side = dirmask / 2; + } + if (renames->dir_rename_mask != 0x07 && + (side != MERGE_BASE) && + renames->trivial_merges_okay[side] && + !strset_contains(&renames->target_dirs[side], pi.string)) { + strintmap_set(&renames->possible_trivial_merges[side], + pi.string, renames->dir_rename_mask); + renames->dir_rename_mask = prev_dir_rename_mask; + return mask; + } + + /* We need to recurse */ ci->match_mask &= filemask; newinfo = *info; newinfo.prev = info; @@ -1196,7 +1223,6 @@ static int collect_merge_info_callback(int n, return mask; } -MAYBE_UNUSED static int handle_deferred_entries(struct merge_options *opt, struct traverse_info *info) { @@ -1285,6 +1311,8 @@ static int collect_merge_info(struct merge_options *opt, trace2_region_enter("merge", "traverse_trees", opt->repo); ret = traverse_trees(NULL, 3, t, &info); + if (ret == 0) + ret = handle_deferred_entries(opt, &info); trace2_region_leave("merge", "traverse_trees", opt->repo); return ret; From patchwork Tue Jul 13 19:33:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12374791 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7555C07E95 for ; Tue, 13 Jul 2021 19:33:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8B8DD61353 for ; Tue, 13 Jul 2021 19:33:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229478AbhGMTgJ (ORCPT ); Tue, 13 Jul 2021 15:36:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57524 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234615AbhGMTgG (ORCPT ); Tue, 13 Jul 2021 15:36:06 -0400 Received: from mail-wm1-x32a.google.com (mail-wm1-x32a.google.com [IPv6:2a00:1450:4864:20::32a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C8F3AC0613EE for ; Tue, 13 Jul 2021 12:33:10 -0700 (PDT) Received: by mail-wm1-x32a.google.com with SMTP id l4-20020a05600c4f04b0290220f8455631so3122112wmq.1 for ; Tue, 13 Jul 2021 12:33:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=D/PkCpEbGyhmPUstaGyLqY15IvUcrT8iiGfUOWLjl1M=; b=ug0+WvNBywwwpEMU8t2Rcb54lmh9k+pUdp3WE2CrypLx52uDzuDnM5v0SM+L7CezMa rJP1YcKAysOJejwvolhlyRerw1JFp15rZ19dp+dUAp2/XVKqM2BaQ0Z9QioABgkmORnc psQOEmh8Uhhqcf8tnNwL7yDKFkselbAwebT8GS5lhShFyNOOlu+XNDAaaJagnZq4dU0K pVHjdnBi144W4TlBn711jRcwTnantZTu8xi3S5RrEwBeTaHh47uFB1JqHCQhcYcDZ18j Gz0SXZIN1DxzW/yUHsl+nUCgGeoeZClX6R9yaUgPhNBvQF8yi2N5BoQjUhzJorGM7sTx ttBg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=D/PkCpEbGyhmPUstaGyLqY15IvUcrT8iiGfUOWLjl1M=; b=SLdZZ42pfCEh8lyHg7xNRt2YCxkJ/iiJ4eG+8dJ99yi/szzjciseyc4iT8Gra99QUq eh8YQ0Plumbqu/U5vQONETHewHXStw4CK1ot/t9Y57G2FE5Yg4cTLdbPPLddp2XUzcy/ l1HVqoSA6MbutaFS7esogbQeBvqIVueaJ0q4MqOfhsRApfPqxkFBP+OHBAqR4gfgyuW2 qNqKkvQAO0SuU9J4LEU5bas4jA50ukoKDq5aLTBKVienawfU4yn/DNdEN+uGsTyKWNVi qs0ybRvAoNf+/m3ei4eGDVWmb7nb+N8yHwRHb5lz8LL/T+LGw1nxuKl/DblAs3p+e5Jc aNTw== X-Gm-Message-State: AOAM532jjrKEhVq1l0HwiS908AyjyfXBBrmjm1KDti5cR/2yhCU0O1IS y/W6zA4kvFZJx3+F4P6w/M3HxCqIWm8= X-Google-Smtp-Source: ABdhPJwb2alg6QRwmlMOldJYp2QWIYwKIxA4Lei8n5ClcGvLYm50V7H32XWx2YplHMtinvjjQk5/SQ== X-Received: by 2002:a05:600c:22d2:: with SMTP id 18mr6842146wmg.63.1626204789436; Tue, 13 Jul 2021 12:33:09 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id j6sm10764452wrm.97.2021.07.13.12.33.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Jul 2021 12:33:09 -0700 (PDT) Message-Id: <3409a6cd631deb361d3ecb94491c0c297c52fb53.1626204784.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 13 Jul 2021 19:33:02 +0000 Subject: [PATCH v2 6/7] merge-ort: avoid recursing into directories when we don't need to MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: Derrick Stolee , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Elijah Newren , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren This combines the work of the last several patches, and implements the conditions when we don't need to recurse into directories. It's perhaps easiest to see the logic by separating the fact that a directory might have both rename sources and rename destinations: * rename sources: only files present in the merge base can serve as rename sources, and only when one side deletes that file. When the tree on one side matches the merge base, that means every file within the subtree matches the merge base. This means that the skip-irrelevant-rename-detection optimization from before kicks in and we don't need any of these files as rename sources. * rename destinations: the tree that does not match the merge base might have newly added and hence unmatched destination files. This is what usually prevents us from doing trivial directory resolutions in the merge machinery. However, the fact that we have deferred recursing into this directory until the end means we know whether there are any unmatched relevant potential rename sources elsewhere in this merge. If there are no unmatched such relevant sources anywhere, then there is no need to look for unmatched potential rename destinations to match them with. This informs our algorithm: * Search through relevant_sources; if we have entries, they better all be reflected in cached_pairs or cached_irrelevant, otherwise they represent an unmatched potential rename source (causing the optimization to be disallowed). * For any relevant_source represented in cached_pairs, we do need to to make sure to get the destination for each source, meaning we need to recurse into any ancestor directories of those destinations. * Once we've recursed into all the rename destinations for any relevant_sources in cached_pairs, we can then do the trivial directory resolution for the remaining directories. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.235 s ± 0.042 s 205.1 ms ± 3.8 ms mega-renames: 9.419 s ± 0.107 s 1.564 s ± 0.010 s just-one-mega: 480.1 ms ± 3.9 ms 479.5 ms ± 3.9 ms Signed-off-by: Elijah Newren --- merge-ort.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 98 insertions(+), 3 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index bf0712d18a0..c9cf7a158c8 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -1223,6 +1223,18 @@ static int collect_merge_info_callback(int n, return mask; } +static void resolve_trivial_directory_merge(struct conflict_info *ci, int side) +{ + VERIFY_CI(ci); + assert((side == 1 && ci->match_mask == 5) || + (side == 2 && ci->match_mask == 3)); + oidcpy(&ci->merged.result.oid, &ci->stages[side].oid); + ci->merged.result.mode = ci->stages[side].mode; + ci->merged.is_null = is_null_oid(&ci->stages[side].oid); + ci->match_mask = 0; + ci->merged.clean = 1; /* (ci->filemask == 0); */ +} + static int handle_deferred_entries(struct merge_options *opt, struct traverse_info *info) { @@ -1232,9 +1244,71 @@ static int handle_deferred_entries(struct merge_options *opt, int side, ret = 0; for (side = MERGE_SIDE1; side <= MERGE_SIDE2; side++) { - renames->trivial_merges_okay[side] = 0; - strintmap_for_each_entry(&renames->possible_trivial_merges[side], - &iter, entry) { + unsigned optimization_okay = 1; + struct strintmap copy; + + /* Loop over the set of paths we need to know rename info for */ + strset_for_each_entry(&renames->relevant_sources[side], + &iter, entry) { + char *rename_target, *dir, *dir_marker; + struct strmap_entry *e; + + /* + * if we don't know delete/rename info for this path, + * then we need to recurse into all trees to get all + * adds to make sure we have it. + */ + if (strset_contains(&renames->cached_irrelevant[side], + entry->key)) + continue; + e = strmap_get_entry(&renames->cached_pairs[side], + entry->key); + if (!e) { + optimization_okay = 0; + break; + } + + /* If this is a delete, we have enough info already */ + rename_target = e->value; + if (!rename_target) + continue; + + /* If we already walked the rename target, we're good */ + if (strmap_contains(&opt->priv->paths, rename_target)) + continue; + + /* + * Otherwise, we need to get a list of directories that + * will need to be recursed into to get this + * rename_target. + */ + dir = xstrdup(rename_target); + while ((dir_marker = strrchr(dir, '/'))) { + *dir_marker = '\0'; + if (strset_contains(&renames->target_dirs[side], + dir)) + break; + strset_add(&renames->target_dirs[side], dir); + } + free(dir); + } + renames->trivial_merges_okay[side] = optimization_okay; + /* + * We need to recurse into any directories in + * possible_trivial_merges[side] found in target_dirs[side]. + * But when we recurse, we may need to queue up some of the + * subdirectories for possible_trivial_merges[side]. Since + * we can't safely iterate through a hashmap while also adding + * entries, move the entries into 'copy', iterate over 'copy', + * and then we'll also iterate anything added into + * possible_trivial_merges[side] once this loop is done. + */ + copy = renames->possible_trivial_merges[side]; + strintmap_init_with_options(&renames->possible_trivial_merges[side], + 0, + NULL, + 0); + strintmap_for_each_entry(©, &iter, entry) { const char *path = entry->key; unsigned dir_rename_mask = (intptr_t)entry->value; struct conflict_info *ci; @@ -1247,6 +1321,13 @@ static int handle_deferred_entries(struct merge_options *opt, VERIFY_CI(ci); dirmask = ci->dirmask; + if (optimization_okay && + !strset_contains(&renames->target_dirs[side], + path)) { + resolve_trivial_directory_merge(ci, side); + continue; + } + info->name = path; info->namelen = strlen(path); info->pathlen = info->namelen + 1; @@ -1282,6 +1363,20 @@ static int handle_deferred_entries(struct merge_options *opt, if (ret < 0) return ret; } + strintmap_clear(©); + strintmap_for_each_entry(&renames->possible_trivial_merges[side], + &iter, entry) { + const char *path = entry->key; + struct conflict_info *ci; + + ci = strmap_get(&opt->priv->paths, path); + VERIFY_CI(ci); + + assert(renames->trivial_merges_okay[side] && + !strset_contains(&renames->target_dirs[side], + path)); + resolve_trivial_directory_merge(ci, side); + } } return ret; } From patchwork Tue Jul 13 19:33:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 12374787 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B1A2EC07E95 for ; Tue, 13 Jul 2021 19:33:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8D63561284 for ; Tue, 13 Jul 2021 19:33:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234632AbhGMTgF (ORCPT ); Tue, 13 Jul 2021 15:36:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57526 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234540AbhGMTgD (ORCPT ); Tue, 13 Jul 2021 15:36:03 -0400 Received: from mail-wm1-x329.google.com (mail-wm1-x329.google.com [IPv6:2a00:1450:4864:20::329]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 83570C0613EF for ; Tue, 13 Jul 2021 12:33:11 -0700 (PDT) Received: by mail-wm1-x329.google.com with SMTP id k32so60522wms.4 for ; Tue, 13 Jul 2021 12:33:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=2+XbYLABmhaoPFpiW/KF+7H3aI2g7iQXDMzs6wA7gLA=; b=lH6AXzkciDwBuCUdxaQ69jzwhmIxUnhoOSngZExrfCdsdX8N9n2Uvl/T//tnPP3UBQ 7Oh+DLGjgLkdxFsv3ltYcvbr03mSMpGTN5GX76218so6iHr1esAL+pOnzvOJkM/t5jYO 0gSsnmorbikoqjbCp3rfl90oXtpHwJWBGmQRkuLBvGeMBNy+YRRUkYYKcmra4J9EUBL5 9ypPA22zmxWvRHgJ6hC7RDyeDvGkTwpnI0rro7suI30UVBVOMxkZYUY7dTu0UX027ZBe A7PgT8irc+8xqRjx3BB5j5Y9JDmAxsbeo2uprkccGbkK1zS8//SacqglJnydb7SHmsnm FZhA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=2+XbYLABmhaoPFpiW/KF+7H3aI2g7iQXDMzs6wA7gLA=; b=phNMCAg2E6zYwku2D9r67xVFtIX4zAh8GyoBd5xCdvcHyTUF3BpHae3DcdUtNLbLrT w9SJbEIU6wvESMHjM3oBeSLUbQcTLmrBfU8StgBlGW+8WNrVF2QgCtwsQ1jcybPgCr1u 5JVI6+TcvbqC/Du3Yt0wQ8M3FzvEM0rrhqU5GZi84UVf9SPfxIOuu7GNui/bTrQVDc4k JxtbmxwmGRVSAdjkTrEXqK7hdMdL39YW7G0FKlWejARGG1VO7hYKTQoVBo1B7MZ5O3CF tvsn8y/eNFjUpfB4n5nHwjMX3ewv8een8WpmYVRMnrU2nM8uiLPr2joMCOdm2uGvYTq8 VW1A== X-Gm-Message-State: AOAM530d+ZZBFH4bXGSm8/7xShcIN2axbXOd7IkMJootAzU99xl7Da9g ZAIRM3KFxgyvKTfXSePF2I4wE106a08= X-Google-Smtp-Source: ABdhPJxgfeU+NVFpT99WnN5IYhq/ZDSApCFPiGSBQm8KjMwnZAHW7bNOsgVgE6PBd2TnmnvYo8+1WQ== X-Received: by 2002:a05:600c:2281:: with SMTP id 1mr6998425wmf.10.1626204790027; Tue, 13 Jul 2021 12:33:10 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id s1sm3222412wmj.8.2021.07.13.12.33.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Jul 2021 12:33:09 -0700 (PDT) Message-Id: <7133f0efa520b3d0cadb059151daa12484fdb003.1626204784.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 13 Jul 2021 19:33:03 +0000 Subject: [PATCH v2 7/7] merge-ort: restart merge with cached renames to reduce process entry cost MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: Derrick Stolee , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Elijah Newren , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren The merge algorithm mostly consists of the following three functions: collect_merge_info() detect_and_process_renames() process_entries() Prior to the trivial directory resolution optimization of the last half dozen commits, process_entries() was consistently the slowest, followed by collect_merge_info(), then detect_and_process_renames(). When the trivial directory resolution applies, it often dramatically decreases the amount of time spent in the two slower functions. Looking at the performance results in the previous commit, the trivial directory resolution optimization helps amazingly well when there are no relevant renames. It also helps really well when reapplying a long series of linear commits (such as in a rebase or cherry-pick), since the relevant renames may well be cached from the first reapplied commit. But when there are any relevant renames that are not cached (represented by the just-one-mega testcase), then the optimization does not help at all. Often, I noticed that when the optimization does not apply, it is because there are a handful of relevant sources -- maybe even only one. It felt frustrating to need to recurse into potentially hundreds or even thousands of directories just for a single rename, but it was needed for correctness. However, staring at this list of functions and noticing that process_entries() is the most expensive and knowing I could avoid it if I had cached renames suggested a simple idea: change collect_merge_info() detect_and_process_renames() process_entries() into collect_merge_info() detect_and_process_renames() collect_merge_info() detect_and_process_renames() process_entries() This may seem odd and look like more work. However, note that although we run collect_merge_info() twice, the second time we get to employ trivial directory resolves, which makes it much faster, so the increased time in collect_merge_info() is small. While we run detect_and_process_renames() again, all renames are cached so it's nearly a no-op (we don't call into diffcore_rename_extended() but we do have a little bit of data structure checking and fixing up). And the big payoff comes from the fact that process_entries(), will be much faster due to having far fewer entries to process. This restarting only makes sense if we can save recursing into enough directories to make it worth our while. Introduce a simple heuristic to guide this. Note that this heuristic uses a "wanted_factor" that I have virtually no actual real world data for, just some back-of-the-envelope quasi-scientific calculations that I included in some comments and then plucked a simple round number out of thin air. It could be that tweaking this number to make it either higher or lower improves the optimization. (There's slightly more here; when I first introduced this optimization, I used a factor of 10, because I was completely confident it was big enough to not cause slowdowns in special cases. I was certain it was higher than needed. Several months later, I added the rough calculations which make me think the optimal number is close to 2; but instead of pushing to the limit, I just bumped it to 3 to reduce the risk that there are special cases where this optimization can result in slowing down the code a little. If the ratio of path counts is below 3, we probably will only see minor performance improvements at best anyway.) Also, note that while the diffstat looks kind of long (nearly 100 lines), more than half of it is in two comments explaining how things work. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 205.1 ms ± 3.8 ms 204.2 ms ± 3.0 ms mega-renames: 1.564 s ± 0.010 s 1.076 s ± 0.015 s just-one-mega: 479.5 ms ± 3.9 ms 364.1 ms ± 7.0 ms Signed-off-by: Elijah Newren --- merge-ort.c | 111 ++++++++++++++++++++++++++-- t/t6423-merge-rename-directories.sh | 2 +- 2 files changed, 106 insertions(+), 7 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index c9cf7a158c8..99af64fce7d 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -209,6 +209,7 @@ struct rename_info { * MERGE_SIDE2: cached data from side2 can be reused * MERGE_SIDE1: cached data from side1 can be reused * 0: no cached data can be reused + * -1: See redo_after_renames; both sides can be reused. */ int cached_pairs_valid_side; @@ -254,6 +255,28 @@ struct rename_info { */ struct strset cached_irrelevant[3]; + /* + * redo_after_renames: optimization flag for "restarting" the merge + * + * Sometimes it pays to detect renames, cache them, and then + * restart the merge operation from the beginning. The reason for + * this is that when we know where all the renames are, we know + * whether a certain directory has any paths under it affected -- + * and if a directory is not affected then it permits us to do + * trivial tree merging in more cases. Doing trivial tree merging + * prevents the need to run process_entry() on every path + * underneath trees that can be trivially merged, and + * process_entry() is more expensive than collect_merge_info() -- + * plus, the second collect_merge_info() will be much faster since + * it doesn't have to recurse into the relevant trees. + * + * Values for this flag: + * 0 = don't bother, not worth it (or conditions not yet checked) + * 1 = conditions for optimization met, optimization worthwhile + * 2 = we already did it (don't restart merge yet again) + */ + unsigned redo_after_renames; + /* * needed_limit: value needed for inexact rename detection to run * @@ -540,7 +563,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, renames->trivial_merges_okay[i] = 1; /* 1 == maybe */ if (!reinitialize) assert(renames->cached_pairs_valid_side == 0); - if (i != renames->cached_pairs_valid_side) { + if (i != renames->cached_pairs_valid_side && + -1 != renames->cached_pairs_valid_side) { strset_func(&renames->cached_target_names[i]); strmap_func(&renames->cached_pairs[i], 1); strset_func(&renames->cached_irrelevant[i]); @@ -1242,7 +1266,9 @@ static int handle_deferred_entries(struct merge_options *opt, struct hashmap_iter iter; struct strmap_entry *entry; int side, ret = 0; + int path_count_before, path_count_after = 0; + path_count_before = strmap_get_size(&opt->priv->paths); for (side = MERGE_SIDE1; side <= MERGE_SIDE2; side++) { unsigned optimization_okay = 1; struct strintmap copy; @@ -1377,7 +1403,51 @@ static int handle_deferred_entries(struct merge_options *opt, path)); resolve_trivial_directory_merge(ci, side); } + if (!optimization_okay || path_count_after) + path_count_after = strmap_get_size(&opt->priv->paths); } + if (path_count_after) { + /* + * Not sure were the right cut-off is for the optimization + * to redo collect_merge_info after we've cached the + * regular renames is. Basically, collect_merge_info(), + * detect_regular_renames(), and process_entries() are + * similar costs and all big tent poles. Caching the + * result of detect_regular_renames() means that redoing + * that one function will cost us virtually 0 extra, so it + * depends on the other two functions, which are both O(N) + * cost in the number of paths. Thus, it makes sense that + * if we can cut the number of paths in half, then redoing + * collect_merge_info() at half cost in order to get + * process_entries() at half cost should be about equal + * cost. If we can cut by more than half, then we would + * win. The fact that process_entries() is about 10%-20% + * more expensive than collect_merge_info() suggests we + * could make the factor be less than two. The fact that + * even when we have renames cached, we still have to + * traverse down to the individual (relevant) renames, + * which suggests we should perhaps use a bigger factor. + * + * The exact number isn't critical, since the code will + * work even if we get the factor wrong -- it just might be + * slightly slower if we're a bit off. For now, just error + * on the side of a bigger fudge. For the linux kernel + * testcases I was looking at with massive renames, the + * ratio came in around 50 to 250, which clearly would + * trigger this optimization and provided some *very* nice + * speedups. + */ + int wanted_factor = 3; + + /* We should only redo collect_merge_info one time */ + assert(renames->redo_after_renames == 0); + + if (path_count_after / path_count_before > wanted_factor) { + renames->redo_after_renames = 1; + renames->cached_pairs_valid_side = -1; + } + } else if (renames->redo_after_renames == 2) + renames->redo_after_renames = 0; return ret; } @@ -2820,8 +2890,8 @@ static int compare_pairs(const void *a_, const void *b_) } /* Call diffcore_rename() to update deleted/added pairs into rename pairs */ -static void detect_regular_renames(struct merge_options *opt, - unsigned side_index) +static int detect_regular_renames(struct merge_options *opt, + unsigned side_index) { struct diff_options diff_opts; struct rename_info *renames = &opt->priv->renames; @@ -2834,7 +2904,7 @@ static void detect_regular_renames(struct merge_options *opt, * side had directory renames. */ resolve_diffpair_statuses(&renames->pairs[side_index]); - return; + return 0; } partial_clear_dir_rename_count(&renames->dir_rename_count[side_index]); @@ -2860,6 +2930,8 @@ static void detect_regular_renames(struct merge_options *opt, trace2_region_leave("diff", "diffcore_rename", opt->repo); resolve_diffpair_statuses(&diff_queued_diff); + if (diff_opts.needed_rename_limit > 0) + renames->redo_after_renames = 0; if (diff_opts.needed_rename_limit > renames->needed_limit) renames->needed_limit = diff_opts.needed_rename_limit; @@ -2869,6 +2941,8 @@ static void detect_regular_renames(struct merge_options *opt, diff_queued_diff.nr = 0; diff_queued_diff.queue = NULL; diff_flush(&diff_opts); + + return 1; } /* @@ -2958,14 +3032,32 @@ static int detect_and_process_renames(struct merge_options *opt, struct diff_queue_struct combined; struct rename_info *renames = &opt->priv->renames; int need_dir_renames, s, clean = 1; + unsigned detection_run = 0; memset(&combined, 0, sizeof(combined)); if (!possible_renames(renames)) goto cleanup; trace2_region_enter("merge", "regular renames", opt->repo); - detect_regular_renames(opt, MERGE_SIDE1); - detect_regular_renames(opt, MERGE_SIDE2); + detection_run |= detect_regular_renames(opt, MERGE_SIDE1); + detection_run |= detect_regular_renames(opt, MERGE_SIDE2); + if (renames->redo_after_renames && detection_run) { + int i, side; + struct diff_filepair *p; + + /* Cache the renames, we found */ + for (side = MERGE_SIDE1; side <= MERGE_SIDE2; side++) { + for (i = 0; i < renames->pairs[side].nr; ++i) { + p = renames->pairs[side].queue[i]; + possibly_cache_new_pair(renames, p, side, NULL); + } + } + + /* Restart the merge with the cached renames */ + renames->redo_after_renames = 2; + trace2_region_leave("merge", "regular renames", opt->repo); + goto cleanup; + } use_cached_pairs(opt, &renames->cached_pairs[1], &renames->pairs[1]); use_cached_pairs(opt, &renames->cached_pairs[2], &renames->pairs[2]); trace2_region_leave("merge", "regular renames", opt->repo); @@ -4380,6 +4472,7 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt, opt->subtree_shift); } +redo: trace2_region_enter("merge", "collect_merge_info", opt->repo); if (collect_merge_info(opt, merge_base, side1, side2) != 0) { /* @@ -4399,6 +4492,12 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt, result->clean = detect_and_process_renames(opt, merge_base, side1, side2); trace2_region_leave("merge", "renames", opt->repo); + if (opt->priv->renames.redo_after_renames == 2) { + trace2_region_enter("merge", "reset_maps", opt->repo); + clear_or_reinit_internal_opts(opt->priv, 1); + trace2_region_leave("merge", "reset_maps", opt->repo); + goto redo; + } trace2_region_enter("merge", "process_entries", opt->repo); process_entries(opt, &working_tree_oid); diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh index e834b7e6efe..d8919d276a1 100755 --- a/t/t6423-merge-rename-directories.sh +++ b/t/t6423-merge-rename-directories.sh @@ -4797,7 +4797,7 @@ test_setup_12f () { ) } -test_expect_merge_algorithm failure failure '12f: Trivial directory resolve, caching, all kinds of fun' ' +test_expect_merge_algorithm failure success '12f: Trivial directory resolve, caching, all kinds of fun' ' test_setup_12f && ( cd 12f &&