From patchwork Fri Dec 11 09:08:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11967723 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E3968C4167B for ; Fri, 11 Dec 2020 09:10:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8FBCB23F36 for ; Fri, 11 Dec 2020 09:10:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2437414AbgLKJKS (ORCPT ); Fri, 11 Dec 2020 04:10:18 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51768 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2437325AbgLKJJd (ORCPT ); Fri, 11 Dec 2020 04:09:33 -0500 Received: from mail-wr1-x444.google.com (mail-wr1-x444.google.com [IPv6:2a00:1450:4864:20::444]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 999A1C0613D3 for ; Fri, 11 Dec 2020 01:08:52 -0800 (PST) Received: by mail-wr1-x444.google.com with SMTP id t16so8225516wra.3 for ; Fri, 11 Dec 2020 01:08:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=uKzxYlth5/h5Z+r4ojWksiqQ0jpLKXXnqR8B6eOh0po=; b=e8sI0YH643dToADOjgrV5EdoLbweFtYmCKJ8KEvUcJhoYKUUWZlrUI9JRnhix6iARn NahkbyW8ivXi0IBvrgdtPSAhTB1Crl0Q+H5IrQ1IjgLhI6dHIdnlLGmBtP1PVAgrYadI NluWDoeMQD7x3OErgqxPDxoau+BHKTwwxCY1EQI4zgBiyxqvtjy+Qv5lTq7J6LFolCpD DB6KFPcu7J/0pajCWODJzEz6G/le3EGQ2P3O02VYs+qhpA+MtMsAsO1HV29zZZZFdD/P /xNDX2Yhsud2VQhmWlF3WUixOztN3vKxh+qHP8+PoCti5BODXNpVhoDUg8cdHlFdVCgD YtnA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=uKzxYlth5/h5Z+r4ojWksiqQ0jpLKXXnqR8B6eOh0po=; b=PDS1EZ51vyyX/12ID55ZuLapQzei4xceQt3CTr3jxqS5vqVxMgIO6HENT5cZeeHcFT 5c1ndgS1sHrbYbnFv9Fu4rukiZwLYn9HN2/LZDaUsMDt7YV7sDZSj2EYlv6lcj6+1/tn Jk1gMnp3yL1kGH4dxZ5IFs9rXQXSh4GOzOSBinNShTnm17LyzUaRyrlhOVznW38qFRFt Qg5QDsQx0EYt0tWGJx4vy6vF0lkn+eHVeKprtEUXrEMq0iB3fsoJZTLTgk9XNq3HxVIR VARbjAB4G6+IQtQzuOz52I4dlwpBKabRjGm3ca59WuXQoiJ8rjEhYlGX7qLclbwYlSxZ yVHw== X-Gm-Message-State: AOAM532l8JotTQusFHMukh8kZyx6z0vcN0lR2WE78RlStV/4Rpy3ER4z c1bwEOVdN/t13kWloJNvuCri/8a+pQw= X-Google-Smtp-Source: ABdhPJy6yLZ/0Q/rBjcmqmKjIhhzTyNEqtYagqlYsPfco8m0pYwI2dBdOMychLgvy5L5jGdy0/ngDA== X-Received: by 2002:adf:a388:: with SMTP id l8mr12778421wrb.354.1607677731122; Fri, 11 Dec 2020 01:08:51 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id q15sm13746063wrw.75.2020.12.11.01.08.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 11 Dec 2020 01:08:50 -0800 (PST) Message-Id: <428d8204894f668e925c640b2f72dc597164e706.1607677728.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 11 Dec 2020 09:08:40 +0000 Subject: [PATCH v2 1/9] diffcore-rename: rename num_create to num_destinations Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Our main data structures are rename_src and rename_dst. For counters of these data structures, num_sources and num_destinations seem natural; definitely more so than using num_create for the latter. Signed-off-by: Elijah Newren --- diffcore-rename.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index d367a6d2443..15a98f566e4 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -434,7 +434,7 @@ static void record_if_better(struct diff_score m[], struct diff_score *o) * 1 if we need to disable inexact rename detection; * 2 if we would be under the limit if we were given -C instead of -C -C. */ -static int too_many_rename_candidates(int num_create, +static int too_many_rename_candidates(int num_destinations, struct diff_options *options) { int rename_limit = options->rename_limit; @@ -447,17 +447,17 @@ static int too_many_rename_candidates(int num_create, * This basically does a test for the rename matrix not * growing larger than a "rename_limit" square matrix, ie: * - * num_create * num_src > rename_limit * rename_limit + * num_destinations * num_src > rename_limit * rename_limit */ if (rename_limit <= 0) rename_limit = 32767; - if ((num_create <= rename_limit || num_src <= rename_limit) && - ((uint64_t)num_create * (uint64_t)num_src + if ((num_destinations <= rename_limit || num_src <= rename_limit) && + ((uint64_t)num_destinations * (uint64_t)num_src <= (uint64_t)rename_limit * (uint64_t)rename_limit)) return 0; options->needed_rename_limit = - num_src > num_create ? num_src : num_create; + num_src > num_destinations ? num_src : num_destinations; /* Are we running under -C -C? */ if (!options->flags.find_copies_harder) @@ -469,8 +469,8 @@ static int too_many_rename_candidates(int num_create, continue; num_src++; } - if ((num_create <= rename_limit || num_src <= rename_limit) && - ((uint64_t)num_create * (uint64_t)num_src + if ((num_destinations <= rename_limit || num_src <= rename_limit) && + ((uint64_t)num_destinations * (uint64_t)num_src <= (uint64_t)rename_limit * (uint64_t)rename_limit)) return 2; return 1; @@ -505,7 +505,7 @@ void diffcore_rename(struct diff_options *options) struct diff_queue_struct outq; struct diff_score *mx; int i, j, rename_count, skip_unmodified = 0; - int num_create, dst_cnt; + int num_destinations, dst_cnt; struct progress *progress = NULL; if (!minimum_score) @@ -570,13 +570,13 @@ void diffcore_rename(struct diff_options *options) * Calculate how many renames are left (but all the source * files still remain as options for rename/copies!) */ - num_create = (rename_dst_nr - rename_count); + num_destinations = (rename_dst_nr - rename_count); /* All done? */ - if (!num_create) + if (!num_destinations) goto cleanup; - switch (too_many_rename_candidates(num_create, options)) { + switch (too_many_rename_candidates(num_destinations, options)) { case 1: goto cleanup; case 2: @@ -593,7 +593,8 @@ void diffcore_rename(struct diff_options *options) (uint64_t)rename_dst_nr * (uint64_t)rename_src_nr); } - mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_create), sizeof(*mx)); + mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_destinations), + sizeof(*mx)); for (dst_cnt = i = 0; i < rename_dst_nr; i++) { struct diff_filespec *two = rename_dst[i].two; struct diff_score *m; From patchwork Fri Dec 11 09:08:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11967725 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7E032C1B0D9 for ; Fri, 11 Dec 2020 09:10:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3E39023F36 for ; Fri, 11 Dec 2020 09:10:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2437415AbgLKJKV (ORCPT ); Fri, 11 Dec 2020 04:10:21 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51772 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2437327AbgLKJJe (ORCPT ); Fri, 11 Dec 2020 04:09:34 -0500 Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9FBBFC0613D6 for ; Fri, 11 Dec 2020 01:08:53 -0800 (PST) Received: by mail-wr1-x433.google.com with SMTP id r3so8246783wrt.2 for ; Fri, 11 Dec 2020 01:08:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=0Ue95Y6VgYE2ZtLzIIcPqdKzYogvM9npsEILWau0SY4=; b=rCDtqXBn+a7pa7kPlyGtJ/CD5nlnaOtmkoLO5R5uTUyodM2KdnicaTDJx33Dn1OwP5 XxeHiTzhZDuLZMktZPsAHmheu/eHG1xIj604YKZiVmVySorF1y8MC6p0JuufLb3xBmcA HZQnZ8jMtKnOtZlcvbKgXX7V5khdlHZys/F9lfyai839kYJ2+dsYsGQhYWVik9pM2s7A VdxXZqvdL7vsahmXdFAZu4o1FjyLHCqQi0C9LlcvL0A9b/UH2XEBAyiKRIwr94ODw31W D0tQVFK5csGAuB/yhFrZGGlFevuMROcw2RT2BesgjAk/Q957MQqDjGcnIPN9Xu/Psqma wC7Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=0Ue95Y6VgYE2ZtLzIIcPqdKzYogvM9npsEILWau0SY4=; b=uRxW2TH1pLO9JPLV0YraIZCqDvuIArGvsNhWyw/p8UQpFymTEXyJY66jSam1WVyJA6 Ua3lbjUZse8F5uR6EahN8SD1ev9uW2UDfUJrf+R/+8igMyQEKKIweWuDOC07YyEuA0/L J5ONgi8X0ReMxo2LksC5vrJyt8uQFfZChW82LB8VrJW5i6NzNna2FG5F+8va25dZWNB6 rEcBgcq5mhHxkiDb+9i0KwEVtzoJultKFcM1nkXU1hXdip3d6/S4XZif8ErNFRMew+RH WeX6DEkvDD3HV5v0U381EgG2ZVhkVrDMWusMqus4OVLT/hkoYsN7+LXiEJ8LmjZaL2N/ ReyA== X-Gm-Message-State: AOAM530QchwB9mDXlPokOypPCCmGzq5BJGxQEf6GULNJqKxSTRgqSgF8 SMY8HQ2o3eQG7WComGgb9LUn7NuCfOo= X-Google-Smtp-Source: ABdhPJyCrDfU4P/SGNC+UK4sfbyNVkTg/2/BzyUuokJK+HW96ZCISyuV75j+gqzGcRZZmDiXnWQAww== X-Received: by 2002:adf:dcc5:: with SMTP id x5mr12678621wrm.167.1607677732056; Fri, 11 Dec 2020 01:08:52 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id n3sm12382063wra.13.2020.12.11.01.08.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 11 Dec 2020 01:08:51 -0800 (PST) Message-Id: In-Reply-To: References: Date: Fri, 11 Dec 2020 09:08:41 +0000 Subject: [PATCH v2 2/9] diffcore-rename: avoid usage of global in too_many_rename_candidates() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren too_many_rename_candidates() got the number of rename destinations via an argument to the function, but the number of rename sources via a global variable. That felt rather inconsistent. Pass in the number of rename sources as an argument as well. While we are at it... We had a local variable, num_src, that served two purposes. Initially it was set to the global value, but later was used for counting a subset of the number of sources. Since we now have a function argument for the former usage, introduce a clearer variable name for the latter usage. This patch has no behavioral changes; it's just renaming and passing an argument instead of grabbing it from the global namespace. (You may find it easier to view the patch using git diff's --color-words option.) Signed-off-by: Elijah Newren --- diffcore-rename.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 15a98f566e4..1d6675c040d 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -434,12 +434,11 @@ static void record_if_better(struct diff_score m[], struct diff_score *o) * 1 if we need to disable inexact rename detection; * 2 if we would be under the limit if we were given -C instead of -C -C. */ -static int too_many_rename_candidates(int num_destinations, +static int too_many_rename_candidates(int num_destinations, int num_sources, struct diff_options *options) { int rename_limit = options->rename_limit; - int num_src = rename_src_nr; - int i; + int i, limited_sources; options->needed_rename_limit = 0; @@ -447,30 +446,30 @@ static int too_many_rename_candidates(int num_destinations, * This basically does a test for the rename matrix not * growing larger than a "rename_limit" square matrix, ie: * - * num_destinations * num_src > rename_limit * rename_limit + * num_destinations * num_sources > rename_limit * rename_limit */ if (rename_limit <= 0) rename_limit = 32767; - if ((num_destinations <= rename_limit || num_src <= rename_limit) && - ((uint64_t)num_destinations * (uint64_t)num_src + if ((num_destinations <= rename_limit || num_sources <= rename_limit) && + ((uint64_t)num_destinations * (uint64_t)num_sources <= (uint64_t)rename_limit * (uint64_t)rename_limit)) return 0; options->needed_rename_limit = - num_src > num_destinations ? num_src : num_destinations; + num_sources > num_destinations ? num_sources : num_destinations; /* Are we running under -C -C? */ if (!options->flags.find_copies_harder) return 1; /* Would we bust the limit if we were running under -C? */ - for (num_src = i = 0; i < rename_src_nr; i++) { + for (limited_sources = i = 0; i < num_sources; i++) { if (diff_unmodified_pair(rename_src[i].p)) continue; - num_src++; + limited_sources++; } - if ((num_destinations <= rename_limit || num_src <= rename_limit) && - ((uint64_t)num_destinations * (uint64_t)num_src + if ((num_destinations <= rename_limit || limited_sources <= rename_limit) && + ((uint64_t)num_destinations * (uint64_t)limited_sources <= (uint64_t)rename_limit * (uint64_t)rename_limit)) return 2; return 1; @@ -576,7 +575,8 @@ void diffcore_rename(struct diff_options *options) if (!num_destinations) goto cleanup; - switch (too_many_rename_candidates(num_destinations, options)) { + switch (too_many_rename_candidates(num_destinations, rename_src_nr, + options)) { case 1: goto cleanup; case 2: From patchwork Fri Dec 11 09:08:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11967727 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F8F8C433FE for ; Fri, 11 Dec 2020 09:10:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B725A23F57 for ; Fri, 11 Dec 2020 09:10:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2437380AbgLKJKU (ORCPT ); Fri, 11 Dec 2020 04:10:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51776 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2437328AbgLKJJf (ORCPT ); Fri, 11 Dec 2020 04:09:35 -0500 Received: from mail-wm1-x342.google.com (mail-wm1-x342.google.com [IPv6:2a00:1450:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 65C9DC061793 for ; Fri, 11 Dec 2020 01:08:54 -0800 (PST) Received: by mail-wm1-x342.google.com with SMTP id x22so6894689wmc.5 for ; Fri, 11 Dec 2020 01:08:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=XL0RdMhJCBZ0gxMDc9iKNk8Af4u8BEydSMvvnNYLvbo=; b=cZLmAj6F/qCu9wdgsEXXr6BJ4S8o8Q6UZ0l6mNXEbNDXA7jOpvSj2kIiAttV4njvEY 71BK+4cg6djmt1Dj/DqOerHGFTU8Q+WnDZmL/rBBY1WXCSd6sp+nPwTgXNM+Y/aKyiqc 6S21Bj9/z8tJ7XDqtPHDgUNNlmfsadxIgMWdwqX20xymNzYUevx7LbuKiVoJmbQ+/SkB mUcyhJlSSty7M+Uy/9Ud/7i3CJ9xIx79Lo51++dmlO08Abz7ZQbhKzEzJVY+0s6+/uYt EOYwqM58UrzDVV0UNj1N3L8C6DrZPpyhWi/bAZckRUh1yv7L0u4w4MLcDsLN/P5kpPa5 g2tQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=XL0RdMhJCBZ0gxMDc9iKNk8Af4u8BEydSMvvnNYLvbo=; b=CuHVcV7RFhUwYaPNmTfSjZCwC8o6kP/3ADSHeQ/FRAiU5XpW44yyUsmQCEoXOvPySc +ePvu2PZb8JIrxPMhHHz0vHR1Tp7c5CnMm2Ud/8Se3DF3x9YU1LpwDXbyMBMBnxdEW/V N80SrVdAxj1BB3prmHrHGbCxfRTOWimYqgnW2+cxe9sO2KKRHkb2D/z2eT3qY15dPvrg cWFWFWLYb3gWyu7S30Q4TxqtgamHEFlC1VN6vsqsXlASponwHWoXh74eajq9sfyI3Ua8 vpPuApGy3sVoUrj5Ulav5rjj7ps7nsnbLYK57qxEtJPQN83kllv5kSMctz1+Y7VO04r+ LwaA== X-Gm-Message-State: AOAM531sGrSJIEGF870yx4UfpOmr4B4ecNvLY8jmLFx1O9dXh2cfcEXJ eaAqreg1f3LWSZGlkMfvE0icea/8L6I= X-Google-Smtp-Source: ABdhPJzQoodNMMhxa2/q9X18sdEfyQlcsiE4IzlwBPJ/xgakNmGQgvvTEPXvJyJiJqm6BDB7cVNtcA== X-Received: by 2002:a05:600c:d8:: with SMTP id u24mr12411954wmm.103.1607677732986; Fri, 11 Dec 2020 01:08:52 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id v64sm13235007wme.25.2020.12.11.01.08.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 11 Dec 2020 01:08:52 -0800 (PST) Message-Id: <7214fa73fdd13418744903f6c769fdb77c9512ce.1607677728.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 11 Dec 2020 09:08:42 +0000 Subject: [PATCH v2 3/9] diffcore-rename: simplify limit check Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren diffcore-rename had two different checks of the form if ((a < limit || b < limit) && a * b <= limit * limit) This can be simplified to if (st_mult(a, b) <= st_mult(limit, limit)) which makes it clearer how we are checking for overflow, and makes it much easier to parse given the drop from 8 to 4 variable appearances. Signed-off-by: Elijah Newren --- diffcore-rename.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 1d6675c040d..16553ab259f 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -447,12 +447,16 @@ static int too_many_rename_candidates(int num_destinations, int num_sources, * growing larger than a "rename_limit" square matrix, ie: * * num_destinations * num_sources > rename_limit * rename_limit + * + * We use st_mult() to check overflow conditions; in the + * exceptional circumstance that size_t isn't large enough to hold + * the multiplication, the system won't be able to allocate enough + * memory for the matrix anyway. */ if (rename_limit <= 0) rename_limit = 32767; - if ((num_destinations <= rename_limit || num_sources <= rename_limit) && - ((uint64_t)num_destinations * (uint64_t)num_sources - <= (uint64_t)rename_limit * (uint64_t)rename_limit)) + if (st_mult(num_destinations, num_sources) + <= st_mult(rename_limit, rename_limit)) return 0; options->needed_rename_limit = @@ -468,9 +472,8 @@ static int too_many_rename_candidates(int num_destinations, int num_sources, continue; limited_sources++; } - if ((num_destinations <= rename_limit || limited_sources <= rename_limit) && - ((uint64_t)num_destinations * (uint64_t)limited_sources - <= (uint64_t)rename_limit * (uint64_t)rename_limit)) + if (st_mult(num_destinations, limited_sources) + <= st_mult(rename_limit, rename_limit)) return 2; return 1; } From patchwork Fri Dec 11 09:08:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11967733 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F1677C2BBCD for ; Fri, 11 Dec 2020 09:10:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C6AD123F5B for ; Fri, 11 Dec 2020 09:10:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2437417AbgLKJKX (ORCPT ); Fri, 11 Dec 2020 04:10:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51780 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2437381AbgLKJJg (ORCPT ); Fri, 11 Dec 2020 04:09:36 -0500 Received: from mail-wr1-x442.google.com (mail-wr1-x442.google.com [IPv6:2a00:1450:4864:20::442]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4C504C061794 for ; Fri, 11 Dec 2020 01:08:55 -0800 (PST) Received: by mail-wr1-x442.google.com with SMTP id m5so8218585wrx.9 for ; Fri, 11 Dec 2020 01:08:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=WELx0XYYnbrJkWi3WvWrkOd+L7awiwNOh9Yvmq0axuw=; b=Q7GLG8iYCYd7+YxmmVCGhUgApafhNDXAC/M2sg6skfNj03HbYeJbv2E7wkQNblIxPS AAY0YbX/wPxp5NtANwP0SybkeqtB2vMUAhGNWWcQoIyaZAAQBHFjI0WBQ7JMa/ySvUoh ++cpYrKJLKtXCtea7QIn1rkTLxuUOTOesrt2uRk9Km/fjaXlyGWY4StB+1ZLeyP+i3yj UqWrvOO9Dy39ooXKfI4ulvDCAxDzvdP3ActgcafjbODjzIqxoTw+aijVxsAnF4YrGaBz Aygcuj/EPAnX0BUg47mkms4ypv3pkLYkrZkYZPWRpt+cygsi7Db6I/eFQRt2nBjjnA0g E+bQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=WELx0XYYnbrJkWi3WvWrkOd+L7awiwNOh9Yvmq0axuw=; b=Xqf9x46Pg4zgwNYf+/ZiZLiyTreErh9eRrCZXej//B1vIDsyV5WZrHF+s9kOjnkvy5 dCM2vY7/dLO1U4yWK46ZrPaE8lmnWk5v10tm5QtwDV3Q2vgVuCzwmNZC6GMiEqqtIJcb ELsfCDHy7E2Xx1jnuuJtuFpls4gXeHB+chrayAwsWqg/4OEPDXRZFfdJWr3f8UlCs4aE VUgS98WTVck93E2++Ma9uccWCQELqYFXSWQIyjMs2kUszAOnAAl6ZUWpuVSL2xhKTVC4 JxqE0HOrIosKlFOB4+8h7gRmNIY2wgPzilBAij97jTSvFkFczF5JEFoWszP6Pdk5zVD9 v1Kg== X-Gm-Message-State: AOAM5331wivkF/b3JpuLeotb8yivojdodTz/x2KeGPBc/3S5qS+qgULE BAumE3ykdR/MsKh2/EE31uII2kLzxhk= X-Google-Smtp-Source: ABdhPJzkpDXm3sVuMoPnrL4YqUTh+lP6kbY+RdRIq4vERGtpF0Qy3EbVcDQJlQOXtLsnjlxdBQLw6Q== X-Received: by 2002:adf:f60b:: with SMTP id t11mr9498036wrp.401.1607677733913; Fri, 11 Dec 2020 01:08:53 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id e15sm10404164wrx.86.2020.12.11.01.08.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 11 Dec 2020 01:08:53 -0800 (PST) Message-Id: In-Reply-To: References: Date: Fri, 11 Dec 2020 09:08:43 +0000 Subject: [PATCH v2 4/9] diffcore-rename: reduce jumpiness in progress counters Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Inexact rename detection works by comparing all sources to all destinations, computing similarities, and then finding the best matches among those that are sufficiently similar. However, it is preceded by exact rename detection that works by checking if there are files with identical hashes. If exact renames are found, we can exclude some files from inexact rename detection. The inexact rename detection loops over the full set of files, but immediately skips those for which rename_dst[i].is_rename is true and thus doesn't compare any sources to that destination. As such, these paths shouldn't be included in the progress counter. For the eagle eyed, this change hints at an actual optimization -- the first one I presented at Git Merge 2020. I'll be submitting that optimization later, once the basic merge-ort algorithm has merged. Signed-off-by: Elijah Newren --- diffcore-rename.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 16553ab259f..55a188abcc3 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -593,7 +593,7 @@ void diffcore_rename(struct diff_options *options) if (options->show_rename_progress) { progress = start_delayed_progress( _("Performing inexact rename detection"), - (uint64_t)rename_dst_nr * (uint64_t)rename_src_nr); + (uint64_t)num_destinations * (uint64_t)rename_src_nr); } mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_destinations), @@ -633,7 +633,8 @@ void diffcore_rename(struct diff_options *options) diff_free_filespec_blob(two); } dst_cnt++; - display_progress(progress, (uint64_t)(i+1)*(uint64_t)rename_src_nr); + display_progress(progress, + (uint64_t)dst_cnt * (uint64_t)rename_src_nr); } stop_progress(&progress); From patchwork Fri Dec 11 09:08:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11967737 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.1 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, LOTS_OF_MONEY,MAILING_LIST_MULTI,MONEY_NOHTML,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2FA92C4361B for ; Fri, 11 Dec 2020 09:10:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EADD523F37 for ; Fri, 11 Dec 2020 09:10:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2437422AbgLKJKZ (ORCPT ); Fri, 11 Dec 2020 04:10:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51880 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2437391AbgLKJKN (ORCPT ); Fri, 11 Dec 2020 04:10:13 -0500 Received: from mail-wm1-x344.google.com (mail-wm1-x344.google.com [IPv6:2a00:1450:4864:20::344]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 503F0C06179C for ; Fri, 11 Dec 2020 01:08:56 -0800 (PST) Received: by mail-wm1-x344.google.com with SMTP id g185so7850519wmf.3 for ; Fri, 11 Dec 2020 01:08:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=QQatPLA2X4s8HeXNOn3kIQM7ZmA0C3EQfIlNmMO5AUY=; b=AZokjrsGMRJH1LXmu8/Jrsec/UCh82vA4QJMwtGy7hWi6kdhWedoRjheKnORBmV+bw tGHJv4H/8SXDzVhyS99Nmc1iQ+uWQtzKaWlwYGT6tpAMaCo6njS/DF4Hagy2Pj+qIOxr NmqRt47MNBLJz97Zpd7vjhofr6Z2ta6kKHyQYmxI3q3TYLcuOGPdwHKoAZ4E357kfSnj f7jhidCp2acDqkKwnkE+yayz+R13IMl8P1+EpHLuI+aNvfzpLqEeqUsbG63i1j81NkIO FSpZaaj3J5rlv+OA+xfmu7OBXhKue6bVdl6uOtYIQ6O9i9XVDB5EGd/TBGBho70dfeZY qfmA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=QQatPLA2X4s8HeXNOn3kIQM7ZmA0C3EQfIlNmMO5AUY=; b=U6tyRXWzBdZAOjRkiPB1tFL8MMsTVyrNiECijxMFJUNxOrkPMQOgsLG23tIcN9qZZ4 +p8S7cu+0Wp7JbPryEP3bIYOo47KoQB0WUu8maxkjfdGgnBYI0rXkD7M+cCr0f2FOHEw cmgZjClBC2GhIzUUhS42Vajz81Zl50StzXag2a01bgUxLwdlWJ4b+JqjoMXLwYDWt3kw 1R4Lawu6BtJVMtoCLXCn+cY7BA+gVL/qSM2cY8Y+kD+NfPmLhoSAeghkUb+kETqCOWyB cWnC74SpS6CIEirGcwrKT6Xyrkwflt1BcxoaoWRq6cr0qZ+TM4u5KsHW84q2/ZePNr4l /hcw== X-Gm-Message-State: AOAM532urVOeA4TRNPSuYlD2A1gsdSJ8HjEQxWmXwFiUUR4xxMQfVkCb tcQgN3vzTM/XDdaEUBuOlSA0SFvXR/I= X-Google-Smtp-Source: ABdhPJw9aGGVMje9iDIJAU0uXT1iwXPqfoWydv6XGRYDMD+xkIhlazDy0EbUNRW2JBEb6atMusafBg== X-Received: by 2002:a1c:6689:: with SMTP id a131mr12397221wmc.33.1607677734788; Fri, 11 Dec 2020 01:08:54 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id w17sm14059053wru.82.2020.12.11.01.08.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 11 Dec 2020 01:08:54 -0800 (PST) Message-Id: <9a4a3159acf6144bfbfc773195a141afaa95dad3.1607677728.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 11 Dec 2020 09:08:44 +0000 Subject: [PATCH v2 5/9] t4058: add more tests and documentation for duplicate tree entry handling Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Commit 4d6be03b95 ("diffcore-rename: avoid processing duplicate destinations", 2015-02-26) added t4058 to demonstrate that a workaround it added to avoid double frees (namely to just turn off rename detection when trees had duplicate entries) would indeed avoid segfaults. The tests, though, give the impression that the expected diffs are "correct" when in reality they are just "don't segfault, and do something semi-reasonable under the circumstances". Add some notes to make this clearer. Also, commit 25d5ea410f ("[PATCH] Redo rename/copy detection logic.", 2005-05-24) added a similar workaround to avoid segfaults, but for rename_src rather than rename_dst. I do not see any tests in the testsuite to cover the collision detection of entries limited to the source side, so add a couple. Signed-off-by: Elijah Newren --- t/t4058-diff-duplicates.sh | 47 ++++++++++++++++++++++++++++++++++++-- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/t/t4058-diff-duplicates.sh b/t/t4058-diff-duplicates.sh index c24ee175ef0..bd685089561 100755 --- a/t/t4058-diff-duplicates.sh +++ b/t/t4058-diff-duplicates.sh @@ -1,5 +1,14 @@ #!/bin/sh +# NOTICE: +# This testsuite does a number of diffs and checks that the output match. +# However, it is a "garbage in, garbage out" situation; the trees have +# duplicate entries for individual paths, and it results in diffs that do +# not make much sense. As such, it is not clear that the diffs are +# "correct". The primary purpose of these tests was to verify that +# diff-tree does not segfault, but there is perhaps some value in ensuring +# that the diff output isn't wildly unreasonable. + test_description='test tree diff when trees have duplicate entries' . ./test-lib.sh @@ -57,7 +66,16 @@ test_expect_success 'create trees with duplicate entries' ' git tag two $outer_two ' -test_expect_success 'diff-tree between trees' ' +test_expect_success 'create tree without duplicate entries' ' + blob_one=$(echo one | git hash-object -w --stdin) && + outer_three=$(make_tree \ + 100644 renamed $blob_one + ) && + git tag three $outer_three +' + +test_expect_success 'diff-tree between duplicate trees' ' + # See NOTICE at top of file { printf ":000000 100644 $ZERO_OID $blob_two A\touter/inner\n" && printf ":000000 100644 $ZERO_OID $blob_two A\touter/inner\n" && @@ -71,9 +89,34 @@ test_expect_success 'diff-tree between trees' ' ' test_expect_success 'diff-tree with renames' ' - # same expectation as above, since we disable rename detection + # See NOTICE at top of file. git diff-tree -M -r --no-abbrev one two >actual && test_cmp expect actual ' +test_expect_success 'diff-tree FROM duplicate tree' ' + # See NOTICE at top of file. + { + printf ":100644 000000 $blob_one $ZERO_OID D\touter/inner\n" && + printf ":100644 000000 $blob_two $ZERO_OID D\touter/inner\n" && + printf ":100644 000000 $blob_two $ZERO_OID D\touter/inner\n" && + printf ":100644 000000 $blob_two $ZERO_OID D\touter/inner\n" && + printf ":000000 100644 $ZERO_OID $blob_one A\trenamed\n" + } >expect && + git diff-tree -r --no-abbrev one three >actual && + test_cmp expect actual +' + +test_expect_success 'diff-tree FROM duplicate tree, with renames' ' + # See NOTICE at top of file. + { + printf ":100644 000000 $blob_two $ZERO_OID D\touter/inner\n" && + printf ":100644 000000 $blob_two $ZERO_OID D\touter/inner\n" && + printf ":100644 000000 $blob_two $ZERO_OID D\touter/inner\n" && + printf ":100644 100644 $blob_one $blob_one R100\touter/inner\trenamed\n" + } >expect && + git diff-tree -M -r --no-abbrev one three >actual && + test_cmp expect actual +' + test_done From patchwork Fri Dec 11 09:08:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11967739 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83055C2BBCA for ; Fri, 11 Dec 2020 09:10:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4F86123F38 for ; Fri, 11 Dec 2020 09:10:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2437420AbgLKJKY (ORCPT ); Fri, 11 Dec 2020 04:10:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51882 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2437392AbgLKJKN (ORCPT ); Fri, 11 Dec 2020 04:10:13 -0500 Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1AADFC0617A6 for ; Fri, 11 Dec 2020 01:08:57 -0800 (PST) Received: by mail-wr1-x433.google.com with SMTP id r3so8246970wrt.2 for ; Fri, 11 Dec 2020 01:08:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=n5Juu5PL0YoW+NlcBtXZ8308O94SMFa4fjKIyTcHubs=; b=RBxruUUPYSUaaDN4ENL0SVCj8FjSegd04/vd4A2kGvvXQKrGcELTp4ijfAaZqjIfIh TzjhWn7U+6eDZTPtyhOdsfGAoBvQsoYrAgWn57QRBAuUMeWuUUvaRNtS7nOiVGRdy/Oz fJ7GZz9+kXplwapuivwJYtvp+dBXIXClstE/9+DyTP3xTa/+VvWPJC5VtrktGpnxyqPO SaIAvaNMj/RilucGT8T3enS3wNtCi+pdqpNSFzRY65GcIz7GrZ8oZ6IW68j9TlMs1cFK HF5stHJcmZxQCJ9cCF3ujppHgHz8xISGvO8d3fRHXph9C4vi3Etsosu6CbRDVzxZ50ey rt+g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=n5Juu5PL0YoW+NlcBtXZ8308O94SMFa4fjKIyTcHubs=; b=nUgbWZJPCAquZTvLDpYqA66/ILFdjp2N0JjjswcB4QWARbm9Uesj5Ct4OF5cchT3dQ OqzgcRKQ5FApwrxPIQd2yWwMiAk+lJ6lXG15FmaWxxogI4t9L4afWpZEZZzqSk4qcqDY j7GzT036iaFsXf9r1BLpuUgIZWqjC7ywhl4e6XDxWTklMZbH63flGCddPVyZmDsyNqyL gwGtDp/4VozjlEA/mApxINAxw6oIUomoBGMMGBMuCfCV83WGhnvsfR2MFdeLQwp+wAv9 Ix855zqcqhAiXWPsBO81h+ho1AzKNd11/HBLnH9T+E0HJDTa6tQSS9K+KFXzACRbhAbA wHsA== X-Gm-Message-State: AOAM531F5fPUmJl3CTf1HwAbdaH7wJu/cf9KS2zlveplEHwd3bbh6pw0 CvWAnCfqBLYm6lg31NtIJhhrvUQOg2U= X-Google-Smtp-Source: ABdhPJxF0GEnrJGY5/ZrqgH0148ljOMZQnvVt/HrP/tZtFaQMNcF1AeUuIZtdU6HT9KfvRFhCKwKlA== X-Received: by 2002:adf:9104:: with SMTP id j4mr13403789wrj.198.1607677735695; Fri, 11 Dec 2020 01:08:55 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id i8sm12146453wma.32.2020.12.11.01.08.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 11 Dec 2020 01:08:55 -0800 (PST) Message-Id: <8db27892c598a3976c0742e23563f1d360b8dee1.1607677728.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 11 Dec 2020 09:08:45 +0000 Subject: [PATCH v2 6/9] t4058: explore duplicate tree entry handling in a bit more detail Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren While creating the last commit, I found a number of other cases where git would segfault when faced with trees that have duplicate entries. None of these segfaults are in the diffcore-rename code (they all occur in cache-tree and unpack-trees). Further, to my knowledge, no one has ever been adversely affected by these bugs, and given that it has been 15 years and folks have fixed a few other issues with historical duplicate entries (as noted in the last commit), I am not sure we will ever run into anyone having problems with these. So I am not sure these are worth fixing, but it doesn't hurt to at least document these failures in the same test file that is concerned with duplicate tree entries. Signed-off-by: Elijah Newren --- t/t4058-diff-duplicates.sh | 67 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 67 insertions(+) diff --git a/t/t4058-diff-duplicates.sh b/t/t4058-diff-duplicates.sh index bd685089561..ad3f37d388d 100755 --- a/t/t4058-diff-duplicates.sh +++ b/t/t4058-diff-duplicates.sh @@ -119,4 +119,71 @@ test_expect_success 'diff-tree FROM duplicate tree, with renames' ' test_cmp expect actual ' +test_expect_success 'create a few commits' ' + git commit-tree -m "Duplicate Entries" two^{tree} >commit_id && + git branch base $(cat commit_id) && + + git commit-tree -p $(cat commit_id) -m "Just one" three^{tree} >up && + git branch update $(cat up) && + + git commit-tree -p $(cat up) -m "Back to weird" two^{tree} >final && + git branch final $(cat final) && + + rm commit_id up final +' + +test_expect_failure 'git read-tree does not segfault' ' + test_when_finished rm .git/index.lock && + test_might_fail git read-tree --reset base +' + +test_expect_failure 'reset --hard does not segfault' ' + test_when_finished rm .git/index.lock && + git checkout base && + test_might_fail git reset --hard +' + +test_expect_failure 'git diff HEAD does not segfault' ' + git checkout base && + GIT_TEST_CHECK_CACHE_TREE=false && + git reset --hard && + test_might_fail git diff HEAD +' + +test_expect_failure 'can switch to another branch when status is empty' ' + git clean -ffdqx && + git status --porcelain -uno >actual && + test_must_be_empty actual && + git checkout update +' + +test_expect_success 'forcibly switch to another branch, verify status empty' ' + git checkout -f update && + git status --porcelain -uno >actual && + test_must_be_empty actual +' + +test_expect_success 'fast-forward from non-duplicate entries to duplicate' ' + git merge final +' + +test_expect_failure 'clean status, switch branches, status still clean' ' + git status --porcelain -uno >actual && + test_must_be_empty actual && + git checkout base && + git status --porcelain -uno >actual && + test_must_be_empty actual +' + +test_expect_success 'switch to base branch and force status to be clean' ' + git checkout base && + GIT_TEST_CHECK_CACHE_TREE=false git reset --hard && + git status --porcelain -uno >actual && + test_must_be_empty actual +' + +test_expect_failure 'fast-forward from duplicate entries to non-duplicate' ' + git merge update +' + test_done From patchwork Fri Dec 11 09:08:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11967731 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A4C06C2BB48 for ; Fri, 11 Dec 2020 09:10:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6355923F38 for ; Fri, 11 Dec 2020 09:10:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2437429AbgLKJK1 (ORCPT ); Fri, 11 Dec 2020 04:10:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51886 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2437398AbgLKJKN (ORCPT ); Fri, 11 Dec 2020 04:10:13 -0500 Received: from mail-wr1-x444.google.com (mail-wr1-x444.google.com [IPv6:2a00:1450:4864:20::444]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 135F6C0617A7 for ; Fri, 11 Dec 2020 01:08:58 -0800 (PST) Received: by mail-wr1-x444.google.com with SMTP id i9so8227726wrc.4 for ; Fri, 11 Dec 2020 01:08:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=eRAhQFYeUrdX9pZ+F6hjVvCTkeVuTmB0eYXdB7dAjOA=; b=oF4MX3gmdMAG+Kf2c2YWhQN5MV+4U9dNnbD4txapPZbnsUH0O6P4+4SBhETFtXsYy8 6E6rBaWXVEdU2WxMPyq1z1WnSoC9Zqhp1b0XWAiRVpIqOiSy8vuZUzgA97eGGruelSlB IXmlxFbvFj+myPsivrwOManwFjqrm9oTBGr4MUKwQHlrzeXc8qn7ABOshZCSEiZD3y1m g47KyHpG38Vn3N3Wu+rZplnhE/SkAeX/XwE0dfH0mzXXcfQnzwnyIe+h5u71HgqCS0Gk ow/PjojMuZK5/PESlt60WnpvO2gw5wCj6Mqouv3I0cN+M8CLvRaAf3gCzqncU1sZeQQx NQZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=eRAhQFYeUrdX9pZ+F6hjVvCTkeVuTmB0eYXdB7dAjOA=; b=AozuXVyNL48f3aV6dsXrTDKzphbip1SWuA57kq6CztTNhf1Uwwza8uu9LsE43+54su c/61qvD1giFCbSaeR9xi1nqm03f6XkCBSf+b3Kk4KoA1reqM6Yr7TGY33uyy+GuNvL9a nmO6SIj2g9e6KkT1vi5Sey4OslDwxKYvhInhi8ygp/+1S7Y4PaFffR/uOP2TRFizU6oB FiaUOtKC6jOYUeVVC4dC0+R0ckTY9qsSWxwuNtoa6jtC9nP1E+P5IDzvtzSeN5g+IaS2 gDtVm6dbGJ/nWeIdQP8ZP4w37+6RLZIJBZZDooO6waNlTaMUReTZ8FGHgxEPjf56gu9I VmSg== X-Gm-Message-State: AOAM532HMJmwtYVz9wbmzEAIh9rkix8Yh3/VX3Mlc3A+XPUUMJ8B3imA PoA1e7/K64jvvOCHPN/cMhQH0dbpLIY= X-Google-Smtp-Source: ABdhPJwGYGRcVICQ6x4ZFabTrISY/50nDg+AahXjMh2p8MjX5xg95pyEqtE1Qh3Rj4qgqGItVIlsHA== X-Received: by 2002:adf:e84c:: with SMTP id d12mr13250160wrn.382.1607677736653; Fri, 11 Dec 2020 01:08:56 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id v1sm13002574wrr.48.2020.12.11.01.08.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 11 Dec 2020 01:08:56 -0800 (PST) Message-Id: In-Reply-To: References: Date: Fri, 11 Dec 2020 09:08:46 +0000 Subject: [PATCH v2 7/9] diffcore-rename: simplify and accelerate register_rename_src() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren register_rename_src() took pains to create an array in rename_src which was sorted by pathname of the contained diff_filepair. The sorting was entirely unnecessary since callers pass filepairs to us in sorted order. We can simply append to the end of the rename_src array, speeding up diffcore_rename() setup time. Also, note that I dropped the return type on the function since it was unconditionally discarded anyway. This patch is being submitted in a different order than its original development, but in a large rebase of many commits with lots of renames and with several optimizations to inexact rename detection, diffcore_rename() setup time was a sizeable chunk of overall runtime. This patch dropped execution time of rebasing 35 commits with lots of renames by 2% overall. Signed-off-by: Elijah Newren --- diffcore-rename.c | 39 +++++++++++++-------------------------- 1 file changed, 13 insertions(+), 26 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 55a188abcc3..a215421a9cb 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -76,36 +76,23 @@ static struct diff_rename_src { } *rename_src; static int rename_src_nr, rename_src_alloc; -static struct diff_rename_src *register_rename_src(struct diff_filepair *p) +static void register_rename_src(struct diff_filepair *p) { - int first, last; - struct diff_filespec *one = p->one; - unsigned short score = p->score; - - first = 0; - last = rename_src_nr; - while (last > first) { - int next = first + ((last - first) >> 1); - struct diff_rename_src *src = &(rename_src[next]); - int cmp = strcmp(one->path, src->p->one->path); - if (!cmp) - return src; - if (cmp < 0) { - last = next; - continue; - } - first = next+1; - } + /* + * If we have multiple entries at the same path in the source tree + * (an invalid tree, to be sure), avoid using more more than one + * such entry in rename detection. Once upon a time, doing so + * caused segfaults; see commit 25d5ea410f ("[PATCH] Redo + * rename/copy detection logic.", 2005-05-24). + */ + if (rename_src_nr > 0 && + !strcmp(rename_src[rename_src_nr-1].p->one->path, p->one->path)) + return; - /* insert to make it at "first" */ ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc); + rename_src[rename_src_nr].p = p; + rename_src[rename_src_nr].score = p->score; rename_src_nr++; - if (first < rename_src_nr) - MOVE_ARRAY(rename_src + first + 1, rename_src + first, - rename_src_nr - first - 1); - rename_src[first].p = p; - rename_src[first].score = score; - return &(rename_src[first]); } static int basename_same(struct diff_filespec *src, struct diff_filespec *dst) From patchwork Fri Dec 11 09:08:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11967729 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C2915C2BB9A for ; Fri, 11 Dec 2020 09:10:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 85F5A23F58 for ; Fri, 11 Dec 2020 09:10:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2437427AbgLKJK0 (ORCPT ); Fri, 11 Dec 2020 04:10:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51888 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2437399AbgLKJKN (ORCPT ); Fri, 11 Dec 2020 04:10:13 -0500 Received: from mail-wm1-x341.google.com (mail-wm1-x341.google.com [IPv6:2a00:1450:4864:20::341]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 512BAC0617B0 for ; Fri, 11 Dec 2020 01:08:59 -0800 (PST) Received: by mail-wm1-x341.google.com with SMTP id g185so7850647wmf.3 for ; Fri, 11 Dec 2020 01:08:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=WaJHuUi1VRAEs4Q9bzkYu+RL2+OdKIl0dfAFTIDEIMU=; b=QFMknMYtk5ZuTj57/r3mG3pkIU5DS9ZT2x1GHI5Nn0eI15jVkY2rzS04tXaIbwGxsP RRANXBzTCrJcQ1frQoj1/XaTAvSpVgX7W0yHqw0p5XZ68z52IgzoPz7ALIOy49OmNbZp qjnlC1idsERw0X7lH3GZSU86JvMvww3DS3oZqhFHN21amesQi60+KXM7xQhXNQnL5tPz fFDr6DowMWoK2CjdCX186R+vNaJZoQgg5PHyf2x+J3OfmCvnTDqeNW4GOvNhmZ8oiWOp fcah5M9vWvfmnMmKnvGGqpb6rFvtjA/Urlk9dJqOfRAVotVmpE4Tji4DeCQ6eYi7/7Nh pjjg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=WaJHuUi1VRAEs4Q9bzkYu+RL2+OdKIl0dfAFTIDEIMU=; b=hT4x9XISa94xf9UaXEcKD7Y4cYrl+XimE0tbSEFE5XbFPIjnptjahWCMZQhzLV9uGe 3pYKzaJ6hCwz8THJvl1616H9fdjRIbTI0x1FXWHNb25QEsYoIbNjBTd3MiSHRBF4oUuz l/XMBKO7x0ET/l9dae/uBEDLTPxpXyfiE+sycnXsKGmZIVSywqTyCi8l2nyTqpAsFBcB vPfStAkcN88XWFdo3IKkcL91DvIw5hRmr5tEmAv/Yo3csRnD6XYyY6PmhmqLlSpgcDeh IfQVVWKu0UJvvtfTK5HM4H6HdTIHoGRWB5IRvnjmHKz4VHUIbNc7g9W0yusg/fW2oyz9 iYJg== X-Gm-Message-State: AOAM531O3J8TIsgtgzI3fAuab7wzAb+NirSpPkynRsWFmShiYQFK3Fix PX6unDQlDVO9PH7vGO5miBCvzCX2Wzc= X-Google-Smtp-Source: ABdhPJwavAinXw0euaClPmJbmUxgTJJg3jzQRY/yMLNulXMd2hbFg/whcGLbXywEEMCrjxyeYpYJEA== X-Received: by 2002:a1c:25c3:: with SMTP id l186mr12062279wml.113.1607677737539; Fri, 11 Dec 2020 01:08:57 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id c190sm13990025wme.19.2020.12.11.01.08.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 11 Dec 2020 01:08:57 -0800 (PST) Message-Id: <85714e1583d8c47858e6efd2d5c63bdcca748a25.1607677728.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Fri, 11 Dec 2020 09:08:47 +0000 Subject: [PATCH v2 8/9] diffcore-rename: accelerate rename_dst setup Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren register_rename_src() simply references the passed pair inside rename_src. In contrast, add_rename_dst() did something entirely different for rename_dst. Instead of copying the passed pair, it made a copy of the second diff_filespec from the passed pair, referenced it, and then set the diff_rename_dst.pair field to NULL. Later, when a pairing is found, record_rename_pair() allocated a full diff_filepair via diff_queue() and pointed its src and dst fields at the appropriate diff_filespecs. This contrast between register_rename_src() for the rename_src data structure and add_rename_dst() for the rename_dst data structure is oddly inconsistent and requires more memory and work than necessary. Let's just reference the original diff_filepair in rename_dst as-is, just as we do with rename_src. Add a new rename_dst.is_rename field, since the rename_dst.p field is never NULL unlike the old rename_dst.pair field. Taking advantage of this change and the fact that same-named paths will be adjacent, we can get rid of the sorting of the array and most of the lookups on it, allowing us to instead just append as we go. However, there is one remaining reason to still keep locate_rename_dst(): handling broken pairs (i.e. when break detection is on). Those are somewhat rare, but we can set up a simple strintmap to get the map between the source and the index. Doing that allows us to still have a fast lookup without sorting the rename_dst array. Since the sorting had been done in a weakly quadratic manner, when many renames are involved this time could add up. There is still a strcmp() in add_rename_dst() that I have left in place to make it easier to verify that the algorithm has the same results. This strcmp() is there to check for duplicate destination entries (which was the easiest way at the time to avoid segfaults in the diffcore-rename code when trees had multiple entries at a given path). The underlying double free()s are no longer an issue with the new algorithm, but that can be addressed in a subsequent commit. This patch is being submitted in a different order than its original development, but in a large rebase of many commits with lots of renames and with several optimizations to inexact rename detection, both setup time and write back to output queue time from diffcore_rename() were sizeable chunks of overall runtime. This patch accelerated the setup time by about 65%, and final write back to the output queue time by about 50%, resulting in an overall drop of 3.5% on the execution time of rebasing a few dozen patches. Signed-off-by: Elijah Newren --- diffcore-rename.c | 148 ++++++++++++++++++++-------------------------- 1 file changed, 65 insertions(+), 83 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index a215421a9cb..2a8fcd52928 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -9,63 +9,48 @@ #include "hashmap.h" #include "progress.h" #include "promisor-remote.h" +#include "strmap.h" /* Table of rename/copy destinations */ static struct diff_rename_dst { - struct diff_filespec *two; - struct diff_filepair *pair; + struct diff_filepair *p; + struct diff_filespec *filespec_to_free; + int is_rename; /* false -> just a create; true -> rename or copy */ } *rename_dst; static int rename_dst_nr, rename_dst_alloc; +/* Mapping from break source pathname to break destination index */ +static struct strintmap *break_idx = NULL; -static int find_rename_dst(struct diff_filespec *two) -{ - int first, last; - - first = 0; - last = rename_dst_nr; - while (last > first) { - int next = first + ((last - first) >> 1); - struct diff_rename_dst *dst = &(rename_dst[next]); - int cmp = strcmp(two->path, dst->two->path); - if (!cmp) - return next; - if (cmp < 0) { - last = next; - continue; - } - first = next+1; - } - return -first - 1; -} - -static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two) +static struct diff_rename_dst *locate_rename_dst(struct diff_filepair *p) { - int ofs = find_rename_dst(two); - return ofs < 0 ? NULL : &rename_dst[ofs]; + /* Lookup by p->ONE->path */ + int idx = break_idx ? strintmap_get(break_idx, p->one->path) : -1; + return (idx == -1) ? NULL : &rename_dst[idx]; } /* * Returns 0 on success, -1 if we found a duplicate. */ -static int add_rename_dst(struct diff_filespec *two) +static int add_rename_dst(struct diff_filepair *p) { - int first = find_rename_dst(two); - - if (first >= 0) + /* + * If we have multiple entries at the same path in the destination + * tree (an invalid tree, to be sure), turn off rename detection + * entirely. Once upon a time, diffcore-rename had double free() + * issues due to such duplicate paths, resulting in segfaults. See + * commit 4d6be03b95 ("diffcore-rename: avoid processing duplicate + * destinations", 2015-02-26). + */ + if (rename_dst_nr > 0 && + !strcmp(rename_dst[rename_dst_nr-1].p->two->path, p->two->path)) return -1; - first = -first - 1; - /* insert to make it at "first" */ ALLOC_GROW(rename_dst, rename_dst_nr + 1, rename_dst_alloc); + rename_dst[rename_dst_nr].p = p; + rename_dst[rename_dst_nr].filespec_to_free = NULL; + rename_dst[rename_dst_nr].is_rename = 0; rename_dst_nr++; - if (first < rename_dst_nr) - MOVE_ARRAY(rename_dst + first + 1, rename_dst + first, - rename_dst_nr - first - 1); - rename_dst[first].two = alloc_filespec(two->path); - fill_filespec(rename_dst[first].two, &two->oid, two->oid_valid, - two->mode); - rename_dst[first].pair = NULL; return 0; } @@ -89,6 +74,14 @@ static void register_rename_src(struct diff_filepair *p) !strcmp(rename_src[rename_src_nr-1].p->one->path, p->one->path)) return; + if (p->broken_pair) { + if (!break_idx) { + break_idx = xmalloc(sizeof(*break_idx)); + strintmap_init(break_idx, -1); + } + strintmap_set(break_idx, p->one->path, rename_dst_nr); + } + ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc); rename_src[rename_src_nr].p = p; rename_src[rename_src_nr].score = p->score; @@ -128,14 +121,14 @@ static void prefetch(void *prefetch_options) struct oid_array to_fetch = OID_ARRAY_INIT; for (i = 0; i < rename_dst_nr; i++) { - if (rename_dst[i].pair) + if (rename_dst[i].p->renamed_pair) /* * The loop in diffcore_rename() will not need these * blobs, so skip prefetching. */ continue; /* already found exact match */ diff_add_if_missing(options->repo, &to_fetch, - rename_dst[i].two); + rename_dst[i].p->two); } for (i = 0; i < rename_src_nr; i++) { if (options->skip_unmodified && @@ -245,26 +238,24 @@ static int estimate_similarity(struct repository *r, static void record_rename_pair(int dst_index, int src_index, int score) { - struct diff_filespec *src, *dst; - struct diff_filepair *dp; + struct diff_filepair *src = rename_src[src_index].p; + struct diff_filepair *dst = rename_dst[dst_index].p; - if (rename_dst[dst_index].pair) + if (dst->renamed_pair) die("internal error: dst already matched."); - src = rename_src[src_index].p->one; - src->rename_used++; - src->count++; + src->one->rename_used++; + src->one->count++; - dst = rename_dst[dst_index].two; - dst->count++; + rename_dst[dst_index].filespec_to_free = dst->one; + rename_dst[dst_index].is_rename = 1; - dp = diff_queue(NULL, src, dst); - dp->renamed_pair = 1; - if (!strcmp(src->path, dst->path)) - dp->score = rename_src[src_index].score; + dst->one = src->one; + dst->renamed_pair = 1; + if (!strcmp(dst->one->path, dst->two->path)) + dst->score = rename_src[src_index].score; else - dp->score = score; - rename_dst[dst_index].pair = dp; + dst->score = score; } /* @@ -310,7 +301,7 @@ static int find_identical_files(struct hashmap *srcs, struct diff_options *options) { int renames = 0; - struct diff_filespec *target = rename_dst[dst_index].two; + struct diff_filespec *target = rename_dst[dst_index].p->two; struct file_similarity *p, *best = NULL; int i = 100, best_score = -1; unsigned int hash = hash_filespec(options->repo, target); @@ -476,7 +467,7 @@ static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, i (mx[i].score < minimum_score)) break; /* there is no more usable pair. */ dst = &rename_dst[mx[i].dst]; - if (dst->pair) + if (dst->is_rename) continue; /* already done, either exact or fuzzy. */ if (!copies && rename_src[mx[i].src].p->one->rename_used) continue; @@ -511,7 +502,7 @@ void diffcore_rename(struct diff_options *options) else if (!options->flags.rename_empty && is_empty_blob_oid(&p->two->oid)) continue; - else if (add_rename_dst(p->two) < 0) { + else if (add_rename_dst(p) < 0) { warning("skipping rename detection, detected" " duplicate destination '%s'", p->two->path); @@ -586,10 +577,10 @@ void diffcore_rename(struct diff_options *options) mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_destinations), sizeof(*mx)); for (dst_cnt = i = 0; i < rename_dst_nr; i++) { - struct diff_filespec *two = rename_dst[i].two; + struct diff_filespec *two = rename_dst[i].p->two; struct diff_score *m; - if (rename_dst[i].pair) + if (rename_dst[i].is_rename) continue; /* dealt with exact match already. */ m = &mx[dst_cnt * NUM_CANDIDATE_PER_DST]; @@ -646,22 +637,8 @@ void diffcore_rename(struct diff_options *options) diff_q(&outq, p); } else if (!DIFF_FILE_VALID(p->one) && DIFF_FILE_VALID(p->two)) { - /* - * Creation - * - * We would output this create record if it has - * not been turned into a rename/copy already. - */ - struct diff_rename_dst *dst = locate_rename_dst(p->two); - if (dst && dst->pair) { - diff_q(&outq, dst->pair); - pair_to_free = p; - } - else - /* no matching rename/copy source, so - * record this as a creation. - */ - diff_q(&outq, p); + /* Creation */ + diff_q(&outq, p); } else if (DIFF_FILE_VALID(p->one) && !DIFF_FILE_VALID(p->two)) { /* @@ -682,8 +659,10 @@ void diffcore_rename(struct diff_options *options) */ if (DIFF_PAIR_BROKEN(p)) { /* broken delete */ - struct diff_rename_dst *dst = locate_rename_dst(p->one); - if (dst && dst->pair) + struct diff_rename_dst *dst = locate_rename_dst(p); + if (!dst) + BUG("tracking failed somehow; failed to find associated dst for broken pair"); + if (dst->is_rename) /* counterpart is now rename/copy */ pair_to_free = p; } @@ -693,16 +672,14 @@ void diffcore_rename(struct diff_options *options) pair_to_free = p; } - if (pair_to_free) - ; - else + if (!pair_to_free) diff_q(&outq, p); } else if (!diff_unmodified_pair(p)) /* all the usual ones need to be kept */ diff_q(&outq, p); else - /* no need to keep unmodified pairs */ + /* no need to keep unmodified pairs; FIXME: remove earlier? */ pair_to_free = p; if (pair_to_free) @@ -715,11 +692,16 @@ void diffcore_rename(struct diff_options *options) diff_debug_queue("done collapsing", q); for (i = 0; i < rename_dst_nr; i++) - free_filespec(rename_dst[i].two); + if (rename_dst[i].filespec_to_free) + free_filespec(rename_dst[i].filespec_to_free); FREE_AND_NULL(rename_dst); rename_dst_nr = rename_dst_alloc = 0; FREE_AND_NULL(rename_src); rename_src_nr = rename_src_alloc = 0; + if (break_idx) { + strintmap_clear(break_idx); + FREE_AND_NULL(break_idx); + } return; } From patchwork Fri Dec 11 09:08:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11967735 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E5AC4C1B0E3 for ; Fri, 11 Dec 2020 09:10:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A6EC623F59 for ; Fri, 11 Dec 2020 09:10:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2437424AbgLKJKZ (ORCPT ); Fri, 11 Dec 2020 04:10:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51890 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2437405AbgLKJKO (ORCPT ); Fri, 11 Dec 2020 04:10:14 -0500 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C5465C06138C for ; Fri, 11 Dec 2020 01:08:59 -0800 (PST) Received: by mail-wr1-x42b.google.com with SMTP id c5so4597747wrp.6 for ; Fri, 11 Dec 2020 01:08:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=f0/XtdIyw6iwQJQ5kZT23INEqlES4/JfjcvNloEQ/Fk=; b=Sl7+pOdYsuoAKsMVMdFHd5BvmrkGmHsQadclhJTQwCbi+nlzI47Zt94x4cEG9Zrpkj bCPYFTJfVtoSeqOIbVMsTUfzzD8HuVib1zwB8znSkrbmt7jp3sO11T17oCMZs568o9Dk I5i3P54hd+T6RB8uJMv4LgMorSSbdLNO98h5RKeV//uabJmwxacpIcAhPVeyOj7SppHS bwUloWSwKRQ80TupI30dwt1VPyyWuTBCCYkghpXBS2/ZXIK8tDeOAMNO+F3Y8xhyxaHC wcrSa2hXsircGVlgn1FLJEGu20lDr56sxoO+N7DTsZCzxjPuDU81nrxVEzsLl/A5wMS9 SB8w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=f0/XtdIyw6iwQJQ5kZT23INEqlES4/JfjcvNloEQ/Fk=; b=lkWlJc/11ZfEXpwf8qlwBAmhvgWG6nMLz4k6j49nhXF51wv/uDQfsJC9B6p6JUxWDn rKcmeLktsDNCgnJCb5hfes1bLrEyR5VAOS1ZZbXQsqNyADHbMD9hfBGzJHXmYD1AQbrN codFLnVwGSyccn1yDrls9Z9+bJ4IYGyP9bu+xCl4S49rUWFl7wmO6qN3L5nTveSSza4f PBC69oMbSblBOsS0KALkr/oFc5B3pk5f0iPXu+6npeoa4V42WEFG5nRcRXOVLJM3tuLX G1o99fMhPWIS22pfkteGjNx+Iz+lN3qraAkytq5RfVuSOIXBG8SmfdHi+O+CIZrUWAKv U1eQ== X-Gm-Message-State: AOAM530RboDjOHTWKWatpAKFnKew0fAlK24ygEgsecYT4DJUr0YlddDN jSZ18LN6SGlTpW8035iz4zs57b/OncQ= X-Google-Smtp-Source: ABdhPJzo3n3sO5RTLnTDPvk+cPykzofAozqSF7kzCx3hopmx30YdeIbS9kjBBUaF12eIHVbPSdgDYQ== X-Received: by 2002:adf:fdcc:: with SMTP id i12mr12559889wrs.317.1607677738238; Fri, 11 Dec 2020 01:08:58 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id h13sm14189718wrm.28.2020.12.11.01.08.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 11 Dec 2020 01:08:57 -0800 (PST) Message-Id: In-Reply-To: References: Date: Fri, 11 Dec 2020 09:08:48 +0000 Subject: [PATCH v2 9/9] diffcore-rename: remove unneccessary duplicate entry checks Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Commit 25d5ea410f ("[PATCH] Redo rename/copy detection logic.", 2005-05-24) added a duplicate entry check on rename_src in order to avoid segfaults; the code at the time was prone to double free()s and an easy way to avoid it was just to turn off rename detection for any duplicate entries. Note that the form of the check was modified two commits ago in this series. Similarly, commit 4d6be03b95 ("diffcore-rename: avoid processing duplicate destinations", 2015-02-26) added a duplicate entry check on rename_dst for the exact same reason -- the code was prone to double free()s, and an easy way to avoid it was just to turn off rename detection entirely. Note that the form of the check was modified in the commit just before this one. In the original code in both places, the code was dealing with individual diff_filespecs and trying to match things up, instead of just keeping the original diff_filepairs around as we do now. The intervening change in structure has fixed the accounting problems and the associated double free()s that used to occur, and thus we already have a better fix. As such, we can remove the band-aid checks for duplicate entries. Due to the last two patches, the diffcore_rename() setup is no longer a sizeable chunk of overall runtime. Thus, in a large rebase of many commits with lots of renames and several optimizations to inexact rename detection, this patch only speeds up the overall code by about half a percent or so and is pretty close to the run-to-run variability making it hard to get an exact measurement. However, with some trace2 regions around the setup code in diffcore_rename() so that I can focus on just it, I measure that this patch consistently saves almost a third of the remaining time spent in diffcore_rename() setup. Signed-off-by: Elijah Newren --- diffcore-rename.c | 23 ----------------------- t/t4058-diff-duplicates.sh | 2 +- 2 files changed, 1 insertion(+), 24 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 2a8fcd52928..90db9ebd6d0 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -34,18 +34,6 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filepair *p) */ static int add_rename_dst(struct diff_filepair *p) { - /* - * If we have multiple entries at the same path in the destination - * tree (an invalid tree, to be sure), turn off rename detection - * entirely. Once upon a time, diffcore-rename had double free() - * issues due to such duplicate paths, resulting in segfaults. See - * commit 4d6be03b95 ("diffcore-rename: avoid processing duplicate - * destinations", 2015-02-26). - */ - if (rename_dst_nr > 0 && - !strcmp(rename_dst[rename_dst_nr-1].p->two->path, p->two->path)) - return -1; - ALLOC_GROW(rename_dst, rename_dst_nr + 1, rename_dst_alloc); rename_dst[rename_dst_nr].p = p; rename_dst[rename_dst_nr].filespec_to_free = NULL; @@ -63,17 +51,6 @@ static int rename_src_nr, rename_src_alloc; static void register_rename_src(struct diff_filepair *p) { - /* - * If we have multiple entries at the same path in the source tree - * (an invalid tree, to be sure), avoid using more more than one - * such entry in rename detection. Once upon a time, doing so - * caused segfaults; see commit 25d5ea410f ("[PATCH] Redo - * rename/copy detection logic.", 2005-05-24). - */ - if (rename_src_nr > 0 && - !strcmp(rename_src[rename_src_nr-1].p->one->path, p->one->path)) - return; - if (p->broken_pair) { if (!break_idx) { break_idx = xmalloc(sizeof(*break_idx)); diff --git a/t/t4058-diff-duplicates.sh b/t/t4058-diff-duplicates.sh index ad3f37d388d..54614b814db 100755 --- a/t/t4058-diff-duplicates.sh +++ b/t/t4058-diff-duplicates.sh @@ -91,7 +91,7 @@ test_expect_success 'diff-tree between duplicate trees' ' test_expect_success 'diff-tree with renames' ' # See NOTICE at top of file. git diff-tree -M -r --no-abbrev one two >actual && - test_cmp expect actual + test_must_be_empty actual ' test_expect_success 'diff-tree FROM duplicate tree' '