From patchwork Tue Dec 29 20:05:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11992593 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B4325C433E9 for ; Tue, 29 Dec 2020 20:06:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 87A1022209 for ; Tue, 29 Dec 2020 20:06:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726285AbgL2UGZ (ORCPT ); Tue, 29 Dec 2020 15:06:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53606 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726185AbgL2UGX (ORCPT ); Tue, 29 Dec 2020 15:06:23 -0500 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A2DFDC061793 for ; Tue, 29 Dec 2020 12:05:32 -0800 (PST) Received: by mail-wr1-x42b.google.com with SMTP id i9so15620347wrc.4 for ; Tue, 29 Dec 2020 12:05:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=uKzxYlth5/h5Z+r4ojWksiqQ0jpLKXXnqR8B6eOh0po=; b=OJ2U4+LeLOh+XJX2PdGRYR+GiT2HIQQCLZlMpvOHWfBORAE21PNSlVspMNHwBIdFg8 jDRZKYXqcue1s85S1aI14PK8mPN4yXk8FCh7rhtszCvHi7lrT5eC/MlvSBsWwjRTRCYb 8wVuj3iMsOvtShzoAfHPBMFpCosKlPduqnyibuL2BtiXulRSBYVrLNg6sePcGYzTMP3R /VtkJc6jFIDdIIXMBdEzZkNaVyP61sxuYdWR/wTHzkUUgnZ9tupk42ejrF+sbX7LpFHn ATGUMHNMu78r8B+w/W4mU+COhvFNTa7Zwqe0Z5lGeMIjLGWOkOWgkV5E1HNOzbnERUKr nfCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=uKzxYlth5/h5Z+r4ojWksiqQ0jpLKXXnqR8B6eOh0po=; b=nIk+hkci08pbYCfdMHogHqZ90ZRMY15yKSkaS39mH95WyA4s+9KU5EO4OdTdfjSMVg /yZaWD1nSrh8ldj4DRNcSZUlV5Rs+EZToiGkrEdcFkr+S1PSqalfJLxIolNdiTOkQ17/ ujLiP5uUv/Nch0YYLv7ymLRnE5CHNQS2bdKv5QADm9auZUpCJ0zyTLmZ6VZQUcolepC3 wTHvuNXSmU6LlrItfUb7WwaQh11hksXvCffAEhCCfXTWhLUqQ6CNpPB4pCvgSzm3g3Da /5LP1pZitFoNsobhJ7T0nCu2Jsli8NfdAI8sxSRfow/qNaNjJJ7rvPVYCf24PNFzpI/u x2SA== X-Gm-Message-State: AOAM5331dfTOTOKFG0/0tQqtO07SM3iQ+yaRB6U6dTp6cuXQJYPFG8KZ gFiAY1y6P4g1ARFq+mGiuMTcFAFoJKw= X-Google-Smtp-Source: ABdhPJx9DvwNzmRfSnq6lyL/ZZ65Z5HdpNYTXQcEBCEE36fZ++hdKvvt3CBGcWtjLpDe/dUND/I/CQ== X-Received: by 2002:a5d:5385:: with SMTP id d5mr57015964wrv.384.1609272331138; Tue, 29 Dec 2020 12:05:31 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id r1sm64211813wrl.95.2020.12.29.12.05.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Dec 2020 12:05:30 -0800 (PST) Message-Id: <428d8204894f668e925c640b2f72dc597164e706.1609272328.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 29 Dec 2020 20:05:20 +0000 Subject: [PATCH v3 1/9] diffcore-rename: rename num_create to num_destinations Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Christian Couder , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Our main data structures are rename_src and rename_dst. For counters of these data structures, num_sources and num_destinations seem natural; definitely more so than using num_create for the latter. Signed-off-by: Elijah Newren --- diffcore-rename.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index d367a6d2443..15a98f566e4 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -434,7 +434,7 @@ static void record_if_better(struct diff_score m[], struct diff_score *o) * 1 if we need to disable inexact rename detection; * 2 if we would be under the limit if we were given -C instead of -C -C. */ -static int too_many_rename_candidates(int num_create, +static int too_many_rename_candidates(int num_destinations, struct diff_options *options) { int rename_limit = options->rename_limit; @@ -447,17 +447,17 @@ static int too_many_rename_candidates(int num_create, * This basically does a test for the rename matrix not * growing larger than a "rename_limit" square matrix, ie: * - * num_create * num_src > rename_limit * rename_limit + * num_destinations * num_src > rename_limit * rename_limit */ if (rename_limit <= 0) rename_limit = 32767; - if ((num_create <= rename_limit || num_src <= rename_limit) && - ((uint64_t)num_create * (uint64_t)num_src + if ((num_destinations <= rename_limit || num_src <= rename_limit) && + ((uint64_t)num_destinations * (uint64_t)num_src <= (uint64_t)rename_limit * (uint64_t)rename_limit)) return 0; options->needed_rename_limit = - num_src > num_create ? num_src : num_create; + num_src > num_destinations ? num_src : num_destinations; /* Are we running under -C -C? */ if (!options->flags.find_copies_harder) @@ -469,8 +469,8 @@ static int too_many_rename_candidates(int num_create, continue; num_src++; } - if ((num_create <= rename_limit || num_src <= rename_limit) && - ((uint64_t)num_create * (uint64_t)num_src + if ((num_destinations <= rename_limit || num_src <= rename_limit) && + ((uint64_t)num_destinations * (uint64_t)num_src <= (uint64_t)rename_limit * (uint64_t)rename_limit)) return 2; return 1; @@ -505,7 +505,7 @@ void diffcore_rename(struct diff_options *options) struct diff_queue_struct outq; struct diff_score *mx; int i, j, rename_count, skip_unmodified = 0; - int num_create, dst_cnt; + int num_destinations, dst_cnt; struct progress *progress = NULL; if (!minimum_score) @@ -570,13 +570,13 @@ void diffcore_rename(struct diff_options *options) * Calculate how many renames are left (but all the source * files still remain as options for rename/copies!) */ - num_create = (rename_dst_nr - rename_count); + num_destinations = (rename_dst_nr - rename_count); /* All done? */ - if (!num_create) + if (!num_destinations) goto cleanup; - switch (too_many_rename_candidates(num_create, options)) { + switch (too_many_rename_candidates(num_destinations, options)) { case 1: goto cleanup; case 2: @@ -593,7 +593,8 @@ void diffcore_rename(struct diff_options *options) (uint64_t)rename_dst_nr * (uint64_t)rename_src_nr); } - mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_create), sizeof(*mx)); + mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_destinations), + sizeof(*mx)); for (dst_cnt = i = 0; i < rename_dst_nr; i++) { struct diff_filespec *two = rename_dst[i].two; struct diff_score *m; From patchwork Tue Dec 29 20:05:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11992597 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47962C4332E for ; Tue, 29 Dec 2020 20:06:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1E87C21D94 for ; Tue, 29 Dec 2020 20:06:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726336AbgL2UGb (ORCPT ); Tue, 29 Dec 2020 15:06:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53608 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726189AbgL2UGX (ORCPT ); Tue, 29 Dec 2020 15:06:23 -0500 Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5B9ABC061796 for ; Tue, 29 Dec 2020 12:05:33 -0800 (PST) Received: by mail-wr1-x433.google.com with SMTP id 91so15597349wrj.7 for ; Tue, 29 Dec 2020 12:05:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=0Ue95Y6VgYE2ZtLzIIcPqdKzYogvM9npsEILWau0SY4=; b=T6Rqb4OQM3HsU19TaI/EaL19plL9hT93BXTAunzlQhjRGQ36chfNX6Z2hljaiSKYHe DUGdd64OlmmqJvRbjyEi6LNyinKjYFBjcxQKB4wTPpemMvWxBtZHRrTj+mtYwqcKaP5b M+nwotpslV+GGvnk6/8/lYjeZOQ8P5KaYbk2O/cav7+tT7819WlBTBCAXaeCPoOe+anE ok0o2pCVuztm4Q8MmlTgDlcsLkbb/3T0In2qU0cbIKCbnSYQbZxKILZHxbUul1Doruuu PjpnoRZvXt5XEM1yb2hrShQzc9QpQo/qWyD9KISpNhiFxEDB1S7Y4TCNLYHF8dOAnotp TEBA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=0Ue95Y6VgYE2ZtLzIIcPqdKzYogvM9npsEILWau0SY4=; b=IJZc7i85LiqDU4Ps1EM5+vpDnejrB7Ji6WGAPSghcWllcq3S50b5CS0dNZrBRDFIPe BooVJZETxSnPT9eAfpMdCSlmxGkof1Q/Z9f+7gvtlx1oDs53Oa6hViQn0F7cpuEN49z/ ait55tY7o9EJ+hVhE1VIOVduntulZhXLYI+H1ZhX+xPT9tfncj1i3uamcX3M0GN2C6Na P4Q5OomlFOCHthxQgRnEtKX7rAbqN6tUBxBCskO5BcgGemDBObUmXt5r4pfavpSLbhqM BhgXSd1RATuHGQKtwRWFGUsfUQE3E3ZSqpFOj2sScRtunm7OEUjM5tTUXLB4+7p3hYHL 30qg== X-Gm-Message-State: AOAM530MxY11H90e1ks+HSMg3Jv+PYgMkUVyjm1xnasoccAuvInvVTik URuCmZYt+VUOMGnxsBoOOe5j6w76DI0= X-Google-Smtp-Source: ABdhPJzbWxaEWTIK1rrsCy+b8DRFlBjvJ+hy364K1O0e8barb0MZ/nhaAYrXhb1u5snee/1640GlOA== X-Received: by 2002:a05:6000:10c4:: with SMTP id b4mr58249210wrx.170.1609272331982; Tue, 29 Dec 2020 12:05:31 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id v20sm64854237wra.19.2020.12.29.12.05.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Dec 2020 12:05:31 -0800 (PST) Message-Id: In-Reply-To: References: Date: Tue, 29 Dec 2020 20:05:21 +0000 Subject: [PATCH v3 2/9] diffcore-rename: avoid usage of global in too_many_rename_candidates() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Christian Couder , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren too_many_rename_candidates() got the number of rename destinations via an argument to the function, but the number of rename sources via a global variable. That felt rather inconsistent. Pass in the number of rename sources as an argument as well. While we are at it... We had a local variable, num_src, that served two purposes. Initially it was set to the global value, but later was used for counting a subset of the number of sources. Since we now have a function argument for the former usage, introduce a clearer variable name for the latter usage. This patch has no behavioral changes; it's just renaming and passing an argument instead of grabbing it from the global namespace. (You may find it easier to view the patch using git diff's --color-words option.) Signed-off-by: Elijah Newren --- diffcore-rename.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 15a98f566e4..1d6675c040d 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -434,12 +434,11 @@ static void record_if_better(struct diff_score m[], struct diff_score *o) * 1 if we need to disable inexact rename detection; * 2 if we would be under the limit if we were given -C instead of -C -C. */ -static int too_many_rename_candidates(int num_destinations, +static int too_many_rename_candidates(int num_destinations, int num_sources, struct diff_options *options) { int rename_limit = options->rename_limit; - int num_src = rename_src_nr; - int i; + int i, limited_sources; options->needed_rename_limit = 0; @@ -447,30 +446,30 @@ static int too_many_rename_candidates(int num_destinations, * This basically does a test for the rename matrix not * growing larger than a "rename_limit" square matrix, ie: * - * num_destinations * num_src > rename_limit * rename_limit + * num_destinations * num_sources > rename_limit * rename_limit */ if (rename_limit <= 0) rename_limit = 32767; - if ((num_destinations <= rename_limit || num_src <= rename_limit) && - ((uint64_t)num_destinations * (uint64_t)num_src + if ((num_destinations <= rename_limit || num_sources <= rename_limit) && + ((uint64_t)num_destinations * (uint64_t)num_sources <= (uint64_t)rename_limit * (uint64_t)rename_limit)) return 0; options->needed_rename_limit = - num_src > num_destinations ? num_src : num_destinations; + num_sources > num_destinations ? num_sources : num_destinations; /* Are we running under -C -C? */ if (!options->flags.find_copies_harder) return 1; /* Would we bust the limit if we were running under -C? */ - for (num_src = i = 0; i < rename_src_nr; i++) { + for (limited_sources = i = 0; i < num_sources; i++) { if (diff_unmodified_pair(rename_src[i].p)) continue; - num_src++; + limited_sources++; } - if ((num_destinations <= rename_limit || num_src <= rename_limit) && - ((uint64_t)num_destinations * (uint64_t)num_src + if ((num_destinations <= rename_limit || limited_sources <= rename_limit) && + ((uint64_t)num_destinations * (uint64_t)limited_sources <= (uint64_t)rename_limit * (uint64_t)rename_limit)) return 2; return 1; @@ -576,7 +575,8 @@ void diffcore_rename(struct diff_options *options) if (!num_destinations) goto cleanup; - switch (too_many_rename_candidates(num_destinations, options)) { + switch (too_many_rename_candidates(num_destinations, rename_src_nr, + options)) { case 1: goto cleanup; case 2: From patchwork Tue Dec 29 20:05:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11992599 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3EF56C4332B for ; Tue, 29 Dec 2020 20:06:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EB2F8221F8 for ; Tue, 29 Dec 2020 20:06:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726330AbgL2UGb (ORCPT ); Tue, 29 Dec 2020 15:06:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53610 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726198AbgL2UGX (ORCPT ); Tue, 29 Dec 2020 15:06:23 -0500 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2E86AC061798 for ; Tue, 29 Dec 2020 12:05:34 -0800 (PST) Received: by mail-wr1-x432.google.com with SMTP id q18so15655722wrn.1 for ; Tue, 29 Dec 2020 12:05:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=XL0RdMhJCBZ0gxMDc9iKNk8Af4u8BEydSMvvnNYLvbo=; b=n1e2OIexEXXAsoq04mAIPFhiIhGVLQ9MFkA4YCkEp6uk29X85xHEb/F7UELDnlrDc0 Ow9eUuxKzucImD+dFVvkTvl+nmrNfSSFmjTQpqud5czIpZ6yjicWFAGGuj77cZxmPX26 RSCRmRefysS9lhmvDhb8tHXsFiNA7dOO1nWM0/l/Cwwddhs3t/S3ANSxw3GSnhXwvsNo JMHCgvz5PU5Zvb8C8YFdtooFRTipuGuSYwWApmn2yu/B3unkhDAXvcx2QpfRcGyrbLj1 OU1464ZppQyiNDXb8ajkUY4JGRKBSfxzZhrTr7mS9VvcKLSBxpOyMcBUbxP7CE8VwS0g JQ6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=XL0RdMhJCBZ0gxMDc9iKNk8Af4u8BEydSMvvnNYLvbo=; b=eeowYa0YNVO6mA9JEeUnZNoPkAfCpjkD+SbxQu/TE5nYtTPoKzj8RSi6D5xD/0NIOo PuqM61rOb1tUx8LvTDM6bGY+mpY8x+GOnTs0+CigKX1gfQgcjaHBYIuT7MJDqelQ/th+ I4WJLhiFVqH6sxe4GdOxuPlmQ3JLfjcn1byDgML3A4rELlT5L8xcKAZjz1/TlN0mwYAu V03HV2wqZAGguGdh4JFYxs5OBTD000/epJG12EBAaKic1SYN+NWRP1sCKPCQ91N5RGdZ 8u2dtcD9Y8NxUNdd+31YV/Lz6Lnpij7hWnH9nfOXyJ7WrKK9nW1yqfwTn9MNkukuBU+o ZZiw== X-Gm-Message-State: AOAM531JPxqyW/hHOecBKIAAtx/ozJIA96KMyzuxsYXZYoEiNP8RLQ+s 23dTd3frbWrPYUlECZ8CQNkWzsrZypY= X-Google-Smtp-Source: ABdhPJwDvaraAy3FXLY7SZgksNg1ZunomljQEWwaNvSUYlHsLjeOcNfW5qpIS1Yg90eP6m/1YwCg8A== X-Received: by 2002:a05:6000:185:: with SMTP id p5mr56917833wrx.403.1609272332770; Tue, 29 Dec 2020 12:05:32 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id u3sm70755596wre.54.2020.12.29.12.05.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Dec 2020 12:05:32 -0800 (PST) Message-Id: <7214fa73fdd13418744903f6c769fdb77c9512ce.1609272328.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 29 Dec 2020 20:05:22 +0000 Subject: [PATCH v3 3/9] diffcore-rename: simplify limit check Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Christian Couder , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren diffcore-rename had two different checks of the form if ((a < limit || b < limit) && a * b <= limit * limit) This can be simplified to if (st_mult(a, b) <= st_mult(limit, limit)) which makes it clearer how we are checking for overflow, and makes it much easier to parse given the drop from 8 to 4 variable appearances. Signed-off-by: Elijah Newren --- diffcore-rename.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 1d6675c040d..16553ab259f 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -447,12 +447,16 @@ static int too_many_rename_candidates(int num_destinations, int num_sources, * growing larger than a "rename_limit" square matrix, ie: * * num_destinations * num_sources > rename_limit * rename_limit + * + * We use st_mult() to check overflow conditions; in the + * exceptional circumstance that size_t isn't large enough to hold + * the multiplication, the system won't be able to allocate enough + * memory for the matrix anyway. */ if (rename_limit <= 0) rename_limit = 32767; - if ((num_destinations <= rename_limit || num_sources <= rename_limit) && - ((uint64_t)num_destinations * (uint64_t)num_sources - <= (uint64_t)rename_limit * (uint64_t)rename_limit)) + if (st_mult(num_destinations, num_sources) + <= st_mult(rename_limit, rename_limit)) return 0; options->needed_rename_limit = @@ -468,9 +472,8 @@ static int too_many_rename_candidates(int num_destinations, int num_sources, continue; limited_sources++; } - if ((num_destinations <= rename_limit || limited_sources <= rename_limit) && - ((uint64_t)num_destinations * (uint64_t)limited_sources - <= (uint64_t)rename_limit * (uint64_t)rename_limit)) + if (st_mult(num_destinations, limited_sources) + <= st_mult(rename_limit, rename_limit)) return 2; return 1; } From patchwork Tue Dec 29 20:05:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11992601 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7DCD0C4332D for ; Tue, 29 Dec 2020 20:06:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 422BF21D1B for ; Tue, 29 Dec 2020 20:06:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726325AbgL2UGa (ORCPT ); Tue, 29 Dec 2020 15:06:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53612 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726203AbgL2UGX (ORCPT ); Tue, 29 Dec 2020 15:06:23 -0500 Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 02813C061799 for ; Tue, 29 Dec 2020 12:05:35 -0800 (PST) Received: by mail-wr1-x42e.google.com with SMTP id d26so15572632wrb.12 for ; Tue, 29 Dec 2020 12:05:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=WELx0XYYnbrJkWi3WvWrkOd+L7awiwNOh9Yvmq0axuw=; b=LskyPAUxz8Go86btY6u9ImTO5CBhPWWlp2WUjhHrnJ3esFFeO2llPBC+adJtfGiQNG MF18R8R2tYN/vgXWgIUE71SwYGyZywIW1oZaUvjZ2Sywhe+P9MOyAVQao7RWpYnYrgUG TU8zQ5Q2Cim5jWANHAgyMN+OsIDYCq0MkYczAYPLP62fvxB8EdWUckDpouzVi0IE29/x QVq/4MtkfBCXoIf5HVVLEEiMuIgyeu6UYyqEnTivZnALN1gPuCgX6u4iqlRmyzGn2hXj glT6oxKQF5+DBFmJUTYmE6Z9rD4HGHPN6lLiip9IBLfZ4gwhkMIBFK9FVhPPZqb+9139 zd4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=WELx0XYYnbrJkWi3WvWrkOd+L7awiwNOh9Yvmq0axuw=; b=YpiDGKFB9hlBg888nV6R5TOyd2T0adez8mwYx3oHx2nWIuk7Nq5m41oXfNnyyBg8jD w4f9thSkH3eE3A0PD9PTOawWKRLNAEGP58l0Wav7CvMntRWRaD52+IgHJkg+4rzrXUHK SIFZx57tAbfpflxxmycN8PT0v2zYYMdVg8n+22b7WejM5mYPtU8bj3WX0k5SsAx+Ke1u VmwNoDQShEHzRlKGdSYnRye6hZTXrFuy22Z1Qim3l7i6KLoxwtjtkaL7JhuLUCKnoJQK 1UcUYwW618PMqgwfIRP/FLbguD+ZQME9jas3igdrVHNhRx9XFlmN7ni6EGX82YsKIppF 3Dfw== X-Gm-Message-State: AOAM532XpIqsOaBEKMx/ynOVimD+c1txR4wgB658SKu7KS5+N2/zCzjz 4FuXZXDEJB2NZytLx6xjNsKfChNiGcY= X-Google-Smtp-Source: ABdhPJxO/nN5CyZgMjel1+i03WZYblbIyAdPt8aUR3gLs+UvyhfA4HIDYeKX0mYq1wYmxKWpuAfAYA== X-Received: by 2002:adf:8342:: with SMTP id 60mr57863201wrd.140.1609272333603; Tue, 29 Dec 2020 12:05:33 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id m2sm4406604wml.34.2020.12.29.12.05.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Dec 2020 12:05:33 -0800 (PST) Message-Id: In-Reply-To: References: Date: Tue, 29 Dec 2020 20:05:23 +0000 Subject: [PATCH v3 4/9] diffcore-rename: reduce jumpiness in progress counters Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Christian Couder , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Inexact rename detection works by comparing all sources to all destinations, computing similarities, and then finding the best matches among those that are sufficiently similar. However, it is preceded by exact rename detection that works by checking if there are files with identical hashes. If exact renames are found, we can exclude some files from inexact rename detection. The inexact rename detection loops over the full set of files, but immediately skips those for which rename_dst[i].is_rename is true and thus doesn't compare any sources to that destination. As such, these paths shouldn't be included in the progress counter. For the eagle eyed, this change hints at an actual optimization -- the first one I presented at Git Merge 2020. I'll be submitting that optimization later, once the basic merge-ort algorithm has merged. Signed-off-by: Elijah Newren --- diffcore-rename.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 16553ab259f..55a188abcc3 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -593,7 +593,7 @@ void diffcore_rename(struct diff_options *options) if (options->show_rename_progress) { progress = start_delayed_progress( _("Performing inexact rename detection"), - (uint64_t)rename_dst_nr * (uint64_t)rename_src_nr); + (uint64_t)num_destinations * (uint64_t)rename_src_nr); } mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_destinations), @@ -633,7 +633,8 @@ void diffcore_rename(struct diff_options *options) diff_free_filespec_blob(two); } dst_cnt++; - display_progress(progress, (uint64_t)(i+1)*(uint64_t)rename_src_nr); + display_progress(progress, + (uint64_t)dst_cnt * (uint64_t)rename_src_nr); } stop_progress(&progress); From patchwork Tue Dec 29 20:05:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11992609 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.3 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, LOTS_OF_MONEY,MAILING_LIST_MULTI,MONEY_NOHTML,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95396C43332 for ; Tue, 29 Dec 2020 20:06:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 65A3D20867 for ; Tue, 29 Dec 2020 20:06:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726318AbgL2UGa (ORCPT ); Tue, 29 Dec 2020 15:06:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53614 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726214AbgL2UGX (ORCPT ); Tue, 29 Dec 2020 15:06:23 -0500 Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EAC78C06179A for ; Tue, 29 Dec 2020 12:05:35 -0800 (PST) Received: by mail-wm1-x32d.google.com with SMTP id 3so3047447wmg.4 for ; Tue, 29 Dec 2020 12:05:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=QQatPLA2X4s8HeXNOn3kIQM7ZmA0C3EQfIlNmMO5AUY=; b=UMuH1WgHp2l5kX4ZaHMHfmgvG4XmNCR7qqRk85V4cFaBbkVyLcAf697IdLOPvRkODB ZABdkQzJAzxW9vJvrclrj4yhoTOMPEpEgzr8TzWVtKFw+zR+rKxFO/F2FoB0J/HA1qFv uDouCd+eTP7q8gtp3MCKDvQ3Ktl8KumKUzVUxWOIH379eNRs5G/7ocfM2fkyvwRyxcWA Z1j8h/N8DxiuM/9Gj0dy3pm7v5elm88KS+KyRSrnm1dnD92K3/TKE2zSSpixMMVzShBe W84NBj4GdRocGd+rBIo/DPZoug2VgW2qQeOgIVEiuQvRk/3kbwFBwG93lTPEIOxl9eWu X1UQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=QQatPLA2X4s8HeXNOn3kIQM7ZmA0C3EQfIlNmMO5AUY=; b=jWO06EViPOX5B94ZYgIPcRw9P6MPaBABC8A2Q2b0FEKi9Kv1dRqXKA2CxRZkBCYISE GnbkF/+Nq7FT9xF/qLQP5vHbS9Xlj8kTsEtw1s7S5jeSD6/IY/4r/5MLoub0u54yolSH 8jp/Nl0iQDRFRxcSK1Q3K93NuUnvr/riaFwMnz8ScgwSkmo2cPawf3t98wf8WGNdpuvY fUXPjKrltuPYEchJolaVqhCxrUduk1j8G7PRPjIKx4x3hud+LY20OXsCU2eL5T/v6dEK 8NbGu99eXZnayI0nLkP/FZOsKC6jknN5I4ZpqivHW8qJPu/gmdw9HZVMeDULwPo0Xaqo 4WGg== X-Gm-Message-State: AOAM530jlMUP2JlW4CIJiwtgykTv7YlHPxozyUvlJW4GvCjM9q8IHU9c ou9y9AmYM7/pfQp6LvrvnU7GTCBNvLc= X-Google-Smtp-Source: ABdhPJzzJWjhjgw+e2P4K+5T/7D66vo+j3to9qihQ3BdinJxMb2A/nMesCraF/pwLyLadagOZ6ulYQ== X-Received: by 2002:a1c:e2c2:: with SMTP id z185mr4935521wmg.49.1609272334385; Tue, 29 Dec 2020 12:05:34 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id r13sm62346469wrs.6.2020.12.29.12.05.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Dec 2020 12:05:33 -0800 (PST) Message-Id: <9a4a3159acf6144bfbfc773195a141afaa95dad3.1609272328.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 29 Dec 2020 20:05:24 +0000 Subject: [PATCH v3 5/9] t4058: add more tests and documentation for duplicate tree entry handling Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Christian Couder , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Commit 4d6be03b95 ("diffcore-rename: avoid processing duplicate destinations", 2015-02-26) added t4058 to demonstrate that a workaround it added to avoid double frees (namely to just turn off rename detection when trees had duplicate entries) would indeed avoid segfaults. The tests, though, give the impression that the expected diffs are "correct" when in reality they are just "don't segfault, and do something semi-reasonable under the circumstances". Add some notes to make this clearer. Also, commit 25d5ea410f ("[PATCH] Redo rename/copy detection logic.", 2005-05-24) added a similar workaround to avoid segfaults, but for rename_src rather than rename_dst. I do not see any tests in the testsuite to cover the collision detection of entries limited to the source side, so add a couple. Signed-off-by: Elijah Newren --- t/t4058-diff-duplicates.sh | 47 ++++++++++++++++++++++++++++++++++++-- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/t/t4058-diff-duplicates.sh b/t/t4058-diff-duplicates.sh index c24ee175ef0..bd685089561 100755 --- a/t/t4058-diff-duplicates.sh +++ b/t/t4058-diff-duplicates.sh @@ -1,5 +1,14 @@ #!/bin/sh +# NOTICE: +# This testsuite does a number of diffs and checks that the output match. +# However, it is a "garbage in, garbage out" situation; the trees have +# duplicate entries for individual paths, and it results in diffs that do +# not make much sense. As such, it is not clear that the diffs are +# "correct". The primary purpose of these tests was to verify that +# diff-tree does not segfault, but there is perhaps some value in ensuring +# that the diff output isn't wildly unreasonable. + test_description='test tree diff when trees have duplicate entries' . ./test-lib.sh @@ -57,7 +66,16 @@ test_expect_success 'create trees with duplicate entries' ' git tag two $outer_two ' -test_expect_success 'diff-tree between trees' ' +test_expect_success 'create tree without duplicate entries' ' + blob_one=$(echo one | git hash-object -w --stdin) && + outer_three=$(make_tree \ + 100644 renamed $blob_one + ) && + git tag three $outer_three +' + +test_expect_success 'diff-tree between duplicate trees' ' + # See NOTICE at top of file { printf ":000000 100644 $ZERO_OID $blob_two A\touter/inner\n" && printf ":000000 100644 $ZERO_OID $blob_two A\touter/inner\n" && @@ -71,9 +89,34 @@ test_expect_success 'diff-tree between trees' ' ' test_expect_success 'diff-tree with renames' ' - # same expectation as above, since we disable rename detection + # See NOTICE at top of file. git diff-tree -M -r --no-abbrev one two >actual && test_cmp expect actual ' +test_expect_success 'diff-tree FROM duplicate tree' ' + # See NOTICE at top of file. + { + printf ":100644 000000 $blob_one $ZERO_OID D\touter/inner\n" && + printf ":100644 000000 $blob_two $ZERO_OID D\touter/inner\n" && + printf ":100644 000000 $blob_two $ZERO_OID D\touter/inner\n" && + printf ":100644 000000 $blob_two $ZERO_OID D\touter/inner\n" && + printf ":000000 100644 $ZERO_OID $blob_one A\trenamed\n" + } >expect && + git diff-tree -r --no-abbrev one three >actual && + test_cmp expect actual +' + +test_expect_success 'diff-tree FROM duplicate tree, with renames' ' + # See NOTICE at top of file. + { + printf ":100644 000000 $blob_two $ZERO_OID D\touter/inner\n" && + printf ":100644 000000 $blob_two $ZERO_OID D\touter/inner\n" && + printf ":100644 000000 $blob_two $ZERO_OID D\touter/inner\n" && + printf ":100644 100644 $blob_one $blob_one R100\touter/inner\trenamed\n" + } >expect && + git diff-tree -M -r --no-abbrev one three >actual && + test_cmp expect actual +' + test_done From patchwork Tue Dec 29 20:05:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11992607 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AEABAC43333 for ; Tue, 29 Dec 2020 20:06:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 882C0221F8 for ; Tue, 29 Dec 2020 20:06:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726313AbgL2UGa (ORCPT ); Tue, 29 Dec 2020 15:06:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53616 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726218AbgL2UGX (ORCPT ); Tue, 29 Dec 2020 15:06:23 -0500 Received: from mail-wr1-x429.google.com (mail-wr1-x429.google.com [IPv6:2a00:1450:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C133FC06179B for ; Tue, 29 Dec 2020 12:05:36 -0800 (PST) Received: by mail-wr1-x429.google.com with SMTP id t16so15624984wra.3 for ; Tue, 29 Dec 2020 12:05:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=n5Juu5PL0YoW+NlcBtXZ8308O94SMFa4fjKIyTcHubs=; b=ZQwlLgJ7iN0MQaT3LEuTYwSmobwxaSor9wFSZhlbucMyI8de0JJkDlcCgBtN28ZMmf vvWyxljCQO19Itbuj97D5wnqOSwKhli4JJRPTsaAwD6W4zc01geJyPDlo5eHW1UUr/Xr IxfH1U+zgNGposfqNgzY6GUYqP59hP1mu2pFFoPIhKTADuUSe2kajqpCtY5bQdNzf87j TN07xUzX+vrXFFI19TIAe8b9VqVbe6H0RsEWO37NVkYkV0aD4EVGAZc8FBKOpuf4e/VN HbGZssIFK6psm/3aZFQkw0xasvYbRtkjDio0MWlRvrp/GNkR5neFy8b2ag+MLDzwx8RR xsCA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=n5Juu5PL0YoW+NlcBtXZ8308O94SMFa4fjKIyTcHubs=; b=dbvRrydQ4O5uiVk1cH+j1laTYB04pZ/Q/X4oL3k5Y7HEPl6+qiRgLcqprYxfzIQXz8 sfdtab6n9tRh6cVOcCGTAeHpgy2/tgONk+3NqkexwHGdSgzqGwD2r/xvlsauR4/ZmT2B /gNuBujTXpbkP8a5J1NvWoiBLk+ZI+hTKVGsmywmrdUmCtv1vQCIf3PufwJGiytVyL4T qPjZKMUovWkYUWgkOjFbBlG7pm8Wl1SsInt0FRCpwb4Fk3njPZ4ZMYKYlHV6Hm1RFqrs H3ywqAKOhPPcMJHsP/IVbyAgSZ55ICkX+xmsAXjfoZeK5wvvnosJR0n6H1zYzXI1TDqe xcsw== X-Gm-Message-State: AOAM532h040FL4Dm1NSKiV6BTEFhn0cnTehdmZqNqfsyc7oMP7vQgJbs wPdwJqZjyYhnnf28udZ+p+uGce81clo= X-Google-Smtp-Source: ABdhPJw86yDib4Rfne3QN/tBsmyDUDCc3sikukTFCKHqEt96ZhhhW4ANYsR6RVQnKSKARl1wj82BkQ== X-Received: by 2002:a5d:6607:: with SMTP id n7mr55303982wru.206.1609272335411; Tue, 29 Dec 2020 12:05:35 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id g192sm4788969wme.48.2020.12.29.12.05.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Dec 2020 12:05:34 -0800 (PST) Message-Id: <8db27892c598a3976c0742e23563f1d360b8dee1.1609272328.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 29 Dec 2020 20:05:25 +0000 Subject: [PATCH v3 6/9] t4058: explore duplicate tree entry handling in a bit more detail Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Christian Couder , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren While creating the last commit, I found a number of other cases where git would segfault when faced with trees that have duplicate entries. None of these segfaults are in the diffcore-rename code (they all occur in cache-tree and unpack-trees). Further, to my knowledge, no one has ever been adversely affected by these bugs, and given that it has been 15 years and folks have fixed a few other issues with historical duplicate entries (as noted in the last commit), I am not sure we will ever run into anyone having problems with these. So I am not sure these are worth fixing, but it doesn't hurt to at least document these failures in the same test file that is concerned with duplicate tree entries. Signed-off-by: Elijah Newren --- t/t4058-diff-duplicates.sh | 67 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 67 insertions(+) diff --git a/t/t4058-diff-duplicates.sh b/t/t4058-diff-duplicates.sh index bd685089561..ad3f37d388d 100755 --- a/t/t4058-diff-duplicates.sh +++ b/t/t4058-diff-duplicates.sh @@ -119,4 +119,71 @@ test_expect_success 'diff-tree FROM duplicate tree, with renames' ' test_cmp expect actual ' +test_expect_success 'create a few commits' ' + git commit-tree -m "Duplicate Entries" two^{tree} >commit_id && + git branch base $(cat commit_id) && + + git commit-tree -p $(cat commit_id) -m "Just one" three^{tree} >up && + git branch update $(cat up) && + + git commit-tree -p $(cat up) -m "Back to weird" two^{tree} >final && + git branch final $(cat final) && + + rm commit_id up final +' + +test_expect_failure 'git read-tree does not segfault' ' + test_when_finished rm .git/index.lock && + test_might_fail git read-tree --reset base +' + +test_expect_failure 'reset --hard does not segfault' ' + test_when_finished rm .git/index.lock && + git checkout base && + test_might_fail git reset --hard +' + +test_expect_failure 'git diff HEAD does not segfault' ' + git checkout base && + GIT_TEST_CHECK_CACHE_TREE=false && + git reset --hard && + test_might_fail git diff HEAD +' + +test_expect_failure 'can switch to another branch when status is empty' ' + git clean -ffdqx && + git status --porcelain -uno >actual && + test_must_be_empty actual && + git checkout update +' + +test_expect_success 'forcibly switch to another branch, verify status empty' ' + git checkout -f update && + git status --porcelain -uno >actual && + test_must_be_empty actual +' + +test_expect_success 'fast-forward from non-duplicate entries to duplicate' ' + git merge final +' + +test_expect_failure 'clean status, switch branches, status still clean' ' + git status --porcelain -uno >actual && + test_must_be_empty actual && + git checkout base && + git status --porcelain -uno >actual && + test_must_be_empty actual +' + +test_expect_success 'switch to base branch and force status to be clean' ' + git checkout base && + GIT_TEST_CHECK_CACHE_TREE=false git reset --hard && + git status --porcelain -uno >actual && + test_must_be_empty actual +' + +test_expect_failure 'fast-forward from duplicate entries to non-duplicate' ' + git merge update +' + test_done From patchwork Tue Dec 29 20:05:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11992595 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0CBF5C43381 for ; Tue, 29 Dec 2020 20:06:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BFF9022209 for ; Tue, 29 Dec 2020 20:06:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726305AbgL2UG2 (ORCPT ); Tue, 29 Dec 2020 15:06:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53618 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726244AbgL2UGY (ORCPT ); Tue, 29 Dec 2020 15:06:24 -0500 Received: from mail-wr1-x431.google.com (mail-wr1-x431.google.com [IPv6:2a00:1450:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A5B89C06179C for ; Tue, 29 Dec 2020 12:05:37 -0800 (PST) Received: by mail-wr1-x431.google.com with SMTP id t30so15640778wrb.0 for ; Tue, 29 Dec 2020 12:05:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=eRAhQFYeUrdX9pZ+F6hjVvCTkeVuTmB0eYXdB7dAjOA=; b=eCGLmvZZaZvf94hDS84A9/DirkkQRi2sc7j0XYMemfGutZhAV5tqqgHTyBzUAT0s6a 8LRCkqxdLoWZQuGWRitY3LiI/a/vXccom7mzl8M+OCJo7kdBvRaIuyAJ+GI6A3lqcU1N y5bJ3wIjVGcE2uKz4IMLQCeFl+QTko3o+4zG0MjD55zBQb6QtSS8jWrOn9a+vrsWunXc hr7Y1ADwgZhrhdsBHiSjjgYTHGHdG5bxiiCNYEJ1jxF0Fwa9GEf6pebIr97t38coCPjZ hWys5aC4MjzhpTtt0LnhDszDwxwoBM/JCc19YRGhLFk8Rjbz83NBpxpLsvCgdJJt8ymG E8wA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=eRAhQFYeUrdX9pZ+F6hjVvCTkeVuTmB0eYXdB7dAjOA=; b=aldfTgaacjUGFCwwIoIbs3rS9oh7n1Jqlj8xwZRmPrmbjkhpQea4vJgarABUvaQez0 nGe4Q8NM0NUI1hrL6Nd8I9CPjkjhftH3CW33YsgYlNK81THPJUUSN5PDbOybE/cW8A3C Vo4BdydUoQ3mY2PEy7Nixd8yBdrNPHJIp6N9faYyJ7MlB0IpUELbWQhPcCoAI1kHzxru lRYSgJQ8TtFOJrLoF0v/aW4oENQG+OxcN1egvPAolYF8U41kPUUOZCUUbRWGHYX98scc 3sTj2ktC7yMO64PVzr/sqIDnTH3d/15/wAYkpxpxbYkqIGo97nuyvmEEaXTlI/T4SFZq x9NQ== X-Gm-Message-State: AOAM533bytXgDxOL5da9X6Oglz+0Amee/rPDweZmzaQQQUlofzgHiVxs QoCSKlHKGtpKLJx3k9F9Cd3mzkno8Ak= X-Google-Smtp-Source: ABdhPJzFEwcxyJimhRRjLIc2rYJvAkmBhJLr5oW4w6YjVdaO8TG+bSYTl2fY4KsWOH/mKCPJcNbgRw== X-Received: by 2002:adf:c403:: with SMTP id v3mr55934656wrf.55.1609272336222; Tue, 29 Dec 2020 12:05:36 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id y130sm5005845wmc.22.2020.12.29.12.05.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Dec 2020 12:05:35 -0800 (PST) Message-Id: In-Reply-To: References: Date: Tue, 29 Dec 2020 20:05:26 +0000 Subject: [PATCH v3 7/9] diffcore-rename: simplify and accelerate register_rename_src() Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Christian Couder , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren register_rename_src() took pains to create an array in rename_src which was sorted by pathname of the contained diff_filepair. The sorting was entirely unnecessary since callers pass filepairs to us in sorted order. We can simply append to the end of the rename_src array, speeding up diffcore_rename() setup time. Also, note that I dropped the return type on the function since it was unconditionally discarded anyway. This patch is being submitted in a different order than its original development, but in a large rebase of many commits with lots of renames and with several optimizations to inexact rename detection, diffcore_rename() setup time was a sizeable chunk of overall runtime. This patch dropped execution time of rebasing 35 commits with lots of renames by 2% overall. Signed-off-by: Elijah Newren --- diffcore-rename.c | 39 +++++++++++++-------------------------- 1 file changed, 13 insertions(+), 26 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 55a188abcc3..a215421a9cb 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -76,36 +76,23 @@ static struct diff_rename_src { } *rename_src; static int rename_src_nr, rename_src_alloc; -static struct diff_rename_src *register_rename_src(struct diff_filepair *p) +static void register_rename_src(struct diff_filepair *p) { - int first, last; - struct diff_filespec *one = p->one; - unsigned short score = p->score; - - first = 0; - last = rename_src_nr; - while (last > first) { - int next = first + ((last - first) >> 1); - struct diff_rename_src *src = &(rename_src[next]); - int cmp = strcmp(one->path, src->p->one->path); - if (!cmp) - return src; - if (cmp < 0) { - last = next; - continue; - } - first = next+1; - } + /* + * If we have multiple entries at the same path in the source tree + * (an invalid tree, to be sure), avoid using more more than one + * such entry in rename detection. Once upon a time, doing so + * caused segfaults; see commit 25d5ea410f ("[PATCH] Redo + * rename/copy detection logic.", 2005-05-24). + */ + if (rename_src_nr > 0 && + !strcmp(rename_src[rename_src_nr-1].p->one->path, p->one->path)) + return; - /* insert to make it at "first" */ ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc); + rename_src[rename_src_nr].p = p; + rename_src[rename_src_nr].score = p->score; rename_src_nr++; - if (first < rename_src_nr) - MOVE_ARRAY(rename_src + first + 1, rename_src + first, - rename_src_nr - first - 1); - rename_src[first].p = p; - rename_src[first].score = score; - return &(rename_src[first]); } static int basename_same(struct diff_filespec *src, struct diff_filespec *dst) From patchwork Tue Dec 29 20:05:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11992603 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C7915C433E6 for ; Tue, 29 Dec 2020 20:06:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A209021D1B for ; Tue, 29 Dec 2020 20:06:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726293AbgL2UG2 (ORCPT ); Tue, 29 Dec 2020 15:06:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53620 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726261AbgL2UGY (ORCPT ); Tue, 29 Dec 2020 15:06:24 -0500 Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com [IPv6:2a00:1450:4864:20::334]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0098AC06179E for ; Tue, 29 Dec 2020 12:05:38 -0800 (PST) Received: by mail-wm1-x334.google.com with SMTP id 190so3368352wmz.0 for ; Tue, 29 Dec 2020 12:05:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=WaJHuUi1VRAEs4Q9bzkYu+RL2+OdKIl0dfAFTIDEIMU=; b=b2hG4l3OgfyrMtPCf5Ba9k/PWHDDjTP4a0H/Zqwnnn4ebeLCs0k5bnYxdZSmQiZvoT 5i5a8zSGFftft8lBUerMfgUKwMBgOUIreCs/qXcDIomoFfFdwf5KsIOmMtn8vSVBU7lU +JV2R/kqpHpjfpUmIl8RhCCekwXuWd4QGummzlrTjtDBtXXd2A2WNYfZZp08dmyD93Ee WbgbvnLtpnDK5rq36MwVxTudEkfWw5Kz0sRFk+6X2csvYZCI3L/52C5MdZQxf5HM3lRf l8SwMtsk52vST1S3iNXjHa2FWfLRLAx4qfOeQx/kEVn5cymN9RJwI3c5NbUgtVVcIAPQ dplQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=WaJHuUi1VRAEs4Q9bzkYu+RL2+OdKIl0dfAFTIDEIMU=; b=tOUxqHsDFLldYhrFJxUZcygQQ5x6N9CiqpK0hwf42t7TzAOINrs9yPmk9gPUdinnuk ejy/GD20A9U+8O3cPscOxIJK3RWwuhGN7BcTvdH/lSxOfsB7KD8YlyO5y/eTnUNG0m1G 8a0YjejqggfEUN6JhAmiagaA8c2XHicznRSiuyOrQz1YV6fDb+piS9Maxfz5jtM7nCz+ vSi3DF7ErWVsx7KMWKmsiD5+lT0a6SwSw4mJgyak3Hp0LCZCpetWqcJJMHhBL8nhejXD iqrN/HIow2pwv8gSNLGYJDiOrMFSbBHKq2XyqgvhUYbyIF1UFhThiAMsZ0/jNoTaopX/ A0Yg== X-Gm-Message-State: AOAM5333SANEbDFIOBUYSQS7zMUThNZqIizPb7M9gu6C5fDiIXWcrLKL /r33EIYLbHLv6SwMABK1jYHUoTP4LMs= X-Google-Smtp-Source: ABdhPJyrBJd83CayPx+JkKVBO2kxs7O/qQwbfHavEoRderdO1x/+sAAzaMhSxyDPAllR4bbiEuWjLg== X-Received: by 2002:a7b:cf0d:: with SMTP id l13mr4862220wmg.168.1609272337193; Tue, 29 Dec 2020 12:05:37 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id r13sm62606310wrt.10.2020.12.29.12.05.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Dec 2020 12:05:36 -0800 (PST) Message-Id: <85714e1583d8c47858e6efd2d5c63bdcca748a25.1609272328.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 29 Dec 2020 20:05:27 +0000 Subject: [PATCH v3 8/9] diffcore-rename: accelerate rename_dst setup Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Christian Couder , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren register_rename_src() simply references the passed pair inside rename_src. In contrast, add_rename_dst() did something entirely different for rename_dst. Instead of copying the passed pair, it made a copy of the second diff_filespec from the passed pair, referenced it, and then set the diff_rename_dst.pair field to NULL. Later, when a pairing is found, record_rename_pair() allocated a full diff_filepair via diff_queue() and pointed its src and dst fields at the appropriate diff_filespecs. This contrast between register_rename_src() for the rename_src data structure and add_rename_dst() for the rename_dst data structure is oddly inconsistent and requires more memory and work than necessary. Let's just reference the original diff_filepair in rename_dst as-is, just as we do with rename_src. Add a new rename_dst.is_rename field, since the rename_dst.p field is never NULL unlike the old rename_dst.pair field. Taking advantage of this change and the fact that same-named paths will be adjacent, we can get rid of the sorting of the array and most of the lookups on it, allowing us to instead just append as we go. However, there is one remaining reason to still keep locate_rename_dst(): handling broken pairs (i.e. when break detection is on). Those are somewhat rare, but we can set up a simple strintmap to get the map between the source and the index. Doing that allows us to still have a fast lookup without sorting the rename_dst array. Since the sorting had been done in a weakly quadratic manner, when many renames are involved this time could add up. There is still a strcmp() in add_rename_dst() that I have left in place to make it easier to verify that the algorithm has the same results. This strcmp() is there to check for duplicate destination entries (which was the easiest way at the time to avoid segfaults in the diffcore-rename code when trees had multiple entries at a given path). The underlying double free()s are no longer an issue with the new algorithm, but that can be addressed in a subsequent commit. This patch is being submitted in a different order than its original development, but in a large rebase of many commits with lots of renames and with several optimizations to inexact rename detection, both setup time and write back to output queue time from diffcore_rename() were sizeable chunks of overall runtime. This patch accelerated the setup time by about 65%, and final write back to the output queue time by about 50%, resulting in an overall drop of 3.5% on the execution time of rebasing a few dozen patches. Signed-off-by: Elijah Newren --- diffcore-rename.c | 148 ++++++++++++++++++++-------------------------- 1 file changed, 65 insertions(+), 83 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index a215421a9cb..2a8fcd52928 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -9,63 +9,48 @@ #include "hashmap.h" #include "progress.h" #include "promisor-remote.h" +#include "strmap.h" /* Table of rename/copy destinations */ static struct diff_rename_dst { - struct diff_filespec *two; - struct diff_filepair *pair; + struct diff_filepair *p; + struct diff_filespec *filespec_to_free; + int is_rename; /* false -> just a create; true -> rename or copy */ } *rename_dst; static int rename_dst_nr, rename_dst_alloc; +/* Mapping from break source pathname to break destination index */ +static struct strintmap *break_idx = NULL; -static int find_rename_dst(struct diff_filespec *two) -{ - int first, last; - - first = 0; - last = rename_dst_nr; - while (last > first) { - int next = first + ((last - first) >> 1); - struct diff_rename_dst *dst = &(rename_dst[next]); - int cmp = strcmp(two->path, dst->two->path); - if (!cmp) - return next; - if (cmp < 0) { - last = next; - continue; - } - first = next+1; - } - return -first - 1; -} - -static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two) +static struct diff_rename_dst *locate_rename_dst(struct diff_filepair *p) { - int ofs = find_rename_dst(two); - return ofs < 0 ? NULL : &rename_dst[ofs]; + /* Lookup by p->ONE->path */ + int idx = break_idx ? strintmap_get(break_idx, p->one->path) : -1; + return (idx == -1) ? NULL : &rename_dst[idx]; } /* * Returns 0 on success, -1 if we found a duplicate. */ -static int add_rename_dst(struct diff_filespec *two) +static int add_rename_dst(struct diff_filepair *p) { - int first = find_rename_dst(two); - - if (first >= 0) + /* + * If we have multiple entries at the same path in the destination + * tree (an invalid tree, to be sure), turn off rename detection + * entirely. Once upon a time, diffcore-rename had double free() + * issues due to such duplicate paths, resulting in segfaults. See + * commit 4d6be03b95 ("diffcore-rename: avoid processing duplicate + * destinations", 2015-02-26). + */ + if (rename_dst_nr > 0 && + !strcmp(rename_dst[rename_dst_nr-1].p->two->path, p->two->path)) return -1; - first = -first - 1; - /* insert to make it at "first" */ ALLOC_GROW(rename_dst, rename_dst_nr + 1, rename_dst_alloc); + rename_dst[rename_dst_nr].p = p; + rename_dst[rename_dst_nr].filespec_to_free = NULL; + rename_dst[rename_dst_nr].is_rename = 0; rename_dst_nr++; - if (first < rename_dst_nr) - MOVE_ARRAY(rename_dst + first + 1, rename_dst + first, - rename_dst_nr - first - 1); - rename_dst[first].two = alloc_filespec(two->path); - fill_filespec(rename_dst[first].two, &two->oid, two->oid_valid, - two->mode); - rename_dst[first].pair = NULL; return 0; } @@ -89,6 +74,14 @@ static void register_rename_src(struct diff_filepair *p) !strcmp(rename_src[rename_src_nr-1].p->one->path, p->one->path)) return; + if (p->broken_pair) { + if (!break_idx) { + break_idx = xmalloc(sizeof(*break_idx)); + strintmap_init(break_idx, -1); + } + strintmap_set(break_idx, p->one->path, rename_dst_nr); + } + ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc); rename_src[rename_src_nr].p = p; rename_src[rename_src_nr].score = p->score; @@ -128,14 +121,14 @@ static void prefetch(void *prefetch_options) struct oid_array to_fetch = OID_ARRAY_INIT; for (i = 0; i < rename_dst_nr; i++) { - if (rename_dst[i].pair) + if (rename_dst[i].p->renamed_pair) /* * The loop in diffcore_rename() will not need these * blobs, so skip prefetching. */ continue; /* already found exact match */ diff_add_if_missing(options->repo, &to_fetch, - rename_dst[i].two); + rename_dst[i].p->two); } for (i = 0; i < rename_src_nr; i++) { if (options->skip_unmodified && @@ -245,26 +238,24 @@ static int estimate_similarity(struct repository *r, static void record_rename_pair(int dst_index, int src_index, int score) { - struct diff_filespec *src, *dst; - struct diff_filepair *dp; + struct diff_filepair *src = rename_src[src_index].p; + struct diff_filepair *dst = rename_dst[dst_index].p; - if (rename_dst[dst_index].pair) + if (dst->renamed_pair) die("internal error: dst already matched."); - src = rename_src[src_index].p->one; - src->rename_used++; - src->count++; + src->one->rename_used++; + src->one->count++; - dst = rename_dst[dst_index].two; - dst->count++; + rename_dst[dst_index].filespec_to_free = dst->one; + rename_dst[dst_index].is_rename = 1; - dp = diff_queue(NULL, src, dst); - dp->renamed_pair = 1; - if (!strcmp(src->path, dst->path)) - dp->score = rename_src[src_index].score; + dst->one = src->one; + dst->renamed_pair = 1; + if (!strcmp(dst->one->path, dst->two->path)) + dst->score = rename_src[src_index].score; else - dp->score = score; - rename_dst[dst_index].pair = dp; + dst->score = score; } /* @@ -310,7 +301,7 @@ static int find_identical_files(struct hashmap *srcs, struct diff_options *options) { int renames = 0; - struct diff_filespec *target = rename_dst[dst_index].two; + struct diff_filespec *target = rename_dst[dst_index].p->two; struct file_similarity *p, *best = NULL; int i = 100, best_score = -1; unsigned int hash = hash_filespec(options->repo, target); @@ -476,7 +467,7 @@ static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, i (mx[i].score < minimum_score)) break; /* there is no more usable pair. */ dst = &rename_dst[mx[i].dst]; - if (dst->pair) + if (dst->is_rename) continue; /* already done, either exact or fuzzy. */ if (!copies && rename_src[mx[i].src].p->one->rename_used) continue; @@ -511,7 +502,7 @@ void diffcore_rename(struct diff_options *options) else if (!options->flags.rename_empty && is_empty_blob_oid(&p->two->oid)) continue; - else if (add_rename_dst(p->two) < 0) { + else if (add_rename_dst(p) < 0) { warning("skipping rename detection, detected" " duplicate destination '%s'", p->two->path); @@ -586,10 +577,10 @@ void diffcore_rename(struct diff_options *options) mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_destinations), sizeof(*mx)); for (dst_cnt = i = 0; i < rename_dst_nr; i++) { - struct diff_filespec *two = rename_dst[i].two; + struct diff_filespec *two = rename_dst[i].p->two; struct diff_score *m; - if (rename_dst[i].pair) + if (rename_dst[i].is_rename) continue; /* dealt with exact match already. */ m = &mx[dst_cnt * NUM_CANDIDATE_PER_DST]; @@ -646,22 +637,8 @@ void diffcore_rename(struct diff_options *options) diff_q(&outq, p); } else if (!DIFF_FILE_VALID(p->one) && DIFF_FILE_VALID(p->two)) { - /* - * Creation - * - * We would output this create record if it has - * not been turned into a rename/copy already. - */ - struct diff_rename_dst *dst = locate_rename_dst(p->two); - if (dst && dst->pair) { - diff_q(&outq, dst->pair); - pair_to_free = p; - } - else - /* no matching rename/copy source, so - * record this as a creation. - */ - diff_q(&outq, p); + /* Creation */ + diff_q(&outq, p); } else if (DIFF_FILE_VALID(p->one) && !DIFF_FILE_VALID(p->two)) { /* @@ -682,8 +659,10 @@ void diffcore_rename(struct diff_options *options) */ if (DIFF_PAIR_BROKEN(p)) { /* broken delete */ - struct diff_rename_dst *dst = locate_rename_dst(p->one); - if (dst && dst->pair) + struct diff_rename_dst *dst = locate_rename_dst(p); + if (!dst) + BUG("tracking failed somehow; failed to find associated dst for broken pair"); + if (dst->is_rename) /* counterpart is now rename/copy */ pair_to_free = p; } @@ -693,16 +672,14 @@ void diffcore_rename(struct diff_options *options) pair_to_free = p; } - if (pair_to_free) - ; - else + if (!pair_to_free) diff_q(&outq, p); } else if (!diff_unmodified_pair(p)) /* all the usual ones need to be kept */ diff_q(&outq, p); else - /* no need to keep unmodified pairs */ + /* no need to keep unmodified pairs; FIXME: remove earlier? */ pair_to_free = p; if (pair_to_free) @@ -715,11 +692,16 @@ void diffcore_rename(struct diff_options *options) diff_debug_queue("done collapsing", q); for (i = 0; i < rename_dst_nr; i++) - free_filespec(rename_dst[i].two); + if (rename_dst[i].filespec_to_free) + free_filespec(rename_dst[i].filespec_to_free); FREE_AND_NULL(rename_dst); rename_dst_nr = rename_dst_alloc = 0; FREE_AND_NULL(rename_src); rename_src_nr = rename_src_alloc = 0; + if (break_idx) { + strintmap_clear(break_idx); + FREE_AND_NULL(break_idx); + } return; } From patchwork Tue Dec 29 20:05:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 11992605 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DAB28C43331 for ; Tue, 29 Dec 2020 20:06:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A830021D1B for ; Tue, 29 Dec 2020 20:06:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726350AbgL2UGj (ORCPT ); Tue, 29 Dec 2020 15:06:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53602 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726189AbgL2UGj (ORCPT ); Tue, 29 Dec 2020 15:06:39 -0500 Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com [IPv6:2a00:1450:4864:20::334]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6F3C5C06179F for ; Tue, 29 Dec 2020 12:05:39 -0800 (PST) Received: by mail-wm1-x334.google.com with SMTP id r4so3039215wmh.5 for ; Tue, 29 Dec 2020 12:05:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=f0/XtdIyw6iwQJQ5kZT23INEqlES4/JfjcvNloEQ/Fk=; b=hr0BEsck5iBRkNmKF48zG/XaB7SMnB4vprjzei0cPvxMkpTcQaN1fCvsQ86G4/k/v9 0A2E0u/gKaQni1b7s1rirMVtH9rWoxkaRSl4ncifLbwx6058rta8BvI8Ns/6h6slbILS 5lW/OiYKmewbxujlC1N00xdD7BdjcqGQ4wmWpoVhHnWMXMU5FAHx3Kj8Q1OnCKIsmT+f UvDvDt/G7Q/7qYqzm6hk4T45LbRZaLzs09ZBX0sTvrKnVBKvS9q6pWJ4hfbdoR4WY5dg 9tuPg2CtxDPhuYi7vYIqZRKA0G7A82oldTHrmvSskEJqri1mTUVqOb2Dn2mydbWKLo42 kA4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=f0/XtdIyw6iwQJQ5kZT23INEqlES4/JfjcvNloEQ/Fk=; b=JswqKeH/tliIfbtqMcPSX5UveqCc4FYIU2rMh5AdTsl0HzVbAdzwfFpG7vSuDrduE5 /aCPspcwN07ymulJ5OYgc+xJXrPNzCRr5yLhIyulOqztzW+llhf8zVTRYPDJ451SWOE/ HySjrzwGEHeLbUHl19k5+3rkYe158ottGuN31nxlV9xagA2wjL8CRfXgaUWRzRCSydfB kHTPLC5MwUisJEhXYXQ2nzVK25QK3z4/tvng2wi1YsAx6wJKAGDpnTCUqOT7rRhrUQjG /sCsNqLUNuODkQ1U3RU+VqtdVqxv+bu3FgoZarjSibTRZVwpg1W9lMXfixeuzWGLAq8p g0SQ== X-Gm-Message-State: AOAM530osmRzOz7T4kgOpPlhmiEBawSyVpfPsFDdeR01ZSDU197+D7nk T12HcUaDSfSQCoN0a1rH2MLxbhGlfwg= X-Google-Smtp-Source: ABdhPJz6q6TPo+oyt+aFqXmd73iWGoX68klCVL3kWx/qe+jYEgLTFHOiP+cMTbFUILJ1AwQl8uHCkw== X-Received: by 2002:a1c:4604:: with SMTP id t4mr4555810wma.17.1609272337898; Tue, 29 Dec 2020 12:05:37 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id n11sm48041153wra.9.2020.12.29.12.05.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Dec 2020 12:05:37 -0800 (PST) Message-Id: <723b0e55e75d264f83407c3fb50f3ccc16d35062.1609272328.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Tue, 29 Dec 2020 20:05:28 +0000 Subject: [PATCH v3 9/9] diffcore-rename: remove unnecessary duplicate entry checks Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Elijah Newren , Taylor Blau , Christian Couder , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren From: Elijah Newren Commit 25d5ea410f ("[PATCH] Redo rename/copy detection logic.", 2005-05-24) added a duplicate entry check on rename_src in order to avoid segfaults; the code at the time was prone to double free()s and an easy way to avoid it was just to turn off rename detection for any duplicate entries. Note that the form of the check was modified two commits ago in this series. Similarly, commit 4d6be03b95 ("diffcore-rename: avoid processing duplicate destinations", 2015-02-26) added a duplicate entry check on rename_dst for the exact same reason -- the code was prone to double free()s, and an easy way to avoid it was just to turn off rename detection entirely. Note that the form of the check was modified in the commit just before this one. In the original code in both places, the code was dealing with individual diff_filespecs and trying to match things up, instead of just keeping the original diff_filepairs around as we do now. The intervening change in structure has fixed the accounting problems and the associated double free()s that used to occur, and thus we already have a better fix. As such, we can remove the band-aid checks for duplicate entries. Due to the last two patches, the diffcore_rename() setup is no longer a sizeable chunk of overall runtime. Thus, in a large rebase of many commits with lots of renames and several optimizations to inexact rename detection, this patch only speeds up the overall code by about half a percent or so and is pretty close to the run-to-run variability making it hard to get an exact measurement. However, with some trace2 regions around the setup code in diffcore_rename() so that I can focus on just it, I measure that this patch consistently saves almost a third of the remaining time spent in diffcore_rename() setup. Signed-off-by: Elijah Newren --- diffcore-rename.c | 23 ----------------------- t/t4058-diff-duplicates.sh | 2 +- 2 files changed, 1 insertion(+), 24 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 2a8fcd52928..90db9ebd6d0 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -34,18 +34,6 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filepair *p) */ static int add_rename_dst(struct diff_filepair *p) { - /* - * If we have multiple entries at the same path in the destination - * tree (an invalid tree, to be sure), turn off rename detection - * entirely. Once upon a time, diffcore-rename had double free() - * issues due to such duplicate paths, resulting in segfaults. See - * commit 4d6be03b95 ("diffcore-rename: avoid processing duplicate - * destinations", 2015-02-26). - */ - if (rename_dst_nr > 0 && - !strcmp(rename_dst[rename_dst_nr-1].p->two->path, p->two->path)) - return -1; - ALLOC_GROW(rename_dst, rename_dst_nr + 1, rename_dst_alloc); rename_dst[rename_dst_nr].p = p; rename_dst[rename_dst_nr].filespec_to_free = NULL; @@ -63,17 +51,6 @@ static int rename_src_nr, rename_src_alloc; static void register_rename_src(struct diff_filepair *p) { - /* - * If we have multiple entries at the same path in the source tree - * (an invalid tree, to be sure), avoid using more more than one - * such entry in rename detection. Once upon a time, doing so - * caused segfaults; see commit 25d5ea410f ("[PATCH] Redo - * rename/copy detection logic.", 2005-05-24). - */ - if (rename_src_nr > 0 && - !strcmp(rename_src[rename_src_nr-1].p->one->path, p->one->path)) - return; - if (p->broken_pair) { if (!break_idx) { break_idx = xmalloc(sizeof(*break_idx)); diff --git a/t/t4058-diff-duplicates.sh b/t/t4058-diff-duplicates.sh index ad3f37d388d..54614b814db 100755 --- a/t/t4058-diff-duplicates.sh +++ b/t/t4058-diff-duplicates.sh @@ -91,7 +91,7 @@ test_expect_success 'diff-tree between duplicate trees' ' test_expect_success 'diff-tree with renames' ' # See NOTICE at top of file. git diff-tree -M -r --no-abbrev one two >actual && - test_cmp expect actual + test_must_be_empty actual ' test_expect_success 'diff-tree FROM duplicate tree' '